Recommended Readings in Distributed Systems

Distributed System Readings

The following are recommended readings and notes inspired by Alex Xu’s System Design Interview book. More relevant papers are available in the papers folder.

References

  1. MS Azure - Cloud Design Patterns
  2. Understanding Distributed Systems
  3. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
  4. System Design Interview – An Insider’s Guide
  5. Grokking the System Design Interview
  6. High Scalability Blog
  7. @donnemartin/system-design-primer
  8. @binhnguyennus/awesome-scalability
  9. System Design Cheatsheet

Chapter 1: Scale From Zero To Millions Of Users

[1] Hypertext Transfer Protocol
[2] Should you go Beyond Relational Databases?
[3] Replication
[4] Multi-master replication
[5] NDB Cluster Replication - Multi-Master and Circular Replication
[6] Caching Strategies and How to Choose the Right One
[7] Scaling Memcache at Facebook
[8] Single point of failure
[9] Amazon CloudFront Dynamic Content Delivery
[10] Configure Sticky Sessions for Your Classic Load Balancer
[11] Active-Active for Multi-Regional Resiliency
[12] Amazon EC2 High Memory Instances
[13] What it takes to run Stack Overflow
[14] What The Heck Are You Actually Using NoSQL For

Chapter 2: Back-of-the-envelope Estimation

[1] J. Dean. Google Pro Tip - Use Back-Of-The-Envelope-Calculations To Choose The Best Design
[2] System design primer
[3] Latency Numbers Every Programmer Should Know
[4] Amazon Compute Service Level Agreement
[5] Compute Engine Service Level Agreement (SLA)
[6] SLA summary for Azure services

Chapter 4: Design A Rate Limiter

[1] Rate-limiting strategies and techniques
[2] Twitter rate limits
[3] Google Docs usage limits
[4] IBM microservices
[5] Throttle API requests for better throughput
[6] Stripe rate limiters
[7] Shopify REST Admin API rate limits
[8] Better Rate Limiting With Redis Sorted Sets
[9] System Design - Rate limiter and Data modelling
[10] How we built rate limiting capable of scaling to millions of domains
[11] Redis website
[12] Lyft rate limiting
[13] Scaling your API with rate limiters
[14] What is edge computing
[15] Rate Limit Requests with Iptables
[16] OSI model

Chapter 5: Design Consistent Hashing

[1] Consistent hashing wiki
[2] Consistent Hashing
[3] Dynamo - Amazon’s Highly Available Key-value Store
[4] Cassandra - A Decentralized Structured Storage System
[5] How Discord Scaled Elixir to 5,000,000 Concurrent Users
[6] CS168 - The Modern Algorithmic Toolbox Lecture #1: Introduction and Consistent Hashing
[7] Maglev - A Fast and Reliable Software Network Load Balancer

Chapter 6: Design A Key-value Store

[1] Amazon DynamoDB
[2] memcached
[3] Redis
[4] Dynamo: Amazon’s Highly Available Key-value Store
[5] Cassandra
[6] Bigtable: A Distributed Storage System for Structured Data
[7] Merkle tree
[8] Cassandra architecture
[9] SSTable
[10] Bloom filter

Chapter 7: Design A Unique ID Generator In Distributed Systems

[1] Universally unique identifier
[2] Ticket Servers - Distributed Unique Primary Keys on the Cheap
[3] Announcing Snowflake
[4] Network time protocol

Chapter 8: Design A URL Shortener

[1] A RESTful Tutorial
[2] Bloom filter

Chapter 9: Design A Web Crawler

[1] US Library of Congress
[2] EU Web Archive
[3] Digimarc
[4] Mercator: A scalable, extensible web crawler
[5] Web Crawling
[6] 29% Of Sites Face Duplicate Content Issues
[7] Rabin M.O., et al. Fingerprinting by random polynomials. Center for Research in Computing Techn., Aiken Computation Laboratory, Univ. (1981)
[8] B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970.
[9] Donald J. Patterson, Web Crawling
[10] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical Report, Stanford University, 1998.
[11] Burton Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), pages 422–426, July 1970.
[12] Google Dynamic Rendering
[13] T. Urvoy, T. Lavergne, and P. Filoche, Tracking web spam with hidden style similarity, in Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web, 2006.
[14] IRLbot: Scaling to 6 billion pages and beyond

Chapter 10: Design A Notification System

[1] Twilio SMS
[2] Nexmo SMS
[3] Sendgrid
[4] Mailchimp
[5] You Cannot Have Exactly-Once Delivery
[6] App Keys & Secrets: Security
[7] RabbitMQ

Chapter 11: Design A News Feed System

[1] How News Feed Works
[2] Friend of Friend recommendations Neo4j and SQL Server

Chapter 12: Design A Chat System

[1] Erlang at Facebook
[2] Messenger and WhatsApp process 60 billion messages a day
[3] Long tail
[4] The Underlying Technology of Messages
[5] How Discord Stores Billions of Messages
[6] Announcing Snowflake
[7] Apache ZooKeeper
[8] From nothing: the evolution of WeChat background system (Article in Chinese)
[9] End-to-end encryption
[10] Flannel: An Application-Level Edge Cache to Make Slack Scale

Chapter 13: Design A Search Autocomplete System

[1] The Life of a Typeahead Query
[2] How We Built Prefixy: A Scalable Prefix Search Service for Powering Autocomplete
[3] Prefix Hash Tree An Indexing Data Structure over Distributed Hash Tables
[4] MongoDB wikipedia
[5] Unicode frequently asked questions
[6] Apache Hadoop
[7] Spark Streaming
[8] Apache Storm
[9] Apache Kafka

Chapter 14: Design YouTube

[1] YouTube by the numbers
[2] 2019 YouTube Demographics
[3] Cloudfront Pricing
[4] Netflix on AWS
[5] Akamai homepage
[6] Binary large object
[7] Here’s What You Need to Know About Streaming Protocols
[8] SVE: Distributed Video Processing at Facebook Scale
[9] Weibo video processing architecture (in Chinese)
[10] Delegate access with a shared access signature
[11] YouTube scalability talk by early YouTube employee
[12] Understanding the characteristics of internet short video sharing: A YouTube-based measurement study
[13] Content Popularity for Open Connect

Chapter 15: Design Google Drive

[1] Google Drive
[2] Upload file data
[3] Amazon S3
[4] Differential Synchronization
[5] Differential Synchronization YouTube talk
[6] How We’ve Scaled Dropbox
[7] The rsync algorithm
[8] Librsync. (n.d.). Retrieved April 18, 2015, from
[9] ACID
[10] Dropbox security white paper
[11] Amazon S3 Glacier

Real-world systems

The following materials can help you understand the general design ideas behind real system architectures at different companies.
Facebook Timeline: Brought To You By The Power Of Denormalization
Scale at Facebook
Building Timeline: Scaling up to hold your life story
Erlang at Facebook (Facebook chat)
Finding a needle in Haystack: Facebook’s photo storage
Serving Facebook Multifeed: Efficiency, performance gains through redesign
Scaling Memcache at Facebook
TAO: Facebook’s Distributed Data Store for the Social Graph
Amazon Architecture
Dynamo: Amazon’s Highly Available Key-value Store
A 360 Degree View Of The Entire Netflix Stack
It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix Recommendations: Beyond the 5 stars (Part 1)
Netflix Recommendations: Beyond the 5 stars (Part 2)
Google Architecture
The Google File System (Google Docs)
Differential Synchronization (Google Docs)
YouTube Architecture
Seattle Conference on Scalability: YouTube Scalability
Bigtable: A Distributed Storage System for Structured Data
Instagram Architecture: 14 Million Users, Terabytes Of Photos, 100s Of Instances, Dozens Of Technologies
The Architecture Twitter Uses To Deal With 150M Active Users
Scaling Twitter: Making Twitter 10000 Percent Faster
Announcing Snowflake (Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees)
Timelines at Scale
How Uber Scales Their Real-Time Market Platform
Scaling Pinterest
Pinterest Architecture Update
A Brief History of Scaling LinkedIn
Flickr Architecture
How We’ve Scaled Dropbox
The WhatsApp Architecture Facebook Bought For $19 Billion

Company engineering blogs

Airbnb
Amazon
Asana
Atlassian
BitTorrent
Cloudera
Docker
Dropbox
eBay
Facebook
GitHub
Google
Groupon
High Scalability
Instacart
Instagram
LinkedIn
Mixpanel
Netflix
Nextdoor
PayPal
Pinterest
Quora
Reddit
Salesforce
Shopify
Slack
SoundCloud
Spotify
Stripe
System design primer
Twitter
Thumbtack
Uber
Yahoo
Yelp
Zoom

License

Content licensed under MIT