202208131327 System design

#wip #source #structure

Designing systems is one of the common, critical tasks of a 202206112233 Staff-plus engineer. It's something that sets a 202206112236 Senior engineer apart. It proves that you can understand tradeoffs in technology and organizational size and maturity. It shows that you understand how to build things that are optimal for now and have a clear path to the future you're working toward. Many engineers plateau at a place where they can understand and implement any type of tech but aren't capable of planning out new systems from scratch based on some need of the business.

Example systems to learn how to design

Rate limiter
Consistent hashing
Key-value store
Unique ID generator in distributed systems
URL shortener
Web crawler
Notifications
News feed
Chat
Search autocomplete
YouTube
Google Drive

Example: Scale from 0 to 1M users

Start with a single server. DNS somewhere directs all traffic to the web server, where everything from apps to APIs to DBs is housed.
The first interesting scale point is when you need to separate your database from your web servers. At this point you probably have enough reason to discuss the exact details of the DB you're using. Should you be using a relational DB or a NoSQL document store?
The next decision point is the scaling strategy you'll take. When moving beyond the
- Should you scale horizontally? This will require additional complexity in load balancers and deployments. At the same time, this unlocks deployment and experimentation strategies like canaries and A/B testing.
- Should you scale vertically? This will require more money and a better understanding of whether you're optimizing your resource usage between peak and off-peak hours. You also retain simplicity. This is a common approach for DB servers until other constraints require distribution since stateful services (like a data store) are much more challenging to distribute correctly.
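To make the horizontal option concrete, here is a minimal sketch of what a load balancer does once there is more than one web server, plus the canary routing it unlocks. The class names (`RoundRobinBalancer`, `CanaryRouter`), the backend names, and the `canary_fraction` parameter are all hypothetical, and a real load balancer would also handle health checks, retries, and connection draining.

```python
import itertools
import random


class RoundRobinBalancer:
    """Hands out backends in rotation (hypothetical, in-memory only)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Each call returns the next backend, wrapping around at the end.
        return next(self._cycle)


class CanaryRouter:
    """Sends roughly `canary_fraction` of requests to the canary pool."""

    def __init__(self, stable, canary, canary_fraction, rng=None):
        if not 0.0 <= canary_fraction <= 1.0:
            raise ValueError("canary_fraction must be between 0 and 1")
        self._stable = stable
        self._canary = canary
        self._fraction = canary_fraction
        self._rng = rng or random.Random()

    def pick(self):
        # random() is in [0, 1), so 0.0 never routes to the canary
        # and 1.0 always does.
        use_canary = self._rng.random() < self._fraction
        pool = self._canary if use_canary else self._stable
        return pool.pick()


if __name__ == "__main__":
    stable = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    canary = RoundRobinBalancer(["web-canary-1"])
    router = CanaryRouter(stable, canary, canary_fraction=0.05)
    print([router.pick() for _ in range(10)])
```

The extra moving parts here (a pool per version, a routing decision per request) are a small picture of the complexity that horizontal scaling adds and vertical scaling avoids.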
At some point you'll have to replicate your DB and distribute it in some way. A common strategy is the primary-replica pattern. The primary instance is the only one that accepts writes, which are then replicated to the others, while the replicas serve read requests. This isn't ideal for write-heavy traffic like some OLTP systems, since every write still goes through the single primary. It can also have side effects like read-after-write staleness, where a read from a replica misses a write that hasn't been replicated yet.
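The read-after-write problem is easier to see in a toy model. The sketch below is an in-memory stand-in, not a real database client: `PrimaryReplicaStore`, its methods, and the explicit `replicate()` step are all hypothetical, with `replicate()` standing in for asynchronous replication that would normally happen in the background.

```python
class PrimaryReplicaStore:
    """Toy primary-replica store with manually triggered replication."""

    def __init__(self, replica_count=2):
        if replica_count < 1:
            raise ValueError("need at least one replica")
        self._primary = {}
        self._replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet applied to the replicas
        self._next_replica = 0

    def write(self, key, value):
        # Only the primary accepts writes.
        self._primary[key] = value
        self._pending.append((key, value))

    def read(self, key, from_primary=False):
        if from_primary:
            return self._primary.get(key)
        # Spread reads across replicas in rotation.
        replica = self._replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self._replicas)
        return replica.get(key)

    def replicate(self):
        # Apply pending writes to every replica, in write order.
        for key, value in self._pending:
            for replica in self._replicas:
                replica[key] = value
        self._pending.clear()


if __name__ == "__main__":
    store = PrimaryReplicaStore()
    store.write("user:1", "alice")
    print(store.read("user:1"))                     # replica has not caught up
    print(store.read("user:1", from_primary=True))  # primary has the write
    store.replicate()
    print(store.read("user:1"))                     # replica now has the write
```

One common mitigation, shown here as the `from_primary` flag, is to route a user's reads to the primary for a short window after they write, at the cost of putting some read load back on the primary.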
Caching can be a big performance win at this point. Typically with caching, you have to make decisions about the cache's expiration and eviction strategy based on real data. Consistency can also be an issue when a cache hit returns stale data.
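As a rough illustration of the two knobs mentioned above, here is a small in-process cache that combines a time-to-live (expiration) with least-recently-used eviction. `TTLCache`, `max_entries`, `ttl_seconds`, and the injectable `clock` are hypothetical names for this sketch. A production setup would more likely use a dedicated cache server, and the right values depend on measured traffic.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None  # miss
        value, expires_at = item
        if self._clock() >= expires_at:
            # Expired: drop it so the caller refetches from the source.
            del self._entries[key]
            return None
        # Mark as most recently used.
        self._entries.move_to_end(key)
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)


if __name__ == "__main__":
    cache = TTLCache(max_entries=2, ttl_seconds=30)
    cache.put("profile:1", {"name": "alice"})
    print(cache.get("profile:1"))
    print(cache.get("profile:2"))
```

A shorter TTL bounds how long stale data can be served but lowers the hit rate, so this is the kind of setting that is best tuned against real data.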