System Design for Decision-Making

Mahmoud Albelbeisi

A note before Section 1

Throughout this primer we will use one recurring example: QuickBite, a small food-delivery app. In QuickBite, users browse restaurant menus, place orders, pay, and track the driver on a map. Restaurants receive the order in their kitchen tablet. Drivers pick up the food and deliver it. We will return to QuickBite in every section so that one mental picture grows with the ideas.

1. The Mental Model

System design (deciding how the parts of a software system fit together) is a chain of choices. Every choice is a trade-off (when you take more of one good thing, you accept less of another). Every trade-off happens against constraints (hard limits: time, money, traffic, data size, team size).

There is no "best" design. There is only the design that fits your constraints today.

Think of it like… building a house. A house for a single person on a small budget is not a worse design than a hotel for 500 guests. It is a different design, made for different constraints.

In software, this looks like… QuickBite at launch with 100 orders per day is one shape. QuickBite at one million orders per day is another. Same product, different system.

A good design document is a short argument:

"Given these constraints, this is the simplest design that meets them. Here is the first part that will break when the constraints change."

Three habits make this work.

First: write constraints as numbers before drawing any boxes. "Fast" is not a constraint. "Each request must finish in under 200 milliseconds at 5,000 requests per second" is a constraint.

Second: find the dominant constraint (the one that, if you fail it, the project ends). Treat every other constraint as flexible.

Third: pick the simplest design that meets the dominant constraint. Complexity is a cost you pay every day.

Quick preview — the list below names common constraints. We define each one in Section 2:

  • Latency — how long one request takes.
  • Throughput — how many requests per second.
  • Consistency — every reader sees the latest write.
  • Durability — once saved, data survives crashes.
  • Failure tolerance — the system keeps working when parts break.

Every box you draw later answers to one of these constraints. Change a constraint, and the choice changes.

Don't confuse with… Architecture is the result. System design is the work of getting there.

2. The Trade-Off Axes

Each axis is a slider, not a switch. Every push in one direction has a price.

Consistency vs availability

Consistency (every reader sees the latest write, with no stale data) and availability (the system answers every request, even when something is broken) fight each other during a network partition (a network failure that splits the system into groups that cannot reach each other).

Think of it like… a bakery with two tills. The internet between them breaks. The last loaf of bread is in the system. If both tills sell it, two customers get a "yes" — that is available but not consistent. If both tills refuse the sale until they can confirm — that is consistent but not available.

In software, this looks like… during a network split between two QuickBite regions, you can either refuse new orders (safe) or accept them locally and reconcile later (fast, but two drivers might be sent for the same order).

Don't confuse with… Availability is "does it answer?" Durability is "did it remember after a crash?"

Latency vs throughput

Latency (how long one request takes, in milliseconds) and throughput (how many requests the system handles per second) are not the same.

Think of it like… a coffee shop. Making one cup at a time is low latency but low throughput. Brewing 50 cups in a big pot is high throughput but the first customer waits until the pot is full.

In software, this looks like… QuickBite can send one notification at a time (fast for that user), or batch 1,000 notifications every minute (cheaper, but each notification waits up to a minute).

Don't confuse with… Latency is "how fast is one?" Throughput is "how many in total?"

Read-heavy vs write-heavy

A read-heavy (mostly reads, few writes) system is cheap to scale. You add a cache (a fast store that holds copies of recent answers) or a replica (a copy of the data on another machine). A write-heavy (mostly writes, few reads) system is harder. You eventually shard (split data across many machines by some key, like user ID).

Think of it like… a public library (mostly reads — many people borrow the same book) vs a logistics depot (mostly writes — every parcel arriving must be recorded).

In software, this looks like… the QuickBite restaurant menu is read-heavy: thousands of users see the same menu. Driver location updates are write-heavy: each driver writes a new position every 4 seconds.

Sync vs async

Synchronous (sync) (the caller waits until the work is done) code is easy to follow. Asynchronous (async) (the caller does not wait; the work finishes later) code absorbs traffic spikes.

Think of it like… a phone call (sync — the other side must answer now) vs a WhatsApp message (async — they read it when they can).

In software, this looks like… in QuickBite, charging the card is synchronous (the user waits to know if payment succeeded). Sending the receipt email is asynchronous (the user does not wait).
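
A minimal sketch of that split, using only Python's standard library; charge_card and the email print are hypothetical stand-ins for the real payment call and email sender:

```python
import queue
import threading

receipt_queue = queue.Queue()  # stand-in for a real message queue (SQS, RabbitMQ, ...)

def charge_card(order_id: str) -> bool:
    # Hypothetical synchronous payment call: the user waits on this result.
    print(f"charging card for {order_id}")
    return True

def place_order(order_id: str) -> bool:
    paid = charge_card(order_id)      # sync: block until we know the outcome
    if paid:
        receipt_queue.put(order_id)   # async: hand the receipt off and return at once
    return paid

def receipt_worker():
    while True:
        order_id = receipt_queue.get()
        print(f"emailing receipt for {order_id}")  # the user never waited on this
        receipt_queue.task_done()

threading.Thread(target=receipt_worker, daemon=True).start()
place_order("order-42")
receipt_queue.join()
```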

Contracts vs flexibility

A strong schema (fixed columns, fixed types) catches bugs early. A flexible store (free-form documents) makes change easy and pushes bugs to runtime.

Think of it like… a paper form with fixed boxes vs a blank notebook. The form blocks wrong entries; the notebook accepts anything.

In software, this looks like… QuickBite orders use a strict schema (every order has the same fields). QuickBite analytics events use a flexible shape (each event type is different).
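
A way to feel the difference in code (a sketch; the field names are invented): the strict shape rejects a malformed order the moment it is created, while the flexible shape accepts a typo and fails quietly at read time.

```python
from dataclasses import dataclass

@dataclass
class Order:                     # strict schema: every order has the same fields
    order_id: int
    restaurant_id: int
    total_cents: int

Order(order_id=1, restaurant_id=42, total_cents=1999)   # accepted
# Order(order_id=1, restaurant_id=42)                   # TypeError right away: missing field

event = {"type": "menu_viewed", "restarant_id": 42}      # flexible: the typo is stored as-is
print(event.get("restaurant_id"))                        # None at runtime, far from the bug
```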

Cost

Two costs matter. The cloud bill is visible. The mental load on the team — the effort to operate, debug, and learn the system — is invisible and usually larger.

Think of it like… owning a car. The price tag is one cost. Fuel, insurance, repairs, and the time you spend on it every week are the other cost — and bigger over years.

Team size and operational maturity

Five engineers cannot run forty services.

Think of it like… a restaurant. A two-cook kitchen cannot run thirty stations. The kitchen layout must fit the staff you have today.

Before the table: p99 latency (the value that 99% of requests are faster than; describes the slow 1%) is the number users feel during bad moments.

If you push toward… | You pay in… | A QuickBite example
Strong consistency | Lower availability during failures, slower writes | Refuse new orders during a regional outage
Low p99 latency | Spare capacity, higher cloud bill | Keep extra menu servers idle for spikes
High throughput | Higher latency for each item | Batch driver pings every 10 seconds
Async everywhere | Harder debugging | Receipts arrive minutes after the order
Flexible schema | Runtime errors, drift between callers | Analytics events arrive with missing fields
Many small services (microservices, see Section 4) | More operations work | Six on-call rotations instead of one
Strong durability | Slower writes, more storage cost | Wait for two regions to confirm payment
Live deployment in many regions | Weaker consistency, very high cost | Run in 14 cloud regions at once

Picture real workloads (not architectures — those come in Section 4) mapped onto two axes you have just met: latency you can tolerate, and consistency you must give the user.

The harder the corner, the more the system will cost you. Most products pick one corner and accept the others. We will place architectures on a similar map at the end of Section 4, after each archetype has been taught.

CAP during a partition, in plain shape:

You cannot live on both sides at once. You can switch sides by changing settings, but at any moment you are on one side.

3. The Building Blocks

These are the parts you snap together. Each block has a job, a "Real situation" example, and a clear cost.

Load balancer

A service that spreads incoming requests across many servers.

Real situation: QuickBite goes from one server to ten on a busy Friday night. Without a load balancer, users would have to know each server's address. With one, they all use quickbite.app, and the load balancer picks a healthy server for each request.

  • Use it when you have more than one server.
  • Skip it when you have one server (development, prototypes).
  • You pay one extra hop on every request and one more thing to monitor.

Common tools: NGINX (open-source web server / proxy / load balancer), HAProxy (open-source load balancer, very fast), AWS ALB (Application Load Balancer) (managed L7 load balancer on AWS), Cloudflare (global load balancer in front of the CDN).

API gateway

API (Application Programming Interface) (the way two programs talk to each other). The gateway is one front door for many services. It does authentication (checking who the caller is), routing, and rate limiting (blocking callers who send too many requests).

Real situation: QuickBite has separate services for menus, orders, payments, and drivers. The mobile app should not know about all of them. The gateway authenticates the user once and forwards each call to the right service.

  • Use it when many clients call many services.
  • Skip it when you have one client and one service.
  • You pay added latency and one shared point that, if down, takes everything down.

Common tools: Kong (open-source API gateway with plugins), Envoy (modern proxy used as a gateway and inside service meshes), AWS API Gateway (managed gateway on AWS), Apigee (enterprise gateway, by Google).

Cache

A fast store with copies of recent answers. Place it close to the reader: in the browser, in a CDN (Content Delivery Network) (servers around the world that hold copies of files near users), in the application, or in front of the database.

Real situation: A pizza restaurant in QuickBite goes viral. 1 million users open its page in one hour. Without a cache, the database answers the same "menu for restaurant 42" question 1 million times. With a cache, the database answers once. The next 999,999 reads come from memory.

  • Use it when reads repeat and slightly stale data is acceptable.
  • Skip it when the real fix is a missing index (a lookup helper that lets a database find rows fast).
  • You pay stale data and cache invalidation (making the cache forget old answers when data changes).

Common tools: Redis (in-memory cache, very fast, supports lists and sets), Memcached (simpler, only key-value), Cloudflare / Fastly (CDN-level caches that sit near the user), browser cache (free, lives on the device).
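
The viral-menu read path above is the cache-aside pattern. A minimal in-process sketch (in production the dict would be Redis or Memcached, and fetch_menu_from_db is a stand-in for the real query):

```python
import time

cache: dict[str, tuple[float, str]] = {}   # key -> (expires_at, value); Redis in production
TTL_SECONDS = 60

def fetch_menu_from_db(restaurant_id: int) -> str:
    # Stand-in for the expensive database query.
    return f"menu for restaurant {restaurant_id}"

def get_menu(restaurant_id: int) -> str:
    key = f"menu:{restaurant_id}"
    hit = cache.get(key)
    if hit and hit[0] > time.time():            # cache hit: answer from memory
        return hit[1]
    value = fetch_menu_from_db(restaurant_id)   # cache miss: one database read
    cache[key] = (time.time() + TTL_SECONDS, value)
    return value

get_menu(42)   # miss: hits the database once
get_menu(42)   # hit: the database is not touched again for 60 seconds
```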

Relational database

A database with tables, columns, and strict rules. It supports transactions (a group of writes that all succeed or all fail together).

Real situation: A QuickBite order has three writes: create the order row, charge the card, decrement the restaurant's inventory. If any one fails, all three must roll back. A relational database guarantees that with one transaction.

  • Use it as the default.
  • Skip it when the shape of data changes weekly, or when one row gets millions of writes per second.
  • You pay a fixed schema and harder horizontal growth.

Common tools: PostgreSQL (open-source, very capable, default modern choice), MySQL / MariaDB (open-source, widely deployed), AWS RDS / Google Cloud SQL (managed Postgres or MySQL on a cloud).
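
The all-or-nothing behavior, sketched with Python's built-in sqlite3 (table and column names are invented; a real QuickBite would use PostgreSQL, but the transaction shape is the same):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE payments  (order_id INTEGER, amount_cents INTEGER);
    CREATE TABLE inventory (restaurant_id INTEGER PRIMARY KEY, stock INTEGER);
    INSERT INTO inventory VALUES (42, 10);
""")

try:
    with db:  # one transaction: commit if the block succeeds, roll back if it raises
        db.execute("INSERT INTO orders (status) VALUES ('placed')")
        db.execute("INSERT INTO payments VALUES (1, 1999)")
        db.execute("UPDATE inventory SET stock = stock - 1 WHERE restaurant_id = 42")
        # If any statement fails, none of the three writes survive.
except sqlite3.Error:
    print("order rolled back")
```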

Document / Key-Value (KV) store

A database that maps a key (like driver:42) to a value. Lookups by key are very fast.

Real situation: QuickBite stores each driver's last known location keyed by driver_id. Every 4 seconds the driver app writes a new position. The map screen reads by key. No joins, no complex queries.

  • Use it when you read by key and the shape is flexible.
  • Skip it when you need joins or strong transactions across documents.
  • You pay weaker query power.

Common tools: MongoDB (document store, flexible JSON shape), DynamoDB (managed key-value store on AWS), Firestore (managed document store on Google Cloud), Cassandra (wide-column store, scales across many machines).
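
The driver-location pattern as a sketch, assuming a local Redis server and the redis-py client (DynamoDB or Firestore would serve the same role with a different client):

```python
import json
import redis  # pip install redis; assumes a Redis server on localhost

r = redis.Redis()

def write_location(driver_id: int, lat: float, lon: float) -> None:
    # One write every few seconds per driver, always by key.
    # ex=30 lets a stale driver expire if the pings stop.
    r.set(f"driver:{driver_id}", json.dumps({"lat": lat, "lon": lon}), ex=30)

def read_location(driver_id: int) -> dict | None:
    raw = r.get(f"driver:{driver_id}")   # one key lookup, no joins, no query planner
    return json.loads(raw) if raw else None

write_location(42, 52.52, 13.40)
print(read_location(42))
```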

Search index

A separate store built for fast text and filter search.

Real situation: A QuickBite user types "pizza near me, delivery under 30 min, rated 4+ stars." The relational database is poor at this. A search index built from the restaurant data answers in milliseconds.

  • Use it next to the main database, not instead of it.
  • Skip it as the source of truth — indexes can lose data.
  • You pay a second copy of the data and the work to keep it fresh.

Common tools: Elasticsearch (powerful, widely used search index), OpenSearch (open-source fork of Elasticsearch), Meilisearch (lightweight, easy to run), Algolia (hosted search service, fast to integrate).

Message queue

A list of tasks. A producer adds tasks; a worker takes one and the task is gone.

Real situation: QuickBite places an order. The user must not wait for the receipt email. The order service drops a "send receipt" task into a queue. A worker picks it up and sends the email seconds later.

  • Use it for point-to-point work hand-off.
  • Skip it when many independent consumers need the same stream.
  • You pay harder debugging and a need for idempotent (safe to run twice with the same result) workers.

Common tools: RabbitMQ (open-source broker, rich routing rules), AWS SQS (managed simple queue on AWS), Google Pub/Sub (managed messaging on Google Cloud, queue or fan-out modes).
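
A sketch of the hand-off, and of why the worker must be idempotent; the in-process queue stands in for RabbitMQ or SQS, both of which can deliver the same task twice:

```python
import queue

tasks = queue.Queue()
already_sent: set[str] = set()   # idempotency record; a table or Redis set in production

def enqueue_receipt(order_id: str) -> None:
    tasks.put({"task_id": f"receipt-{order_id}", "order_id": order_id})

def worker_step() -> None:
    task = tasks.get()
    if task["task_id"] in already_sent:   # duplicate delivery: safe to skip
        return
    print(f"sending receipt for {task['order_id']}")
    already_sent.add(task["task_id"])

enqueue_receipt("order-7")
enqueue_receipt("order-7")   # the same task delivered twice...
worker_step()
worker_step()                # ...sends only one email
```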

Event log

An append-only (you can only add to the end; never change the past) list of events that many readers can replay.

Real situation: Every QuickBite order writes one event: OrderPlaced. The warehouse, analytics, recommendations, and fraud detection systems all read the same event independently. A new "loyalty points" service can be added later and replay the log from day one.

  • Use it when several services need the same stream, or when events are the source of truth.
  • Skip it for simple job queues.
  • You pay higher operational cost and a steeper learning curve.

Common tools: Apache Kafka (the standard event log, very high throughput), AWS Kinesis (managed Kafka-like service on AWS), Google Pub/Sub (also supports retained log mode), Redpanda (Kafka-compatible, simpler to run).

Don't confuse with… A queue removes a task once a worker takes it. A log keeps the event for everyone, including readers added later.

Object storage

Cheap, very durable storage for files (images, video, backups).

Real situation: QuickBite stores 200,000 restaurant cover photos. A relational database would be the wrong tool. Object storage holds them cheaply and serves them by URL.

  • Use it for any file the user uploads or that the system produces as a blob.
  • Skip it for low-latency random reads of small records.
  • You pay higher latency than a database.

Common tools: AWS S3 (the standard, very durable, very cheap), Google Cloud Storage (equivalent on Google Cloud), Cloudflare R2 (S3-compatible, no egress fees), MinIO (open-source, self-hosted, S3-compatible).

Background workers / cron

Programs that run slow or scheduled work off the request path.

Real situation: Every Monday, QuickBite emails a weekly sales report to each restaurant. A cron job runs at 6 a.m., reads the past week, and queues an email per restaurant. The user-facing app never slows down.

  • Use them for any job longer than a few hundred milliseconds, or any job the user is not waiting on.
  • Skip them when the user is waiting for the answer.
  • You pay another moving part to monitor.

Common tools: Celery (Python task queue), Sidekiq (Ruby background jobs, very fast), BullMQ (Node.js job queue on Redis), Temporal (durable workflows, retries built in).

Rate limiter

Code that rejects a caller after too many requests.

Real situation: A misbehaving QuickBite client sends 10,000 menu requests per second. Without a rate limiter, that one client would take down the menu service. With one, it gets 429 Too Many Requests and the others stay healthy.

  • Use it to protect the system from one bad caller.
  • Skip it as a tool for fairness — a global limit does not give each tenant a fair share.
  • You pay a small amount of latency at the front door.

Common tools: Envoy / NGINX (rate limiting built into the proxy), Redis (token-bucket counters, very fast), Kong (plugin-based limits at the gateway), Cloudflare (rate limit at the edge, before traffic reaches you).
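
A minimal token-bucket sketch; in a real gateway the counters would live in Redis so every instance sees the same numbers, and the rate and burst values here are made up:

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec        # tokens added back per second
        self.capacity = burst           # largest burst allowed
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # the caller gets 429 Too Many Requests

limiter = TokenBucket(rate_per_sec=100, burst=20)
allowed = sum(limiter.allow() for _ in range(10_000))  # only a small fraction passes
print(allowed)
```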

Observability

Three signals that show what the system is doing.

  • Logs (text records of what happened) — high detail, expensive at scale.
  • Metrics (numbers measured over time, like requests per second) — cheap, good for alerts.
  • Traces (the path of one request through many services, with timing per step) — show where time was spent.

Real situation: QuickBite users complain that orders stick on "preparing" for too long. Metrics show order latency rose at 19:00. Logs show errors from the kitchen-tablet service. A trace of one slow order shows the delay was inside the payment provider call.

You need all three. They answer different questions.

Common tools: Prometheus + Grafana (open-source metrics + dashboards), ELK (Elasticsearch + Logstash + Kibana) or Loki (log search and aggregation), Jaeger / OpenTelemetry (distributed tracing), Datadog / New Relic (all-in-one, paid SaaS).

Queue and log, side by side:

The log keeps the event. The queue does not. That single fact drives most of the design choices.
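
The difference fits in a few lines. In this sketch the log is a plain list and each reader tracks its own offset; Kafka stores the offsets for you, but the shape is the same:

```python
log: list[str] = []              # append-only event log: nothing is ever removed
offsets: dict[str, int] = {}     # each reader remembers its own position

def publish(event: str) -> None:
    log.append(event)

def read_next(reader: str) -> str | None:
    pos = offsets.get(reader, 0)
    if pos >= len(log):
        return None
    offsets[reader] = pos + 1
    return log[pos]

publish("OrderPlaced:1")
publish("OrderPlaced:2")
print(read_next("analytics"))    # analytics reads event 1...
print(read_next("fraud"))        # ...and fraud also reads event 1, independently
# A reader added a year later starts at offset 0 and replays everything.
```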

4. The Architecture Archetypes

An archetype (a common shape that systems often take) is a starting template. For each one: shape, when it wins, when it hurts, a real product that uses it, and a small diagram.

Monolith

One program, one database, one deploy. Every feature lives in the same code base.

  • Wins when the team is small (1–10 engineers), the domain is new, and shipping fast matters most.
  • Hurts when teams want to deploy on their own schedule, or one slow page slows the whole system.
  • Real example: early Instagram ran as a single Django monolith for years, even past 30 million users. Simple shape; tiny team.
  • QuickBite stage: the first 6 months. One repository, one database. Two engineers ship 10 features a week.
  • Common tools: Ruby on Rails, Django (Python), Laravel (PHP), Spring Boot (Java) — all classic monolith frameworks.

Modular monolith

One deploy, with strict internal boundaries between modules. Modules talk through clear interfaces. They do not share tables.

  • Wins when you want clean boundaries without paying the cost of running many services.
  • Hurts when modules secretly share tables. The boundary is then fake.
  • Real example: Shopify rebuilt their monolith into modules after a microservices attempt; they now ship from a single very large modular code base.
  • QuickBite stage: 12–24 months. Orders, menus, drivers, and payments are separate modules in one deploy.
  • Common tools: Spring Modulith (Java, enforces module boundaries), Rails engines (modular Ruby on Rails), .NET modular monolith patterns.

Microservices

Many small programs, each owning one capability, talking over the network. Each service has its own database.

  • Wins when many teams need to deploy on their own schedule.
  • Hurts when teams adopt it for "scale" before they have an organization problem. Fast in-process calls become slow network calls with new failure modes.
  • Real example: Netflix split into hundreds of services as the company grew past a few hundred engineers; the deploy bottleneck was the reason.
  • QuickBite stage: year three, with 60 engineers in 8 teams. Payments, fraud, drivers, and notifications are each a separate service.
  • Common tools: Kubernetes (runs many service containers), Docker (packages each service), gRPC (efficient service-to-service calls), REST over HTTP (simpler service-to-service calls).

Event-driven

Services do not call each other. They publish events. Other services react. An event (a record that something happened, with a time) is the unit of communication.

  • Wins for fan-out (one input causes many parallel outputs) and absorbing traffic spikes.
  • Hurts because the full path of one user action is spread across many services. You must invest in tracing.
  • Real example: Uber's dispatch and post-trip flow is event-driven. One trip event fans out to billing, surge pricing, ratings, fraud, and analytics.
  • QuickBite stage: an OrderPlaced event drives email, kitchen tablet update, driver matching, and analytics — all in parallel.
  • Common tools: Apache Kafka (retained event log, replay supported), NATS (lightweight messaging), AWS EventBridge (managed event bus on AWS), Google Pub/Sub.

CQRS + event sourcing

CQRS (Command Query Responsibility Segregation) (split writes and reads into two paths with different stores). Event sourcing (store every change as an event; rebuild current state by replaying events).

  • Wins when you need full audit, full replay, or many different read shapes.
  • Hurts everywhere else. Reads are eventually consistent. Fixing a bug in production may mean rewriting history.
  • Real example: bank ledgers and trading systems are event-sourced because every change must be auditable forever.
  • QuickBite stage: this is overkill for QuickBite. The wallet/refund subsystem might use it later, when finance auditors require it.
  • Common tools: Apache Kafka (as the event log), EventStoreDB (purpose-built event store), Axon Framework (Java framework for CQRS and event sourcing).
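
Rebuilding state by replay, in miniature; the event names are invented, and a real system would keep these events in Kafka or EventStoreDB:

```python
events = [
    {"type": "WalletCredited", "amount_cents": 500},
    {"type": "WalletDebited",  "amount_cents": 200},
    {"type": "WalletCredited", "amount_cents": 100},
]

def current_balance(history: list[dict]) -> int:
    # Current state is never stored directly; it is derived by replaying every event.
    balance = 0
    for e in history:
        if e["type"] == "WalletCredited":
            balance += e["amount_cents"]
        elif e["type"] == "WalletDebited":
            balance -= e["amount_cents"]
    return balance

print(current_balance(events))   # 400, and the full audit trail is the data itself
```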

Serverless / FaaS

FaaS (Function as a Service) (short programs the cloud runs on demand, with no server you manage). The platform starts your function when an event arrives and stops it after.

  • Wins for spiky traffic, parallel work, and small jobs that connect other systems.
  • Hurts because of cold starts (extra delay on the first call after the function was idle), vendor lock-in, and limits on run time and memory.
  • Real example: the iRobot home-app webhook layer runs on AWS Lambda; image-resizing and chat-bot webhooks are typical fits.
  • QuickBite stage: the receipt PDF generator runs as a serverless function. It is called rarely and scales itself.
  • Common tools: AWS Lambda (the original FaaS, on AWS), Google Cloud Functions (equivalent on Google Cloud), Cloudflare Workers (runs at the edge, very fast cold starts), Vercel (serverless for web apps).

Batch / data pipeline

Read large data, transform it, write it out, on a schedule. Freshness is in hours, not seconds.

  • Wins when the volume is huge and the user can wait.
  • Hurts when the product later needs real-time data and the pipeline assumes overnight runs.
  • Real example: Spotify Wrapped is built by an annual batch pipeline over a year of listening data; not one byte of it is real-time.
  • QuickBite stage: the weekly restaurant report is built by a nightly ETL (Extract, Transform, Load) (read from a source, change shape, save for analysis) pipeline.
  • Common tools: Apache Airflow (workflow scheduler, the standard), dbt (SQL-based data transformations), Apache Spark (distributed batch and stream processing), AWS Glue (managed ETL on AWS).

Edge / geo-distributed

Compute and data live in many regions, close to users.

  • Wins for global low latency and data residency (rules that force user data to stay inside a country).
  • Hurts because consistency across regions is now your problem, and deploys become harder.
  • Real example: Cloudflare Workers run user code at hundreds of edge sites; Netflix Open Connect caches video close to ISPs.
  • QuickBite stage: if QuickBite expands to 10 countries, the menu and search index move to edge regions. Orders still go to one region per country.
  • Common tools: Cloudflare Workers (runs code at hundreds of edge sites), Fastly Compute@Edge (equivalent), AWS CloudFront Functions (small functions at the AWS edge), Vercel Edge (edge runtime for web apps).

Quick reference

Archetype | Best when | Worst when | Named example
Monolith | Small team, new domain, ship fast | Many teams need own deploys | Early Instagram
Modular monolith | Want clean boundaries, one team | Modules need to scale apart | Shopify (current)
Microservices | Many teams, independent deploys | Small team, no operations skill | Netflix
Event-driven | Many readers of the same stream | Need clear end-to-end path | Uber dispatch
CQRS + event sourcing | Audit, replay, complex reads | Anything else | Bank ledgers
Serverless | Spiky or parallel work | Steady high load, long jobs | Image resizing
Batch | Hours-old data is fine | Real-time products | Spotify Wrapped
Edge | Global users, residency rules | Need strong global consistency | Cloudflare Workers

Now that every archetype has been taught, here is the complexity vs scale-ceiling map promised earlier. Read it as: the further up and right an archetype sits, the more pain you accept for the headroom you gain.

A QuickBite-shaped product starts in the bottom-left and migrates up only when a constraint forces it. You should be able to point at the dot for your system today and the dot it might move to in 18 months.

5. The Decision Framework

Run these six steps every time. Six is the number you can hold in your head. Do not skip steps.

  1. Write the functional requirements in one paragraph. What does the system do, in user words?
  2. Write the non-functional requirements as numbers. RPS (Requests Per Second) (how many requests arrive per second) peak and average. p99 latency (99% of requests must be faster than this). Data per day. Total data over three years. RPO (Recovery Point Objective) (how much recent data you can afford to lose after a crash). RTO (Recovery Time Objective) (how long the system can be down before you must be back up). Team size. Deadline. Budget.
  3. Pick the dominant constraint. Which one, if you fail it, ends the project?
  4. Cross out archetypes that cannot meet the dominant constraint.
  5. Among the survivors, pick the simplest. Justify in one sentence.
  6. Name what breaks first at 10× scale, and the next move.

Worked example: design a URL shortener

A URL shortener turns a long web address into a short code. Visiting the short code redirects the user to the long address. Real examples: Bitly, TinyURL.

  1. Functional requirements. Users submit a long URL and get back a short code. Visiting /<code> redirects to the long URL. Owners see click counts.

  2. Non-functional numbers.

    • 200 writes per second peak; 20,000 reads per second peak. Read-to-write ratio 100:1.
    • p99 redirect latency under 80 ms, anywhere in the world.
    • 50 million new links per year.
    • Durability (saved links must never be lost): RPO close to zero.
    • Team: 2 engineers. Deadline: 8 weeks.
  3. Dominant constraint. Read latency at global scale, with very high read volume.

  4. Cross out archetypes.

    • Pure monolith in one region — fails global p99.
    • CQRS + event sourcing — far too much; no audit need.
    • Microservices — no team for it.
    • Batch — wrong shape entirely.
    • Survivors: modular monolith with a CDN, or serverless behind a CDN with a globally replicated key-value store.
  5. Pick the simplest. Serverless redirect function behind a CDN, backed by a globally replicated key-value store. Writes go through a small write API into the same store. The CDN caches redirects for many minutes.

    Justification: the read path is one key lookup with very high cache hit rate. Serverless plus CDN gives global p99 with two engineers and no servers to operate.

    Note: the modular monolith with a CDN is also reasonable. It loses by a small margin only because of team size.

  6. What breaks first at 10× scale? Click counting. Counting every redirect synchronously will overload the key-value store. Next move: publish a click event to a log, count clicks asynchronously, write totals back hourly. The redirect path stays untouched.
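
A sketch of steps 5 and 6 together: the redirect path is a single key lookup, and each click becomes an event to be counted later instead of a synchronous counter write. The dict and the in-process queue are stand-ins for the replicated key-value store and the event log:

```python
import queue

links = {"abc123": "https://example.com/very/long/url"}   # replicated KV store in production
click_events = queue.Queue()                               # event log in production

def redirect(code: str) -> tuple[int, str]:
    long_url = links.get(code)     # the entire hot path: one key lookup
    if long_url is None:
        return 404, ""
    click_events.put(code)         # fire-and-forget; never blocks the redirect
    return 301, long_url

def count_clicks_batch() -> dict[str, int]:
    # Runs on a schedule; writes totals back to the store, not per redirect.
    totals: dict[str, int] = {}
    while not click_events.empty():
        code = click_events.get()
        totals[code] = totals.get(code, 0) + 1
    return totals

print(redirect("abc123"))
print(count_clicks_batch())
```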

6. Three More Worked Examples

E-commerce checkout (think Amazon, Shopify)

Dominant axis: consistency. Money must not be charged twice. Inventory must not be sold twice.

The checkout service holds orders, payments, and inventory in one relational database. A single transaction covers all three. Everything else — email, shipping, recommendations — listens to events that the checkout service publishes.

The path the user waits on stays simple and consistent. The path that runs after — emails, warehouse, analytics — runs asynchronously. The user does not wait for it.

This shape is boring on purpose. You can rebuild a recommendation feed if it breaks. You cannot rebuild a customer's trust after a double charge. Strong consistency on the write path; eventual consistency for everything else. A team of 3–5 engineers can run it.

Ride-sharing dispatch (think Uber, Bolt, Lyft)

Dominant axis: low latency on geographic queries, plus very high write volume from driver location updates.

Driver phones send a location ping every few seconds. Pings flow through an ingest layer into an in-memory geographic index, sharded by city. The match service queries only the local shard for "nearest free drivers." When a match is made, the trip record is written to a durable database.

Pings are disposable. If one is lost, the next one is on the way. Trips are not. The design separates "data that must survive a crash" from "data that does not." This split is what lets the in-memory tier exist without compromise.

A separate service handles trip lifecycle, billing, and history. None of that is on the matching path.
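
A toy version of the city-sharded, in-memory index; the distance math is simplified to squared degrees, where a real index would use geohashes or an R-tree:

```python
# city -> {driver_id: (lat, lon)}; each city shard fits in one machine's memory
shards: dict[str, dict[int, tuple[float, float]]] = {}

def record_ping(city: str, driver_id: int, lat: float, lon: float) -> None:
    # Overwrite the previous position; a lost ping does not matter, the next one is coming.
    shards.setdefault(city, {})[driver_id] = (lat, lon)

def nearest_drivers(city: str, lat: float, lon: float, k: int = 3) -> list[int]:
    drivers = shards.get(city, {})
    def dist2(pos: tuple[float, float]) -> float:
        return (pos[0] - lat) ** 2 + (pos[1] - lon) ** 2
    return sorted(drivers, key=lambda d: dist2(drivers[d]))[:k]

record_ping("berlin", 1, 52.52, 13.40)
record_ping("berlin", 2, 52.53, 13.45)
print(nearest_drivers("berlin", 52.52, 13.41))
```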

Analytics ingest (think Google Analytics, Mixpanel)

Dominant axis: throughput per dollar. Freshness is flexible.

Client SDKs send billions of events per day. Each event is worth a fraction of a cent, so anything synchronous on the write path destroys the unit economics. The pipeline accepts events at the edge, writes them to an event log, and confirms to the client at once.

A batch loader runs every few minutes and copies events from the log into a columnar database (a database that stores data column by column; very fast for analytics on huge tables). Most queries hit the warehouse. A small streaming job builds real-time dashboards from the same log.

Hours-old data is fine for 95% of questions. The 5% that need real-time get a smaller, narrower pipeline. The split is the design.

7. Anti-Patterns and Common Traps

  • Microservices before product-market fit. You took on distributed-system problems to fix an organization problem you do not have. Example: a 4-engineer startup with 20 microservices spends more time on deploys than on features. Start with a modular monolith.
  • Premature event sourcing. "We may need an audit log later" is not enough. The cost is paid every day. The benefit may never arrive.
  • Shared database between services. Example: QuickBite splits "orders" and "drivers" into two services that both write to the same orders table. Deploys are still coupled, but now there are also network calls. Pick one: share the database, or split the service.
  • Caching to hide a missing index. The cache will miss in production at the worst time. Example: a slow report is "fixed" by caching it for 24 hours. One day, traffic spikes during a refresh, the cache misses, and the database falls over. Fix the index instead.
  • "Scalable" as a goal. Scalable to what number? At what cost? Without numbers, "scalable" is decoration on a slide.
  • Long synchronous chains. Chains of more than three sync calls compound latency at p99. Example: QuickBite's "place order" calls auth, then catalog, then pricing, then inventory, then payments, all sync. One slow link makes the whole chain slow.
  • Building infrastructure that you can rent. Running your own message log or database is a full-time job. Take it on only when the cloud bill is larger than an engineer's salary.
  • Designing for the team you wish you had. The system must be operable by the people on call tonight, not by a future platform team.
  • Resume-driven design. If a technology is in the diagram only because it looks good on a CV, remove it.
  • No deploy story. A design that ships once but cannot be safely changed is worse than a less elegant design that can.

8. What to Read Next

  • Designing Data-Intensive Applications by Martin Kleppmann — Teaches the trade-offs in this primer in much more depth, with real systems as examples.
  • Site Reliability Engineering by Google — Shows how to set targets for uptime and latency, and how to react when you miss them.
  • Release It! by Michael Nygard — Lists the failure modes you will hit in production and the patterns that prevent them.
  • The Log: What every software engineer should know about real-time data's unifying abstraction by Jay Kreps — Explains why event logs are the foundation of modern data systems.
  • A Critique of the CAP Theorem by Martin Kleppmann — A clear, careful look at what CAP actually says, and what it does not.
  • Patterns of Distributed Systems by Unmesh Joshi — A reference of recurring shapes in distributed systems, with names for each.
  • Architecture Decision Records (Michael Nygard's original blog post) — A short method for writing down a design decision so it survives long after you leave.
  • AWS Well-Architected Framework (or the GCP / Azure equivalent) — A vendor-flavored checklist of operational concerns; the structure is useful even if the brand is not yours.

9. Glossary

  • API (Application Programming Interface) — The way two programs talk to each other.
  • API gateway — One front door for many services; handles auth, routing, and rate limits.
  • append-only — You can only add to the end; you can never change past entries.
  • archetype — A common shape that systems often take.
  • asynchronous (async) — The caller does not wait; the work finishes later.
  • authentication — Checking who the caller is.
  • availability — The share of time the system answers requests successfully.
  • backpressure — A signal from a slow part of the system telling the fast part to slow down.
  • batch — Work done in big groups on a schedule, not one item at a time.
  • cache — A fast store that holds copies of recent answers.
  • cache invalidation — Making the cache forget old answers when the underlying data changes.
  • CAP theorem — During a network partition, a system can keep consistency or availability, not both.
  • CDN (Content Delivery Network) — Servers around the world that hold copies of files near users.
  • cold start — Extra delay on the first call after a serverless function was idle.
  • columnar database — A database that stores data column by column; very fast for analytics on huge tables.
  • consistency — Every reader sees the latest write; no stale data.
  • constraint — A hard limit you must respect (time, money, traffic, data size, team size).
  • CQRS (Command Query Responsibility Segregation) — Split writes and reads into two paths with different stores.
  • data residency — Rules that force user data to stay inside a country or region.
  • dominant constraint — The single constraint that, if violated, ends the project.
  • durability — Once a write is confirmed, it survives crashes.
  • ETL (Extract, Transform, Load) — Read data from a source, change its shape, save it for analysis.
  • event — A record that something happened, with a time.
  • event log — An append-only list of events that many readers can consume and replay.
  • event sourcing — Store every change as an event; rebuild current state by replaying events.
  • fan-out — One input causes many parallel outputs.
  • FaaS (Function as a Service) — Short programs the cloud runs on demand, with no server you manage.
  • HTTP (HyperText Transfer Protocol) — The standard request-response protocol used by the web.
  • idempotent — Running the same operation twice has the same result as running it once.
  • index — A lookup helper that lets a database find rows fast without scanning everything.
  • KV store (Key-Value store) — A database that maps a key to a value, with very fast lookups by key.
  • latency — How long one request takes from start to finish, usually in milliseconds.
  • load balancer — A service that spreads requests across many servers.
  • logs (observability) — Text records of what happened; useful for debugging.
  • metrics — Numbers measured over time, like requests per second or error count.
  • microservices — Many small programs, each owning one capability, talking over the network.
  • modular monolith — One program with strict internal boundaries between modules.
  • monolith — One big program with one database that does everything.
  • network partition — A network failure that splits the system into groups that cannot reach each other.
  • non-functional requirement — A number the system must meet (latency, traffic, durability), not a feature.
  • p99 latency — The value that 99% of requests are faster than; describes the slow 1%.
  • partition (data) — Splitting data into pieces that live on different machines.
  • queue — A list of tasks; once a worker takes a task, it is removed.
  • rate limiter — Code that blocks a caller after too many requests.
  • read-heavy — Most traffic is reads; few writes.
  • relational database — A database with tables, columns, and strict rules between them.
  • replica — A copy of data on another machine, kept in sync with the original.
  • REST — A style of HTTP API where URLs name resources and HTTP verbs do actions.
  • reverse proxy — A server that sits in front of others and forwards requests.
  • RPO (Recovery Point Objective) — How much recent data you can afford to lose after a crash.
  • RPS (Requests Per Second) — The number of requests the system handles each second.
  • RTO (Recovery Time Objective) — How long the system can be down before you must be back online.
  • search index — A structure built for fast text or filter search, separate from the main database.
  • serverless — A runtime where you write functions and the cloud handles servers, scale, and idle time.
  • shard / sharding — Split a database into pieces by some key (user ID, region) so each piece is small.
  • synchronous (sync) — The caller waits until the work is done.
  • system design — Deciding how the parts of a software system fit together.
  • throughput — How many requests or items the system handles per second.
  • TLS (Transport Layer Security) — The standard that encrypts data in transit; the "S" in HTTPS.
  • traces — A record of one request as it moved through many services, with timing per step.
  • trade-off — When you take more of one good thing, you accept less of another.
  • transaction — A group of writes that all succeed or all fail together.
  • URL shortener — A service that turns a long web address into a short code that redirects to the long one.
  • write-heavy — Most traffic is writes; few reads.

Tools (alphabetical)

  • Algolia — Hosted search service, fast to integrate.
  • Apache Airflow — Workflow scheduler for data pipelines.
  • Apache Kafka — Distributed event log with replay; the standard.
  • Apache Spark — Distributed batch and stream processing.
  • Apigee — Enterprise API gateway, by Google.
  • AWS ALB (Application Load Balancer) — Managed L7 load balancer on AWS.
  • AWS API Gateway — Managed API gateway on AWS.
  • AWS CloudFront — AWS content delivery network.
  • AWS CloudFront Functions — Small functions at the AWS edge.
  • AWS EventBridge — Managed event bus on AWS.
  • AWS Glue — Managed ETL service on AWS.
  • AWS Kinesis — Managed Kafka-like streaming on AWS.
  • AWS Lambda — The original Function-as-a-Service, on AWS.
  • AWS RDS — Managed relational database service on AWS.
  • AWS S3 — Object storage on AWS; very durable, very cheap.
  • AWS SQS — Managed simple message queue on AWS.
  • Axon Framework — Java framework for CQRS and event sourcing.
  • BullMQ — Node.js job queue, runs on Redis.
  • Cassandra — Wide-column store, scales across many machines.
  • Celery — Python task queue.
  • Cloudflare — Global CDN, edge platform, and load balancer.
  • Cloudflare R2 — S3-compatible object storage, no egress fees.
  • Cloudflare Workers — Edge serverless platform.
  • Datadog — All-in-one observability SaaS.
  • dbt — SQL-based data transformations.
  • Django — Python web framework, common for monoliths.
  • Docker — Packages services as containers.
  • DynamoDB — Managed key-value store on AWS.
  • ELK — Elasticsearch + Logstash + Kibana log search stack.
  • Elasticsearch — Powerful, widely used search index.
  • Envoy — Modern proxy used as gateway and in service meshes.
  • EventStoreDB — Purpose-built event store for event sourcing.
  • Fastly — CDN and edge compute platform.
  • Fastly Compute@Edge — Fastly's edge runtime.
  • Firestore — Managed document store on Google Cloud.
  • Google Cloud Functions — Serverless functions on Google Cloud.
  • Google Cloud SQL — Managed relational database on Google Cloud.
  • Google Cloud Storage — Object storage on Google Cloud.
  • Google Pub/Sub — Managed messaging on Google Cloud; queue or fan-out modes.
  • Grafana — Metrics dashboards, open source.
  • gRPC — Efficient service-to-service RPC over HTTP/2.
  • HAProxy — Open-source load balancer, very fast.
  • Jaeger — Distributed tracing system.
  • Kong — Open-source API gateway with plugins.
  • Kubernetes — Container orchestration platform.
  • Laravel — PHP web framework.
  • Loki — Log aggregation system, by Grafana Labs.
  • MariaDB — MySQL-compatible open-source relational database.
  • Meilisearch — Lightweight, easy search engine.
  • Memcached — Simple in-memory key-value cache.
  • MinIO — Open-source S3-compatible object storage.
  • MongoDB — Document database, flexible JSON shape.
  • MySQL — Open-source relational database, widely deployed.
  • NATS — Lightweight messaging system.
  • Netflix Open Connect — Netflix's edge video CDN.
  • New Relic — All-in-one observability SaaS.
  • NGINX — High-performance web server, proxy, load balancer.
  • OpenSearch — Open-source fork of Elasticsearch.
  • OpenTelemetry — Open standard for collecting telemetry.
  • PostgreSQL — Open-source relational database; default modern choice.
  • Prometheus — Pull-based metrics database, open source.
  • RabbitMQ — Open-source message broker, rich routing rules.
  • Rails engines — Modular structure for Ruby on Rails apps.
  • Redis — In-memory cache, very fast, supports lists and sets.
  • Redpanda — Kafka-compatible streaming, simpler to run.
  • Ruby on Rails — Ruby web framework, classic monolith.
  • Sidekiq — Ruby background jobs, very fast.
  • Spring Boot — Java framework for monoliths and services.
  • Spring Modulith — Module structure for Spring monoliths.
  • Temporal — Durable workflow engine, retries built in.
  • Vercel — Serverless platform for web apps.
  • Vercel Edge — Edge runtime by Vercel.