Scaling
A real MongoDB deployment is never one server. There are two separate mechanisms here, and people constantly confuse them, so let me draw a clean line: replication is for survival, sharding is for size.
Replication: replica sets (for survival)
A replica set is a group of servers holding the same data.
- One Primary takes all writes. Secondaries copy them.
- Secondaries stay in sync by tailing the Primary's oplog — a capped, ring-buffer collection recording every write as an idempotent operation. (Change streams and CDC tools read this same oplog.)
- If the Primary dies, the survivors hold an election and promote a Secondary — automatic failover in seconds. This is why you want an odd number of voting members: to break ties.
Two knobs control the safety/speed trade-off:
| Knob | Meaning | Safe choice |
|---|---|---|
Write concern (w) | How many nodes must ack a write | w: "majority" for important data |
| Read preference | Where reads are routed | Primary for fresh data; secondaries to scale reads (may be slightly stale) |
That "slightly stale" is replication lag — read from a secondary and you might see data a moment behind the primary. That's eventual consistency, and it's a deliberate trade you opt into.
Sharding: horizontal scale (for size)
When the data or write throughput outgrows a single machine, you shard — partition the collection across many replica sets.
- You pick a shard key. MongoDB splits its range into chunks and spreads them across shards.
- A
mongosrouter sits in front; your app talks to it, and it routes each query to the right shard(s) using metadata from the config servers. A balancer keeps chunks evenly distributed.
The shard key is the whole ballgame
Choosing it well is hard and changing it later is painful. The failure mode to avoid is a hot shard:
- A monotonically increasing key (a timestamp, or
_id) sends all new writes to one shard — you bought N machines and use one. - A hashed shard key spreads writes evenly. A ranged key keeps range queries local. Pick something high-cardinality and evenly accessed.
You can also pin ranges of data to specific shards with zone sharding (tag-aware) — e.g. keep EU users on EU-region shards for data-residency, or hot data on faster hardware.
Change streams — subscribe to every write
The oplog isn't just for secondaries. Change streams are a real-time API on top of it: open one and you get an ordered feed of every insert/update/delete as it happens.
db.orders.watch([{ $match: { operationType: "insert" } }])
- It's resumable — each event carries a resume token, so after a disconnect you pick up exactly where you left off, no events lost or doubled.
- Because it reads the oplog, it needs a replica set (even a single-node one).
- This is what powers cache invalidation, live dashboards, notifications, and feeding data into search/ETL — the clean alternative to polling the database.
You've seen these ideas before
Notice that replication and sharding/partitioning aren't MongoDB-specific — they're the universal tools of distributed data, and you'll meet them again in Postgres, in Kafka, and in every system-design interview. Learning them here means you already understand them everywhere.
Recap
Replica sets = the same data on several nodes (Primary + Secondaries, oplog, elections, write/read concern) for availability and durability. Sharding = different data on different nodes (shard key → chunks →
mongosrouter) for scale. Change streams turn the oplog into a resumable event feed. Replication is for survival; sharding is for size.
👉 Next: Transactions & Consistency