Skip to main content

Scaling

A real MongoDB deployment is never one server. There are two separate mechanisms here, and people constantly confuse them, so let me draw a clean line: replication is for survival, sharding is for size.

Replication: replica sets (for survival)

A replica set is a group of servers holding the same data.

  • One Primary takes all writes. Secondaries copy them.
  • Secondaries stay in sync by tailing the Primary's oplog — a capped, ring-buffer collection recording every write as an idempotent operation. (Change streams and CDC tools read this same oplog.)
  • If the Primary dies, the survivors hold an election and promote a Secondary — automatic failover in seconds. This is why you want an odd number of voting members: to break ties.

Two knobs control the safety/speed trade-off:

KnobMeaningSafe choice
Write concern (w)How many nodes must ack a writew: "majority" for important data
Read preferenceWhere reads are routedPrimary for fresh data; secondaries to scale reads (may be slightly stale)

That "slightly stale" is replication lag — read from a secondary and you might see data a moment behind the primary. That's eventual consistency, and it's a deliberate trade you opt into.

Sharding: horizontal scale (for size)

When the data or write throughput outgrows a single machine, you shard — partition the collection across many replica sets.

  • You pick a shard key. MongoDB splits its range into chunks and spreads them across shards.
  • A mongos router sits in front; your app talks to it, and it routes each query to the right shard(s) using metadata from the config servers. A balancer keeps chunks evenly distributed.

The shard key is the whole ballgame

Choosing it well is hard and changing it later is painful. The failure mode to avoid is a hot shard:

  • A monotonically increasing key (a timestamp, or _id) sends all new writes to one shard — you bought N machines and use one.
  • A hashed shard key spreads writes evenly. A ranged key keeps range queries local. Pick something high-cardinality and evenly accessed.

You can also pin ranges of data to specific shards with zone sharding (tag-aware) — e.g. keep EU users on EU-region shards for data-residency, or hot data on faster hardware.

Change streams — subscribe to every write

The oplog isn't just for secondaries. Change streams are a real-time API on top of it: open one and you get an ordered feed of every insert/update/delete as it happens.

db.orders.watch([{ $match: { operationType: "insert" } }])
  • It's resumable — each event carries a resume token, so after a disconnect you pick up exactly where you left off, no events lost or doubled.
  • Because it reads the oplog, it needs a replica set (even a single-node one).
  • This is what powers cache invalidation, live dashboards, notifications, and feeding data into search/ETL — the clean alternative to polling the database.

You've seen these ideas before

Notice that replication and sharding/partitioning aren't MongoDB-specific — they're the universal tools of distributed data, and you'll meet them again in Postgres, in Kafka, and in every system-design interview. Learning them here means you already understand them everywhere.

Recap

Replica sets = the same data on several nodes (Primary + Secondaries, oplog, elections, write/read concern) for availability and durability. Sharding = different data on different nodes (shard key → chunks → mongos router) for scale. Change streams turn the oplog into a resumable event feed. Replication is for survival; sharding is for size.

👉 Next: Transactions & Consistency