Indexing

Indexing is where "it works on my laptop" turns into "it works with ten million documents." Here's the data structure underneath — once you see it, you'll know why a query is fast or slow without guessing.

The problem indexes solve

Without an index, this query has to read every single document and test it:

db.users.find({ email: "a@x.com" })

That's a collection scan — O(n). Fine for 100 docs, fatal for 10 million. An index is a separate, sorted structure that lets MongoDB jump straight to the match, turning that into roughly O(log n).

The structure: a B-tree

MongoDB indexes are B-trees (WiredTiger implements them as B+ trees). This is the same structure Postgres and MySQL use — learn it once, recognize it everywhere.

A B-tree is a sorted, self-balancing tree where each node holds many keys and many children:

Why "many keys per node" instead of a binary tree? Disk reads. Each node is sized to a page (one I/O). High fan-out makes the tree extremely shallow — billions of keys sit only ~3–4 levels deep. So a lookup is ~3–4 page reads instead of ~30. B-trees exist to minimize disk reads, which is the whole game.

The B+ tree refinement (what WiredTiger uses): all real values live in the leaf nodes, and leaves are linked in sorted order like a linked list. That's why these are all fast on an indexed field:

Query shape	Why it's fast
Equality (`email = x`)	Walk root → leaf, `O(log n)`
Range (`age >= 18 && age <= 30`)	Find 18, then walk the linked leaves to 30
Sort (`sort({ age: 1 })`)	The index is already sorted — no in-memory sort needed

Play with one live

The thing that finally made B-trees click for me was watching them rebalance. Open the USFCA B-tree visualizer and insert keys one at a time — watch a node fill up, split, and push a key up to its parent. That split-and-promote is exactly how the tree stays shallow and balanced no matter what order the data arrives in.

Creating and reading indexes

db.users.createIndex({ email: 1 })
db.users.createIndex({ email: 1 }, { unique: true })
db.users.createIndex({ lastName: 1, firstName: 1 })
db.users.getIndexes()

1 = ascending, -1 = descending. _id is always indexed automatically.

The compound index rule everyone trips on

A compound index { a: 1, b: 1, c: 1 } is one tree sorted by a, then b, then c — like sorting a spreadsheet by column A, then B, then C. It can serve a query that uses a left prefix of those keys:

A handy ordering heuristic is ESR: put Equality fields first, then Sort fields, then Range fields.

The index types beyond the basics

A plain single/compound index is just the start. The tree is the same B-tree underneath — these types change what key gets stored in it:

Type	What it indexes	Reach for it when
Multikey	created automatically the moment you index an array field — one index entry per element	you query inside arrays (`{ roles: "admin" }`). Caveat: you can't compound-index two array fields together
TTL	a single `Date` field; a background thread deletes docs N seconds past it	sessions, logs, carts, anything self-expiring
Partial	only the docs matching a filter — a smaller, cheaper tree	you only ever query a subset (e.g. `{ status: "active" }`); prefer this over sparse
Sparse	only docs where the field exists	an optional field most docs lack
Text	tokenized words (stemming, stop-words) for `$text` search	basic keyword search (one text index per collection)
2dsphere	GeoJSON points/shapes	geo queries — `$near`, `$geoWithin`
Hashed	the hash of a field's value	hashed sharding & equality (never range)
Wildcard	`{ "$**": 1 }` — arbitrary/unknown field names	grab-bag documents where you can't predict the keys

Two properties worth knowing: collation makes an index case/locale-insensitive, and a hidden index is maintained but invisible to the planner — perfect for testing whether dropping an index hurts, with an instant rollback.

Covered queries — a free win

If a query's filter and the fields it returns are all inside the index, MongoDB answers entirely from the index tree and never fetches the document:

db.users.createIndex({ email: 1, name: 1 })
db.users.find({ email: "a@x.com" }, { _id: 0, email: 1, name: 1 })

Don't index everything

Every index is another B-tree that must be updated on every write, and it competes for that precious RAM cache. So: index the fields you actually filter and sort on, prefer compound indexes that serve several queries, and drop the ones nobody uses.

Prove it with `explain()`

MongoDB has a query planner that picks an index. You can see its choice:

db.users.find({ email: "a@x.com" }).explain("executionStats")

What I look for:

IXSCAN (index scan) = 🟢 good. COLLSCAN (collection scan) = 🔴 no usable index.
The winningPlan is what ran; rejectedPlans are the alternatives the planner raced and lost. The planner actually trial-runs candidates and caches the winner for similar queries.
The three numbers that tell the truth: keysExamined → docsExamined → nReturned. The dream is 1 : 1 : 1 (or n : 0 : n for a covered query). If you examined 1,000,000 keys/docs to return 10, your index is wrong or missing.

Recap

An index is a separate, sorted B+ tree. High fan-out keeps it shallow (few disk reads); sorted, linked leaves make equality, range, and sort O(log n). Compound indexes follow the left-prefix / ESR rule, covered queries skip the document fetch, and explain() tells you the truth — aim for IXSCAN.

👉 Next: Querying & Aggregation

The problem indexes solve​

The structure: a B-tree​

Creating and reading indexes​

The compound index rule everyone trips on​

The index types beyond the basics​

Covered queries — a free win​

Don't index everything​

Prove it with explain()​

Recap​