Indexing
Indexing is where "it works on my laptop" turns into "it works with ten million documents." Here's the data structure underneath — once you see it, you'll know why a query is fast or slow without guessing.
The problem indexes solve
Without an index, this query has to read every single document and test it:
db.users.find({ email: "a@x.com" })
That's a collection scan — O(n). Fine for 100 docs, fatal for 10 million. An index is a separate, sorted structure that lets MongoDB jump straight to the match, turning that into roughly O(log n).
The structure: a B-tree
MongoDB indexes are B-trees (WiredTiger implements them as B+ trees). This is the same structure Postgres and MySQL use — learn it once, recognize it everywhere.
A B-tree is a sorted, self-balancing tree where each node holds many keys and many children.
Why "many keys per node" instead of a binary tree? Disk reads. Each node is sized to a page (one I/O). High fan-out makes the tree extremely shallow — billions of keys sit only ~3–4 levels deep. So a lookup is ~3–4 page reads instead of ~30. B-trees exist to minimize disk reads, which is the whole game.
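The arithmetic behind "3 to 4 levels instead of ~30" is easy to check. This sketch counts how many levels a tree needs before fanOut ** levels reaches n keys. levelsNeeded is a hypothetical helper, and the fan-outs of 500 and 1,000 are assumed, illustrative figures; real fan-out depends on key size and page size.

```javascript
// Count the levels a tree needs before fanOut ** levels can address n keys.
// levelsNeeded is a hypothetical helper for back-of-the-envelope math.
function levelsNeeded(n, fanOut) {
  let levels = 0;
  let capacity = 1; // roughly how many keys `levels` levels can address
  while (capacity < n) {
    capacity *= fanOut;
    levels++;
  }
  return levels;
}

const billion = 1e9;
console.log(levelsNeeded(billion, 2));    // binary tree:     → 30
console.log(levelsNeeded(billion, 500));  // assumed fan-out: → 4
console.log(levelsNeeded(billion, 1000)); // assumed fan-out: → 3
```

Each extra level multiplies capacity by the fan-out, so going from fan-out 2 to fan-out 500 shrinks a billion-key lookup from about 30 page reads to about 4.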
The B+ tree refinement (what WiredTiger uses): all real values live in the leaf nodes, and leaves are linked in sorted order like a linked list. That's why these are all fast on an indexed field:
| Query shape | Why it's fast |
|---|---|
| Equality (email = x) | Walk root → leaf, O(log n) |
| Range (age >= 18 && age <= 30) | Find 18, then walk the linked leaves to 30 |
| Sort (sort({ age: 1 })) | The index is already sorted, so no in-memory sort is needed |
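To see why range and sort come almost for free, here is a minimal sketch that models the linked leaf level as one sorted array of [key, recordId] pairs. A binary search stands in for the root → leaf descent, and the scan then walks forward in key order. lowerBound, rangeScan, and the sample entries are hypothetical, not MongoDB internals.

```javascript
// The leaf level of an index, modeled as one sorted array of [key, recordId].
// lowerBound stands in for the root -> leaf descent: it finds the first
// position whose key is >= the target in O(log n) comparisons.
function lowerBound(entries, key) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid][0] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan: seek to the first key >= min, then walk forward while key <= max.
function rangeScan(entries, min, max) {
  const ids = [];
  for (let i = lowerBound(entries, min); i < entries.length && entries[i][0] <= max; i++) {
    ids.push(entries[i][1]);
  }
  return ids;
}

// Hypothetical { age: 1 } index entries, already sorted by age.
const ageIndex = [[12, "u7"], [18, "u2"], [21, "u9"], [30, "u4"], [30, "u5"], [44, "u1"]];
console.log(JSON.stringify(rangeScan(ageIndex, 18, 30))); // → ["u2","u9","u4","u5"]
```

Equality is the same scan with min equal to max, and sort({ age: 1 }) is just reading the entries front to back.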
The thing that finally made B-trees click for me was watching them rebalance. Open the USFCA B-tree visualizer and insert keys one at a time — watch a node fill up, split, and push a key up to its parent. That split-and-promote is exactly how the tree stays shallow and balanced no matter what order the data arrives in.
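If you would rather poke at split-and-promote in code, this is a minimal in-memory B-tree following the textbook (CLRS) insert: it splits any full node on the way down and pushes that node's median key into the parent. It is a sketch for intuition only. WiredTiger's B+ tree is page-based, keeps values in the leaves, and uses its own split rules, and visualizers may split at a slightly different moment; the class and method names here are hypothetical.

```javascript
// Minimal B-tree with minimum degree t: every node except the root holds
// between t - 1 and 2t - 1 keys, and every leaf sits at the same depth.
class BTreeNode {
  constructor(leaf) {
    this.leaf = leaf;
    this.keys = [];
    this.children = [];
  }
}

class BTree {
  constructor(t = 2) {
    this.t = t;
    this.root = new BTreeNode(true);
    this.splits = 0;
  }

  insert(key) {
    if (this.root.keys.length === 2 * this.t - 1) {
      // Full root: split it. This is the only way the tree gets taller.
      const newRoot = new BTreeNode(false);
      newRoot.children.push(this.root);
      this.splitChild(newRoot, 0);
      this.root = newRoot;
    }
    this.insertNonFull(this.root, key);
  }

  // Split the full child parent.children[i] in two and promote its median key.
  splitChild(parent, i) {
    const t = this.t;
    const full = parent.children[i];
    const right = new BTreeNode(full.leaf);
    const median = full.keys[t - 1];
    right.keys = full.keys.slice(t);       // upper t - 1 keys
    full.keys = full.keys.slice(0, t - 1); // lower t - 1 keys
    if (!full.leaf) {
      right.children = full.children.slice(t);
      full.children = full.children.slice(0, t);
    }
    parent.keys.splice(i, 0, median);
    parent.children.splice(i + 1, 0, right);
    this.splits++;
  }

  // Descend toward the right leaf, splitting full children before entering them.
  insertNonFull(node, key) {
    while (!node.leaf) {
      let i = node.keys.length;
      while (i > 0 && key < node.keys[i - 1]) i--;
      if (node.children[i].keys.length === 2 * this.t - 1) {
        this.splitChild(node, i);
        if (key > node.keys[i]) i++;
      }
      node = node.children[i];
    }
    let i = node.keys.length;
    while (i > 0 && key < node.keys[i - 1]) i--;
    node.keys.splice(i, 0, key); // insert at the sorted position in the leaf
  }

  height() {
    let levels = 1;
    for (let n = this.root; !n.leaf; n = n.children[0]) levels++;
    return levels;
  }

  // In-order traversal returns every key in sorted order.
  inorder(node = this.root, out = []) {
    for (let i = 0; i < node.keys.length; i++) {
      if (!node.leaf) this.inorder(node.children[i], out);
      out.push(node.keys[i]);
    }
    if (!node.leaf) this.inorder(node.children[node.keys.length], out);
    return out;
  }
}

const tree = new BTree(2);
for (let k = 1; k <= 20; k++) tree.insert(k); // sorted input, the worst case for a plain BST
console.log(tree.inorder().join(" "), "| levels:", tree.height(), "| splits:", tree.splits);
```

Even with keys arriving in sorted order, every leaf stays at the same depth, because new levels only appear when the root itself splits.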
Creating and reading indexes
db.users.createIndex({ email: 1 })
db.users.createIndex({ email: 1 }, { unique: true })
db.users.createIndex({ lastName: 1, firstName: 1 })
db.users.getIndexes()
1 = ascending, -1 = descending. _id is always indexed automatically.
The compound index rule everyone trips on
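A compound index { a: 1, b: 1, c: 1 } is one tree sorted by a, then b, then c, like sorting a spreadsheet by column A, then B, then C. It can serve a query that uses a left prefix of those keys: { a }, { a, b }, or { a, b, c }. A query on b or c without a can't seek into the tree, because entries sharing a value of b are scattered across every value of a.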
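The left-prefix rule is mechanical enough to write down. This sketch counts how many leading index fields a query's equality filter can use to narrow the seek. usablePrefix is a hypothetical helper that models only the prefix rule; MongoDB's real planner also weighs ranges, sorts, and trial runs.

```javascript
// Hypothetical helper: how many leading fields of a compound index can a
// query's equality filter use to narrow the seek?
function usablePrefix(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let n = 0;
  for (const key of Object.keys(indexKeys)) {
    if (!fields.has(key)) break; // a gap ends the usable prefix
    n++;
  }
  return n;
}

const idx = { a: 1, b: 1, c: 1 };
console.log(usablePrefix(idx, ["a"]));      // → 1
console.log(usablePrefix(idx, ["b", "a"])); // → 2 (field order in the query doesn't matter)
console.log(usablePrefix(idx, ["a", "c"])); // → 1 (seek on a; c is checked while scanning)
console.log(usablePrefix(idx, ["b", "c"])); // → 0 (no leading a, so no seek)
```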
A handy ordering heuristic is ESR: put Equality fields first, then Sort fields, then Range fields.
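Applying ESR to a concrete query can be sketched as building the key spec you would pass to createIndex(). esrIndexSpec and its input shape are hypothetical, made up for illustration; the example query in the comment is likewise invented.

```javascript
// Hypothetical helper: order a compound index key spec by ESR.
// equality: fields matched exactly; sort: [field, direction] pairs in the
// query's sort order; range: fields filtered with $gt / $gte / $lt / $lte.
function esrIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, dir] of sort) spec[field] = dir; // keep the sort's direction
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1; // skip fields already placed
  }
  return spec;
}

// For: find({ status: "active", age: { $gte: 18 } }).sort({ createdAt: -1 })
const spec = esrIndexSpec({ equality: ["status"], sort: [["createdAt", -1]], range: ["age"] });
console.log(JSON.stringify(spec)); // → {"status":1,"createdAt":-1,"age":1}
```

The reasoning behind the order: the equality field narrows the tree to one contiguous run of entries, that run is already ordered by the sort field, and the range field comes last so it filters within that order instead of forcing an in-memory sort.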
The index types beyond the basics
A plain single/compound index is just the start. The tree is the same B-tree underneath — these types change what key gets stored in it:
| Type | What it indexes | Reach for it when |
|---|---|---|
| Multikey | created automatically the moment you index an array field — one index entry per element | you query inside arrays ({ roles: "admin" }). Caveat: you can't compound-index two array fields together |
| TTL | a single Date field; a background thread deletes docs N seconds past it | sessions, logs, carts, anything self-expiring |
| Partial | only the docs matching a filter — a smaller, cheaper tree | you only ever query a subset (e.g. { status: "active" }); prefer this over sparse |
| Sparse | only docs where the field exists | an optional field most docs lack |
| Text | tokenized words (stemming, stop-words) for $text search | basic keyword search (one text index per collection) |
| 2dsphere | GeoJSON points/shapes | geo queries — $near, $geoWithin |
| Hashed | the hash of a field's value | hashed sharding & equality (never range) |
| Wildcard | { "$**": 1 } — arbitrary/unknown field names | grab-bag documents where you can't predict the keys |
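The multikey row is the one that surprises people most, so here is a sketch of how an array field expands into index entries: one entry per distinct element, each pointing back to the same document. multikeyEntries is a hypothetical helper; real entries store a record id rather than _id, and the numeric _id values are an assumption of the sample data.

```javascript
// Sketch of multikey expansion: an array field yields one index entry per
// distinct element, all pointing at the same document.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? new Set(value) : [value];
    for (const v of values) entries.push([v, doc._id]);
  }
  // Keep the tree's order: by indexed value, then by (numeric) document id.
  entries.sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : x[1] - y[1]));
  return entries;
}

const users = [
  { _id: 1, roles: ["admin", "dev"] },
  { _id: 2, roles: ["dev"] },
  { _id: 3, roles: ["admin"] },
];
const roleIndex = multikeyEntries(users, "roles");
console.log(JSON.stringify(roleIndex)); // → [["admin",1],["admin",3],["dev",1],["dev",2]]
```

A query like { roles: "admin" } then becomes an ordinary equality seek on "admin". Indexing two array fields together would need an entry for every pairing of their elements (a Cartesian product), which is the reason behind the caveat in the table.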
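Two properties worth knowing: a collation makes an index compare strings by locale rules, and at strength 1 or 2 it becomes case-insensitive (a query must specify the same collation to use it). A hidden index is still maintained on every write but is invisible to the planner, which makes it perfect for testing whether dropping an index hurts, with an instant rollback.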
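Collation is easiest to feel by analogy. Node's Intl.Collator is also ICU-based, and its sensitivity: "accent" setting ignores case but not accents, roughly like MongoDB's collation strength 2. This is an analogy, not MongoDB code; the mongo shell lines at the bottom are comments only, and the index name "email_ci" is made up for illustration.

```javascript
// Analogy only: an ICU collator that ignores case but still distinguishes
// accents, similar in spirit to a MongoDB collation with strength 2.
const caseInsensitive = new Intl.Collator("en", { sensitivity: "accent" });

console.log(caseInsensitive.compare("A@X.com", "a@x.com")); // → 0, treated as equal
console.log(caseInsensitive.compare("e", "é") !== 0);       // → true, accents still differ

// MongoDB shell equivalents (not executed here):
//   db.users.createIndex({ email: 1 }, { name: "email_ci", collation: { locale: "en", strength: 2 } })
//   db.users.find({ email: "A@X.com" }).collation({ locale: "en", strength: 2 })
//   db.users.hideIndex("email_1")    // planner ignores it; writes still maintain it
//   db.users.unhideIndex("email_1")  // instant rollback
```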
Covered queries — a free win
If a query's filter and the fields it returns are all inside the index, MongoDB answers entirely from the index tree and never fetches the document:
db.users.createIndex({ email: 1, name: 1 })
db.users.find({ email: "a@x.com" }, { _id: 0, email: 1, name: 1 })
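The covered-query rule can be written as a small predicate. isCovered is a hypothetical helper that models only the basic rule; the real planner has extra conditions (for example, a multikey index can't cover a query on its array field).

```javascript
// Hypothetical helper: every filtered field and every returned field must be
// an index key, and _id must be excluded unless _id is itself in the index.
function isCovered(indexKeys, filter, projection) {
  const inIndex = (field) => Object.prototype.hasOwnProperty.call(indexKeys, field);
  const returned = Object.keys(projection).filter((f) => f !== "_id" && projection[f]);
  const idExcluded = projection._id === 0 || projection._id === false;
  return (
    Object.keys(filter).every(inIndex) &&
    returned.length > 0 && // an exclusion-only projection returns whole documents
    returned.every(inIndex) &&
    (idExcluded || inIndex("_id"))
  );
}

const emailName = { email: 1, name: 1 };
console.log(isCovered(emailName, { email: "a@x.com" }, { _id: 0, email: 1, name: 1 })); // → true
console.log(isCovered(emailName, { email: "a@x.com" }, { email: 1, name: 1 }));         // → false
```

The second case is why the query above passes _id: 0. The _id field isn't in { email: 1, name: 1 }, so returning it would force a document fetch.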
Don't index everything
Every index is another B-tree that must be updated on every write, and it competes for that precious RAM cache. So: index the fields you actually filter and sort on, prefer compound indexes that serve several queries, and drop the ones nobody uses.
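A rough way to see the write cost is to count the index entries one insert adds. This back-of-the-envelope sketch assumes one entry per index, or one per distinct array element for a multikey index, and ignores page splits, journaling, and cache effects. indexEntriesPerInsert and the sample document are hypothetical.

```javascript
// Back-of-the-envelope write cost: index entries added by one insert.
function indexEntriesPerInsert(doc, indexes) {
  let total = 0;
  for (const keys of indexes) {
    let entries = 1;
    for (const field of Object.keys(keys)) {
      const value = doc[field];
      // Multikey: one entry per distinct element (an empty array still gets one).
      if (Array.isArray(value)) entries = Math.max(new Set(value).size, 1);
    }
    total += entries;
  }
  return total;
}

// Hypothetical document and index list (the _id index always exists).
const doc = { _id: 1, email: "a@x.com", tags: ["a", "b", "c"] };
console.log(indexEntriesPerInsert(doc, [{ _id: 1 }, { email: 1 }, { tags: 1 }])); // → 5
```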
Prove it with explain()
MongoDB has a query planner that picks an index. You can see its choice:
db.users.find({ email: "a@x.com" }).explain("executionStats")
What I look for:
- IXSCAN (index scan) = 🟢 good. COLLSCAN (collection scan) = 🔴 no usable index.
- The winningPlan is what ran; rejectedPlans are the alternatives the planner raced and lost. The planner actually trial-runs candidates and caches the winner for similar queries.
- The three numbers that tell the truth: keysExamined → docsExamined → nReturned (executionStats reports the first two as totalKeysExamined and totalDocsExamined). The dream is 1 : 1 : 1 (or n : 0 : n for a covered query). If you examined 1,000,000 keys/docs to return 10, your index is wrong or missing.
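Once you know what to look for, the check can be scripted. This sketch walks the winning plan's stage tree and prints the keys : docs : returned ratio. summarizeExplain and collectStages are hypothetical helpers, and the sample object is hand-written to mimic the common explain("executionStats") shape, not captured from a server. The assumed layout (stages linked by inputStage / inputStages, sometimes nested under winningPlan.queryPlan on newer servers) can vary by version, so check your own output.

```javascript
// Collect stage names from a winning plan tree, depth first.
function collectStages(plan, out = []) {
  if (!plan) return out;
  if (plan.stage) out.push(plan.stage);
  collectStages(plan.queryPlan, out); // newer servers may nest the tree here
  collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

function summarizeExplain(explain) {
  const stages = collectStages(explain.queryPlanner.winningPlan);
  const { totalKeysExamined, totalDocsExamined, nReturned } = explain.executionStats;
  return {
    stages,
    verdict: stages.includes("COLLSCAN") ? "COLLSCAN" : stages.includes("IXSCAN") ? "IXSCAN" : "other",
    ratio: `${totalKeysExamined} : ${totalDocsExamined} : ${nReturned}`,
  };
}

// Hand-written object shaped like explain output; not captured from a server.
const sample = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "email_1" } },
  },
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 },
};
console.log(JSON.stringify(summarizeExplain(sample)));
// → {"stages":["FETCH","IXSCAN"],"verdict":"IXSCAN","ratio":"1 : 1 : 1"}
```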
Recap
An index is a separate, sorted B+ tree. High fan-out keeps it shallow (few disk reads); sorted, linked leaves make equality lookups O(log n) and let range scans and sorts walk the leaves in order instead of scanning the collection. Compound indexes follow the left-prefix / ESR rule, covered queries skip the document fetch, and explain() tells you the truth: aim for IXSCAN.
👉 Next: Querying & Aggregation