Skip to main content

Querying & Aggregation

This is the part you'll use every day. I'll keep it tight: CRUD, the operator toolbox, and the aggregation pipeline for when find isn't enough.

CRUD in one screen

db.users.insertOne({ name: "Avina", age: 28 })
db.users.insertMany([{ name: "A" }, { name: "B" }])

db.users.find({ age: { $gte: 18 } })
db.users.findOne({ email: "a@x.com" })

db.users.updateOne({ _id: id }, { $set: { age: 29 } })
db.users.updateMany({ active: false }, { $set: { active: true } })

db.users.deleteOne({ _id: id })
db.users.deleteMany({ active: false })
The #1 beginner trap

An update without an operator replaces the entire document. { age: 29 } wipes every other field. Always use $set (and friends). I learned this the hard way.

The query operator toolbox

Operators are how you express "WHERE." They go inside find(), updateMany(), and $match:

GroupOperatorsExample
Compare$eq $ne $gt $gte $lt $lte{ age: { $gte: 18 } }
Sets$in $nin{ role: { $in: ["admin","editor"] } }
Logic$and $or $not $nor{ $or: [{ a: 1 }, { b: 2 }] }
Existence$exists $type{ phone: { $exists: true } }
Arrays$all $elemMatch $size{ items: { $elemMatch: { qty: { $gt: 5 } } } }
Text/Eval$regex $text $expr $mod{ $expr: { $gt: ["$spent", "$budget"] } } (compare two fields)

Two everyday tricks:

db.users.find({ "profile.city": "NYC" }) // dot notation reaches into nested objects
db.users.find({ roles: "admin" }) // matches if the array CONTAINS "admin"

Update operators — $set is just the start

The filter half of an update reuses the toolbox above; the change half has its own operators. Knowing these is the difference between one atomic round-trip and a read-modify-write race:

OperatorDoes
$set / $unsetset or remove a field
$inc / $mulincrement / multiply a number atomically (no read first)
$min / $maxwrite only if the new value is lower / higher
$renamerename a field
$push / $addToSetappend to an array / append only if absent (set semantics)
$pull / $popremove matching elements / remove first or last
$setOnInsertset a field only when an upsert actually inserts

Three things that level up your writes:

  • Positional array updates: $ hits the first matching element, $[] hits all, and $[<id>] + arrayFilters hits a filtered subset — e.g. bump the qty of one line item without rewriting the array.
  • Upsert ({ upsert: true }): update if it exists, insert if it doesn't — in one atomic op.
  • findOneAndUpdate returns the document (before or after) in the same operation, and bulkWrite batches mixed inserts/updates/deletes — ordered: false lets them run in parallel and not stop at the first error.

Shaping results

db.users.find({}, { name: 1, email: 1 }) // projection: only these fields
db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)
db.users.countDocuments({ active: true })

For big datasets, prefer range pagination (find({ _id: { $gt: lastId } }).limit(n)) over skipskip(20000) still walks 20,000 entries before returning anything.

Aggregation: when find isn't enough

When you need to group, compute, or join, you build a pipeline — an array of stages, each transforming documents and passing them to the next. If you know .filter().map().reduce(), you already get this.

db.orders.aggregate([
{ $match: { status: "paid" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 5 },
{ $project:{ _id: 0, customerId: "$_id", total: 1 } }
])

The stages I reach for most:

StageDoesArray analogy
$matchfilter.filter()
$groupgroup + aggregate (sum/avg/count).reduce()
$projectchoose / rename / compute fields.map()
$sortorder.sort()
$unwindexplode an array → one doc per elementflatten
$lookupjoin another collectionmerge by key

Two things that'll save you:

  • "$amount" (with $) means the value of that field; "amount" is a literal string.
  • Put $match and $sort early so they can use indexes and shrink the dataset before the expensive stages run.

The power stages worth knowing exist

Once $group/$lookup aren't enough, these are the ones that turn the pipeline into a real compute engine:

StageDoesReal use
$addFields / $setadd computed fields without dropping the restderive fullName, flags
$replaceRoot / $replaceWithpromote a sub-document to the top levelunwrap nested results
$facetrun several sub-pipelines in one passresults + total count + filters for a search page
$bucket / $bucketAutogroup values into ranges (a histogram)price/age distribution charts
$setWindowFieldswindow functions over a partitionrunning totals, moving averages, rank (5.0+)
$graphLookuprecursive self-joinorg charts, category trees, social graphs
$merge / $outwrite the results back to a collectionmaterialized views — $merge upserts incrementally, $out replaces wholesale
$unionWith / $sample / $countconcat collections / random sample / countreporting odds and ends
Two pipeline gotchas

Each stage is capped at 100 MB of RAM — a big $group or $sort that exceeds it errors unless you pass { allowDiskUse: true }. And aggregation honors read concern, so you can run heavy analytics pipelines against a secondary to keep load off the primary.

Recap

insert / find / update($set!) / delete plus the operator toolbox covers daily work. Shape results with projection, sort, limit. When you need grouping or joins, reach for the aggregation pipeline$match → $group → $sort → $project — keep the cheap, index-friendly stages first, and remember the power stages ($facet, $setWindowFields, $graphLookup, $merge) for when it has to do real work.

👉 Next: Data Modeling