Querying & Aggregation
This is the part you'll use every day. I'll keep it tight: CRUD, the operator toolbox, and the aggregation pipeline for when find isn't enough.
CRUD in one screen
db.users.insertOne({ name: "Avina", age: 28 })
db.users.insertMany([{ name: "A" }, { name: "B" }])
db.users.find({ age: { $gte: 18 } })
db.users.findOne({ email: "a@x.com" })
db.users.updateOne({ _id: id }, { $set: { age: 29 } })
db.users.updateMany({ active: false }, { $set: { active: true } })
db.users.deleteOne({ _id: id })
db.users.deleteMany({ active: false })
An update without an operator replaces the entire document. { age: 29 } wipes every other field. Always use $set (and friends). I learned this the hard way.
The query operator toolbox
Operators are how you express "WHERE." They go inside find(), updateMany(), and $match:
| Group | Operators | Example |
|---|---|---|
| Compare | $eq $ne $gt $gte $lt $lte | { age: { $gte: 18 } } |
| Sets | $in $nin | { role: { $in: ["admin","editor"] } } |
| Logic | $and $or $not $nor | { $or: [{ a: 1 }, { b: 2 }] } |
| Existence | $exists $type | { phone: { $exists: true } } |
| Arrays | $all $elemMatch $size | { items: { $elemMatch: { qty: { $gt: 5 } } } } |
| Text/Eval | $regex $text $expr $mod | { $expr: { $gt: ["$spent", "$budget"] } } (compare two fields) |
Two everyday tricks:
db.users.find({ "profile.city": "NYC" }) // dot notation reaches into nested objects
db.users.find({ roles: "admin" }) // matches if the array CONTAINS "admin"
Update operators — $set is just the start
The filter half of an update reuses the toolbox above; the change half has its own operators. Knowing these is the difference between one atomic round-trip and a read-modify-write race:
| Operator | Does |
|---|---|
$set / $unset | set or remove a field |
$inc / $mul | increment / multiply a number atomically (no read first) |
$min / $max | write only if the new value is lower / higher |
$rename | rename a field |
$push / $addToSet | append to an array / append only if absent (set semantics) |
$pull / $pop | remove matching elements / remove first or last |
$setOnInsert | set a field only when an upsert actually inserts |
Three things that level up your writes:
- Positional array updates:
$hits the first matching element,$[]hits all, and$[<id>]+arrayFiltershits a filtered subset — e.g. bump the qty of one line item without rewriting the array. - Upsert (
{ upsert: true }): update if it exists, insert if it doesn't — in one atomic op. findOneAndUpdatereturns the document (before or after) in the same operation, andbulkWritebatches mixed inserts/updates/deletes —ordered: falselets them run in parallel and not stop at the first error.
Shaping results
db.users.find({}, { name: 1, email: 1 }) // projection: only these fields
db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)
db.users.countDocuments({ active: true })
For big datasets, prefer range pagination (find({ _id: { $gt: lastId } }).limit(n)) over skip — skip(20000) still walks 20,000 entries before returning anything.
Aggregation: when find isn't enough
When you need to group, compute, or join, you build a pipeline — an array of stages, each transforming documents and passing them to the next. If you know .filter().map().reduce(), you already get this.
db.orders.aggregate([
{ $match: { status: "paid" } },
{ $group: { _id: "$customerId", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 5 },
{ $project:{ _id: 0, customerId: "$_id", total: 1 } }
])
The stages I reach for most:
| Stage | Does | Array analogy |
|---|---|---|
$match | filter | .filter() |
$group | group + aggregate (sum/avg/count) | .reduce() |
$project | choose / rename / compute fields | .map() |
$sort | order | .sort() |
$unwind | explode an array → one doc per element | flatten |
$lookup | join another collection | merge by key |
Two things that'll save you:
"$amount"(with$) means the value of that field;"amount"is a literal string.- Put
$matchand$sortearly so they can use indexes and shrink the dataset before the expensive stages run.
The power stages worth knowing exist
Once $group/$lookup aren't enough, these are the ones that turn the pipeline into a real compute engine:
| Stage | Does | Real use |
|---|---|---|
$addFields / $set | add computed fields without dropping the rest | derive fullName, flags |
$replaceRoot / $replaceWith | promote a sub-document to the top level | unwrap nested results |
$facet | run several sub-pipelines in one pass | results + total count + filters for a search page |
$bucket / $bucketAuto | group values into ranges (a histogram) | price/age distribution charts |
$setWindowFields | window functions over a partition | running totals, moving averages, rank (5.0+) |
$graphLookup | recursive self-join | org charts, category trees, social graphs |
$merge / $out | write the results back to a collection | materialized views — $merge upserts incrementally, $out replaces wholesale |
$unionWith / $sample / $count | concat collections / random sample / count | reporting odds and ends |
Each stage is capped at 100 MB of RAM — a big $group or $sort that exceeds it errors unless you pass { allowDiskUse: true }. And aggregation honors read concern, so you can run heavy analytics pipelines against a secondary to keep load off the primary.
Recap
insert / find / update($set!) / deleteplus the operator toolbox covers daily work. Shape results with projection,sort,limit. When you need grouping or joins, reach for the aggregation pipeline —$match → $group → $sort → $project— keep the cheap, index-friendly stages first, and remember the power stages ($facet,$setWindowFields,$graphLookup,$merge) for when it has to do real work.
👉 Next: Data Modeling