Querying & Aggregation

This is the part you'll use every day. I'll keep it tight: CRUD, the operator toolbox, and the aggregation pipeline for when find isn't enough.

CRUD in one screen

db.users.insertOne({ name: "Avina", age: 28 })
db.users.insertMany([{ name: "A" }, { name: "B" }])

db.users.find({ age: { $gte: 18 } })
db.users.findOne({ email: "a@x.com" })

db.users.updateOne({ _id: id }, { $set: { age: 29 } })
db.users.updateMany({ active: false }, { $set: { active: true } })

db.users.deleteOne({ _id: id })
db.users.deleteMany({ active: false })

The #1 beginner trap

An update without an operator replaces the entire document. { age: 29 } wipes every other field. Always use $set (and friends). I learned this the hard way.

The query operator toolbox

Operators are how you express "WHERE." They go inside find(), updateMany(), and $match:

Group	Operators	Example
Compare	`$eq $ne $gt $gte $lt $lte`	`{ age: { $gte: 18 } }`
Sets	`$in $nin`	`{ role: { $in: ["admin","editor"] } }`
Logic	`$and $or $not $nor`	`{ $or: [{ a: 1 }, { b: 2 }] }`
Existence	`$exists $type`	`{ phone: { $exists: true } }`
Arrays	`$all $elemMatch $size`	`{ items: { $elemMatch: { qty: { $gt: 5 } } } }`
Text/Eval	`$regex $text $expr $mod`	`{ $expr: { $gt: ["$spent", "$budget"] } }` (compare two fields)

Two everyday tricks:

db.users.find({ "profile.city": "NYC" })   // dot notation reaches into nested objects
db.users.find({ roles: "admin" })          // matches if the array CONTAINS "admin"

Update operators — `$set` is just the start

The filter half of an update reuses the toolbox above; the change half has its own operators. Knowing these is the difference between one atomic round-trip and a read-modify-write race:

Operator	Does
`$set` / `$unset`	set or remove a field
`$inc` / `$mul`	increment / multiply a number atomically (no read first)
`$min` / `$max`	write only if the new value is lower / higher
`$rename`	rename a field
`$push` / `$addToSet`	append to an array / append only if absent (set semantics)
`$pull` / `$pop`	remove matching elements / remove first or last
`$setOnInsert`	set a field only when an upsert actually inserts

Three things that level up your writes:

Positional array updates: $ hits the first matching element, $[] hits all, and $[<id>] + arrayFilters hits a filtered subset — e.g. bump the qty of one line item without rewriting the array.
Upsert ({ upsert: true }): update if it exists, insert if it doesn't — in one atomic op.
findOneAndUpdate returns the document (before or after) in the same operation, and bulkWrite batches mixed inserts/updates/deletes — ordered: false lets them run in parallel and not stop at the first error.

Shaping results

db.users.find({}, { name: 1, email: 1 })           // projection: only these fields
db.users.find().sort({ createdAt: -1 }).skip(20).limit(10)
db.users.countDocuments({ active: true })

For big datasets, prefer range pagination (find({ _id: { $gt: lastId } }).limit(n)) over skip — skip(20000) still walks 20,000 entries before returning anything.

Aggregation: when `find` isn't enough

When you need to group, compute, or join, you build a pipeline — an array of stages, each transforming documents and passing them to the next. If you know .filter().map().reduce(), you already get this.

db.orders.aggregate([
  { $match:  { status: "paid" } },
  { $group:  { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort:   { total: -1 } },
  { $limit:  5 },
  { $project:{ _id: 0, customerId: "$_id", total: 1 } }
])

The stages I reach for most:

Stage	Does	Array analogy
`$match`	filter	`.filter()`
`$group`	group + aggregate (sum/avg/count)	`.reduce()`
`$project`	choose / rename / compute fields	`.map()`
`$sort`	order	`.sort()`
`$unwind`	explode an array → one doc per element	flatten
`$lookup`	join another collection	merge by key

Two things that'll save you:

"$amount" (with $) means the value of that field; "amount" is a literal string.
Put $match and $sort early so they can use indexes and shrink the dataset before the expensive stages run.

The power stages worth knowing exist

Once $group/$lookup aren't enough, these are the ones that turn the pipeline into a real compute engine:

Stage	Does	Real use
`$addFields` / `$set`	add computed fields without dropping the rest	derive `fullName`, flags
`$replaceRoot` / `$replaceWith`	promote a sub-document to the top level	unwrap nested results
`$facet`	run several sub-pipelines in one pass	results + total count + filters for a search page
`$bucket` / `$bucketAuto`	group values into ranges (a histogram)	price/age distribution charts
`$setWindowFields`	window functions over a partition	running totals, moving averages, rank (5.0+)
`$graphLookup`	recursive self-join	org charts, category trees, social graphs
`$merge` / `$out`	write the results back to a collection	materialized views — `$merge` upserts incrementally, `$out` replaces wholesale
`$unionWith` / `$sample` / `$count`	concat collections / random sample / count	reporting odds and ends

Two pipeline gotchas

Each stage is capped at 100 MB of RAM — a big $group or $sort that exceeds it errors unless you pass { allowDiskUse: true }. And aggregation honors read concern, so you can run heavy analytics pipelines against a secondary to keep load off the primary.

Recap

insert / find / update($set!) / delete plus the operator toolbox covers daily work. Shape results with projection, sort, limit. When you need grouping or joins, reach for the aggregation pipeline — $match → $group → $sort → $project — keep the cheap, index-friendly stages first, and remember the power stages ($facet, $setWindowFields, $graphLookup, $merge) for when it has to do real work.

👉 Next: Data Modeling

CRUD in one screen​

The query operator toolbox​

Update operators — $set is just the start​

Shaping results​

Aggregation: when find isn't enough​

The power stages worth knowing exist​

Recap​