High level ideas

  • We are sifting through an entire collection and transforming it stage-by-stage to get a result.

  • All pipelines are just arrays of javascript objects

  • $ signs have signifigant meaning in these queries

$match

Use Query Selectors to filter documents.

You should almost always have a $match stage at the beginning of any pipeline you run.

The Usual suspects

  • $eq

  • $ne

  • $gt/gte

  • $lt/lte

Insert some documents
db.articles.insertMany([
{ "author" : "dave", "score" : 80, "views" : 100 },
{ "author" : "dave", "score" : 85, "views" : 521 },
{ "author" : "ahn", "score" : 60, "views" : 1000 },
{ "author" : "li", "score" : 55, "views" : 5000 },
{ "author" : "annT", "score" : 60, "views" : 50 },
{ "author" : "li", "score" : 94, "views" : 999 },
{ "author" : "ty", "score" : 95, "views" : 1000 }
]);
We can match an exact literal value
db.articles.aggregate(
    [ { $match : { author : "dave" } } ]
);

/** Returns:
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
*/
Or match any view count that is not equal to a value
db.articles.aggregate(
    [ { $match : { views : {$ne: 1000 } } } ]
);

/** Returns:
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
*/
Or perhaps look for people with scores less than 90
db.articles.aggregate(
    [ { $match : { score: {$lt: 90} } } ]
);

/** Returns:
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
*/
Cleanup
db.articles.drop(); // Cleanup

Joining terms

  • $and

  • $not

  • $nor

  • $or

All of these selectors take an array of sub-query selectors to create a more complex expression. For example:

Arrays with $in and $nin

These selectors take an array of values which we wish the value in a matching document to either have or not.

db.inventory.insertMany([
	{ name: "item 1", qty: 1 },
	{ name: "item 2", qty: 5 },
	{ name: "item 3", qty: 6 },
	{ name: "item 4", qty: 15 },
	{ name: "item 5", qty: 3 }
]);

db.inventory.aggregate( {$match: { qty: { $in: [ 5, 15 ] } }} );

/** Returns:
{ name: "item 2", qty: 5 },
{ name: "item 4", qty: 15 }
*/

db.inventory.aggregate( {$match: { qty: { $nin: [ 5, 15 ] } }} );

/** Returns:
{ name: "item 1", qty: 1 },
{ name: "item 3", qty: 6 },
{ name: "item 5", qty: 3 }
*/

db.inventory.drop(); // cleanup

$project

Project renames fields and performs mutations of documents before being passed to the next pipeline stage.

You will probably want a $project pipeline stage at the end of any pipeline you run to beautify the output. In addition, it is useful to rename fields mid-pipeline to ease query writing and to make pipelines compatible with one another.

Some sample data
db.articles.insertMany([
{ "author" : "dave", "score" : 80, "views" : 100 },
{ "author" : "dave", "score" : 85, "views" : 521 },
{ "author" : "ahn", "score" : 60, "views" : 1000 },
{ "author" : "li", "score" : 55, "views" : 5000 },
{ "author" : "annT", "score" : 60, "views" : 50 },
{ "author" : "li", "score" : 94, "views" : 999 },
{ "author" : "ty", "score" : 95, "views" : 1000 }
]);

Specify fields to be included or excluded

We can include specific fields by specifying their key with a true value
db.articles.aggregate([
	{$project:{
		author: 1,
		score: 1
	}}
]);

/** Returns: (NOTE: _id came along for the ride)
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a41"), "author" : "dave", "score" : 80 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a42"), "author" : "dave", "score" : 85 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a43"), "author" : "ahn", "score" : 60 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a44"), "author" : "li", "score" : 55 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a45"), "author" : "annT", "score" : 60 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a46"), "author" : "li", "score" : 94 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a47"), "author" : "ty", "score" : 95 }
*/
We can exclude only specific fields by specifying their key with a falsey value
db.articles.aggregate([
	{$project:{
		author: 0,
	}}
]);

/** Returns: (NOTE: _id came along for the ride)
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a41"), "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a42"), "score" : 85, "views" : 521 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a43"), "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a44"), "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a45"), "score" : 60, "views" : 50 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a46"), "score" : 94, "views" : 999 }
{ "_id" : ObjectId("5ba2b5e92184ad6b87653a47"), "score" : 95, "views" : 1000 }
*/

Any combination of the above functions, but if you only exclude keys then all other are included and if you only include keys, then all other (except _id) are excluded.

We can exclude _id explicitly
db.articles.aggregate([
	{$project:{
		_id: 0,
		author: 1,
	}}
]);

/** Returns:
{ "author" : "dave" }
{ "author" : "dave" }
{ "author" : "ahn" }
{ "author" : "li" }
{ "author" : "annT" }
{ "author" : "li" }
{ "author" : "ty" }
*/

Rename a field

We can rename fields by providing a new key/structure and a path
db.articles.aggregate([
	{$project:{
		_id: 0,
		newAuthor: "$author",
		newObject: {
			canBe: {
				nested: {
					score: "$score"
				}
			}
		}
	}}
]);

/** Returns:
  {
      "newAuthor": "dave",
      "newObject": {
          "canBe": {
              "nested": {
                  "score": 80
              }
          }
      }
  }
  { "newAuthor" : "dave", "newObject" : { "canBe" : { "nested" : { "score" : 85 } } } }
  { "newAuthor" : "ahn", "newObject" : { "canBe" : { "nested" : { "score" : 60 } } } }
  { "newAuthor" : "li", "newObject" : { "canBe" : { "nested" : { "score" : 55 } } } }
  { "newAuthor" : "annT", "newObject" : { "canBe" : { "nested" : { "score" : 60 } } } }
  { "newAuthor" : "li", "newObject" : { "canBe" : { "nested" : { "score" : 94 } } } }
  { "newAuthor" : "ty", "newObject" : { "canBe" : { "nested" : { "score" : 95 } } } }
*/

Include literal values

We can insert literal values that every document will include
db.articles.aggregate([
	{$project: {
		_id: 0,
		foo: "bar",
		author: 1
	}}
]);

/** Returns:
{ "author" : "dave", "foo" : "bar" }
{ "author" : "dave", "foo" : "bar" }
{ "author" : "ahn", "foo" : "bar" }
{ "author" : "li", "foo" : "bar" }
{ "author" : "annT", "foo" : "bar" }
{ "author" : "li", "foo" : "bar" }
{ "author" : "ty", "foo" : "bar" }
*/

Perfom operators

We can do various math operations that are NOT aggregate, that is: they operate only using a single document at a time. Some of these operators include:

  • $add / $subtract

  • $divide / $multiply

  • $ceil / $floor

  • $exp / $pow / $sqrt

  • $ln / $log / $log10

  • $split / $concat

  • $dateToString / $dateFromString

  • and more

For example:

Add sore and views, then multiply by 0.8
db.articles.aggregate([
	{$project: {
		_id: 0,
		author: 1,
		value: {$multiply: [{$add: ["$score", "$views"]}, 0.8]}
	}}
]);

/** Returns:
{ "author" : "dave", "value" : 144 }
{ "author" : "dave", "value" : 484.8 }
{ "author" : "ahn", "value" : 848 }
{ "author" : "li", "value" : 4044 }
{ "author" : "annT", "value" : 88 }
{ "author" : "li", "value" : 874.4000000000001 }
{ "author" : "ty", "value" : 876 }
*/
Cleanup
db.articles.drop(); // Cleanup

$group

Group will bring multiple documents together into buckets indicated by a new _id field.

Sample data
db.inventory.insert([
   { "_id" : 1, "sku" : "almonds", description: "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", description: "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", description: "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", description: "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, description: "Incomplete" },
   { "_id" : 6 }
])

Statistics

  • $avg

  • $max / $min

  • $stdDevPop / $stdDevSamp

  • $sum

Use min/max to find most/least in stock
db.inventory.aggregate([
	{$match: {
		sku: {$exists: true},
		instock: {$exists: true}
	}},
	{$group: {
		_id: null,
		minstock: {$min: "$instock"},
		maxstock: {$max: "$instock"}
	}}
]);

/** Results:
{ "_id" : null, "minstock" : 60, "maxstock" : 120 }
*/

Values

  • $addToSet

  • $push

  • $first / $last

Use push to group all inventory items
db.inventory.aggregate([
	{$match: {
		sku: {$exists: true},
		instock: {$exists: true}
	}},
	{$group: {
		_id: null,
		sku: {$push: "$sku"},
		instock: {$avg: "$instock"}
	}}
]);

/** Results:
{ "_id" : null, "sku" : [ "almonds", "bread", "cashews", "pecans" ], "instock" : 82.5 }
*/

$unwind

We must deal with arrays in Mongo using unwind. This is an expensive operation, so we generally want to $match out irrelevant documents first. Sometimes it is necessary to use $unwind in tandem with a group.

Average grades in an array per student
db.students.insertMany([
	{ _id: 0, name: "Ivan", grades: [0, 10, 50]},
	{ _id: 1, name: "Erica", grades: [98, 92, 100]},
	{ _id: 2, name: "Jordan", grades: [88, 80, 72]},
]);

db.students.aggregate([
	{$unwind: "$grades"},
	{$group: {
		_id: "$_id",
		name: {$first: "$name"},
		grades: {$avg: "$grades"}
	}},
	{$sort: { _id: 1 }}
]);

/* Results:
{ "_id" : 0, "name" : "Ivan", "grades" : 20 }
{ "_id" : 1, "name" : "Erica", "grades" : 96.66666666666667 }
{ "_id" : 2, "name" : "Jordan", "grades" : 80 }
*/

db.students.drop();

"Undoing" $unwind

You may regroup after doing some particular unwind using $group with a $push.

The identity function
db.students.insertMany([
	{ _id: 0, name: "Ivan", grades: [0, 10, 50]},
	{ _id: 1, name: "Erica", grades: [98, 92, 100]},
	{ _id: 2, name: "Jordan", grades: [88, 80, 72]},
]);

db.students.aggregate([
	{$unwind: "$grades"},
	{$group: {
		_id: "$_id",
		name: {$first: "$name"},
		grades: {$push: "$grades"}
	}},
	{$sort: { _id: 1 }}
]);

/* Results:
{ "_id" : 0, "name" : "Ivan", "grades" : [ 0, 10, 50 ] }
{ "_id" : 1, "name" : "Erica", "grades" : [ 98, 92, 100 ] }
{ "_id" : 2, "name" : "Jordan", "grades" : [ 88, 80, 72 ] }<Paste>
*/

db.students.drop();

$sort

Use $sort to order the results. This (along with limit/skip/count) should be one of the final operations. (example in next section)

$limit, $skip, $count

These control the output of the query. You should generally only use them for debugging purposes, but occasionally there are valid uses in real queries, such as the following.

db.zipcodes.insertMany([
	{
	  "_id": "10280",
	  "city": "NEW YORK",
	  "state": "NY",
	  "pop": 55740,
	  "loc": [
		-74.016323,
		40.710537
	  ]
	},
	{
	  "_id": "10285",
	  "city": "Albany",
	  "state": "NY",
	  "pop": 574,
	  "loc": [
		-74.016323,
		40.710537
	  ]
	},
	{
	  "_id": "10284",
	  "city": "BUFFALO",
	  "state": "NY",
	  "pop": 4412,
	  "loc": [
		-74.016323,
		40.710537
	  ]
	},
]);

db.zipcodes.aggregate( [
   { $group:
      {
        _id: { state: "$state", city: "$city" },
        pop: { $sum: "$pop" }
      }
   },
   { $sort: { pop: 1 } },
   { $group:
      {
        _id : "$_id.state",
        biggestCity:  { $last: "$_id.city" },
        biggestPop:   { $last: "$pop" },
        smallestCity: { $first: "$_id.city" },
        smallestPop:  { $first: "$pop" }
      }
   },

  // the following $project is optional, and
  // modifies the output format.

  { $project:
    { _id: 0,
      state: "$_id",
      biggestCity:  { name: "$biggestCity",  pop: "$biggestPop" },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
    }
  }
]);

/** Returns:
{
    "biggestCity": {
        "name": "NEW YORK",
        "pop": 55740
    },
    "smallestCity": {
        "name": "Albany",
        "pop": 574
    },
    "state": "NY"
}
*/

db.zipcodes.drop();

"Relational" additions since 3.4

There are some tools you may use to ease the pain of relational queries. We will not focus on these in this class, but they do exist.

$facet

Runs many queries from one pipeline.

{ $facet:
	{
		<outputField1>: [ <stage1>, <stage2>, ... ],
		<outputField2>: [ <stage1>, <stage2>, ... ],
		...
	}
}

This is useful if you would like to run more than one query for different views on the same data. Note that rejoining these fields back together would be somewhat expensive within the query.

$graphLookup

Recursive search. If you have one document that links to another document, that links to, …​, etc. Then you may use this to automate writing many queries to find the leaf documents.

From Mongo’s example:

db.employees.insertMany([{ "_id" : 1, "name" : "Dev" }
{ "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" }
{ "_id" : 3, "name" : "Ron", "reportsTo" : "Eliot" }
{ "_id" : 4, "name" : "Andrew", "reportsTo" : "Eliot" }
{ "_id" : 5, "name" : "Asya", "reportsTo" : "Ron" }
{ "_id" : 6, "name" : "Dan", "reportsTo" : "Andrew" }]);

db.employees.aggregate( [
   {
      $graphLookup: {
         from: "employees",
         startWith: "$reportsTo",
         connectFromField: "reportsTo",
         connectToField: "name",
         as: "reportingHierarchy"
      }
   }
] );

db.employees.drop(); // cleanup

$lookup

Perform a join (with restrictions). This essentially issues secondary queries on your behalf. You must have a non-sharded (non-distributed) collection, which is not necessarily likely in a production environment.

db.orders.insert([
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3  }
])

db.inventory.insert([
   { "_id" : 1, "sku" : "almonds", description: "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", description: "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", description: "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", description: "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, description: "Incomplete" },
   { "_id" : 6 }
])

db.orders.aggregate([
   {
     $lookup:
       {
         from: "inventory",
         localField: "item",
         foreignField: "sku",
         as: "inventory_docs"
       }
  }
])

/** Returns:

{
   "_id" : 1,
   "item" : "almonds",
   "price" : 12,
   "quantity" : 2,
   "inventory_docs" : [
      { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
   ]
}
{
   "_id" : 2,
   "item" : "pecans",
   "price" : 20,
   "quantity" : 1,
   "inventory_docs" : [
      { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
   ]
}
{
   "_id" : 3,
   "inventory_docs" : [
      { "_id" : 5, "sku" : null, "description" : "Incomplete" },
      { "_id" : 6 }
   ]
}

*/

db.orders.drop(); // cleanup