MongoDB Aggregation

An example aggregation pipeline of two stages. Source: Develop Paper 2019.
An example aggregation pipeline of two stages. Source: Develop Paper 2019.

Data can be processed in the application layer or in the database layer. Aggregation framework or pipeline is the manner in which MongoDB does data processing in the database layer. MongoDB's aggregation framework can do any kind of processing. Therefore it's really a Turing complete functional language.

Often just retrieving data is inadequate. We wish to perform some computations on the data or report on live data. Sometimes we don't want to copy data and process it elsewhere. These are good reasons for using MongoDB's aggregations.

While being flexible and intuitive for developers, MongoDB's aggregation is not as performant as some other databases or tools.

Discussion

  • What are the ways to do aggregation in MongoDB?

    There are three ways to do aggregation in MongoDB:

    • Aggregation Pipeline: We can think of data as moving along a multi-stage pipeline. Each stage does specific processing on the data. Among the basic stages are filters that obtain a subset of matching documents and transforms that reshape data into a suitable output form. Other stages do sorting, grouping, string concatenation, array processing, and so on. The aggregation pipeline can operate on a sharded collection. Some stages can use indexes for better performance.
    • Map-Reduce Function: This legacy approach was deprecated in MongoDB 5.0. Aggregation pipeline is now the preferred approach since it offers better performance and usability.
    • Single Purpose Aggregation Methods: From the Collection class, the methods estimatedDocumentCount(), count() and distinct() aggregate from a single collection. While handy, these lack the flexibility and capability of the aggregation pipeline.
  • Could you describe some aggregation pipeline operators or stages?

    An aggregation pipeline is constructed with the aggregate() method of the Collection class and takes an array of stages: db.collection.aggregate( [ { <stage> }, ... ] ). Except for $out, $merge and $geoNear, other stages can appear multiple times in a pipeline.

    We mention a few stages:

    • $addFields: Add new fields to documents. Operator $set is an alias.
    • $count: Count number of documents at this stage.
    • $group: Takes an identifier expression for grouping input documents. If specified, applies an accumulator expression for each group. Outputs one document per distinct group.
    • $limit: Passes only the first n documents to the next stage.
    • $lookup: Does a left outer join to another collection in the same database.
    • $match: Filters documents based on standard MongoDB queries.
    • $merge: Writes the output to a collection. Must be the last stage of the pipeline.
    • $project: Reshapes documents by adding or removing fields. Operator $unset is an alias for removing fields.
    • $sample: Randomly selects n documents.
    • $skip: Skips first n documents.
    • $sort: Reorders document stream by a specified sort key.
    • $unwind: Given an array field, outputs one document per array element.
  • What are expression operators in MongoDB aggregation?
    Averaging as accumulator and non-accumulator. Source: Adapted from MongoDB Docs 2021h.
    Averaging as accumulator and non-accumulator. Source: Adapted from MongoDB Docs 2021h.

    An expression operator has a name and takes either an array of arguments or a single argument. MongoDB has plenty of such operators. By becoming familiar with these, developers can simplify their application code by doing most of the processing in MongoDB.

    There are many categories of expression operators: arithmetic, Boolean, comparison, conditional, data size, date, literal, object, set, string, text, trigonometry, type, etc. Custom expression operators include $accumulator and $function. Developers can write custom JavaScript functions (MongoDB 4.4) for these.

    Accumulators are special expression operators that compute totals, maxima, minima and other values. They maintain their state as documents pass through the pipeline. These can be used only in some stages: $bucket, $bucketAuto, $group and $setWindowFields. However, some accumulators can be used in other stages but not as accumulators, that is, they don't maintain state. An example of this is shown in the figure: (a) averaging happens across documents within a group; (b) averaging happens over values within a document.

  • How do I make use of variables in MongoDB aggregation?
    Illustrating the use of variables with $let operator. Source: Adapted from MongoDB Docs 2021i.
    Illustrating the use of variables with $let operator. Source: Adapted from MongoDB Docs 2021i.

    Let's consider a sales collection with numeric fields price and tax, and Boolean field applyDiscount. We can use aggregation to compute the total for each sale. This is where the $let operator becomes useful.

    The $let operator has two parts: vars block where the variables are assigned and in expression where the variables are used for computation. The vars block can access variables defined externally, including system variables. If externally defined variables are modified in vars, the changed values are seen only within the inexpression. Externally, old values are retained.

    System variables include NOW, CLUSTER_TIME, ROOT, CURRENT, REMOVE, DESCEND, PRUNE and KEEP.

  • What's the Aggregation Pipeline Builder?
    A sample pipeline of three stages in MongoDB Compass. Source: Melnik 2018.
    A sample pipeline of three stages in MongoDB Compass. Source: Melnik 2018.

    MongoDB Compass is a GUI for MongoDB. Aggregation Pipeline Builder is one of the tools within Compass. Developers can add/remove stages to the pipeline graphically. In Auto Preview mode, matching documents are automatically previewed. By default, first 20 documents are previewed. If Sample Mode is enabled, a default limit of 100,000 is applied. This affects $group, $bucket, and $bucketAuto stages.

    When we select a pipeline operators, Compass gives the full syntax. We need to fill in the placeholders with relevant fields and expressions. If we make a mistake, useful suggestions are given to correct the same. This can be a useful feature for beginners.

    It's possible to save the pipeline and reload it later within Compass. With the export feature, the pipeline can be exported in JSON or in a format compatible with a language driver. Likewise, it's possible to import into Compass a pipeline written in MongoDB Query Language.

  • How's the performance of MongoDB's aggregations?
    MongoDB aggregation performance compared against other databases. Source: Weinberger 2018.
    MongoDB aggregation performance compared against other databases. Source: Weinberger 2018.

    Compared to the legacy map-reduce, aggregation framework was 6x faster in a test done back in 2015. Unless you're stuck with legacy code, it makes sense to migrate to the aggregation framework.

    Comparing MySQL 8.0 and MongoDB 4.0.3, it was seen that MongoDB is typically faster on more complex queries. It's faster from disk when there are no indexes, whereas MySQL is faster from RAM. BI Connector is slower for simple queries and not as fast as hand-crafted aggregation.

    In a separate test, MongoDB's insertMany() and updateMany() were faster compared to their MySQL equivalents. However, a simple find() query was slower than MySQL's SELECT.

    On large collections of millions of documents, MongoDB's aggregation was shown to be much worse than Elasticsearch. Performance worsens with collection size when MongoDB starts using the disk due to limited system RAM.

    The $lookup stage used without indexes can be very slow.

    While there are many powerful external compute engines (Spark, Hadoop, R, Python, Java, C), data transfer may become the bottleneck due to cost, network bandwidth, hardware limitations and security considerations. Therefore, use MongoDB aggregations when you can.

  • Could you share some useful tips when building MongoDB aggregation pipelines?
    Blocking stages must be designed carefully to mitigate poor pipeline performance. Source: Done 2021, sec. 2.4.
    Blocking stages must be designed carefully to mitigate poor pipeline performance. Source: Done 2021, sec. 2.4.

    The Aggregation Pipeline Quick Reference is a useful reference for developers.

    When complex processing is required, aggregation pipelines help us break down the problem into smaller parts. We can think of each stage as solving a subproblem before handing over the intermediate result to the next stage. Pipelines also aid debugging or prototyping because we can comment out some stages.

    The $project stage can be verbose and non-intuitive. Instead use $set and $unset. This makes the pipeline more maintainable when new fields are added to documents later.

    While MongoDB automatically does optimization, use db.coll.explain() to see if further optimizations are possible.

    Stages $sort and $group are blocking, that is, they can't send their output to the next stage until all input documents are processed. Blocking stages reduce concurrency and consume more memory. Some mitigation strategies include sorting on index, using $limit with sorting, and sorting later on a smaller subset. Group summary data only. Avoid unnecessary grouping. Use array operators instead of unwinding and regrouping. Move match filters to earlier in the pipeline.

  • How do aggregation operators map to concepts in SQL?
    MongoDB aggregations and their SQL counterparts. Source: Adapted from MongoDB Docs 2021c.
    MongoDB aggregations and their SQL counterparts. Source: Adapted from MongoDB Docs 2021c.

    The SQL to Aggregation Mapping Chart is a useful resource. From here, we note the following SQL-MongoDB equivalence:

    • Filtering: SQL uses WHERE and HAVING while MongDB uses the $match aggregation operator.
    • Grouping: SQL uses GROUP BY while MongDB uses the $group aggregation operator.
    • Selecting: SQL uses SELECT while MongDB uses the $project aggregation operator.
    • Sorting: SQL uses ORDER BY while MongDB uses the $sort aggregation operator.
    • Limiting: SQL uses LIMIT while MongDB uses the $limit aggregation operator.
    • Joining: Joining in MongoDB is a left outer join performed using the $lookup aggregation operator.
    • Summing/Counting: SQL's SUM() and COUNT() are implemented in MongoDB with the $sum aggregation operator.
    • Combining: SQL uses UNION ALL while MongDB uses the $unionWith aggregation operator.

    The figure gives some examples. The orders collection includes an array of items with each item containing fields sku, qty and price.

Milestones

Feb
2009

MongoDB 1.0 is released. By August, this version becomes generally available for production environments.

Dec
2009

MongoDB 1.2 is released with support for aggregations. This is implemented in Node.js via a Map-Reduce API. This is slow and queries are non-intuitive for developers.

Aug
2012

MongoDB 2.2 is released. This introduces the Aggregation Framework. This is exposed via the aggregate command and the aggregate() helper in the shell. Compared to earlier implementation of aggregations, this is far more intuitive, powerful, efficient and scalable. It soon becomes the go-to tool for developers. However, MongoDB continues to support Map-Reduce for legacy reasons.

Dec
2015

MongoDB 3.2 is released. New stages include $sample, $indexStats and $lookup (left outer join). Many new aggregation arithmetic operators and aggregation array operators are introduced. Results of aggregations can be routed to any shard for merging, thus avoiding an overload on primary shard.

Nov
2016

MongoDB 3.4 is released. New stages include $graphLookup, $bucket, $bucketInfo, $facet, $sortByCount, $addFields, $replaceRoot and $count. Many aggregation array/string/date operators and $switch aggregation control flow expression are introduced.

Nov
2017

MongoDB 3.6 is released. New stages include $currentOp, $listSessions and $listLocalSessions. New aggregation operators include $arrayToObject, $objectToArray, $mergeObjects, $dateFromString, $dateFromParts and $dateToParts. For date operators, time zones are supported. Aggregation variable REMOVE is added.

Jun
2018

MongoDB 4.0 is released. New aggregation operators for type conversion and trimming strings are added. MongoDB Compass 1.14 is released with Aggregation Pipeline Builder. In Compass 1.15 (Aug 2018), import/export feature is added to the builder. In Compass 1.16 (Nov 2018), collation support is added. In Compass 1.19 (Aug 2019), views can be created from pipeline results. Builder comes with new settings of sample size, number of documents to preview, and a maximum timeout. In Compass 1.20 (Dec 2019), $set, $unset and $replaceWith pipeline operators are added. In Compass 1.26.1 (Apr 2021), functions are allowed in pipelines.

Aug
2019

MongoDB 4.2 is released. Stage $merge is added. This helps create on-demand materialized views. Other new stages include $planCacheStats, $replaceWith, $set and $unset. Trigonometry expressions are added. New regex expressions include $regexFind, $regexFindAll, and $regexMatch. In earlier versions, only $regex query operator could be used in $match stage. New variables include NOW and CLUSTER_TIME. Aggregation pipeline can be used findAndModify and update commands.

Jul
2020

MongoDB 4.4 is released. Stage $unionWith can combine pipeline results from multiple collections. Operators $accumulator and $function allow custom aggregation expressions instead of using mapReduce and $where. Other new operators include $binarySize, $bsonSize, $first (first array item), $last, $isNumber, $replaceOne and $replaceAll. From this version, $merge can output to the same collection that's being aggregated; $out can output to a collection in a different database.

Jul
2021

MongoDB 5.0 is released. Pipeline stage $setWindowFields is added. Some date aggregation operators are added. Aggregation operators $rand and $sampleRate are added. For better performance, comparison operators with $expr use indexes. Multiple input expressions are possible for $ifNull. For better readability, let can be used to define and use variables within the pipeline. Concise correlated subqueries are supported for $lookup.

References

  1. Develop Paper. 2019. "Mongodb series — deep understanding of mongodb aggregation." Develop Paper, November 30. Accessed 2021-10-16.
  2. Done, Paul. 2021. "Practical MongoDB Aggregations." v3.00, MongoDB, Inc, October. Accessed 2021-10-10.
  3. Henry, Onyancha Brian. 2019. "MongoDB vs MySQL NoSQL - Why Mongo is Better." Blog, Severalnines, February 28. Accessed 2021-10-16.
  4. Kukic, Ado. 2020. "How to Use Custom Aggregation Expressions in MongoDB 4.4." How To, MongoDB, July 28. Accessed 2021-10-16.
  5. Marturana, Luca. 2015. "MongoDB showdown: Aggregate vs map-reduce." Blog, Sysdig, March 12. Accessed 2021-10-16.
  6. Maréchal, Le. 2015. "MongoDB vs. Elasticsearch: The Quest of the Holy Performances." Blog, Quickslab, March 23. Accessed 2021-10-16.
  7. Mastering JS Weekly. 2020. "Why You (Probably) Shouldn't Use the MongoDB Aggregation Framework." Issue#37, Mastering JS Weekly, July 16. Accessed 2021-10-16.
  8. Melnik, Grigori. 2018. "Introducing the Aggregation Pipeline Builder in MongoDB Compass." Blog, MongoDB, May 30. Updated 2019-03-11. Accessed 2021-10-17.
  9. MongoDB. 2009. "1.0 GA Released." Blog, MongoDB, August 27. Accessed 2021-10-16.
  10. MongoDB Docs. 2012. "Release Notes for MongoDB 2.2." August. Accessed 2021-10-16.
  11. MongoDB Docs. 2015. "Release Notes for MongoDB 3.2." December 8. Accessed 2021-10-16.
  12. MongoDB Docs. 2016. "Release Notes for MongoDB 3.4." November 29. Accessed 2021-10-16.
  13. MongoDB Docs. 2017. "Release Notes for MongoDB 3.6." November. Accessed 2021-10-16.
  14. MongoDB Docs. 2018. "Release Notes for MongoDB 4.0." August. Accessed 2021-10-16.
  15. MongoDB Docs. 2019. "Release Notes for MongoDB 4.2." August. Accessed 2021-10-16.
  16. MongoDB Docs. 2020. "Release Notes for MongoDB 4.4." June. Accessed 2021-10-16.
  17. MongoDB Docs. 2021a. "Aggregation." Documentation, MongoDB 5.0. Accessed 2021-10-16.
  18. MongoDB Docs. 2021b. "Aggregation Pipeline Quick Reference." Documentation, MongoDB 5.0. Accessed 2021-10-16.
  19. MongoDB Docs. 2021c. "SQL to Aggregation Mapping Chart." Documentation, MongoDB 5.0. Accessed 2021-10-16.
  20. MongoDB Docs. 2021d. "Aggregation Pipeline Builder." Documentation, MongoDB Compass 1.28.4. Accessed 2021-10-17.
  21. MongoDB Docs. 2021e. "Import Pipeline from Text." Documentation, MongoDB Compass 1.28.4. Accessed 2021-10-17.
  22. MongoDB Docs. 2021f. "MongoDB Compass: Release Notes." MongoDB Compass 1.28.4, August 30. Accessed 2021-10-17.
  23. MongoDB Docs. 2021g. "SQL to Aggregation Mapping Chart." Documentation, MongoDB 5.0. Accessed 2021-10-17.
  24. MongoDB Docs. 2021h. "$avg (aggregation)." Documentation, MongoDB 5.0. Accessed 2021-11-04.
  25. MongoDB Docs. 2021i. "$let (aggregation)." Documentation, MongoDB 5.0. Accessed 2021-11-04.
  26. MongoDB Docs. 2021j. "Release Notes for MongoDB 5.0." July 13. Accessed 2021-10-16.
  27. Page, John. 2018. "MongoDB Aggregation Performance." Slides, MongoDB Europe'18, via SlideShare, November 13. Accessed 2021-10-16.
  28. Weinberger, Claudius. 2018. "NoSQL Performance Benchmark 2018 – MongoDB, PostgreSQL, OrientDB, Neo4j and ArangoDB." ArangoDB, February 14. Accessed 2021-10-16.

Further Reading

  1. Paul, Rohan. 2019. "Aggregation in Mongodb." Medium, April 11. Accessed 2021-10-16.
  2. Develop Paper. 2019. "Mongodb series — deep understanding of mongodb aggregation." Develop Paper, November 30. Accessed 2021-10-16.
  3. MongoDB Docs. 2021a. "Aggregation." Documentation, MongoDB 5.0. Accessed 2021-10-16.
  4. MongoDB Docs. 2021b. "Aggregation Pipeline Quick Reference." Documentation, MongoDB 5.0. Accessed 2021-10-16.
  5. Done, Paul. 2021. "Practical MongoDB Aggregations." v3.00, MongoDB, Inc, October. Accessed 2021-10-10.
  6. Maréchal, Le. 2015. "MongoDB vs. Elasticsearch: The Quest of the Holy Performances." Blog, Quickslab, March 23. Accessed 2021-10-16.

Article Stats

Author-wise Stats for Article Edits

Author
No. of Edits
No. of Chats
DevCoins
2
0
1302
1991
Words
0
Likes
13
Hits

Cite As

Devopedia. 2021. "MongoDB Aggregation." Version 2, November 4. Accessed 2021-11-04. https://devopedia.org/mongodb-aggregation
Contributed by
1 author


Last updated on
2021-11-04 15:50:44
  • MongoDB View
  • MongoDB Query Language
  • MongoDB Query Optimization
  • Data Modelling with MongoDB
  • Data Pipeline
  • Data-Oriented Programming