MongoDB Query Optimization

MongoDB query performance depends on indexes defined on a collection and how those indexes are employed within the query. MongoDB offers different types of indexes. Precisely what indexes to define and how to use them depends very much on the application. This article introduces indexes and shares some guidelines on their usage.

MongoDB also provides an inspection system to obtain statistics on query execution and what happens under the hood. This information helps a developer figure out ways to optimize the query.

Discussion

  • How can I get insights into MongoDB query performance?
    Explain Plan in MongoDB Compass. Source: MongoDB Docs 2021a.
    Explain Plan in MongoDB Compass. Source: MongoDB Docs 2021a.

    On the MongoDB shell, methods cursor.explain("executionStats") and db.collection.explain("executionStats") give insights into the performance of a query. For example, we append explain() to a query thus, db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } ).explain("executionStats"). MongoDB Compass also has Explain Plan tab to display the same statistics graphically.

    Explainable classes can be obtained by running db.collection.explain().help(). The explain method can be called with verbosity modes queryPlanner (default), executionStats, or allPlansExecution. In all modes, for write operations, we get information about the write without executing the write.

    The query plan is presented in the explain results as a tree of stages. These stages can be COLLSCAN (collection scan), IXSCAN (index scan), FETCH (retrieving documents), SHARD_MERGE and SHARDING_FILTER. Explain can be run on either unsharded or sharded collections.

  • How do I interpret the results of the explain methods?
    Explain's queryPlanner output showing winning plan based on indexed field 'quantity'. Source: MongoDB Docs 2021c.
    Explain's queryPlanner output showing winning plan based on indexed field 'quantity'. Source: MongoDB Docs 2021c.

    Among the important explain results are:

    • executionStats.nReturned: Number of documents returned.
    • executionStats.executionTimeMillis: Total time in milliseconds for query plan selection and execution.
    • executionStats.totalKeysExamined: Number of index entries scanned.
    • executionStats.totalDocsExamined: Number of documents examined, by COLLSCAN and FETCH stages.
    • executionStats.executionStages: Details of execution of the winning plan as a tree of stages. Where indexes are used (IXSCAN), result includes index key pattern, direction of traversal, and index bounds.

    As an example, if totalKeysExamined is zero, it implies that indexes are not used. If totalDocsExamined is much higher than nReturned, this means that too many documents were scanned when only a few were returned. Both these suggest better performance can be obtained with the use of indexes. Making these changes would give an efficient winning plan with root stage FETCH and an input stage IXSCAN.

  • How can indexes improve query performance?

    Indexes improve performance by limiting the number of documents that need to be scanned. Indexes are usually in RAM or located sequentially on disk. Though indexes take up space, this becomes a worthwhile trade-off if queries use them often. Indeed, designing the right indexes requires a deep understanding of the application's queries and how often they're called.

    For example, if an inventory collection is queried on field type, create an index on that field: db.inventory.createIndex({ type: 1 }). This is an example of a covered query. A covered query is one that can be satisfied by indexes alone without needing to examine the documents. More specifically, all fields in the query are part of an index, all returned fields are in the same index, and no fields in the query are equal to null.

    Two limitations are worth noting. Geospatial indexes cannot cover a query. Multikey indexes cannot cover queries over array fields.

    Usage of indexes can be obtained via the $indexStats aggregation stage or in the outputs of serverStatus, collStats and dbStats.

  • What are the different types of indexes available in MongoDB?
    Illustrating the use of indexes in MongoDB. Source: Adapted from MongoDB Docs.
    Illustrating the use of indexes in MongoDB. Source: Adapted from MongoDB Docs.

    MongoDB supports the following index types:

    • Single-Field Index: Based on a single field of a document, such as, {score: 1}.
    • Compound Index: Based on multiple fields, such as, {userid: 1, score: -1}, with 1 and -1 referring to ascending and descending sort order. The starting keys (userid) are called index prefix keys.
    • Multikey Index: Indexes the content stored in arrays. When you index an array field, MongoDB creates a separate index for every array element. Index {"addr.zip": 1} is an example of indexing a nested field.
    • Geospatial Index: Supports efficient querying of geospatial coordinate data, either 2D or 3D geometry.
    • Text Index: For searching text data. MongoDB does stemming and ignores stop words.
    • Hashed Index: Indexing hash of a field value leading to more even data distribution across shards. Supports equality matches but not range-based queries.

    Indexes have useful properties: unique, partial indexing, sparse (don't index documents without the index field), TTL (documents that expire), hidden (not seen by query planner). Unique index on _id is created by default.

  • How should I use indexes to improve performance?
    Use of compound indexes improves performance in this example. Source: Adapted from Eswaran 2021.
    Use of compound indexes improves performance in this example. Source: Adapted from Eswaran 2021.

    Sometimes matching on multiple fields may be inefficient. Instead, use compound indexes.

    When a query doesn't use index prefix keys or uses a different sort order, compound indexes won't work. Instead use index intersection. Index intersection is when MongoDB uses two separate indexes or their prefixes to improve query performance. However, index intersection doesn't apply when the sorting uses an index completely separate from the query predicate.

    In addition to using indexes for matching, use indexes for sorting and lookups as well.

    While the query optimizer often selects the optimal index, we can also use the hint() method to force the use of a specific index.

    Workload performance can suffer during index builds. For this reason, rolling index builds are preferred. Starting from secondary members, rolling builds affect one replica set member at a time. Deleting many documents in a short time may also affect reading performance due to index rebuilds. Use rolling builds or mark document for deletion and delete them later with a background scheduler.

  • How can I improve the performance of my aggregation pipeline?
    MongoDB's database engine reorders stages for better overall performance. Source: Done 2021, sec. 1.1.
    MongoDB's database engine reorders stages for better overall performance. Source: Done 2021, sec. 1.1.

    MongoDB's database engine can optimize performance of an aggregation pipeline at runtime. It can reorder the stages if it sees that this will improve performance. For example, sorting an entire collection and then matching is inefficient. It's more efficient to match first and then sort so that fewer documents need to be sorted. Matching itself is run in parallel on multiple shards.

    Sometimes, the engine needs help from the developer to optimize performance. Assume a pipeline contains a stage that matches on a computed field that's not indexed. The engine can't optimize by moving the matching stage before the computation. Computation is done on the entire collection. An easy solution is to rewrite the matching stage to use an already existing and indexed field. Even if computation happens earlier, the engine can move the matching to precede the computation.

  • What are query plans in MongoDB?

    For any given query, actual query execution can have a few possible ways. Each of these is called a query plan. Based on indexes, the MongoDB query optimizer chooses the most efficient query plan. For each candidate plan, the optimizer looks at the number of work units that need to be performed. More indexes imply more plans to inspect and query selection can take more time.

    If a plan is selected, it's also added to the plan cache, which contains the query shape, works cost and a state. A query shape consists of a combination of query, sort, and projection specifications. Cache state can be Missing, Inactive or Active. Query plans for subsequent queries of the same shape are read from the cache if active and cost meets the selection criterion. The optimizer considers only those indexes specified in the filter of a given query shape.

    If mongod restarts or shuts down, cache doesn't persist. Index or collection drops clear the cache. Cache can also be cleared manually. If cache is lower than 0.5GB for all collections, complete debug information is stored. Otherwise, additional cache entries are stored with minimal information.

  • Could you share some tips towards optimizing MongoDB queries?
    Rockset enables fast joins and aggregations at scale. Source: Liu 2020.
    Rockset enables fast joins and aggregations at scale. Source: Liu 2020.

    Use a highly selective query, that is, a query that matches a small percentage of the collection's documents. Less selective queries can't use indexes effectively. For example, $nin or $ne operators are not very selective. Given an employees collection, the query {name: "John", sex: "male"} with indexed name is likely to be more selective than the query {sex: "male", name: "John"} with indexed sex.

    Use projection to obtain only those fields required for further processing.

    To avoid application-level joins, denormalize data. For example, instead of having two collections Employees and Companies, and then linking them, the latter can be included as part of employee documents. If this is not possible, use $lookup instead.

    Third-party services such as Rockset does real-time indexing to enable fast joins. Idealo's MongoDB profiler tool can also be useful.

    By default db.collection.update() updates a single document. Use multi: true to update all matching documents efficiently.

  • Apart from query optimization, how do I tune the performance of MongoDB?

    Analyze lock metrics. Long lock times might suggest problems with schema design, query structure, or system architecture. Look at WiredTiger's cache usage and see if it needs to be increased. For read-heavy applications, increase the size of replica sets. For write-heavy applications, do sharding. Monitor replication state and lag.

    In the cloud, MongoDB provides a monitoring tool that gives many useful metrics: execution times, disk utilization, memory usage, network I/O, op-counters, query targeting, queues, and CPU usage. Database operations can also be logged for later analysis.

    When doing high-speed writes, enable journaling. In case of a system crash during a write, journaling ensures that data is in a consistent state.

    Try to keep documents to a few kilobytes. Avoid large array fields. A large and growing array may cause the document to be relocated on disk, leading to index updates. Alternatively, large fields when not used in queries, could be compressed and stored.

    Batching improves resource utilization. For example, Kafka events can be consumed in a batch of 100 rather than processed one by one.

Milestones

Feb
2009

MongoDB 1.0 is released. By August, this version becomes generally available for production environments.

Mar
2015

MongoDB 3.0 is released. This includes a new query introspection system towards query planning and execution. The formatting and fields of explain results are improved. Starting in this version, with the exception of _id index, an index can't cover a query on a sharded collection if the index doesn't contain the shard key.

Nov
2017

MongoDB 3.6 is released. An index can cover a query on fields within embedded documents. Multikey indexes can cover queries over non-array keys if the index tracks which fields make it multikey.

Aug
2019

MongoDB 4.2 is released. Plan cache entry is associated with a state: Missing, Inactive or Active. Aggregation stage $planCacheStats is available to view cache information. To help identify slow queries, query plans also now have associated hash strings queryHash and planCacheKey. The former is dependent on only the query shape while the latter depends on query shape and indexes. An aggregation pipeline with $out stage can't contain db.collection.explain() in executionStats or allPlansExecution modes.

Jul
2021

MongoDB 5.0 is released. This puts a size limit of 0.5GB for the plan cache, which is also backported to updates of some earlier versions.

References

  1. Done, Paul. 2021. "Practical MongoDB Aggregations." v3.00, MongoDB, Inc, October. Accessed 2021-10-10.
  2. Dunham, Casey. 2018. "MongoDB Performance Tuning: Everything You Need to Know." September 13. Accessed 2021-11-08.
  3. Eswaran, Sankarganesh. 2021. "Advanced MongoDB Performance Tuning." Idealo Tech Blog, on Medium, June 14. Accessed 2021-11-08.
  4. Factor, Phil. 2018. "14 Things I Wish I’d Known When Starting with MongoDB." InfoQ, September 13. Accessed 2021-11-08.
  5. Kumar, Surendra, Balgopal Sharma, and Sankalp Kataria. 2019. "MongoDB Query Performance – Optimization Techniques." Habilelabs, November 21. Updated 2019-12-16. Accessed 2021-11-08.
  6. Liu, Justin. 2020. "Handling Slow Queries in MongoDB - Part 2: Solutions." Blog, Rockset, August 25. Accessed 2021-11-08.
  7. López, Mauricio Francisco. 2017. "Query Read Optimization in MongoDB." Major League, on Medium, February 14. Accessed 2021-11-08.
  8. MongoDB. 2009. "1.0 GA Released." Blog, MongoDB, August 27. Accessed 2021-11-09.
  9. MongoDB Docs. 2015. "Release Notes for MongoDB 3.0." March 3. Accessed 2021-11-09.
  10. MongoDB Docs. 2017. "Release Notes for MongoDB 3.6." November. Accessed 2021-11-09.
  11. MongoDB Docs. 2019. "Release Notes for MongoDB 4.2." August. Accessed 2021-11-09.
  12. MongoDB Docs. 2021a. "View Query Performance." MongoDB Compass. Accessed 2021-11-08.
  13. MongoDB Docs. 2021b. "Query Plans." Documentation, MongoDB 5.0. Accessed 2021-11-08.
  14. MongoDB Docs. 2021c. "Analyze Query Performance." Documentation, MongoDB 5.0. Accessed 2021-11-08.
  15. MongoDB Docs. 2021d. "Query Optimization." Documentation, MongoDB 5.0. Accessed 2021-11-08.
  16. MongoDB Docs. 2021e. "Rolling Index Builds on Replica Sets." Documentation, MongoDB 5.0. Accessed 2021-11-08.
  17. MongoDB Docs. 2021f. "Release Notes for MongoDB 5.0." July 13. Accessed 2021-11-09.
  18. MongoDB Docs. 2021g. "db.collection.explain()." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  19. MongoDB Docs. 2021h. "Explain Results." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  20. MongoDB Docs. 2021i. "Indexing Strategies." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  21. MongoDB Docs. 2021j. "Indexes." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  22. MongoDB Docs. 2021k. "Measure Index Use." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  23. MongoDB Docs. 2021l. "Compound Indexes." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  24. MongoDB Docs. 2021m. "Index Intersection." Documentation, MongoDB 5.0. Accessed 2021-11-09.
  25. Oles, Bart. 2018. "A Performance Cheat Sheet for MongoDB." Blog, Severalnines, July 3. Accessed 2021-11-08.
  26. Pour, Hamoon Mohammadian. 2018. "Query Optimization in MongoDB." SlideShare, February 19. Accessed 2021-11-08.

Further Reading

  1. López, Mauricio Francisco. 2017. "Query Read Optimization in MongoDB." Major League, on Medium, February 14. Accessed 2021-11-08.
  2. Pour, Hamoon Mohammadian. 2018. "Query Optimization in MongoDB." SlideShare, February 19. Accessed 2021-11-08.
  3. Loon, Tek. 2019. "MongoDB Performance 101: How To Improve the Speed of MongoDB App." Better Programming, on Medium, October 3. Accessed 2021-11-08.
  4. Kumar, Surendra, Balgopal Sharma, and Sankalp Kataria. 2019. "MongoDB Query Performance – Optimization Techniques." Habilelabs, November 21. Updated 2019-12-16. Accessed 2021-11-08.
  5. Hamza, Ameer. 2021. "How to do Query Optimization in MongoDB." Blog, Around25, January 27. Accessed 2021-11-08.
  6. Keep, Mat and Henrik Ingo. 2020. "Performance Best Practices: Transactions and Read / Write Concerns." Blog, MongoDB, February 25. Accessed 2021-11-08.

Article Stats

Author-wise Stats for Article Edits

Author
No. of Edits
No. of Chats
DevCoins
3
0
1238
1867
Words
0
Likes
11
Hits

Cite As

Devopedia. 2021. "MongoDB Query Optimization." Version 3, November 10. Accessed 2021-11-10. https://devopedia.org/mongodb-query-optimization
Contributed by
1 author


Last updated on
2021-11-10 14:42:57