Data Modelling with MongoDB
Though MongoDB is schema-less, there's an implied structure and hence a data model.
When designing a data model, developers should ask what data needs to be stored, what data is likely to be accessed together, how often each piece of data will be accessed, and how fast the data will grow and whether that growth is unbounded. Answers to these questions will lead to a data model that's right for each application.
In designing the data model, there's no formal process, algorithm or set of rules. Based on years of experience, MongoDB practitioners have come up with a set of design patterns and guidance on how they fit common use cases. These are only guidelines. Ultimately, the developer has to analyze the application and see what fits best.
Discussion
- Given that MongoDB is schema-less, why do we need data modelling?
Being schema-less lets us easily change the structure in which data is stored. Documents in a collection need not conform to a single rigid structure. The trade-off is that without a schema, and without validating against one, software bugs can creep in. When data is stored in many different ways, application complexity increases. A schema is self-documenting and leads to cleaner code. With sophisticated validations, application code can become simpler.
While MongoDB is schema-less, in reality, data is seldom completely unstructured. Most data has some implied structure, though MongoDB may not validate that structure. Even when data evolves over time, there's some base structure that stays constant. Starting from MongoDB 3.2, document validation is possible.
Here are some specific areas where schema helps:
- When matching on an entire nested document, the order of fields matters: the match must be exact, including field order.
- Data with a complicated structure is easier to understand and process given the schema.
- When using an Object Document Mapper (ODM), the ODM can benefit from the schema.
- Frequent changes to the document structure without a schema can cause performance issues.
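As a rough illustration of document validation, the sketch below writes a validator as a plain Python dict and checks documents against it in application code. It assumes MongoDB 3.6 or later, where the $jsonSchema operator is available; the users collection, its fields and the violations() helper are hypothetical, and the helper only mimics the two keywords used here rather than the server's full behaviour.

```python
# Hypothetical validator for a "users" collection, written as a plain dict.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Mapping from the BSON type names used above to Python types (assumption:
# only these three types appear in the schema).
PYTHON_TYPES = {"string": str, "int": int, "object": dict}


def violations(doc, validator):
    """Tiny stand-in for the server-side check: 'required' and 'bsonType' only."""
    schema = validator["$jsonSchema"]
    problems = [f"missing field: {f}" for f in schema["required"] if f not in doc]
    for field, rule in schema["properties"].items():
        expected = PYTHON_TYPES[rule["bsonType"]]
        if field in doc and not isinstance(doc[field], expected):
            problems.append(f"wrong type for {field}: expected {rule['bsonType']}")
    return problems


print(violations({"name": "Ada", "email": "ada@example.org"}, user_validator))
print(violations({"name": "Ada", "age": "41"}, user_validator))
```

With a driver, the same dict would typically be supplied as the validator option when creating the collection, so that the server rejects non-conforming inserts and updates.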
- What's the difference between embedded data models and normalized data models?
While MongoDB is document-oriented, it still allows for relations among collections. This is supported by the $lookup and $graphLookup aggregation pipeline stages. When designing a data model, we therefore have to decide whether to embed information within a document or keep it in a separate collection.
An embedded data model is applicable when there's a "contains" relationship. Reads are performant when nested documents need to be accessed along with the main document. The document can also be updated in a single atomic operation. To model a one-to-many relationship, an array of documents can be nested. However, MongoDB limits document size to 16 MB and nesting to at most 100 levels.
A normalized data model uses object references to model relationships between documents. Such a model avoids data duplication: a document can be referenced by many other documents without duplicating its content. Complex many-to-many relationships are easily modelled. Likewise, large hierarchical datasets can be modelled. Data can also be referenced across collections.
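The two models can be contrasted with plain Python dicts standing in for documents. This is only a sketch: the field names, the company data and the with_company() helper are made up for illustration, and the helper resolves the reference in application code rather than on the server.

```python
# Embedded model: the address lives inside the user document ("contains").
user_embedded = {
    "_id": 1,
    "name": "Ada",
    "address": {"city": "London", "country": "UK"},
}

# Normalized model: users store only a reference to a shared company document.
companies = {
    100: {"_id": 100, "name": "Analytical Engines Ltd", "city": "London"},
}
users = [
    {"_id": 1, "name": "Ada", "company_id": 100},
    {"_id": 2, "name": "Charles", "company_id": 100},
]


def with_company(user, companies_by_id):
    """Resolve the reference in application code and return a merged copy."""
    resolved = dict(user)
    resolved["company"] = companies_by_id[user["company_id"]]
    return resolved


# One read returns everything in the embedded model.
print(user_embedded["address"]["city"])
# The normalized model needs a second lookup, but the company is stored once.
print(with_company(users[1], companies)["company"]["name"])
```

Renaming the company in the normalized model touches one document, whereas an embedded copy would have to be updated in every user that carries it.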
- How do I define 1-to-1, 1-to-n and n-to-n relationships in MongoDB?
Consider a User document. If the person works for only one company, the company field or document is embedded within the User document. This is an example of a 1-to-1 relationship.
A person may have multiple addresses, which is a 1-to-n relationship. This is modelled in MongoDB as an array of documents, that is, an array of addresses is embedded into the User document.
Another 1-to-n example is a Product document that contains many Part documents. There could be dozens of parts. It may be necessary to access each part on its own independent of the product. Hence, we might embed into a Product document an array of ObjectID references to the parts. Parts are stored in a separate collection.
Consider a to-do application that assigns tasks to users. A user can have many tasks and a task can be assigned to many users. This is an example of an n-to-n relationship. A User document would embed an array of ObjectID references to tasks. A Task document would embed an array of ObjectID references to users.
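The n-to-n case can be sketched with two in-memory dictionaries standing in for the users and tasks collections. String ids replace ObjectIDs for readability, and assign(), task_titles() and owner_names() are hypothetical helpers, not part of any MongoDB API.

```python
# Hypothetical in-memory stand-ins for the users and tasks collections.
users = {
    "u1": {"_id": "u1", "name": "Ada", "tasks": []},
    "u2": {"_id": "u2", "name": "Charles", "tasks": []},
}
tasks = {
    "t1": {"_id": "t1", "title": "Write report", "owners": []},
    "t2": {"_id": "t2", "title": "Review design", "owners": []},
}


def assign(user_id, task_id):
    """Record the relationship on both sides so either direction is readable."""
    if task_id not in users[user_id]["tasks"]:
        users[user_id]["tasks"].append(task_id)
    if user_id not in tasks[task_id]["owners"]:
        tasks[task_id]["owners"].append(user_id)


def task_titles(user_id):
    """Follow references from a user to its tasks."""
    return [tasks[t]["title"] for t in users[user_id]["tasks"]]


def owner_names(task_id):
    """Follow references from a task to its users."""
    return [users[u]["name"] for u in tasks[task_id]["owners"]]


assign("u1", "t1")
assign("u2", "t1")
assign("u1", "t2")
print(task_titles("u1"))
print(owner_names("t1"))
```

Because the two arrays live in different documents, keeping them in step takes two writes, which single-document atomicity alone does not cover.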
- Could you share essential tips towards MongoDB schema design?
Embed documents unless there's a good reason not to. Objects that need to be accessed on their own or high-cardinality arrays are compelling reasons not to embed.
Arrays model 1-to-n relationships. If n is small, embed the documents. If n is in the hundreds, embed ObjectID references in the parent, called child referencing. If n is in the thousands, store the ObjectID reference to the 1 side within each of the n documents, called parent referencing.
Don't shun application-level joins. If data is correctly indexed and results are minimally projected, they're quite efficient.
Consider the read-to-write ratio. A high ratio favours denormalization and improves read performance. A field that's updated often is not a good candidate for denormalization since it has to be updated in multiple places.
Analyze your application and its data access patterns. Structure the data to match the application.
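Parent referencing combined with an application-level join might look like the sketch below. The hosts and log entries are invented data, and find() is a minimal stand-in for a query with a projection, not a driver call.

```python
# Parent referencing: each of the many log entries points at its single host.
hosts = [
    {"_id": "h1", "name": "web-01", "ip": "10.0.0.1"},
    {"_id": "h2", "name": "db-01", "ip": "10.0.0.2"},
]
log_entries = [
    {"_id": 1, "host_id": "h1", "msg": "started"},
    {"_id": 2, "host_id": "h2", "msg": "disk full"},
    {"_id": 3, "host_id": "h1", "msg": "stopped"},
]


def find(collection, predicate, fields):
    """Filter the documents, then keep only the projected fields."""
    return [{f: doc[f] for f in fields} for doc in collection if predicate(doc)]


def messages_for(host_name):
    """Application-level join: two small queries instead of one server-side join."""
    # Step 1: fetch only the _id of the host.
    host = find(hosts, lambda h: h["name"] == host_name, ["_id"])[0]
    # Step 2: fetch the entries that reference it, projecting just the message.
    return find(log_entries, lambda e: e["host_id"] == host["_id"], ["msg"])


print(messages_for("web-01"))
```

In a real deployment the second query would rely on an index on the referencing field (host_id here) to stay efficient.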
- What schema design patterns are available in MongoDB?
We briefly describe each pattern:
- Approximation: Fewer writes and calculations by saving only approximate values.
- Attribute: On large documents, index and query only on a subset of fields.
- Bucket: For streaming data or IoT applications, bucket values to reduce the number of documents. Pre-aggregation (sum, mean) simplifies data access.
- Computed: Avoids repeated computations on reads by doing them at writes or at regular intervals.
- Document Versioning: Allows different versions of documents to coexist.
- Extended Reference: Avoid lots of joins by embedding only frequently accessed fields.
- Outlier: Data model and queries are designed for typical use cases, and not influenced by outliers.
- Pre-Allocation: Reduce memory reallocation and improve performance when document structure is known in advance.
- Polymorphic: Useful when documents are similar but don't have the same structure.
- Schema Versioning: Useful when schema evolves during the application's lifetime. Avoids downtime and technical debt.
- Subset: Useful when only some data is used by application. Smaller dataset will fit into RAM and improve performance.
- Tree: Suited for hierarchical data. Application needs to manage updates to the graph.
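To make one of these patterns concrete, here is a minimal sketch of the Bucket pattern with pre-aggregation. It assumes hourly buckets per sensor; the readings, the field names and bucket_by_hour() are illustrative choices rather than a prescribed layout.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw sensor readings, one document per measurement.
readings = [
    {"sensor_id": "s1", "ts": datetime(2021, 11, 10, 9, 5), "temp": 20.0},
    {"sensor_id": "s1", "ts": datetime(2021, 11, 10, 9, 35), "temp": 22.0},
    {"sensor_id": "s1", "ts": datetime(2021, 11, 10, 10, 1), "temp": 21.0},
]


def bucket_by_hour(raw):
    """Group readings into one document per sensor per hour, with running totals."""
    grouped = defaultdict(list)
    for r in raw:
        hour = r["ts"].replace(minute=0, second=0, microsecond=0)
        grouped[(r["sensor_id"], hour)].append(r)

    buckets = []
    for (sensor_id, hour), items in sorted(grouped.items()):
        temps = [r["temp"] for r in items]
        buckets.append({
            "sensor_id": sensor_id,
            "start": hour,
            "measurements": [{"ts": r["ts"], "temp": r["temp"]} for r in items],
            "count": len(temps),      # pre-aggregated values let a reader
            "sum_temp": sum(temps),   # compute the mean without scanning
        })
    return buckets


buckets = bucket_by_hour(readings)
print(len(buckets), buckets[0]["sum_temp"] / buckets[0]["count"])
```

Three readings collapse into two documents here; with one reading per second the reduction would be far larger.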
- Could you share an example showing the use of MongoDB schema design patterns?
The example in the figure pertains to an e-commerce application. It shows the use of five design patterns among three collections:
- Schema Versioning: Every collection includes an integer field schema to store the schema version.
- Subset: Items and reviews are stored as separate collections but since top_reviews are frequently accessed from items, these are embedded into items documents. Other reviews are rarely accessed. Another example is staff information embedded into the stores collection.
- Computed: Since sum_reviews and num_reviews are frequently accessed, these are pre-computed and stored in reviews. Other examples are the fields tot_rating and num_ratings in the items collection.
- Bucket: Rather than store each review as a separate document, reviews are bucketed into a time window (start_date and end_date).
- Extended Reference: Fields that are part of other collections, but frequently accessed, are duplicated for higher read performance and to avoid joins. Fields sold_at in items and items_in_stock in stores are two examples.
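The Computed and Subset patterns from this example can be sketched on a single items document. The fields schema, tot_rating, num_ratings and top_reviews come from the description above; everything else (the TOP_N cap, ranking by rating, the review shape and the add_review() helper) is an assumption made for illustration.

```python
TOP_N = 3  # hypothetical cap on how many reviews are embedded in an item

item = {
    "_id": "sku-1",
    "schema": 1,         # Schema Versioning
    "name": "Kettle",
    "tot_rating": 0,     # Computed
    "num_ratings": 0,    # Computed
    "top_reviews": [],   # Subset
}


def add_review(item, review):
    """Apply one review at write time so that reads need no aggregation."""
    item["tot_rating"] += review["rating"]
    item["num_ratings"] += 1
    ranked = sorted(
        item["top_reviews"] + [review],
        key=lambda r: r["rating"],
        reverse=True,
    )
    item["top_reviews"] = ranked[:TOP_N]  # keep only the embedded subset


def average_rating(item):
    """Read path: a single division on pre-computed fields."""
    if item["num_ratings"] == 0:
        return None
    return item["tot_rating"] / item["num_ratings"]


for rating in (5, 3, 4, 1):
    add_review(item, {"rating": rating, "text": f"{rating} stars"})

print(average_rating(item), [r["rating"] for r in item["top_reviews"]])
```

The full set of reviews would still live in the reviews collection; the item carries only the totals and the few reviews shown most often.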
- What are some schema design anti-patterns in MongoDB?
We could have large arrays stored in documents, some of which are unbounded. Storing lots of data together leads to bloated documents. Storing numerous collections in a database, some of which are unused, is another anti-pattern.
A collection may have many indexes, some of which could be unnecessary. Remove indexes that are rarely used. Remove indexes already covered by another compound index.
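One way to spot indexes already covered by a compound index is to look for key lists that are a prefix of another index's keys. The sketch below does this over a hand-written dict of hypothetical index definitions. It ignores index options such as unique or partial filters, which could make a prefix index worth keeping.

```python
# Each index is an ordered list of (field, direction) pairs; names are hypothetical.
indexes = {
    "by_customer": [("customer_id", 1)],
    "by_customer_date": [("customer_id", 1), ("order_date", -1)],
    "by_status": [("status", 1)],
}


def redundant_indexes(indexes):
    """Return names of indexes whose keys are a strict prefix of another index's keys."""
    redundant = []
    for name, keys in indexes.items():
        for other_name, other_keys in indexes.items():
            is_prefix = (
                name != other_name
                and len(keys) < len(other_keys)
                and other_keys[:len(keys)] == keys
            )
            if is_prefix:
                redundant.append(name)
                break
    return redundant


print(redundant_indexes(indexes))
```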
Another anti-pattern is storing information in separate collections even though it's often accessed together. The $lookup stage is similar to JOIN in relational databases. It can be slow and resource intensive. Instead, it's better to denormalize this data.
A case-insensitive query without a case-insensitive index to cover it is another anti-pattern. We could use $regex with the i option but this doesn't efficiently utilize case-insensitive indexes. Instead, create a case-insensitive index, that is, an index with a collation strength of 1 or 2.
On MongoDB Atlas, Performance Advisor or Data Explorer can spot anti-patterns and warn developers of the same.
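A case-insensitive index and a matching query could be described by command documents along the following lines, written here as Python dicts. The collection name, field name, index name and the "en" locale are assumptions, and collations_match() is a hypothetical helper that only illustrates that the query should state the same collation as the index.

```python
# Collation with strength 2 compares letters case-insensitively (assumed locale: "en").
CASE_INSENSITIVE = {"locale": "en", "strength": 2}

# Command document that creates the index; collection and field names are hypothetical.
create_index_cmd = {
    "createIndexes": "users",
    "indexes": [
        {
            "key": {"username": 1},
            "name": "username_ci",
            "collation": CASE_INSENSITIVE,
        },
    ],
}

# Query intended to use the index: plain equality plus the same collation,
# instead of a $regex with the i option.
find_cmd = {
    "find": "users",
    "filter": {"username": "alice"},
    "collation": CASE_INSENSITIVE,
}


def collations_match(index_spec, query_cmd):
    """Check that the query states the same collation as the index."""
    return index_spec.get("collation") == query_cmd.get("collation")


print(collations_match(create_index_cmd["indexes"][0], find_cmd))
```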
Milestones
2009
2015
MongoDB 3.2 is released. Schema validation is now possible during updates and inserts. Validation rules can be specified with the validator option via the db.createCollection() method and the collMod command. Validation rules follow the same syntax as query expressions. This release also adds the $lookup pipeline stage to join collections.
2017
2018
2019
Further Reading
- Genkina, Yulia. 2020. "Data Modeling with MongoDB." MongoDB, on YouTube, October 29. Accessed 2021-11-10.
- Karlsson, Joe. 2020b. "MongoDB Schema Design Best Practices." On YouTube, April 16. Accessed 2021-11-10.
- Coupal, Daniel and Ken W. Alger. 2019. "Building with Patterns: A Summary." Blog, MongoDB, April 27. Accessed 2021-11-10.
- MongoDB Docs. 2021c. "Data Models." Documentation, MongoDB 5.0. Accessed 2021-11-10.
- MongoDB Docs. 2021d. "Data Model Examples and Patterns." Documentation, MongoDB 5.0. Accessed 2021-11-10.
- MongoDB. 2020. "RDBMS to MongoDB Migration Guide." White paper, MongoDB, February. Accessed 2021-11-10.
See Also
- Database Model
- Database Normalization
- MongoDB Query Language
- MongoDB Query Optimization
- MongoDB Aggregation
- MongoDB