In MongoDB, when dealing with relationships between data, you can choose between two main approaches: embedding and referencing. Each approach has specific use cases and trade-offs. Let's explore both in detail.
Embedding:
Embedding means that you nest related documents inside a parent document. The related data is stored within the same document as a sub-document or an array of sub-documents.
Key Characteristics:
- Data is stored together: All related information is part of the same document.
- Fast data retrieval: Since all the related data is contained in a single document, no additional queries are required to fetch it.
- Simplicity: You avoid the complexity of managing multiple collections for related data.
- Document size: MongoDB has a document size limit of 16 MB. If the embedded data grows too large, it can hit this limit.
When to Embed:
- When the related data is tightly coupled and frequently used together.
- When the embedded data won’t grow indefinitely or change independently of the parent document.
- For small, static datasets where duplicating information isn’t a concern.
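To get a feel for how close a document is to the 16 MB limit, the rough sketch below estimates its size in plain Node.js. It measures the UTF-8 length of the document's JSON text, which only approximates the BSON size MongoDB actually enforces. The `customer` document with an embedded `orders` array is a hypothetical example of data that keeps growing, and both helper functions are illustrative, not MongoDB APIs.

```javascript
// MongoDB rejects documents larger than 16 MiB of BSON (16 * 1024 * 1024 bytes).
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Rough estimate only: the UTF-8 byte length of the JSON text.
// BSON adds type tags and length prefixes and stores numbers in fixed widths,
// so the real size MongoDB checks will differ somewhat.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Roughly how many more items shaped like `sampleItem` could be pushed into an
// embedded array before the estimate crosses the limit.
function remainingCapacity(doc, sampleItem) {
  const free = MAX_DOC_BYTES - approxSizeBytes(doc);
  const perItem = approxSizeBytes(sampleItem) + 1; // +1 for the comma between array items
  return Math.max(0, Math.floor(free / perItem));
}

// Hypothetical customer that embeds every order it has ever placed.
const customer = { _id: 1, name: "John Doe", orders: [] };
const sampleOrder = { orderId: 1, total: 42.5, items: ["book", "pen"] };

console.log("current size (approx. bytes):", approxSizeBytes(customer));
console.log("orders that still fit (approx.):", remainingCapacity(customer, sampleOrder));
```

If an embedded array has no natural upper bound, the remaining capacity only ever shrinks, which is the signal to reference that data instead of embedding it.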
Example of Embedding:
Let's assume we are modeling users and their profiles in MongoDB. Each user has only one profile, so we can store the profile data inside the user document.
    {
        "_id": 1,
        "name": "John Doe",
        "email": "john@example.com",
        "profile": {
            "age": 30,
            "gender": "Male",
            "address": "123 Main Street"
        }
    }
- User data and profile data are stored together.
- The user document contains a profile field, which is itself an object containing age, gender, and address.
- Since the profile data is specific to this user and always accessed along with the user, embedding is a good choice.
Advantages of Embedding:
- Single Query: You only need one query to get the parent document and its related data.
    db.users.findOne({ _id: 1 })
- No Joins: Joins or lookups are unnecessary, which improves read performance.
- Atomicity: Since the entire document is stored together, updates to the document (including its embedded fields) are atomic, meaning they happen completely or not at all.
Disadvantages of Embedding:
- Document Growth: If the embedded data grows too large, it could hit MongoDB’s document size limit (16 MB).
- Data Duplication: If the same data is embedded in several parent documents, it is stored once per parent, and every copy must be kept in sync.
- Difficulty in Independent Updates: Embedded data has no identity of its own, so it is always read and written through its parent document. If the same data is embedded in many parents, each copy has to be updated separately.
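Changing a single embedded field does not require sending the whole document back: in the MongoDB shell, an update such as `db.users.updateOne({ _id: 1 }, { $set: { "profile.age": 31 } })` targets the field by its dot-notation path, and the change is atomic because it touches only one document. The sketch below mimics the effect of that `$set` on an in-memory copy of the example document. `setByPath` is a hypothetical helper written for illustration, not a MongoDB or driver API.

```javascript
// Hypothetical helper: apply a dot-notation path update to a plain object,
// approximating what { $set: { "<a.b.c>": value } } does to a stored document.
function setByPath(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (target[key] === undefined) {
      target[key] = {}; // like $set, create a missing sub-document
    } else if (typeof target[key] !== "object" || target[key] === null) {
      throw new Error(`Cannot set "${path}": "${key}" is not a sub-document`);
    }
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
  return doc;
}

// The embedded user document from the example.
const user = {
  _id: 1,
  name: "John Doe",
  email: "john@example.com",
  profile: { age: 30, gender: "Male", address: "123 Main Street" },
};

// mongosh equivalent: db.users.updateOne({ _id: 1 }, { $set: { "profile.age": 31 } })
setByPath(user, "profile.age", 31);
console.log(user.profile.age); // 31
```

Only `profile.age` changes; the user's name, email, and the other profile fields are left as they were.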
Referencing:
Referencing means that related data is stored in separate documents, and one document refers to another through a shared key, typically by storing the _id of the referenced document.
Key Characteristics:
- Data is stored separately: The related data lives in different collections (or different documents within the same collection).
- More flexible: The related documents can grow independently. Each document is still subject to the 16 MB limit, but the referenced data no longer counts toward the parent document's size.
- Requires multiple queries or joins: You often need multiple queries to retrieve the related data, or MongoDB’s $lookup aggregation stage to join it.
- Normalization: Referencing allows you to normalize your data, reducing duplication and improving data consistency.
When to Reference:
- When the related data is loosely coupled, changes frequently, or grows over time.
- When related data is large and may not always be required.
- When there’s potential for shared relationships (e.g., many-to-many relationships).
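For a many-to-many relationship, a common pattern is to store an array of referenced _ids on one side. The sketch below uses hypothetical users and groups held in plain arrays rather than real collections; the comments show the MongoDB shell query each function stands in for (a filter such as `{ group_ids: 202 }` matches when any element of the array equals 202).

```javascript
// Hypothetical many-to-many data held in plain arrays instead of collections.
// Each user stores the _ids of the groups it belongs to; each group is stored once.
const groups = [
  { _id: 201, name: "Admins" },
  { _id: 202, name: "Editors" },
];
const users = [
  { _id: 1, name: "John Doe", group_ids: [201, 202] },
  { _id: 2, name: "Jane Roe", group_ids: [202] },
];

// Stands in for: db.users.find({ group_ids: groupId })
function usersInGroup(groupId) {
  return users.filter((u) => u.group_ids.includes(groupId));
}

// Stands in for: db.groups.find({ _id: { $in: user.group_ids } })
function groupsOfUser(user) {
  return groups.filter((g) => user.group_ids.includes(g._id));
}

console.log(usersInGroup(202).map((u) => u.name));      // [ 'John Doe', 'Jane Roe' ]
console.log(groupsOfUser(users[0]).map((g) => g.name)); // [ 'Admins', 'Editors' ]
```

Each group's details live in exactly one document, so renaming a group never requires touching the users that belong to it.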
Example of Referencing:
Let’s consider the same example of users and profiles, but this time we’ll store the profile in a separate profiles collection and reference it from the user document.
    // Users Collection:
    {
        "_id": 1,
        "name": "John Doe",
        "email": "john@example.com",
        "profile_id": 101 // Reference to profile document
    }
    // Profiles Collection:
    {
        "_id": 101,
        "age": 30,
        "gender": "Male",
        "address": "123 Main Street"
    }
- The document in the users collection contains a reference (profile_id) to the corresponding document in the profiles collection.
- The profiles collection holds the profile data separately from the user.
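Moving from the embedded model to the referenced one is a mechanical split of one document into two. The sketch below does that split in memory for the user from the embedding example. `splitProfile` is a hypothetical helper, and the caller supplies the new profile `_id` (101, matching the example above); in a real database that key would typically be an ObjectId or another unique value.

```javascript
// Hypothetical migration helper: split an embedded user document into a
// users-collection document and a profiles-collection document.
function splitProfile(embeddedUser, profileId) {
  const { profile, ...userFields } = embeddedUser;
  const user = { ...userFields, profile_id: profileId }; // keep only a reference
  const profileDoc = { _id: profileId, ...profile };      // profile gets its own _id
  return { user, profile: profileDoc };
}

// The embedded user document from the embedding example.
const embedded = {
  _id: 1,
  name: "John Doe",
  email: "john@example.com",
  profile: { age: 30, gender: "Male", address: "123 Main Street" },
};

const { user, profile } = splitProfile(embedded, 101);
console.log(user);    // the document for the users collection
console.log(profile); // the document for the profiles collection
```

The input object is not modified, so the same function can be run over existing documents to produce both new documents before writing them to their collections.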
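Advantages of Referencing:
- Avoids Data Duplication: The same referenced data can be shared by multiple documents.
- Scalability: Related data can grow independently; because it lives in separate documents, it does not count toward the parent document's 16 MB limit (each document is still limited to 16 MB on its own).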
- Modularity: Related data can be updated independently without affecting the parent document.
- Data Normalization: You store a single version of data and reference it, which makes updates simpler and avoids inconsistency.
Disadvantages of Referencing:
- Additional Queries: You need multiple queries to retrieve related data. For example, you first query the users collection to get the profile_id, and then query the profiles collection to get the profile.
- Complexity: Managing and querying related data in separate collections can be more complex than embedding.
- Joins/Lookups: When you want to combine data from different collections in a single query, you need to use the $lookup aggregation, which can be slower compared to embedding.
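The duplication trade-off becomes concrete when shared data changes. In the hypothetical sketch below, two users belong to the same organization. With embedding, a rename must touch every copy (in the shell, roughly an `updateMany` with `$set`); with referencing, only the single organization document changes. Plain arrays and a `Map` stand in for collections, and the `orgs` collection and `org_id` field are illustrative names.

```javascript
// Hypothetical shared data: two users who belong to the same organization.

// Embedded model: every user carries its own copy of the organization.
const embeddedUsers = [
  { _id: 1, name: "John Doe", org: { name: "Acme" } },
  { _id: 2, name: "Jane Roe", org: { name: "Acme" } },
];

// Referenced model: one organization document, pointed to by org_id.
const orgs = new Map([[501, { _id: 501, name: "Acme" }]]);
const referencedUsers = [
  { _id: 1, name: "John Doe", org_id: 501 },
  { _id: 2, name: "Jane Roe", org_id: 501 },
];

// Embedded rename: every copy must change. Roughly
// db.users.updateMany({ "org.name": oldName }, { $set: { "org.name": newName } })
function renameEmbedded(oldName, newName) {
  let modified = 0;
  for (const u of embeddedUsers) {
    if (u.org.name === oldName) {
      u.org.name = newName;
      modified++;
    }
  }
  return modified; // number of documents touched
}

// Referenced rename: one document changes. Roughly
// db.orgs.updateOne({ _id: orgId }, { $set: { name: newName } })
function renameReferenced(orgId, newName) {
  orgs.get(orgId).name = newName;
  return 1; // number of documents touched
}

console.log(renameEmbedded("Acme", "Acme Corp")); // 2
console.log(renameReferenced(501, "Acme Corp"));  // 1
// Every referencing user sees the new name by following org_id.
console.log(referencedUsers.map((u) => orgs.get(u.org_id).name));
```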
In a referencing approach, if you want to get a user along with their profile, you can either:
1. Use two separate queries (one for the user and another for the profile).
- Query the User document:
    var user = db.users.findOne({ _id: 1 })
- Query the Profile document using profile_id:
    db.profiles.findOne({ _id: user.profile_id })
2. Use the $lookup aggregation stage to join the users and profiles collections in a single query:
  db.users.aggregate([
    { $match: { _id: 1 } },          // Only the user we want
    {
      $lookup: {
        from: "profiles",            // The collection to join
        localField: "profile_id",    // Field from users
        foreignField: "_id",         // Field from profiles
        as: "profile"                // Output array field name
      }
    }
  ])
  // Result:
  [
    {
      "_id": 1,
      "name": "John Doe",
      "email": "john@example.com",
      "profile_id": 101,
      "profile": [
        {
          "_id": 101,
          "age": 30,
          "gender": "Male",
          "address": "123 Main Street"
        }
      ]
    }
  ]
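Note that `$lookup` always writes its matches into an array (`profile` above), even for a one-to-one relationship. Appending an `{ $unwind: "$profile" }` stage turns that array into a single embedded object; by default `$unwind` drops users whose array is empty, and passing `preserveNullAndEmptyArrays: true` keeps them. The plain-JavaScript sketch below mimics these two stages on in-memory arrays so the output shape is easy to see. `lookup` and `unwind` are hypothetical helpers with simplified semantics (plain equality matching, array fields only), not driver functions, and the second user is made-up data.

```javascript
// Mimics { $lookup: { from, localField, foreignField, as } } with a plain
// equality match: each local document gets an ARRAY of matching foreign docs.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Mimics { $unwind: "$<field>" } for array fields: one output document per
// element; documents whose array is empty are dropped (the default behavior).
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));
}

const users = [
  { _id: 1, name: "John Doe", email: "john@example.com", profile_id: 101 },
  // Hypothetical second user whose profile_id matches nothing.
  { _id: 2, name: "Jane Roe", email: "jane@example.com", profile_id: 102 },
];
const profiles = [
  { _id: 101, age: 30, gender: "Male", address: "123 Main Street" },
];

const joined = lookup(users, profiles, "profile_id", "_id", "profile");
console.log(joined[0].profile.length, joined[1].profile.length); // 1 0

const flattened = unwind(joined, "profile");
console.log(flattened.length, flattened[0].profile.address); // 1 123 Main Street
```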
When to Use Embedding:
- Tightly Coupled Data: If the related data is always accessed together (e.g., user and profile), embedding makes sense.
- Small Subdocuments: If the embedded data is small and doesn’t grow too large over time.
- Atomic Updates: If you need to update the document and its embedded data in a single, atomic operation.
When to Use Referencing:
- Loosely Coupled Data: If the related data is often accessed separately (e.g., orders and customers).
- Data Growth: If the related data could grow large or change frequently.
- Multiple Relationships: If the related data might be used by multiple documents (e.g., users and posts in a social media app).
- Document Size Limits: If you are concerned about the 16 MB document size limit.
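The criteria above can be condensed into a rule of thumb. The function below is a hypothetical helper, not part of any MongoDB tool, and its precedence (shared or unbounded data always points to referencing) is one reasonable reading of the lists rather than a fixed rule; real decisions should still be checked against actual access patterns and data sizes.

```javascript
// Hypothetical rule-of-thumb helper encoding the embedding/referencing criteria.
function suggestModel({ accessedTogether, boundedSize, sharedByMany, needsAtomicUpdate }) {
  // Shared by many documents, or able to grow without bound: reference it.
  if (sharedByMany || !boundedSize) return "reference";
  // Read together with the parent, or must change atomically with it: embed it.
  if (accessedTogether || needsAtomicUpdate) return "embed";
  // Loosely coupled, separately accessed data defaults to referencing.
  return "reference";
}

// User + profile: read together, small, one per user.
console.log(suggestModel({ accessedTogether: true, boundedSize: true, sharedByMany: false, needsAtomicUpdate: true })); // embed
// Customer + orders: accessed separately, grows over time.
console.log(suggestModel({ accessedTogether: false, boundedSize: false, sharedByMany: false, needsAtomicUpdate: false })); // reference
```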
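Summary of Embedding vs. Referencing:
- Embedding keeps related data in one document: a single query reads everything and updates are atomic, but data may be duplicated and the document must stay under the 16 MB limit.
- Referencing keeps related data in separate documents linked by _id: data is stored once and can grow independently, but reading it together requires extra queries or a $lookup.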