Embedding and Referencing Approaches in MongoDB

In MongoDB, when dealing with relationships between data, you can choose between two main approaches: embedding and referencing. Each of these approaches has specific use cases and trade-offs. Let's explore both in detail.

Embedding in MongoDB

Embedding means that you nest related documents inside a parent document. The related data is stored within the same document as an array or sub-document.

Characteristics of Embedding:

Data is stored together: All related information is part of the same document.
Fast data retrieval: Since all the related data is contained in a single document, no additional queries are required to fetch it.
Simplicity: You avoid the complexity of managing multiple collections for related data.
Document size: MongoDB has a document size limit of 16 MB. If the embedded data grows too large, it can hit this limit.
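
To see how close a document is to that limit, one option is the $bsonSize aggregation operator (available in MongoDB 4.4 and later). The sketch below only builds the pipeline and a small helper; the names docSizePipeline and fractionOfLimit are illustrative and not part of MongoDB.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Build a pipeline that reports the BSON size of one document.
// $bsonSize requires MongoDB 4.4+; "$$ROOT" refers to the whole document.
// docSizePipeline is a hypothetical helper name.
function docSizePipeline(id) {
  return [
    { $match: { _id: id } },
    { $project: { bytes: { $bsonSize: "$$ROOT" } } }
  ];
}

// Fraction of the 16 MiB limit that a given size in bytes represents.
function fractionOfLimit(bytes) {
  return bytes / MAX_BSON_BYTES;
}
```

In mongosh this could be run as db.users.aggregate(docSizePipeline(1)), passing the returned bytes value to fractionOfLimit.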

Use Cases for Embedding:

When the related data is tightly coupled and frequently used together.
When the embedded data won’t grow indefinitely or change independently of the parent document.
For small, static datasets where duplicating information isn’t a concern.

Example of Embedding:

Let's assume we are modeling users and their profiles in MongoDB. Each user has only one profile, so we can store the profile data inside the user document.

    {
        "_id": 1,
        "name": "John Doe",
        "email": "john@example.com",
        "profile": {
            "age": 30,
            "gender": "Male",
            "address": "123 Main Street"
        }
    }

In this case:

User data and profile data are stored together.
The user document contains a profile field, which is itself an object containing age, gender, and address.
Since the profile data is specific to this user and always accessed along with the user, embedding is a good choice.
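
Fields inside the embedded profile can be addressed with dot notation in both the filter and the projection. A minimal sketch, written for mongosh, where db is the shell's database handle and findUsersByMinAge is a hypothetical helper name:

```javascript
// Find users by a field inside the embedded profile and return only
// the name plus that embedded field. Written for mongosh, where
// find() takes (filter, projection).
function findUsersByMinAge(db, minAge) {
  return db.users.find(
    { "profile.age": { $gte: minAge } },   // filter on an embedded field
    { name: 1, "profile.age": 1 }          // project only what is needed
  );
}
```

Calling findUsersByMinAge(db, 18) in mongosh would return a cursor over the matching users.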

Advantages of Embedding:

Single Query: You only need one query to get the parent document and its related data.

    db.users.findOne({ _id: 1 })

No Joins: Joins or lookups are unnecessary, which improves read performance.
Atomicity: A write to a single document is atomic in MongoDB, so an update that touches the parent and its embedded fields either applies completely or not at all.
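
As an illustration of the atomicity point, the sketch below changes a top-level field and an embedded field in one write. It is written for mongosh; updateEmailAndAddress is a hypothetical helper name.

```javascript
// Update a top-level field and an embedded field in one write.
// Both changes are applied atomically because they target a single
// document.
function updateEmailAndAddress(db, userId, email, address) {
  return db.users.updateOne(
    { _id: userId },
    { $set: { email: email, "profile.address": address } }
  );
}
```

With referencing, the same change would need one write to users and another to profiles, and the two writes would not be atomic together unless wrapped in a transaction.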

Disadvantages of Embedding:

Document Growth: If the embedded data grows too large, it could hit MongoDB’s document size limit (16 MB).
Data Duplication: If multiple documents contain similar embedded data, duplication can occur.
Difficulty in Independent Updates: Embedded data can only be reached through its parent document. If the same data is embedded in several parents, every copy has to be updated separately, which is inefficient and can leave copies out of sync.
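
One common way to limit document growth is to cap an embedded array at write time. The sketch below assumes a hypothetical recent_logins array on the user document (it is not part of the example above) and a hypothetical helper name pushCapped; it is written for mongosh.

```javascript
// Append one entry to an embedded array while keeping only the newest
// `maxEntries` items. $slice with a negative number keeps the last N
// elements, and it must be used together with $each.
function pushCapped(db, userId, entry, maxEntries) {
  return db.users.updateOne(
    { _id: userId },
    { $push: { recent_logins: { $each: [entry], $slice: -maxEntries } } }
  );
}
```

If the array must keep its full history instead, that is usually a sign the data belongs in its own collection.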

Referencing in MongoDB

Referencing means that related data is stored in separate documents, and these documents reference each other through a shared key, typically using the _id field of the referenced document.

Characteristics of Referencing:

Data is stored separately: The related data lives in different collections (or different documents within the same collection).
More flexible: The related documents can grow independently. The 16 MB limit still applies, but to each document separately rather than to the combined data.
Requires multiple queries or joins: You often need multiple queries to retrieve the related data, or MongoDB’s $lookup aggregation stage to join it.
Normalization: Referencing allows you to normalize your data, reducing duplication and improving data consistency.

Use Cases for Referencing:

When the related data is loosely coupled or changes frequently, or when the data grows over time.
When related data is large and may not always be required.
When there’s potential for shared relationships (e.g., many-to-many relationships).

Example of Referencing:

Let’s consider the same example of users and profiles, but this time we’ll store the profile in a separate collection and reference it from the user document.

    // Users Collection:
    {
        "_id": 1,
        "name": "John Doe",
        "email": "john@example.com",
        "profile_id": 101 // Reference to profile document
    }

    // Profiles Collection:
    {
        "_id": 101,
        "age": 30,
        "gender": "Male",
        "address": "123 Main Street"
    }

In this case:

The Users document contains a reference (profile_id) to the corresponding Profiles document.
The Profiles document holds the profile data in a separate collection.
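
MongoDB does not enforce references between collections, so nothing stops profile_id from pointing at a profile that no longer exists. The following sketch uses plain in-memory arrays as stand-ins for the two collections, plus a hypothetical helper findDanglingUsers and a made-up second user, to show the kind of check an application might run.

```javascript
// In-memory stand-ins for the users and profiles collections.
const users = [
  { _id: 1, name: "John Doe", email: "john@example.com", profile_id: 101 },
  // Hypothetical second user whose reference points at a missing profile.
  { _id: 2, name: "Jane Roe", email: "jane@example.com", profile_id: 999 }
];
const profiles = [
  { _id: 101, age: 30, gender: "Male", address: "123 Main Street" }
];

// Return the users whose profile_id matches no profile _id.
function findDanglingUsers(users, profiles) {
  const profileIds = new Set(profiles.map((p) => p._id));
  return users.filter((u) => !profileIds.has(u.profile_id));
}

console.log(findDanglingUsers(users, profiles).map((u) => u._id)); // [ 2 ]
```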

Advantages of Referencing:

Avoids Data Duplication: The same referenced data can be shared by multiple documents.
Scalability: Related documents can grow independently. Each one is still limited to 16 MB, but the limit no longer applies to the parent and its related data combined.
Modularity: Related data can be updated independently without affecting the parent document.
Data Normalization: You store a single version of data and reference it, which makes updates simpler and avoids inconsistency.

Disadvantages of Referencing:

Additional Queries: You need multiple queries to retrieve related data. For example, you first query the users collection to get the profile_id, and then query the profiles collection to get the profile.
Complexity: Managing and querying related data in separate collections can be more complex than embedding.
Joins/Lookups: When you want to combine data from different collections in a single query, you need the $lookup aggregation stage, which is usually slower than reading an embedded document.

Performing Queries with Referencing:

In a referencing approach, if you want to get a user along with their profile, you can either:

1. Use two separate queries (one for the user and another for the profile).

Query the User document:

    var user = db.users.findOne({ _id: 1 })

Query the Profile document using profile_id:

    db.profiles.findOne({ _id: user.profile_id })
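
The two steps above can be wrapped in one helper that also handles a missing user. This is a sketch for mongosh, where findOne returns the document directly (with the Node.js driver each call would need to be awaited); getUserWithProfile is a hypothetical name.

```javascript
// Fetch a user and then the referenced profile with two findOne calls.
function getUserWithProfile(db, userId) {
  const user = db.users.findOne({ _id: userId });
  if (user === null) {
    return null; // no such user, so there is nothing to look up
  }
  const profile = db.profiles.findOne({ _id: user.profile_id });
  // profile is null if the reference points at a missing document
  return { user: user, profile: profile };
}
```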

2. Use the $lookup aggregation stage to join the users and profiles collections:

Using Aggregation for Lookup:

    db.users.aggregate([
      {
        $lookup: {
          from: "profiles",            // The collection to join
          localField: "profile_id",    // Field from users
          foreignField: "_id",         // Field from profiles
          as: "profile"                // Output array field name
        }
      }
    ])

    // Result:
    [
      {
        "_id": 1,
        "name": "John Doe",
        "email": "john@example.com",
        "profile_id": 101,
        "profile": [
          {
            "_id": 101,
            "age": 30,
            "gender": "Male",
            "address": "123 Main Street"
          }
        ]
      }
    ]
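
Because $lookup always writes its matches into an array, a one-to-one relationship like this one is often followed by $unwind so that profile becomes a single sub-document. A sketch of such a pipeline, with a $match added first so that only one user is joined; userWithProfilePipeline is an illustrative name.

```javascript
// Build a pipeline that joins one user with its profile and flattens
// the joined array into a single embedded object.
function userWithProfilePipeline(userId) {
  return [
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "profiles",
        localField: "profile_id",
        foreignField: "_id",
        as: "profile"
      }
    },
    // Replace the one-element array with the profile itself; keep users
    // that have no matching profile instead of dropping them.
    { $unwind: { path: "$profile", preserveNullAndEmptyArrays: true } }
  ];
}
```

In mongosh this could be run as db.users.aggregate(userWithProfilePipeline(1)).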

Choosing Between Embedding and Referencing

When to Use Embedding:

Tightly Coupled Data: If the related data is always accessed together (e.g., user and profile), embedding makes sense.
Small Subdocuments: If the embedded data is small and doesn’t grow too large over time.
Atomic Updates: If you need to update the document and its embedded data in a single, atomic operation.

When to Use Referencing:

Loosely Coupled Data: If the related data is often accessed separately (e.g., orders and customers).
Data Growth: If the related data could grow large or change frequently.
Multiple Relationships: If the related data might be used by multiple documents (e.g., users and posts in a social media app).
Document Size Limits: If you are concerned about the 16 MB document size limit.
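
The two checklists above can be condensed into a rough rule of thumb. The function below is only an illustrative heuristic with hypothetical parameter names, not an official MongoDB rule; real schemas usually need a closer look at query patterns.

```javascript
// Suggest a modeling approach from three yes/no questions drawn from
// the checklists: any reason to reference wins over embedding.
function suggestModel({ accessedTogether, mayGrowUnbounded, sharedByManyParents }) {
  if (mayGrowUnbounded || sharedByManyParents) {
    return "reference";
  }
  return accessedTogether ? "embed" : "reference";
}

// A user with exactly one small profile that is always read with it:
console.log(suggestModel({
  accessedTogether: true,
  mayGrowUnbounded: false,
  sharedByManyParents: false
})); // embed
```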

Summary of Embedding vs. Referencing:

Both approaches have their strengths, and the choice depends on your application's requirements. Generally, embedding works well for smaller, tightly related datasets, while referencing is better for larger, more loosely related or frequently changing data.

CodeWithGagan | Programming Language and IT Lectures
