In MongoDB, a one-to-many relationship can be modeled in two main ways:
- Embedding: Embed many related documents in an array within the parent document.
- Referencing: Store related documents in a separate collection and reference them via an identifier.
I'll provide an example of each approach in the MongoDB shell.
Scenario: Modeling Authors and Books
We will model an author who writes multiple books, which is a classic one-to-many relationship.
1. Embedding Approach (One-to-Many)
- In this approach, the books are embedded directly inside the author document as an array.
- Step 1: Inserting Data (Embedding)
// Switch to (or create) the database
use libraryDB
db.authors.insertOne({
  _id: 1,
  name: "George Orwell",
  age: 46,
  // Each book is an embedded document inside the author
  books: [
    {
      title: "1984",
      genre: "Dystopian",
      published_year: 1949
    },
    {
      title: "Animal Farm",
      genre: "Political Satire",
      published_year: 1945
    }
  ]
})
- In this case, the books field is an array, and each book is stored as an embedded document within the author document.
- Step 2: Querying Data (Embedding)
- To retrieve an author along with their books, you can simply query the authors collection:
db.authors.findOne({ _id: 1 })
// Output:
{
"_id": 1,
"name": "George Orwell",
"age": 46,
"books": [
{
"title": "1984",
"genre": "Dystopian",
"published_year": 1949
},
{
"title": "Animal Farm",
"genre": "Political Satire",
"published_year": 1945
}
]
}
Explanation of Embedding:
Advantages:
- Simpler data retrieval: Since the books are stored directly within the author document, you don’t need to perform additional queries.
- Single-operation updates: A write to a single document is atomic, so the author and their books can be updated together in one operation.
Disadvantages:
- Document size limit: MongoDB has a 16 MB document size limit. If the books array keeps growing, or each book carries large fields, the author document can approach that limit.
- Data duplication: If other entities (e.g., publishers) also need the same books, each one has to store its own copy, and the copies can drift out of sync.
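To get a feel for how far an embedded array is from the size limit, here is a rough sketch in plain JavaScript (Node.js, no database needed). It assumes the byte length of the JSON text is a usable stand-in for the BSON size, which is only an approximation, and the helper names `approxSizeBytes` and `estimateMaxBooks` are made up for this post. In the shell, `bsonsize()` (or `Object.bsonsize()` in the legacy shell) reports the real size of a document.

```javascript
// Rough size estimate for an author document with embedded books.
// Assumption: JSON byte length is used as a proxy for BSON size; the real
// BSON encoding differs, so treat the numbers as ballpark figures only.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

// Hypothetical helper: approximate size of a plain object in bytes.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: how many books of a typical size fit before the limit.
function estimateMaxBooks(authorWithoutBooks, sampleBook) {
  const baseBytes = approxSizeBytes(authorWithoutBooks);
  const perBookBytes = approxSizeBytes(sampleBook) + 1; // +1 for the separating comma
  return Math.floor((MAX_DOC_BYTES - baseBytes) / perBookBytes);
}

const author = { _id: 1, name: "George Orwell", age: 46, books: [] };
const sampleBook = { title: "1984", genre: "Dystopian", published_year: 1949 };

const perBook = approxSizeBytes(sampleBook);
const maxBooks = estimateMaxBooks(author, sampleBook);
console.log(`~${perBook} bytes per book, room for roughly ${maxBooks} books`);
```

With tiny book documents like these, the limit is a long way off. It becomes a practical concern when each embedded item carries large fields (descriptions, reviews) or when the array can grow without bound.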
2. Referencing Approach (One-to-Many)
- In this approach, the books are stored in a separate collection, and the author document keeps their _id values in a book_ids array. This avoids embedding large amounts of data inside a single document.
- Step 1: Inserting Data (Referencing)
- Insert data into the books collection:
db.books.insertMany([
{
_id: 101,
title: "1984",
genre: "Dystopian",
published_year: 1949
},
{
_id: 102,
title: "Animal Farm",
genre: "Political Satire",
published_year: 1945
}
])
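- Insert data into the authors collection with references to book_ids (if you ran the embedding example first, remove that author with db.authors.deleteOne({ _id: 1 }) to avoid a duplicate _id error):
db.authors.insertOne({
  _id: 1,
  name: "George Orwell",
  age: 46,
  book_ids: [101, 102] // Array of references to the books
})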
- Step 2: Querying Data (Referencing)
- To get an author and their books, you need to:
- 1. Query the authors collection to get the book_ids.
- 2. Use the book_ids to query the books collection.
- Step 2.1: Find the author:
var author = db.authors.findOne({ _id: 1 })
- Step 2.2: Find the books using the book_ids:
db.books.find({ _id: { $in: author.book_ids } })
// Output (from the books collection):
[
{
"_id": 101,
"title": "1984",
"genre": "Dystopian",
"published_year": 1949
},
{
"_id": 102,
"title": "Animal Farm",
"genre": "Political Satire",
"published_year": 1945
}
]
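The two-step pattern above is effectively a join done by the application. The sketch below mimics it in plain JavaScript (Node.js) with in-memory arrays standing in for the collections, so it runs without a server. The helper names `findAuthorById`, `findBooksByIds`, and `getAuthorWithBooks` are hypothetical, and the third book is an extra row added only to show that unrelated books are filtered out.

```javascript
// In-memory stand-ins for the authors and books collections.
const authors = [
  { _id: 1, name: "George Orwell", age: 46, book_ids: [101, 102] }
];
const books = [
  { _id: 101, title: "1984", genre: "Dystopian", published_year: 1949 },
  { _id: 102, title: "Animal Farm", genre: "Political Satire", published_year: 1945 },
  // Extra row for illustration only: not referenced by the author above.
  { _id: 103, title: "Brave New World", genre: "Dystopian", published_year: 1932 }
];

// Hypothetical helper mirroring db.authors.findOne({ _id: id }).
function findAuthorById(id) {
  return authors.find((a) => a._id === id) ?? null;
}

// Hypothetical helper mirroring db.books.find({ _id: { $in: ids } }).
function findBooksByIds(ids) {
  const wanted = new Set(ids);
  return books.filter((b) => wanted.has(b._id));
}

// Step 1: load the author. Step 2: resolve the references.
function getAuthorWithBooks(authorId) {
  const author = findAuthorById(authorId);
  if (author === null) return null;
  return { ...author, books: findBooksByIds(author.book_ids) };
}

const result = getAuthorWithBooks(1);
console.log(result.books.map((b) => b.title));
```

Note that, like `$in`, this returns books in the order they sit in the collection, not in the order of `book_ids`. If the order of the references matters, sort the result in the application.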
- Step 3: Using $lookup to Join Collections
- Alternatively, you can use the $lookup aggregation stage, which performs a left outer join, to combine the authors and books collections in a single query:
db.authors.aggregate([
  {
    $lookup: {
      from: "books",           // Collection to join with
      localField: "book_ids",  // Field in the authors collection
      foreignField: "_id",     // Field in the books collection
      as: "books"              // Output array field name
    }
  }
])
// Output:
[
{
"_id": 1,
"name": "George Orwell",
"age": 46,
"book_ids": [101, 102],
"books": [
{
"_id": 101,
"title": "1984",
"genre": "Dystopian",
"published_year": 1949
},
{
"_id": 102,
"title": "Animal Farm",
"genre": "Political Satire",
"published_year": 1945
}
]
}
]
Explanation of Referencing:
Advantages:
- Flexible and scalable: The books can grow in number without causing the author document to become too large.
- No duplication: Since the books are stored in a separate collection, other entities (like publishers or libraries) can reference them without duplicating data.
Disadvantages:
- More complex queries: You need to perform multiple queries or use $lookup to retrieve related data.
- Data consistency: MongoDB does not enforce foreign-key constraints, so an author can reference a book that does not exist (for example, after the book is deleted). Avoiding such dangling references requires checks at the application level.
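An application-level check for dangling references could look like the sketch below. It is plain JavaScript (Node.js) over in-memory data, and `findDanglingBookIds` is a hypothetical helper name. The id 999 is a deliberately broken reference added for illustration. Against a real database, the list of existing ids would come from a query on the books collection.

```javascript
// Hypothetical helper: return the ids in author.book_ids that have no
// matching document in the given list of books.
function findDanglingBookIds(author, books) {
  const existing = new Set(books.map((b) => b._id));
  return author.book_ids.filter((id) => !existing.has(id));
}

// In-memory stand-in for the books collection.
const books = [
  { _id: 101, title: "1984", genre: "Dystopian", published_year: 1949 },
  { _id: 102, title: "Animal Farm", genre: "Political Satire", published_year: 1945 }
];

// 999 is an intentionally invalid reference, for illustration only.
const author = {
  _id: 1,
  name: "George Orwell",
  age: 46,
  book_ids: [101, 102, 999]
};

const dangling = findDanglingBookIds(author, books);
console.log(dangling);
```

A check like this can run before saving an author, or periodically as a cleanup job, depending on how strict the application needs to be.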
Conclusion: When to Use Embedding vs. Referencing in One-to-Many Relationships
- Use Embedding when:
  - The related data (e.g., books) is always accessed with the parent document (e.g., author).
  - The size of the embedded data is small and won’t grow indefinitely.
  - You want simplicity in your data model with fewer collections to manage.
- Use Referencing when:
  - The related data (e.g., books) might be accessed independently of the parent document (e.g., author).
  - The size of the related data is large or could grow over time.
  - You want to share related data between different entities (e.g., a book is written by an author but also published by a publisher).
  - You need to avoid hitting the document size limit (16 MB in MongoDB).
Either approach can model a one-to-many relationship well. Choose based on how your application reads and updates the data.