MongoDB Aggregation Functions with Examples
- Step 1: Insert a Dataset into MongoDB Collection
- First, we'll insert a sample dataset into a MongoDB collection named `orders`. Here’s how you can insert the data using the MongoDB shell or a Node.js script.
- Insert the Dataset:
db.orders.insertMany([
{
"_id": 1,
"customerId": "C001",
"items": 2,
"amount": 500,
"status": "Completed"
},
{
"_id": 2,
"customerId": "C002",
"items": 1,
"amount": 300,
"status": "Completed"
},
{
"_id": 3,
"customerId": "C001",
"items": 4,
"amount": 1200,
"status": "Pending"
},
{
"_id": 4,
"customerId": "C003",
"items": 5,
"amount": 1500,
"status": "Completed"
},
{
"_id": 5,
"customerId": "C002",
"items": 3,
"amount": 900,
"status": "Pending"
},
{
"_id": 6,
"customerId": "C001",
"items": 1,
"amount": 400,
"status": "Completed"
}
])
- Step 2: MongoDB Aggregation Functions
- MongoDB provides several aggregation functions that help in processing data in various ways. Below, we'll go through common aggregation functions such as `$sum`, `$avg`, `$min`, `$max`, `$count`, `$first`, and `$last`, using the above dataset.
- $group (Group by Fields): The $group operator groups documents by a specified identifier field and allows for calculations like sum, average, etc.
- Example: Group orders by status and calculate the total amount for each status.
db.orders.aggregate([
{ $group: { _id: "$status", totalAmount: { $sum: "$amount" } } }
])
- Explanation: This query groups the orders by status and calculates the total sum of the amount for each group.
[
{ _id: 'Completed', totalAmount: 2700 },
{ _id: 'Pending', totalAmount: 2100 }
]
- $sum (Summation): The `$sum` function calculates the total sum of a field across documents or grouped documents.
- Example: Calculate the total amount spent by each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } }
])
- Explanation: This query groups the documents by `customerId` and calculates the total sum of the `amount` field for each customer.
- Output:
[
{ "_id": "C001", "totalAmount": 2100 },
{ "_id": "C002", "totalAmount": 1200 },
{ "_id": "C003", "totalAmount": 1500 }
]
- $avg (Average): The `$avg` function computes the average of the values across documents or grouped documents.
- Example: Find the average number of items ordered by each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", avgItems: { $avg: "$items" } } }
])
- Explanation: This query calculates the average number of items ordered by each customer by grouping the orders by `customerId`.
- Output:
[
{ "_id": "C001", "avgItems": 2.3333333333333335 },
{ "_id": "C002", "avgItems": 2 },
{ "_id": "C003", "avgItems": 5 }
]
- $min (Minimum): The `$min` function returns the minimum value from the documents in the group.
- Example: Find the minimum amount spent by each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", minAmount: { $min: "$amount" } } }
])
- Explanation: This query finds the minimum amount spent by each customer by grouping the documents by `customerId`.
- Output:
[
{ "_id": "C001", "minAmount": 400 },
{ "_id": "C002", "minAmount": 300 },
{ "_id": "C003", "minAmount": 1500 }
]
- $max (Maximum): The `$max` function returns the maximum value from the documents in the group.
- Example: Find the maximum amount spent by each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", maxAmount: { $max: "$amount" } } }
])
- Explanation: This query finds the maximum amount spent by each customer by grouping the documents by `customerId`.
- Output:
[
{ "_id": "C001", "maxAmount": 1200 },
{ "_id": "C002", "maxAmount": 900 },
{ "_id": "C003", "maxAmount": 1500 }
]
- $count (Document Count): The `$count` operator counts the number of documents in a group.
- Example: Count how many orders each customer has placed.
db.orders.aggregate([
{ $group: { _id: "$customerId", orderCount: { $count: {} } } }
])
- Explanation: This query counts the number of orders placed by each customer by grouping the documents by `customerId`.
- Output:
[
{ "_id": "C001", "orderCount": 3 },
{ "_id": "C002", "orderCount": 2 },
{ "_id": "C003", "orderCount": 1 }
]
- $first (First Document): The `$first` function returns the first document in each group based on the sort order.
- Example: Get the first order placed by each customer.
db.orders.aggregate([
// Sort by _id to ensure the first document is returned based on _id
{ $sort: { _id: 1 } },
{ $group: { _id: "$customerId", firstOrder: { $first: "$_id" } } }
])
- Explanation: This query groups the documents by `customerId` and returns the first order `_id` for each customer.
- Output:
[
{ "_id": "C001", "firstOrder": 1 },
{ "_id": "C002", "firstOrder": 2 },
{ "_id": "C003", "firstOrder": 4 }
]
- $last (Last Document): The `$last` function returns the last document in each group based on the sort order.
- Example: Get the last order placed by each customer.
db.orders.aggregate([
// Sort by _id to ensure the last document is returned based on _id
{ $sort: { _id: 1 } },
{ $group: { _id: "$customerId", lastOrder: { $last: "$_id" } } }
])
- Explanation: This query groups the documents by `customerId` and returns the last order `_id` for each customer.
- Output:
[
{ "_id": "C001", "lastOrder": 6 },
{ "_id": "C002", "lastOrder": 5 },
{ "_id": "C003", "lastOrder": 4 }
]
- $push: The $push operator appends a value to an array of values in a group. It is often used when you want to collect all values of a particular field into an array within a grouped result.
- Example: Collect all order amounts for each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", allAmounts: { $push: "$amount" } } }
])
- Explanation: This groups the documents by customerId and returns an array of all amount values for each customer.
- Output:
[
{ "_id": "C001", "allAmounts": [500, 1200, 400] },
{ "_id": "C002", "allAmounts": [300, 900] },
{ "_id": "C003", "allAmounts": [1500] }
]
- $addToSet: The $addToSet operator adds a value to an array only if it doesn’t already exist in the array. It ensures there are no duplicate values in the result array.
- Example: Collect unique order statuses for each customer.
db.orders.aggregate([
{ $group: { _id: "$customerId", uniqueStatuses: { $addToSet: "$status" } } }
])
- Explanation: This groups the documents by customerId and collects unique status values for each customer.
- Output:
[
{ "_id": "C001", "uniqueStatuses": ["Completed", "Pending"] },
{ "_id": "C002", "uniqueStatuses": ["Completed", "Pending"] },
{ "_id": "C003", "uniqueStatuses": ["Completed"] }
]
- $unwind: The $unwind operator deconstructs an array field from the input documents and outputs one document for each element of the array.
- Example: Deconstruct orders with multiple items.
- Assume the orders collection has an items array field:
{
"_id": 1,
"customerId": "C001",
"items": [
{
"product": "A",
"quantity": 2
},
{
"product": "B",
"quantity": 1
}
],
"amount": 500
}
- Now, deconstruct this array using $unwind:
db.orders.aggregate([
{ $unwind: "$items" }
])
- Explanation: This query deconstructs the items array, producing a document for each item.
- Output:
[
{
"_id": 1,
"customerId": "C001",
"items": {
"product": "A",
"quantity": 2
},
"amount": 500
},
{
"_id": 1,
"customerId": "C001",
"items": {
"product": "B",
"quantity": 1
},
"amount": 500
}
]
- $sort: The $sort operator sorts the documents based on the specified fields in ascending (1) or descending (-1) order.
- Example: Sort orders by amount in descending order.
db.orders.aggregate([
{ $sort: { amount: -1 } }
])
- Explanation: This query sorts the orders by the amount field in descending order.
- Output:
[
{
"_id": 4,
"customerId": "C003",
"items": 5,
"amount": 1500,
"status": "Completed"
},
{
"_id": 3,
"customerId": "C001",
"items": 4,
"amount": 1200,
"status": "Pending"
},
{
"_id": 5,
"customerId": "C002",
"items": 3,
"amount": 900,
"status": "Pending"
},
{
"_id": 1,
"customerId": "C001",
"items": 2,
"amount": 500,
"status": "Completed"
},
{
"_id": 6,
"customerId": "C001",
"items": 1,
"amount": 400,
"status": "Completed"
},
{
"_id": 2,
"customerId": "C002",
"items": 1,
"amount": 300,
"status": "Completed"
}
]
- $project: The $project operator is used to include, exclude, or reshape fields in the output documents.
- Example: Project only the customerId and amount fields.
db.orders.aggregate([
{ $project: { customerId: 1, amount: 1, _id: 0 } }
])
- Explanation: This query projects only the customerId and amount fields, excluding the _id field from the output.
- Output:
[
{ "customerId": "C001", "amount": 500 },
{ "customerId": "C002", "amount": 300 },
{ "customerId": "C001", "amount": 1200 },
{ "customerId": "C003", "amount": 1500 },
{ "customerId": "C002", "amount": 900 },
{ "customerId": "C001", "amount": 400 }
]
- $switch (Multiple Conditional Expressions): The $switch operator allows for multiple conditional branches (similar to a switch-case statement).
- Example: Classify orders based on the amount.
db.orders.aggregate([
{
$project: {
amountClassification: {
$switch: {
branches: [{
case: {
$lt: ["$amount", 500]
}, then: "Small"
}, {
case: {
$and: [{
$gte: ["$amount", 500]
}, {
$lt: ["$amount", 1000]
}]
}, then: "Medium"
}, {
case: {
$gte: ["$amount", 1000]
}, then: "Large"
}],
default: "Unknown"
}
}
}
}
])
- Explanation: This query classifies each order based on its amount into "Small", "Medium", or "Large".
- Output:
[
{ _id: 1, amountClassification: 'Medium' },
{ _id: 2, amountClassification: 'Small' },
{ _id: 3, amountClassification: 'Large' },
{ _id: 4, amountClassification: 'Large' },
{ _id: 5, amountClassification: 'Medium' },
{ _id: 6, amountClassification: 'Small' }
]
- '$toUpper' and '$toLower' (String Case Conversion)
- $toUpper converts a string to uppercase.
- $toLower converts a string to lowercase.
- Example: Convert the status field to uppercase.
db.orders.aggregate([
{ $project: { upperCaseStatus: { $toUpper: "$status" } } }
])
- Explanation: This query converts the status field to uppercase.
[
{ _id: 1, upperCaseStatus: 'COMPLETED' },
{ _id: 2, upperCaseStatus: 'COMPLETED' },
{ _id: 3, upperCaseStatus: 'PENDING' },
{ _id: 4, upperCaseStatus: 'COMPLETED' },
{ _id: 5, upperCaseStatus: 'PENDING' },
{ _id: 6, upperCaseStatus: 'COMPLETED' }
]
No comments:
Post a Comment