Aggregation function with examples part 02 ($merge, $out, $cond, $type, $literal, $match, $limit, $skip, $facet, $bucket, $redact, $lookup)

  • $merge: The $merge stage in MongoDB's aggregation framework is used to merge the results of an aggregation pipeline into an existing collection, or create a new collection if it doesn't exist. It can be thought of as a way to "upsert" (insert or update) documents based on the aggregation results. This operation can be especially useful for materializing views, performing data transformations, or archiving data.
  • First, we'll insert a sample dataset into a MongoDB collection named `orders`. Here’s how you can insert the data using the MongoDB shell or a Node.js script.
  • Insert the Dataset:

    db.orders.insertMany([
        {
            "_id": 1,
            "customerId": "C001",
            "items": 2,
            "amount": 500,
            "status": "Completed"
        },
        {
            "_id": 2,
            "customerId": "C002",
            "items": 1,
            "amount": 300,
            "status": "Completed"
        },
        {
            "_id": 3,
            "customerId": "C001",
            "items": 4,
            "amount": 1200,
            "status": "Pending"
        },
        {
            "_id": 4,
            "customerId": "C003",
            "items": 5,
            "amount": 1500,
            "status": "Completed"
        },
        {
            "_id": 5,
            "customerId": "C002",
            "items": 3,
            "amount": 900,
            "status": "Pending"
        },
        {
            "_id": 6,
            "customerId": "C001",
            "items": 1,
            "amount": 400,
            "status": "Completed"
        }
    ])

  • Using $merge function Example 01:

    db.orders.aggregate([
        {
            $group: {
                _id: "$customerId", totalAmount: {
                    $sum: "$amount"
                }
            }
        },
        {
            $merge: "customerTotals"
        }
    ])

  • Now a customerTotals collection is created. the result is stored in customerTotals collection.

    db.customerTotals.find()

  • Output

    [
        { _id: 'C001', totalAmount: 2100 },
        { _id: 'C003', totalAmount: 1500 },
        { _id: 'C002', totalAmount: 1200 }
    ]


Key Capabilities of $merge:
  • Insert new documents: If a document in the aggregation result doesn't exist in the target collection, it is inserted.
  • Update existing documents: If a document with a matching identifier exists in the target collection, it is updated based on specified rules.
  • Choose the target collection: The aggregation results can be merged into the same collection, a different collection, or even a new collection.
  • Custom merge behavior: You can define how to handle merging—whether to keep existing documents, overwrite them, or perform custom updates.
  • Syntax:

    {
        $merge: {
            into: <collectionName>, // Target collection
            on: <identifierField>, // Field(s) to match on
            whenMatched: <operation>, // Operation to perform when a match is found
            whenNotMatched: <operation> // Operation to perform when no match is found
        }
    }

  • into: Specifies the target collection where the data should be merged.
  • on: Specifies the field(s) that will be used to match documents between the source and target collections.
  • whenMatched: Defines what happens when a document from the aggregation result matches a document in the target collection:
    • "replace": Replaces the entire document.
    • "merge": Merges fields from the aggregation result into the existing document.
    • "keepExisting": Keeps the existing document and ignores the incoming one.
    • "fail": Throws an error if a match is found.
    • Or a custom pipeline to modify matching documents.
    • whenNotMatched: Defines what happens when no matching document is found:
      • "insert": Inserts the document into the target collection.
      • "discard": Ignores the document and doesn't insert it.
      • "fail": Throws an error if no match is found.
    • Example 2: Simple Insert and Update: Let’s assume we have a sales collection that records sales transactions, and we want to create a summary of total sales per product in a new collection called productSalesSummary.
    • sales Collection:

        db.salesCollection.insertMany([
            {
                "_id": 1,
                "product": "Laptop",
                "quantity": 5
            },
            {
                "_id": 2,
                "product": "Phone",
                "quantity": 3
            },
            {
                "_id": 3,
                "product": "Laptop",
                "quantity": 2
            },
            {
                "_id": 4,
                "product": "Tablet",
                "quantity": 1
            }
        ])

    • We want to aggregate the total quantity sold for each product and store this result in the productSalesSummary collection. If the product already exists in that collection, we will update the total sales. If it doesn’t exist, we will insert a new document.
    • Aggregation Query with $merge:

        db.salesCollection.aggregate([
            {
                $group: {
                    _id: "$product",
                    totalQuantity: { $sum: "$quantity" }
                }
            },
            {
                $merge: {
                    // Target collection to merge into
                    into: "productSalesSummary",

                    // Match on the product name (_id in the result is the product)
                    on: "_id",

                    // Merge the totalQuantity if a product already exists
                    whenMatched: "merge",

                    // Insert a new document if the product does not exist
                    whenNotMatched: "insert"
                }
            }
        ])


    Explanation
    • $group: Groups the sales by product and sums the quantity for each product.
    • $merge:
      • into: Targets the productSalesSummary collection.
      • on: Matches documents by the _id field, which corresponds to the product name.
      • whenMatched: "merge": If the product already exists in productSalesSummary, merge the new totalQuantity into the existing document.
      • whenNotMatched: "insert": If the product doesn't exist, insert it as a new document.
    • Resulting productSalesSummary Collection:

        db.productSalesSummary.find()

    • Output:

        [
            { _id: 'Laptop', totalQuantity: 7 },
            { _id: 'Phone', totalQuantity: 3 },
            { _id: 'Tablet', totalQuantity: 1 }
        ]

    • Here, "Laptop" appears twice in the original collection, but $merge combines these into a single document with a totalQuantity of 7.
    • Example 3: Updating with Custom Logic
    • Let’s consider a situation where you want to apply more complex logic during the merge process. In this case, we can use a pipeline for the whenMatched option.
    • Suppose you have a students collection that tracks the grades of students in various subjects. We want to update a studentScores collection to store the highest grade achieved by each student in each subject.
    • students Collection:

        db.students.insertMany([
            { "_id": 1, "name": "Alice", "subject": "Math", "score": 85 },
            { "_id": 2, "name": "Bob", "subject": "Math", "score": 90 },
            { "_id": 3, "name": "Alice", "subject": "Physics", "score": 95 },
            { "_id": 4, "name": "Alice", "subject": "Math", "score": 88 }
        ])

    • Aggregation Query with Custom Logic in $merge:

        db.students.aggregate([
            {
                $group: {
                    _id: { name: "$name", subject: "$subject" },
                    maxScore: { $max: "$score" }
                }
            },
            {
                $merge: {
                    into: "studentScores",
                    on: ["_id.name", "_id.subject"],  // Match on both name and subject
                    whenMatched: [
                        {
                            $set: { maxScore: {
                                $max: ["$$new.maxScore", "$$current.maxScore"]
                            } }
                        }
                    ],
                    whenNotMatched: "insert"  // Insert if no match
                }
            }
        ])


    Explanation:
    • $group: Groups the students by both name and subject, and computes the maximum score for each subject.
    • $merge:
      • on: Matches documents based on both the name and subject fields.
      • whenMatched: Uses a pipeline that sets maxScore to the higher of the two values: the current document’s score ($$current.maxScore) or the new score ($$new.maxScore).
      • whenNotMatched: "insert": Inserts a new document if no match is found.
    • Resulting studentScores Collection:

        [
            { "_id": { "name": "Alice", "subject": "Math" }, "maxScore": 88 },
            { "_id": { "name": "Bob", "subject": "Math" }, "maxScore": 90 },
            { "_id": { "name": "Alice", "subject": "Physics" }, "maxScore": 95 }
        ]

    • In this example, Alice's scores for Math are merged, and the highest score (88) is kept in the studentScores collection.
    Key Takeaways:
    • $merge allows you to upsert aggregation results into a target collection.
    • It offers flexibility with operations like "replace", "merge", "keepExisting", and custom logic via pipelines for when matches are found.
    • You can define different behaviors when a match is found or not (whenMatched and whenNotMatched).
    • It's especially useful for materializing views, updating summary tables, and transforming data across collections.
    • $out (Write Aggregation Results to a New Collection): Similar to $merge, the $out operator writes the results of an aggregation pipeline into a new collection.
    • Example: Write the results of total orders by customer to a new collection customerOrders.

        db.orders.aggregate([
            { $group: { _id: "$customerId", totalAmount: { $sum: "$amount" } } },
            { $out: "customerOrders" }
        ])

    • Explanation: This query calculates the total order amount per customer and writes the result into a new collection customerOrders. The difference from $merge is that $out creates a new collection instead of merging into an existing one.
    • Output:

        db.customerOrders.find()

        // Output
        [
            { _id: 'C002', totalAmount: 1200 },
            { _id: 'C003', totalAmount: 1500 },
            { _id: 'C001', totalAmount: 2100 }
        ]

    • $cond (Conditional Expression): The $cond operator acts like an "if-else" statement. It evaluates a condition and returns one value if true and another if false.
    • Example: Add a field to indicate if the order amount is greater than 1000.

        db.orders.aggregate([
            {
                $project: {
                    customerId: 1,
                    amount: 1,
                    isLargeOrder: {
                        $cond: {
                            if: { $gt: ["$amount", 1000] }, then: true, else: false
                        }
                   }
                }
            }
        ])

    • Explanation: This query checks if the amount is greater than 1000 and adds an isLargeOrder field that is true or false accordingly.

        [
            { _id: 1, customerId: 'C001', amount: 500, isLargeOrder: false },
            { _id: 2, customerId: 'C002', amount: 300, isLargeOrder: false },
            { _id: 3, customerId: 'C001', amount: 1200, isLargeOrder: true },
            { _id: 4, customerId: 'C003', amount: 1500, isLargeOrder: true },
            { _id: 5, customerId: 'C002', amount: 900, isLargeOrder: false },
            { _id: 6, customerId: 'C001', amount: 400, isLargeOrder: false }
        ]

    • $type (Type Checking): The $type operator returns the BSON type of a field.
    • Example: Get the type of the amount field for each document.

        db.orders.aggregate([
            {
                $project: {
                    customerId: 1,
                    amountType: { $type: "$amount" },
                    statusType: { $type: "$status" }
                }
            }
        ])

    • Explanation: This query adds a new field amountType, showing the BSON type of the amount field for each document.
    • Output:

        [
            {
                _id: 1,
                customerId: 'C001',
                amountType: 'int',
                statusType: 'string'
            },
            {
                _id: 2,
                customerId: 'C002',
                amountType: 'int',
                statusType: 'string'
            },
            {
                _id: 3,
                customerId: 'C001',
                amountType: 'int',
                statusType: 'string'
            },
            {
                _id: 4,
                customerId: 'C003',
                amountType: 'int',
                statusType: 'string'
            },
            {
                _id: 5,
                customerId: 'C002',
                amountType: 'int',
                statusType: 'string'
            },
            {
                _id: 6,
                customerId: 'C001',
                amountType: 'int',
                statusType: 'string'
            }
        ]

    • $literal: The $literal operator in MongoDB is used to return a specific value without parsing or interpreting it as an expression. This means that when you use $literal, MongoDB treats the value as a constant and does not attempt to evaluate it. It is useful when you want to return a specific value (like a string, number, object, or array) that could otherwise be interpreted as an expression or field path.
    • Syntax

        {
            $literal: <value>
        }

    • <value>: The specific value you want MongoDB to treat as a constant and return literally.
    Why Use $literal?
    • Normally, when MongoDB encounters field paths or expressions, it tries to interpret and evaluate them. However, in some cases, you may need MongoDB to return the exact value you provide, without treating it as an expression or trying to evaluate it. That's where $literal comes in handy.
    Example Scenarios
    • Return a Static Value: Let's say you want to return a string "status" as part of the result, but without treating it as a field name or expression. You can use $literal to ensure MongoDB treats it as a string value.
    • Query:

        db.orders.aggregate([
            {
                $project: {
                    statusLiteral: { $literal: "status" }
                }
            }
        ])

        // Output
        [
            { "_id": 1, "statusLiteral": "status" },
            { "_id": 2, "statusLiteral": "status" },
            { "_id": 3, "statusLiteral": "status" }
        ]

    • Without $literal, MongoDB might try to interpret "status" as a field name or variable, but here it is treated as a constant string value.
    • Avoiding Field Interpretation: Suppose you want to return an object or array as a constant value within a projection, but don't want MongoDB to interpret it as a field.
    • Query (using an array):

        db.orders.aggregate([
            {
                $project: {
                    literalArray: { $literal: [1, 2, 3] }
                }
            }
        ])

        // Output
        [
            { "_id": 1, "literalArray": [1, 2, 3] },
            { "_id": 2, "literalArray": [1, 2, 3] }
        ]

    • Here, $literal ensures that the array [1, 2, 3] is returned as is, without interpreting it as field paths or expressions.
    • Escaping Dollar Signs: MongoDB uses the $ symbol to refer to field names or expressions in queries. If you need to return a string with a dollar sign (e.g., "price$"), you can use $literal to ensure MongoDB treats it as a string rather than an expression.
    • Query:

        db.orders.aggregate([
            {
                $project: {
                    escapedField: { $literal: "$price" }
                }
            }
        ])

        // Output
        [
            { "_id": 1, "escapedField": "$price" },
            { "_id": 2, "escapedField": "$price" }
        ]

    • Without $literal, MongoDB would treat "$price" as a field reference, but $literal ensures it's returned as the literal string "$price".
    • Use Cases of $literal: Returning constant values: When you want to include constants like strings, numbers, arrays, or objects in your query result.
    • Avoiding field interpretation: Prevents MongoDB from treating certain values as field paths or variables.
    • Escaping special characters: Handles values like dollar signs that MongoDB would otherwise interpret as expressions.
    • Example with Condition: $literal is also useful when combined with operators like $cond. You can use $literal to provide constant values in conditional expressions.
    • Example with $cond:

        db.sales.aggregate([
            {
                $project: {
                    status: {
                        $cond: {
                            if: { $gt: ["$total", 100] },
                            then: { $literal: "high" },
                            else: { $literal: "low" }
                        }
                    }
                }
            }
        ])

        //   Output
        [
            { "_id": 1, "status": "high" },
            { "_id": 2, "status": "low" }
        ]

    • In this case, the $literal ensures that "high" and "low" are treated as constant string values in the result.
    Key Takeaways
    • Literal Value: $literal tells MongoDB to treat the value exactly as given, without evaluating or interpreting it.
    • Prevent Parsing: It is especially useful for avoiding interpretation of strings, objects, arrays, or special symbols like $ that MongoDB would otherwise process as expressions or field paths.
    • $match: The $match operator filters documents by a given condition. It is similar to the find() operation but used within the aggregation pipeline.
    • Example: Find all orders with an amount greater than 1000.

        db.orders.aggregate([
            { $match: { amount: { $gt: 1000 } } }
        ])

    • Explanation: This query filters and returns only the documents where the amount field is greater than 1000.
    • Output:

        [
            {
                "_id": 3,
                "customerId": "C001",
                "items": 4,
                "amount": 1200,
                "status": "Pending"
            },
            {
                "_id": 4,
                "customerId": "C003",
                "items": 5,
                "amount": 1500,
                "status": "Completed"
            }
        ]

    • $limit: The $limit operator limits the number of documents returned.
    • Example: Return the first 3 orders.

        db.orders.aggregate([
            { $limit: 3 }
        ])

    • Explanation: This query returns only the first 3 documents from the orders collection.
    • Output:

        [
            {
                "_id": 1,
                "customerId": "C001",
                "items": 2,
                "amount": 500,
                "status": "Completed"
            },
            {
                "_id": 2,
                "customerId": "C002",
                "items": 1,
                "amount": 300,
                "status": "Completed"
            },
            {
                "_id": 3,
                "customerId": "C001",
                "items": 4,
                "amount": 1200,
                "status": "Pending"
            }
        ]

    • $skip: The $skip operator skips a specified number of documents before returning results.
    • Example: Skip the first 2 orders and return the rest.

        db.orders.aggregate([
            { $skip: 2 }
        ])

    • Explanation: This query skips the first 2 documents and returns the remaining ones.
    • Output:

        [
            {
                "_id": 3,
                "customerId": "C001",
                "items": 4,
                "amount": 1200,
                "status": "Pending"
            },
            {
                "_id": 4,
                "customerId": "C003",
                "items": 5,
                "amount": 1500,
                "status": "Completed"
            },
            {
                "_id": 5,
                "customerId": "C002",
                "items": 3,
                "amount": 900,
                "status": "Pending"
            },
            {
                "_id": 6,
                "customerId": "C001",
                "items": 1,
                "amount": 400,
                "status": "Completed"
            }
        ]

    • $facet: The $facet operator allows you to perform multiple aggregations on the same dataset in parallel. It returns a document containing the results of each aggregation pipeline in separate fields.
    • Example: Perform multiple aggregations to get the total amount and count of orders.

        db.orders.aggregate([
            {
                $facet: {
                    totalAmount: [
                        { $group: { _id: null, totalAmount: { $sum: "$amount" } } }
                    ],
                    orderCount: [
                        { $count: "totalOrders" }
                    ]
                }
            }
        ])

    • Explanation: This query performs two parallel aggregations: one to calculate the total amount and the other to count the total number of orders.
    • Output:

        [
            {
                "totalAmount": [{ "_id": null, "totalAmount": 4800 }],
                "orderCount": [{ "totalOrders": 6 }]
            }
        ]

    • $bucket: The $bucket operator categorizes documents into user-defined buckets, based on a field's values.
    • Example: Bucket orders into price ranges.

        db.orders.aggregate([
            {
                $bucket: {
                    groupBy: "$amount",             // Field to group by
                    boundaries: [0, 500, 1000, 1500, 2000], // Bucket boundaries
                    default: "Other",               // Bucket for out-of-bound values
                    output: {
                        "orderCount": { $sum: 1 },
                        "averageAmount": { $avg: "$amount" }
                    }
                }
            }
        ])

    • Explanation: This query divides the orders collection into price ranges: 0-500, 500-1000, and so on, counting the number of orders in each range and calculating the average order amount for each bucket.
    • Output:

        [
            { "_id": 0, "orderCount": 2, "averageAmount": 400 },
            { "_id": 500, "orderCount": 2, "averageAmount": 700 },
            { "_id": 1500, "orderCount": 2, "averageAmount": 1350 }
        ]

    • $redact (Conditional Exclusion of Fields): The $redact operator is used for conditionally including or excluding fields in documents. It can help when you want to filter fields inside nested documents.
    • Example: Redact the amount array and remove any amount whose value is less than 800 (if we had an array).

        db.orders.aggregate([
            {
                $redact: {
                    $cond: {
                        if: { $gte: ["$amount", 800] }, then: "$$DESCEND", else: "$$PRUNE"
                    }
                }
            }
        ])

    • Explanation: This query conditionally excludes fields where the value of amount is less than 800.

        [
            {
                _id: 3,
                customerId: 'C001',
                items: 4,
                amount: 1200,
                status: 'Pending'
            },
            {
                _id: 4,
                customerId: 'C003',
                items: 5,
                amount: 1500,
                status: 'Completed'
            },
            {
                _id: 5,
                customerId: 'C002',
                items: 3,
                amount: 900,
                status: 'Pending'
            }
        ]

    • $lookup (Join Operation): The $lookup operator allows you to perform a left outer join to another collection in the same database. It is used to merge data from different collections based on a related field.
    • Example: Perform a join to get customer details from another collection.
    • Assume we have another collection customers:

        [
            { "_id": "C001", "name": "John Doe", "email": "john@example.com" },
            { "_id": "C002", "name": "Jane Smith", "email": "jane@example.com" },
            { "_id": "C003", "name": "Alice Brown", "email": "alice@example.com" }
        ]

    • Now, let's join this customers collection with the orders collection.

        db.orders.aggregate([
            {
                $lookup: {
                    from: "customers",
                    localField: "customerId",
                    foreignField: "_id",
                    as: "customerDetails"
                }
            }
        ])

    • Explanation: This query joins the orders collection with the customers collection based on customerId and returns customer details along with their orders.
    • Output:

        [
            {
                "_id": 1,
                "customerId": "C001",
                "items": 2,
                "amount": 500,
                "status": "Completed",
                "customerDetails": [
                    {
                        "_id": "C001",
                        "name": "John Doe",
                        "email": "john@example.com"
                    }
                ]
            },
            {
                "_id": 2,
                "customerId": "C002",
                "items": 1,
                "amount": 300,
                "status": "Completed",
                "customerDetails": [
                    {
                        "_id": "C002",
                        "name": "Jane Smith",
                        "email": "jane@example.com"
                    }
                ]
            },
            {
                "_id": 3,
                "customerId": "C001",
                "items": 4,
                "amount": 1200,
                "status": "Pending",
                "customerDetails": [
                    {
                        "_id": "C001",
                        "name": "John Doe",
                        "email": "john@example.com"
                    }
                ]
            },
            // more documents...
        ]

    • s
    • s
    • s
    • s