String related aggregation functions ($concat, $substr, $toUpper, $toLower, $strcasecmp, $indexOfBytes, $indexOfCP, $split, $regexFind, $regexFindAll, $regexMatch, $trim, $ltrim, $rtrim, $replaceOne, $replaceAll, $strLenBytes, $strLenCP, $regexReplace)

  • Let's go through the string-related aggregation functions in MongoDB, explaining each one with examples. For simplicity, let's use a sample collection of students that includes details like their names, subjects, and a bio description.

  • Sample Dataset

    db.students.insertMany([
      {
        _id: 1,
        name: "John Doe",
        bio: "John loves programming and playing guitar",
        subjects: ["Math", "Science", "History"]
      },
      {
        _id: 2,
        name: "Jane Smith",
        bio: "Jane enjoys swimming, reading, and writing",
        subjects: ["English", "Math", "Geography"]
      },
      {
        _id: 3,
        name: "David Green",
        bio: "David is passionate about basketball and coding",
        subjects: ["Science", "Math", "History"]
      }
    ])

$concat

  • The $concat operator concatenates (joins) two or more strings into a single string.
  • Example: Concatenate the name field with the bio field to create a full description for each student:


  db.students.aggregate([
    {
      $project: {
        fullDescription: { $concat: ["$name", " - ", "$bio"] }
      }
    }
  ])

  // Output
  [
    {
      "_id": 1,
      "fullDescription": "John Doe - John loves programming and playing guitar"
    },
    {
      "_id": 2,
      "fullDescription": "Jane Smith - Jane enjoys swimming, reading, and writing"
    },
    {
      "_id": 3,
      "fullDescription": "David Green - David is passionate about basketball and coding"
    }
  ]

$substr

  • The $substr operator extracts a substring from a string, based on a starting position and the number of characters to include.
  • Example: Extract the first 5 characters from each student's bio:


    db.students.aggregate([
      {
        $project: {
          shortBio: { $substr: ["$bio", 0, 5] }
        }
      }
    ])

    // Output
    [
      { "_id": 1, "shortBio": "John " },
      { "_id": 2, "shortBio": "Jane " },
      { "_id": 3, "shortBio": "David" }
    ]

$toUpper

  • The $toUpper operator converts a string to uppercase.
  • Example: Convert each student's name to uppercase:


  db.students.aggregate([
    {
      $project: {
        nameUpperCase: { $toUpper: "$name" }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "nameUpperCase": "JOHN DOE" },
    { "_id": 2, "nameUpperCase": "JANE SMITH" },
    { "_id": 3, "nameUpperCase": "DAVID GREEN" }
  ]

$toLower

  • The $toLower operator converts a string to lowercase.
  • Example: Convert each student's bio to lowercase:


  db.students.aggregate([
    {
      $project: {
        bioLowerCase: { $toLower: "$bio" }
      }
    }
  ])

  // Output
  [
    {
      "_id": 1,
      "bioLowerCase": "john loves programming and playing guitar"
    },
    {
      "_id": 2,
      "bioLowerCase": "jane enjoys swimming, reading, and writing"
    },
    {
      "_id": 3,
      "bioLowerCase": "david is passionate about basketball and coding"
    }
  ]

$strcasecmp

  • The $strcasecmp operator compares two strings in a case-insensitive manner. It returns:
    • 0 if the strings are equal,
    • 1 if the first string is greater,
    • -1 if the first string is smaller.
  • Example: Compare the names of two students:


    db.students.aggregate([
      {
        $project: {
          nameComparison: { $strcasecmp: ["$name", "John Doe"] }
        }
      }
    ])

    // Output
    [
      { _id: 1, nameComparison: 0 },
      { _id: 2, nameComparison: -1 },
      { _id: 3, nameComparison: -1 }
    ]

$indexOfBytes

  • The $indexOfBytes operator returns the position (index) of the first occurrence of a substring within a string. It works on the byte level, so it's useful for ASCII-based searches.
  • Example: Find the index of the word "coding" in the bio field:


    db.students.aggregate([
      {
        $project: {
          indexOfCoding: { $indexOfBytes: ["$bio", "coding"] }
        }
      }
    ])

    // Output
    [
      { "_id": 1, "indexOfCoding": -1 },  // Not found in John's bio
      { "_id": 2, "indexOfCoding": -1 },  // Not found in Jane's bio
      { "_id": 3, "indexOfCoding": 40 }   // Found in David's bio at index 40
    ]

$indexOfCP

  • The $indexOfCP operator also finds the position of a substring, but it works on the code points of characters (useful for Unicode).
  • Example: Find the index of the word "reading" in Jane's bio:

    db.students.aggregate([
      {
        $project: {
          indexOfReading: { $indexOfCP: ["$bio", "reading"] }
        }
      }
    ])

    // Output
    [
      { "_id": 1, "indexOfReading": -1 },  // Not found in John's bio
      { "_id": 2, "indexOfReading": 20 },  // Found in Jane's bio at index 20
      { "_id": 3, "indexOfReading": -1 }   // Not found in David's bio
    ]


$split

  • The $split operator splits a string into an array of substrings based on a delimiter.
  • Example: Split the bio field into individual words:

  db.students.aggregate([
    {
      $project: {
        bioWords: { $split: ["$bio", " "] }
      }
    }
  ])

  // Output
  [
    {
      "_id": 1,
      "bioWords": ["John", "loves", "programming", "and", "playing", "guitar"]
    },
    {
      "_id": 2,
      "bioWords": ["Jane", "enjoys", "swimming,", "reading,", "and", "writing"]
    },
    {
      "_id": 3,
      "bioWords": ["David", "is", "passionate", "about", "basketball", "and", "coding"]
    }
  ]


$regexFind

  • The $regexFind operator applies a regular expression and returns information about the first match it finds.
  • Example: Find the first word in the bio field that starts with the letter 'p':

  db.students.aggregate([
    {
      $project: {
        firstPWord: { $regexFind: { input: "$bio", regex: "\\bp\\w+" } }
      }
    }
  ])

  // Output
  [
    {
      "_id": 1,
      "firstPWord": { "match": "programming", "idx": 10 }
    },
    {
      "_id": 2,
      "firstPWord": null   // No word starting with 'p'
    },
    {
      "_id": 3,
      "firstPWord": { "match": "passionate", "idx": 8 }
    }
  ]


$regexFindAll

  • The $regexFindAll operator applies a regular expression and returns all matches found in the input string.
  • Example: Find all words in the bio that start with the letter 'a':

    db.students.aggregate([
      {
        $project: {
          allAWords: { $regexFindAll: { input: "$bio", regex: "\\ba\\w+" } }
        }
      }
    ])

    // Output
    [
      {
        _id: 1,
        allAWords: [
          { match: 'and', idx: 23, captures: [] }
        ]
      },
      {
        _id: 2,
        allAWords: [
          { match: 'and', idx: 31, captures: [] }
        ]
      },
      {
        _id: 3,
        allAWords: [
          { match: 'about', idx: 20, captures: [] },
          { match: 'and', idx: 37, captures: [] }
        ]
      }
    ]


$regexMatch

  • The $regexMatch operator checks if the input string matches a specified regular expression and returns a boolean value.
  • Example: Check if the word "programming" is present in the bio:

    db.students.aggregate([
      {
        $project: {
          hasProgramming: {
            $regexMatch: { input: "$bio", regex: "programming" }
          }
        }
      }
    ])

    // Output
    [
      { _id: 1, hasProgramming: true },
      { _id: 2, hasProgramming: false },
      { _id: 3, hasProgramming: false }
    ]


$trim

  • The $trim operator removes specified characters from the beginning and end of a string. By default, it removes whitespace, but you can specify which characters to remove.
  • Example: Trim whitespace from the beginning and end of the bio field:

    db.students.aggregate([
      {
        $project: {
          trimmedBio: { $trim: { input: "$bio" } }
        }
      }
    ])

  • You can also specify characters to trim, such as commas, periods, etc.:

    db.students.aggregate([
      {
        $project: {
          trimmedBio: { $trim: { input: "$bio", chars: "," } }
        }
      }
    ])


$ltrim

  • The $ltrim operator removes specified characters from the beginning (left side) of a string. Similar to $trim, it removes whitespace by default unless other characters are specified.
  • Example: Trim only leading whitespace from the bio field:

    db.students.aggregate([
      {
        $project: {
          leftTrimmedBio: { $ltrim: { input: "$bio" } }
        }
      }
    ])


$rtrim

  • The $rtrim operator removes specified characters from the end (right side) of a string. By default, it trims whitespace, but you can specify other characters.
  • Example: Trim trailing whitespace from the bio field:

    db.students.aggregate([
      {
        $project: {
          rightTrimmedBio: { $rtrim: { input: "$bio" } }
        }
      }
    ])


$replaceOne

  • The $replaceOne operator replaces the first occurrence of a specified string with another string in a given input.
  • Example: Replace the first occurrence of "and" with "&" in the bio field:

  db.students.aggregate([
    {
      $project: {
        updatedBio: {
          $replaceOne: {
            input: "$bio", find: "and", replacement: "&"
          }
        }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "updatedBio": "John loves programming & playing guitar" },
    { "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
    { "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
  ]


$replaceAll

  • The $replaceAll operator replaces all occurrences of a specified string with another string in a given input.
  • Example: Replace all occurrences of "and" with "&" in the bio field:

  db.students.aggregate([
    {
      $project: {
        updatedBio: {
          $replaceAll: {
            input: "$bio", find: "and", replacement: "&"
          }
        }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "updatedBio": "John loves programming & playing guitar" },
    { "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
    { "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
  ]


$strLenBytes

  • The $strLenBytes operator returns the length of a string in bytes. This is useful for applications where you need to account for byte-based encodings.
  • Example: Calculate the length of each bio string in bytes:

  db.students.aggregate([
    {
      $project: {
        bioLengthBytes: { $strLenBytes: "$bio" }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "bioLengthBytes": 40 },
    { "_id": 2, "bioLengthBytes": 44 },
    { "_id": 3, "bioLengthBytes": 47 }
  ]


$strLenCP

  • The $strLenCP operator returns the length of a string in code points (Unicode code points). This is useful for Unicode strings where a character might be more than 1 byte.
  • Example: Calculate the length of each bio string in code points:

  db.students.aggregate([
    {
      $project: {
        bioLengthCP: { $strLenCP: "$bio" }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "bioLengthCP": 40 },
    { "_id": 2, "bioLengthCP": 44 },
    { "_id": 3, "bioLengthCP": 47 }
  ]


$regexReplace

  • The $regexReplace operator allows you to replace portions of a string that match a regular expression with a specified replacement string.
  • Example: Replace all instances of the word "and" with "&" using a regular expression:

  db.students.aggregate([
    {
      $project: {
        updatedBio: {
          $regexReplace: {
            input: "$bio",
            regex: "and",
            replacement: "&"
          }
        }
      }
    }
  ])

  // Output
  [
    { "_id": 1, "updatedBio": "John loves programming & playing guitar" },
    { "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
    { "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
  ]




No comments:

Post a Comment

Date and Time related aggregation functions ($year, $month, $dayOfMonth, $hour, $minute, $second, $dayOfWeek, $dateToString, $dateSubtract, $dateAdd, $isoWeek, $isoDayOfWeek, $dateTrunc, $dateFromString, $dateToParts, $dateFromParts, $dateDiff, $week)

In this blog post, we will explore Date/Time-related Aggregation Functions in MongoDB. MongoDB provides a robust set of aggregation operator...