- Let's go through the string-related aggregation functions in MongoDB, explaining each one with examples. For simplicity, let's use a sample collection of students that includes details like their names, subjects, and a bio description.
- Sample Dataset
db.students.insertMany([
{
_id: 1,
name: "John Doe",
bio: "John loves programming and playing guitar",
subjects: ["Math", "Science", "History"]
},
{
_id: 2,
name: "Jane Smith",
bio: "Jane enjoys swimming, reading, and writing",
subjects: ["English", "Math", "Geography"]
},
{
_id: 3,
name: "David Green",
bio: "David is passionate about basketball and coding",
subjects: ["Science", "Math", "History"]
}
])
$concat
- The $concat operator concatenates (joins) two or more strings into a single string.
- Example: Concatenate the name field with the bio field to create a full description for each student:
db.students.aggregate([
{
$project: {
fullDescription: { $concat: ["$name", " - ", "$bio"] }
}
}
])
// Output
[
{
"_id": 1,
"fullDescription": "John Doe - John loves programming and playing guitar"
},
{
"_id": 2,
"fullDescription": "Jane Smith - Jane enjoys swimming, reading, and writing"
},
{
"_id": 3,
"fullDescription": "David Green - David is passionate about basketball and coding"
}
]
$substr
- The $substr operator extracts a substring from a string, based on a starting position and the number of characters to include.
- Example: Extract the first 5 characters from each student's bio:
db.students.aggregate([
{
$project: {
shortBio: { $substr: ["$bio", 0, 5] }
}
}
])
// Output
[
{ "_id": 1, "shortBio": "John " },
{ "_id": 2, "shortBio": "Jane " },
{ "_id": 3, "shortBio": "David" }
]
$toUpper
- The $toUpper operator converts a string to uppercase.
- Example: Convert each student's name to uppercase:
db.students.aggregate([
{
$project: {
nameUpperCase: { $toUpper: "$name" }
}
}
])
// Output
[
{ "_id": 1, "nameUpperCase": "JOHN DOE" },
{ "_id": 2, "nameUpperCase": "JANE SMITH" },
{ "_id": 3, "nameUpperCase": "DAVID GREEN" }
]
$toLower
- The $toLower operator converts a string to lowercase.
- Example: Convert each student's bio to lowercase:
db.students.aggregate([
{
$project: {
bioLowerCase: { $toLower: "$bio" }
}
}
])
// Output
[
{
"_id": 1,
"bioLowerCase": "john loves programming and playing guitar"
},
{
"_id": 2,
"bioLowerCase": "jane enjoys swimming, reading, and writing"
},
{
"_id": 3,
"bioLowerCase": "david is passionate about basketball and coding"
}
]
$strcasecmp
- The $strcasecmp operator compares two strings in a case-insensitive manner. It returns:
- 0 if the strings are equal,
- 1 if the first string is greater,
- -1 if the first string is smaller.
- Example: Compare the names of two students:
db.students.aggregate([
{
$project: {
nameComparison: { $strcasecmp: ["$name", "John Doe"] }
}
}
])
// Output
[
{ _id: 1, nameComparison: 0 },
{ _id: 2, nameComparison: -1 },
{ _id: 3, nameComparison: -1 }
]
$indexOfBytes
- The $indexOfBytes operator returns the position (index) of the first occurrence of a substring within a string. It works on the byte level, so it's useful for ASCII-based searches.
- Example: Find the index of the word "coding" in the bio field:
db.students.aggregate([
{
$project: {
indexOfCoding: { $indexOfBytes: ["$bio", "coding"] }
}
}
])
// Output
[
{ "_id": 1, "indexOfCoding": -1 }, // Not found in John's bio
{ "_id": 2, "indexOfCoding": -1 }, // Not found in Jane's bio
{ "_id": 3, "indexOfCoding": 40 } // Found in David's bio at index 40
]
$indexOfCP
- The $indexOfCP operator also finds the position of a substring, but it works on the code points of characters (useful for Unicode).
- Example: Find the index of the word "reading" in Jane's bio:
db.students.aggregate([
{
$project: {
indexOfReading: { $indexOfCP: ["$bio", "reading"] }
}
}
])
// Output
[
{ "_id": 1, "indexOfReading": -1 }, // Not found in John's bio
{ "_id": 2, "indexOfReading": 20 }, // Found in Jane's bio at index 20
{ "_id": 3, "indexOfReading": -1 } // Not found in David's bio
]
- The $split operator splits a string into an array of substrings based on a delimiter.
- Example: Split the bio field into individual words:
db.students.aggregate([
{
$project: {
bioWords: { $split: ["$bio", " "] }
}
}
])
// Output
[
{
"_id": 1,
"bioWords": ["John", "loves", "programming", "and", "playing", "guitar"]
},
{
"_id": 2,
"bioWords": ["Jane", "enjoys", "swimming,", "reading,", "and", "writing"]
},
{
"_id": 3,
"bioWords": ["David", "is", "passionate", "about", "basketball", "and", "coding"]
}
]
- The $regexFind operator applies a regular expression and returns information about the first match it finds.
- Example: Find the first word in the bio field that starts with the letter 'p':
db.students.aggregate([
{
$project: {
firstPWord: { $regexFind: { input: "$bio", regex: "\\bp\\w+" } }
}
}
])
// Output
[
{
"_id": 1,
"firstPWord": { "match": "programming", "idx": 10 }
},
{
"_id": 2,
"firstPWord": null // No word starting with 'p'
},
{
"_id": 3,
"firstPWord": { "match": "passionate", "idx": 8 }
}
]
- The $regexFindAll operator applies a regular expression and returns all matches found in the input string.
- Example: Find all words in the bio that start with the letter 'a':
db.students.aggregate([
{
$project: {
allAWords: { $regexFindAll: { input: "$bio", regex: "\\ba\\w+" } }
}
}
])
// Output
[
{
_id: 1,
allAWords: [
{ match: 'and', idx: 23, captures: [] }
]
},
{
_id: 2,
allAWords: [
{ match: 'and', idx: 31, captures: [] }
]
},
{
_id: 3,
allAWords: [
{ match: 'about', idx: 20, captures: [] },
{ match: 'and', idx: 37, captures: [] }
]
}
]
- The $regexMatch operator checks if the input string matches a specified regular expression and returns a boolean value.
- Example: Check if the word "programming" is present in the bio:
db.students.aggregate([
{
$project: {
hasProgramming: {
$regexMatch: { input: "$bio", regex: "programming" }
}
}
}
])
// Output
[
{ _id: 1, hasProgramming: true },
{ _id: 2, hasProgramming: false },
{ _id: 3, hasProgramming: false }
]
- The $trim operator removes specified characters from the beginning and end of a string. By default, it removes whitespace, but you can specify which characters to remove.
- Example: Trim whitespace from the beginning and end of the bio field:
db.students.aggregate([
{
$project: {
trimmedBio: { $trim: { input: "$bio" } }
}
}
])
- You can also specify characters to trim, such as commas, periods, etc.:
db.students.aggregate([
{
$project: {
trimmedBio: { $trim: { input: "$bio", chars: "," } }
}
}
])
- The $ltrim operator removes specified characters from the beginning (left side) of a string. Similar to $trim, it removes whitespace by default unless other characters are specified.
- Example: Trim only leading whitespace from the bio field:
db.students.aggregate([
{
$project: {
leftTrimmedBio: { $ltrim: { input: "$bio" } }
}
}
])
- The $rtrim operator removes specified characters from the end (right side) of a string. By default, it trims whitespace, but you can specify other characters.
- Example: Trim trailing whitespace from the bio field:
db.students.aggregate([
{
$project: {
rightTrimmedBio: { $rtrim: { input: "$bio" } }
}
}
])
- The $replaceOne operator replaces the first occurrence of a specified string with another string in a given input.
- Example: Replace the first occurrence of "and" with "&" in the bio field:
db.students.aggregate([
{
$project: {
updatedBio: {
$replaceOne: {
input: "$bio", find: "and", replacement: "&"
}
}
}
}
])
// Output
[
{ "_id": 1, "updatedBio": "John loves programming & playing guitar" },
{ "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
{ "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
]
- The $replaceAll operator replaces all occurrences of a specified string with another string in a given input.
- Example: Replace all occurrences of "and" with "&" in the bio field:
db.students.aggregate([
{
$project: {
updatedBio: {
$replaceAll: {
input: "$bio", find: "and", replacement: "&"
}
}
}
}
])
// Output
[
{ "_id": 1, "updatedBio": "John loves programming & playing guitar" },
{ "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
{ "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
]
- The $strLenBytes operator returns the length of a string in bytes. This is useful for applications where you need to account for byte-based encodings.
- Example: Calculate the length of each bio string in bytes:
db.students.aggregate([
{
$project: {
bioLengthBytes: { $strLenBytes: "$bio" }
}
}
])
// Output
[
{ "_id": 1, "bioLengthBytes": 40 },
{ "_id": 2, "bioLengthBytes": 44 },
{ "_id": 3, "bioLengthBytes": 47 }
]
- The $strLenCP operator returns the length of a string in code points (Unicode code points). This is useful for Unicode strings where a character might be more than 1 byte.
- Example: Calculate the length of each bio string in code points:
db.students.aggregate([
{
$project: {
bioLengthCP: { $strLenCP: "$bio" }
}
}
])
// Output
[
{ "_id": 1, "bioLengthCP": 40 },
{ "_id": 2, "bioLengthCP": 44 },
{ "_id": 3, "bioLengthCP": 47 }
]
- The $regexReplace operator allows you to replace portions of a string that match a regular expression with a specified replacement string.
- Example: Replace all instances of the word "and" with "&" using a regular expression:
db.students.aggregate([
{
$project: {
updatedBio: {
$regexReplace: {
input: "$bio",
regex: "and",
replacement: "&"
}
}
}
}
])
// Output
[
{ "_id": 1, "updatedBio": "John loves programming & playing guitar" },
{ "_id": 2, "updatedBio": "Jane enjoys swimming, reading, & writing" },
{ "_id": 3, "updatedBio": "David is passionate about basketball & coding" }
]
No comments:
Post a Comment