$regexFind (aggregation)
Definition
$regexFind
New in version 4.2.
Provides regular expression (regex) pattern matching capability in aggregation expressions. If a match is found, returns a document that contains information on the first match. If a match is not found, returns null.
MongoDB uses Perl compatible regular expressions (i.e. "PCRE") version 8.41 with UTF-8 support.
Prior to MongoDB 4.2, an aggregation pipeline could only use the query operator
$regex in the $match stage. For more information on using regex in a query, see $regex.
Syntax
The $regexFind operator has the following syntax:
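As a minimal sketch of that shape, the expression can be written as a plain JavaScript object. The field reference "$comment", the pattern, and the output field name result are placeholders chosen for illustration, not values from this page.

```javascript
// Shape of a $regexFind expression: input and regex are required,
// options is optional. "$comment" and the pattern are placeholder values.
const regexFindExpr = {
  $regexFind: {
    input: "$comment",   // a string, or an expression that resolves to a string
    regex: "mongo(db)?", // the pattern, as a string or a regex literal
    options: "i"         // optional flags
  }
};

// The expression is used inside a stage such as $addFields or $project.
const syntaxStage = { $addFields: { result: regexFindExpr } };
console.log(JSON.stringify(syntaxStage, null, 2));
```

In a shell session, an array containing such a stage would be passed to db.collection.aggregate().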
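Operator Fields
| Field | Description |
|---|---|
| input | The string on which you wish to apply the regex pattern. Can be a string or any valid expression that resolves to a string. |
| regex | The regex pattern to apply. Can be any valid expression that resolves to either a string or a regex pattern. When the pattern is given as a regex, the i and m options can be specified as part of the regex field. Alternatively, you can also specify the regex options with the options field. To specify the s or x options, you must use the options field. You cannot specify options in both the regex and the options field. |
| options | Optional. The following options are available: i (case-insensitive matching), m (anchors such as ^ and $ match for each line of a multiline string), x (ignore unescaped white space characters and comments in the pattern), and s (the dot character matches all characters, including the new line character). Note: You cannot specify options in both the regex and the options field. |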
Returns
If the operator does not find a match, the result of the operator is
null.
If the operator finds a match, the result of the operator is a document that contains:
- the first matching string in the input,
- the code point index (not byte index) of the matching string in the input, and
- an array of the strings that correspond to the groups captured by
the matching string. Capturing groups are specified with
unescaped parentheses () in the regex pattern.
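The result described above can be approximated locally. The sketch below is a hypothetical helper, regexFindSketch, that builds a document with match, idx, and captures fields using the JavaScript regex engine rather than PCRE, so it illustrates the return shape only. The sample strings are assumptions.

```javascript
// Plain JavaScript approximation of the $regexFind result document.
// Hypothetical helper: it relies on the JavaScript regex engine, not PCRE.
// Pass a regex without the g flag so that exec() always starts at index 0.
function regexFindSketch(input, regex) {
  const m = regex.exec(input);
  if (m === null) return null; // no match: the result is null
  return {
    match: m[0], // the first matching string
    // m.index counts UTF-16 code units; convert it to a code point index
    idx: Array.from(input.slice(0, m.index)).length,
    // groups that did not take part in the match become null
    captures: m.slice(1).map((c) => (c === undefined ? null : c))
  };
}

console.log(regexFindSketch("First lines", /lin(e|k)/)); // match "line", idx 6, captures ["e"]
console.log(regexFindSketch("First rows", /lin(e|k)/));  // null
```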
Behavior
$regexFind and Collation
$regexFind ignores the collation specified for the
collection, db.collection.aggregate(), and the index, if used.
For example, create a sample collection with collation strength
1 (i.e. compare base character only and ignore other differences
such as case and diacritics):
Insert the following documents:
Using the collection’s collation, the following operation performs a case-insensitive and diacritic-insensitive match:
The operation returns the following 3 documents:
However, the aggregation expression $regexFind ignores
collation; that is, the following regular expression pattern matching examples
are case-sensitive and diacritic-sensitive:
Both operations return the following:
To perform case-insensitive regex pattern matching, use the i option instead. See i Option for an example.
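The contrast can be sketched with two stage definitions and a local check. The field name category and the three sample values are assumptions made for this illustration, and the local check uses the JavaScript regex engine as a stand-in for the server.

```javascript
// Hypothetical sample values: an accented, a plain, and a mixed-case variant.
const categories = ["caf\u00e9", "cafe", "cafE"];

// Stage that could be passed to aggregate(). A collation set on the
// collection or on aggregate() does not change how the pattern matches.
const sensitiveStage = {
  $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ } } }
};

// Same pattern with the i option for case-insensitive matching.
const insensitiveStage = {
  $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/i } } }
};

// Local approximation: which sample values would each pattern match?
const sensitiveHits = categories.filter((c) => /cafe/.test(c));
const insensitiveHits = categories.filter((c) => /cafe/i.test(c));
console.log(sensitiveHits);   // [ 'cafe' ]
console.log(insensitiveHits); // [ 'cafe', 'cafE' ]
```

Note that the i option restores case-insensitivity only; the accented value still does not match.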
captures Output Behavior
If your regex pattern contains capture groups
and the pattern finds a match in the input, the
captures array in the results corresponds to the groups captured by
the matching string. Capture groups are specified with unescaped
parentheses () in the regex pattern. The
length of the captures array equals the number of capture groups in
the pattern and the order of the array matches the order in which the
capture groups appear.
Create a sample collection named contacts with the following
documents:
The following pipeline applies the regex
pattern /(C(ar)*)ol/ to the fname field:
The regex pattern finds a match with fname
values Carol and Colleen:
The pattern contains the capture group (C(ar)*) which contains the
nested group (ar). The elements in the captures array correspond
to the two capture groups. If a matching document is not captured by a
group (e.g. Colleen and the group (ar)),
$regexFind replaces the group with a null placeholder.
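A minimal sketch of this behavior, assuming the output field name returnObject (not given in the text) and using the JavaScript regex engine as a local stand-in for the server:

```javascript
// Stage from the text: apply /(C(ar)*)ol/ to the fname field.
// The output field name returnObject is an assumption.
const capturesStage = {
  $project: { returnObject: { $regexFind: { input: "$fname", regex: /(C(ar)*)ol/ } } }
};

// Local approximation of the captures array: one element per capture
// group, with null for a group that did not take part in the match.
function capturesOf(input, regex) {
  const m = regex.exec(input);
  return m === null ? null : m.slice(1).map((c) => (c === undefined ? null : c));
}

const namePattern = capturesStage.$project.returnObject.$regexFind.regex;
console.log(capturesOf("Carol", namePattern));   // [ 'Car', 'ar' ]
console.log(capturesOf("Colleen", namePattern)); // [ 'C', null ]
```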
As shown in the previous example, the captures array contains an
element for each capture group (using null for non-captures).
Consider the following example which searches for phone numbers with
New York City area codes by applying a logical or of capture
groups to the phone field. Each group represents a New York City
area code:
For documents which are matched by the regex
pattern, the captures array includes the matching capture group
and uses null for every group that did not capture anything:
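The alternation of capture groups can be sketched as follows. The specific area codes, the phone strings, and the output field name nycContacts are assumptions made for this illustration; the local check uses the JavaScript regex engine.

```javascript
// Illustrative pattern: each alternative is its own capture group.
// The area codes listed here are assumptions made for this sketch.
const nycAreaCodes = /^(718).*|^(212).*|^(646).*|^(917).*/;

const phoneStage = {
  $project: { nycContacts: { $regexFind: { input: "$phone", regex: nycAreaCodes } } }
};

// Local approximation of the captures array for one phone value.
function phoneCaptures(phone) {
  const m = nycAreaCodes.exec(phone);
  return m === null ? null : m.slice(1).map((c) => (c === undefined ? null : c));
}

console.log(phoneCaptures("212-555-0100")); // [ null, '212', null, null ]
console.log(phoneCaptures("415-555-0100")); // null
```

Only the group for the alternative that matched holds a value; the position of that value in the array identifies which area code was found.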
Examples
$regexFind and Its Options
To illustrate the behavior of the $regexFind operator as
discussed in this example, create a sample collection products with
the following documents:
By default, $regexFind performs a case-sensitive match.
For example, the following aggregation performs a case-sensitive
$regexFind on the description field. The regex
pattern /line/ does not specify any grouping:
The operation returns the following:
The following regex pattern /lin(e|k)/ specifies a grouping
(e|k) in the pattern:
The operation returns the following:
In the returned document, the idx field is the code point index and not the byte
index. To illustrate, consider the following example that uses the
regex pattern /tier/:
The operation returns the following, where only the last record
matches the pattern and the returned idx is 2 (instead of 3
if using a byte index).
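The difference between the two indexes can be checked locally. The sample string below is an assumption, chosen because it contains one character that takes two bytes in UTF-8 ahead of the match.

```javascript
// Sample value is an assumption: "métier work vocation", written with an
// escape so that the accented letter is a single code point (U+00E9).
const description = "m\u00e9tier work vocation";
const tierMatch = /tier/.exec(description);
const prefix = description.slice(0, tierMatch.index);

// Code point index: number of code points before the match.
const codePointIdx = Array.from(prefix).length;
// Byte index: UTF-8 byte length of the text before the match.
const byteIdx = new TextEncoder().encode(prefix).length;

console.log(codePointIdx, byteIdx); // 2 3
```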
i Option
Note
You cannot specify options in both the regex and the
options field.
To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:
For example, the following aggregation performs a case-insensitive
$regexFind on the description field. The regex
pattern /line/ does not specify any grouping:
The operation returns the following documents:
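A sketch of the equivalent ways to request case-insensitive matching, assuming the output field name returnObject and a hypothetical sample value for the local check:

```javascript
// Three equivalent ways to ask for case-insensitive matching.
const viaRegexField = { $regexFind: { input: "$description", regex: /line/i } };
const viaOptionsField = { $regexFind: { input: "$description", regex: /line/, options: "i" } };
const viaStringPattern = { $regexFind: { input: "$description", regex: "line", options: "i" } };

// Any of them can be placed in a stage; returnObject is an assumed field name.
const iPipeline = [{ $addFields: { returnObject: viaRegexField } }];

// Local approximation with a hypothetical sample value.
const iSample = "Single LINE description.";
const caseSensitive = /line/.exec(iSample);    // null
const caseInsensitive = /line/i.exec(iSample); // matches "LINE" at index 7
console.log(caseSensitive, caseInsensitive[0], caseInsensitive.index);
```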
m Option
Note
You cannot specify options in both the regex and the
options field.
To match the specified anchors (e.g. ^, $) for each line of a
multiline string, include the m option as
part of the regex field or in the
options field:
The following example includes both the i and the m options to
match lines starting with either the letter s or S for
multiline strings:
The operation returns the following:
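A sketch of the i and m options combined, assuming the output field name returnObject and a hypothetical two-line sample value for the local check:

```javascript
// Stage using both options as part of the regex field.
const mStage = {
  $addFields: { returnObject: { $regexFind: { input: "$description", regex: /^s/im } } }
};

// Local approximation with a hypothetical two-line value.
const mSample = "First lines\nsecond line";
const withM = /^s/im.exec(mSample);   // ^ also matches after the line break
const withoutM = /^s/i.exec(mSample); // ^ matches only at the start of the string
console.log(withM[0], withM.index, withoutM); // s 12 null
```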
x Option
Note
You cannot specify options in both the regex and the
options field.
To ignore all unescaped white space characters and comments (denoted by
the unescaped hash # character and the next newline character) in
the pattern, include the x option in the
options field:
The following example includes the x option to skip unescaped white
spaces and comments:
The operation returns the following:
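A sketch of a stage that uses the x option. The pattern text, the output field name returnObject, and the sample value are assumptions. JavaScript has no x flag, so the local check uses the pattern that remains after the white space and the comment are removed by hand.

```javascript
// With the x option, the unescaped white space and the "# ..." comment in
// the pattern are ignored. The pattern is passed as a string here.
const xStage = {
  $addFields: {
    returnObject: {
      $regexFind: {
        input: "$description",
        regex: "lin(e|k) # matches line or link",
        options: "x"
      }
    }
  }
};

// Hand-stripped equivalent of the pattern, used only for the local check.
const strippedPattern = /lin(e|k)/;
const xSample = "anchors, links and hyperlinks"; // hypothetical value
const xFound = strippedPattern.exec(xSample);
console.log(xFound[0], xFound.index, xFound[1]); // link 9 k
```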
s Option
Note
You cannot specify options in both the regex and the
options field.
To allow the dot character (i.e. .) in the pattern to match all
characters including the new line character, include the s option in the options
field:
The following example includes the s option to allow the dot
character (i.e. .) to match all characters including new line as well
as the i option to perform a case-insensitive match:
The operation returns the following:
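A sketch of the s and i options combined, assuming the pattern, the output field name returnObject, and a hypothetical two-line sample value:

```javascript
// Stage passing both options through the options field.
const sStage = {
  $addFields: {
    returnObject: { $regexFind: { input: "$description", regex: /m.*line/, options: "si" } }
  }
};

// Local approximation with a hypothetical two-line value.
const sSample = "Multiple\nline descriptions";
const dotAll = /m.*line/is.exec(sSample);  // the dot may cross the line break
const dotPlain = /m.*line/i.exec(sSample); // the dot stops at the line break
console.log(JSON.stringify(dotAll[0]), dotPlain); // "Multiple\nline" null
```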
Use $regexFind to Parse Email from String
Create a sample collection feedback with the following documents:
The following aggregation uses $regexFind to extract
the email from the comment field (case-insensitive).
- First Stage
The stage uses the $addFields stage to add a new field email to the document. The new field contains the result of performing the $regexFind on the comment field.
- Second Stage
The stage uses the $set stage to reset the email to the current "$email.match" value. If the current value of email is null, the new value of email is set to null.
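The two stages can be sketched as a pipeline definition. The email pattern below is a deliberately loose assumption, and the sample comment is hypothetical; only the collection name feedback and the field names comment and email come from the text.

```javascript
// Loose, illustrative email pattern (an assumption, not a validator).
const emailPattern = /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i;

const emailPipeline = [
  // First stage: store the $regexFind result document in email.
  { $addFields: { email: { $regexFind: { input: "$comment", regex: emailPattern } } } },
  // Second stage: keep only the matched string (null when nothing matched).
  { $set: { email: "$email.match" } }
];
// In a shell session: db.feedback.aggregate(emailPipeline)

// Local approximation with a hypothetical comment.
const sampleComment = "please ship faster, reach me at Jane.Doe@example.com";
const emailFound = emailPattern.exec(sampleComment);
console.log(emailFound[0]); // Jane.Doe@example.com
```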
Apply $regexFind to String Elements of an Array
Create a sample collection contacts with the following documents:
The following aggregation uses $regexFind to convert
the details array into an embedded document with email and
phone fields:
- First Stage
The stage uses $unwind to deconstruct the array into separate documents.
- Second Stage
The stage uses the $addFields stage to add new fields to the document that contain the results of the $regexFind for phone number and email.
- Third Stage
The stage uses the $project stage to output documents with the _id field, the name field and the details field. The details field is set to a document with email and phone fields, whose values are determined from the regexemail and regexphone fields, respectively.
- Fourth Stage
The stage uses the $group stage to group the input documents by their _id value. The stage uses the $mergeObjects expression to merge the details documents.
- Fifth Stage
The stage uses the $sort stage to sort the documents by the _id field.
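The five stages can be sketched as one pipeline definition. Both patterns and the use of $first for the name field are assumptions made for this illustration; the field names details, regexemail, and regexphone come from the stage descriptions above.

```javascript
// Illustrative patterns (assumptions): a whole-string email and phone check.
const detailEmail = /^[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/i;
const detailPhone = /^[+]{0,1}[0-9]*-?[0-9_-]+$/;

const detailsPipeline = [
  // 1. One document per element of the details array.
  { $unwind: "$details" },
  // 2. Try both patterns against the element.
  { $addFields: {
      regexemail: { $regexFind: { input: "$details", regex: detailEmail } },
      regexphone: { $regexFind: { input: "$details", regex: detailPhone } }
  } },
  // 3. Keep _id and name, and build details from the matched strings.
  { $project: { _id: 1, name: 1,
      details: { email: "$regexemail.match", phone: "$regexphone.match" } } },
  // 4. Merge the per-element details documents back together per _id.
  { $group: { _id: "$_id", name: { $first: "$name" },
      details: { $mergeObjects: "$details" } } },
  // 5. Sort by _id.
  { $sort: { _id: 1 } }
];
// In a shell session: db.contacts.aggregate(detailsPipeline)
console.log(detailsPipeline.map((stage) => Object.keys(stage)[0]).join(" > "));
```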
Use Captured Groupings to Parse User Name
Create a sample collection employees with the following documents:
The employee email has the format
<firstname>.<lastname>@example.com. Using the captures field
returned in the $regexFind results, you can parse out
user names for employees.
- First Stage
The stage uses the $addFields stage to add a new field username to the document. The new field contains the result of performing the $regexFind on the email field.
- Second Stage
The stage uses the $set stage to reset the username to the zeroth element of the "$username.captures" array. If the current value of username is null, the new value of username is set to null.
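The two stages can be sketched as a pipeline definition. The pattern is an illustrative assumption whose single capture group covers the part of the address before the @ sign, and the sample address is hypothetical.

```javascript
// Illustrative pattern (an assumption): group 1 is the part before the @.
const userPattern = /^([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/i;

const usernamePipeline = [
  // First stage: store the $regexFind result document in username.
  { $addFields: { username: { $regexFind: { input: "$email", regex: userPattern } } } },
  // Second stage: replace it with the first element of the captures array.
  { $set: { username: { $arrayElemAt: ["$username.captures", 0] } } }
];
// In a shell session: db.employees.aggregate(usernamePipeline)

// Local approximation with a hypothetical address.
const userFound = userPattern.exec("jane.doe@example.com");
console.log(userFound[1]); // jane.doe
```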
See also
For more information on the behavior of the captures array and
additional examples, see
captures Output Behavior.