$sampleRate (aggregation)
Definition
$sampleRate
New in version 4.4.2.
Matches a random selection of input documents. The number of documents selected approximates the sample rate expressed as a percentage of the total number of documents.
The $sampleRate operator has the following syntax:
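A minimal sketch of that shape, assuming the operator is used as the predicate of a $match stage and takes a single number between 0 and 1. The helper name buildSampleRateStage is hypothetical and is not part of MongoDB or its drivers; it only wraps the stage shape with a range check.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $match stage that uses
// $sampleRate, after checking that the rate is a number in [0, 1].
function buildSampleRateStage(rate) {
  if (typeof rate !== "number" || Number.isNaN(rate) || rate < 0 || rate > 1) {
    throw new RangeError("sample rate must be a number between 0 and 1, inclusive");
  }
  // General form: { $match: { $sampleRate: <number in [0, 1]> } }
  return { $match: { $sampleRate: rate } };
}

const sampleStage = buildSampleRateStage(0.33);
console.log(JSON.stringify(sampleStage)); // {"$match":{"$sampleRate":0.33}}
```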
Behavior
The selection process uses a uniform random distribution. The sample rate is a floating point number between 0 and 1, inclusive, which represents the probability that a given document will be selected as it passes through the pipeline.
For example, a sample rate of 0.33 selects roughly one document in three.
This expression:
is equivalent to using the $rand operator as follows:
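A minimal sketch of the two forms, written as plain stage objects so they can be inspected without a server. The keepDocument function is a hypothetical local model of the per-document test, with Math.random standing in for the server's random draw; it does not call MongoDB.

```javascript
// Stage using $sampleRate directly.
const withSampleRate = { $match: { $sampleRate: 0.33 } };

// Equivalent stage: $rand draws a uniform random number in [0, 1) for each
// document, and the document is kept when the draw is below the rate.
const withRand = { $match: { $expr: { $lt: [{ $rand: {} }, 0.33] } } };

// Local model of that per-document test (assumption: Math.random stands in
// for the server's random draw).
function keepDocument(rate, draw = Math.random()) {
  return draw < rate;
}

console.log(JSON.stringify(withSampleRate));
console.log(JSON.stringify(withRand));
```

With this model, a rate of 0 never keeps a document and a rate of 1 always does, because the draw is always at least 0 and strictly below 1.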
Repeated runs on the same data will produce different outcomes since the selection process is non-deterministic. In general, smaller datasets will show more variability in the number of documents selected on each run. As collection size increases, the number of documents chosen will approach the expected value for a uniform random distribution.
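The effect of collection size can be estimated with a short calculation. This is a sketch under the assumption that each document is kept independently with the same probability, in which case the selected count follows a binomial distribution. The function name sampleRateStats is hypothetical.

```javascript
// Assumption: each document is kept independently with probability `rate`,
// so the number of selected documents is binomially distributed.
function sampleRateStats(totalDocs, rate) {
  const expected = totalDocs * rate;                       // mean: n * p
  const stdDev = Math.sqrt(totalDocs * rate * (1 - rate)); // sqrt(n * p * (1 - p))
  return { expected, stdDev, relativeSpread: stdDev / expected };
}

// Compare the run-to-run spread for collections of different sizes.
for (const n of [100, 10000, 1000000]) {
  const { expected, stdDev, relativeSpread } = sampleRateStats(n, 0.33);
  console.log(n, expected.toFixed(0), stdDev.toFixed(1), (100 * relativeSpread).toFixed(2) + "%");
}
```

Under that assumption, 100 documents at a rate of 0.33 give about 33 matches with a standard deviation of about 4.7 (roughly 14% of the expected count), while one million documents bring the relative spread down to about 0.14%.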
Note
If an exact number of documents is required from each run, the $sample operator should be used instead of $sampleRate.
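As a brief sketch of the difference, the two pipelines below are written as plain arrays. The variable names and the size of 33 are illustrative choices, not values from this page.

```javascript
// Probabilistic: the number of matched documents varies from run to run.
const approxOneThird = [{ $match: { $sampleRate: 0.33 } }];

// Fixed size: $sample randomly selects the requested number of documents.
const exactlyThirtyThree = [{ $sample: { size: 33 } }];

console.log(JSON.stringify(approxOneThird));
console.log(JSON.stringify(exactlyThirtyThree));
```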
Examples
This code creates a small collection with 100 documents.
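A minimal sketch of such a setup step. The collection name sampleRateDemo and the field names are assumptions made for illustration. The insert only runs where a db object exists (for example in mongosh), so the snippet is also safe to run in plain Node.

```javascript
// Build 100 small documents in memory.
// The field names (_id, r) are placeholders chosen for this sketch.
const N = 100;
const seedDocs = Array.from({ length: N }, (_, i) => ({ _id: i, r: 0 }));

// In mongosh, `db` is defined and the documents can be inserted.
// `db.sampleRateDemo` is a hypothetical collection name.
if (typeof db !== "undefined") {
  db.sampleRateDemo.insertMany(seedDocs);
}

console.log(seedDocs.length); // 100
```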
The $sampleRate operator can be used in a pipeline to select random documents from the collection. In this example we use $sampleRate to select about one third of the documents.
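One way such a pipeline might look is sketched below. It assumes a $count stage is added to report how many documents were selected. The collection name sampleRateDemo and the output field name numMatches are illustrative assumptions.

```javascript
// Keep each document with probability 0.33, then count the survivors.
// "numMatches" is an arbitrary output field name chosen for this sketch.
const oneThirdPipeline = [
  { $match: { $sampleRate: 0.33 } },
  { $count: "numMatches" }
];

// In mongosh, run it against the (hypothetical) demo collection.
if (typeof db !== "undefined") {
  printjson(db.sampleRateDemo.aggregate(oneThirdPipeline).toArray());
}
```

On a 100-document collection the reported count should usually land near 33, but it will differ between runs.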
This is the output from 5 runs on the sample collection:
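Counts from a real deployment depend on the server's random draws. As a local illustration only (not server output), the sketch below simulates five runs over 100 documents by applying the same "random draw below the rate" test in plain JavaScript. The function name simulateSampleRate is hypothetical.

```javascript
// Local simulation only: Math.random stands in for the server's random draw.
function simulateSampleRate(totalDocs, rate) {
  let matches = 0;
  for (let i = 0; i < totalDocs; i++) {
    if (Math.random() < rate) matches++; // same test as "$rand < rate"
  }
  return matches;
}

// Five independent simulated runs over 100 documents at a rate of 0.33.
const runCounts = Array.from({ length: 5 }, () => simulateSampleRate(100, 0.33));
console.log(runCounts);
```

Each simulated run prints a different set of counts, typically scattered around 33.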