Map-Reduce to Aggregation Pipeline
An aggregation pipeline provides better performance and usability than a map-reduce operation.
Map-reduce operations can be rewritten using aggregation pipeline
operators, such as
$group, $merge, and others.
For map-reduce operations that require custom functionality, MongoDB
provides the $accumulator and $function
aggregation operators starting in version 4.4. Use these operators to
define custom aggregation expressions in JavaScript.
Map-reduce expressions can be rewritten as shown in the following sections.
Map-Reduce to Aggregation Pipeline Translation Table
The table is only an approximate translation. For instance, the table
shows an approximate translation of mapFunction using the
$project stage.
However, the mapFunction logic may require additional stages, for example when the logic iterates over an array. In that case, the aggregation pipeline includes an $unwind stage and a $project stage.
The emits field in $project may be named something else. The field name emits was chosen for visual comparison.
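The array-iteration case described above can be sketched as follows. This is a minimal illustration, not part of the translation table: the `items` array and its `sku` field are hypothetical names, and `emit` is the function MongoDB supplies to a map function at run time.

```javascript
// Hypothetical map function: iterates over an array field and emits one
// key/value pair per element. `emit` is provided by MongoDB inside mapReduce.
const arrayMapFunction = function () {
  this.items.forEach(function (item) {
    emit(item.sku, 1);
  });
};

// Approximate pipeline equivalent: $unwind outputs one document per array
// element, then $project shapes the key/value pair under `emits`.
const arrayMapStages = [
  { $unwind: "$items" },
  { $project: { emits: { key: "$items.sku", value: 1 } } }
];
```

The two stages would be placed where the map step belongs in the pipeline, before the $group stage that plays the role of the reduce function.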
| Map-Reduce | Aggregation Pipeline |
|---|---|
db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      query: <queryFilter>,
      sort: <sortOrder>,
      limit: <number>,
      finalize: <finalizeFunction>,
      out: <collection>
   }
)
|
db.collection.aggregate( [
   { $match: <queryFilter> },
   { $sort: <sortOrder> },
   { $limit: <number> },
   { $project: { emits: { k: <expression>, v: <expression> } } },
   { $group: {
      _id: "$emits.k",
      value: { $accumulator: {
         init: <initCode>,
         accumulate: <reduceFunction>,
         accumulateArgs: [ "$emits.v" ],
         merge: <reduceFunction>,
         finalize: <finalizeFunction>,
         lang: "js" } }
   } },
   { $out: <collection> }
] )
|
db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      query: <queryFilter>,
      sort: <sortOrder>,
      limit: <number>,
      finalize: <finalizeFunction>,
      out: { replace: <collection>, db: <db> }
   }
)
|
db.collection.aggregate( [
   { $match: <queryFilter> },
   { $sort: <sortOrder> },
   { $limit: <number> },
   { $project: { emits: { k: <expression>, v: <expression> } } },
   { $group: {
      _id: "$emits.k",
      value: { $accumulator: {
         init: <initCode>,
         accumulate: <reduceFunction>,
         accumulateArgs: [ "$emits.v" ],
         merge: <reduceFunction>,
         finalize: <finalizeFunction>,
         lang: "js" } }
   } },
   { $out: { db: <db>, coll: <collection> } }
] )
|
db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      query: <queryFilter>,
      sort: <sortOrder>,
      limit: <number>,
      finalize: <finalizeFunction>,
      out: { merge: <collection>, db: <db> }
   }
)
|
db.collection.aggregate( [
   { $match: <queryFilter> },
   { $sort: <sortOrder> },
   { $limit: <number> },
   { $project: { emits: { k: <expression>, v: <expression> } } },
   { $group: {
      _id: "$emits.k",
      value: { $accumulator: {
         init: <initCode>,
         accumulate: <reduceFunction>,
         accumulateArgs: [ "$emits.v" ],
         merge: <reduceFunction>,
         finalize: <finalizeFunction>,
         lang: "js" } }
   } },
   { $merge: {
      into: { db: <db>, coll: <collection> },
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
   } }
] )
|
db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      query: <queryFilter>,
      sort: <sortOrder>,
      limit: <number>,
      finalize: <finalizeFunction>,
      out: { reduce: <collection>, db: <db> }
   }
)
|
db.collection.aggregate( [
   { $match: <queryFilter> },
   { $sort: <sortOrder> },
   { $limit: <number> },
   { $project: { emits: { k: <expression>, v: <expression> } } },
   { $group: {
      _id: "$emits.k",
      value: { $accumulator: {
         init: <initCode>,
         accumulate: <reduceFunction>,
         accumulateArgs: [ "$emits.v" ],
         merge: <reduceFunction>,
         finalize: <finalizeFunction>,
         lang: "js" } }
   } },
   { $merge: {
      into: { db: <db>, coll: <collection> },
      on: "_id",
      whenMatched: [
         { $project: {
            value: { $function: {
               body: <reduceFunction>,
               args: [
                  "$_id",
                  [ "$value", "$$new.value" ]
               ],
               lang: "js"
            } }
         } }
      ],
      whenNotMatched: "insert"
   } }
] )
|
db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      query: <queryFilter>,
      sort: <sortOrder>,
      limit: <number>,
      finalize: <finalizeFunction>,
      out: { inline: 1 }
   }
)
|
db.collection.aggregate( [
   { $match: <queryFilter> },
   { $sort: <sortOrder> },
   { $limit: <number> },
   { $project: { emits: { k: <expression>, v: <expression> } } },
   { $group: {
      _id: "$emits.k",
      value: { $accumulator: {
         init: <initCode>,
         accumulate: <reduceFunction>,
         accumulateArgs: [ "$emits.v" ],
         merge: <reduceFunction>,
         finalize: <finalizeFunction>,
         lang: "js" } }
   } }
] )
|
Examples
Various map-reduce expressions can be rewritten using aggregation
pipeline operators, such as
$group, $merge, and others, without requiring
custom functions. However, for illustrative purposes, the following
examples provide both alternatives.
Example 1
The following map-reduce operation on the orders collection groups
by the cust_id, and calculates the sum of the price for each
cust_id:
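A sketch of what such a map-reduce operation could look like, assuming `cust_id` and `price` are top-level fields of each order. The function names and the output collection name `map_reduce_example` are hypothetical, and the guard around the call is only there so the snippet also loads outside a MongoDB shell.

```javascript
// Map: emit one (cust_id, price) pair per order document.
// `emit` is provided by MongoDB when the function runs inside mapReduce.
const mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Reduce: sum all prices emitted for a single cust_id.
const reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce((total, price) => total + price, 0);
};

// Run the operation only when a MongoDB shell provides `db`.
// "map_reduce_example" is an assumed output collection name.
if (typeof db !== "undefined") {
  db.orders.mapReduce(mapFunction1, reduceFunction1, { out: "map_reduce_example" });
}
```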
Alternative 1: (Recommended) You can rewrite the operation into an aggregation pipeline without translating the map-reduce function to equivalent pipeline stages:
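A minimal sketch of this rewrite using only built-in operators, under the same assumptions about the `orders` collection. The output collection name `agg_alternative_1` is a hypothetical choice.

```javascript
// $group plays the role of map + reduce: group by cust_id and sum price.
// $out writes the grouped results to an (assumed) output collection.
const example1Alternative1 = [
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  { $out: "agg_alternative_1" }
];

// Run the pipeline only when a MongoDB shell provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(example1Alternative1);
}
```

No JavaScript runs on the server in this version, which is the main reason it is the recommended form.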
Alternative 2: (For illustrative purposes only) The
following aggregation pipeline provides a translation of the various
map-reduce functions, using $accumulator to define custom
functions:
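One way such a translation could be written is sketched below. The stage layout follows the description in this example (`emit.key`, `emit.value`, and the `agg_alternative_2` output collection); the name `valuesPrices` for the grouped field and the exact function bodies are assumptions.

```javascript
const example1Alternative2 = [
  // Map step: shape each order into an { emit: { key, value } } document.
  { $project: { emit: { key: "$cust_id", value: "$price" } } },
  // Reduce step: a custom accumulator that adds up the emitted values.
  { $group: {
      _id: "$emit.key",
      valuesPrices: { $accumulator: {
        init: function () { return 0; },
        initArgs: [],
        accumulate: function (state, value) { return state + value; },
        accumulateArgs: [ "$emit.value" ],
        // merge combines partial sums computed separately.
        merge: function (state1, state2) { return state1 + state2; },
        lang: "js"
      } }
  } },
  // Output step: write the results to a collection.
  { $out: "agg_alternative_2" }
];

// Run the pipeline only when a MongoDB shell provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(example1Alternative2);
}
```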
First, the $project stage outputs documents with an emit field. The emit field is a document with the fields:
- key, which contains the cust_id value for the document
- value, which contains the price value for the document
Then, the $group stage uses the $accumulator operator to add the emitted values.
Finally, the $out stage writes the output to the collection agg_alternative_2. Alternatively, you could use $merge instead of $out.
Example 2
The following map-reduce operation on the orders collection
groups by the items.sku field and calculates the number of
orders and the total quantity ordered for each sku. The operation
then calculates the average quantity per order for each sku value
and merges the results into the output collection.
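A sketch of such an operation, assuming each order has an `items` array whose elements carry `sku` and `qty` fields. The function names and the output collection name `map_reduce_example2` are hypothetical, and the `ord_date` filter mirrors the $match condition described for the pipeline translation in this example.

```javascript
// Map: for each element of the items array, emit the sku as the key and a
// { count, qty } document as the value.
const mapFunction2 = function () {
  for (let idx = 0; idx < this.items.length; idx++) {
    const key = this.items[idx].sku;
    const value = { count: 1, qty: this.items[idx].qty };
    emit(key, value);
  }
};

// Reduce: add up the counts and quantities emitted for one sku.
const reduceFunction2 = function (keySKU, countObjVals) {
  const reducedVal = { count: 0, qty: 0 };
  for (const val of countObjVals) {
    reducedVal.count += val.count;
    reducedVal.qty += val.qty;
  }
  return reducedVal;
};

// Finalize: derive the average quantity per order from the reduced totals.
const finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Run the operation only when a MongoDB shell provides `db`.
// "map_reduce_example2" is an assumed output collection name.
if (typeof db !== "undefined") {
  db.orders.mapReduce(mapFunction2, reduceFunction2, {
    out: { merge: "map_reduce_example2" },
    query: { ord_date: { $gte: new Date("2020-03-01") } },
    finalize: finalizeFunction2
  });
}
```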
Alternative 1: (Recommended) You can rewrite the operation into an aggregation pipeline without translating the map-reduce function to equivalent pipeline stages:
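A possible rewrite with built-in operators only is sketched below, under the same assumptions about the `orders` documents. The output collection name `agg_alternative_3` and the intermediate field name `orders_ids` are hypothetical. Note that this sketch counts distinct orders per sku by collecting order `_id` values into a set.

```javascript
const example2Alternative1 = [
  // Keep only orders on or after the cutoff date.
  { $match: { ord_date: { $gte: new Date("2020-03-01") } } },
  // One document per element of the items array.
  { $unwind: "$items" },
  // Per sku: total quantity and the set of order ids that contain it.
  { $group: {
      _id: "$items.sku",
      qty: { $sum: "$items.qty" },
      orders_ids: { $addToSet: "$_id" }
  } },
  // Shape the result: order count, total quantity, average per order.
  { $project: {
      value: {
        count: { $size: "$orders_ids" },
        qty: "$qty",
        avg: { $divide: [ "$qty", { $size: "$orders_ids" } ] }
      }
  } },
  // Merge into the output collection, replacing documents with the same _id.
  { $merge: {
      into: "agg_alternative_3",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
];

// Run the pipeline only when a MongoDB shell provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(example2Alternative1);
}
```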
Alternative 2: (For illustrative purposes only) The following
aggregation pipeline provides a translation of the various
map-reduce functions, using $accumulator to define custom
functions:
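One way this translation could be written is sketched below. The stages follow the description in this example ($match, $unwind, $project with an `emit` field, $group with $accumulator, and $merge into `agg_alternative_4`); the exact function bodies are assumptions.

```javascript
const example2Alternative2 = [
  { $match: { ord_date: { $gte: new Date("2020-03-01") } } },
  { $unwind: "$items" },
  // Map step: emit the sku as key and a { count, qty } document as value.
  { $project: {
      emit: {
        key: "$items.sku",
        value: { count: { $literal: 1 }, qty: "$items.qty" }
      }
  } },
  // Reduce and finalize steps expressed as a custom accumulator.
  { $group: {
      _id: "$emit.key",
      value: { $accumulator: {
        init: function () { return { count: 0, qty: 0 }; },
        initArgs: [],
        accumulate: function (state, value) {
          state.count += value.count;
          state.qty += value.qty;
          return state;
        },
        accumulateArgs: [ "$emit.value" ],
        merge: function (state1, state2) {
          return {
            count: state1.count + state2.count,
            qty: state1.qty + state2.qty
          };
        },
        finalize: function (state) {
          state.avg = state.qty / state.count;
          return state;
        },
        lang: "js"
      } }
  } },
  { $merge: {
      into: "agg_alternative_4",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
];

// Run the pipeline only when a MongoDB shell provides `db`.
if (typeof db !== "undefined") {
  db.orders.aggregate(example2Alternative2);
}
```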
The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
The $unwind stage breaks down the document by the items array field to output a document for each array element.
The $project stage outputs documents with an emit field. The emit field is a document with the fields:
- key, which contains the items.sku value
- value, which contains a document with the qty value and a count value
The $group stage uses the $accumulator operator to add the emitted count and qty and calculate the avg field.
Finally, the $merge stage writes the output to the collection agg_alternative_4. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.