introduction to
mongo aggregation framework
-- By Mrugen. D.
twitter : @mrugenmike
blog: http://computinggreenhorn.blogspot.in/
What's Aggregation ?
Web definitions
Collection: several things grouped together or considered as a whole.OR (More Technical Definition)
Processing of values of multiple rows which are grouped together as input on certain criteria to form a single value.
Aggregation in "SQL" World
O_Id OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
SELECT AVG(OrderPrice) AS OrderAverage FROM Orders
OrderAverage
950
Aggregation in mongo
-> Has Roots in the "Group By" clause from the
SQL .
-> Helps Developers to do operations like:
*Summation Of Values.
*Calculating Averages.
*Projecting and manipulating results.
Framework components
1. *Aggregation Pipeline .
2. Aggregation Operators/Stages.
*Data Does flow through it! :)
Aggregation pipeline
Similar to Unix pipelines:
eg: ls -1 | wc -l

Aggregation Operators
1. $project. 2. $match
3. $limit 4. $skip
5. $unwind 6. $group
7. $sort
$project operator
Characteristics:
-> It helps to selects keys of interest.
-> Helps reshape documents.
-> It has "1:1" mapping for input <-->output
Dataset:

$Project in action
db.products.aggregate([{
$project:{
name:1,
_id:0
}
}]);
Result

$match operator
Characteristics:
-> Helps to filter documents matched by a criteria.
-> Filtered out docs don't make it to next stage.
-> Preferably to be used as early as possible in the pipeline so as to leverage indexes like normal queries.
$match example
db.products.aggregate([{
$match:{
manufacturer:"apple",
}
}]);
$match Result

$limit Operator
-> It limits JSON Documents passing through it.
-> It takes a single numeric value as a parameter.
Example:
db.products.aggregate([{
$limit: 2
}]);
Limit Result

$skip operator
-> Will Skip over specified number of documents.
-> Generally will be used in conjunction with $limit
Example:
db.products.aggregate([{
$skip: 4
}]);
Skip Result

$unwind operator
Characteristics:
-> Peels of the elements of the Array.
-> Will Return a document with Every Element of the unwound Array.
Unwind Scenario
Scenario: Find tag counts in first 100 documents in posts collections.
db.posts.aggregate([{
$limit: 100,
},{$project:{
"tags":1,
"_id":0
}},{
$unwind :"$tags"
},{
$group:{"_id":"$tags","count":{$sum:1}},
}]);Unwind result

$group operator
Characteristics:
--> Groups documents to calculate aggregate values
--> Starts by having an "_id" field, which is holds grouping clause.
$grouping Example
Scenario
We would like to know products per manufacturer
How do we do it
?
Grouping to Rescue
db.products.aggregate([
{
"$group":{"_id":"$manufacturer","count":{$sum:1}}
}]);
Combined Grouping

$sort
Sorts all I/P documents and outputs them in sorted order
Uses a sort key to do so
Can use indexes
establishes1:1 relation between I/P & O/P documents
Sort Scenario
--> We would like sort cities in state of "New York" according to the population in descending order.
db.zips.aggregate([
{
$match:{"state":"NY"}
},
{
$sort:{"pop":-1}
}
])
Aggregation Expressions
$sum: sums values of field ,alternatively increments field value by the specified number.
$avg: Return average values of the field in all documents selected by the group.
$max/min: Return max/min values of the field in all documents selected by the group.
$first/last:Return first/last value encountered in the group.
$push:Return an array of values found in selected field of the group.
$addToSet:Similar to $push but will not take duplicate values in array.
Limitations of aggregation framework
-
Result-set document can only be of max 16 MB.
- Can utilize only 10% of system memory.
- Can slow down applications in a sharded environments.
-> Mongo-MapReduce.
-> Hadoop with mongo connector.
introduction to mongo aggregation framework
By mrugenmike
introduction to mongo aggregation framework
- 1,367
