Introduction to the Mongo Aggregation Framework

-- By Mrugen. D.
   twitter: @mrugenmike
   blog: http://computinggreenhorn.blogspot.in/


What's Aggregation?


Web definition
Collection: several things grouped together or considered as a whole.

OR (more technical definition)

    Processing the values of multiple rows, grouped together by some criterion, to produce a single value.


Aggregation in the "SQL" World


O_Id    OrderDate     OrderPrice    Customer
1       2008/11/12    1000          Hansen
2       2008/10/23    1600          Nilsen
3       2008/09/02    700           Hansen
4       2008/09/03    300           Hansen
5       2008/08/30    2000          Jensen
6       2008/10/04    100           Nilsen


SELECT AVG(OrderPrice) AS OrderAverage FROM Orders

OrderAverage
950
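
The same average can be checked by hand. A minimal sketch in plain JavaScript, with the rows above copied into an array (the variable names are illustrative and not part of any API):

```javascript
// Rows of the Orders table above.
const orders = [
  { O_Id: 1, OrderDate: "2008/11/12", OrderPrice: 1000, Customer: "Hansen" },
  { O_Id: 2, OrderDate: "2008/10/23", OrderPrice: 1600, Customer: "Nilsen" },
  { O_Id: 3, OrderDate: "2008/09/02", OrderPrice: 700, Customer: "Hansen" },
  { O_Id: 4, OrderDate: "2008/09/03", OrderPrice: 300, Customer: "Hansen" },
  { O_Id: 5, OrderDate: "2008/08/30", OrderPrice: 2000, Customer: "Jensen" },
  { O_Id: 6, OrderDate: "2008/10/04", OrderPrice: 100, Customer: "Nilsen" },
];

// AVG(OrderPrice): sum every price, then divide by the number of rows.
const total = orders.reduce((sum, row) => sum + row.OrderPrice, 0);
const orderAverage = total / orders.length;

console.log(orderAverage); // 950
```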

Aggregation in Mongo

-> Has its roots in SQL's "GROUP BY" clause.
-> Helps developers perform operations like:

                      * Summing values.

                      * Calculating averages.

                      * Projecting and manipulating results.

Framework components



   1.     Aggregation Pipeline*

   2.     Aggregation Operators/Stages



 * Data does flow through it!  :)



Aggregation pipeline

Similar to Unix pipelines: the output of each stage becomes the input of the next.
                                 e.g.: ls -1 | wc -l
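
The pipe analogy can be sketched in plain JavaScript. This is only an illustration of the idea, not how MongoDB is implemented; runPipeline, onlyTxt and count are hypothetical names, and the extra filter stage is added just to show chaining:

```javascript
// Each stage takes an array and returns a new array, just as each
// Unix command reads its input and writes its output.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical input standing in for the output of "ls -1".
const files = ["a.txt", "b.txt", "notes.md"];

// Hypothetical stages: a filter, then a counter playing the role of "wc -l".
const onlyTxt = (names) => names.filter((n) => n.endsWith(".txt"));
const count = (names) => [names.length];

const result = runPipeline(files, [onlyTxt, count]);
console.log(result); // [ 2 ]
```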


Aggregation Operators


         1. $project
         2. $match
         3. $limit
         4. $skip
         5. $unwind
         6. $group
         7. $sort








$project operator

Characteristics:

-> Helps to select the keys of interest.

-> Helps to reshape documents.

-> Has a "1:1" mapping between input and output: one document out for every document in.




Dataset:



$project in action


db.products.aggregate([{
    $project: {
        name: 1,
        _id: 0
    }
}]);
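
What this projection does to each document can be sketched in plain JavaScript. The documents below are hypothetical and assumed only for illustration:

```javascript
// Hypothetical product documents, assumed only for illustration.
const products = [
  { _id: 1, name: "iPad", manufacturer: "apple", price: 499 },
  { _id: 2, name: "Galaxy Tab", manufacturer: "samsung", price: 399 },
];

// Equivalent of {$project: {name: 1, _id: 0}}:
// keep "name", drop "_id" and every other key.
const projected = products.map((doc) => ({ name: doc.name }));

console.log(projected); // [ { name: 'iPad' }, { name: 'Galaxy Tab' } ]
```

The output has exactly as many documents as the input, which is the 1:1 mapping in practice.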

Result


$match operator

Characteristics: 

    -> Filters documents, passing on only those that match the given criteria.

    -> Filtered-out docs don't make it to the next stage.

    -> Best used as early as possible in the pipeline, so that it can leverage indexes like a normal query.

$match example


db.products.aggregate([{
    $match: {
        manufacturer: "apple"
    }
}]);
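
As a rough in-memory sketch, this $match behaves like an array filter. The documents below are hypothetical:

```javascript
// Hypothetical product documents, assumed only for illustration.
const products = [
  { name: "iPad", manufacturer: "apple" },
  { name: "Galaxy Tab", manufacturer: "samsung" },
  { name: "iPhone", manufacturer: "apple" },
];

// Equivalent of {$match: {manufacturer: "apple"}}:
// only documents whose manufacturer equals "apple" reach the next stage.
const matched = products.filter((doc) => doc.manufacturer === "apple");

console.log(matched.map((doc) => doc.name)); // [ 'iPad', 'iPhone' ]
```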

$match Result



$limit operator


   -> Limits the number of documents passing through it.

   -> Takes a single numeric value as its parameter.

Example:

db.products.aggregate([{
    $limit: 2
}]);

Limit Result



$skip operator

-> Skips over the specified number of documents.

-> Generally used in conjunction with $limit.

Example:

db.products.aggregate([{
    $skip: 4
}]);
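
Using $skip together with $limit gives simple paging. A minimal in-memory sketch, assuming six hypothetical documents and a pipeline of {$skip: 4} followed by {$limit: 2}:

```javascript
// Six hypothetical documents, assumed only for illustration.
const products = ["p1", "p2", "p3", "p4", "p5", "p6"].map((name) => ({ name }));

// Equivalent of {$skip: 4}: drop the first four documents.
const skipped = products.slice(4);

// Equivalent of {$limit: 2}: let at most two documents through.
const page = skipped.slice(0, 2);

console.log(page.map((doc) => doc.name)); // [ 'p5', 'p6' ]
```

The order of the two stages matters: limiting to 2 first and then skipping 4 would leave nothing.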

Skip Result



$unwind operator

   Characteristics:

     -> Peels off the elements of an array, one at a time.
     -> Returns one document for every element of the unwound array.
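
A minimal in-memory sketch of the unwinding behaviour, using two hypothetical posts:

```javascript
// Hypothetical posts, assumed only for illustration.
const posts = [
  { title: "Post A", tags: ["mongo", "nosql"] },
  { title: "Post B", tags: ["mongo"] },
];

// Equivalent of {$unwind: "$tags"}: emit one copy of the document per
// array element, with "tags" replaced by that single element.
const unwound = posts.flatMap((post) =>
  post.tags.map((tag) => ({ ...post, tags: tag }))
);

console.log(unwound.length); // 3
```

Two input documents become three output documents, one per tag.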


Unwind Scenario


Scenario: Find the tag counts in the first 100 documents of the posts collection.

db.posts.aggregate([{
    $limit: 100
}, {
    $project: {
        "tags": 1,
        "_id": 0
    }
}, {
    $unwind: "$tags"
}, {
    $group: {"_id": "$tags", "count": {$sum: 1}}
}]);

Unwind result



$group operator

Characteristics:

--> Groups documents to calculate aggregate values.
--> Starts with an "_id" field, which holds the grouping key.

$group example

Scenario

 We would like to know the number of products per manufacturer.


How do we do it?

Grouping to the Rescue

db.products.aggregate([{
    "$group": {"_id": "$manufacturer", "count": {$sum: 1}}
}]);
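
The counting done by this $group stage can be sketched in plain JavaScript. The documents are hypothetical; also note that MongoDB does not promise any particular order for the groups, whereas this sketch happens to keep first-seen order:

```javascript
// Hypothetical product documents, assumed only for illustration.
const products = [
  { name: "iPad", manufacturer: "apple" },
  { name: "Galaxy Tab", manufacturer: "samsung" },
  { name: "iPhone", manufacturer: "apple" },
];

// Equivalent of {$group: {_id: "$manufacturer", count: {$sum: 1}}}:
// add 1 to the group's counter for every document that falls into it.
const counts = new Map();
for (const doc of products) {
  counts.set(doc.manufacturer, (counts.get(doc.manufacturer) || 0) + 1);
}

// Shape the result like the documents a $group stage emits.
const grouped = [...counts].map(([_id, count]) => ({ _id, count }));

console.log(grouped);
// [ { _id: 'apple', count: 2 }, { _id: 'samsung', count: 1 } ]
```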

Combined Grouping


$sort

-> Sorts all input documents and outputs them in sorted order.
-> Uses a sort key to do so.
-> Can use indexes when it runs early in the pipeline.
-> Establishes a 1:1 relation between input and output documents.

Sort Scenario

--> We would like to sort the cities in the state of "New York" by population, in descending order.

db.zips.aggregate([{
    $match: {"state": "NY"}
}, {
    $sort: {"pop": -1}
}]);
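
A minimal in-memory sketch of the same match-then-sort idea. The city names and population figures are placeholders, not real data:

```javascript
// Placeholder documents; names and populations are made up.
const zips = [
  { city: "CITY_A", state: "NY", pop: 1200 },
  { city: "CITY_B", state: "CA", pop: 9000 },
  { city: "CITY_C", state: "NY", pop: 5300 },
  { city: "CITY_D", state: "NY", pop: 800 },
];

// Equivalent of {$match: {state: "NY"}} followed by {$sort: {pop: -1}}.
// filter() returns a new array, so sorting it leaves "zips" untouched.
const sorted = zips
  .filter((doc) => doc.state === "NY")
  .sort((a, b) => b.pop - a.pop);

console.log(sorted.map((doc) => doc.city)); // [ 'CITY_C', 'CITY_A', 'CITY_D' ]
```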

Aggregation Expressions

$sum: Sums the values of a field; given a constant such as 1, it adds that number once per document, which is how counting is done.
$avg: Returns the average of the field's values across all documents in the group.
$max/$min: Return the maximum/minimum value of the field across all documents in the group.
$first/$last: Return the first/last value encountered in the group.
$push: Returns an array of all values found in the selected field of the group.
$addToSet: Similar to $push, but does not add duplicate values to the array.
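
To make these expressions concrete, here is a plain JavaScript sketch of what each one would compute over a single group. The values are hypothetical; in MongoDB the element order of an $addToSet result is not guaranteed, whereas the Set used here keeps first-seen order:

```javascript
// Hypothetical values of one field across the documents of one group.
const prices = [499, 199, 499, 999];

const sum = prices.reduce((a, b) => a + b, 0); // like $sum
const avg = sum / prices.length;               // like $avg
const max = Math.max(...prices);               // like $max
const min = Math.min(...prices);               // like $min
const first = prices[0];                       // like $first
const last = prices[prices.length - 1];        // like $last
const pushed = [...prices];                    // like $push: duplicates kept
const addToSet = [...new Set(prices)];         // like $addToSet: duplicates dropped

console.log({ sum, avg, max, min, first, last, pushed, addToSet });
```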


Limitations of aggregation framework

  1. The result-set document can be at most 16 MB.
  2. An aggregation can use only 10% of system memory.
  3. Can slow down applications in sharded environments.

Suggested Alternatives :

            -> Mongo-MapReduce.

            -> Hadoop with mongo connector.
 

