Aggregation Framework Example

(also posted to the 10gen blog here)

In this blog post, you run a concise set of aggregation framework examples in the mongo JavaScript shell against a MongoLab-hosted 2.2 database.  The framework includes the aggregation operators $project, $unwind, $group, and others.  These operators let you calculate values across the documents in a collection, such as averages and sums.  They also let you reshape documents, unpacking nested structures and regrouping them as needed.

The aggregation framework, one of the most powerful and highly anticipated features in the forthcoming production MongoDB 2.2 release, lets you construct a server-side processing pipeline to be run on a collection.  A rich set of operations is available for the pipeline, supporting collection transforms that range from simple multi-document calculations (e.g., sums and averages) to complex projections and pivots.

The framework fits nicely into the range of data manipulation tools available in MongoDB: basic built-in functions like document counts, map-reduce and JavaScript, and custom code and language-specific packages, including Hadoop.

Overview

  1. Create a 2.2 MongoLab database with your own unique name, say <myaggdemo>.  Instructions on how to do that are here. You'll need your mongod username and password.
  2. On your database's home page, copy the mongo shell connection to your clipboard.
  3. git clone git://gist.github.com/1401585.git aggdemo ; cd aggdemo
  4. Edit articles.js and aggregation.js to use your db <myaggdemo> (see below)
  5. mongo <your connection> -u <mongod username> -p <mongod password> articles.js  (inserts the data into your database, 3 documents)
  6. mongo --shell <your connection> -u <mongod username> -p <mongod password> aggregation.js (performs several aggregation examples and leaves you in the mongo shell.)
  7. Type g1 in the mongo shell to see the first $group result discussed below.

(I've tested this with the production 2.0.6 mongo client and the latest development 2.1.2 mongo client.)

Code snippets

articles.js

/* sample articles for aggregation demonstrations */

// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); //Put your MongoLab database name here.
db.article.drop();

db.article.save( {
    title : "this is my title" ,
    author : "bob" ,
    posted : new Date(1079895594000) ,
    pageViews : 5 ,
    tags : [ "fun" , "good" , "fun" ] ,
    comments : [
        { author :"joe" , text : "this is cool" } ,
        { author :"sam" , text : "this is bad" }
    ],
    other : { foo : 5 }
});
//...snip

aggregation.js

// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("aggdb"); //Put your MongoLab database name here.
// ...snip...
// grouping
var g1 = db.runCommand(
    { aggregate : "article", pipeline : [
        { $project : {
            author : 1,
            tags : 1,
            pageViews : 1
        }},
        { $unwind : "$tags" },
        { $group : {
            _id : "$tags",
            docsByTag : { $sum : 1 },
            viewsByTag : { $sum : "$pageViews" },
            mostViewsByTag : { $max : "$pageViews" },
            avgByTag : { $avg : "$pageViews" }
        }}
    ]});
// ...snip

g1 aggregation result

{
	"result" : [
//...snip...
		{
			"_id" : "fun",
			"docsByTag" : 3,
			"viewsByTag" : 17,
			"mostViewsByTag" : 7,
			"avgByTag" : 5.666666666666667
		}
	],
	"ok" : 1
}
  • Props to Chris Westin, 10gen architect for the aggregation framework, for providing these examples.
  • See also his presentation here.

Discussion

The results of the aggregation are saved to convenient variables for examination. The group operations (g1 and g5) at the end of the aggregation.js file are noteworthy because they roll up three operators into a common pivot and aggregation example. The g1 data flow is shown in the accompanying diagram, which is also available as a larger .png and as a .pdf.

  1. Collection -> Intermediate-1: Using the initial Collection of documents as input, g1 first applies $project to keep only the author, tags, and pageViews fields of each document. The output is shown in Intermediate-1.
  2. Intermediate-1 -> Intermediate-2: g1 then $unwinds Intermediate-1 on the embedded tags array so that each tag instance gets its own document. The output is shown in Intermediate-2.
  3. Intermediate-2 -> Result: Finally, g1 uses the $group operator to produce one document per distinct tag, calculating statistics such as total and average page views. The output is shown as Result.

(Note that both Intermediate forms are internal to the processing engine and are not visible to the shell directly; Intermediate-2 is actually shown as example p2.)
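As a rough illustration of the three steps above, here is a plain-JavaScript sketch that imitates $project, $unwind, and $group on in-memory objects. It is not how the server implements the pipeline, and it needs no database. The helper names (project, unwind, groupByTag) are made up for this sketch. The first document comes from articles.js; the second is a hypothetical stand-in whose 7 page views are inferred from the "fun" result shown earlier (17 - 5 - 5 = 7).

```javascript
// Imitation of the g1 stages on plain objects; runs in node or the mongo shell.

// First document: from articles.js. Second document: hypothetical stand-in
// (author and tag list are assumptions, pageViews inferred from the g1 result).
var articles = [
    { _id: 1, title: "this is my title", author: "bob", pageViews: 5,
      tags: ["fun", "good", "fun"] },
    { _id: 2, title: "hypothetical second article", author: "alice", pageViews: 7,
      tags: ["fun"] }
];

// $project: keep only the listed fields (_id is retained by default).
function project(docs, fields) {
    return docs.map(function (doc) {
        var out = { _id: doc._id };
        fields.forEach(function (f) { out[f] = doc[f]; });
        return out;
    });
}

// $unwind: emit one copy of the document per element of the array field.
function unwind(docs, field) {
    var out = [];
    docs.forEach(function (doc) {
        doc[field].forEach(function (value) {
            var copy = {};
            Object.keys(doc).forEach(function (k) { copy[k] = doc[k]; });
            copy[field] = value;
            out.push(copy);
        });
    });
    return out;
}

// $group: bucket the unwound documents by tag and compute the g1 statistics.
function groupByTag(docs) {
    var groups = {};
    docs.forEach(function (doc) {
        var g = groups[doc.tags] ||
            (groups[doc.tags] = { _id: doc.tags, docsByTag: 0, viewsByTag: 0,
                                  mostViewsByTag: -Infinity });
        g.docsByTag += 1;
        g.viewsByTag += doc.pageViews;
        g.mostViewsByTag = Math.max(g.mostViewsByTag, doc.pageViews);
    });
    return Object.keys(groups).map(function (tag) {
        var g = groups[tag];
        g.avgByTag = g.viewsByTag / g.docsByTag;
        return g;
    });
}

var intermediate1 = project(articles, ["author", "tags", "pageViews"]);
var intermediate2 = unwind(intermediate1, "tags");
var result = groupByTag(intermediate2);
```

With this sample data, the "fun" entry of result carries the same numbers as the g1 output above: three unwound documents, 17 total page views, a maximum of 7.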

For another example, you can look at g5. It also pivots on the embedded tag arrays but this time rolls up authors as embedded arrays using $addToSet, essentially completing the pivot.
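The g5 pipeline itself is not reproduced in this post, so the following is only a plain-JavaScript sketch of what a $group stage using $addToSet does to unwound documents: each author is collected at most once per tag. The helper name groupAuthorsByTag and the author "alice" are made up for illustration.

```javascript
// Unwound documents, one per tag instance, shaped like Intermediate-2.
// "bob" comes from articles.js; "alice" is a hypothetical second author.
var unwoundDocs = [
    { author: "bob",   tags: "fun" },
    { author: "bob",   tags: "good" },
    { author: "bob",   tags: "fun" },
    { author: "alice", tags: "fun" }
];

// Imitates a stage like:
//   { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
function groupAuthorsByTag(docs) {
    var byTag = {};
    docs.forEach(function (doc) {
        var authors = byTag[doc.tags] || (byTag[doc.tags] = []);
        // Set semantics: add the author only if not already present.
        if (authors.indexOf(doc.author) === -1) {
            authors.push(doc.author);
        }
    });
    return Object.keys(byTag).map(function (tag) {
        return { _id: tag, authors: byTag[tag] };
    });
}

var authorsByTag = groupAuthorsByTag(unwoundDocs);
```

Here "bob" appears only once under "fun" even though two of his unwound documents carry that tag. In MongoDB, the order of elements produced by $addToSet is unspecified, so do not rely on the ordering this sketch happens to give.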

NB: There's a slight bug in the design of the g1 aggregation.  The first object has the "fun" tag twice.  I intentionally chose this one as it shows how the $unwind duplicates "fun" in the Intermediate-2 output for the first document, meaning that its aggregates are counted twice.  A free MongoLab T-shirt to the first person who can correct the code to properly calculate the aggregates.  Enter in the comments.  (@cwestin63, you're disqualified; you get a T-shirt anyway)
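To see the size of the effect, here is a small plain-JavaScript sketch that computes the "fun" statistics twice: once as g1 does, and once with repeated tags within a document collapsed first. It only illustrates the arithmetic and is not a pipeline fix. The helper statsForTag is made up, and the second document is a stand-in inferred from the g1 result (17 - 5 - 5 = 7 page views); its tag list is an assumption.

```javascript
// First entry: from articles.js. Second entry: inferred stand-in (assumed tags).
var docsWithTags = [
    { pageViews: 5, tags: ["fun", "good", "fun"] },
    { pageViews: 7, tags: ["fun"] }
];

// Count documents and sum page views for one tag. When dedupe is true,
// repeated tags inside a single document are collapsed before counting.
function statsForTag(docs, tag, dedupe) {
    var count = 0, views = 0;
    docs.forEach(function (doc) {
        var tags = dedupe
            ? doc.tags.filter(function (t, i) { return doc.tags.indexOf(t) === i; })
            : doc.tags;
        tags.forEach(function (t) {
            if (t === tag) {
                count += 1;
                views += doc.pageViews;
            }
        });
    });
    return { docsByTag: count, viewsByTag: views, avgByTag: views / count };
}

var asPosted = statsForTag(docsWithTags, "fun", false);
// asPosted: docsByTag 3, viewsByTag 17 (the first document is counted twice)

var deduped = statsForTag(docsWithTags, "fun", true);
// deduped: docsByTag 2, viewsByTag 12, avgByTag 6
```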

Summary

The MongoDB 2.2 Aggregation Framework is a powerful mechanism that can help you answer questions across documents. You can try it out with minimal risk by using the MongoLab hosted experimental service. Happy aggregating!

(Update 2012-07-10 untabify indentation in aggregation.js for proper formatting. 2012-07-11 Re-arranged images.  2012-09-07 to reference 2.2)

About benwen

I'm MongoLab's VP of Sales and Marketing. And I'm here to serve our customers' needs for MongoDB hosting in the cloud.

Responses to Aggregation Framework Example

  1. Shawn Brownfield 2012/08/07 at 4:35 pm #

    The fix for g1 aggregation is pretty easy.  Add these operations to the pipeline after the unwind:


    { $group : {
        _id : "$_id",
        author : { $first : "$author" },
        tags : { $addToSet : "$tags" },
        pageViews : { $first : "$pageViews" }
    }},
    { $unwind : "$tags" }

    The group command will undo the unwind on tags, but will remove duplicates ($addToSet).  We then unwind again, and we’re ready to go.

    It’d be nice to have this sort of de-duplication built in, but it isn’t too bad to do.

  2. benwen 2012/08/07 at 5:04 pm #

    Yep, that works!  email me ben at mongolab.com to claim your T-shirt.

  3. Niko Schmuck 2013/02/23 at 5:49 am #

    Fantastic article making the matter of the aggregation pipeline clear!

    Unfortunately, your webpage seems to have garbled the code snippets (articles.js and following); they contain a lot of extra HTML encoding of the brackets ().

    The referenced code seems to also be available from https://gist.github.com/cwestin/1401585

  4. MongoLab 2013/02/23 at 9:46 am #

    Glad you like the article.

    Thanks Niko, I’m looking into the bug.  (seems to be a change in our WordPress environment) -Ben

  5. Abhishek Vaid 2013/02/28 at 12:28 pm #

    Does anyone know how I can access the whole MongoDocument in the aggregation pipeline?
