Analyze MongoLab Data with Hadoop in Mortar

The following is a guest post by Doug Daniels, CTO of Mortar Data Inc.

Today, we're excited to announce integration between MongoLab and Mortar, the Hadoop platform for high-scale data science. If you have one of the 100,000+ databases at MongoLab, you can now seamlessly use Hadoop to:

  • Run advanced algorithms (like recommendation engines)
  • Build reports that run quickly in parallel against large collections
  • Join multiple collections (and outside data) together for analysis
  • Store results to Google Drive, back to MongoLab, or many other destinations

In this article we'll show you how to connect your MongoLab database to Hadoop, and then use Hadoop to do something simple but very useful: gather schema information from an entire collection, including histograms of common values, data types, and more. Mortar handles all deployment, monitoring and cluster management, so no prior knowledge of Hadoop is required.

Quick Start

We first need to connect your MongoLab database to Mortar and Hadoop. If you haven't already, head over to the MongoLab sign-up page to create an account. After completing the form, you can immediately begin to provision new databases. Make sure that you choose the AWS us-east-1 datacenter for your MongoDB.

If you're unsure which plan is right for you, visit the MongoLab plans page or email the MongoLab team at support@mongolab.com.

Next, log in to your MongoLab console. For this tutorial, we'll be using a replica set cluster and will connect to a secondary node. We recommend using a secondary node for analytics so that your jobs don't compete with regular traffic on the primary node (which can degrade performance). For a deeper dive and alternate connection strategies, see the full Mongo-->Hadoop tutorial.

In your MongoLab console, open up the MongoDB cluster and database you'd like to process with Hadoop.

Click on the Users tab for that database. Add a new user that you can use to connect to the database. We'll call ours "mortar_user". If you want to save results back to the database, make sure the user has write privileges.

Next, sign up for a free account at Mortar. If you don't mind your project being public, stick with the free Public plan. If you need your project to be private, grab a free 7-day trial on the Solo plan.

Install Mortar and Connect to MongoLab

Now that your account is set up, use Mortar's installer to set up your workstation with everything you need to run and deploy Hadoop and Pig jobs.

Next, use Mortar to fork an example project for working with Mongo data:

  mortar projects:fork git@github.com:mortardata/mortar-mongo-examples.git <your_project_name_goes_here>

Now, grab the standard Mongo URI connection details for your database from MongoLab. If you have a Replica Set, use the credentials for the secondary node to keep traffic off the primary.

You can get the Mongo URI by clicking on your MongoLab cluster and then choosing the Servers tab. If you have a secondary node, choose that one from the list. Then select the database underneath it that you'd like to analyze.

At the top of the page, you'll see a box that says "To connect using a driver via the standard URI". Grab your database's Mongo URI from there, and fill in the missing <dbuser> and <dbpassword> with the user credentials you created above.
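
If your password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before they go into a MongoDB connection URI. The following is a minimal Python sketch of filling in the two placeholders safely. The host name, port and database name in the template are hypothetical stand-ins, and `fill_mongo_uri` is an illustrative helper, not part of Mortar or MongoLab.

```python
from urllib.parse import quote_plus


def fill_mongo_uri(template: str, user: str, password: str) -> str:
    """Substitute credentials into a URI template containing
    the <dbuser> and <dbpassword> placeholders."""
    # Percent-encode the credentials so characters such as '@', ':' or '/'
    # cannot be mistaken for URI delimiters.
    return (
        template
        .replace("<dbuser>", quote_plus(user))
        .replace("<dbpassword>", quote_plus(password))
    )


# Hypothetical template; copy the real one from your MongoLab console.
TEMPLATE = "mongodb://<dbuser>:<dbpassword>@example-host.mongolab.com:12345/twitter"

uri = fill_mongo_uri(TEMPLATE, "mortar_user", "p@ss:word/1")
print(uri)
```

The printed value is what you would paste between the quotes in the `mortar config:set` command.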

With your filled-out URI in hand, set the configuration for your Mortar project to point to your MongoLab server by running:

  cd <my_project_folder>
  mortar config:set MONGO_URI='put your Mongo URI here'

This will store your encrypted configuration at Mortar for running jobs against MongoLab.

Run a Small Hadoop Job Locally

Now we're ready to run our first Hadoop job against Mongo. As an example, we'll run an Apache Pig script that connects to a collection in your database and emits statistics about every field in the collection. We'll run this script on your local computer first, so choose a small collection! Otherwise, you'll spend a lot of time trying to stream data from your MongoLab database to your local computer. We'll try larger collections when we run in the cloud next.

In your project directory, open the params/characterize-local.params file. Change INPUT_COLLECTION to a small collection you'd like to see stats on, and OUTPUT_COLLECTION to where you'd like the stats delivered. Then run:

  mortar local:run pigscripts/characterize_collection.pig -f params/characterize-local.params

This will first download all of the dependencies you need to run a Pig job into a local sandbox for your project. Once that completes, it will do a local run of the characterize_collection Pig script against your input collection.

When finished, you'll have a new Mongo document in your output collection with detailed information about each field in the input collection, including the number of unique values in the field, example values and predicted data types.
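
To make the idea concrete, here is a rough Python sketch of the kind of per-field summary described above: unique-value counts, example values and a predicted type. It is not the logic of the characterize_collection Pig script. The function name, the output keys and the sample documents are all made up for illustration, and it only looks at top-level fields of in-memory dictionaries.

```python
from collections import Counter, defaultdict


def characterize(documents, max_examples=3):
    """Summarize each top-level field across a list of dict-like documents."""
    values = defaultdict(Counter)  # field -> counts of each distinct value
    types = defaultdict(Counter)   # field -> counts of each Python type name

    for doc in documents:
        for field, value in doc.items():
            # repr() lets unhashable values (lists, dicts) be counted too.
            values[field][repr(value)] += 1
            types[field][type(value).__name__] += 1

    summary = {}
    for field in values:
        summary[field] = {
            "unique_values": len(values[field]),
            # Most frequent values serve as examples.
            "examples": [v for v, _ in values[field].most_common(max_examples)],
            # The most frequently seen type is taken as the predicted type.
            "predicted_type": types[field].most_common(1)[0][0],
        }
    return summary


# Made-up sample documents standing in for a small collection.
sample = [
    {"user": "alice", "followers": 10, "lang": "en"},
    {"user": "bob", "followers": 25, "lang": "en"},
    {"user": "carol", "followers": 10},
]

report = characterize(sample)
for field, stats in report.items():
    print(field, stats)
```

A single-machine loop like this is fine for a handful of documents; the point of running the job in Hadoop is that the same kind of counting can be spread across a cluster for large collections.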

Run a Full Hadoop Job in the Cloud

Running locally is fine for smaller datasets, but to process larger data, we'll want to use the full power of a Hadoop cluster. With Mortar, one command deploys a snapshot of your code to a private GitHub repository, launches a private AWS Elastic MapReduce Hadoop cluster, and runs your code at scale.

Let's try it out. Open up the params/characterize-cloud.params file. Set the INPUT_COLLECTION parameter to a larger collection that you'd like to analyze. Set the OUTPUT_COLLECTION to either the same output you used before or a new collection.

Now, run:

  mortar run pigscripts/characterize_collection.pig -f params/characterize-cloud.params --clustersize 3

This will validate your script, launch a new private, 3-node Hadoop cluster on AWS Spot Instances, and analyze your collection. Cluster startup will take about 10-15 minutes, and the job should cost less than $0.40 for the whole hour on 3 machines--Mortar passes AWS cluster costs directly back with no up-charge.

When you start your job, Mortar will print out a job URL. Open it up, and you'll see real-time progress tracking, logs, and visualization for your job.

When your job finishes, your results will be ready to view in the output collection you chose.

What's Next?

The example we ran is a fairly simple one. You'll want to go deeper on your own data--bringing in multiple collections, joining and aggregating, and using your own code. Our Mongo --> Hadoop tutorial will step you through the process, showing you how to work with your MongoLab data in Hadoop and Pig.

Mortar also has a growing number of open-source data apps pre-built on top of the platform, such as recommendation engines and Google Drive / Data Hero dashboards. We're quickly adding more, but if your use case isn't yet available, we have tutorials to help build your own data app.

If you have any questions about getting your data connected, contact us @mortardata or drop a question to our Q&A Forum.

14 Responses to Analyze MongoLab Data with Hadoop in Mortar

  1. katopz 2015/06/09 at 10:07 pm #

    http://app.mortardata.com seems to be down (http://www.isup.me/http://app.mortardata.com). The help page seems to have a problem with Heroku SSL, and the main page seems to have a problem with the www redirect, which appears to be down too (http://www.isup.me/www.mortardata.com).
