Harnessing the power of Twitter and MongoDB

Hey Mongoers!

I recently had the pleasure of joining the MongoLab team.  I share this with you for two reasons. First, you can too (we're hiring!). Second, I remember when I first heard about MongoDB: I created an account on MongoLab and thought... now what?


With open-source technologies proliferating as "Big Data" and analytics explode, we thought it would be helpful to give our users and friends a script that takes care of the nitty-gritty and lets them explore what makes MongoDB great.  We're excited to present Twitter-Harvest, a Python script that uses the Twitter REST API v1.1 to retrieve tweets from a user's timeline and insert them into a MongoDB database.
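For the curious, here is roughly how harvesting a timeline like this can work. This is a minimal sketch and not the actual Twitter-Harvest source: `fetch_page` and `store` are hypothetical stand-ins for a call to the v1.1 user timeline endpoint and a MongoDB insert, and the paging assumes the `max_id` convention that Twitter documents for timelines (ask for tweets with an ID no greater than `max_id`).

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]
# Hypothetical callback: fetch_page(max_id, count) returns tweets, newest first.
FetchPage = Callable[[Optional[int], int], List[Tweet]]


def harvest(fetch_page: FetchPage, store: Callable[[Tweet], None],
            num_tweets: int, page_size: int = 200) -> int:
    """Page backwards through a timeline until num_tweets are stored."""
    stored = 0
    max_id: Optional[int] = None  # None means "start from the newest tweet"
    while stored < num_tweets:
        remaining = num_tweets - stored
        page = fetch_page(max_id, min(page_size, remaining))
        if not page:
            break  # the timeline has no older tweets left
        for tweet in page[:remaining]:
            store(tweet)
            stored += 1
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(int(t["id"]) for t in page) - 1
    return stored


def make_fake_timeline(total: int) -> FetchPage:
    """Build an in-memory timeline so the sketch runs without network access."""
    tweets = [{"id": i, "text": f"tweet {i}"} for i in range(total, 0, -1)]

    def fetch_page(max_id: Optional[int], count: int) -> List[Tweet]:
        eligible = [t for t in tweets if max_id is None or t["id"] <= max_id]
        return eligible[:count]

    return fetch_page
```

In a real run, `store` could be something like a PyMongo collection's `insert_one`, and `fetch_page` would wrap an authenticated HTTP request; both are left out here to keep the sketch self-contained.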

Quick Demo

Update 5/8/14 11:40 AM: The Twitter credentials previously provided in the gist below no longer work (we've been rate limited!). Please go to Twitter's Dev Center to create your own set of credentials.

The details on installation and running the app are located on this GitHub repo. For the impatient, I empathize... we've provided some Twitter credentials and an out-of-the-box command that you can run to see that everything works. After you have downloaded/unzipped the repo, run:

Straight out of the box, you'll notice that the script prints every tweet it harvests to your console.  Peruse the help docs and pass arguments accordingly; most notably, you'll want to tack on a MongoDB URI using the --db flag so that you can store the tweets in your database.  Also keep in mind that if you'd like to use this script more than once, you should obtain your own Twitter credentials for security and rate-limiting reasons.

Diving in

Once you have the necessary modules set up, you'll notice that the run script has quite a few options. Note that Twitter OAuth credentials are required. To help you store the harvested tweets, you can create a free Sandbox database with us! We have included the following options that we thought would be popular with users:

  • harvesting native retweets (-r)
  • printing each tweet the program iterates over (-v)
  • supplying a MongoDB URI so that tweets are inserted into a MongoDB database (--db)
  • setting the number of tweets to be harvested (--numtweets)
  • choosing the user timeline to harvest from; the default is mongolab (--user)

So, let's say I want to harvest and print 100 of @mongolab's tweets (and retweets). The command and arguments would be:

Just like that, we have 100 tweets in a collection called "mongolab".
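As a rough illustration, an invocation along those lines could be assembled from Python as follows. This is a sketch under assumptions: the helper name `build_harvest_command` is made up, the credentials are placeholders you must replace with your own, and the flag spellings are taken from the option list above and from the usage output quoted in the comments below.

```python
import shlex
from typing import List, Optional


def build_harvest_command(consumer_key: str, consumer_secret: str,
                          access_token: str, access_secret: str,
                          user: str = "mongolab", num_tweets: int = 100,
                          db_uri: Optional[str] = None,
                          retweets: bool = True,
                          verbose: bool = True) -> List[str]:
    """Assemble an argument list for twitter-harvest.py (hypothetical helper)."""
    cmd = [
        "python", "twitter-harvest.py",
        "--consumer-key", consumer_key,
        "--consumer-secret", consumer_secret,
        "--access-token", access_token,
        "--access-secret", access_secret,
    ]
    if retweets:
        cmd.append("-r")  # include native retweets
    if verbose:
        cmd.append("-v")  # print each tweet as it is harvested
    cmd += ["--numtweets", str(num_tweets), "--user", user]
    if db_uri:
        cmd += ["--db", db_uri]  # store the tweets instead of only printing
    return cmd


if __name__ == "__main__":
    # Placeholder credentials and a local database URI, for illustration only.
    command = build_harvest_command(
        "YOUR_CONSUMER_KEY", "YOUR_CONSUMER_SECRET",
        "YOUR_ACCESS_TOKEN", "YOUR_ACCESS_SECRET",
        db_uri="mongodb://localhost:27017/mydb",
    )
    print(" ".join(shlex.quote(part) for part in command))
```

Passing the resulting list to `subprocess.run` from the unzipped repo directory would run the script; keeping the arguments as a list avoids shell-quoting problems with URIs that contain special characters.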

To help you along, we also have help documentation available:

Now, onto the fun stuff. Let's see what interesting data or projects you can come up with using this tool!

We challenge you!

In case you're stumped, here are a few challenges we've thought up that really highlight both Twitter's vast array of information and MongoDB's features.

1. Compile a list of "successful" tweets (retweeted and/or favorited) and return only a few of the fields. Hint: Aggregation Framework

2. Harvest from a variety of users (friends, family, athletes) and see who has tweeted near you and with what frequency. Hint: Geospatial Indexes

3. Experiment with text indexes (after all, tweets are text) and examine your queries. Can you make them faster?  Hint: Text Search + Cursor Explain

4. Use this as an example to set up a public stream, which is great for data mining! Hint: Twitter Public Streams
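If you'd like a head start on the first three challenges, here is a minimal sketch of the query documents involved. It assumes the harvested tweets are stored as raw v1.1 tweet JSON, so that fields such as `retweet_count`, `favorite_count`, `text` and `coordinates` (a GeoJSON point) are present. The helper names are made up for illustration, and the `$text` operator needs a MongoDB version that supports it.

```python
from typing import Any, Dict, List


def successful_tweets_pipeline(min_count: int = 1,
                               limit: int = 20) -> List[Dict[str, Any]]:
    """Challenge 1: aggregation stages for retweeted or favorited tweets."""
    return [
        {"$match": {"$or": [
            {"retweet_count": {"$gte": min_count}},
            {"favorite_count": {"$gte": min_count}},
        ]}},
        # Return only a handful of fields instead of the whole tweet.
        {"$project": {"_id": 0, "text": 1, "created_at": 1,
                      "retweet_count": 1, "favorite_count": 1}},
        {"$sort": {"retweet_count": -1}},
        {"$limit": limit},
    ]


def tweets_near_query(longitude: float, latitude: float,
                      max_meters: float) -> Dict[str, Any]:
    """Challenge 2: tweets within max_meters of a point (GeoJSON is lon, lat)."""
    return {"coordinates": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "$maxDistance": max_meters,
    }}}


def text_search_query(terms: str) -> Dict[str, Any]:
    """Challenge 3: full-text search over whichever fields carry a text index."""
    return {"$text": {"$search": terms}}
```

With PyMongo, the first would be passed to `collection.aggregate(...)` and the other two to `collection.find(...)`. The geospatial query expects a `2dsphere` index on `coordinates` and the text query expects a text index on `text`; calling `.explain()` on the resulting cursor shows whether an index was actually used.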

Happy coding, and be sure to keep us posted on your projects. We're always here to help!



*special thanks to our Swedish friend Gustav Arngården @arngarden over at @aitellu for the harvesting idea!


27 Responses to Harnessing the power of Twitter and MongoDB

  1. Tomas 2014/02/04 at 11:09 am #

    Really helpful! Thanks.

  2. Chris Chang 2014/02/04 at 11:25 am #

    Glad you liked it!

  3. SUSHANT 2014/07/21 at 5:31 am #

    Please help me with the database argument. There is no password for the database, and the user is ‘hduser’. I am getting the following error:

    hduser@sush-comp:~/my/mongodb/software/twitter-mongolab-w/twitter-harvest-master$ python twitter-harvest.py --consumer-key consumer-key --consumer-secret consumer-secret --access-token access-token --access-secret access-secret --db mongodb:///localhost:27017/mydb --dbuser:dbpassword@dbhnn.mongolab.com:port/dbname
    usage: twitter-harvest.py [-h] [-r] [-v] [--numtweets NUMTWEETS] [--user USER]
    [--db DB] --consumer-key CONSUMER_KEY
    --consumer-secret CONSUMER_SECRET --access-token
    twitter-harvest.py: error: unrecognized arguments: --dbuser:dbpassword@dbhnn.mongolab.com:port/dbname

  4. Chris Chang 2014/07/21 at 1:31 pm #

    Hi there Sushant,

    If you’re using a MongoDB database on MongoLab, you’re required to have a database password. If you write in to support@mongolab.com, we’d be happy to help you figure this out!

    Alternatively, if you’re running on a local MongoDB you shouldn’t need a username/pass.

  5. Ben 2014/11/07 at 7:29 pm #

    Hey Chris, I am using Twitter-Harvest for a data science project for my course.
    I was wondering how I can pass multiple values to the --user argument. I know that the Twitter API allows requests for multiple users; however, I am not sure how I can do this with Twitter-Harvest.

    I am looking forward to sharing my project insights once it is finished.



  6. Chris Chang 2014/11/12 at 2:29 pm #

    Hi Ben,

    Out of the box, the script doesn’t support multiple values for the --user argument, but you can modify the code to take multiple user arguments and generate multiple urls (see this line): https://github.com/mongolab/twitter-harvest/blob/master/twitter-harvest.py#L71

    If you have any more questions feel free to email us at support@mongolab.com


  7. Ben 2014/11/13 at 12:11 am #

    Great, I’ll go for this! Thanks again!

  8. Ben 2014/11/13 at 1:49 am #

    I just finished a quick fix on my fork https://github.com/BenCoDev/twitter-harvest

  9. Zkk 2016/05/24 at 1:13 am #

    Can anyone help me figure out a problem that has puzzled me for several days? Have you ever heard of LabVIEW? How can I read data from MongoDB in LabVIEW?



  1. Harnessing the power of Twitter and MongoDB | thoughts... - 2013/08/08

    […] http://blog.mongolab.com/2013/08/harnessing-the-power-of-twitter-and-mongodb/ […]
