Harnessing the power of Twitter and MongoDB

Hey Mongoers!

I recently had the pleasure of joining the MongoLab team. I share this with you for two reasons. First, you can too (we're hiring!). Second, I remember that when I first heard about MongoDB, I created an account on MongoLab and thought... now what?


With open-source technologies proliferating as "Big Data" and analytics explode, we thought it would be useful to give our users and friends a script that takes care of the nitty-gritty and lets them explore what makes MongoDB great. We're excited to present Twitter-Harvest, a Python script that uses the Twitter REST API v1.1 to retrieve tweets from a user's timeline and insert them into a MongoDB database.
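To give a feel for what "retrieve tweets from a user's timeline" involves, here is a minimal sketch of how a timeline request can be built and paged. It is not taken from the Twitter-Harvest source: the helper names `timeline_url` and `next_max_id` are hypothetical, and a real request also has to be signed with your OAuth credentials, which this sketch leaves out.

```python
from urllib.parse import urlencode

# Twitter REST API v1.1 endpoint for a user's timeline.
TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"


def timeline_url(user, count=200, include_rts=False, max_id=None):
    """Build one user_timeline request URL (hypothetical helper)."""
    params = {
        "screen_name": user,
        "count": min(count, 200),  # the API returns at most 200 tweets per page
        "include_rts": "true" if include_rts else "false",
    }
    if max_id is not None:
        # Only return tweets with an ID less than or equal to max_id.
        params["max_id"] = max_id
    return TIMELINE_URL + "?" + urlencode(params)


def next_max_id(page):
    """max_id for the following page: one below the oldest tweet already seen."""
    return min(tweet["id"] for tweet in page) - 1


print(timeline_url("mongolab", count=100, include_rts=True))
```

Paging works by requesting a page, computing `next_max_id` from it, and passing that value as `max_id` on the next request until enough tweets have been collected.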

Quick Demo

Update 5/8/14 11:40 AM: The Twitter credentials previously provided in the gist below no longer work (we've been rate limited!). Please go to Twitter's Dev Center to create your own set of credentials.

The details on installation and running the app are located on this GitHub repo. For the impatient, I empathize... we've provided some Twitter credentials and an out-of-the-box command that you can run to see that everything works. After you have downloaded/unzipped the repo, run:

Straight out of the box, the script prints every tweet it harvests to your console. Peruse the help docs and pass arguments accordingly. Most notably, you'll want to tack on a MongoDB URI using the --db flag so that the tweets are stored in your database. Also keep in mind that if you'd like to use this script more than once, you should obtain your own Twitter credentials for security and rate-limiting reasons.

Diving in

Once you have the necessary modules set up, you'll notice that the run script has quite a few options. Twitter OAuth credentials are required. To help you store the harvested tweets, you can create a free Sandbox database with us! We have included the following options that we thought would be popular with users:

  • harvesting native retweets (-r)
  • printing each tweet the program iterates over (-v)
  • MongoDB URI of the database to insert tweets into (--db)
  • setting the number of tweets to be harvested (--numtweets)
  • user timeline that you would like to harvest from; the default is mongolab (--user)

So, let's say I want to harvest and print 100 of @mongolab's tweets (and retweets). The command and arguments would be:

Just like that, we have 100 tweets in a collection called "mongolab".
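If it helps to see how these flags fit together, here is a minimal `argparse` sketch of a command line with the same option names. It is an illustration rather than the script's actual source: the `dest` names, the help strings, the placeholder credentials, and the local MongoDB URI are all assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the flags named in this post (a sketch, not the script's source)."""
    parser = argparse.ArgumentParser(prog="twitter-harvest.py")
    parser.add_argument("-r", action="store_true", dest="retweets",
                        help="harvest native retweets")
    parser.add_argument("-v", action="store_true", dest="verbose",
                        help="print each tweet as it is harvested")
    parser.add_argument("--numtweets", type=int,
                        help="number of tweets to harvest")
    parser.add_argument("--user", default="mongolab",
                        help="timeline to harvest from")
    parser.add_argument("--db", help="MongoDB URI to insert tweets into")
    # The four OAuth credentials are required on every run.
    for flag in ("--consumer-key", "--consumer-secret",
                 "--access-token", "--access-secret"):
        parser.add_argument(flag, required=True)
    return parser


# Arguments matching the "100 tweets and retweets from @mongolab" example,
# with placeholder credentials and an assumed local database URI.
args = build_parser().parse_args([
    "-r", "-v", "--numtweets", "100", "--user", "mongolab",
    "--db", "mongodb://localhost:27017/mydb",
    "--consumer-key", "KEY", "--consumer-secret", "SECRET",
    "--access-token", "TOKEN", "--access-secret", "TOKEN_SECRET",
])
print(args.numtweets, args.user, args.retweets, args.verbose)
# prints: 100 mongolab True True
```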

To help you along, we also have help documentation available:

Now, onto the fun stuff. Let's see what interesting data or projects you can come up with using this tool!

We challenge you!

In case you're stumped, here are a few challenges we've thought up that highlight both Twitter's vast array of information and MongoDB's features.

1. Compile a list of "successful" tweets (retweeted and/or favorited) and return only a few of the fields. Hint: Aggregation Framework

2. Harvest from a variety of users (friends, family, athletes) and see who has tweeted near you and with what frequency. Hint: Geospatial Indexes

3. Experiment with text indexes (after all, tweets are text) and examine your queries. Can you make them faster?  Hint: Text Search + Cursor Explain

4. Use this as an example to set up a public stream, which is great for data mining! Hint: Twitter Public Streams
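If you want a nudge on the first three challenges, the sketch below builds the query documents as plain Python dictionaries. It assumes your harvested documents keep the standard v1.1 tweet fields (`text`, `retweet_count`, `favorite_count`, and a GeoJSON `coordinates` point), and the helper names are hypothetical. Treat it as one possible starting point, not the intended answer.

```python
# Challenge 1: keep tweets that were retweeted or favorited, return a few fields.
successful_pipeline = [
    {"$match": {"$or": [{"retweet_count": {"$gt": 0}},
                        {"favorite_count": {"$gt": 0}}]}},
    {"$project": {"_id": 0, "text": 1, "retweet_count": 1, "favorite_count": 1}},
    {"$sort": {"retweet_count": -1}},
    {"$limit": 10},
]


def near_query(longitude, latitude, max_meters):
    """Challenge 2: $near filter on the tweet's GeoJSON `coordinates` field."""
    return {
        "coordinates": {
            "$near": {
                "$geometry": {"type": "Point",
                              "coordinates": [longitude, latitude]},
                "$maxDistance": max_meters,
            }
        }
    }


def text_query(terms):
    """Challenge 3: $text filter; requires a text index on the `text` field."""
    return {"$text": {"$search": terms}}


# With pymongo and a collection handle `tweets` (both assumed here), these
# would be used roughly as follows:
#   tweets.aggregate(successful_pipeline)
#   tweets.create_index([("coordinates", "2dsphere")])
#   tweets.find(near_query(-122.4, 37.8, 5000))
#   tweets.create_index([("text", "text")])
#   tweets.find(text_query("mongodb")).explain()
```

Comparing the `explain()` output before and after creating an index is a simple way to see whether a query got faster.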

Happy coding, and be sure to keep us posted on your projects. We're always here to help!



*special thanks to our Swedish friend Gustav Arngården @arngarden over at @aitellu for the harvesting idea!


11 Responses to Harnessing the power of Twitter and MongoDB

  1. Tomas 2014/02/04 at 11:09 am #

    Really helpful! Thanks.

  2. Chris Chang 2014/02/04 at 11:25 am #

    Glad you liked it!

  3. SUSHANT 2014/07/21 at 5:31 am #

    Please help me with the database argument. There is no password for the database and the user is ‘hduser’. I am getting the following error:

    hduser@sush-comp:~/my/mongodb/software/twitter-mongolab-w/twitter-harvest-master$ python twitter-harvest.py --consumer-key consumer-key --consumer-secret consumer-secret --access-token access-token --access-secret access-secret --db mongodb:///localhost:27017/mydb --dbuser:dbpassword@dbhnn.mongolab.com:port/dbname
    usage: twitter-harvest.py [-h] [-r] [-v] [--numtweets NUMTWEETS] [--user USER]
    [--db DB] --consumer-key CONSUMER_KEY
    --consumer-secret CONSUMER_SECRET --access-token
    twitter-harvest.py: error: unrecognized arguments: --dbuser:dbpassword@dbhnn.mongolab.com:port/dbname

  4. Chris Chang 2014/07/21 at 1:31 pm #

    Hi there Sushant,

    If you’re using a MongoDB database on MongoLab, you’re required to have a database password. If you write in to support@mongolab.com we’d be happy to help you figure this out!

    Alternatively, if you’re running on a local MongoDB you shouldn’t need a username/pass.

  5. Ben 2014/11/07 at 7:29 pm #

    Hey Chris, I am using Twitter-Harvest for a data science project for my course.
    I was wondering how I can pass multiple values to the --user argument. I know that the Twitter API allows requests for multiple users; however, I am not sure how to do this with Twitter-Harvest.

    I am looking forward to sharing my project insights once it is finished.



  6. Chris Chang 2014/11/12 at 2:29 pm #

    Hi Ben,

    Out of the box, the script doesn’t support multiple values for the --user argument, but you can modify the code to take multiple user arguments and generate multiple URLs (see this line): https://github.com/mongolab/twitter-harvest/blob/master/twitter-harvest.py#L71

    If you have any more questions feel free to email us at support@mongolab.com


  7. Ben 2014/11/13 at 12:11 am #

    Great, I’ll go for this! Thanks again!

  8. Ben 2014/11/13 at 1:49 am #

    I just finished a quick fix on my fork https://github.com/BenCoDev/twitter-harvest

  9. Zkk 2016/05/24 at 1:13 am #

    Can anyone help me figure out a problem that has puzzled me for several days? Have you ever heard of LabVIEW? How can I read data from MongoDB in LabVIEW?


  1. Harnessing the power of Twitter and MongoDB | thoughts... - 2013/08/08

    […] http://blog.mongolab.com/2013/08/harnessing-the-power-of-twitter-and-mongodb/ […]
