Author Archive | will

Here are my most recent posts

How to choose a DBaaS? Check out our piece in DZone

DZone recently asked us to contribute an article on choosing a Database-as-a-Service provider. I was excited to write this because it gave us a chance to summarize some of our experience hosting hundreds of thousands of MongoDB deployments over the past five years.

The piece, titled How to Choose a DBaaS, is also part of DZone’s new Guide to Data Persistence. Beyond our piece, there are some other interesting articles in it, like Vadim Tkachenko’s article comparing B-Trees, LSM Trees, and Fractal Trees (data structures used for implementing indexes), a very interesting and appropriate read for those of us who use MongoDB.

Check it out — hopefully you will find the guide useful, especially if you are interested in databases and learning about tools and techniques that other developers are using.

Heartbleed security update

As many of you know, a serious vulnerability in the OpenSSL cryptographic software library was recently discovered: CVE-2014-0160. This vulnerability is commonly called the “Heartbleed Bug” and is described at http://heartbleed.com.

The Heartbleed vulnerability can be exploited by an attacker to gain access to the cryptographic keys used to secure communication between clients and servers using SSL, which includes most communication with web servers using HTTPS. Furthermore, this vulnerability can be used to access the system memory of running servers. As a result, an attacker can potentially listen to client-server traffic, steal passwords, and even hijack an HTTP session. Continue Reading →

Production MongoDB Replica Sets now available on Windows Azure!

After many months of development and testing, we are pleased to announce MongoLab’s first production-ready database plans on the Windows Azure platform, with immediate availability in Windows Azure’s East US and West US datacenters.

What does this new plan include?

  • A three-node Replica Set cluster (two data-bearing nodes plus one arbiter node)
  • Dedicated mongod processes on shared Windows Azure virtual machines
  • Up to 8GB of storage
  • High availability via automatic failover in the event that the primary node fails or becomes unreachable
  • Integration with MongoDB Monitoring Service (MMS)
  • Log file access (real-time and historical)

This is in addition to what every MongoLab user enjoys:

  • Continuous monitoring, 24/7
  • The ability to create backup plans (hourly/daily/weekly/monthly) and initiate one-time database snapshots
  • Rich, web-based management tools
  • Thoughtful, timely email support (support@mongolab.com) from real developers
  • Standard driver and REST API support

Continue Reading →

{ "comments": 11 }

Build your own lead capture page with Meteor and MongoDB in minutes

This is a guest blog post written by Niall O’Higgins and Peter Braden at Frozen Ridge, a full-stack web consultancy offering services around databases, node.js, testing & continuous deployment, mobile web and more. They can be contacted at hello@frozenridge.co.

Meteor is a framework for building real-time client-server applications in JavaScript. It is built from the ground up to work with MongoDB – a JSON database which gives you storage that’s idiomatic for JavaScript.

We were incredibly impressed with how easy it is to write apps with Meteor using MongoLab as our MongoDB provider. With less than 100 lines of JavaScript code we were able to build a fully-functioning newsletter signup application, and with MongoLab we don’t have to think about database management or hosting.

To demonstrate Meteor working with MongoLab, we’ll walk you through building a lead capture web application.

Since MongoDB is a document-oriented database, it is very easy to modify the application to store any data you want. In our example, we are building this as an email newsletter signup system. However, you could just as easily make this into a very simple CRM by capturing additional fields like phone number, full name etc.

Overview of our newsletter signup app

Our newsletter signup app will consist of two views:

  • A user-facing landing page for people to enter their email address
  • An internal-facing page with tabular display of signups and other metadata such as timestamp, referrer, etc.

You can grab the complete source to the finished newsletter signup app on Github here and view a fully-functional, running example of the application here.

Create the Meteor app

First install Meteor:

> curl https://install.meteor.com | sh

Once Meteor is on your system, you can create an app called “app” with the command:

> meteor create app

Now you will have a directory named app which contains the files app.js, app.css, and app.html.

Landing page template

First we need a nice HTML landing page. In the Meteor app you just created, your templates are stored in app.html. At the moment, Meteor only supports Handlebars for templating.

It’s worth noting that everything must be specified in template tags, as Meteor will render everything else immediately. This enforces thinking of your app as a series of views rather than a series of pages.

Let’s look at an example from our finished app to illustrate. We have a “main” template which looks like this:
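A sketch of what such a template looks like (the template and variable names follow the discussion below; the exact markup in the repo may differ):

```html
<!-- Sketch of the "main" template in app.html: render the admin view
     when showAdmin is true, otherwise the public signup view. -->
<template name="main">
  {{#if showAdmin}}
    {{> admin}}
  {{else}}
    {{> signup}}
  {{/if}}
</template>
```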

Data is bound from client-side code to templates through the Meteor template API.

Hence, the variable showAdmin is actually bound to the return value of the JavaScript function Template.main.showAdmin in the client-side code. In our app.js, the implementation is as follows:
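A minimal sketch of that helper, assuming a Meteor Session variable named showAdmin:

```javascript
// Sketch: bind the template variable showAdmin to a reactive
// client-side Session value. When the Session value changes, any
// template that reads showAdmin re-renders automatically.
Template.main.showAdmin = function () {
  return Session.get("showAdmin");
};
```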

Due to Meteor’s data bindings, when the session variable “showAdmin” is set to true, the “admin” template will be rendered. Otherwise, the “signup” template will be rendered. Meteor doesn’t have to be explicitly told to switch the views – it will update automatically when the value changes.

This brings us to the client-side code.

Client-side code

Since Meteor shares code between the client and the server, both client and server code are contained in app.js. We can add client-specific code by testing Meteor.isClient:
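For instance, a sketch of that guard:

```javascript
// Sketch: code in this branch only runs in the browser.
if (Meteor.isClient) {
  // Template helpers, event maps, and subscriptions go here.
  Meteor.subscribe("emails");
}
```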

Inserting data on form submit

For the user-facing landing page, we merely need to insert data into the MongoDB collection when the form is submitted. We thus bind to the form’s submit event in the “signup” template and check to see if the email appears to be valid, and if so, we insert it into the data model:
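A sketch of that handler, assuming the collection is named Emails and the form input is named email (the validation pattern here is illustrative, not the app’s exact one):

```javascript
// Sketch: insert the submitted address, plus some metadata,
// into the Emails collection when the signup form is submitted.
Template.signup.events({
  "submit form": function (event) {
    event.preventDefault();
    var email = event.target.email.value;
    // Loose email sanity check (illustrative pattern only)
    if (/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
      Emails.insert({
        email: email,
        ts: new Date(),
        referrer: document.referrer
      });
    }
  }
});
```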

One of the nice things about Meteor is that the client and server side data model APIs are the same. If we insert the data here in the client, it is transparently synced with the server and persisted to MongoDB.

This is very powerful. Because we can use any MongoDB client to also connect directly to the database, we can easily use this data from other parts of our system. For example, we can later link up mail-merge software to make use of our database of emails to send newsletters.

Adding authentication

Now that we’ve got our newsletter signup form working, we will want the ability to see a list of emails in the database. However, because this is sensitive information, we don’t want it to be publicly visible. We only want a select list of authenticated users to be able to see it.

Fortunately, Meteor makes it easy to add authentication to your application. For demonstration purposes, we piggy-back off our Github accounts via OAuth2; we don’t want to create additional passwords just to view newsletter signups. Instead, we’ll restrict the admin page to a hardcoded list of Github usernames:
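A sketch of such a whitelist (the usernames below are placeholders, not the real list):

```javascript
// Sketch: a hardcoded whitelist of Github usernames that are allowed
// to view the admin page. These usernames are placeholders.
var ADMIN_USERNAMES = ["alice-gh", "bob-gh"];

function isAdminUsername(username) {
  return ADMIN_USERNAMES.indexOf(username) !== -1;
}
```

In the app, the Github username of the logged-in user (made available by the accounts-github package) would be passed through a check like this before the admin view is shown.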

Meteor makes it very easy to add a “login with Github” UI flow to your application with the accounts and accounts-ui packages. You can add these with the command:

> meteor add accounts-ui accounts-github

Once these are added to your app, you can render a “login with Github” button in your templates by adding the special template variable {{loginButtons}}. For example in our finished app we have:
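For instance, a fragment along these lines (a sketch; the widget renders wherever the helper appears):

```html
<!-- Sketch: drop the accounts-ui "login with Github" widget
     into any template -->
<template name="header">
  {{loginButtons}}
</template>
```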

Email list view

The data display table is simply a handlebars table that we’ll populate with data from the database. Meteor likes to live-update data, which means if you specify your templates in terms of data accessors, when the underlying data changes, the DOM will automatically reflect the changes:
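A sketch of such a template, assuming a helper named emails that returns a database cursor (field names are illustrative):

```html
<!-- Sketch: a live-updating table of signups. Because emails returns
     a cursor, new rows appear as documents arrive in the database. -->
<template name="admin">
  <table>
    <tr><th>Email</th><th>Signed up</th><th>Referrer</th></tr>
    {{#each emails}}
      <tr><td>{{email}}</td><td>{{ts}}</td><td>{{referrer}}</td></tr>
    {{/each}}
  </table>
</template>
```

On the client, a helper such as Template.admin.emails = function () { return Emails.find(); } would supply the cursor.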

This is a pretty different approach from typical frameworks, where you have to manually specify when a view needs to refresh.

We also make it possible for admin users to toggle the display of the email list in the app by inverting the value of the ‘showAdmin’ Meteor session variable:
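A sketch of that toggle (the CSS class name is illustrative):

```javascript
// Sketch: flip the showAdmin Session variable on click; Meteor's
// reactivity swaps the rendered view with no manual refresh.
Template.main.events({
  "click .toggle-admin": function () {
    Session.set("showAdmin", !Session.get("showAdmin"));
  }
});
```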

Server-side code

Meteor makes it super easy to handle the server-side component and marshalling data between MongoDB and the browser. Our newsletter signup simply has to publish the signups collection for the data display view to be notified of its contents and it will update the view in real-time.

The entire server-side component of our Meteor application consists of:
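As a hedged sketch of that block (the collection name Emails and the admin check are assumptions):

```javascript
// Sketch of the server-side portion of app.js.
if (Meteor.isServer) {
  // Publish the emails collection, and only to admin users.
  Meteor.publish("emails", function () {
    var user = Meteor.users.findOne(this.userId);
    // isAdminUser is a hypothetical helper that checks the user's
    // Github username against a hardcoded whitelist.
    if (user && isAdminUser(user)) {
      return Emails.find();
    }
  });
}
```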

With a unified data model between client and server, Meteor.publish is how you make certain sets of server-side data available to clients. In our case, we wish to make the Github username available in the current user object. We also only wish to publish the emails collection to admin users for security reasons.

Bundling the Meteor app

For deployment, Meteor apps can be translated to Node.js applications using the meteor bundle command. This will output a tarball archive. To run this application, uncompress it and install its only dependency – fibers.

Fibers can be installed with the command:

> npm install fibers

Deploying the Meteor app with MongoLab

Now your Meteor application is ready to run. There are a number of configuration options which can be set at start-time via UNIX environment variables. This is where we specify which MongoDB database to use. MongoLab is a great choice, taking a lot of the hassle out of running and managing your database, with a nice free Sandbox plan that you can create in seconds here.

In order to have your Meteor application persist data to your MongoLab database, set the MONGO_URL environment variable to the MongoDB URI provided by MongoLab for your database:

> export MONGO_URL=mongodb://user:password@dsNNNNNN.mongolab.com:port/db

For Meteor to correctly set up authentication with Github, you need to set the ROOT_URL environment variable:

> export ROOT_URL=http://localhost:8080

To run your Meteor application on port 8080, simply execute main.js:

> PORT=8080 node main.js

You should now be able to connect to it at http://localhost:8080!

{ "comments": 3 }

MongoLab now supports Google Cloud Platform!


This week at Google I/O we are launching support for MongoLab’s fifth cloud provider – Google Cloud Platform. You can now use MongoLab to provision and manage MongoDB deployments on Google Compute Engine (GCE)!

So far we are very impressed with the capabilities of the GCE infrastructure.  In particular:

  • The network is fast. I mean really fast. Some of the bandwidth and latency benchmark scores are astounding. Since I/O is king for databases this will be great for connecting your GCE-hosted application to a MongoDB instance hosted by MongoLab.

  • GCE has a global private network connecting GCE regions across the world. This will be great for global multi-region clusters. We don’t support this quite yet, but when we do GCE will provide a high-speed private backbone upon which to build a great solution.

  • The API is clean, and VMs spin-up fast. This is key for automation, and we like to automate.

For now we are in an early access beta, supporting only our free Sandbox database plans in GCE’s us-central1 region. We will be launching support for the rest of our product line in subsequent releases.

We will have a Developer Sandbox (a.k.a. “booth”) at the conference on Friday, May 17th. If you are at Google I/O and into MongoDB, come visit us!

{ "comments": 7 }

MongoLab on Windows Azure

Over the past few months we have been working closely with Microsoft to bring MongoLab to the Windows Azure platform, and today we are proud to announce our official Preview launch of MongoLab in Azure’s East US and West US datacenters.

Windows Azure is the fourth cloud provider we have added support for, and we find their offering to be very exciting for the industry. Azure is both an IaaS (like EC2) and a PaaS (like Heroku or AppFog). It offers both Windows and Linux VMs (bet you did not expect that!) and supports multiple programming environments including Node.js, PHP, Java, and Python in addition to its .NET platform. It even has awesome command-line support as well as a web-based console. We have high hopes for Azure becoming a great platform for developers.

So what does this integration with Azure mean?

With this integration, you can now use MongoLab on Windows Azure in two ways:

(1) Via MongoLab. Now when you create a database on http://mongolab.com, Windows Azure will be offered as a deployment option. Just select Windows Azure as your cloud provider, select which Azure datacenter you want, and you are good to go. While previously unannounced, we have been supporting our free sandbox database in this way for several months with great success. Now it is official!

(2) Via the Windows Azure Store. As of today we now offer seamless integration with the Windows Azure PaaS platform via an add-on service that you can provision directly from the Windows Azure management console. Just click on the MongoLab icon and follow the instructions from there.

With either method, you get the full MongoLab experience on the Windows Azure platform with a nice low-latency connection between your Azure-based application and your MongoDB database.

Is it ready for production?

Almost, but not quite yet. Right now the Azure Linux VMs we use to run our MongoDB instances are in “Preview” (i.e. Beta), and we expect them to go GA (Generally Available) in the coming months. Shortly after the Linux VMs go GA we will come out of Beta and go GA with our offering. So for now we are only offering our free sandbox plans on Azure with our Dedicated plans available to a select set of Beta customers. We plan to make the rest of our plans generally available as soon as possible.

How do I get started?

It’s easy! If you don’t yet have a MongoLab account, you can create one here. If you already have an account, just use our UI to make a new free database on Windows Azure. And if you already have a Windows Azure account, you can start here and have a database running in seconds.

We are also working on some great content to help you start writing apps using Azure and MongoLab. Our first installment is an example using C#, with more language examples to follow:


We look forward to hearing your feedback as you play around with MongoLab on Azure. Stay tuned… this is just the beginning.


P.S. The press release is available here: BusinessWire

Update 2012-10-31 09:45 : added BusinessWire press release

{ "comments": 32 }

Why is MongoDB wildly popular? It’s a data structure thing.

Updated 11/7/14: Fixed typos

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” – Eric Raymond, in The Cathedral and the Bazaar, 1997

Linguistic innovation

The fundamental task of programming is telling a computer how to do something.  Because of this, much of the innovation in the field of software development has been linguistic innovation; that is, innovation in the ease and effectiveness with which a programmer is able to instruct a computer system.

While machines operate in binary, we don’t talk to them that way. Every decade has introduced higher-level programming languages, and with each, an advancement in the ability of programmers to express themselves. These advancements include improvements in how we express data structures as well as how we express algorithms.

The Object-Relational impedance mismatch

Almost all modern programming languages support OO, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc…), arrays, and objects.

While each language might handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’.

The data structures we use to persist data have not evolved at the same rate. For the past 30 years the primary data structure for persistent data has been the Table – a set of Rows composed of Columns containing scalar values (ints, strings, etc…). This is the world of the relational database, popularized in the 1980s by its transactionality, speedy queries, space efficiency over other contemporary database systems, and a meat-eating ORCL salesforce.

The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been put toward solving the problem of changing the shape of data from the object form to the relational form and back.

Tools called Object-Relational Mapping systems (ORMs) exist for every object-oriented language in existence, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore.

Ted Neward hit it spot on when he said:

“Object-Relational mapping is the Vietnam of our industry”

There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records.

An Example: the Blog Post

Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables.

As a developer, you need to know how to convert each ‘BlogPost’ object to and from the set of tables that house them in the relational model.

A different approach

Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this:

    {
        _id: 1234,
        author: { name: "Bob Davis", email: "bob@bob.com" },
        post: "In these troubled times I like to …",
        date: { $date: "2010-07-12 13:23UTC" },
        location: [ -121.2322, 42.1223222 ],
        rating: 2.2,
        comments: [
            { user: "jgs32@hotmail.com",
              upVotes: 22,
              downVotes: 14,
              text: "Great point! I agree" },
            { user: "holly.davidson@gmail.com",
              upVotes: 421,
              downVotes: 22,
              text: "You are a moron" }
        ],
        tags: [ "Politics", "Virginia" ]
    }

With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code).

What’s MongoDB good for?

MongoDB is great for modeling many of the entities that back most modern web-apps, either consumer or enterprise:

  • Account and user profiles: can store arrays of addresses with ease
  • CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types
  • Form data: MongoDB makes it easy to evolve the structure of form data over time
  • Blogs / user-generated content: can keep data with complex relationships together in one object
  • Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas
  • System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
  • Log data of any kind: structured log data is the future
  • Graphs: just objects and pointers – a perfect fit
  • Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing

Looking forward: the data is the interface

There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month):

“Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t  usually need your code; it’ll be obvious.”

Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain.

People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing.

While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies,  the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool.

Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.


{ "comments": 60 }

Optimize slow MongoDB queries using MongoLab

You’ve launched your application, driven some traffic to it (maybe it’s even “gone viral”!), accumulated a meaningful volume of data and now it feels like some of your queries have become sluggish. What now? Time for query profiling!  MongoLab can help.

Step 1: Turn on the profiler

To turn on the query profiler, log in to your MongoLab account and navigate to your database’s home screen. From there, select the “Tools” tab and then the “commands” sub-tab. Then select the “profile (log slow)” command.

The default threshold for slow queries is 100ms. To change this threshold, pass a different value for “slowms”:

   {
       "profile" : 1,
       "slowms" : 50
   }
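Equivalently, you can run the command yourself from the mongo shell (a sketch; profile level 1 logs slow operations only):

```javascript
// From the mongo shell: log operations slower than 50ms
// on the current database.
db.runCommand({ profile: 1, slowms: 50 })
```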

Step 2: Examine the ‘system.profile’ collection

The profiler deposits all profile data into a collection named ‘system.profile’. After running your application for a bit (and hopefully accessing the code paths that you believe might be executing slow queries), go into the ‘system.profile’ collection and take a look.

The resulting documents represent operations taking more than 100ms (or whatever you set the threshold to be). Each profile entry has the following form:

   {
       "ts" : <timestamp>,
       "info" : <detailed-info-on-the-operation>,
       "millis" : <how-long-it-took>
   }

Step 3: Diagnose and take action

If your slow operations are queries, the most likely problem is that you are doing needless “collection scans” on large collections. This means that the database is iterating through each document in the collection to compute the query result instead of using an index.

You can identify queries doing collection scans by examining the “info” field of the profile entries. If the ‘nscanned’ (number of documents scanned) value is significantly greater than the ‘nreturned’ (number of documents returned) value, you should consider creating an index to optimize the query. You can see an example of this in the second profile entry in the screenshot above.

The next step in the diagnosis would be to look at the query itself. Is it filtering on a field that could be indexed? If that field has a decent range of values (vs. say two values, like with booleans) an index might help a great deal. Is the query performing a sort on an unindexed field? If so, indexing that field is a good idea. Finally, consider compound indexes if you are filtering and sorting by different fields in the same query. You can add indexes for a collection in MongoLab by going to the “Indexes” tab on any collection home screen. See MongoDB’s documentation on indexes here.

Of course, all of this applies to any operation with a query or sort component. This can include updates, deletes, and the use of findAndModify.
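From the mongo shell, these steps might look like the following sketch (the collection and field names are illustrative):

```javascript
// Inspect the slowest profiled operations first
db.system.profile.find().sort({ millis: -1 }).limit(10)

// If a query scans many more documents than it returns,
// index the field(s) it filters or sorts on
db.mycollection.ensureIndex({ email: 1 })
```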

Other possible reasons why your queries are slow:

  • You are not limiting a large result set (i.e. you are shipping a million records back to your app instead of using skip() and limit() to page through the results)
  • Your documents are simply enormous. Consider projecting only a subset of each document’s fields so that less data is returned.
  • You are executing a count() on a query that filters by one or more fields that have not been indexed
  • You are using server-side JavaScript execution (which can be slow)

Step 4: Turn off the profiler

You don’t want to leave the profiler on if you are not using it. While not huge, there is some overhead to leaving it on (i.e. it can slow things down a tad). To turn it off, simply go back to the same screen you used to turn it on in Step 1, select the “profile (turn off)” command from the menu, and run the command.

Further reading

{ "comments": 16 }