As developers, we often look for tools to make our work and processes more efficient. Sometimes we have to search for what we're looking for and sometimes we're lucky enough that it finds us! When our friends over at Treasure Data wrote to me about Fluentd, an open-source logging daemon written in Ruby that they created and maintain, I immediately saw value for MongoDB users looking for a quick way to collect data streams and store information in MongoDB.
Intro to Fluentd
Fluentd is an open source data collector designed to simplify and scale log management. Open-sourced in October 2011, it has gained traction steadily over the last 2.5 years: today, Fluentd has a thriving community of ~50 contributors and 1,900+ stargazers on GitHub with companies like Slideshare and Nintendo deploying it across hundreds of machines in production.
Most relevant to MongoDB developers, many folks use Fluentd to aggregate logs into MongoDB. The MongoDB community was one of the first to take notice of Fluentd, and the MongoDB plugin is one of the most downloaded Fluentd plugins to date.
Tutorial: Using MongoDB serverStatus for real-time & historical metrics
Today we'll provide a tutorial on using Fluentd with MongoDB. To make things interesting, we decided to get a bit meta: we'll show you how to store MongoDB serverStatus output in MongoDB itself. The serverStatus command returns a document that provides an overview of the database process's state.
With this data you can easily create real-time and/or historical metrics that you're interested in. These metrics may be particularly useful for benchmarking, testing in development or monitoring your MongoDB's overall health.
If you need to install Fluentd, you can find detailed installation instructions on the project site. Fluentd is written in Ruby for flexibility, with performance-sensitive parts in C. However, since not all developers use Ruby, a stable distribution of Fluentd called td-agent was created. This allows developers unfamiliar with Ruby to quickly get up and running with Fluentd and avoid having to install the "fluentd" gem. The differences between td-agent and the fluentd gem can be found here.
For the purposes of this tutorial, we'll assume you've installed td-agent; I'll be using the Mac OS X distribution. If you're using the fluentd gem instead, just replace every instance of "td-agent" with "fluentd" and all the steps will still apply.
Setting up your Fluentd configuration file
First, you'll need to locate your td-agent.conf file. This is the config file that allows the user to control the input and output behavior of Fluentd by selecting plugins and specifying plugin parameters. If you don't know where it is, you can run the command "td-agent" from your terminal and the streaming logs will output the config file path location (amongst other information). By default on OSX, the file path is /usr/local/etc/td-agent/td-agent.conf.
Configuring the serverStatus input plugin
Once you've found the config file, you can define a data input source to collect from.
First we'll specify an input plugin - the serverStatus plugin that we've written for this tutorial. You'll want to change your config file to look like the following:
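As a rough sketch rather than the exact listing, a source block along these lines would fit the description. It assumes the plugin registers under the type name `serverstatus` (inferred from the file name in_serverstatus.rb) and that it reads the `uris` and `stats_interval` parameters mentioned in this post. The URI and the interval value are placeholders.

```
# Sketch only: the type name "serverstatus" is an assumption based on the
# plugin file name; the URI and interval below are placeholder values.
<source>
  # Fluentd releases contemporary with this post spell this parameter "type".
  @type serverstatus
  uris ["mongodb://localhost:27017"]
  stats_interval 5
</source>
```

Adjust the URI to point at the MongoDB instance you want to sample.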
Next you'll need to save the serverStatus plugin code so that Fluentd can load and run the plugin. In the same directory as your config file there resides a "plugins" folder. Go ahead and save the serverStatus plugin code in a file named "in_serverstatus.rb" in the "plugins" folder.
The serverStatus input plugin runs the serverStatus command every stats_interval seconds and applies a tag to the data, in this case serverstatus.hostName.portNumber. The output plugin uses the tag to identify and store the tagged data. For more on tags, I recommend checking out these 5 quick slides about the "Life of a Fluentd event".
Configuring the out_mongo output plugin
Now that we have our input plugin set up, we need an output plugin to store our data in its target destination (a MongoDB database). If you're using td-agent, it already comes bundled with the MongoDB output plugins out_mongo and out_mongo_replset. If you're using fluentd, you can install them by running the command below.
% fluent-gem install fluent-plugin-mongo
With the output plugin installed, we can now add output parameters to our config file such as database location, credentials and other options. We'll add to our existing config file the following code.
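A minimal sketch of such a match block, using the `host`, `port`, `database` and `collection` parameters of the out_mongo plugin, might look like this. The database and collection names are placeholders we picked for illustration, and the `serverstatus.**` pattern assumes the tag format serverstatus.hostName.portNumber described above.

```
# Sketch only: database and collection names are hypothetical placeholders.
<match serverstatus.**>
  # Fluentd releases contemporary with this post spell this parameter "type".
  @type mongo
  host localhost
  port 27017
  database fluentd
  collection serverstatus
</match>
```

The `**` wildcard matches any number of tag parts after "serverstatus", so a host name containing dots is still matched.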
The output plugin begins with a match pattern that we've set to match the tag (beginning with "serverstatus") applied by the input plugin. Specify where you'd like the output to be stored (your database information) and you're good to go!
Using multiple inputs/outputs
If you'd like to monitor multiple MongoDB deployments and/or use multiple outputs, the plugins support this too! To get serverStatus of more than one MongoDB, you can list URIs in the config file using the "uris" array parameter. The output for these MongoDBs will have different tags, making it easy to determine what data came from where.
To configure multiple outputs, you'll need to use the "copy" output plugin. In the example below, we've modified our existing configuration file's output code to also print the input results to the console.
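One way this could look, as a sketch, is a `copy` block with two `<store>` sections: one for the built-in `stdout` output and one for out_mongo. As before, the database and collection names are placeholders rather than values from the original configuration.

```
# Sketch only: database and collection names are hypothetical placeholders.
<match serverstatus.**>
  # Fluentd releases contemporary with this post spell this parameter "type".
  @type copy
  <store>
    # Print every event to the console.
    @type stdout
  </store>
  <store>
    # Store the same events in MongoDB.
    @type mongo
    host localhost
    port 27017
    database fluentd
    collection serverstatus
  </store>
</match>
```

Each `<store>` receives its own copy of every matched event, so adding or removing a destination does not affect the others.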
Once you have the plugins set up, you can run Fluentd with either the 'fluentd' or 'td-agent' command from the command line. If you're using the multiple output configuration from above, you'll see the serverStatus data printing in your console and being stored in your MongoDB once per stats_interval.
With access to this data, you can calculate many interesting metrics that can help monitor the health of your MongoDB. However, you may notice that a lot of the metrics reported in serverStatus are growing totals as opposed to rates. For instance, instead of getting a simple updates-per-second number, serverStatus will give you the total number of update queries that have been made against the server since it started.
Creating useful metrics
Luckily it's very simple to extract rates from multiple serverStatus documents. Since we've set the stats interval to every 5 seconds, to get updates-per-second we take the update numbers from 2 sequential serverStatus documents, subtract them and divide by 5. Assuming serverStatusB was recorded after serverStatusA:
(serverStatusB.opcounters.update - serverStatusA.opcounters.update) / 5 seconds
This will give you the average rate of update queries during that 5 second period.
You can use this same technique with any of the counted metrics in serverStatus, including other opcounters, asserts, network bytesIn and bytesOut, page faults and index hits and misses.
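To make the arithmetic concrete, here is a small Ruby sketch of the subtraction-and-divide step. The helper name `opcounter_rate` and the sample numbers are made up for illustration, and the hashes stand in for two serverStatus documents read back from MongoDB; only the `opcounters` sub-document is modeled.

```ruby
# Average per-second rate of one cumulative opcounter between two samples.
# `older` and `newer` are hashes shaped like serverStatus documents;
# `interval_seconds` is the time between the two samples.
def opcounter_rate(older, newer, counter, interval_seconds)
  delta = newer.fetch("opcounters").fetch(counter) -
          older.fetch("opcounters").fetch(counter)
  # Counters are totals since the server started, so a negative delta
  # means the server restarted between samples; no rate can be derived.
  return nil if delta < 0
  delta.to_f / interval_seconds
end

# Hypothetical sample values, recorded 5 seconds apart.
status_a = { "opcounters" => { "insert" => 100, "update" => 1_200 } }
status_b = { "opcounters" => { "insert" => 130, "update" => 1_450 } }

puts opcounter_rate(status_a, status_b, "update", 5)  # prints 50.0
```

Returning nil on a negative delta is a design choice for this sketch: it keeps a server restart from showing up as a large negative rate in your metrics.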
We hope you found this tutorial helpful and informative... the possibilities with Fluentd and MongoDB are endless! Be sure to reach out to firstname.lastname@example.org if you have any questions about MongoDB and check out the Fluentd mailing list for any questions about Fluentd!