Using MongoDB $indexStats to identify and remove unused indexes

Available for Dedicated plans on mLab*

Proper indexing is critical to database performance. A single unindexed query is enough to cause significant performance degradation.

It is relatively easy to spot a missing index using mLab’s Slow Query Analyzer, but the tool doesn’t provide an obvious way to identify and remove indexes that aren’t actually being used.

Because unused indexes impact write performance and consume valuable resources, periodic index review and maintenance is recommended. MongoDB 3.2 introduced a new feature to help identify unused indexes: the $indexStats operator.

* This feature will also be available on our Sandbox and Shared plans once the fix for SERVER-26734 is available for supported versions.

The $indexStats operator

The $indexStats aggregation pipeline operator displays a running tally of index usage since the server was last started. The operator reports on the usage for all of the indexes within a specified collection, and (with additional aggregation operators) can be targeted at a particular index. Knowing if and how often indexes are being used allows an administrator to make informed decisions on which indexes are providing benefit.

How to analyze index usage with $indexStats

It’s not necessarily obvious which collections might contain unused indexes. To obtain a comprehensive list of all index usage, you’ll need to run $indexStats on each collection.
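One way to sweep every collection is a small shell helper along the following lines. This is a minimal sketch, not an mLab-provided tool: the function name collectIndexStats is hypothetical, and it assumes you want to skip system.* collections and record (rather than stop on) any collection where $indexStats raises an error.

```javascript
// Hypothetical helper (not part of MongoDB or mLab): gathers $indexStats
// output for every non-system collection of the database object passed in.
function collectIndexStats(database) {
  var report = {};
  database.getCollectionNames().forEach(function (collName) {
    // Skip internal collections such as system.profile.
    if (collName.indexOf("system.") === 0) {
      return;
    }
    try {
      report[collName] = database
        .getCollection(collName)
        .aggregate([{ $indexStats: {} }])
        .toArray();
    } catch (err) {
      // Assumption: a failure (e.g. insufficient privileges) should not
      // abort the whole sweep, so the error message is recorded instead.
      report[collName] = { error: String(err) };
    }
  });
  return report;
}

// In the MongoDB shell, `db` is the current database; elsewhere this is skipped.
if (typeof db !== "undefined") {
  printjson(collectIndexStats(db));
}
```

The result is an object keyed by collection name, which makes it easy to scan for indexes with a low operation count.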

Until SERVER-26734 is resolved, you will need to connect with an admin database user to run $indexStats.

Connecting with an admin database user

You will need to create an admin database user if you don’t already have one.

To run the $indexStats command, you will need to connect to your database with an admin database user. The following command will use your “admin” database user credentials to authenticate to the “admin” database, then connect to the target database (“myDatabase”):

> mongo ds123456-a0.mlab.com:12345/myDatabase -u <adminUser> -p <adminUserPassword> --authenticationDatabase admin

Running $indexStats on an entire collection

Now that you are connected to the target database, the $indexStats command can be run using the MongoDB shell:

> db.myColl.aggregate( [ { $indexStats: { } } ] )

{
    "name" : "color_1",
    "key" : {
        "color" : 1
    },
    "host" : "examplehost.local:27017",
    "accesses" : {
        "ops" : NumberLong(50),
        "since" : ISODate("2017-01-06T16:52:50.744Z")
    }
}
{
    "name" : "type_1",
    "key" : {
        "type" : 1
    },
    "host" : "examplehost.local:27017",
    "accesses" : {
        "ops" : NumberLong(0),
        "since" : ISODate("2017-01-06T16:52:48.362Z")
    }
}
{
    "name" : "name_1",
    "key" : {
        "name" : 1
    },
    "host" : "examplehost.local:27017",
    "accesses" : {
        "ops" : NumberLong(100),
        "since" : ISODate("2017-01-06T16:32:44.609Z")
    }
}

The return document includes the following fields:

Output Field      Description
name              Index name
key               Index key specification
host              The hostname and port of the mongod process
accesses.ops      The number of operations that used the index
accesses.since    The time from which MongoDB started gathering the index usage statistics

Source: https://docs.mongodb.com/manual/reference/operator/aggregation/indexStats/

Interpreting $indexStats

Every database query, update, or command that uses an index will be counted toward that index’s usage statistics.

  • The “name”, “key”, and “host” output fields provide the metadata for each index.
  • The “accesses.ops” value displays the number of operations that have used the index. An index with zero access operations is potentially unused and is a candidate for removal.
  • The “accesses.since” value is the point in time from which MongoDB began gathering the index statistics. This value is set either upon index creation or as of a mongod server restart. 
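To pick out the zero-operation indexes from the output, a filter like the one below may help. It is only a sketch: findZeroOpIndexes and opsToNumber are hypothetical names, "myColl" is the example collection used above, and the _id_ index is excluded because MongoDB does not allow it to be dropped.

```javascript
// Convert an "ops" value to a plain number. The shell returns it as a
// 64-bit integer wrapper; toNumber() is used when it is present.
function opsToNumber(ops) {
  return ops !== null && typeof ops === "object" && typeof ops.toNumber === "function"
    ? ops.toNumber()
    : Number(ops);
}

// Hypothetical helper: names of indexes with zero recorded operations.
// The mandatory _id_ index is excluded because it cannot be dropped.
function findZeroOpIndexes(stats) {
  return stats
    .filter(function (doc) {
      return doc.name !== "_id_" && opsToNumber(doc.accesses.ops) === 0;
    })
    .map(function (doc) {
      return doc.name;
    });
}

// In the MongoDB shell, `db` is the current database; elsewhere this is skipped.
if (typeof db !== "undefined") {
  printjson(findZeroOpIndexes(db.myColl.aggregate([{ $indexStats: {} }]).toArray()));
}
```

With the three example documents shown earlier, only "type_1" would be reported.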

Importantly, note that $indexStats only reports usage since the last database server process restart: the running tallies are reset each time the server restarts. Keep in mind that a database server process restart can occur during maintenance events, plan changes, intentional deployment restarts, or as a part of unexpected failures requiring restart.

If you would like a fresh set of statistics, you can choose to perform an intentional cluster restart. If you would prefer not to perform a cluster restart, you can use $indexStats to sample from two points in time and calculate the difference in access operations over time for each index.
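The two-sample approach could be sketched as follows. The helper names are hypothetical, and the sketch assumes both samples come from the same mongod; if an index's "since" timestamp changed between samples (a restart or a rebuilt index), the later count is used as-is because the counter started over.

```javascript
// Convert an "ops" value (possibly a 64-bit integer wrapper) to a number.
function opsToNumber(ops) {
  return ops !== null && typeof ops === "object" && typeof ops.toNumber === "function"
    ? ops.toNumber()
    : Number(ops);
}

// Normalize a "since" value (Date or ISO string) to milliseconds.
function sinceMillis(value) {
  return new Date(value).getTime();
}

// Hypothetical helper: per-index difference in accesses.ops between two
// $indexStats samples taken from the same collection.
function diffIndexOps(earlier, later) {
  var before = {};
  earlier.forEach(function (doc) {
    before[doc.name] = doc;
  });
  var delta = {};
  later.forEach(function (doc) {
    var prev = before[doc.name];
    // Only subtract when the counter was not reset between the samples.
    var comparable = Boolean(prev) &&
      sinceMillis(prev.accesses.since) === sinceMillis(doc.accesses.since);
    delta[doc.name] = opsToNumber(doc.accesses.ops) -
      (comparable ? opsToNumber(prev.accesses.ops) : 0);
  });
  return delta;
}

// Shell usage:
//   var first = db.myColl.aggregate([{ $indexStats: {} }]).toArray();
//   ... wait for a representative period of traffic ...
//   var second = db.myColl.aggregate([{ $indexStats: {} }]).toArray();
//   printjson(diffIndexOps(first, second));
```

An index whose difference stays at zero across a representative period is a stronger removal candidate than one judged from a single reading.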

Running $indexStats for a particular index

You can use the $match operator within the aggregation pipeline to specify a particular index. This allows you to match indexes based on index name or index key. You can run the following commands in the MongoDB shell:

Based on index name:

> db.myColl.aggregate([{$indexStats: {}}, {$match: {"name": "color_1"}}])

Based on index key:

> db.myColl.aggregate([{$indexStats: {}}, {$match: {"key": {"color": 1}}}])

The returned document displays the statistics for the matching index only:

 

{
    "name" : "color_1",
    "key" : {
        "color" : 1
    },
    "host" : "examplehost.local:27017",
    "accesses" : {
        "ops" : NumberLong(50),
        "since" : ISODate("2017-01-06T16:52:50.744Z")
    }
}

Removing unused indexes

Proceed with caution

As with all delete operations on the database, always err on the side of caution when removing an index.

  • Do not drop an index if there is any uncertainty surrounding its use.
  • Accidentally removing a necessary index can result in significant performance degradation.
  • Closely monitor database performance immediately after making index changes.

In addition, here are some checks to perform before removing an unused index:

  • Are there infrequent operations which require the index?
  • Are there query patterns that are failing to use the index?
  • Are there plans to use the index in the near future?

More information about indexing can be found in our documentation: http://docs.mlab.com/indexing/

Take a backup first (optional)

An additional precaution is to take a backup before dropping a series of unused indexes. We recommend a block storage snapshot (e.g., an EBS Snapshot) over a mongodump since this type of backup tends to be orders of magnitude faster to both take and restore.

Drop the unused index

After reviewing the considerations above, you can proceed with removing any unused indexes.
To drop the “color_1” index, perform the following command in the MongoDB shell:

> db.myColl.dropIndex( { "color": 1 } );
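As an extra safeguard, the drop can be wrapped in a check that re-reads $indexStats immediately beforehand. The sketch below is an illustration only: dropIndexIfUnused is a hypothetical helper, and it relies on dropIndex() accepting an index name as well as a key specification.

```javascript
// Convert an "ops" value (possibly a 64-bit integer wrapper) to a number.
function opsToNumber(ops) {
  return ops !== null && typeof ops === "object" && typeof ops.toNumber === "function"
    ? ops.toNumber()
    : Number(ops);
}

// Hypothetical guard: drops the named index only if $indexStats still
// reports zero operations for it at the moment of the call.
function dropIndexIfUnused(collection, indexName) {
  if (indexName === "_id_") {
    throw new Error("the _id_ index cannot be dropped");
  }
  var stats = collection
    .aggregate([{ $indexStats: {} }, { $match: { name: indexName } }])
    .toArray();
  if (stats.length === 0) {
    throw new Error("no index named " + indexName);
  }
  // Sum over all returned documents in case more than one host reports.
  var totalOps = stats.reduce(function (sum, doc) {
    return sum + opsToNumber(doc.accesses.ops);
  }, 0);
  if (totalOps !== 0) {
    return { dropped: false, ops: totalOps };
  }
  return { dropped: true, result: collection.dropIndex(indexName) };
}

// Shell usage: printjson(dropIndexIfUnused(db.myColl, "type_1"));
```

A zero count only reflects the node the shell is connected to and the period since its last restart, so the checks listed under "Proceed with caution" still apply.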

Thank you for reading! If you have questions on this exciting new feature or on MongoDB indexing/performance in general, please email our team at support@mlab.com for help.

 

Recent MongoDB ransom attacks

Many of you have likely heard that an estimated 27,000 MongoDB databases have had their data removed and held for ransom by hackers. We have received many questions about the news and wanted to discuss and share MongoDB security best practices to prevent future incidents.

All database deployments hosted at mLab are safe from such attacks.

How could 27,000 databases be held for ransom?

First, it is important to understand the nature of these “breaches”. In a sense these were not breaches at all. All of the databases that were attacked:

  1. Were running without authentication enabled, and
  2. Had their MongoDB ports open to the public internet

This means these databases were configured to accept connections from any client, and to not require that clients authenticate to the database via valid credentials (e.g. username and password).

With this in mind, one can see how such an attack was implemented. Two years ago, one security researcher discovered that 30,000+ MongoDB databases were exposed on the internet running without authentication enabled or firewalls configured.

It is also important to note that there are no known vulnerabilities in MongoDB that would allow for such an attack against databases with authentication enabled.

Are mLab-hosted databases vulnerable to this attack?

No. All mLab databases are configured to require database authentication by clients. Furthermore, on our Dedicated plans, you may firewall your database to only accept connections from IP addresses that you whitelist; this allows you to enforce that only your application infrastructure can connect to your database.

You can read more about how mLab handles security at http://docs.mlab.com/security/. In particular, note that our Dedicated plans allow deployments to be firewalled from the public internet, have SSL enabled, and be housed in private networks (i.e., VPC peering) to limit communication between the application and database.

If you have any questions, please email support@mlab.com for help.

What if I host my own MongoDB?

If you host your own MongoDB deployments, you should make sure that you enable authentication and firewall your database to restrict access from unauthorized IP addresses.

MongoDB has also published a security checklist, which you can follow and implement to protect your MongoDB installation.

Configuring a MongoDB replica set for analytics

MongoDB replica sets make it easy for developers to ensure high availability for their database deployments.

A common replica set configuration is composed of three member nodes: two data-bearing nodes and one arbiter node. With two electable, data-bearing nodes, users are protected from scenarios that cause downtime for single-node deployments, such as maintenance events and hardware failures.

However, it may be tempting to read from the redundant, secondary server to scale reads and/or run queries for the purpose of analytics. We strongly advise against secondary reads when there are only two electable, data-bearing nodes in the replica set.

The main reason for this recommendation is that relying on secondary reads can compromise the high availability replica sets are meant to provide. While occasional use of the secondary for non-critical ad-hoc queries is fine, if your app requires both the primary and the secondary to shoulder the database load of your application, your system is no longer in a position to handle this load if one of the nodes in the cluster goes down or becomes unavailable.

This is discussed in more depth in the following resources:

Run analytics queries against hidden, analytics nodes instead

If you would like to run more than the occasional ad-hoc or analytics query, we highly recommend that you configure your replica set to handle analytics queries. In particular, we recommend adding a node designated for analytics as a hidden, non-electable member of the replica set.

Hidden members have properties that make them great for analytics. A hidden replica set member:

  • Maintains a copy of the primary’s data set: Querying on a hidden member will be nearly identical to querying the primary node (minus some replication delay).
  • Cannot become primary and is invisible to your application: It’s important to isolate analytics traffic from production application traffic. If the analytics node became the replica set primary, it may be unable to handle the combined analytics and production application traffic.
  • Can be useful for disaster recovery as well if a slaveDelay is configured: See advanced configuration considerations below.

If you’re interested in adding an analytics node to your mLab deployment:

  1. Email us at support@mlab.com to request that the node be added.
  2. mLab will add the node seamlessly into your replica set as a hidden member and provide you with its address.
  3. You will then be able to start to create single-node connections using that address for your analytics queries.

Advanced configuration considerations

Enabling slaveDelay on the analytics node for replica set disaster recovery

MongoDB’s slaveDelay option allows you to configure a replication delay on a hidden replica set member. Configuring a delay is helpful for recovering from disaster scenarios such as accidentally dropping a collection or database.

For example, imagine that you configure a one-hour delay on an analytics node. If a developer accidentally drops/deletes data from the primary node, the changes will be applied to the analytics node an hour later (as opposed to immediately). This allows you to query the analytics node to retrieve the deleted data.
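For readers who manage a replica set configuration themselves, the one-hour delay in this example could be set up roughly as follows. This is a sketch under assumptions: delayHiddenMember is a hypothetical helper, and member index 2 stands in for whichever member is your analytics node. The priority, hidden, and slaveDelay fields are the standard member settings for a delayed member.

```javascript
// Hypothetical helper: marks one member of a replica set configuration
// document as hidden, non-electable, and delayed. Modifies and returns
// the configuration object it was given.
function delayHiddenMember(config, memberIndex, delaySeconds) {
  var member = config.members[memberIndex];
  if (!member) {
    throw new Error("no member at index " + memberIndex);
  }
  member.priority = 0;              // a delayed member must never become primary
  member.hidden = true;             // keeps the member invisible to applications
  member.slaveDelay = delaySeconds; // replication delay, in seconds
  return config;
}

// Shell usage on the primary (assumption: member 2 is the analytics node):
//   rs.reconfig(delayHiddenMember(rs.conf(), 2, 3600));
```

A delay of 3600 seconds matches the one-hour window described above; the right value depends on how quickly an accidental deletion is likely to be noticed.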

Reading from secondaries in a Sharded Cluster

If you are running a Sharded deployment and would like to read from the secondary members of your shards, there are important considerations you should be aware of. We will be publishing a blog post on this advanced topic in the future.

MongoDB tips & tricks: Collection-level access control

As your database or project grows, you may be tasked with configuring access controls to allow different stakeholders access to the database. Rather than create a new user with full database privileges, it may be more appropriate to create a user that only has access to the data or collections they need. This allows users to query against the collections you define and limits their access to the rest of the database.

Here’s a step-by-step example that demonstrates how to set up collection-level access control. This example will create a user named “finance” on the “acme” database. The “finance” user will only have “find” (read) access to the “billing” collection.

Step 1. Connect to the “acme” database using an existing user

> mongo ds123456.mlab.com:12345/acme -u dba -p password

Note that the “dba” user will need the userAdmin role to create and modify roles and users on the “acme” database. By default, mLab database users created through the UI are granted the dbOwner role, which combines the privileges granted by the readWrite, dbAdmin, and userAdmin roles.

Step 2. Create a new user-defined role for the “billing” collection

> db.createRole({ role: "readBillingOnly", privileges: [ { resource: { db: "acme", collection: "billing" }, actions: [ "find" ] } ], roles: [] })

You can also add more privilege actions to the “actions” array, such as “insert” or “update”.
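If the role already exists, extra actions can be added afterwards with db.grantPrivilegesToRole(). The snippet below is a sketch: collectionPrivilege is a hypothetical convenience function, and the "insert" and "update" actions are just the examples mentioned above. Note that running it in the shell changes the role.

```javascript
// Hypothetical helper: builds one privilege document for a single collection.
function collectionPrivilege(dbName, collName, actions) {
  return { resource: { db: dbName, collection: collName }, actions: actions };
}

// Extra actions to add to the "readBillingOnly" role created in Step 2.
var extraPrivileges = [
  collectionPrivilege("acme", "billing", ["insert", "update"])
];

// In the MongoDB shell, `db` is the current database; elsewhere this is skipped.
if (typeof db !== "undefined") {
  db.grantPrivilegesToRole("readBillingOnly", extraPrivileges);
}
```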

Step 3. Create a new user named “finance” with the role you just created

> db.createUser({ user: "finance", pwd: "password", roles: [ { role: "readBillingOnly", db: "acme" } ] })

Alternatively, if the user already exists, you can use the grantRolesToUser() method:

> db.grantRolesToUser("finance", [ { role: "readBillingOnly", db: "acme" } ])

 

And that’s it! You now have a user named “finance” that has read-only access on the “billing” collection in the “acme” database.

Counting Down the Parse Migration

On January 28, 2017, Parse will be fully retired. To help with the transition, mLab has published a comprehensive guide to migrating Parse data onto an mLab-hosted MongoDB database. This guide aims to help existing Parse customers by highlighting migration best practices and addressing commonly asked questions that we’ve handled while migrating over 8,600 applications to our platform.

The guide is ideal for Parse users who are working on their migration. It helps them understand how to:

  • Migrate their Parse data onto an mLab hosted MongoDB database
  • Create and test a local Parse Server
  • Deploy a Parse Server onto Heroku
  • Connect their application to the Parse Server
  • Use Parse Server to store files for their application

We are proud that our fully managed Database-as-a-Service has been the chosen platform for 78% of Parse data migrations to date. Parse users still contemplating the move should get started on their migration as soon as possible to ensure all data storage needs are met. In addition, proactive migration off the Parse backend service to a self-hosted Parse Server will provide time for development teams to learn how to maintain and scale the open-source server.

In the meantime, if you have any questions or need migration help we invite you to email us at support@mongolab.com. We look forward to helping you with all your data needs.

Introducing mLab Private Environments

Today we are excited to announce the private beta of mLab Private Environments. mLab Private Environments are virtual private networks you can provision to house your various database deployments hosted with mLab. These private networks isolate your database from public networks while allowing your application infrastructure secure access to your database deployments.

With Private Environments, you can continue to use the mLab platform for dynamic database provisioning and scaling while leveraging security features that are traditionally only found in private networks.

mLab Private Environments overview

When you provision a Private Environment with mLab (currently only available on AWS) we provision a dedicated AWS VPC for that environment. You can place any number of mLab MongoDB deployments inside of that Private Environment.

You can then peer the VPC underlying your Private Environment to the AWS VPC that houses your application infrastructure. This peering operation will create a single, extended, private network consisting of both your application infrastructure and your database deployments.

[Figure: mLab Private Environments]

From there, you can design network ACLs and routing rules that only allow access to your database deployment from the parts of your application infrastructure that need it.

You can provision and maintain any number of Private Environments.

Benefits of using Private Environments

The move to the public cloud has been a huge win in terms of simplicity, but also a big step backwards from a networking perspective. In order to move to the public cloud, organizations had to abandon the more sophisticated networking techniques they used to employ when working in traditional data centers.

Recently, however, public cloud providers have been reintroducing some of the networking functionality that has been missing. For example, AWS VPCs (Virtual Private Clouds) allow you to create virtual private networks with subnets, route tables, and network ACLs, just like you would have in a traditional datacenter, only virtualized.

Upon this infrastructure (AWS VPC) we have implemented a new deployment solution that allows you to:

  • Isolate your database from public networks while allowing your application infrastructure secure access.
  • Create sophisticated network topologies to ensure least privilege access to your database deployments using CIDR ranges and Security Groups.
  • Easily auto-scale your application tier without having to modify database firewall rules.

How are Private Environments used?

With Private Environments, you can use all of the traditional network security best practices and techniques for designing your application. You can place your front-end load balancers in a public subnet, and place your application servers, microservices, and databases in private subnets protected from the internet, but accessible to each other.

Furthermore, if your application tier has an auto-scaling component that accesses the database, Private Environments are extremely convenient. Before Private Environments, it was impossible to add application servers to your app tier without adding new allow rules to your database firewall. This made autoscaling VMs that required access to your database deployment extremely difficult, requiring either a NAT layer or opening your database to more sources than necessary.

With Private Environments, you simply allow the proper CIDR block for the subnet holding your application VMs, or the AWS Security Group you wish to give access to (Security Groups coming soon). You can then add and remove app infrastructure without needing to touch the definition of your database deployment’s firewall.

Availability

Private Environments is currently in private beta and only available with our Dedicated plans. If you would like to join the waitlist, please email our team at support@mlab.com.

How to choose a DBaaS? Check out our piece in DZone

DZone recently asked us to contribute an article on choosing a Database-as-a-Service provider. I was excited to write this because it gave us a chance to summarize some of our experience hosting hundreds of thousands of MongoDB deployments over the past five years.

The piece, titled How to Choose a DBaaS, is also part of DZone’s new Guide to Data Persistence. Beyond our piece, the guide contains other interesting articles, such as Vadim Tkachenko’s comparison of B-Trees, LSM Trees, and Fractal Trees (data structures used for implementing indexes), which is an appropriate read for those of us who use MongoDB.

Check it out — hopefully you will find the guide useful, especially if you are interested in databases and learning about tools and techniques that other developers are using.

MongoLab is now mLab

We have some exciting news to share! Below is the email we sent to our users earlier today:


I’m very excited to share some important news with you.

Today we are changing our name from MongoLab to mLab in order to better align with our long-term vision.

When we started this business five years ago, our MongoLab database service was the first step of a larger mission. Our ultimate goal was to simplify server-side development.

We envisioned a cloud-based laboratory where developers could build and deploy the entire server side of their application with a completely software-defined interface. Our premise was that a cloud-based server-side application stack built around JSON, microservices, document databases (like MongoDB), and software-defined networking could radically simplify server-side development.

We decided to start at the bottom of the stack and work our way upwards. This meant building a Database-as-a-Service platform upon which we could layer the rest of the stack. Around the same time we discovered MongoDB and recognized it as the operational data store of the future.

And so we set out on Phase 1 and created MongoLab: the game-changing cloud database service developers worldwide have grown to love.

I’m proud to say that mLab has become THE place to run MongoDB in the cloud and now manages over 250,000 deployments on AWS, Azure, and Google. If you are using MongoDB in the cloud, there really is no better experience.

MongoLab was a great name for us when our only product was MongoDB-as-a-Service. But now we feel it is time to change our name to one that can accommodate the larger vision that we have around cloud infrastructure and cloud data services.

Over the coming quarters we will focus on our broader premise that server-side development is largely about securely moving and transforming JSON objects between the client and the database, and that it should be much easier than it is today.

To be clear, we are still fully committed to MongoDB and our cloud MongoDB service, as this is the backbone of our future vision. MongoDB Inc., the stewards of MongoDB, will continue to concentrate on enterprise customers who host the database themselves. Our job at mLab will be to focus on the rest of the world and to continue to provide the best fully managed MongoDB-as-a-Service for developers who want to build great apps without worrying about database operations.

Thank you for helping make us such a success. We truly appreciate that you have chosen our service and look forward to helping you build great software.

Will Shulman
CEO of mLab

Please help us spread the word!

Team@mLab
