In our previous post on MongoDB storage structure and dbStats metrics, we covered how MongoDB stores data and the differences between the dataSize, storageSize and fileSize metrics. We can now apply this knowledge to evaluate strategies for re-using MongoDB disk space.
When documents or collections are deleted, empty record blocks within data files arise. MongoDB attempts to reuse this space when possible, but it will never return this space to the file system. This behavior explains why fileSize never decreases despite deletes on a database.
If your app frequently deletes or if your fileSize is significantly larger than the size of your data plus indexes, you can use one of the methods below reclaim free space.
Getting your free space back
Compacting individual collections
You can compact individual collections using the compact command. This command rewrites and defragments all data in a collection, as well as all of the indexes on that collection.
Important notes on compacting:
This operation blocks all other database activity when running and should be used only when downtime for your database is acceptable. If you are running a replica set, you can perform compaction on secondaries in order to avoid blocking the primary and use failover to make the primary a secondary before compacting it.
Compacting individual collections will not reduce your storage footprint on disk (i.e., your fileSize) but it will defragment the collections you compact.
Compacting one or more databases
For a single-node MongoDB deployment, you can use the db.repairDatabase() command to compact all the collections in the database. This operation rewrites all the data and indexes for each collection in the database from scratch and thereby compacts and defragments the entire database.
To compact all the databases on your server process, you can stop your mongod process and run it with the "--repair" option.
Important notes on running a repair:
This operation blocks all other database activity when running and should be used only when downtime for your database is acceptable.
Running a repair requires free disk space equal to the size of your current data set plus 2 GB. You can use space in a different volume than the one that your mongod is running in by specifying the "--repairpath" option.
Compacting all databases on a server by re-syncing replica set nodes
For a multi-node MongoDB deployment, you can resync a secondary from scratch to reclaim space. By resyncing each node in your replica set you effectively rewrite the data files from scratch and thereby defragment your database.
Please note that if your cluster is comprised of only two electable nodes, you will sacrifice high availability during the resync because the secondary is completely wiped before syncing.
If your app is sensitive to downtime, we recommend a process similar to the one we use here at MongoLab which we call a "rolling node replacement." This process replaces each node in your cluster in turn by bringing a new node into the cluster, replicating the data to that new node and removing the old node. In this way, your cluster can maintain the same level of redundancy during the compaction as during normal operations.
A tip about efficiently using space
Setting the usePowerof2Sizes option is a proactive approach to reusing space in collections that experience frequent document moves or deletions. This option supersedes the default padding factor mechanism and reduces the impact of fragmentation within the collection by allocating additional space for each document in intervals that follow the powers of 2. Setting this option for a specific collection makes it less likely that documents in that collection need to be moved when they grow in size, less likely that a document will need to be moved more than once in its lifetime, and more likely that space left by moving documents can be reused by new or other moved documents.
Thanks for reading!
We hope the above strategies help guide you in evaluating options for reusing empty space in your MongoDB.