A Primer on Geospatial Data and MongoDB

MongoDB offers new geospatial features in versions 2.4 and 2.6. The core of these features is the introduction of GeoJSON, an open standard format for rich geospatial types that go beyond what MongoDB supported in previous versions.

This post is a primer for developers new to geospatial data in MongoDB. We aim to familiarize you with geospatial fundamentals in MongoDB and help you get the most out of your data.

A brief word on prior MongoDB versions (<2.4)

In the past, MongoDB geospatial features made use of coordinates stored in longitude / latitude coordinate pair form. Users would store a coordinate pair in a location field in a document. MongoDB documentation now refers to this format as "legacy coordinate pairs".

A collection of documents with legacy coordinate pairs represents a field of points.

[Figure: MongoDB legacy coordinates]

Using a geospatial (2d) index, these points were queried in two ways:

  • Proximity - To determine a set of points near a point, or within a certain distance from a point, users provided another coordinate pair in their geospatial queries.
  • Inclusion - To determine if any of the stored points were within a specified area, users provided special MongoDB-specific operators like $box and $polygon in their queries.
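As a rough sketch of these two query styles, the documents below are written as Python dictionaries of the kind a driver would send to the server. The field name "loc", the place name, and every coordinate value are assumptions made for illustration. The inclusion query uses $geoWithin, the 2.4+ name for the older $within operator. The in_box helper is hypothetical and only mirrors, on the client side, the flat rectangle test that $box describes.

```python
# Hypothetical stored document: "loc" holds a legacy [longitude, latitude] pair.
legacy_doc = {"name": "Central Park", "loc": [-73.97, 40.77]}

# Proximity: points near a coordinate pair, limited by $maxDistance.
near_query = {"loc": {"$near": [-73.97, 40.77], "$maxDistance": 0.1}}

# Inclusion: points inside a rectangle given by its bottom-left and
# upper-right corners.
box_query = {
    "loc": {"$geoWithin": {"$box": [[-74.0, 40.7], [-73.9, 40.8]]}}
}


def in_box(point, box):
    """Flat-geometry rectangle test, mirroring what $box asks of the server."""
    (min_x, min_y), (max_x, max_y) = box
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y
```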

As of MongoDB 2.4, geospatial queries no longer have to rely on MongoDB-specific shape operators like $box or $polygon. You can still store and query legacy coordinate pairs, but geospatial queries can now make use of GeoJSON.

Introducing GeoJSON

GeoJSON is an open specification for the JSON-formatting of shapes in a coordinate space. The GeoJSON spec is widely used in the geospatial community and there is growing library support in most popular languages. By conforming to an open standard, MongoDB aims to make it easy for developers to work with the data they already have.

Perhaps the best way to understand GeoJSON is to see it. The following example shows the syntax for representing a Point and a Polygon.
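As a minimal sketch, here are a Point and a Polygon written as Python dictionaries, which serialize directly to GeoJSON. The field names "name" and "loc" and all coordinate values are assumptions made for illustration.

```python
import json

# A Point: "coordinates" is a single [longitude, latitude] position.
point = {"type": "Point", "coordinates": [-73.97, 40.77]}

# A Polygon: "coordinates" is a list of rings, and each ring is a list of
# positions whose last entry repeats the first to close the shape.
polygon = {
    "type": "Polygon",
    "coordinates": [
        [
            [-74.0, 40.7],
            [-73.9, 40.7],
            [-73.9, 40.8],
            [-74.0, 40.8],
            [-74.0, 40.7],
        ]
    ],
}

# A hypothetical document embedding the Point as a subdocument.
place = {"name": "Central Park", "loc": point}
place_json = json.dumps(place)
```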

Each GeoJSON document (or subdocument) is generally composed of two fields:

  • type - the shape being represented, which informs a GeoJSON reader how to interpret the "coordinates" field
  • coordinates - an array of points, the specific arrangement of which is determined by the "type" field

Now that MongoDB supports GeoJSON, users can:

  • Store far richer data. As of version 2.6, you can store and index on the following GeoJSON types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection. Our field of points can now look like: [Figure: MongoDB GeoJSON]
  • Use GeoJSON data interchangeably in both documents AND queries. By providing the GeoJSON geometry component in a generic MongoDB $geometry special operator, arbitrary query syntax is avoided. The earlier case of finding points within a square, for instance, can be expressed by passing a GeoJSON Polygon to $geoWithin.
  • Leverage entirely new features. $geoIntersects, for example, returns all locations - Points and Shapes - that intersect with a GeoJSON point or shape. This operation, "give me the shapes that intersect this shape", is something you couldn't do before.
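To make the last two points concrete, the sketch below builds a $geoWithin query and a $geoIntersects query around the same GeoJSON square. The field name "loc" and the coordinates are assumptions, and the dictionaries only show the shape of the query documents a driver would send.

```python
# A GeoJSON Polygon describing a small square (illustrative coordinates).
square = {
    "type": "Polygon",
    "coordinates": [
        [
            [-74.0, 40.7],
            [-73.9, 40.7],
            [-73.9, 40.8],
            [-74.0, 40.8],
            [-74.0, 40.7],
        ]
    ],
}

# Documents whose "loc" geometry lies entirely within the square.
within_query = {"loc": {"$geoWithin": {"$geometry": square}}}

# Documents whose "loc" geometry touches or overlaps the square.
intersects_query = {"loc": {"$geoIntersects": {"$geometry": square}}}
```

The same `square` value could also be stored in a document as-is, which is the interchangeability the list above describes.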

If you're interested in getting involved in the GeoJSON community, you can subscribe and contribute through this GeoJSON discussion list.

Geospatial Indexes

There are three kinds of geospatial indexes in MongoDB. Consult the query compatibility matrix provided in MongoDB's documentation for specific information about how each query type functions on each index type. In general:

  • 2d - These indexes support legacy coordinate pair data and allow old-form queries of the type we discuss above (such as $geoWithin using $box or $polygon syntax). 2d indexes do not support GeoJSON-formatted queries or GeoJSON data values. It's also important to note that 2d indexes operate on a flat geometry, so some client-side effort may be involved in ensuring their query results reflect real-world geospatial data.
  • 2dsphere - This new index type supports GeoJSON queries and GeoJSON data values, and also supports legacy coordinate pairs in data values. That means if you create a 2dsphere index on a field with legacy pair formatting, you can still take advantage of GeoJSON query formats. However, you can't query using legacy pairs.
  • geoHaystack - This index is optimized for searches over small areas and is only usable through the geoSearch database command. We don't discuss haystacking in this post, but you can read more about it here.
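The compatibility rules for stored values in the list above can be sketched as a small helper. Both `is_geojson` and `usable_indexes` are hypothetical names introduced for illustration, not part of any MongoDB driver. The index specifications use the (field, type) pair form that PyMongo's create_index accepts, with "loc" as an assumed field name.

```python
# Index specifications as (field, index type) pairs.
LEGACY_INDEX = [("loc", "2d")]
GEOJSON_INDEX = [("loc", "2dsphere")]
# With PyMongo installed and a server available (both assumptions), an index
# could be created with: collection.create_index(GEOJSON_INDEX)


def is_geojson(value):
    """True if the value looks like a GeoJSON geometry subdocument."""
    return isinstance(value, dict) and "type" in value and "coordinates" in value


def usable_indexes(value):
    """Index types that can index this stored value, per the list above."""
    if is_geojson(value):
        return ["2dsphere"]  # 2d indexes do not accept GeoJSON values
    return ["2d", "2dsphere"]  # legacy pairs work with either index type
```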

Some geospatial tips

Number of geospatial indexes

You can have multiple geospatial indexes per collection unless you're making use of the geoNear database command and/or the $geoNear aggregation pipeline operator. Read about these considerations here.

Importing CSV geospatial data

When reading CSV files, the mongoimport tool doesn't support rich data like arrays and nested documents. Consequently, mongoimport can't read geospatial data directly from a CSV. As a workaround, we suggest converting CSVs into JSON, which preserves the nested structure that geospatial data needs. One such converter can be found in this repo.
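As one possible approach to that conversion, the sketch below turns CSV rows into one JSON document per line, which is a form mongoimport can read. The function name `csv_to_geojson_lines` and the column names "lon" and "lat" are assumptions; adjust them to match your file. This is not the converter referenced above, only an illustration of the idea.

```python
import csv
import io
import json


def csv_to_geojson_lines(csv_text, lon_field="lon", lat_field="lat", geo_field="loc"):
    """Convert CSV text into JSON lines, folding lon/lat columns into a GeoJSON Point."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Remove the flat columns and parse them as numbers.
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        # GeoJSON positions are ordered [longitude, latitude].
        row[geo_field] = {"type": "Point", "coordinates": [lon, lat]}
        lines.append(json.dumps(row))
    return lines


sample_csv = "name,lon,lat\nCafe,-73.97,40.77\nLibrary,-73.98,40.75\n"
json_lines = csv_to_geojson_lines(sample_csv)
```

Writing `"\n".join(json_lines)` to a file gives one document per line for import.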

Formatting a Polygon

An aspect of GeoJSON that is easy to overlook when getting started is the proper formatting of polygons: the last coordinate pair provided should be the same as the first, to create a closed shape.

For more information about how to format polygons, refer to the GeoJSON Polygon and LinearRing documentation.
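A small helper can guard against the unclosed-ring mistake before data is inserted. This is a sketch; `close_ring` is a hypothetical name and the coordinates are made up.

```python
def close_ring(ring):
    """Return a copy of the ring whose last position repeats the first."""
    closed = [list(position) for position in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed


# Four corners of a square, with the closing position left out by mistake.
open_ring = [[-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.8]]

polygon = {"type": "Polygon", "coordinates": [close_ring(open_ring)]}
```

Calling `close_ring` on a ring that is already closed returns it unchanged, so it is safe to apply to every ring.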

GeoJSON native libraries

Many languages have native libraries that make working with GeoJSON data easier. For example, Python has the "geojson" library that contains classes for all GeoJSON objects and functions for easily creating, encoding, and decoding GeoJSON objects.

Units for Query Calculation

Units used for calculating query results vary depending on the index and the query type. Refer to the MongoDB geospatial matrix when in doubt.

This matters most for the $maxDistance operator, which limits a $near query to return only those documents that fall within a maximum distance of a point. If you query for a GeoJSON point, specify $maxDistance in meters. If you query for legacy coordinate pairs, specify $maxDistance in radians.
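Converting between the two units is a division by the Earth's radius. The sketch below assumes an equatorial radius of about 6,378.1 km. The helper names and the "loc" field are hypothetical. The legacy query is shown with $nearSphere, which is an assumption on our part, since the radians interpretation clearly applies to a spherical query.

```python
EARTH_RADIUS_METERS = 6378100.0  # assumed equatorial radius, roughly 6,378.1 km


def meters_to_radians(meters):
    """Convert a distance along the Earth's surface to radians."""
    return meters / EARTH_RADIUS_METERS


def radians_to_meters(radians):
    """Convert radians back to a surface distance in meters."""
    return radians * EARTH_RADIUS_METERS


# GeoJSON point: $maxDistance is given in meters.
geojson_near = {
    "loc": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.97, 40.77]},
            "$maxDistance": 500,
        }
    }
}

# Legacy coordinate pair: the same 500 m limit expressed in radians.
legacy_near = {
    "loc": {
        "$nearSphere": [-73.97, 40.77],
        "$maxDistance": meters_to_radians(500),
    }
}
```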

Thanks for reading!

There's a ton of geospatial data out there waiting to be mined and analyzed by new developers. We hope this post is a good starting point for working with geospatial data in MongoDB. If you need help working with geospatial data on MongoLab, you can reach out to our team at support@mongolab.com anytime.

Happy hacking!

Chris & Eric @ MongoLab