A Primer on Geospatial Data and MongoDB

MongoDB introduced new geospatial features in versions 2.4 and 2.6. At the core of these features is support for GeoJSON, an open standard format for rich geospatial types that goes beyond what MongoDB supported in previous versions.

This post is a primer for developers new to geospatial data in MongoDB. We aim to familiarize you with geospatial fundamentals in MongoDB and help you get the most out of your data.

A brief word on prior MongoDB versions (<2.4)

In the past, MongoDB geospatial features made use of coordinates stored in longitude / latitude coordinate pair form. Users would store a coordinate pair in a location field in a document. MongoDB documentation now refers to this format as "legacy coordinate pairs".

A collection of documents with legacy coordinate pairs represents a field of points.

[Figure: MongoDB Legacy Coordinates]

Using a geospatial (2d) index, these points were queried in two ways:

  • Proximity - To determine a set of points near a point, or within a certain distance from a point, users provided another coordinate pair in their geospatial queries.
  • Inclusion - To determine if any of the stored points were within a specified area, users provided special MongoDB-specific operators like $box and $polygon in their queries.
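
The snippet below is only a sketch of what these two query styles might look like. The collection name `places`, the field name `loc`, and all coordinate values are hypothetical. The queries are built as plain JavaScript objects so the snippet runs without a database; the commented `find` calls show how they would be passed in the mongo shell.

```javascript
// Legacy coordinate pairs are stored as [longitude, latitude].
// Hypothetical document shape: { name: "Cafe", loc: [-122.41, 37.77] }
const center = [-122.41, 37.77];

// Proximity: points near `center`, limited by $maxDistance.
// (The units of $maxDistance depend on the index and query type.)
const proximityQuery = {
  loc: { $near: center, $maxDistance: 0.05 },
};

// Inclusion: points inside a rectangle given by its bottom-left
// and upper-right corners.
const inclusionQuery = {
  loc: { $geoWithin: { $box: [[-122.5, 37.7], [-122.3, 37.8]] } },
};

// In the mongo shell, assuming a 2d index on `loc`:
//   db.places.find(proximityQuery)
//   db.places.find(inclusionQuery)
console.log(JSON.stringify(proximityQuery));
console.log(JSON.stringify(inclusionQuery));
```

Note that `$geoWithin` is the 2.4+ name of the inclusion operator; earlier releases called it `$within`.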

As of MongoDB 2.4, geospatial queries no longer have to rely on MongoDB-specific shape operators like $box or $polygon. You can still store and query legacy coordinate pairs, but geospatial queries can now make use of GeoJSON.

Introducing GeoJSON

GeoJSON is an open specification for the JSON-formatting of shapes in a coordinate space. The GeoJSON spec is widely used in the geospatial community, and library support is growing in most popular languages. By conforming to an open standard, MongoDB aims to make it easy for developers to work with the data they already have.

Perhaps the best way to understand GeoJSON is to see it. The following example shows the syntax for representing a Point and a Polygon.
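
As a minimal sketch (the coordinate values here are made up for illustration), a Point and a Polygon can be written as the following objects. Positions are given in longitude, latitude order, as the GeoJSON spec requires.

```javascript
// A GeoJSON Point: a single [longitude, latitude] position.
const point = {
  type: "Point",
  coordinates: [-73.97, 40.77],
};

// A GeoJSON Polygon: an array of linear rings, each ring an array of
// positions whose last position repeats the first to close the shape.
const polygon = {
  type: "Polygon",
  coordinates: [
    [
      [-74.0, 40.7],
      [-73.9, 40.7],
      [-73.9, 40.8],
      [-74.0, 40.8],
      [-74.0, 40.7],
    ],
  ],
};

console.log(JSON.stringify(point));
console.log(JSON.stringify(polygon));
```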

Each GeoJSON document (or subdocument) is generally composed of two fields:

  • type - the shape being represented, which tells a GeoJSON reader how to interpret the "coordinates" field
  • coordinates - one or more positions, each a [longitude, latitude] pair, whose arrangement is determined by the "type" field

Now that MongoDB supports GeoJSON, users can:

  • Store far richer data. As of version 2.6, you can store and index the following GeoJSON types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection. Our field of points can now contain lines and polygons too (figure: MongoDB GeoJSON).
  • Use GeoJSON data interchangeably in both documents AND queries. By wrapping a GeoJSON geometry in MongoDB's generic $geometry operator, you avoid shape-specific query syntax. Our earlier example of finding points within a square can be expressed entirely in GeoJSON.
  • Leverage entirely new features. $geoIntersects, for example, returns all locations (points and shapes alike) that intersect with a GeoJSON point or shape. This operation, "give me the shapes that intersect this shape", is something you couldn't do before.
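
The GeoJSON query forms described above might look roughly like this. It is a sketch under assumptions: the `loc` field, the `places` collection, and the coordinates are hypothetical, and the queries are plain objects so the snippet runs without a database.

```javascript
// A square-like area expressed as a GeoJSON Polygon (closed ring).
const square = {
  type: "Polygon",
  coordinates: [
    [[-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.8], [-74.0, 40.7]],
  ],
};

// Inclusion: documents whose `loc` lies entirely within the square.
const withinQuery = { loc: { $geoWithin: { $geometry: square } } };

// Intersection: documents whose `loc` (a point or a shape) touches
// or crosses a line.
const route = {
  type: "LineString",
  coordinates: [[-74.05, 40.75], [-73.85, 40.75]],
};
const intersectsQuery = { loc: { $geoIntersects: { $geometry: route } } };

// In the mongo shell, assuming a 2dsphere index on `loc`:
//   db.places.find(withinQuery)
//   db.places.find(intersectsQuery)
console.log(JSON.stringify(withinQuery));
console.log(JSON.stringify(intersectsQuery));
```

The same `square` object could be stored in a document as-is, which is the interchangeability the second bullet refers to.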

If you're interested in getting involved in the GeoJSON community, you can subscribe and contribute through this GeoJSON discussion list.

Geospatial Indexes

There are three kinds of geospatial indexes in MongoDB. Consult the query compatibility matrix provided in MongoDB's documentation for specific information about how each query type functions on each index type. In general:

  • 2d - These indexes support legacy coordinate pair data and allow old-form queries of the type we discuss above (such as $geoWithin using $box or $polygon syntax). 2d indexes do not support GeoJSON-formatted queries or GeoJSON data values. It's also important to note that 2d indexes operate on a flat geometry, so some client-side effort may be involved in ensuring their query results reflect real-world geospatial data.
  • 2dsphere - This new index type supports GeoJSON queries and GeoJSON data values, and also supports legacy coordinate pairs in data values. That means if you create a 2dsphere index on a field with legacy pair formatting, you can still take advantage of GeoJSON query formats. However, flat legacy query forms, such as $near with a plain coordinate pair, still need a 2d index.
  • geoHaystack - This index is optimized for searches over small areas and is only usable through the geoSearch database command. We don't discuss haystacking in this post, but you can read more about it here.
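
To make the first two index types concrete, here is a small sketch of their index specifications. The field name `loc`, the collection name `places`, and the helper `suggestIndexType` are hypothetical and exist only for illustration.

```javascript
// Index specifications as they would be passed to createIndex.
const legacyIndexSpec = { loc: "2d" };
const geoJsonIndexSpec = { loc: "2dsphere" };

// Hypothetical helper: suggests an index type from a sample field value.
function suggestIndexType(value) {
  // A bare array is a legacy pair (a 2dsphere index also accepts these).
  if (Array.isArray(value)) return "2d";
  // An object with a string "type" is treated as GeoJSON.
  if (value && typeof value.type === "string") return "2dsphere";
  return null;
}

// In the mongo shell:
//   db.places.createIndex(geoJsonIndexSpec)
console.log(suggestIndexType([-73.97, 40.77]));
console.log(suggestIndexType({ type: "Point", coordinates: [-73.97, 40.77] }));
```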

Some geospatial tips

Number of geospatial indexes

You can have multiple geospatial indexes per collection unless you use the geoNear database command or the $geoNear aggregation pipeline stage. Read about these considerations here.

Importing CSV geospatial data

When reading CSV files, the mongoimport tool doesn't support rich data like arrays and nested documents. Consequently, mongoimport can't read geospatial data from a CSV. As a workaround, we suggest converting CSVs into JSON, which is the best format for importing geospatial data. One such converter can be found in this repo.
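
If you prefer to write the conversion yourself, a minimal sketch might look like the following. It is not the converter linked above. It assumes a hypothetical CSV layout with a header row of `name,longitude,latitude` and no quoted fields or embedded commas, and the output field `loc` is also an assumption.

```javascript
// Hypothetical CSV input with a header row.
const csv = [
  "name,longitude,latitude",
  "Central Park,-73.97,40.77",
  "Golden Gate Park,-122.48,37.77",
].join("\n");

// Naive conversion: one document with a GeoJSON Point per CSV row.
function csvToGeoJsonDocs(text) {
  const [header, ...rows] = text.trim().split("\n");
  const columns = header.split(",").map((c) => c.trim());
  return rows.map((row) => {
    const cells = row.split(",").map((c) => c.trim());
    const record = Object.fromEntries(columns.map((col, i) => [col, cells[i]]));
    return {
      name: record.name,
      loc: {
        type: "Point",
        // GeoJSON order: longitude first, then latitude.
        coordinates: [Number(record.longitude), Number(record.latitude)],
      },
    };
  });
}

const docs = csvToGeoJsonDocs(csv);
// One JSON document per line, a layout mongoimport can read as JSON.
console.log(docs.map((d) => JSON.stringify(d)).join("\n"));
```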

Formatting a Polygon

An aspect of GeoJSON that is easy to overlook when getting started is the proper formatting of polygons: the last coordinate pair provided should be the same as the first, to create a closed shape.

For more information about how to format polygons, refer to the GeoJSON Polygon and LinearRing documentation.
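
The closing rule can be checked before inserting data. The helper names below (`endsMatch`, `isClosedRing`, `closeRing`) are hypothetical; this is a sketch of the check, not part of any MongoDB API.

```javascript
// True when the first and last positions of a ring are identical.
function endsMatch(ring) {
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
}

// A valid linear ring has at least four positions and matching ends.
function isClosedRing(ring) {
  return ring.length >= 4 && endsMatch(ring);
}

// Returns a ring that ends where it starts, appending a copy of the
// first position when needed.
function closeRing(ring) {
  return endsMatch(ring) ? ring : [...ring, [...ring[0]]];
}

const openRing = [[0, 0], [1, 0], [1, 1], [0, 1]];
const closedRing = closeRing(openRing);
console.log(isClosedRing(openRing), isClosedRing(closedRing));
```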

GeoJSON native libraries

Many languages have native libraries that make working with GeoJSON data easier. For example, Python has the "geojson" library that contains classes for all GeoJSON objects and functions for easily creating, encoding, and decoding GeoJSON objects.

Units for Query Calculation

Units used for calculating query results vary depending on the index and the query type. Refer to the MongoDB geospatial matrix when in doubt.

This usually applies to the $maxDistance operator, which limits a $near query to documents that fall within a maximum distance of a point. If you query with a GeoJSON point, specify $maxDistance in meters. If you query with legacy coordinate pairs, specify $maxDistance in radians.
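
As a rough illustration of the difference, the sketch below expresses the same 5 km limit both ways. It assumes an Earth radius of about 6378.1 km for the meters-to-radians conversion and uses $nearSphere as the spherical form of a legacy-pair query. The field name `loc` and the coordinates are hypothetical.

```javascript
// Approximate equatorial radius of the Earth in meters (assumed value).
const EARTH_RADIUS_METERS = 6378100;

// Converts a distance along the Earth's surface to radians.
function metersToRadians(meters) {
  return meters / EARTH_RADIUS_METERS;
}

const maxMeters = 5000;

// GeoJSON point: $maxDistance is given in meters.
const geoJsonNear = {
  loc: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: maxMeters,
    },
  },
};

// Legacy pair with a spherical query: $maxDistance is given in radians.
const legacyNear = {
  loc: {
    $nearSphere: [-73.97, 40.77],
    $maxDistance: metersToRadians(maxMeters),
  },
};

console.log(JSON.stringify(geoJsonNear));
console.log(JSON.stringify(legacyNear));
```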

Thanks for reading!

There's a ton of geospatial data out there waiting to be mined and analyzed by new developers. We hope this post is a good starting point for working with geospatial data in MongoDB. If you need help working with geospatial data on MongoLab, you can reach out to our team at support@mongolab.com anytime.

Happy hacking!

Chris & Eric @ MongoLab

12 Responses to A Primer on Geospatial Data and MongoDB

  1. guest2014 2014/11/18 at 12:10 am #

    I have rectangular floor maps and am not using a spherical, Earth-like system. It’s a flat coordinate system, but I am still saving data in GeoJSON format (points, polygons, etc. on a rectangular map). It seems I can’t use a 2dsphere index, because it fails when coordinates fail longitude/latitude validation, and I can’t use a 2d index, because it does not support GeoJSON. Is there a way to change or remove the longitude/latitude validation in a 2dsphere index? Thanks

  2. wsmoak 2014/11/24 at 10:43 am #

    I just ran into a similar thing — you can insert arbitrary values for a Point, but you can only query $geoWithin { $geometry { … with values that are valid longitude and latitude. Otherwise it complains “Can’t canonicalize query: BadValue bad geo query:”

  3. Sharry 2015/04/14 at 6:21 am #

    Thanks for this. Lots of help for MongoDb & geospatial queries

  4. guest2015 2015/04/19 at 11:41 am #

    The official MongoDB docs say that the format for storing location data is longitude, then latitude, but you’ve shown it the opposite way in your post.

  5. TaoTeG 2015/06/02 at 2:12 pm #

    The sequencing of long/lat varies depending on whether or not you are approaching it from a software perspective or a traditional mapping perspective. It is ultimately a subjective debate, but the GeoJSON spec specifies long/lat as the order so for Mongo that is the way to go. That said, if you have used other GIS tools before, or come from a Leaflet/GoogleMaps/Apple Mapkit background, you will be accustomed to lat/long and will have to pay attention to that detail. This post explains in greater detail than I can: http://www.macwright.org/2015/03/23/geojson-second-bite.html#coordinate

  6. Marian 2015/07/13 at 12:27 am #

    How should I insert a field that defines a circle? I need an example schema, e.g. location:{loc:[latitude,longitude],radius:1}. I’ve tried, but some errors appear. Thanks for any help :)

  7. Chris Worthington 2015/09/09 at 10:23 pm #

    Wow, this is really cool. Wish our affiliate search site had been developed with MongoDB, the search queries would run much faster.

    Our current system calculates local address distances based on drawing a circle between the two points and measuring the radius. Using “$near” would speed up those calculations.

  8. Kai Mast 2016/01/25 at 10:04 pm #

    I am currently evaluating 2d indexes in MongoDB and they’re really slow for me. Do you have any idea what I am doing wrong?

    I set up the collection like this:
    db.srb.createIndex({"pos": "2d"}, {min: 0, max: 10000, bits: 32})

    It takes very long (hours) to finish one of the microbenchmarks I wrote (multiple clients querying small fractions of the keyspace).
    I also get output like this printed to the console:
    2016-01-26T05:58:14.493+0000 I COMMAND [conn55] command test.srb command: find { find: "srb", filter: { position: { $geoWithin: { $box: [ [ 6512.537414344119, 8030.729960779566 ], [ 6522.537414344119, 8040.729960779566 ] ] } } }, shardVersion: [ Timestamp 10000|0, ObjectId('56a70878c07739efc3a97634') ] } planSummary: COLLSCAN keysExamined:0 docsExamined:245818 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:1920 nreturned:0 reslen:151 locks:{ Global: { acquireCount: { r: 3842 } }, MMAPV1Journal: { acquireCount: { r: 1921 } }, Database: { acquireCount: { r: 1921 } }, Collection: { acquireCount: { R: 1921 } } } protocol:op_command 239ms

    I also tried both sharding and using replica sets to increase throughput, but neither seems to work.
    I wonder why it needs to examine so many objects? This is on MongoDB 3.2

  9. Chris Chang 2016/01/26 at 5:34 pm #

    Hi there,

    It looks like your query/queries are not using an index (as evidenced by the COLLSCAN in the log message). The index you’ve created specified the field name “pos”, whereas the query that is in the log message specifies “position”.

    Proper indexes are critical for good database performance. A good rule of thumb is that if you see “COLLSCAN”, you should review your indexes to make sure they cover your queries.

    Happy hacking!

  10. Kai Mast 2016/01/26 at 5:46 pm #

    Woops. What a stupid mistake. Thanks for spotting that.

