The Sweet Java Topology Suite – Part II

by

In a previous post, we described how we started using the Java Topology Suite (JTS) to manipulate the postal/ZIP code polygons displayed in an application built on MapQuest’s Flex API. Since then, we have added the ability to join multiple postal codes into territories. A single territory sometimes combines more than 1,000 postal code polygons.

We ran into two significant technical hurdles. First, MapQuest’s API doesn’t support polygons with inner holes, so a donut-shaped polygon would just look like a circle, with no hole in the middle. Second, some of the postal code polygons were so complicated that the union process would fail.

Union of postal code polygons with a hole in the middle

This union of postal code polygons should have a hole in the middle

Union of postal code polygons, missing the hole in the middle

If you read the other article, you saw that we used JTS to simplify polygons by reducing the number of points that make up each polygon. However, we didn’t end up using those simplified polygons in production because their edges did not line up. They looked like broken glass, because the simplify process treated each polygon on its own, with no regard for the edges it shares with adjacent polygons.

Simplified polygons with edges that don't line up

So, we set out on an adventure to simplify the polygons so that the edges of the simplified postal codes matched up. We received some very responsive and helpful guidance from Martin Davis, one of the principal developers of JTS. He pointed us to the open source tool OpenJUMP, which he also helped to build. Source code from that tool was very helpful as we created our own automated simplification process.
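
The key idea in the steps below is that a border shared by two postal codes should exist only once before it is simplified. The post does not show its code, so the following is only a minimal, dependency-free sketch of that idea rather than the JTS calls we used. The class name `SharedEdges` and its methods are hypothetical, and the sketch assumes that neighboring polygons use exactly the same coordinates along their common border.

```java
import java.util.*;

/** Sketch: collect each border segment once, so a segment shared by two polygons is not duplicated. */
final class SharedEdges {

    /** Builds a key for the segment a-b that is the same regardless of direction. */
    static String key(double[] a, double[] b) {
        String p = a[0] + "," + a[1];
        String q = b[0] + "," + b[1];
        return p.compareTo(q) <= 0 ? p + "|" + q : q + "|" + p;
    }

    /** Returns the unique undirected segments found across all closed rings. */
    static Set<String> uniqueSegments(List<double[][]> rings) {
        Set<String> segments = new LinkedHashSet<>();
        for (double[][] ring : rings) {
            for (int i = 0; i + 1 < ring.length; i++) {
                // A shared segment produces the same key from both rings, so the set keeps one copy.
                segments.add(key(ring[i], ring[i + 1]));
            }
        }
        return segments;
    }
}
```

For two unit squares that sit side by side, each ring has four segments, but the set holds seven because the common side is stored once. Simplifying that single copy is what keeps the two neighbors aligned afterwards.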

Here’s the simplification process in a nutshell:

  1. Convert the MapQuest postal code polygon data for the current patch (like the lower 48 states) to Well-Known Text (WKT) and save each postal code polygon to an individual file on the file system. For the lower 48 states, this resulted in more than 41,000 files. Here is an unsimplified version of the few polygons we’ll simplify in this example:

    Original, unsimplified postal code polygons

  2. Read all of the WKT files, one per postal code, and store them as JTS Geometry objects in a collection. To support step 6 (below), we store the postal code in each geometry object using the very handy userData property of Geometry (set with setUserData() and read with getUserData()). That way, each original/source geometry remembers what postal code it represents.
  3. Use JTS to convert the polygons to merged LineString objects. This creates a collection of the outlines of every polygon, where the common polygon edges become a single line.

    Extracted border lines of original polygons

  4. Use JTS to simplify the merged LineStrings by reducing the number of coordinates that define each line. Our code iterates across every merged LineString and uses JTS’s DouglasPeuckerSimplifier with a simplify tolerance of 0.01.

    Simplified polygon border lines

  5. Use JTS to create polygons from the simplified LineStrings. The primary JTS class was the magic Polygonizer class, along with code from OpenJUMP that prepared the line data for the Polygonizer.

    New polygons made from simplified lines

  6. Now the tough part. We have a collection of simplified polygons, but they aren’t linked to any postal codes, so we can’t find the polygon and use it in our application. We needed to match the simplified polygon with the original. Since this is among the most involved processes, I’ll describe it in a bit more detail:
    1. Add each of the original polygons to a JTS SpatialIndex called STRtree. The STRtree provides a quick query interface to find polygons that fall within a spatial constraint.
    2. Iterate through each of the simplified polygons, and:
      1. Query the STRtree to find all of the original polygons whose envelopes (bounding rectangles) intersect the envelope of the current simplified polygon.
      2. Find the polygon in that set which has the smallest distance between its center point and the simplified polygon’s center point.
      3. Once the best matching original polygon is found, we copy the postal code from that original Geometry’s userData to the simplified polygon.
      4. Some simplified polygons have no match in the original set because of holes, so those non-matches are thrown out in this process.
    3. Now that each simplified polygon has been identified as matching a postal code, we write new WKT files for each postal code. Our code that writes these files automatically creates MultiPolygon objects for those postal codes that are made up of more than one polygon.

    Simple polygons that remain after match with originals
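
The matching in step 6 can be sketched without JTS. This is only an illustration under stated assumptions, not our production code: the class `PostalMatcher` and its `Shape` type are hypothetical, each polygon is reduced to its bounding box, the "center point" is taken to be the center of that box, and a plain loop stands in for the STRtree query. The postal codes in the usage note are made-up sample values.

```java
import java.util.*;

/** Sketch: give a simplified polygon the postal code of the nearest original polygon. */
final class PostalMatcher {

    /** A polygon reduced to what the matching needs: its envelope and, for originals, a postal code. */
    static final class Shape {
        final double minX, minY, maxX, maxY;
        final String postalCode; // null for a simplified polygon that has not been matched yet

        Shape(double minX, double minY, double maxX, double maxY, String postalCode) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
            this.postalCode = postalCode;
        }

        double centerX() { return (minX + maxX) / 2; }
        double centerY() { return (minY + maxY) / 2; }

        /** True when the two bounding rectangles overlap or touch. */
        boolean envelopeIntersects(Shape o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    /**
     * Returns the postal code of the original whose center is closest to the simplified
     * polygon's center, considering only originals whose envelope intersects it.
     * Returns null when no original qualifies (such polygons are thrown out).
     */
    static String match(Shape simplified, List<Shape> originals) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Shape o : originals) { // linear scan; a spatial index would avoid visiting every original
            if (!o.envelopeIntersects(simplified)) continue;
            double dx = o.centerX() - simplified.centerX();
            double dy = o.centerY() - simplified.centerY();
            double dist = dx * dx + dy * dy; // squared distance is enough for comparison
            if (dist < bestDist) {
                bestDist = dist;
                best = o.postalCode;
            }
        }
        return best;
    }
}
```

With originals `(0,0)-(2,2)` tagged "11111" and `(2,0)-(4,2)` tagged "22222", a simplified box of `(1.5,0)-(3.9,2)` overlaps both envelopes but sits closer to the second center, so it is tagged "22222". A box far from both returns null.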

To run this process on the lower 48 United States, I had to allocate 7GB of my 8GB of RAM to the JVM so that all 41,000 polygons could be simplified at the same time. Fortunately, the results are worth the effort. Here is the number of coordinates needed to represent all of the polygons in each of the three areas, before and after simplification, along with the reduction:

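Coordinate Count

| Area                   | Original  | Simplified | Reduction   |
|------------------------|-----------|------------|-------------|
| Lower 48 United States | 6,276,000 | 544,000    | 12x smaller |
| Alaska                 | 262,000   | 15,000     | 17x smaller |
| Hawaii                 | 72,000    | 960        | 75x smaller |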
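
Those reductions come from the line simplification in step 4, where JTS’s DouglasPeuckerSimplifier did the real work. To show why so many coordinates can be dropped, here is a minimal, dependency-free sketch of the Douglas-Peucker idea. It is not the JTS implementation, and the class name `DouglasPeucker` and its method names are hypothetical.

```java
import java.util.*;

/** Sketch of Douglas-Peucker: drop points that lie within a tolerance of the simplified line. */
final class DouglasPeucker {

    /** Distance from point p to the segment a-b (plain point distance when a equals b). */
    static double distance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        if (len2 == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        double t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
        return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    }

    /** Returns the kept points in order; the first and last points always survive. */
    static List<double[]> simplify(List<double[]> line, double tolerance) {
        if (line.size() < 3) return new ArrayList<>(line);
        boolean[] keep = new boolean[line.size()];
        keep[0] = true;
        keep[line.size() - 1] = true;
        mark(line, 0, line.size() - 1, tolerance, keep);
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < line.size(); i++) {
            if (keep[i]) out.add(line.get(i));
        }
        return out;
    }

    /** Keeps the farthest point between first and last if it exceeds the tolerance, then recurses. */
    private static void mark(List<double[]> line, int first, int last, double tolerance, boolean[] keep) {
        if (last <= first + 1) return; // no interior points left
        int farthest = -1;
        double maxDist = -1;
        for (int i = first + 1; i < last; i++) {
            double d = distance(line.get(i), line.get(first), line.get(last));
            if (d > maxDist) {
                maxDist = d;
                farthest = i;
            }
        }
        if (maxDist > tolerance) {
            keep[farthest] = true;
            mark(line, first, farthest, tolerance, keep);
            mark(line, farthest, last, tolerance, keep);
        }
    }
}
```

With a tolerance of 0.01, the line `(0,0) (1,0.001) (2,0) (3,1) (4,0)` loses only the second point, which sits 0.001 away from the straight run between its neighbors. Real postal code borders contain long runs of such nearly collinear points, which is where the savings come from. The recursion here is fine for a sketch, but a very long line would call for an explicit stack.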

Here’s a larger area of polygons, before and after simplification:

Original postal code polygon sample

Simplified postal code polygon sample

To preserve holes despite the limitation in MapQuest’s polygon API, we used JTS to cut a small slice between each inner hole and the exterior of the polygon. This leaves a visible line in the middle of the polygon, but it’s more acceptable than no hole at all. Hopefully MapQuest will support polygons with inner holes in a later release. In fact, it would be really cool if MapQuest would incorporate other structures and features from JTS, including native WKT support.
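
One way to picture the slice is as a bridge that joins the hole’s ring to the exterior ring, so the whole shape can be drawn as a single ring. The sketch below is a zero-width variant of that idea written without JTS, so it is not the code we ran. The class name `KeyholeRing` is hypothetical, rings are assumed to be closed (first point repeated at the end), and the bridge simply connects the closest pair of vertices, which is not guaranteed to avoid crossing other edges in strongly concave shapes.

```java
import java.util.*;

/** Sketch: merge an exterior ring and one hole ring into a single ring joined by a bridge. */
final class KeyholeRing {

    /** Signed area of a closed ring; positive when the ring runs counter-clockwise. */
    static double signedArea(List<double[]> ring) {
        double sum = 0;
        for (int i = 0; i + 1 < ring.size(); i++) {
            double[] a = ring.get(i), b = ring.get(i + 1);
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return sum / 2;
    }

    /** Returns one closed ring: around the exterior, across the bridge, around the hole, and back. */
    static List<double[]> merge(List<double[]> exterior, List<double[]> hole) {
        List<double[]> h = new ArrayList<>(hole);
        // The hole must run in the opposite direction to the exterior so its area is subtracted.
        if (Math.signum(signedArea(exterior)) == Math.signum(signedArea(h))) {
            Collections.reverse(h);
        }
        int n = exterior.size() - 1; // distinct exterior vertices (closing point excluded)
        int m = h.size() - 1;        // distinct hole vertices
        int bi = 0, bj = 0;
        double best = Double.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double d = Math.hypot(exterior.get(i)[0] - h.get(j)[0], exterior.get(i)[1] - h.get(j)[1]);
                if (d < best) {
                    best = d;
                    bi = i;
                    bj = j;
                }
            }
        }
        List<double[]> out = new ArrayList<>();
        for (int k = 0; k <= n; k++) out.add(exterior.get((bi + k) % n)); // full lap of the exterior
        for (int k = 0; k <= m; k++) out.add(h.get((bj + k) % m));        // bridge in, full lap of the hole
        out.add(exterior.get(bi));                                         // bridge back out, closing the ring
        return out;
    }
}
```

For a 10 by 10 square with a 2 by 2 hole, the merged ring has 11 points and a signed area of 96, because the two passes over the bridge cancel and the hole’s area is subtracted. A renderer that only accepts simple polygons can then draw the shape with the hole visible, at the cost of the bridge line.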

Simplified postal codes on map

Territory on map with hole enabled by slice

We are very grateful for the Java Topology Suite and the polygon processing it allowed us to complete. The project we’re building for Dave’s Endorsed Local Provider program will be much more successful with these improvements.

6 Responses to “The Sweet Java Topology Suite – Part II”

  1. chasingthelion Says:

    This is great, sounds like you’re taking the mapping app to new heights. I hadn’t heard of JTS either. After you crunch down the polygons do you store them in a db of reference?

    -leon

  2. Steven Citron-Pousty Says:

    How long did each of those steps take on your mighty b0x3n?

  3. dugsmith Says:

    @leon: (nice to see your name) Yes, we store the polygons in the database, as gzipped XML representations of MapQuest FeatureCollection objects, so that the Flex client can read them. We also keep the WKT in the database as a reference.

    @Steven: I created an ant script that invokes all the code to do the process. It takes about 8 minutes on my box to process all of the lower-48 USA postal codes.

  4. Omniture Press Release featuring DaveRamsey.com « Tony Bradshaw’s Weblog Says:

    […] Mapquest mapping technology. One of the first companies to leverage their new API toolset (visit post on webmonkeyswithlaserbeams.wordpress.com Post 1 Post 2 […]

  5. First Experience with EC2 « danwatt.org Says:

    […] points) so that they can be used in a webapp that we use to maintain service provider territories (more details here). Due to the nature of how the graph algorithms work, we have to load the entire US (48 states) […]

  6. Same Boat Says:

    Thanks for the post. I’m currently facing a similar challenge (simplified country boundaries), albeit complicated by the fact that country geometries are multi-polygons with lots of small, pesky islands. This presents a challenge since, in Step 6, the centroid for each island wouldn’t necessarily correlate with the original centroid for the overall country. Any ideas about how to address this?
