Archive for March, 2009

Plan your escape

March 21, 2009

URL encoding is not character/entity encoding.

This should go without saying, but I frequently see this confused by experienced developers, especially when working with dynamic/loosely typed languages.

URL encoding is for URLs (URIs, to be more generic). The only time to URL-encode a string is when it is part of a URL. JavaScript provides encodeURI(), encodeURIComponent(), decodeURI(), and decodeURIComponent(). In ColdFusion, you can use the URLEncodedFormat() and URLDecode() functions. PHP provides urlencode(), rawurlencode(), and their decode counterparts.
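
For comparison, here is a minimal Java sketch of the same idea. It assumes Java 10 or later (for the URLEncoder overload that takes a Charset); the class name, method names, and example.com URL are made up for illustration. Note that URLEncoder produces the form-encoded flavor of URL encoding, so a space becomes a plus sign rather than %20.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UrlEscapeSketch {
    // Encode one query-string value. URLEncoder targets the
    // application/x-www-form-urlencoded format, so ' ' becomes '+'.
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Reverse of encodeQueryValue.
    static String decodeQueryValue(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "Tom & Jerry <live>";
        // Hypothetical URL: only the value is encoded, never the whole URL.
        String url = "https://example.com/search?q=" + encodeQueryValue(title);
        System.out.println(url);
    }
}
```

The value is encoded only at the moment it becomes part of a URL, which is the whole point: the string stays in its plain form everywhere else.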

Entity encoding is used for representing characters in a document that lie outside of the document character set or have a special meaning within the document. For example, in XML, &, <, >, ”, and ‘ have to be encoded as entities (&amp;, &gt;, etc.) in the document source code. Typically, characters outside of “low” ASCII need to be encoded as well. In client-side code, you typically need not worry about entity encoding. You’re working with a DOM, not document source code, so entity expansion/substitution has already been done.
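
To make the contrast with URL encoding concrete, here is a hand-rolled Java sketch of entity encoding. It is an illustration only, not a library API: the class and method names are hypothetical, and the choice to emit decimal numeric references for everything above printable ASCII is an assumption made for the example.

```java
final class EntityEscapeSketch {
    // Escape the five XML special characters by name, and every code
    // point above printable ASCII as a decimal numeric character reference.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 126) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Same kind of input as a URL value, completely different output.
        System.out.println(escape("Tom & Jerry <caf\u00e9>"));
    }
}
```

Feeding the same string to a URL encoder and to this function gives two unrelated results, which is why using one where the other is needed corrupts the data.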

ColdFusion is a little tricky on this one. There is no equivalent to PHP’s htmlentities. You basically have two options: HTMLEditFormat and XMLFormat. The former will encode characters with special meanings, but it misses high-ASCII (and higher). The latter will encode high-ASCII, but will not use special HTML entity names. It’s for (the more generic) XML, after all. XMLFormat escapes those characters using numeric character references.

The concept applies outside of this concrete example, but this is the example that led me to channel my angst into what I hope is a helpful guidepost for others. In fact, you’ll notice that two paragraphs above, WordPress has transformed my double and single quotes to right-double and left-single quotes, respectively. What are some other escaping/transforming pitfalls you’ve seen?

User-Centric Development

March 18, 2009

The keynote at DevLearn 2008 this past fall was given by Dan Roam, the author of The Back of the Napkin. Dan spoke about the importance of thinking visually, especially when problem solving.

His presentation spurred a conversation between me and the other Lampo attendees at the conference: Jon Shearer and Michael Finney. We were discussing how bad things can happen when software engineers find a neat “feature” they can work into an app, regardless of whether the users want or need it.

This brought Finney to a perfect, real-life example of this concept in action. He had recently purchased a new Gateway laptop with a built-in 802.11n wireless network adapter. When using the wireless, however, he had noticed that it would occasionally slow to a crawl. It’s never terribly convenient to troubleshoot this type of problem, especially when so many components are involved (ISP, broadband modem, wireless router, laptop hardware, OS…) and the problem is intermittent.

Well, last weekend, he had a guest in his house with a laptop, so he was able to do a side-by-side speed test during one of these episodes. He discovered that while the other laptop was getting about 20 Mbps down, his was getting about 20 kbps down.

This took the network out of the troubleshooting stack, so he really dug into the laptop to locate the source of the problem. Before long, he found a “feature” that is new to Windows Vista.

As with prior versions of Windows, you can create and customize power schemes to use when on battery power in order to preserve battery life by throttling the power that certain components use. It is common to scale back the processor speed or the LCD brightness to achieve longer battery life.

Well, Vista has added Wireless Network Adapter Throttling into the default power saver scheme, effectively crippling your network speeds in the name of battery preservation. Let’s think this through for a minute. I want to preserve battery life, so having to sit and wait 1000x longer for a website to come up (all while my LCD and processor burn through battery while I sit and do nothing) is the answer?

Just because you can do something as a developer with the technology you have, doesn’t mean you should.

And just in case you got to this post by searching for an answer to your network slowdown woes, here’s how you change this setting in Vista. Control Panel -> Hardware & Sound -> Power Settings -> Change Plan Settings (on your currently selected power plan) -> Change advanced power settings -> Wireless Adapter Settings -> Power Saving Mode. That’s right, 7 clicks deep, one of which includes clicking on “Sound”. *sigh*

The Sweet Java Topology Suite – Part II

March 4, 2009

In a previous post, we described how we started using the Java Topology Suite (JTS) to manipulate postal/zip code polygons that we are viewing in an application built on MapQuest’s Flex API. Since then, we have added the ability to join multiple postal codes into territories. Sometimes over 1,000 postal code polygons will be combined to form a single territory.

We ran into two significant technical hurdles. First, MapQuest’s API doesn’t support polygons with inner holes. So, a donut-shaped polygon would just look like a circle, with no hole in the middle. The other problem was that some of the postal codes were so complicated that the unify process would fail.

Union of postal code polygons with a hole in the middle

This union of postal code polygons should have a hole in the middle

Union of postal code polygons, missing the hole in the middle

If you read the other article, you saw that we did use JTS to simplify polygons (by reducing the number of points that make up the polygon). However, we didn’t end up using those in production because the edges of the simplified polygons would not line up. They end up looking like broken glass, because the simplify process had no regard for adjacent polygon edges.

Simplified polygons with edges that don't line up
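
Reducing the number of points in a line is what the Douglas-Peucker algorithm does, and JTS exposes it as DouglasPeuckerSimplifier. As a rough illustration of the idea, here is a plain-Java sketch with no JTS dependency. The class and method names are made up for this example, and it is not the JTS implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class DouglasPeuckerSketch {
    // Perpendicular distance from point p to the infinite line through a and b
    // (falls back to the distance to a when a and b coincide).
    static double distanceToLine(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double length = Math.hypot(dx, dy);
        if (length == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length;
    }

    // Keep the end points; keep an interior point only if it strays
    // farther than the tolerance from the line joining the end points.
    static List<double[]> simplify(List<double[]> line, double tolerance) {
        if (line.size() < 3) return new ArrayList<>(line);
        double[] first = line.get(0), last = line.get(line.size() - 1);
        int farthest = -1;
        double maxDistance = 0;
        for (int i = 1; i < line.size() - 1; i++) {
            double d = distanceToLine(line.get(i), first, last);
            if (d > maxDistance) { maxDistance = d; farthest = i; }
        }
        if (maxDistance <= tolerance) {
            List<double[]> result = new ArrayList<>();
            result.add(first);
            result.add(last);
            return result;
        }
        // Split at the farthest point and simplify each half.
        List<double[]> left = simplify(line.subList(0, farthest + 1), tolerance);
        List<double[]> right = simplify(line.subList(farthest, line.size()), tolerance);
        left.remove(left.size() - 1); // the split point is also first in 'right'
        left.addAll(right);
        return left;
    }
}
```

Run on each polygon separately, a routine like this is free to keep different points along a border that two neighbors share, which is one way to end up with the broken-glass effect described above.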

So, we set out on an adventure to simplify the polygons so that the edges of the simplified postal codes matched up. We received some very responsive and helpful guidance from Martin Davis, one of the principal developers of JTS. He also pointed us to the open source tool OpenJUMP, which he also helped to build. Source code from that tool was very helpful as we created our own automated simplification process.

Here’s the simplification process in a nutshell:

  1. Convert the MapQuest postal code polygon data for the current patch (like the lower 48 states) to Well-Known Text (WKT) and save each postal code polygon to an individual file on the file system. For the lower 48 states, this resulted in more than 41,000 files. Here is an unsimplified version of the few polygons we’ll simplify in this example:

    Original, unsimplified postal code polygons

  2. Read all of the WKT files, one per postal code, and store them as JTS Geometry objects in a collection. To support step six (below), we store the postal code in the geometry object using the very handy Geometry.userData property. That way, each original/source geometry remembers what postal code it represents.
  3. Use JTS to convert the polygons to merged LineString objects. This creates a collection of the outlines of every polygon, where the common polygon edges become a single line.

    Extracted border lines of original polygons

  4. Use JTS to simplify the merged LineStrings by reducing the number of coordinates that define each line. Our code iterates across every merged LineString and uses JTS’s DouglasPeuckerSimplifier with a simplify tolerance of 0.01.

    Simplified polygon border lines

  5. Use JTS to create polygons from the simplified LineStrings. The primary JTS class was the magic Polygonizer class, along with code from OpenJUMP that prepared the line data for the Polygonizer.

    New polygons made from simplified lines

  6. Now the tough part. We have a collection of simplified polygons, but they aren’t linked to any postal codes, so we can’t find the polygon and use it in our application. We needed to match the simplified polygon with the original. Since this is among the most involved processes, I’ll describe it in a bit more detail:
    1. Add each of the original polygons to a JTS SpatialIndex called STRtree. The STRtree provides a quick query interface to find polygons that fall within a spatial constraint.
    2. Iterate through each of the simplified polygons, and:
      1. Query the STRtree to find all of the original polygons that touch the envelope (bounding rectangle) of the current simplified polygon.
      2. Find the polygon in that set which has the smallest distance between its center point and the simplified polygon’s center point.
      3. Once the best-matching original polygon is found, we copy the postal code from the original Geometry’s userData to the simplified polygon.
      4. Some simplified polygons have no match in the original set because of holes, so those non-matches are thrown out in this process.
    3. Now that each simplified polygon has been identified as matching a postal code, we write new WKT files for each postal code. Our code that writes these files automatically creates MultiPolygon objects for those postal codes that are made up of more than one polygon.

    Simple polygons that remain after match with originals
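
The matching in step 6 can be sketched in plain Java without JTS. Everything here is a stand-in chosen for illustration: the Shape class is a hypothetical substitute for a JTS Geometry with its userData, a linear scan replaces the STRtree query, and the center of each bounding box is used as the “center point”. Our production code differs in those details.

```java
import java.util.ArrayList;
import java.util.List;

final class PostalCodeMatchSketch {
    // Hypothetical stand-in for a JTS Geometry: a ring of {x, y} points plus
    // the postal code that the real process keeps in the Geometry's userData.
    static final class Shape {
        final double[][] ring;
        String postalCode; // null until matched
        final double minX, minY, maxX, maxY;

        Shape(double[][] ring, String postalCode) {
            this.ring = ring;
            this.postalCode = postalCode;
            double loX = Double.POSITIVE_INFINITY, loY = Double.POSITIVE_INFINITY;
            double hiX = Double.NEGATIVE_INFINITY, hiY = Double.NEGATIVE_INFINITY;
            for (double[] p : ring) {
                loX = Math.min(loX, p[0]); hiX = Math.max(hiX, p[0]);
                loY = Math.min(loY, p[1]); hiY = Math.max(hiY, p[1]);
            }
            minX = loX; minY = loY; maxX = hiX; maxY = hiY;
        }

        // True when the two bounding rectangles overlap or touch.
        boolean envelopeIntersects(Shape o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }

        // Distance between bounding-box centers (an assumed "center point").
        double centerDistance(Shape o) {
            return Math.hypot((minX + maxX) / 2 - (o.minX + o.maxX) / 2,
                              (minY + maxY) / 2 - (o.minY + o.maxY) / 2);
        }
    }

    // Tags each simplified shape with the postal code of the nearest original
    // candidate and returns only the shapes that found a candidate.
    static List<Shape> match(List<Shape> originals, List<Shape> simplified) {
        List<Shape> matched = new ArrayList<>();
        for (Shape s : simplified) {
            Shape best = null;
            for (Shape o : originals) { // linear scan instead of an STRtree query
                if (!s.envelopeIntersects(o)) continue;
                if (best == null || s.centerDistance(o) < s.centerDistance(best)) best = o;
            }
            if (best != null) {
                s.postalCode = best.postalCode;
                matched.add(s);
            }
        }
        return matched;
    }
}
```

With more than 41,000 polygons, the linear scan in this sketch would be far too slow, which is exactly why the spatial index matters. The sketch also drops a simplified shape only when it has no candidate at all; it does not reproduce how the real process rejects polygons that come from holes.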

In order to run this process on the lower 48 United States, I had to allocate 7GB of my 8GB of RAM to the JVM so that all 41,000 polygons could be simplified at the same time. Fortunately, the result is worth the time it takes to build. Here is the number of coordinates needed to represent all of the polygons for the three areas, both originally and after simplification, along with the savings realized:

Coordinate Count

Area                     Original    Simplified   Reduction
Lower 48 United States   6,276,000   544,000      12x smaller
Alaska                   262,000     15,000       17x smaller
Hawaii                   72,000      960          75x smaller
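
The reduction figures are just the original count divided by the simplified count, rounded to a whole number. A tiny Java check, with a made-up class and method name:

```java
final class ReductionSketch {
    // How many times smaller the simplified data set is, rounded to a whole number.
    static long reductionFactor(long original, long simplified) {
        return Math.round((double) original / simplified);
    }

    public static void main(String[] args) {
        System.out.println(reductionFactor(6_276_000, 544_000)); // lower 48
        System.out.println(reductionFactor(262_000, 15_000));    // Alaska
        System.out.println(reductionFactor(72_000, 960));        // Hawaii
    }
}
```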

Here’s a larger area of polygons, before and after simplification:

Original postal code polygon sample

Simplified postal code polygon sample

In order to create polygons that maintain any holes in the middle with MapQuest’s polygon API, we used JTS to cut a small slice between any inner features and the exterior of the polygon. This leaves a line in the middle of the polygon, but it’s more acceptable than no hole at all. Hopefully MapQuest will support polygons with inner holes in a later release. In fact, it would be really cool if MapQuest would incorporate other structures and features from JTS, including native WKT support.
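
For readers who want to try the idea without JTS, here is a simplified plain-Java sketch. It is a different technique from the slice we cut with JTS: instead of removing a thin sliver, it joins the exterior ring and the hole with a zero-width bridge between their two nearest vertices, producing a single ring that an API without hole support can draw. The names are hypothetical, the rings are assumed to be wound in opposite directions, and the sketch does not check that the bridge avoids crossing other edges.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyholeSketch {
    // exterior and hole are open rings (first point not repeated) of {x, y}
    // points, wound in opposite directions. Returns one open ring that walks
    // the exterior, crosses a bridge to the hole, walks the hole, and crosses back.
    static List<double[]> bridgeHole(List<double[]> exterior, List<double[]> hole) {
        int bestE = 0, bestH = 0;
        double best = Double.POSITIVE_INFINITY;
        for (int e = 0; e < exterior.size(); e++) {
            for (int h = 0; h < hole.size(); h++) {
                double d = Math.hypot(exterior.get(e)[0] - hole.get(h)[0],
                                      exterior.get(e)[1] - hole.get(h)[1]);
                if (d < best) { best = d; bestE = e; bestH = h; }
            }
        }
        List<double[]> ring = new ArrayList<>();
        // Walk the whole exterior, starting and ending at the bridge vertex.
        for (int i = 0; i <= exterior.size(); i++) {
            ring.add(exterior.get((bestE + i) % exterior.size()));
        }
        // Cross the bridge, walk the whole hole, and end back at its bridge vertex.
        for (int i = 0; i <= hole.size(); i++) {
            ring.add(hole.get((bestH + i) % hole.size()));
        }
        return ring; // closing the ring crosses the bridge back to the exterior
    }

    // Shoelace formula; with opposite windings the hole's area is subtracted.
    static double signedArea(List<double[]> ring) {
        double sum = 0;
        for (int i = 0; i < ring.size(); i++) {
            double[] a = ring.get(i), b = ring.get((i + 1) % ring.size());
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return sum / 2;
    }
}
```

Either way, the trade-off is the same as the one described above: a visible seam runs from the hole to the outer edge, but the hole itself survives.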

Simplified postal codes on map

Territory on map with hole enabled by slice

We are very grateful for the Java Topology Suite and the polygon processing it allowed us to complete. The project we’re building for Dave’s Endorsed Local Provider program will be much more successful with these improvements.