The 10:10 Code (2010) (jgc.org)
117 points by jgrahamc on July 18, 2014 | hide | past | favorite | 29 comments


If you're looking to add a checksum, why not use an existing format that already handles things like variable precision and proximity well, like geohash (http://en.wikipedia.org/wiki/Geohash), and just tack on a checksum digit?

"ezs42cw4" becomes "ezs42cw4 f3" and my client can generate the checksum for me so I just have to verify, rather than enter the last two digits.



I guess today is the day for talking about encoding locations on earth on HN: https://news.ycombinator.com/item?id=8052908


The trick is to get Google to recognise these codes. If I can type one into Maps or the search box and it comes up with a latitude and longitude, then that will make it useful.


I thought I could maybe do better with a binary search system. Here is some napkin math:

Circumference of earth: 40,075,000m

Number of choices to get 10m accuracy: log2(40,075,000/10) = 21.9 ~ 22 bits

We need two numbers for latitude and longitude = 44 bits

We can get 5 bits from each alphanum character = 44/5 ~ 9 alphanum characters

Add the checksum character and you're at 10 characters.
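The napkin math above can be reproduced directly (a quick sketch, using only the numbers already stated):

```python
import math

# Napkin math from above, computed explicitly.
circumference_m = 40_075_000
bits_per_axis = math.log2(circumference_m / 10)  # ~21.93, round up to 22
total_bits = 2 * math.ceil(bits_per_axis)        # 44 bits for lat + long
chars = math.ceil(total_bits / 5)                # 5 bits per character -> 9
```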

Anyone have better methods that only use capital letters and digits?

This system has the disadvantage of two nearby locations potentially having totally different codes.


Besides the "more accurate at poles" issue, you are also double-counting because you don't need to cover the circumference twice.

An alternative is to find the Earth's surface area (5.1e8 km^2), and divide by 10mX10m. This gives the number of cells. Then take log base 30 (26 alpha + 10 digit - {"O", "I", "S", "U", "V", "Z"}).

This turns out to be 8.60 characters. To detect all single-character errors, you need a full extra character (9.60).

If you allow all alphanumerics (36 possibilities), you end up with 8.16 characters, plus one for errors is 9.16.
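Both figures can be checked with a quick sketch (variable names are mine):

```python
import math

earth_area_m2 = 5.1e8 * 1e6          # 5.1e8 km^2 expressed in m^2
cells = earth_area_m2 / (10 * 10)    # number of 10m x 10m cells: 5.1e12
chars_base30 = math.log(cells, 30)   # ~8.60 with the 30-symbol alphabet
chars_base36 = math.log(cells, 36)   # ~8.16 with all alphanumerics
```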

Conclusion: if you were willing to give up some accuracy over oceans, you could get away with 9 characters, otherwise, 10 is the best you can do.


As the others have said, there are too many bits used at the poles. You might be interested in sphere point picking:

http://mathworld.wolfram.com/SpherePointPicking.html

x = sqrt(1 - u^2) * cos θ

y = sqrt(1 - u^2) * sin θ

z = u

θ ∈ [0,2π)

u ∈ [-1,1]

The idea is to pick θ and u uniformly instead of latitude and longitude, which ensures equal spacing anywhere on the globe (no squishing at the poles). If you visualize the areas of the side of a cylinder and a sphere:
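A minimal sketch of that sampling step, following the MathWorld formulas above (the function name is mine):

```python
import math
import random

def uniform_sphere_point():
    """Sample a point uniformly on the unit sphere by picking
    theta in [0, 2*pi) and u in [-1, 1] uniformly."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    u = random.uniform(-1.0, 1.0)
    s = math.sqrt(1.0 - u * u)
    return (s * math.cos(theta), s * math.sin(theta), u)
```

Every returned point lies on the unit sphere, and because u (not latitude) is uniform, there is no clustering at the poles.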

AreaOfSideOfCylinder = 2 * π * r * h = 2 * π * r * (2 * r) = 4 * π * r^2

AreaOfSphere = 4 * π * r^2

You can see that the areas are equal, but if you think of the triangular sections of a basketball, a uniform distribution uses just over half the bits of lat/long. That’s because the sections bulge out, and also latitude isn’t as susceptible to pinching (perhaps someone who works with distributions can provide the exact ratio). So my guess is that this scheme would use perhaps 6 or 7 digits, depending on whether a checksum is desired.

I've used the formulas on that page with great success for things like random stars in OpenGL. This is loosely related to quaternions but I don't have enough math to describe how exactly. Quaternions sweep uniformly, whereas Euler angles pinch and suffer from gimbal lock.


Too late to edit my own post but mturmon is correct, my guess about the number of digits was incorrect. Even with a uniform distribution, it still takes 8.44 digits in base 32 and 7.04 digits in base 64, not counting checksum:

https://www.wolframalpha.com/input/?i=log+base+32+of+%28%285...

https://www.wolframalpha.com/input/?i=log+base+64+of+%28%285...


One obvious "issue" with that: you have better accuracy at the poles than at the equator.


Does latitude and longitude not suffer from the same problem?


No, because latitude and longitude are infinitely precise, limited only by the accuracy of the measuring instrument. With the binary search idea you proposed, the dividing lines are more tightly packed longitudinally near the poles.


But near the poles, 1 degree longitude specifies a much smaller span than at the equator. So with say, 4 digits per latitude and longitude, you can specify a much smaller region at the poles than the equator.


And that's the problem.

To get 10m accuracy at the equator, you'd need to get much greater than 10m accuracy at the poles.

You're effectively wasting information.

The surface area of the Earth is 5.10x10^8 km^2. In an ideal world, you could uniquely represent every 10m*10m area of the Earth - there are 5.1x10^12 such areas. log2(5.1x10^12) is about 42.2.

So you're wasting about 1.8 bits of information (~4%). Not too bad, now that I calculate it out.
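The waste can be checked directly (a sketch using only the figures above):

```python
import math

cells = 5.1e8 * 1e6 / (10 * 10)   # 10m x 10m cells: 5.1e12
ideal_bits = math.log2(cells)     # ~42.2 bits needed in an ideal encoding
naive_bits = 44                   # 22 bits each for lat and long
wasted = naive_bits - ideal_bits  # ~1.8 bits, about 4%
```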


Yes, I get that. I was responding to user rtkwe who said that latitude/longitude didn't waste information.


Lat./long. doesn't have the fixed length of the binary search (10:10) system, so the information isn't wasted: where the grid is sparser near the equator, we just extend the decimal places.


That "extension" is itself the waste being discussed. Some places require more digits than others.


I was going to put together a tray widget tool for this, but even the example code given doesn't work for me, as far as I can tell. For the coördinates of the UK Eurotunnel terminal, it gives me: MEQ N6G 7NY5. Problems on a trivial level like that don't encourage me to think this idea is important to its creator — especially since I see the blog comments are full of people who noticed the same thing, with no response.


There is also the Maidenhead Locator System used by Amateur Radio. I always thought it was very useful and could have applications outside of Amateur Radio:

http://en.wikipedia.org/wiki/Maidenhead_Locator_System


Assuming we don't see floating barge-cities and antarctic dome-habitats popping up, you might be able to achieve some substantial savings with a system that's limited to major continents and islands. That's a ~70% shrinkage in the coordinate-space.

Alternately, maybe some form of huffman coding that assigns shorter representations to common zones...


I'd doubt limiting to land area would be very helpful -- given that each character encodes 33 possibilities (alphabet + digits, without O, I, or L for clarity), you'd need to reduce the space by a factor of 33 (~97%) to remove a single character.

Huffman coding would be interesting, but the tradeoff would be that since locations are no longer standardized to 10 characters, you'd lose the validation aspect of it (not to mention you'd have to change the name =).

Edit: Apparently S and Z are also missing, so 31 possibilities.


Alternately you could keep the same number of characters and use the "extra" to encode more accuracy. Trimming it from 10m accuracy down to 3m sounds very useful for finding stores and homes in denser cities.


That's a lot of digits.

It would be interesting to interlace the latitude and longitude bits in such a way that any prefix is a valid code of reduced resolution, and you can add arbitrarily many digits for increased precision. I think geohash does this? Or does reducing precision give you a geohash that isn't a prefix? Anyway...

Each digit of such a code might be roughly equivalent to one decimal digit of both latitude and longitude. For example, maybe the first three digits would get you to within about a nautical mile.
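The interlacing idea can be sketched with Morton (Z-order) style bit interleaving, which is essentially what geohash does; the function name and bit ordering here are illustrative:

```python
def interleave(lat_bits: int, lon_bits: int, n: int) -> str:
    """Interleave two n-bit integers, longitude bit first
    (geohash-style), into one bit string. Any prefix of the
    result is a valid lower-resolution code."""
    out = []
    for i in range(n - 1, -1, -1):       # MSB first
        out.append(str((lon_bits >> i) & 1))
        out.append(str((lat_bits >> i) & 1))
    return "".join(out)
```

Truncating both inputs by one bit yields a code that is a prefix of the full-resolution one, which is exactly the "any prefix is a valid code" property.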

But that's dumb. While short codes are nice, a short code that only gets you within a mile is mostly useless. It would be better to use a suffix than a prefix. You can still get arbitrary precision with a decimal point.

First, specify a fixed "unit" of precision. Say 10m. Any short code is assumed to be the least significant digits. If my math is right, you could specify any point closer than 2 km +/- 10 m using maybe 2 digits. Most buildings in most cities will need maybe 3-5 digits.

You could invite somebody from the same city to your house, telling them you live at YA9. Their GPS would assume that the missing prefix digits match their current location. If you want to give them the location of the centerpiece on your dining room table, you could add a decimal point and more precision: YA9.7R.

You could even use short-codes that look like Mapcodes: US-DC, YA9. The location provides the context for the missing prefix digits. But you always have the option of specifying the full code to avoid ambiguity or lookup tables.

One problem is the check digit. Maybe there's a way to interlace the check bits too, so you don't need a full check digit for every code. Is there a checksum algorithm where a prefix of the checksum is a checksum of a prefix? That obviously couldn't detect a one-digit typo. Or, maybe just put a check digit at the end. All codes become 2 digits minimum. The checksum algorithm wouldn't need to be fancy since you can assume the rest of the digits based on your current location.

Another easily fixed problem is mentioned on the Geohash wikipedia page: locations near the equator and prime meridian could have wildly different codes than other locations nearby. The solution to that is easy: Translate lat/lon into a different system. Translate latitude into a number from 0 to 180, with 0 being the south pole. Similarly translate longitude to a number from 0 to 360, with 0 being at the international date line. Then encode those translated numbers. Nobody would ever see those intermediate numbers. You always display either lat/lon or the code, but the translation ensures that you don't have vastly different codes for nearby places (at least in any place with a significant number of people).
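The translation step is trivial; a sketch (function name is mine) that moves both discontinuities to the south pole and the international date line:

```python
def translate(lat: float, lon: float):
    """Shift lat/lon to non-negative ranges before encoding:
    latitude 0..180 (0 = south pole), longitude 0..360
    (0 = international date line)."""
    return lat + 90.0, (lon + 180.0) % 360.0
```

Nearby points around the equator and prime meridian then share long prefixes, since no coordinate flips sign there.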


Check out Military Grid Reference System (MGRS) [1], referenced in the article. Increasing the number of digits increases the precision of the location.

1: http://en.wikipedia.org/wiki/Military_grid_reference_system


As the comments note, encodings for humans should probably skip letters and numbers that are easily confused.


This is reflected in the code:

    var alphabet = 'ABCDEFGHJKMNPQRVWXY0123456789';
perhaps not perfectly, but 2 vs Z, 5 vs S, 0 vs O are solved for.


It's certainly better than nothing. The system now cannot direct the user to the wrong location if they input a similar-looking but incorrect character. But it does not make the user feel confident that the input was correct.

This only works if you know 2 is a possible character, but Z is not, and etc. Otherwise an average user still isn't sure whether '0' means zero, or the letter O. But at least the system will refuse the incorrect version.

You could make it a little more user friendly if you replace all the non-existent characters with their similar partner before interpreting ('Z' becomes '2'), but then it still is not clear to the user if the code they entered should have been 'ABC123' or 'ABC1Z3'. Even though they are identical to the system, the user cannot know this (always assume the user will not read your FAQ).
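The replace-before-interpreting idea could look like this (a sketch; the mapping below covers the common look-alike pairs, with digits winning as noted in the thread):

```python
# Hypothetical look-alike normalization: map characters that are
# absent from the code alphabet onto their similar-looking digits.
LOOKALIKES = {"O": "0", "I": "1", "L": "1", "S": "5", "Z": "2"}

def normalize(code: str) -> str:
    """Upper-case the input and substitute look-alike characters."""
    return "".join(LOOKALIKES.get(c, c) for c in code.upper())
```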


It's a one-way relationship though: digits always trump the letter look-alikes, so that helps.


I would probably map the latitude and longitude onto a Hilbert curve so there is better locality in the codes.


The codes are already pretty "local". A small change in location only makes a small change in the code.



