Show HN: Serverless USGS historical topographic map tiles (kylebarron.dev)
134 points by kylebarron on May 6, 2021 | 21 comments



Hi all, I was invited to repost this (originally posted last year) to be included in the second chance pool!

The USGS created topographic maps by hand from 1884 to 2006 (when they started making their topographic map series digitally). I stumbled upon the fact that all 183,000 of their hand-drawn maps were digitized and uploaded to a public S3 bucket [0] in Cloud-Optimized GeoTIFF format [1].

Because they're in this format, it's possible to quickly mosaic images _on demand_, in a process called "dynamic tiling" [2]. The dynamic approach gives you a lot more flexibility, because you aren't locked into a set of static tiles generated once up front.
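
For a flavor of what that looks like in code, here's a minimal sketch using rio-tiler's v2-style API; the object key is made up for illustration:

```python
# Minimal dynamic-tiling sketch: read just the bytes needed for one
# web mercator tile from a COG, via HTTP range requests under the hood.
# The object key below is illustrative, not a real file in the bucket.
from rio_tiler.io import COGReader

url = "s3://prd-tnm/StagedProducts/Maps/HistoricalTopo/GeoTIFF/example.tif"

with COGReader(url) as cog:
    img = cog.tile(335, 790, 11, tilesize=256)  # (x, y, z)
    png = img.render(img_format="PNG")
```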

The hardest part about this project, by contrast, is the time variation. The USGS created maps at many different scales across many different time periods, so figuring out how to combine them into a few mosaics is quite tricky. With more time invested in this, it would be ideal to add a search capability that lets users specify a time horizon. (Edit: I talk about this process a bit more in the GitHub README [3].)
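
Purely for illustration, ranking candidate maps for a requested year could look something like the sketch below; the "year" and "scale" metadata fields are assumptions, not the project's actual schema:

```python
# Illustrative only: order candidate maps by closeness to the requested
# year, breaking ties by how close the map scale is to a preferred scale.
def rank_candidates(maps, target_year, preferred_scale=24000):
    return sorted(
        maps,
        key=lambda m: (abs(m["year"] - target_year),
                       abs(m["scale"] - preferred_scale)),
    )
```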

More recently I've been spending time doing similar dynamic tiling projects, but focusing on satellite imagery, to enable analytic processing of satellite imagery in the browser [4].

[0]: `aws s3 ls s3://prd-tnm/StagedProducts/Maps/HistoricalTopo/GeoTIFF/`

[1]: https://www.cogeo.org/

[2]: https://kylebarron.dev/blog/cog-mosaic/overview

[3]: https://github.com/kylebarron/usgs-topo-tiler

[4]: https://www.unfolded.ai/blog/2021-04-28-raster-layer/


Hi Kyle,

This looks amazing! I wouldn't be surprised if this could be useful for, e.g., urban economics, to compute measures of how cities and urban areas grew. Like LA eating into the San Fernando Valley, for instance.


Oh hi Sergio! Hope you're doing well. I'm not in the economics field anymore, but it would be cool to see if an economist could use this data. It's probably a little harder to use because you'd have to apply some vectorization techniques on the maps to get more usable road network data.


Definitely true! But if there's a good question, then that's definitely doable.

Also, I noted your change of field; the repos you create and star, which pop up on my GitHub main page, are much more interesting than the usual r/stata repos.

Thanks again for the contributions. GIS and carto are definitely super fun and useful fields, with greater bang for the buck than most economics papers!


I personally really love looking at old maps of cities like LA [0] to see to what extent the urban structure has changed in the last ~100 years.

[0]: https://kylebarron.dev/usgs-topo-mosaic/#13.02/34.03937/-118...


Doesn’t that mean that whoever uploaded these tiles is now paying for hosting this experiment?


The S3 bucket is hosted/managed by the U.S. Geological Survey. The only costs borne by them are the hosting, S3 requests, and S3 egress. In this case there's no egress since I fetch from the same region, so the marginal cost of this experiment to them is roughly $0.000001 per tile a user loads.
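
For a rough sense of where that number comes from (assuming AWS's standard ~$0.0004 per 1,000 S3 GET requests, and guessing a handful of range reads per tile):

```python
# Back-of-envelope: S3 request cost per tile served.
get_price = 0.0004 / 1000   # dollars per GET request (standard S3 pricing)
reads_per_tile = 3          # guess: header + overview/data range reads
print(get_price * reads_per_tile)  # ~1.2e-06 dollars, i.e. ~$0.000001/tile
```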

They've chosen to leave this particular bucket fully public for the time being. If usage got too high, they could convert the bucket to a requester pays bucket, where the requester (me) would pay for the S3 requests and egress. They've done this for their Landsat bucket (usgs-landsat), which is set as requester pays.


How is this “serverless”? There's still an S3 and Lambda server.


If the developer doesn't need to deal with a long-running process abstraction, and is billed only for actual compute time rather than idle time, I think the consensus is to call that "serverless".


Thanks, finally a somewhat clear definition. It's still a misnomer though.


True. The point to remember is that client-side != serverless.


S3 isn't a server, it's an object store. This uses Lambda (the serverless part) to fetch source data on demand, combine, reproject to web mercator, and send to the client.

There are varied definitions of serverless; it isn't "as serverless" as the browser loading data directly from S3, but an AWS marketing person would call this usage of Lambda serverless.
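
As a rough sketch of the Lambda side (the handler shape follows the usual API Gateway proxy convention; `parse_tile_path` and `render_tile` are hypothetical stand-ins, not the project's actual code):

```python
# Hedged sketch of a tile-serving Lambda handler. The fetch/combine/
# reproject work described above is hidden behind a hypothetical
# render_tile(); only the request/response plumbing is shown.
import base64
import re

def parse_tile_path(path):
    # e.g. "/usgs-topo/11/335/790.jpg" -> (11, 335, 790)
    z, x, y = re.search(r"/(\d+)/(\d+)/(\d+)", path).groups()
    return int(z), int(x), int(y)

def handler(event, context):
    z, x, y = parse_tile_path(event["path"])
    image = render_tile(z, x, y)  # hypothetical: fetch, combine, reproject
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/jpeg"},
        "body": base64.b64encode(image).decode(),
        "isBase64Encoded": True,
    }
```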


This is great! I've been fascinated with tiling historical maps for quite some time. In fact, I made a very similar (but far less advanced) site focusing primarily on historical USGS maps of the SF Bay Area [0].

I'm curious about this:

> To get around this, I apply a "hack" to the MosaicJSON format. Instead of just encoding a URL string, I encode the url and the bounds of the map as a JSON string.

In my effort I struggled to automatically assemble bounds from tilted and angled scans. Is this source data set clean enough to make simpler assumptions, or did you also devise a way to automatically determine the map boundaries within the scan?

[0]: https://eharmon.net/bayquads/


I'm assuming the magic is this bit?

https://us-west-2-lambda.kylebarron.dev/usgs-topo/11/335/790...

Which figures out how to return a JPG of that specific 11/335/790 tile.

What's Dynamo DB being used for here?


Yeah pretty much.

The backend needs to 1) find the S3 URLs of the relevant images that intersect that XYZ tile index, 2) read the relevant portion of each of them (removing the map collar as necessary [0]), and 3) composite them together to minimize areas of missing data.

In this project there's a static, pre-generated index used for step 1: essentially a mapping from tile indexes to arrays of URLs, with DynamoDB under the hood for the fast key-value lookup (rough sketch below). In a more production-ready setup you'd have a Postgres database in the backend, so that you could handle arbitrary date ranges or other query parameters from the user.

[0]: https://github.com/kylebarron/usgs-topo-tiler#removing-map-c...
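
That step-1 lookup is roughly the following; the table and attribute names here are illustrative, not the production schema:

```python
# Rough sketch: XYZ tile index -> quadkey -> array of asset records.
import json
import boto3
import mercantile

table = boto3.resource("dynamodb").Table("usgs-topo-mosaic")  # hypothetical name

def assets_for_tile(x, y, z):
    quadkey = mercantile.quadkey(x, y, z)
    item = table.get_item(Key={"quadkey": quadkey})["Item"]
    # Each asset is a JSON string bundling the COG URL with the map's
    # bounds (the MosaicJSON "hack" quoted in the parent comment).
    return [json.loads(asset) for asset in item["assets"]]
```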


Somewhat related: there was a great video on the USGS cartography department from the '60s, from a since-deleted YouTube archival channel, that went through the steps of map making and administration. I've been trying to find it again for years and wonder if anyone has come across it or knows where to look. It looked like an official government video, so it must be archived somewhere.


Wow excited to use this!


This is great work, but it almost makes me sad that you have to do this. I spent the earliest parts of my software career implementing full-spectrum geospatial intelligence product generation algorithms for US intelligence agencies and the military. Part of the code base was software that does exactly this, though not with the historical, hand-drawn maps but with the most current maps generated from aerial and space-borne radar sensing, as well as non-USGS data of, say, the ocean floor generated by US Navy sonar surveying programs. We could generate a map of any arbitrary coordinate boundaries of any part of the earth in seconds, using data that was at most a few years old.

There is absolutely no reason any of this code needs to be classified: the maps themselves are not, the GeoTIFF format and tiling procedures are completely open standards, and the NGA has even open-sourced some libraries for doing this kind of thing. Yet so much of the code that does this really quickly and efficiently, directly on your own workstation, using native C++ executables that output formats a standard image viewer can render, is tied up in proprietary and even classified code bases. That's largely because of the monorepo nature of the early work, and because development was done entirely on the classified systems owing to the difficulty of developing on unclassified systems and then transferring across the air gap.

This is slowly changing. In fact, what I'm presently working on is automated transfer and assurance mechanisms, so that more development that doesn't need to be classified can be done on open systems and the work shared. But it's unfortunate to see so many wheels being reinvented when there are already really, really good tools for doing these kinds of things that are kept hidden.

We also have up-to-date indexed catalogs of exactly which tiles exist in which scale ranges to solve the problem you noted of trying to query and getting no data for poorly mapped regions. We can automatically retrieve the best data available and interpolate to the desired scale range instead. And rather than only getting the GeoTIFF, you can also get the raw altitude data intended for machine consumption, which is the main geoint use case for doing projections from sensor ephemeris to ground point.

I can see you also forked proj4js. This is of course the kind of thing we had heavily optimized code for, since you have to project from lat/lon to ECF in order to project from ground points to coordinates in orbit, which are ECF (we needed to do this before the open-source libproj for C existed).
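
For anyone unfamiliar, the geodetic-to-ECF (ECEF) conversion on the WGS84 ellipsoid is the textbook formula; this sketch is just that standard math, not anyone's production code:

```python
# Standard geodetic (lat/lon/height) -> ECEF conversion, WGS84 ellipsoid.
import math

A = 6378137.0          # WGS84 semi-major axis, meters
E2 = 6.69437999014e-3  # WGS84 first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h=0.0):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + h) * math.sin(lat)
    return x, y, z
```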

In any case, it's unfortunate that you seemingly need to work for free doing this to get any attention, and I hope you get a job out of it eventually. Sad to say, I don't think any of the big aerospace and intelligence players would give you a second look, because of a basic requirement that everyone have a degree in engineering, physics, applied math, or computer science; but hopefully a more innovative, disruptive, and agile company will give you a look. I'm not sure you'd even want the work, considering many people probably find working for spy agencies unethical, but I'd have hired you if it were up to me.


amazing


wow


Channeling Mitch Hedberg: There used to be hills here. There still are, but there used to be too.



