United States Census Bureau APIs (census.gov)
322 points by stevewilhelm on Oct 23, 2017 | 40 comments



FWIW, this has been around for a couple of years and has felt pretty solid, i.e. no obvious rate limiting when gathering data in bulk. For example, I believe that if you want to collect stats at the census tract/block level, you'd have to loop through every state, which is not a problem. County-level stats can be gathered in a single call, IIRC.
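For the curious, the per-state loop is only a few lines. Something like this; the dataset path, year, and variable ID are assumptions on my part, so check the discovery tool for the exact names:

    # Rough sketch of the per-state loop, not a canonical recipe.
    # The dataset path (acs/acs5), year, and variable ID (B19013_001E, which
    # should be median household income) are assumptions; verify them against
    # the discovery tool before using.
    import requests

    API_KEY = "YOUR_API_KEY_HERE"
    STATE_FIPS = ["01", "02", "04", "05", "06"]  # ...every state FIPS code in practice

    rows = []
    for fips in STATE_FIPS:
        resp = requests.get(
            "https://api.census.gov/data/2016/acs/acs5",
            params={
                "get": "NAME,B19013_001E",
                "for": "tract:*",
                "in": "state:" + fips,
                "key": API_KEY,
            },
        )
        resp.raise_for_status()
        header, *data = resp.json()  # first row of the response is the header
        rows.extend(dict(zip(header, row)) for row in data)

    print(len(rows), "tracts collected")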

The API seems to be the basis for the Census's new data platform: https://data.census.gov/


I really wish the Census would publish more normalized data sources. For example, the American Community Survey summary file is effectively a giant table with thousands of rows (representing different geographies) and thousands of columns (representing different statistics that have been calculated for those geographies). The columns just have an ID and a name that tells you what's in there, but it will be something like "median income for black or african american women with a master's degree." If you want that data point, you basically have to go searching through a bunch of docs (or one of their horrendous online interfaces) to find out the ID of the column.
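For what it's worth, the closest workaround I've found is grepping the machine-readable variable list that each dataset exposes, rather than the docs. Roughly (the year and dataset path here are just an example):

    # Search the dataset's variables.json for a phrase instead of paging through docs.
    # The year/dataset path is illustrative; swap in whichever dataset you're after.
    import requests

    vars_url = "https://api.census.gov/data/2016/acs/acs5/variables.json"
    variables = requests.get(vars_url).json()["variables"]

    needle = "median income"
    for var_id, meta in sorted(variables.items()):
        label = meta.get("label") or ""
        concept = meta.get("concept") or ""
        if needle.lower() in (label + " " + concept).lower():
            print(var_id, "|", concept, "|", label)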

Instead, I'd love to have a set of tables... in each one, you'd have a geography column and then, using my above example, a race column, a gender column, an educational attainment column and a median income column. Then you could easily load that data to a database and run a SELECT query over the information to get what you need.
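Concretely, something like this (entirely hypothetical table and column names, just to illustrate the shape I'm after):

    # Entirely hypothetical schema, just to show the normalized shape I'd like.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE acs_median_income (
            geography TEXT,
            race TEXT,
            gender TEXT,
            educational_attainment TEXT,
            median_income INTEGER
        )
    """)

    # With the data loaded, the original question becomes a plain SELECT.
    cursor = conn.execute("""
        SELECT geography, median_income
        FROM acs_median_income
        WHERE race = 'Black or African American'
          AND gender = 'Female'
          AND educational_attainment = 'Master''s degree'
    """)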

Heck, I'd even settle for metadata that was good enough to enable me to construct this stuff myself. For example, for each column ID, tell me what variables were used to construct it and what their values were equal to. Unfortunately I've contacted the Census Bureau and they've told me such a thing doesn't exist.


SDMX was designed by statistical agencies for this purpose.

Instead they created a non-standard API. (SDMX has a standardized REST API.) Next they will create a mobile app.


I've worked with this API a fair bit since early 2016, and it's been around longer. I've got no real complaints about the API; I just want to note that the data itself is... arcane. It'll take a bit of study and research to get up to speed on even a small subset of what's available.


Yes, as someone who has played with Census data a lot this year, I've found myself becoming intimately familiar with FIPS numbers. Also, it's interesting to see what the Census Bureau designates a "place" and what it designates a "county subdivision". New England and New Jersey in particular look weird from a Census perspective (because of this, I've found myself using subdivisions exclusively for New England, New Jersey, Pennsylvania, Wisconsin, and the Dakotas).

And if you think that's arcane, you should try shapefiles. It took me a while to figure out how to do it, but I have some Python code for pulling the vector data out of shapefiles, sticking them in Python data structures, and then rendering arbitrary combinations of shapefile shapes into a single PNG. It's pretty nice that I can traverse my data, grab the shapes for every place in, say, the Dallas-Fort Worth-Arlington MSA, and render them all to a PNG. And I have a REST API for doing that dynamically.
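Not my actual code, but the gist of the rendering part looks roughly like this with pyshp and matplotlib (the TIGER filename and the NAME filter are just examples):

    # Read TIGER/Line place polygons with pyshp and render a chosen subset to a PNG.
    # The filename and the attribute I filter on are illustrative; TIGER field names
    # vary by product and year.
    import shapefile               # pyshp
    import matplotlib.pyplot as plt

    sf = shapefile.Reader("tl_2016_48_place")   # e.g. Texas places; path is illustrative
    fields = [f[0] for f in sf.fields[1:]]      # skip the DeletionFlag field

    fig, ax = plt.subplots(figsize=(8, 8))
    for rec, shp in zip(sf.records(), sf.shapes()):
        attrs = dict(zip(fields, rec))
        if attrs.get("NAME") not in {"Dallas", "Fort Worth", "Arlington"}:
            continue
        # A shape may contain several rings; shp.parts holds the start index of each.
        parts = list(shp.parts) + [len(shp.points)]
        for start, end in zip(parts[:-1], parts[1:]):
            xs, ys = zip(*shp.points[start:end])
            ax.plot(xs, ys, linewidth=0.5)

    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig("dfw_places.png", dpi=150, bbox_inches="tight")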

Yeah, I really need to get my Census application up and running again and post it here...


Please do, that sounds awesome!


Just to add to that: I spent a fair amount of time with this data on a side project and you really need to dedicate yourself to understanding what this data is and isn't, and how it gets updated.


Is the data out of date, or is it fairly recent (i.e., at least 2016)?


The main census is only every ten years, but there are other datasets updated more often, as well as extrapolated data. The issue for me wasn't that the data was out of date; it's that the structure and content are fairly arcane if you're not well versed in the Census. I assume much of the structure has been carried forward over the decades, as there's an obvious need to make sure data is comparable across different census.


While awkward, "censuses" is indeed the correct plural of census.


The main datasets that are tied to a "geography" are the American Community Survey (ACS) and the Census.

The ACS comes in 1, 3, and 5 year editions. The 1-year is the least precise, and has the smallest sample, while the 5-year uses trailing 5-year data. The tradeoff is that ACS-5 estimates are only available, IIRC, for geographies of 50,000 or more people.

The ACS-1 estimates for 2016 are already released. ACS-5 will come out in December.


> The ACS comes in 1, 3, and 5 year editions. The 1-year is the least precise, and has the smallest sample, while the 5-year uses trailing 5-year data. The tradeoff is that ACS-5 estimates are only available, IIRC, for geographies of 50,000 or more people.

Other way around. ACS-1 is only available for geographies above 20k people, but ACS-5 is available for everything up to and including the block group level.

https://www.census.gov/programs-surveys/acs/guidance/estimat...

Also, ACS-3 has been discontinued.


My apologies, I was wrong about the direction of the tradeoff. I.e., ACS-1 is only available for larger geographies.

There is a distinction though between the "ACS-1" and "ACS-1 supplemental estimates". The former is only available for 65,000+ person geographies.

Also, it's worth noting that for ACS-1 datasets, if you're looking at the census tract or census block level, you're going to have a significant number of omitted estimates. E.g., if you want to see the year-to-year median income even in a well-surveyed area like New York City, there will be many NA values.


Mike Bostock of D3 puts this API to use in his somewhat esoteric but very impressive Command Line Cartography tutorial series.

https://medium.com/@mbostock/command-line-cartography-part-1...


We've recently seen the removal of some publicly-available datasets that are overseen by the federal government. Does anyone have any perspective on how stable this particular dataset is?


The future of the Census, including 2020, is indeed in question: http://www.npr.org/sections/codeswitch/2017/07/15/536908867/...

Maybe that means API access is one of the line-item things that could be cut when the budget isn't properly funded. But the data itself, including current datasets (which have been downloadable via bulk FTP for a while) [0] should be pretty stable. After all, the Census is a function directly mandated by the Constitution.

[0] https://www.census.gov/programs-surveys/acs/data/data-via-ft...


It's good to see the Census Bureau getting coverage. And it is true we don't have a director; however, we have a great many staff who have at least one or two Censuses under their belts. The real issue, aside from leadership, is funding. All through the planning stages we've had to cut important tests because of Congress and their inability to pass a budget.


> The future of the Census, including 2020, is indeed in question

The outgoing director doesn't seem worried. From your link, when asked if he was worried about there being a leadership vacuum after his departure:

> (laughs) No, the Census is much more than the director


Maybe I should have picked a different link that discussed the bureau's overall issues, particularly anticipated budgetary shortfalls: https://www.nytimes.com/2017/07/17/opinion/census-trump-budg...

Yes, the Census is more than its leader, but losing its longtime leader and seemingly not having anyone new (there is a new acting director, IIRC) in place this close to the next census seems worrying.


What's wrong with having an acting director? At least on the corporate side, an acting-whatever is better than "not having anyone new" and many times the acting-whatever becomes the new whatever.


The data should be very stable unless, for some reason, they find a way to eliminate the Census Bureau.


I've been using it from R via the excellent https://github.com/walkerke/tigris package for some geospatial data visualization. No complaints.


The censusapi package is also worth a look: https://github.com/hrecht/censusapi

In my experience, it provides much more granular access to Census API endpoints than tigris, i.e., you can explore datasets not tied to a specific geography.


When I clicked the "Mailing List" button and signed up for the newsletter, I was also given the option to subscribe to a lot of government-related email lists, including ones local to my state.

Thought that was pretty neat -- I had no idea such a thing existed.


I did that once, years ago. Careful, the government is really good at sending an absurd volume of emails.


Add that to the long list of absurdities the government is good at.


I haven't had a chance to look at this yet, but I did some census data exploration a few years ago and it was ... baroque at the time. More here: http://www.mooreds.com/wordpress/archives/963

If things haven't changed, I'd echo the comments that getting the data is only the first and in some ways easiest step. Understanding the data is much more important and a deeper process.


Interestingly, if you click on any of the links on the "examples" page [1], it seems to return results, even without an API key. Conveniently, it strips off the URL param "YOUR_API_KEY_HERE" for you. Not sure what to make of the results, though.

[1] https://api.census.gov/data/2010/sf1/examples.html
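For instance, reproducing one of those keyless calls from Python (P0010001 should be the 2010 SF1 total-population count, but double-check it on the examples page):

    # Reproduce one of the keyless example queries from the SF1 examples page.
    # P0010001 should be the 2010 SF1 total-population count; verify before relying on it.
    import requests

    resp = requests.get(
        "https://api.census.gov/data/2010/sf1",
        params={"get": "P0010001,NAME", "for": "state:*"},
    )
    print(resp.status_code)   # per the observation above, this returns results without a key
    print(resp.json()[:3])    # header row plus the first couple of states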


One reason to get an API key is that they use the associated email to send out (very low volume) messages about API "news".


Anyone know if the APIs give info about spoken languages? Non-Hispanic immigrants speak a huge variety of languages, and I was wondering if this aspect is available through the API.

A quick search through the API docs didn't reveal anything.



Stoked for this. The GUI has been terrible.


I've had fun with the API. I used it to download a bunch of data that I then fed into a system to correlate various things and display information for different parts of the country at several levels.

One of these days, I'll need to start it up again, set up a robots.txt to make sure the Googlebot doesn't crawl it (it has a lot of content generated on demand, including vector maps from shapefile data), and post it here.


Looking forward to checking this out. Scraping census data is exhausting. Just navigating the available tables is hard.


Really hope Australia gets this some time!


Have already been using this for a while. Great improvement and easy to use.


Where do you go to register for an API key?



Where do you see API key mentioned? I thought https://www.census.gov/data/developers/updates/new-discovery... is good enough?


http://api.census.gov/data/key_signup.html or on the Discovery Tool Page



