FWIW, this API has been around for a couple of years and has felt pretty solid, i.e. no obvious rate limiting when gathering data in bulk. For example, I believe that if you want to collect stats at the census tract/block level, you have to loop through every state, which is not a problem. County-level stats can be gathered in a single call, IIRC.
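For reference, a minimal sketch of that pattern. The vintage, the variable ID (B19013_001E, ACS median household income), and the state list are illustrative; adjust for your dataset:

```python
# Minimal sketch, assuming the 2015 ACS 5-year endpoint and the
# median-household-income variable (B19013_001E). Tract-level
# queries must be scoped to one state at a time, while
# county-level stats come back in a single call.
import requests

BASE = "https://api.census.gov/data/2015/acs5"
KEY = "YOUR_API_KEY_HERE"

# County-level stats: one call for the whole country.
counties = requests.get(BASE, params={
    "get": "NAME,B19013_001E",
    "for": "county:*",
    "key": KEY,
}).json()

# Tract-level stats: loop over state FIPS codes.
STATE_FIPS = ["01", "02", "04", "05", "06"]  # ...and the rest
tracts = []
for fips in STATE_FIPS:
    rows = requests.get(BASE, params={
        "get": "NAME,B19013_001E",
        "for": "tract:*",
        "in": f"state:{fips}",
        "key": KEY,
    }).json()
    tracts.extend(rows[1:])  # skip the header row
```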
I really wish the Census would publish more normalized data sources. For example, the American Community Survey summary file is effectively one giant table: thousands of rows (representing different geographies) and thousands of columns (representing different statistics calculated for those geographies). Each column has just an ID and a name telling you what's in there, which will be something like "median income for Black or African American women with a master's degree." If you want that data point, you basically have to go searching through a pile of docs (or one of their horrendous online interfaces) to find the ID of the column.
Instead, I'd love to have a set of tables: in each one, you'd have a geography column and then, using my example above, a race column, a gender column, an educational-attainment column, and a median-income column. Then you could easily load that data into a database and run a SELECT query to pull out exactly what you need.
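A toy version of what that would enable, with made-up sample data and illustrative column names:

```python
# Toy sketch of the normalized layout described above, using an
# in-memory SQLite table. The column names and sample row are
# made up for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE median_income (
        geography TEXT,
        race TEXT,
        gender TEXT,
        educational_attainment TEXT,
        median_income INTEGER
    )
""")
con.execute(
    "INSERT INTO median_income VALUES (?, ?, ?, ?, ?)",
    ("Dallas County, TX", "Black or African American", "Female",
     "Master's degree", 62000),  # made-up value
)

# The data point from the comment above becomes a single query:
row = con.execute("""
    SELECT median_income FROM median_income
    WHERE race = 'Black or African American'
      AND gender = 'Female'
      AND educational_attainment = 'Master''s degree'
""").fetchone()
print(row)  # (62000,)
```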
Heck, I'd even settle for metadata good enough to let me construct this stuff myself: for each column ID, tell me which variables were used to construct it and what values they were restricted to. Unfortunately, I've contacted the Census Bureau and they've told me no such thing exists.
I've worked with this API a fair bit since early 2016, and it's been around longer than that. I have no real complaints about the API itself; I just want to note that the data is... arcane. It'll take a bit of study and research to get up to speed on even a small subset of what's available.
Yes, as someone who has played with Census data a lot this year, I've found myself becoming intimately familiar with FIPS numbers. Also, it's interesting to see what the Census Bureau designates a "place" and what it designates a "county subdivision". New England and New Jersey in particular look weird from a Census perspective (because of this, I've found myself using subdivisions exclusively for New England, New Jersey, Pennsylvania, Wisconsin, and the Dakotas).
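A sketch of that heuristic, using the standard two-digit state FIPS codes (the helper name and clause strings match the Census API's geography syntax, but treat the details as something to verify for your vintage):

```python
# Sketch of the heuristic above: for these states, query county
# subdivisions instead of places.
SUBDIVISION_STATES = {
    "09", "23", "25", "33", "44", "50",  # CT, ME, MA, NH, RI, VT
    "34",                                # New Jersey
    "42",                                # Pennsylvania
    "55",                                # Wisconsin
    "38", "46",                          # North and South Dakota
}

def geo_clause(state_fips: str) -> str:
    """Return the `for=` geography clause to use with the Census API."""
    if state_fips in SUBDIVISION_STATES:
        return "county subdivision:*"
    return "place:*"
```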
And if you think that's arcane, you should try shapefiles. It took me a while to figure out, but I have some Python code for pulling the vector data out of shapefiles, loading it into Python data structures, and rendering arbitrary combinations of shapes into a single PNG. It's pretty nice: I can traverse my data, grab the shapes for every place in, say, the Dallas-Fort Worth-Arlington MSA, and render them all to one PNG. I also have a REST API for doing that dynamically.
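This is not the original code, just a minimal reconstruction of the idea using the pyshp and matplotlib packages; the shapefile path is illustrative (a TIGER/Line "place" file for Texas):

```python
# Read a shapefile's vector data and render every shape to a PNG.
import shapefile                    # pip install pyshp
import matplotlib.pyplot as plt

sf = shapefile.Reader("tl_2016_48_place")  # illustrative path

fig, ax = plt.subplots()
for shape in sf.shapes():
    points = shape.points
    # `parts` holds the start index of each ring within `points`
    parts = list(shape.parts) + [len(points)]
    for start, end in zip(parts, parts[1:]):
        xs, ys = zip(*points[start:end])
        ax.plot(xs, ys, linewidth=0.5, color="black")

ax.set_aspect("equal")
ax.axis("off")
plt.savefig("places.png", dpi=150)
```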
Yeah, I really need to get my Census application up and running again and post it here...
Just to add to that: I spent a fair amount of time with this data on a side project and you really need to dedicate yourself to understanding what this data is and isn't, and how it gets updated.
The main census is only every ten years, but there are other datasets updated more often, as well as extrapolated data. The issue for me wasn't data being out of date; it's that the structure and content are fairly arcane if you're not well versed in Census data. I assume much of the structure has been carried forward over the decades, since there's an obvious need to keep data comparable across different censuses.
The main datasets that are tied to a "geography" are the American Community Survey (ACS) and the Census.
The ACS comes in 1, 3, and 5 year editions. The 1-year is the least precise, and has the smallest sample, while the 5-year uses trailing 5-year data. The tradeoff is that ACS-5 estimates are only available, IIRC, for geographies of 50,000 or more people.
The ACS-1 estimates for 2016 have already been released; the ACS-5 will come out in December.
> The ACS comes in 1, 3, and 5 year editions. The 1-year is the least precise, and has the smallest sample, while the 5-year uses trailing 5-year data. The tradeoff is that ACS-5 estimates are only available, IIRC, for geographies of 50,000 or more people.
Other way around. ACS-1 is only available for geographies above 20k people, but ACS-5 is available for everything up to and including the block group level.
My apologies, I was wrong about the direction of the tradeoff. I.e., ACS-1 is only available for larger geographies.
There is a distinction, though, between the "ACS-1" and the "ACS-1 supplemental estimates". The former is only available for geographies of 65,000+ people.
Also, it's worth noting that for ACS-1 datasets, if you're looking at the census tract or census block level, you're going to have a significant number of omitted estimates. E.g., if you want to see year-to-year median income even in a well-surveyed area like New York City, there will be many NA values.
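A hedged sketch of dealing with those, here against the 5-year endpoint; the vintage, variable ID, and the assumption that suppressed estimates show up as nulls or negative sentinel values are things to verify against the API docs for your dataset:

```python
# Pull tract-level median household income for New York State and
# drop missing estimates. B19013_001E and the sentinel handling
# are assumptions to check against the docs for your dataset.
import pandas as pd
import requests

rows = requests.get("https://api.census.gov/data/2015/acs5", params={
    "get": "NAME,B19013_001E",
    "for": "tract:*",
    "in": "state:36",          # New York
}).json()

df = pd.DataFrame(rows[1:], columns=rows[0])
df["B19013_001E"] = pd.to_numeric(df["B19013_001E"], errors="coerce")
df = df[df["B19013_001E"].notna() & (df["B19013_001E"] > 0)]
```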
We've recently seen the removal of some publicly-available datasets that are overseen by the federal government. Does anyone have any perspective on how stable this particular dataset is?
Maybe that means API access is one of the line items that could be cut when the budget isn't properly funded. But the data itself, including current datasets (which have been downloadable via bulk FTP for a while) [0], should be pretty stable. After all, the Census is a function directly mandated by the Constitution.
It's good to see the Census Bureau getting coverage. And it is true we don't have a director; however, we have a great many staff who have at least one or two Censuses under their belts. The real issue, aside from leadership, is funding. All through the planning stages we've had to cut important tests because of Congress and its inability to pass a budget.
Yes, the Census is more than its leader, but losing its longtime director with no permanent replacement in place (there is a new acting director, IIRC) this close to the next census seems worrying.
What's wrong with having an acting director? At least on the corporate side, an acting-whatever is better than "not having anyone new" and many times the acting-whatever becomes the new whatever.
In my experience, it provides much more granular access to Census API endpoints than tigris does; i.e., you can explore datasets not tied to a specific geography.
When I clicked the "Mailing List" button and signed up for the newsletter, I was also given the option to subscribe to a lot of government-related email lists, including ones local to my state.
Thought that was pretty neat -- I had no idea such a thing existed.
I haven't had a chance to look at this yet, but I did some census data exploration a few years ago and it was ... baroque at the time. More here: http://www.mooreds.com/wordpress/archives/963
If things haven't changed, I'd echo the comments that getting the data is only the first and in some ways easiest step. Understanding the data is much more important and a deeper process.
Interestingly, if you click any of the links on the "examples" page [1], they return results even without an API key. Conveniently, the URL param "YOUR_API_KEY_HERE" is stripped off for you. Not sure what to make of the results, though.
[1] https://api.census.gov/data/2010/sf1/examples.html
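You can reproduce that from a script, too. A quick sketch; P0010001 is the SF1 total-population variable, and as of this writing the endpoint answers without a key:

```python
# Quick check of the behavior above: query the 2010 SF1 endpoint
# with no API key. P0010001 is the SF1 total-population variable.
import requests

resp = requests.get(
    "https://api.census.gov/data/2010/sf1",
    params={"get": "P0010001,NAME", "for": "state:*"},
)
print(resp.status_code)
print(resp.json()[:3])  # header row plus the first two states
```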
Anyone know if the APIs give info about spoken languages? I feel non-Hispanic immigrants are a hugely varied group, and I was wondering whether this aspect is available through the API.
A quick search through the API docs didn't reveal anything.
I've had fun with the API. I used it to download a bunch of data that I then fed into a system to correlate various things and display information for different parts of the country at several levels.
One of these days, I'll need to start it up again, set up a robots.txt to make sure the Googlebot doesn't crawl it (it has a lot of content generated on demand, including vector maps from shapefile data), and post it here.
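For what it's worth, the robots.txt part is only a couple of lines, assuming the goal is to keep Googlebot away from the whole app:

```
User-agent: Googlebot
Disallow: /
```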
The API seems to be the basis for the Census's new data platform: https://data.census.gov/