Tracking Heat Records in 400 U.S. Cities (pudding.cool)
204 points by feross on May 24, 2022 | 79 comments



This is cool, but why not use rigorous statistics?

Block maxima (e.g. the daily max temperature) follow a generalized extreme value (GEV) distribution.

One can try to see whether this distribution is stationary or not.

Things can get much fancier by, e.g., modeling correlations across cities, but the above is pretty basic.
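A rough sketch of what that looks like in practice, assuming annual blocks of daily highs for a single city (the data below is a made-up stand-in and the variable names are hypothetical), using scipy's genextreme:

  import numpy as np
  from scipy.stats import genextreme

  # Stand-in data: 100 years x 90 summer days of daily max temperatures (C)
  rng = np.random.default_rng(0)
  daily_highs = rng.normal(30, 5, size=(100, 90))

  # Block maxima: the hottest day of each year (block = one year)
  annual_maxima = daily_highs.max(axis=1)

  # Fit a generalized extreme value distribution to the block maxima
  shape, loc, scale = genextreme.fit(annual_maxima)

  # Crude stationarity check: fit each half of the record separately and
  # compare the location parameters (a real analysis would model a trend)
  loc_early = genextreme.fit(annual_maxima[:50])[1]
  loc_late = genextreme.fit(annual_maxima[50:])[1]
  print(shape, loc, scale, loc_early, loc_late)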

A great practical intro is: https://link.springer.com/book/10.1007/978-1-4471-3675-0


Looks like a neat book but $170 is a lot for an intro.



I know it is not my country, and that converting is not that difficult, but personally, Fahrenheit degrees are meaningless to me.


Fahrenheit is intuitive. 0°F is the lowest air temperature measured in a town called Danzig in winter 1708-09, and water boils at 212°F. 4°F is ammonium chloride brine, and water freezes at 32°F. These are ballpark figures; you need Celsius or Kelvin to define Fahrenheit precisely.


The beauty of the HN community is that I genuinely can't tell if you're being sarcastic.


Just to give you an idea: it's basically 0-100.

0 is really cold, below freezing. Negative temps are pretty rare. Freezing is 32, which you just have to memorize.

70 is room temperature, and 68-75 is comfortable for most

100 is really hot; temperatures above this are also fairly uncommon for most folks.


Not meaningless if you merely search for something like 'graphical Fahrenheit and Celsius', print out one of the representations of the two scales that appeals and keep it where you can see it for a day or so. Choosing at random I'd say that https://learn.weatherstem.com/modules/learn/lessons/62/img/d... is quite neat.


For a quick-and-dirty conversion, I do F=2C+30, C=(F-30)/2.
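A quick check of that rule of thumb against the exact formula (a throwaway sketch):

  # Exact C-to-F conversion vs. the quick F = 2C + 30 rule of thumb
  for c in (-10, 0, 10, 20, 30, 40):
      exact = c * 9 / 5 + 32
      approx = 2 * c + 30
      print(f"{c:>4} C -> exact {exact:5.1f} F, approx {approx:5.1f} F, error {approx - exact:+.1f}")

The approximation is exact at 10°C and drifts by about 2°F for every 10°C away from that.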


Interesting that the site uses ipinfo.io, an analytics service that brags about, among other things, "Detect[ing] various methods used to mask a user's true IP address, including VPN detection, proxy detection and more." They even appear to offer a service designed to detect "privacy" users using Tor and such.

I'm really curious why the site needs such invasive analytics. It's not by accident: it costs a minimum of $1,200/year.


I have interacted with this site and its ownership a fair amount online. My guess would be they're trying to gauge true interest in topics so they can refine their editorial perspective.

And $100/mo might not seem like much if it makes debunking interest-spoofing attempts easier, especially if there is ever any kind of bonus to the visualization authors for having a big hit.


Interesting new site here that lets you explore historic temperatures and trends: http://realclimatetools.com/apps/graphing/index.html


University of Maine's Climate Reanalyzer is my trusted, respected go-to.

They have long time series: https://climatereanalyzer.org/reanalysis/monthly_tseries/

They have wonderful daily difference-from-average "anomaly" maps: https://climatereanalyzer.org/wx/DailySummary/#t2anom

I really dislike this site you've linked. The whole middle of the chart is completely overwhelmed with data points unless you zoom in so far that you can only see a decade or so. The extreme data points of each year stick out, but what's actually happening most of the time is utterly occluded and unclear. The data masks rather than informs.


Here's something similar, created from the same dataset (ERA5):

https://climate-explorer.oikolab.com - I created this to show how temperature has changed for any arbitrary location of interest. Working on adding CMIP6 projections too.


Which version of the temperature record are these tools using, the original raw data, or the adjusted data? Do you have confidence in the rationale and methodology for the adjustments?

http://myweb.wwu.edu/dbunny/pdfs/Evid_Based_Climate_Sci/Ev_B...


Yikes - that reference is some dentists-having-opinions-on-brain-surgery conspiracy stuff. I've seen these before, and I'm really not sure why people who study rocks have strong opinions on climate science. A few points:

- Reanalysis data is generated from all available data sources - land stations, satellites, weather balloons, airplanes, etc. They are corrected for observational biases using an approach similar to a Kalman filter, combining observations with known laws of physics (see the sketch after this list).

- I do have confidence in these data and use them for analyzing weather-driven utility load data with excellent results.

- The data assimilation process and methodologies are all well documented - https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/qj.38...
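To give a flavor of that first point, here is a minimal scalar update in the spirit of a Kalman filter, blending a model forecast with an observation according to their uncertainties (purely illustrative numbers, not the actual reanalysis machinery):

  # One scalar "analysis" step: weight forecast and observation by uncertainty
  forecast, forecast_var = 21.0, 4.0    # model first guess (C) and its variance
  obs, obs_var = 23.5, 1.0              # station reading and its error variance

  gain = forecast_var / (forecast_var + obs_var)      # Kalman gain
  analysis = forecast + gain * (obs - forecast)       # updated estimate
  analysis_var = (1 - gain) * forecast_var            # reduced uncertainty

  print(f"analysis = {analysis:.2f} C, variance = {analysis_var:.2f}")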


Nice site. In my case it just confirms that where I'm at (vicinity of Hillsboro Oregon) we've been having an unusually cold and wet spring. Some of the plants are only just now putting on leaves.

I'm sure we'll return to the regularly scheduled inferno soon.


The "story" being told doesn't have a clear through line. I didn't understand the context for anything. I also didn't see any of the information I wanted to see.

What was the hottest year? What was the year with the most broken records? What is the overall trend? It actually told me what the hottest day on record was, but there is so much to read I completely glazed over and missed it the first time.


From the same site: The US has gone $days since a record high temperature

https://pudding.cool/2022/03/weather-map/


This is not to fault the site creator(s) at all, but wow, this is a good example of how using third-party viz libraries can be misleading. At default zoom on my device, this shows the all-time high near Portland, OR (Vancouver, WA) registered 330 days ago, while the nearest visible data for Seattle shows 40 years(!), despite the fact that Seattle's all-time record was broken the same day. I have to zoom in, knowing that fact, to get accurate data.

Limiting data points on a zoomable map is a reasonable default, but it degrades pretty terribly when it matters.


Given you have 400 samples a day, what is the baseline expected days since record high temperature just based on statistical variation?


Presumably, if this is stationary, wouldn't record highs decrease in frequency over time?


Yes, if the climate were static. It is pretty obvious that it is not, and the earth is warming.


And if you had enough data. There are 365 days in a year, but only 100 or so years of temperature records. So there are plenty of days that have not had their "chance," as it were, to set a record for that date. (For example, in my city, today's high was set over 130 years ago.)

It also ignores heat islands as cities get larger.
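A small simulation of the stationary-climate point, assuming independent years (illustrative only; the variable names are made up):

  import numpy as np

  rng = np.random.default_rng(0)
  years, days = 150, 365

  # Stationary climate: every year's daily highs drawn from the same distribution
  temps = rng.normal(20, 5, size=(years, days))

  # A date sets a record in a given year if it equals the running maximum so far
  running_max = np.maximum.accumulate(temps, axis=0)
  records_per_year = (temps >= running_max).sum(axis=1)

  # Under stationarity the expected number of records in year n is 365/n, so
  # fresh records get steadily rarer: ~365 in year 1, ~3-4 per year after a century
  print(records_per_year[:5], records_per_year[-5:])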


I sort of wonder if there's some cognitive bias in perception of record highs. I'm not sure people understand what to expect with them if climate were constant. Athletic records keep happening, for example, so I could see people expecting something similar, or at least happening at a constant rate.


Athletes are actually getting better. That isn't random fluctuations in a stationary distribution. The minimum qualifying time for the 100m in the 2020 Olympics would have been a world record as late as the 1960s.


Right, but if people are used to those sorts of records, it might distort their perception of other things?


No, that’s the problem. Record highs in single locations are being set quite often.


Wait, there are tiny cities here.

Isn't it the case that as geographies get smaller, the count of them gets higher, and you'll always get one square inch of the US that's breaking the time-of-year temperature record right this second?

I'm all for data, but the statistics here seem selected to drive towards the conclusion "global warming is a huge problem". Climate change is so real that we don't need to resort to this sort of trickery - it harms rather than helps the message.


No, that shouldn't be the case. Unless the planet is warming, we shouldn't constantly be setting new highs. I'm not sure what the trickery is here.


The planet clearly is warming but I also agree this method is useless by itself.

For Los Angeles (my city) the data goes back 145 years. If temperature has not been increasing (and therefore each individual day follows an identical distribution), that would still mean that on 1 in 145 days we'd see a record high in Los Angeles - so about two record highs a year. Highs in Los Angeles are obviously going to be correlated with highs in San Diego - but not perfectly, because of cloud cover and such - so it's reasonable to think that even if we were in a boring, unchanging climate we'd still see hundreds if not thousands of records set every day across the country.


I think your math is wrong. You wouldn't see two record highs every year with a normal distribution. The record high for each day would be set once, and would be an extreme outlier. Think about something like IQ. If you grab a group of people you wouldn't expect to find a genius every time, or even most of the time. Most of the time you'd expect to find someone near 100. Unless IQs had been steadily rising.

With a normal distribution you wouldn't expect even one record to be set every year. The expected case would be zero records in a year.


Perhaps you are misunderstanding the records it is describing.

A record in that data is defined as a particular high in a particular city on a particular day. The data goes back 145 years in Los Angeles.

So if it is the hottest May 24th ever that would be a record.

If every day followed an identical and uncorrelated distribution, then we would expect that 1 in 145 days would be a record.

In your IQ example, imagine if you had a school with 365 separate classrooms with 144 students in each one. A new, 145th student then enters each classroom. The chance that the new student is the one with the highest IQ is 1/145. So in the universe of the 365 classrooms you'd expect about 2 new IQ records to be set.


I fully understand what the record is.

Let me ask this: why, in your school with 365 classrooms, would you expect 2 new IQ records to be set with 145 new students? With a normal distribution, you'd expect all of those new students to be within one standard deviation of 100.

> The chance that the new student is the one with the highest IQ is 1/145

Why do you think this is the case?


The student who enters the classroom has an IQ that follows the same distribution of intelligence as any of the other students. Therefore, one student out of the 145 must have the highest intelligence, and the chance that it is student #4, #30, #100, or #145 is the same.

Here is some Python code you can run. You can change the distribution however you want and you'd get the exact same results: the chance that a particular draw is the highest in a set of IID variables does not depend on the distribution itself.

  from numpy import random

  RecordsSet = 0
  TotalSimulations = 365000
  for i in range(TotalSimulations):
      # 144 existing students, IQ ~ Normal(100, 15)
      ClassRoomSample = random.normal(100, 15, 144)
      RecordIQ = max(ClassRoomSample)
      # one new (145th) student drawn from the same distribution
      NewStudentIQ = random.normal(100, 15)
      if NewStudentIQ > RecordIQ:
          RecordsSet += 1

  print(100 / 145)                              # theoretical chance, ~0.69%
  print(100 * RecordsSet / TotalSimulations)    # simulated chance
A record is set on 1 out of every 145 days/classes, which means in a year/school of 365 days/classes you'd expect a bit more than 2 records on average.


It's not the case, because temperatures are correlated; if <random suburb of $BIG_CITY> sets a high, it's likely that $BIG_CITY will as well.

However, we should be somewhat regularly setting new highs just because we have on the order of just 200 years of data (so just 200 data points for each daily high); if the planet were not warming, and the high for any given day is normally distributed around some average for that day, then as time marches on the odds of having seen an extreme outlier increase. Some cities on some days will have a high that is very improbable, while others will not have gotten "lucky" yet.


Would be interesting to have the same info for a record low.


I can pull up some summary data (e.g. averaged across a month) at sites like https://www.aos.wisc.edu/~sco/clim-history/7cities/madison.h...

Graphs such as https://www.aos.wisc.edu/~sco/clim-history/stations/msn/MSN-... show the average high, mean, and low temperature.

This is even more pronounced if you look at the summer (June, July, August) temperatures - https://www.aos.wisc.edu/~sco/clim-history/stations/msn/MSN-...

You can also look at the daily temperatures (though not the daily records) for this year so far - https://www.aos.wisc.edu/~sco/clim-history/stations/msn/msn-...


Or for record high lows ("the low has never been this warm").


Indeed, this would be both interesting and an important control metric.


How has Ohio (or is that Iowa? I can never keep the two straight) gone 560 days while all of the surrounding states have not?


You have to zoom in to see it, but that's for Zanesville, not all of Ohio. My guess is it's due to topography, like being in the shadow of the Appalachian Mountains.


That's Ohio. But Youngstown (close to the Pennsylvania border) had a record high 79 days ago.


I wonder why it’s only record highs and not record lows, too?

I’d be interested to see extremes on both ends.


Meanwhile it’s raining again in Seattle.


Not sure if this is a local observation that it is in fact lightly raining in Seattle at the moment, or a glib observation of the stereotype. Just adding this for context for other readers, not a correction or dispute.

The all-time record high in Seattle (per the site's data) was 107, last June. Other data I looked up shows 108. There are of course discrepancies between different data sources; my localized weather showed 112 that day. "Officially," several days on which I've experienced >100 temps in my 20 years here are in the records as lower.

We typically have little to no precipitation during the summer. And of course every weather statistic varies by year, but persistently higher summer temps have arrived (albeit lagging behind many other places), along with a much longer and more severe wildfire season over the last few years.

I’m crabby about the rain right now, but I fully expect to be relieved by it in a few short months.


It literally was raining at the moment.

But in fact this April and May are unusually cold and wet, very annoyingly so.


I’m annoyed right there with you! Here’s hoping for a mild summer.


“And for me it was Tuesday.”


This is pretty! What did you use to visualize the data for 1 day and about 3 weeks of data?


Regarding the animation: I see a few people posting that the slideshow/animation ruins the visualization. I particularly "like" this experience, which is very hard to do well.

What went into animating between these two views of the data?


Why does this force me to scroll through this like a slideshow?

Anyone else reminded of the bad-old-days when Flash websites were common and broke all UI norms?


I assume it's based on the UI of Instagram Stories, which maybe feels quite natural for some people (not me so much)


I think these kinds of sites are made by people who never experienced that hell.


No, I think it’s really cool!

Their website, their decision on how to “force” you to experience it. Remember, you can always close the tab if it’s not compelling or interesting to you personally.


I accidentally clicked past the last page, hit the back button then had to go through the thing again. As I was rushing, I missed the last page again, and could not figure out how to go back. Scrolling would have trivialized this issue.


What’s the lesson here? Click carefully and don’t rush?


Ahh, it's the user's fault for not reading the data before clicking.

Imagine an unholy textbook where, once you turn the page, the pages before become inaccessible. It's your own fault for not understanding everything on the previous page, and now you have to read the entire thing from the start to find that page again.

Yes, it's the user's fault.


No, the user’s a genius because they can’t use a lamp.


Right....


> What’s the lesson here?

Don't hide compelling or interesting information behind bullshit UI constraints. Particularly egregious was the "Click here for more information, click two pixels lower to permanently leave this screen."


The lesson is that one should at least make an attempt to make websites usable.


I didn’t have any trouble using it.


Accepting the limitations of a site doesn't mean they don't exist.


No one has ever forgotten how to close the tab. What's the point of being exclusionary and having an 'if you don't like it, leave' attitude to presenting information? It isn't some pretentious indie band's website, it's weather data!


Why should every website have the same UI?

I'm tired of the same discussion again and again on HN. Javascript bad. Anything but #000 text on #fff bad. CSS and animations?! Fuggedaboutit. If it's not free range, handcrafted HTML, I don't want it anywhere near my 5950X.


Not every website is the same, and this is not the best way to present statistical data. It pays to keep things simple and cohesive when you're trying to explain what a big block of numbers or a graph with bars/lines represents.

After all, your goal is to make sure the viewer understands the information being shown, not to impress them with moving text.


This is a very difficult site to navigate. I wish the data were presented almost any other way. Which is a shame, because it's a cool idea, with cool analysis.

You should open source the data.


> You should open source the data.

The data isn't theirs to open source (or not).

> Temperature records are collected from ACIS, which tracks weather for approximately 400 US cities. ACIS has data from the ThreadEx project.

The ThreadEx project is http://threadex.rcc-acis.org

This suggests it's using data from a service that collects it from http://www.ncei.noaa.gov, which right on the front page has: "Looking for Data?"


I think this is an interesting project, but it does seem to present the information in a rather biased way. I'm not trying to dispute whether things are warmer, but I would like to better understand what the underlying data consists of. I know they have 148 years of data, so I'm curious how that data was collected even 50 to 70 years ago, let alone 100 years ago. There has to be a change in the quality of the data collection over time. It would be challenging to fairly compare data from the last 10 or so years to a similar timeframe 100 years ago. I would imagine the data doesn't even exist for some of the cities.

Maybe people who have more expertise in the field or the data collection can chime in.


> It would be challenging to fairly compare data from the last 10 or so years to a similar timeframe 100 years ago.

Why? A thermometer works exactly the same as it did 100 years ago. For such a simple measurement as temperature, I can't think of many reasons why readings from the 30s could be significantly different (as in, by more than fractions of a degree).

I also guess that if the data was considered unreliable for any reason it would not be used.


I don't have any particular reason to believe the data is unreliable, and I'm inclined to believe temperatures are actually rising, but it's fun to try to come up with reasons why the rise might be artificial:

- Most cities grow and develop over time and the parts of cities with more buildings and especially with more cement will radiate more heat. If an observation station was built somewhere less developed then city growth might artificially push up temperatures around the station. This effect might be strongest somewhere like Phoenix or LA.

- This one seems unlikely (well, I'm sure it happens but it's very likely to have already been corrected for) but aggregation methods might have changed over time. I could imagine in the 30s someone might check the thermometer every hour and record the current temperature in a logbook. Modern thermometers are almost certainly automated. If the "highest temperature" is measured by recording the temperature each minute and taking the max then you're going to get a higher number than the manual method.

- site selection might have changed: observation posts probably used to be buildings which needed to be large enough to fit equipment and the people to operate them. City real-estate is expensive and meteorology used to be a cost-center so these buildings were probably placed at the edge of town. I'm sure modern stations are much smaller, they don't need a full-time staff to operate them, so they can be placed closer to the (hotter) centers of cities. Alternatively: I'm sure many weather stations are now situated at airports, an option which was not available in the 30s, and airports are particularly pavement-heavy (hotter) parts of cities.


> Most cities grow and develop over time and the parts of cities with more buildings and especially with more cement will radiate more heat. If an observation station was built somewhere less developed then city growth might artificially push up temperatures around the station. This effect might be strongest somewhere like Phoenix or LA.

The temperature record - at least, what's used for generating long-term trends - is homogenized to remove discontinuities (step-wise or linear) associated with this sort of drift or other significant changes. Regardless, this effect likely wouldn't matter for the stats the website illustrates.

> This one seems unlikely (well, I'm sure it happens but it's very likely to have already been corrected for) but aggregation standards might have changed over time. I could imagine in the 30s someone might check the thermometer every hour and record the current temperature in a logbook. Modern thermometers are almost certainly automated. If the "highest temperature" is measured by recording the temperature each minute and taking the max then you're going to get a higher number than the manual method.

When manual reading of the temperatures was the norm, you'd have a mercury and an alcohol thermometer to record the daily max and min, respectively. For a mercury thermometer, it would have a stopper that rises and records the maximum level the liquid reaches. A human needs to invert the thermometer to reset it. Same for an alcohol thermometer, albeit in reverse (records the minimum). So granularity of observation doesn't matter - the daily max, historically, is truly the maximum temperature taken since the last time a human reset the stopper.

> site selection might have changed: observation posts probably used to be buildings which needed to be large enough to fit equipment and the people to operate them. City real-estate is expensive and meteorology used to be a cost-center so these buildings were probably placed at the edge of town. I'm sure modern stations are much smaller, they don't need a full-time staff to operate them, so they can be placed closer to the (hotter) centers of cities. Alternatively: I'm sure many weather stations are now situated at airports, an option which was not available in the 30s, and airports are particularly pavement-heavy (hotter) parts of cities.

Also accounted for by homogenization. In most cases when siting changes, you consider that site a new station.


> When manual reading of the temperatures was the norm

For anyone wanting more information on one of the maximum and minimum thermometers: https://en.wikipedia.org/wiki/Six%27s_thermometer which was invented back in 1780.


> site selection might have changed

This is often noted in the climate record data. If you look at https://www.aos.wisc.edu/~sco/clim-history/stations/msn/MSN-... you will see vertical dotted lines indicating location change for that measure.

And yes, it does impact it - https://www.aos.wisc.edu/~sco/clim-history/stations/msn/MSN-...


Great links, thank you.

While you're here, do you happen to know of a good book or other resource to read if I want to learn more about how we decided climate change was real and anthropogenic? I take it for granted; I've never actually looked at any data.


The single best general-audience easy read on the basics of why we know the planet is warming due to human greenhouse gas emissions is Kerry Emanuel's "What we know about climate change" (https://www.amazon.com/What-About-Climate-Change-Press/dp/02...).


I guess you missed the main point of lack of data, in addition to possible quality issues. Certainly I would assume we collect many more air temperature readings now than we did in the 1800s, even into the 1900s. I can't imagine there is nearly as much temperature data from 100 years ago as has been collected in the last 30. You didn't answer the real question: of course a thermometer works the same, but do the collection methods work the same? Did we have satellite data 100 years ago? Obviously not. We are collecting a much wider variety of data from a variety of sensors, and it has built up especially in the last 50 years. I haven't seen the data; I'm only asking the question. What does the data look like? I'd love to compare just a slice from one city. Someone post the data from that dataset comparing readings from 100 years ago to ones collected at 20-year intervals. My guess is that there is progressively more and more data as you move forward. If I'm wrong, show me the data.


> but do the collection methods work the same?

Yes, more or less, they're the same. It's simple enough to restrict measurements to just surface observations made with a thermometer of some sort if you want a wholly apples-to-apples, long-term temperature record. You don't have to add in satellite data (satellites don't directly retrieve surface air temperature so the direct comparison is fraught) although there do exist techniques which leverage it to augment the lack of spatial coverage of surface observations, especially over oceans.

Quality issues have been studied ad nauseam over the past thirty years. Nearly half a dozen independent efforts have all tackled this problem. Probably the most notable recent one is the Berkeley Earth Land/Ocean Temperature Record (Rohde and Hausfather, 2020 - https://essd.copernicus.org/articles/12/3469/2020/essd-12-34...).



