There's a little cautionary story I like to tell about predictions and probabilities:
There is a man living near a volcano. He has put up a sign outside his house for travelers, proudly declaring: "Will not erupt today. Accuracy: 100%."
One night, after thirty years of this, the volcano erupts for a few hours, oozing magma down its opposite side.
The next morning, the man is grateful that his house is fine, but feels a pang of sadness as he replaces the famous sign with a new one: "Will not erupt today. Accuracy rate: 99.99%."
Yes, most people can predict the weather in the desert. But why do you claim that's what happened here? Or was it a joke? Because people took it seriously.
Neither the article nor the linked paper states that. But they do have all the details on precision and conditions.
It was a joke about how Silly Valley-based companies will promise the moon and back and then design a car that doesn’t know dumping snow in the boot is bad.
The work was done by DeepMind, which is in the UK. Weather in the UK is quite variable and difficult to predict (which is why the English are always talking about it).
>Figure 2 shows some of the forecast samples of GenCast for Typhoon Hagibis, shortly before it made landfall in Japan on 12 October 2019. Figure 2b–e,g,h–k,m shows that GenCast forecasts are sharp and have spherical harmonic power spectra that closely match the ERA5 ground truth at both 1- and 15-day lead times
Seriously. Let's see an accurate forecast for Cleveland, Ohio. Even local forecasters can barely get the next day correct on any sort of consistent basis.
This is great from a practical standpoint (being able to predict weather), but does it actually improve our understanding of the weather, or WHY those predictions are better?
That is my issue with some of these AI advances. With these, we won't have actually gotten better at understanding the weather patterns, since it's all just a bunch of weights which nobody really understands.
I’m by no means an expert in weather forecasting, but I have some familiarity with the methods. My understanding is that non-“AI” weather models basically subdivide the atmosphere into a 3d grid of cells that are on the order of hundreds to thousands of meters in each dimension, treat each cell as atomic/homogeneous at a given point in time, and then advance the relevant differential equations deterministically to forecast changes across the grid over time. This approach, again based on my limited understanding, is primarily held back by the sparse resolution and the computational resources needed to improve it, not by limitations of our understanding of the underlying physics. (Relatedly, I believe these models can be very sensitive to small changes in the initial conditions.) It’s not hard to imagine a neural net learning a more efficient way to encode and forecast the underlying physical patterns.
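To make that grid picture concrete, here is a deliberately tiny sketch of my own (a toy example, nothing from the paper): a 1D advection equation stepped forward on a grid with a finite-difference scheme. Real NWP models do the same basic thing in 3D with far more physics, which is exactly why resolution and compute are the bottlenecks.

    # Toy illustration of the grid-based approach: divide the domain into
    # cells, treat each cell as homogeneous, and step a PDE forward in time.
    # Here: 1D advection, du/dt = -c * du/dx, on a periodic grid.
    import numpy as np

    nx, dx, dt, c = 200, 1.0, 0.4, 1.0   # grid cells, spacing, time step, wind speed
    u = np.exp(-0.01 * (np.arange(nx) - 50.0) ** 2)  # initial "blob" of some quantity

    def step(u):
        # First-order upwind finite difference; each cell is updated from its
        # upstream neighbour, which is why resolution (dx) limits what can be resolved.
        return u - c * dt / dx * (u - np.roll(u, 1))

    for _ in range(100):        # advance the "forecast" 100 time steps
        u = step(u)
    print(u.argmax())           # the blob has been carried ~40 cells downstream

Everything below the grid scale has to be parameterized away, and small errors in the initial blob compound with every step, which is the sensitivity to initial conditions mentioned above.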
GFS and IFS are both medium-range global models in the class Google is targeting. These models are spectral models, meaning they pivot the input spatial grid into the frequency domain, carry out weather computations in the frequency domain, and pivot back to provide output grids.
The intuition here is that, at global scale over many days, the primary dynamics are waves doing what waves do. Representing state in terms of waves reduces the accumulation of numerical errors. On the other hand, this only works on spheroids and it comes at the expense of greatly complicating local interactions, so the use of spectral methods for NWP is far from universal.
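For intuition, here's a minimal toy of the spectral idea (assuming a 1D periodic domain in place of spherical harmonics on the globe): pivot the grid into the frequency domain, advance each wave exactly, and pivot back for output.

    # Rough sketch of the spectral method: derivatives become multiplications
    # by i*k in the frequency domain, so advection of each wave is exact.
    import numpy as np

    nx, L, dt, c = 256, 2 * np.pi, 0.01, 1.0
    x = np.linspace(0.0, L, nx, endpoint=False)
    u = np.exp(-20.0 * (x - np.pi) ** 2)           # initial condition on the grid
    k = np.fft.fftfreq(nx, d=L / nx) * 2 * np.pi   # angular wavenumbers

    u_hat = np.fft.fft(u)                          # "pivot" grid -> frequency domain
    for _ in range(100):
        # d(u_hat)/dt = -i*c*k*u_hat, so each step multiplies by a phase factor
        u_hat = u_hat * np.exp(-1j * c * k * dt)
    u = np.real(np.fft.ifft(u_hat))                # "pivot" back to an output grid

The waves accumulate essentially no numerical diffusion, which is the payoff; the cost is that anything local or non-linear has to be handled by transforming back and forth.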
> It’s not hard to imagine a neural net learning a more efficient way to encode and forecast the underlying physical patterns.
And that is where your understanding breaks down.
What makes weather prediction difficult is the same thing that makes fluid dynamics difficult: the non-linearity of the equations involved.
With experience and understanding of the problem at hand, you can make some pretty good non-linear predictions on the response of your system. Until you cannot. And the beauty of the non-linear response is that your botched prediction will be way, way off.
It's the same for AI. It will see some nicely hidden pattern based on the data it is fed, and will generate some prediction based on it. Until it hits one of those critical moments when there is no substitute to solving the actual equations, and it will produce absolute rubbish.
And that problem will only get compounded by the increasing turbulence level in the atmosphere due to global warming, which is breaking down the long-term, fairly stable, seasonal trends.
This has been the case for years now, way before the AI craze. We just used to call it machine learning. The best-performing predictive models are black boxes that can't practically be interpreted by humans the way you can interpret, say, a linear regression model that gives easily digestible parameters as output. Boosted trees are a great example of very well-performing models that quickly become impossible for humans to understand once they get big enough to be useful.
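A quick illustrative contrast (synthetic data, scikit-learn assumed; nothing to do with weather): a linear regression hands you a handful of readable coefficients, while a boosted ensemble hands you thousands of split nodes.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)

    linear = LinearRegression().fit(X, y)
    print(linear.coef_)   # five coefficients a human can read directly

    boosted = GradientBoostingRegressor(n_estimators=500).fit(X, y)
    print(sum(tree.tree_.node_count for tree in boosted.estimators_.ravel()))
    # thousands of split nodes: often more accurate, but no compact human-readable story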
In Australia, meteorologists used to be deployed across the country to local offices and would receive computer-generated forecast models (and other raw data) whenever the supercomputer at headquarters had finished running a job. The local meteorologists would then be allowed to apply their local knowledge to adjust the computer-generated forecast.
This was (and still is) particularly important in situations such as:
* Fast moving weather systems of high volatility, such as fire weather systems coupled with severe thunderstorms.
* Rare meteorological conditions where a global model trained on historical data may not have enough observed data points to consider rare conditions with the necessary weighting.
* Accuracy of forecasts for "microclimates", such as alpine resorts at the top of an ultra-prominent peak. Global models tend to smooth over such a landscape anomaly as if it were never present.[1]
It'd perhaps be possible to build more local monitoring stations to collect training data and run many local climate models across a landscape and run more climate models of specific rare weather systems. But it is also possibly cheaper and adequate (or more accurate) to just hire a meteorologist with local knowledge instead?
I would think we also use these models to run simulations. So maybe the AI models can be used to run simulations with different kinds of inputs to see if doing X in one place (like planting a lot of trees in one area) will have an outsized impact rather than doing Y in another place.
AI models rarely have that kind of predictive power because they are noticing patterns not running simulations.
The model may care about trees because mountains above a specific height don't have trees on them. It's the old "measures stop being useful once you start optimizing for them", AI edition.
A fisherman was relaxing on a sunny beach, enjoying the day with his fishing line in the water. A businessman, stressed from work, walked by and criticized him for not working harder.
"If you worked more, you'd catch more fish," the businessman said.
"And what would my reward be?" asked the fisherman with a smile.
"You could earn money, buy bigger nets, and catch even more fish!" the businessman replied.
"And then what?" the fisherman asked again.
"Then, you could buy a boat and catch even larger hauls!" said the businessman.
"And after that?"
"You could buy more boats, hire a crew, and eventually own a fleet, freeing you to relax forever!"
The fisherman, still smiling, replied, "But isn't that what I'm already doing?"
I think that might have been their point. People moving work to AI so they can "relax" by working on more complicated technical matters (or more AI) are the businessmen, and the meteorologists just chilling out predicting the weather as best as they can with science are the fishermen.
Edit: Just saw their reply to you, so maybe I was wrong about the parable coming across wrong.
I mean, yeah, if you want 90% of your relaxation time for the rest of your life to be while you're fishing, that's fine.
For the businessman to assume otherwise is not outlandish. The idea of an entire fleet is overblown, but having "fuck you money" is a pretty nice goal.
-
Also I don't see how this applies to meteorologists? Which part is the "working more" aspect?
Climatologists use climate models to predict the climate. Meteorologists use weather models to predict the weather. They are different time scales and disciplines.
The climatologists can, at least. Can they scrutinize the esoteric ensemble of weights making up this AI model? And which type of model's going to be easier to update based on changing climate parameters, a climate model or an AI model?
Weather and climate models have their own physics, which at the very least means that the solution is physical for the universe that particular model inhabits. The boundary conditions are parameterized, and those can be tweaked as climate and land use changes.
AI models don’t have any of that, but they are actually more akin to human forecasters, gaining forecast skill from pattern recognition. I think there’s a place for both in weather forecasting, but I’d have zero confidence in an AI climate model, or anything longer than a year. An AI might be very good at seasonal forecasts though, picking up easy to miss signals in the MJO or ENSO.
> Weather and climate models have their own physics, which at the very least means that the solution is physical for the universe that particular model inhabits. The boundary conditions are parameterized, and those can be tweaked as climate and land use changes.
That really isn't true these days. The dynamical cores and physics packages in numerical weather prediction models and general circulation models have more-or-less converged over the past two decades. For instance, you'll find double-moment microphysical schemes in a cross-section of both classes of models, and slightly specialized versions of full-fledged GCMs can be run within assimilation frameworks to generate true-blooded weather forecasts.
> AI models don’t have any of that, but they are actually more akin to human forecasters, gaining forecast skill from pattern recognition
This grossly sells short what the current crop of AI weather models is capable of, and how they're formulated. It's best to think of them as "emulators" of their physics-based cousins; they're trained to reproduce the state transitions from t=t0 to t=t0+delta_t that an NWP system would generate. It's a bit reductive to call this "pattern matching", especially when we increasingly see that the emulators recover a fair bit of fundamental dynamics (e.g. Greg Hakim's work, which reproduces idealized dycore tests on AI-NWP models and clearly demonstrates that they get some things surprisingly correct - even though the setups in these experiments are _far_ from real-world conditions).
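As a cartoon of that emulator framing (my own toy stand-in, not how GenCast is actually built): take a reference simulator, sample states, and train a regressor to reproduce its one-step transitions.

    # A damped pendulum stands in for the physics-based NWP model; the
    # "emulator" learns the mapping from state(t) to state(t + dt).
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def simulator_step(state, dt=0.1):
        # Reference "physics": damped pendulum (theta, omega)
        theta, omega = state
        return np.array([theta + dt * omega,
                         omega + dt * (-np.sin(theta) - 0.1 * omega)])

    rng = np.random.default_rng(0)
    states = rng.uniform(-2, 2, size=(5000, 2))              # sampled "analysis" states
    targets = np.array([simulator_step(s) for s in states])  # what the simulator does next

    emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(states, targets)
    print(emulator.predict([[1.0, 0.0]]), simulator_step(np.array([1.0, 0.0])))

The emulator is cheap to evaluate, and if it has genuinely absorbed the transition operator it can recover dynamical behaviour it was never explicitly shown, which is roughly what the idealized-test results suggest.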
> That really isn't true these days. The dynamical cores and physics packages in numerical weather prediction models and general circulation models have more-or-less converged over the past two decades.
Ah, well, I did stop studying GCMs about 20 years ago so perhaps I should shut up and let other people post. I appreciate the detail in your explanation here, and I wouldn’t mind a link to papers explaining the current state of the art.
I'm not sure I can point you to a single reference, but a good starting point would be the UK Met Office's "Unified Model", which provides a framework for configuring model simulations that scale from sub-mesoscale rapid refresh (e.g. the UKV system) to traditional global modeling (e.g. MOGREPS) and beyond into climate (latest versions of the Hadley Centre models, which I think the current production version is HadGEM3).
Sure, but with this new predictive model we will have better predictions to work backwards from.
OC was saying (I’m going to paraphrase) that this is the death of understanding in meteorology, but it’s not because we can always work backwards from accurate predictions.
Comparing the difference between correct predictions and incorrect predictions, especially with a high accuracy predictive model, could give insight into both how statistical models work and how weather works.
Define "our understanding". With complex / chaotic systems there sometimes are no higher level laws that govern them - all we have is just modeling and prediction.
I think this will be the next generation of science to some extent. The things we can understand and call explanations/reasons might be something involving 5 or 50 variables with not too many interactions between them. They have some unifying principles or equations we can fit in our head. I think many things inherently just involve far too many variables and complexity for us to understand in a neat theory and we are hitting those limits in biology & physics. Even so I'm sure we will develop better and better ways to interpret these models and get some level of understanding. Maybe we can understand them but not create them past a certain scale.
This description isn’t how science has played out. When the scale changes often the model gets simple again.
Think about how complex friction is at a molecular level. But a single coefficient is a good enough model for engineers and a continuous 1d graph is incredible.
There is also no evidence that general AI models like multilayer perceptrons are good at constructing physical models from phenomena and lots of examples where they aren’t.
The opposite seems to have had more success. Someone who understands a system constructs a model and then lets a computer determine the actual parameter values.
What is there that we don't already understand? Sure there are probably higher level patterns in the weather that the model might be exploiting to make the claimed better predictions, but normal models are just based on physics and people rarely cared to try and improve them based on these patterns before. Mainly because they are too complex.
In short those patterns are only useful because of AI.
We could train AI models to simplify models in ways that require much less "intelligence" in the given domain.
For instance, we could ask AI to simplify the "essence" of the problems it solves in a similar manner to how Einstein and Feynman simplified laws of physics, with train/elevator metaphors or representations like Feynman diagrams.
Of course, such explanations don't give the depth of understanding required to actually do the tensor calculus needed to USE theories like General Relativity or Quantum Electrodynamics.
But it's enough to give us a fuzzy feeling that we understand at least some of it.
The REAL understanding, ie at a level where we can in principle repeat the predictions, may require intuitions so complex that our human brain wouldn't be able to wrap itself around it.
You will be mocked by the AI hype crazies for your very serious and important question during this time.
The reality is no, we won't learn why and how something works. And it will get much worse in the future. We are trading human learning for machine learning. As time goes on, humans will get more stupid and machines more intelligent, and eventually we will have generations who know nothing about how anything works and depend on the machines to tell them how to function.
One could make the argument: hey... fifty years ago everyone knew intricately how a car worked because you had to. It broke down so often, you needed to be able to repair it yourself on the side of the road. Now people just press a button and if it doesn't work you have the 'shop' take care of it for you. AI advancement will be no different.
Problem is: the 'shop' today is still humans who designed and built the cars and know how a car works and how to repair one. AI advancement can lead to eventually no one knowing anything as models get so sophisticated we just don't know why A leads to Z.
As climate change occurs (which it is), we're going to want a causal model so we can actually make a forecast, instead of predicting only from past (unchanged) data.
And note that these new models based on machine learning are already better at predicting extreme events. This is because existing models are not built entirely from first principles but rather include a lot of heuristics to describe important phenomena like cloud formation and albedo effects. That means that traditional models are just as rooted in weather-as-it-was as the machine learning models are. The big difference is that it takes a lot of work to identify the dependencies in traditional models while it takes less work to retrain the machine learning model.
Accurate black box is fine until it blows a forecast and people die, and you want to fix it for the next time.
Take the situation with Hurricane Otis — what do you do if an AI doesn’t detect cyclogenesis 24-48 hours beforehand? Are we sure tuning the model to detect this event will improve forecast skill in general, or will it make it worse?
Weather forecasting models will gradually improve over time but it's unreasonable to expect perfect accuracy. That is mathematically impossible in a chaotic system. Some people will always die in extreme weather events.
The current crop of AI weather models captures a surprisingly robust array of "core dynamics". E.g., recent work by Hakim and Masanam [1] reproduce classic dynamical core test cases on AI weather models and show that in many cases they robustly capture the expected dynamical responses to these highly idealized setups - setups far outside the bounds of what the models are trained on. What this means is that the causal or dynamical mechanisms producing a particular flow regime can very directly be inferred from the model.
Something interesting is that these predictions can be made at all, which suggests the hypothesis "we can have a better understanding of weather that makes better predictions than our current understanding does",
in contrast to "we cannot make better predictions on this input data than what we're doing now"
This whole article smells like marketing and a way to monetize their weather model. I hope it's leaps and bounds beyond current capability, but I have very strong doubts about its efficacy in the real world.
> "We'll soon be releasing real-time and historical forecasts from GenCast, and previous models, which will enable anyone to integrate these weather inputs into their own models and research workflows."
They will give you the weights and code not the forecast - your quote is incomplete.
It's either a temporary gift to the community until it's adopted, and then they charge for it, OR they know most orgs can integrate it into their products, therefore requiring them to buy Google products, IF it works as they say it does.
I'm friends with a meteorologist, and the 15+ day forecast is the bane of their existence because you can't accurately forecast beyond a week, so I would love to know how they are measuring accuracy. The article doesn't say, and I know the paper is going to go over my head.
Totally wrong. You cannot generalize such a statement because it depends on the micro and macro weather conditions. A very stable situation makes it very easy to forecast one week and beyond. On the other hand, there can be situations where you cannot accurately predict the next 12 hours (e.g. cold air pool).
> Totally wrong. You cannot generalize such a statement because it depends on the micro and macro weather conditions. A very stable situation makes it very easy to forecast one week and beyond. On the other hand, there can be situations where you cannot accurately predict the next 12 hours (e.g. cold air pool).
And that is exactly where these AI models will break down. They will "shine" (or fool us) with how well they predict the stable situations, and will produce utter rubbish when the high, turbulent and dynamic weather fronts make prediction difficult.
But of course, if your weather is "stable" 80% of the time, you can use those shiny examples to sell your tool, and count on user forgetfulness to get away with the 20% of nonsense predicted the rest of the time.
I would guess that everyday they're comparing the current weather against the forecast from 15 days ago. Not a lot of data points to be sure, but perhaps enough to have confidence of very high accuracy.
So they say. I am reminded of Google Flu Trends [0]. They likely also did similar "verification", and it didn't work.
> The initial Google paper stated that the Google Flu Trends predictions were 97% accurate comparing with CDC data.[4] However subsequent reports asserted that Google Flu Trends' predictions have been very inaccurate, especially in two high-profile cases. Google Flu Trends failed to predict the 2009 spring pandemic[12] and over the interval 2011–2013 it consistently overestimated relative flu incidence,
One of the difficulties with using user data to understand society is that the company isn't a static entity. Engineers are always changing their algorithms for purposes that have nothing to do with the things you're trying to observe. For Google Flu Trends specifically here's a great paper
2013 was a bad year for H1N1 or whatever it was. It killed my sister while she was vacationing back east. I think it was aggressive, but it was quite cold that winter and that may have completely screwed with their data/interpretation at that time. For instance, it snowed 5 times and stuck in Central Louisiana that winter (continuing into February/March, whatever). It's been colder since, but that year was a real outlier (in my experience on earth; this is my supposition, I have been trying to figure this out for 11 years).
This gets tricky. Once you look into your past, you're presumably looking at data that was used to generate your training corpus. So you would expect better accuracy on that than you would find on present/future predictions.
The paper seems quite readable to me, and I also lack the training. But this point is addressed in the beginning.
"The highly non-linear physics of weather means that small initial uncertainties and errors can rapidly grow into large uncertainties about the future. Making important decisions often requires knowing not just a single probable scenario but the range of possible scenarios and how likely they are to occur."
>> We use 2019 as our test period, and, following the protocol in ref. 2, we initialize ML models using ERA5 at 06 UTC and 18 UTC, as these benefit from only 3 h of look-ahead (with the exception of sea surface temperature, which in ERA5 is updated once per 24 h). This ensures ML models are not afforded an unfair advantage by initializing from states with longer look-ahead windows.
See Baselines section in the paper that explains the methodology in more depth. They basically feed the competing models with data from weather stations and predict the weather in a certain time period. Then they compare the prediction with the ground truth from that period.
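In toy form, the comparison protocol looks something like this (all numbers made up; the paper's actual scoring uses proper ensemble metrics, this just shows the per-case win-rate idea behind the "97% of 1,320 scenarios" claim):

    # Score each model's forecast against the ground truth for many held-out
    # cases, then count in what fraction of cases one model beats the other.
    import numpy as np

    rng = np.random.default_rng(0)
    truth   = rng.normal(size=(1320, 100))                        # 1,320 verification cases
    model_a = truth + rng.normal(scale=0.8, size=truth.shape)     # "new" model
    model_b = truth + rng.normal(scale=1.0, size=truth.shape)     # baseline

    rmse_a = np.sqrt(((model_a - truth) ** 2).mean(axis=1))
    rmse_b = np.sqrt(((model_b - truth) ** 2).mean(axis=1))
    print("A beats B in", (rmse_a < rmse_b).mean() * 100, "% of cases")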
Plot twist: they measure accuracy in predicting the weather 5 years in the past.
They can say what they want, but I get rained on by surprise rain more than I ever have in my life, now that I'm practically forced to use the built-in Google weather due to them and Apple catching and killing all the good weather apps.
If you are in the US, and while it still exists.
(I found out about it when private weather companies were mad that they were giving out forecasts for free...)
Their Alpha* work from DeepMind is actually quite good and has a good track record. LLM/Gemini - yeah, what you said, I wouldn't trust a word their team says.
I will point out that Physics and Chemistry are awarded by the Royal Swedish Academy of Sciences while the Peace Prize is awarded by a separate Norwegian committee, so it is plausible that one would be more respectable than the other. Literature is a completely different institution as is Physiology/Medicine.
The Peace prize and Literature prizes have far more questionable winners in my view than the Physics and Chemistry ones.
Economics wasn't added until the 1970s, and has had some extremely suspect winners.
For now, this is not so much about being "respectable" as about the fact that the prizes given in the sciences are not as bound to ideology or politics.
AlphaFold/AlphaProteo is genuinely a major breakthrough in biochemistry. Now, if they start to hand out prizes in Physics, say, based on their importance in promoting some specific ideological agenda, then I would be wary. (For instance, if they give the prize in Physics for "making the field of Physics more relatable to transwomen in the Middle East".)
Some people see it as obvious that hard science and politics have glaring differences that makes the meaningfulness of an award for being very good at one or the other very different.
The Nobel committee picks candidates for all the prizes, and the final decisions are split up over four different institutions.
The only argument I could see along these lines would be favoring the Royal Swedish Academy of Sciences in particular, but that would mean that chemistry, physics, and economics count but medicine doesn't count. And that's just confusing.
Aside from the fact you're conflating two very different things comparing this to Gemini, what exactly is the problem with the Gemini?
Specifically just that the release was a bit awkward and had some problems? I've found the latest model releases to be very good compared to other frontier models.
Probably a reference to when Google was widely perceived to rush out the first Gemini release, neglecting to mention it did lots of weird racist stuff.
A Nobel physicist that couldn't do basic arithmetic would definitely raise my eyebrows, but even taking your analogy at face value Gemini was not marketed as fast food slop. Google can't be trusted to hype products in a reliable way, regardless of their technical details.
It's trendy for accomplished people to talk down about themselves as a way to sound cute. It's similar to software developers who like to say "I don't know what I'm doing, I just Google and Stackoverflow all day." It has a certain charm to it, and certainly there are some people for whom it's true, but overall it's just a misguided attempt at being modest but ultimately a horribly misleading statement.
"Do not worry about your difficulties in mathematics, I assure you that mine are greater" - Albert Einstein
If they're using the same strategy they were a year or two ago when I first started hearing about Google AI making progress in this space then I think it's closer to the second thing.
IIRC, they had a model that was built to predict weather 15 minutes into the future and were just recursively running that to get days' worth of predictions. If that's still the case then I would assume they have predictions for every 15 minutes. Back when I first read about that, they were saying the accuracy started to drop off around the 7-10 day mark, so I'm not sure if they improved on that or changed their strategy.
It does seem like this is one of those domains where new AI models could thrive. From my understanding, the amount and variety of data necessary to make these models work is huge. In addition to historical data, you've got constant satellite data, weather stations on the ground for data collection, weather balloons going high into the atmosphere multiple times daily per location, Doppler radars tracking precipitation, data from ships and other devices in the ocean measuring temps and other info, and who knows what else.
It's incredible that we are able to predict anything this far into the future, but the complexity seems like it lends itself to this kind of black box approach.
*This is all speculation, so I'd be grateful if someone more knowledgeable could tell me if I'm mistaken about these assumptions. It's an interesting topic.
The real reason why this is a domain where AI models can thrive is actually because we already have extremely complex, well-developed, physics-based simulations of the atmosphere which can take all of these observations and do useful things with them. So the task we currently ask AI models to do with regards to weather forecasting is actually much, much simpler than what you're proposing - we're simply asking the AI models to emulate what these existing models are capable of doing.
Actually doing useful things with the observations alone is a different story altogether. It's the fact that we've been developing computer simulations of the weather and climate since the 1950's that is the key enabler here.
>> But DeepMind said GenCast surpassed the precision of the center's forecasts in more than 97 percent of the 1,320 real-world scenarios from 2019 which they were both tested on.
Here's a slightly longer answer. When training machine learning models it's the done thing to test them on held-out data, and use the error on the held-out test data to estimate the accuracy of a model on truly unseen data that we really don't have- such as observations that are still in the future, like the weather tomorrow (as in 12/12/24) [1].
The problem is that held-out test data is not really unseen and when a model doesn't perform very well on it, it is common to tweak the model, tweak hyperparameters, tweak initialisation etc etc, until the model performs well on the held-out test data [2]; which ends up optimising the model on the held-out test data and therefore destroying any ability to estimate the accuracy of the model when predicting truly unseen data [3].
You can check Deepmind's paper and see if you can find where in the description of their methodology they explain what they did to mitigate this effect. You won't find it.
This is enough of a problem when the model is, say, an image classifier, but when it's a model that's supposed to predict the weather 15 days from now, the best you can say when you look at the results on held-out test data is: the model is doing fine when predicting the weather we had in the past.
____________
[1] Yes, that's why we test models on held-out data. Not so we can brag about their "accuracy", or so we can put a little leaderboard in our papers (a little table with all the datasets on one side, all the systems on the other side, and all our entries in bold or else we don't submit the paper) and brag about that. We're trying to estimate generalisation error.
[2] "The first iteration of my model cost hundreds of man-hours and thousands of dollars to code and train but if it doesn't perform well on the first try on my held-out test data I'm going to scrap it and start all over again from scratch".
Yeah. Right.
[3] Even worse: everyone splits their experimental dataset into 80%/20% partitions, 80% for training and 20% for testing, and that already screws over the accuracy of any error estimates. Not only are we predicting 20% of our data from 80% of it, but we're predicting a tiny amount of data in absolute terms compared with the true distribution.
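To illustrate the selection effect in [2] and [3], here's a small hypothetical sketch: on pure noise, repeatedly tweaking a model and keeping whichever version scores best on the held-out test set produces an optimistic accuracy estimate, even though there is nothing to learn.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(300, 20)), rng.integers(0, 2, size=300)  # pure noise
    X_train, X_test = X[:240], X[240:]
    y_train, y_test = y[:240], y[240:]

    best = 0.0
    for depth in range(1, 30):          # "tweak hyperparameters until it looks good"
        model = DecisionTreeClassifier(max_depth=depth, random_state=depth)
        acc = model.fit(X_train, y_train).score(X_test, y_test)
        best = max(best, acc)
    print(best)   # noticeably above the 0.5 that truly unseen data would give

The 0.5 baseline is only recovered on data that played no role whatsoever in the model-selection loop, which is exactly what future weather is and what a held-out historical test set is not.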
I don’t really understand why Google and other companies making similar models are able to train on existing modelled or reanalysis data sets and then claim further accuracy than the originals. Sure, stacks of convolutions with multimodal attention blocks should be able to tease apart all of the idiosyncratic correlations that the original models may not have seen. But it’s unclear to me that better models, as opposed to better data, are the direction to go in.
> The model was trained on four decades of temperature, wind speed and air pressure data from 1979 to 2018 and can produce a 15-day forecast in just eight minutes—compared to the hours it currently takes.
Basically, they trained the model on old observations, not old predictions.
I.e., imagine someone created a giant spreadsheet of every temperature observation, wind speed observation, precipitation observation, etc., and then told the computer to learn how to predict a column.
> GenCast is trained on 40 years of best-estimate analysis from 1979 to 2018, taken from the publicly available ERA5 (fifth generation ECMWF reanalysis) reanalysis dataset
They say they trained on era5 data. That is modeled data not only direct observations. They do it this way to have a complete global data set that is consistent in time and space.
Would be really cool to convert its predictive model into a computer program, written in Python/C/Rust/whatever, that makes the predictions; I think that would better serve our ability to understand the world.
We don't need to; dynamical meteorology is an incredibly mature field and our understanding of the fluid dynamics of the atmosphere grossly exceeds the resolutions and limitations of coarse, 0.25 degree global numerical models.
...why else do you think it's important to predict hurricane paths and tornado spawning storms and flooding rains and heat waves further ahead of time?
I see a fundamental issue from systems theory here, but do note that I'm by no means an expert at this; I just had a bachelor's course covering it together with control theory. The issue is that you would train a model on historical data, then you would use the model's predictions to make financial decisions. But, the moment you are using those predictions, you have modified the system the model learned to predict. Now, maybe we can argue that, if you only invest a little money, you aren't really influencing the stock market. But I think the case would be very different if we're talking about big investors.
Yes, it is well studied that markets are not, in general, predictable, but that doesn't mean you cannot gain an advantage if you can extract some extra meaning, even a tiny one; that is what hedge funds and other kinds of sophisticated trading firms actually do. Here we are saying that they have an extra pattern-recognition tool that they can use, with a probability attached.
There probably already is, and one would hope it's deployed with a kill switch at the HFT firms.
I don't have any insider knowledge but i'm pretty sure that one Quant group that everyone knows about probably figured out most of the "tricks" used to get llms and SD to really take off a decade before anyone else. iirc they consistently get like 30% or greater returns, which is practically black magic and means that they have something that can see a pattern(s) in the data.
I recently read that the 10 day forecast now is as accurate as the 5 day forecast from 30 years ago when I was a kid.
This surprised me. I grew up in Ohio and now I live in the Bay Area and the forecast here seems to be accurate only 2-3 days out. It would be so helpful to have an accurate 10 day forecast.
... there is probably a newer one, it continues to improve. Keep in mind that individual locations are always hit or miss compared to the average skill.
This also surprises me. The weather app on my iphone has been atrocious with local same-day forecasts lately. It will say that the forecast for noon is sunny when it is 11:50 and dumping rain. It sometimes gets the overall idea that warmer or colder weather is coming sometime soon but it’s off by up to 2 days. I haven’t bothered to look for better sources, but it might be worth doing some research and comparison.
as mentioned elsewhere, weather.gov has accurate enough predictions for stuff like rain.
It's good enough that i can confidently claim on a wednesday that i need to leave some place before 16:30 because there's a storm coming that i saw on weather.gov the preceding sunday or monday, and there's a 75% chance that it will be raining at 16:30. This may not sound impressive, but in Louisiana knowing when a storm system is coming, as opposed to just some rain, is probably pretty taxing on weather models; and it's nearly universally good information to have.
I find i can't rely on the temperature and humidity readings, the "local prediction center" is in a concrete jungle (near an "international" airport, to boot), and "town" is consistently 7 degrees warmer than where i reside. Other than that, if it says temp below 32, it will freeze. if there's a heat advisory, we feel it too, just less so than town 20 miles away.
Hope this helps, and i hope other countries/regions do this as well as weather.gov - it just works! I do use ventusky for tropical system paths and the like, if the gov site allcaps weather briefing doesn't give me enough detail
my android phones always say "30c", no matter what settings i change - the built in weather thing is just broken.
As far as phone apps go, i don't know any good weather ones. I've never used ventusky on a cellphone, either.
It's so new that it has no long term track record. I want to see how its record looks after say 1/3/5 years. Any 15 day forecast is just too far into the future for me to take seriously. Even the 3 day forecast is loose at best. By the time it reaches 7-10 days, some of the forecast is completely off when it reaches the 3 day window.
"But DeepMind said GenCast surpassed the precision of the center's forecasts in more than 97 percent of the 1,320 real-world scenarios from 2019 which they were both tested on."
Great, so it's using historical data to make a prediction about what has already happened. I want to see it on current data where it is truly a prediction with no data to make that prediction.
I'm not convinced the results haven't been juiced/salted
>Great, so it's using historical data to make a prediction about what has already happened.
That's ok though, because it doesn't know it's acting on old data. Researchers would give it data and ask for the 15-day forecast; the researchers then compare against the real-world data. And as noted, it "surpassed the centre's forecast in more than 97 percent" of those tests. This is all referred to as backtesting.
It's still not as good as actually new data. The individual model may not overfit the existing data, but the whole system of researchers trying lots of models, choosing the best ones, trying different hyperparameters etc. easily can.
In principle they trained on data up to 2017, validated (tried different hyperparameters) on data from 2018, and published results on data from 2019...
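In sketch form, the chronological protocol would be (hypothetical column names, just to show the split):

    import pandas as pd

    df = pd.DataFrame({"time": pd.date_range("1979-01-01", "2019-12-31", freq="D"),
                       "value": 0.0})
    train = df[df.time.dt.year <= 2017]   # fit model weights
    val   = df[df.time.dt.year == 2018]   # choose hyperparameters
    test  = df[df.time.dt.year == 2019]   # report results once, untouched until then
    print(len(train), len(val), len(test))

The whole point of the untouched final year is that no tuning decision ever looks at it; once it has been used to pick among models, it stops estimating future performance.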
Overfitting is always bad by definition. The model learns meaningless noise that happens to help getting good results when applied to old data (be it due to trained weights, hyperparameters, whatever) but doesn't help at all on new data.
I wish these sort of papers would focus more on the 3 percent that it got wrong. Is it wrong by saying that a day would have a slight drizzle but it was actually sunny all day, or was it wrong by missing a catastrophic rain storm that devastated a region?
I've worked on several projects trying to use AI to model various CFD calculations. We could trivially get 90+ percent accuracy on a bunch of metrics, the problem is that its almost always the really important and critical cases that end up being wrong in the AI model.
i was using scikit-learn or some similar scaffolding/boilerplate software for doing predictions, and i hate massaging data, so i just generated primes and trained it on the prime series up to like 300 or 1000, then had it try to "guess" future primes or assert whether a number it had seen before was prime or not
and hilariously it was completely wrong (like 12% hit rate or something). I complained online and i forget the exact scope of my lack of understanding but suffice to say i did not earn respect that day!
2019 would already be a post-training data prediction:
> GenCast is trained on 40 years of best-estimate analysis from 1979 to 2018, taken from the publicly available ERA5 (fifth generation ECMWF reanalysis) reanalysis dataset
It takes 8 minutes to produce a 15-day forecast. That's actually quite a long time for an AI model. I should probably read the paper to find out why, but does anyone know? Is the model predicting the weather 10 minutes ahead and just being run iteratively 2,000 times to get a 14-day forecast?
> That's actually quite a long time for an AI model.
Sure, but it's several orders of magnitudes smaller than the parent, physics-based weather model it's emulating, which may take 2-3 hours of wallclock time on an HPC system with a few thousand cores allocated to running it.
> I should probably read the paper to find out why but does anyone know? Is the model predicting the weather in 10 minutes time and just run iteratively 2000 times for a 14 day forecast?
Basically, yes. It's run autoregressively but in 6-hour forecast intervals. The added complexity is the iterative nature of the diffusion process used to generate each of those forecast intervals.
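Schematically, the rollout looks something like this (step length, grid shape and function names are placeholders for illustration, not GenCast's actual interface):

    # Autoregressive rollout: each forecast step is fed back in as the input
    # for the next one. In the real model, sample_next_state would itself be an
    # iterative diffusion (denoising) process conditioned on the current state,
    # which is where most of the 8 minutes goes.
    import numpy as np

    def sample_next_state(current_state, rng):
        # Stand-in for one diffusion-sampled 6-hour forecast step
        return current_state + rng.normal(scale=0.1, size=current_state.shape)

    rng = np.random.default_rng(0)
    state = np.zeros((181, 360))                 # toy global grid of one variable
    trajectory = [state]
    for step in range(4 * 15):                   # 6-hour steps * 15 days = 60 steps
        state = sample_next_state(state, rng)    # feed each output back in as input
        trajectory.append(state)
    print(len(trajectory) - 1, "forecast steps") # 60, spanning the 15-day horizon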
1: Previous conditions in fluids cannot be determined with any kind of accuracy except for really obvious cases like "did this wet ground come from a cloud?" or laminar flows (https://www.youtube.com/watch?v=p08_KlTKP50). Since weather is fluid mixing at high speeds with low viscosity, there is a huge amount of turbulence and entropy. Entropy means systems are not time-reversible.
2: Can we predict weather 2000 or 1M years ago based on estimates of temperature and geography? Yes, pretty reliably. Vegetation and albedo are some of the biggest variables- plant transpiration puts huge amounts of water into the air and changes surface temperatures. But we have a pretty good idea of the geography, and relatively small changes don't have a huge impact on general weather trends.
3: Can we predict the exact weather on a day 100 years ago, given available measurements? No, not really. Without satellites you need radar, and without radar you need weather balloons. Coal burning also had impacts on weather. Low pressures at the surface can tell you that the weather may get worse, but it doesn't tell you where its coming from or what the higher airflows are doing.
Obviously this team knows way more about this domain than me, but I have to ask: wouldn't this only be able to predict weather which is in line with past weather patterns/indicators? I can imagine a weather analyst might be able to see "between the data" and recognise when some anomaly might be brewing, but an AI model would not.
Yup. There's fundamentally no way to automatically verify predictive accuracy; the assumption in ML papers about such accuracy is (the almost always false presumption) that there are no distributional shifts in the data-generating process.
Here, since weather certainly changes its fundamental patterns over time, there is no way of reliably predicting out-sample performance.
It's highly likely that in 1-2 years time we'd take all physics-based predictions and all predictions of current AI models, and find the AI model accuracy drops off a cliff when faced with a real out-sample.
> Here, since weather certainly changes its fundamental patterns over time, there is no way of reliably predicting out-sample performance.
This seems overly pessimistic.
The weather doesn't change its fundamental patterns over time; those are governed by the fluid dynamics of the atmosphere, and are largely forced (to first order) by things like the amount of solar radiation received from the sun and the configuration of the continents - neither of which change much at all on timescales of a few thousand or tens of thousands of years. The total manifold of "realistic" weather patterns - especially trajectories of weather patterns conditioned on one or more prior weather states - is vastly smaller than it would seem at first glance (there's only so much variation in how a Rossby wave can evolve and there is clear asymptotic behavior with regards to amplitude).
I think if you wanted an "ultimate stress test" of an AI weather forecasting system, you could run a high-fidelity general circulation model equilibrated to the current climate freely for a several-thousand-year cycle, and randomly draw initial conditions to branch off a 10-day forecast. It shouldn't be _perfect_ (there is likely meaningful skew between the atmospheric state trajectories that arise in this free-running simulation compared to reanalysis cycled over the historical observation record for a few decades), but the trends in error growth should be meaningful and directly interpretable in the context of a real-world forecast.
The causal mechanics are the same. ML works on very specific distributions of measurements of those patterns.
ML models are models of measures, not of the causal process which generates those measures. In this sense they aren't 'models' at all, in the traditional scientific sense of the term.
Ultimately, a hybrid approach might win out in the end.
Use equations for what we know, use machine learning to fit parameters in the equations, as well as terms we don't know.
People nowadays can solve differential equations where some terms are "neural networks" and train those networks on data while numerically solving the equation. (Some people call it "neural differential equations", if you want a search query to start.)
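A minimal sketch of that idea (PyTorch assumed; the system and "observations" here are made up): a small network sits inside the right-hand side of an ODE and is trained by backpropagating through the numerical solver.

    # We model du/dt = -k*u + f_nn(u), where -k*u is the "known physics" and
    # f_nn is a small neural network trained while numerically solving the ODE.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    f_nn = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
    k, dt, steps = 0.5, 0.05, 40

    def rollout(u0):
        u, traj = u0, [u0]
        for _ in range(steps):
            u = u + dt * (-k * u + f_nn(u))   # explicit Euler step through both terms
            traj.append(u)
        return torch.cat(traj, dim=1)

    # Synthetic "observations" from a system whose true unknown term is 0.3*sin(u)
    with torch.no_grad():
        u, obs = torch.ones(1, 1), [torch.ones(1, 1)]
        for _ in range(steps):
            u = u + dt * (-k * u + 0.3 * torch.sin(u))
            obs.append(u)
        obs = torch.cat(obs, dim=1)

    opt = torch.optim.Adam(f_nn.parameters(), lr=1e-2)
    for epoch in range(500):
        opt.zero_grad()
        loss = ((rollout(torch.ones(1, 1)) - obs) ** 2).mean()
        loss.backward()                        # gradients flow through the solver
        opt.step()
    print(loss.item())                         # the NN has absorbed the missing term

The appeal of the hybrid is exactly the split above: the parts of the physics you trust stay as equations, and the learned term only has to cover what you can't write down.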
Weather models can't compute some of the important physics that happens at sub-grid scale, and use various tricks to deal with that. How sure are you that these traditional models aren't also heavily tuned on past data to maximize performance? Perhaps they will also perform badly "out of distribution".
Causal physical models have no distributional requirements, so they are not sensitive to distribution shifts. ie., A causal model accounts for all possible distributions.
The kind of risk with a causal model is that the model itself, of reality, is incorrect. This is a different kind of risk than there being 'no model at all' as in the case of curve-fitting over historical data, which is radically more fragile to expected shifts.
In general, we're quite good at knowing the limitations of causal models, ie., specifying model risk here is much easier. You even, exactly, point out a known problem in the modelling which is impossible to state for an AI model.
Since the AI model is just a weak induction, there are no terms/structures/etc. within that model which can be analysed to understand what parts of it are sensitive to what aspects of the system from which the data was taken.
All we can say is that, in general, train/test distributions have to be "nearly exactly the same" for any of these methods to show anything like cross-val levels of accuracy. So when we know train and test won't be the same, we can very confidently predict that this is a mumbo-jumbo metric.
Indeed, in the vast majority of common ML examples you can, right now, just go and look at real out-sample data collected later than the "common dataset", and you'll find the val accuracy is random or worse-than-random despite arbitrarily high cross-val scores.
The datasets that drive me most up the wall on this are house-price datasets, or pricing datasets in general. Prices generally follow geometric Brownian motion, and nearly all ML models are extremely bad at modelling prices. So it's basically pseudoscience whenever anyone uses these datasets to demonstrate anything, esp. predictive accuracy.
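A hedged illustration of that point (synthetic GBM path, scikit-learn assumed): fit a tree-based regressor to predict tomorrow's price from today's, then watch it fall apart once the path drifts outside the training range.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=0.0005, scale=0.02, size=2000)
    prices = 100 * np.exp(np.cumsum(returns))         # GBM-style price path

    X, y = prices[:-1].reshape(-1, 1), prices[1:]
    split = 1000
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:split], y[:split])

    print("in-sample MAE :", np.abs(model.predict(X[:split]) - y[:split]).mean())
    print("out-sample MAE:", np.abs(model.predict(X[split:]) - y[split:]).mean())
    # The tree-based model can only predict price levels it has seen; once the
    # path drifts to new levels, predictions saturate at the edge of the training data.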
> But DeepMind said GenCast surpassed the precision of the center's forecasts in more than 97 percent of the 1,320 real-world scenarios from 2019 which they were both tested on.
> The model was trained on four decades of temperature, wind speed and air pressure data from 1979 to 2018
The test set is taken from after the training set. I agree it seems likely that it will need to be continuously trained on new data to maintain its accuracy going forwards, but this seems like a promising sign that it should work as long as you keep updating the model.
Of course the "can compute weather in 8 minutes" thing doesn't include the ongoing training cost... so the "this is much cheaper to run" claim seems slightly suspect.
It's much cheaper to run for Google, as long as ECMWF keeps cranking out reanalysis for training data and keeps generating new initial conditions for forecasts. Those initial conditions come from combining new observations with the previous forecast from ECMWF's physical models, which is not cheap to do.
I know nothing about weather, but aren't changes happening gradually instead of overnight? There's no major anomaly that appears all of a sudden. In which case we can assume the every new change will be incorporated in the AI model's training.
As mentioned, the bomb cyclone in the PNW this year, but also the hurricane that started in the Pacific, whittled down to a tropical depression in the gulf (crossing all that land in between!), and then spun up into a gnarly hurricane that made a beeline for Florida (IIRC?). I distinctly remember a lack of worry about that storm system until it started spinning and collecting more stuff a few days after it had left land in Central America. None of the models that I know of correctly predicted that it would re-spin, and even the latest updated models with planes flying into the storm still couldn't nail the eye path - it was essentially off by two whole cities up until it actually made landfall, and then they were a lot more accurate about the land track.
i'd never heard of a storm doing that, but i am not really that in to weather, so it's possible it's in the training data somewhere?
I remember one time when I was at summit station on the Greenland ice sheet, one of the AI-driven weather models on Windy was predicting a physically impossible 40 degrees F (we were sitting on 3 km of ice, no way that can happen).
> A new artificial intelligence-based weather model can deliver 15-day forecasts with unrivaled accuracy and speed, a Google lab said, with potentially life-saving applications as climate change ramps up.
How hilariously dystopian. Instead of spending energy/power/influence to mitigate/solve/revert the problem, they're actively making it worse in the name of predicting how bad it's going to be (to try and save themselves, which will ultimately fail).
What a weird take. How is google using that much energy? Is it just for their own operations or are they providing a good or service that billions are consuming?
By using a computer you are a consumer and are complicit. Your overall contribution might be small but it’s the same exact behavior that is absolutely part of the climate change problem.
You have zero knowledge of my consumption or my life. You also know nothing about my environmental care or actions. You’re extrapolating from my use of a computer to post comments to assume my use of energy is a net negative, which is as bad faith as it is wrong. It is possible to live in such a way that you reduce your impact to a minimum in some areas and then do a net positive in other.
Additionally, the number of people who consume Google’s services is irrelevant. Millions of people consume tobacco products too, it doesn’t make it positive.
stop posting, bro - you're literally killing the planet. turn off your device and send it away to be recycled, so someone else's descendants 100 years from now will have a few extra grams of copper. dig a hole somewhere in the wilderness and live out the rest of your life consuming nothing but grass and rainwater for sustenance. also try to breathe in moderation - you exhale CO2 every time, bringing the doomsday a bit closer, inch by inch.
That’s not what I said, and taking the argument in good faith would make it obvious. Of course AI isn’t to blame for climate change, it started way before this craze and there isn’t one single culprit. But the energy consumption doesn’t help, and burning more fossil fuels to power this energy-hungry systems makes it worse, that’s the point.
i think the point is that we have to pick and choose our battles; everything has an impact, but i think the impact of AI in terms of positives has incredible, massive upside given the negative externalities of the power generation used to power it.
sure, bitcoin is stupid, let's not do that. on the list of common sense things we should do to combat climate change, energy used for AI is like #100, especially considering the big names are trying to use nuclear ASAP to power it in a clean way
Which I can forecast 15 days out, too.