In 2011, Google claimed that each search query takes about 0.3Wh [1]. Earlier this year, Sam Altman also claimed about 0.3Wh avg use per query for OpenAI.
I'm honestly surprised that they're so similar. I've thought of LLM queries as being far more energy-intense than "just" a Google search, but maybe the takeaway is that ordinary Google searching is also quite energy-intense.
If I as a user just wanted an answer to a dumb question like, say, the meaning of some Gen Z slang, it seems about an order of magnitude cheaper to ask a small LLM running on my phone than to make a Google search.
(Check my math: assuming the A16 CPU draws 5 watts peak for 20sec running Gemma or whatever on my iPhone, that’s 0.03Wh to answer a simple query, which is 10x cheaper)
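Spelling out that back-of-envelope check (the 5 W and 20 s figures are the comment's assumptions, not measurements):

```python
# Back-of-envelope check of the on-device inference estimate above.
# Assumed: ~5 W peak CPU draw for ~20 s per query (from the comment).
watts = 5.0
seconds = 20.0
wh_local = watts * seconds / 3600   # energy in watt-hours
wh_cloud = 0.3                      # claimed per-query figure for Google/OpenAI
print(f"{wh_local:.3f} Wh local, {wh_cloud / wh_local:.1f}x cheaper than cloud")
# -> 0.028 Wh local, 10.8x cheaper than cloud
```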
Are training costs (esp. from failed runs) amortized in these estimates?
Around 2008 a core step in search was basically a grep over all documents. The grep was distributed over roughly 1000 machines so that the documents could be held in memory rather than on disk.
Inverted indices were not used as they worked poorly for “an ordered list of words” (as opposed to a bag of words).
And this doesn’t even start to address the ranking part.
It seems highly unlikely that they did not use indices. Scanning all documents would be prohibitively slow. I think it is more likely that the indices were really large, and it would take hundreds to thousands of machines to store the indices in RAM. Having a parallel scan through those indices seems likely.
Wikipedia [1] links to "Jeff Dean's keynote at WSDM 2009" [2] which suggests that indices were most certainly used.
Then again, I am no expert in this field, so if you could share more details, I'd love to hear more about it.
I worked on search at Google around that timeframe, and it definitely used an index. As far as I know, it has from the very beginning.
You can solve the ordered list of words problem in ways that are more efficient than grepping over the entire internet (e.g. bigrams, storing position information in the index).
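A minimal sketch of the "store position information in the index" idea: a positional inverted index maps each term to (doc_id, position) pairs, so a phrase query intersects postings rather than scanning documents. (This is an illustrative toy, not Google's actual implementation.)

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a list of (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].append((doc_id, pos))
    return index

def phrase_search(index, phrase):
    """Return doc_ids containing the exact phrase, via postings intersection."""
    terms = phrase.lower().split()
    # Candidate start positions come from the first term's postings.
    hits = set(index[terms[0]])
    for offset, term in enumerate(terms[1:], start=1):
        postings = set(index[term])
        hits = {(d, p) for (d, p) in hits if (d, p + offset) in postings}
    return sorted({d for d, _ in hits})

docs = ["an ordered list of words", "a bag of words", "list of ordered words"]
idx = build_index(docs)
print(phrase_search(idx, "ordered list of words"))  # -> [0]
```

Note the third document contains all the query terms but not in order, so a plain bag-of-words index would wrongly match it; the positional check rejects it.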
> the takeaway is that ordinary Google searching is also quite energy-intense.
A related takeaway should be that machine inference is pervasive and has been for years, and that defining "AI" to mean just chatbots is to ignore most of the iceberg.
> I'd still love to see a report that accurately captures training cost. Today's report[1] notably excludes training cost.
From 2022, so possibly out of date: "ML training and inference are only 10%–15% of Google’s total energy use for each of the last three years, each year split ⅗ for inference and ⅖ for training." That's probably close enough to estimate 50/50, or the full energy cost to deliver an AI result is double the inference energy.
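The multiplier implied by that split, for what it's worth: with the quoted 3/5 inference / 2/5 training breakdown, the full cost is about 1.67x the inference energy; rounding the split to 50/50 gives the 2x figure.

```python
# Implied "full cost / inference cost" multiplier from the 2022 split above.
inference_share, training_share = 3 / 5, 2 / 5
multiplier = 1 + training_share / inference_share
print(f"{multiplier:.2f}x")  # -> 1.67x (2x if the split is rounded to 50/50)
```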
My gosh you're right! The paper in question is https://arxiv.org/pdf/2204.05149, "The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink"
0.3 Wh is 1080 joules. A liter of gasoline contains over 30 million joules, so this is like 0.034 milliliters of gasoline. And since grid generation is more efficient than burning gasoline, the effective equivalent is even less than that.
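The arithmetic, spelled out (the ~32 MJ/L energy density is a commonly cited ballpark figure, taken as an assumption here):

```python
# Gasoline-equivalent of one 0.3 Wh query.
joules = 0.3 * 3600            # 1 Wh = 3600 J, so 0.3 Wh = 1080 J
gasoline_j_per_l = 32e6        # ~32 MJ per liter of gasoline (assumption)
ml_gasoline = joules / gasoline_j_per_l * 1000
print(f"{joules:.0f} J ~= {ml_gasoline:.3f} mL of gasoline")
# -> 1080 J ~= 0.034 mL of gasoline
```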
They must be doing something crazy, because any time I query my local LLM the lights in my office dim and the temperature rises a few degrees. Definitely far more energy than running the microwave for 1 second.
And if it does, you should get your wiring checked! If voltage is sagging enough to dim your lights with such a small load, that indicates a lot of resistance somewhere in the wiring, which could lead to fires.
Google was not using deep learning for search in 2011. Deep learning as a concept didn't really take off until AlexNet in 2012 anyway.
Various ML "learn-to-rank" tooling was in use at Google for a while, but incorporating document embedding vectors w/ ANN search into the ranking function probably happened over the course of 2018-2021 [1], I think. Generative AI only started appearing in ordinary search results in 2024.
With Google serving AI Overviews, shouldn't an average search query now cost more? Compute is getting cheaper, but the algorithms are also getting more and more complex, increasing the compute per query.
1: https://googleblog.blogspot.com/2009/01/powering-google-sear...