He suggests limiting the scope of the AI problem, adding manual overrides for unexpected situations, and he (rightly, in my opinion) predicts that the business case for exponentially scaling LLM models isn't there. With that context, I like his iPod example. Apple probably could have made a 3TB iPod to stick to Moore's law for another few years, but after they reached 160GB of music storage there was no use case where adding more would deliver more benefit than the added cost.
I’m still waiting for Salesforce to integrate an LLM into Slack so I can ask it about business logic and decisions long since lost. Still waiting for Microsoft to integrate an LLM into Outlook so I can get a summary of a 20-email-long chain I just got CCed into.
I don’t think the iPod comparison is a valid one. People only have so much time to listen to music. Past a certain point, no one has enough good music they like to fill a 3TB iPod. However, the more data you feed into an LLM, the smarter it should be in the response. Therefore, iPod storage and LLM scale are on completely different curves.
Haha, I'd be happy if Outlook just integrated a search that actually works.
Most of Outlook's search results aren't even relevant, and it regularly misses things I know are there. Literally the most useless search I've ever had to use.
Irony: they did. They bought LookOut, which was a simple and extremely good search plugin for desktop Outlook. And then, somehow, it was melted down into the rather weak beer search that 365 has today.
There is an alternative, Lookeen, which positions itself as LookOut's successor, but I've yet to try it.
I can vouch for Lookeen (circa 2012-2015). I set it up for 200+ users on Citrix and it worked great. I had the index for each user saved to their network share and yet the search was instant. It even indexed shared mailboxes. It barely used any CPU when doing background indexing.
It worked well with thin OSTs too but due to how Outlook and Exchange work, it would have to rebuild the index more often.
Definitely a blast from the past reading the word “Lookeen” but mostly good memories about it. I believe the ADMX integration was pretty decent too.
Don’t get me started on Outlook’s search. I can try to search for an email that’s only a few weeks old and somehow it won’t find it. It will, however, find emails that are from over a decade ago.
The fact that one chap's decade-old freeware[0] is 100x better search than the current native tool of a trillion-dollar technology corp is my proof that there is a god, and their name is Loki.
> However, the more data you feed into an LLM, the smarter it should be in the response.
Is that so? If it lacks a certain reasoning capability, for example, then more data may not change that. So far LLMs lack a useful notion of truth; they will easily generate untrue statements. We see lots of hacks for controlling that, with unconvincing results.
ChatGPT says, “Generally, yes, both large language models (LLMs) and humans can make better decisions with more context. … However, both LLMs and humans can also be overwhelmed by too much context if it’s not relevant or well-organized, so there is a balance to be struck.”
I'm being pedantic here but... I think the correct statement is "the more context an LLM or human has TO A POINT, the better decision it can make".
For example it's common to bury an adversary in paperwork in legal discovery to try to obscure what you don't want them to find.
Humans do not do better with excessive context, and it is becoming apparent that although you can go to 2M tokens etc., the models don't actually understand it. They ONLY do well at "find the needle in the haystack, here's a very specific description of the needle" tasks, but nothing that involves simultaneously considering multiple parts of that context.
Why are young children able to quickly surpass state-of-the-art ML models at arithmetic tasks, from only a few hours of lecturing and a "training dataset" (worksheets) consisting of maybe a thousand total examples?
What is happening in the human learning process from those few thousand examples, to deduce so much more about "the rules of math" per marginal datapoint?
Are they? Even before OpenAI made it hard to force GPT to do chain of thought for basic maths it usually took over a dozen digits per number before it messed up arithmetic when I tested it.
How many young children do you genuinely think would do problems like that without messing up a step before having drilled for quite some time?
I'm sure there are aspects to how we generalise that current LLM training processes do not yet capture, but so much of the human learning process involves repeating very basic stuff over and over again and still regularly making trivial mistakes, because we keep tripping over stuff we learned how to do right as children but keep failing to apply it with sufficient precision.
Frankly, getting average humans to do these kinds of things consistently right by hand, even for small numbers, without wrapping a process of extensive checking and revision around it, is an unsolved problem. And convincing an average human to apply that kind of tedious process consistently is an unsolved problem too.
> How many young children do you genuinely think would do problems like that without messing up a step before having drilled for quite some time?
You're overestimating how many examples "drilled for quite some time" represents. In an entire 12 years of public school, you might only do a few thousand addition problems in total. And yet you'll be quite good at arithmetic by the end. In fact, you'll be surprisingly good at arithmetic after your first hundred!
> I'm sure there are aspects to how we generalise that current LLM training processes do not yet capture, but so much of the human learning process involves repeating very basic stuff over and over again and still regularly making trivial mistakes, because we keep tripping over stuff we learned how to do right as children but keep failing to apply it with sufficient precision.
LLMs fail when asked to do "short" addition of long numbers "in their heads." And so do kids!
But most of what "teaching addition" to children means, is getting them to translate addition into a long-addition matrix representation of the problem, so they can then work the "long-addition algorithm" one column at a time, marking off columns as they process them.
Presuming they can do that, the majority of the remaining "irreducible" error rate comes from the copying-numbers-into-the-matrix step! (And that can often be solved by teaching kids the "trick" of inserting commas into long numbers that don't already have them, so that they can visually group and cross-check numbers while copying.)
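For concreteness, here's a minimal sketch of that column-by-column procedure in Python (the function name and string-based interface are just for illustration):

```python
# Column-by-column long addition, the way it's taught on worksheets:
# line up the digits, add one column at a time, carry the tens digit.
def long_add(a: str, b: str) -> str:
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # align the columns
    carry, digits = 0, []
    for col in range(width - 1, -1, -1):    # rightmost column first
        s = int(a[col]) + int(b[col]) + carry
        digits.append(str(s % 10))          # write the units digit under the column
        carry = s // 10                     # carry the tens digit to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

assert long_add("987", "2345") == str(987 + 2345)
```

Each loop iteration only ever looks at one column plus the carry, which is exactly the "bounded local work, repeated" structure kids are being taught.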
LLMs can be told to do a Chain-of-Thought of running through the whole long-addition algorithm the same way a human would (essentially, saying the same things that a human would think to themselves while doing the long-addition algorithm)... but for sufficiently-large numbers (50 digits, say) they still won't perform within an order-of-magnitude of a human, because "a bag of rotary-position-encoded input tokens with self-attention, where the digits appear first as a token sequence, and then as individual tokens in sentences describing the steps of the operation" is just plain messier — more polluted with unrelated stuff that makes it less possible to apply rigor to "finding your place" (i.e. learn hard rules as discrete 0-or-1 probabilities) — than an arbitrary-width grid of digits representation is.
People — kids or not — when asked to do long addition, would do it "on paper": using a constant back-and-forth between their Chain-of-Thought and their visual field, with the visual field acting as a spatially-indexed memory of the current processing step, where they expect to be able to "look at" a single column, and "load" two digits into their Chain-of-Thought that are indirected by their current visual attention cursor — with their visual field having enough persistence to get them back to where they were in the problem if they glance away; and yet with the ability to arbitrarily refocus the "cursor" in both relative and absolute senses depending on what the Chain-of-Thought says about the problem. Given an unbounded-length "paper" to work on, such a back-and-forth process can be extended to an unbounded-length processing sequence robustly. (Compare/contrast: a Turing machine's tape head.)
Pure LLMs (seq2seq models) cannot "work on paper."
If you consider what is even theoretically possible to "model" inside a feed-forward NN's weights — it can certainly have the successive embedding vectors act as "machine registers" to track 1. a set of finite-state machines, and 2. a set of internal memory cells (where each cell's values are likely represented by O(N) oppositional activations of vector elements representing each possible value the cell can take on.) These abstractions together are likely what allow LLMs to perform as well as they do on bounded-length arithmetic. (They're not memorizing; they're parsing!)
But given the way feed-forward seq2seq NNs work, they need a separate instance of these trained weights, and their commensurate embedding vector elements, for each digit they're going to be processing. Just like a parallel ALU has a separate bit of silicon dedicated to processing each bit of the input registers, an LLM must have a separate independent probability model for the outcome of applying a given operation to each digit-token "touched" on the same layer. Where any of these may be under-trained; and where, if (current, quadratic) self-attention is involved, the hidden-layer embedding-vector growth caused by training to sum really big numbers, would quickly become untenable. (And would likely be doubly wasted, representing the registers for each learned arithmetic operation separately, rather than collapsing down into any kind of shared "accumulator register" abstraction.)
---
That being said: what if LLMs could "work on paper?" How would that work?
For complete generality — to implement arbitrary algorithms requiring unbounded amounts of memory — they'd very likely need to be able to "look at the paper" an unbounded number of times per token output — which essentially means they'd need to be converted at least partially into RNNs (hopefully post-training.) So let's ignore that case; it's a whole architectural can of worms.
Let's look at a more limited case. Assuming you only want the LLM to be able to implement O(N log N) algorithms (which would be the limit for a feed-forward NN, as each NN layer can do O(N) things in parallel, and there are O(log N) layers) — what's the closest you could get to an LLM "working on paper"?
Maybe something like:
• adding an unbounded-size "secondary vector" (like the secondary vector of a LoRA), that isn't touched in each step by self-attention, and that starts out zeroed,
• with a bounded-size "virtual memory mapping" — a dynamic and windowed position-encoding of a subset of the vector into the Q/K vectors at each step, and a dynamic position-encoding of part of the resulting embedding (Q.KT.V) that maps a subset of the embedding vector back into the secondary vector
• where this position-encoding is "dynamic" in that, during training of each layer, that layer has one set of embedding vectors that it learns as being an "input-vocabulary memory descriptor table", describing the virtual-memory mappings of the secondary vector's state-at-layer-N into the pre-attention vector input at layer N [i.e. a matrix you multiply against the secondary vector, then add the result to the pre-attention vector]; and an equivalent "output-vocabulary memory descriptor table", mapping the post-attention embedding vector to writes of the secondary vector [i.e. a matrix you multiply against the post-attention embedding vector, then add to the secondary vector]
• and where the secondary vector is windowed, in that both memory-descriptor-table matrices are indicating positions in a window — a virtual secondary vector that actually exists as a 1D projection of a conceptually-N-dimensional slice of a physical secondary N-dimensional matrix; where each pre-attention embedding contains 2N elements interpreted as "window bounds" for the N dimensions of the matrix, to derive the secondary vector "virtual memory" from its physical storage matrix; and where each post-attention embedding contains 2N elements either interpreted again as "window bounds" for the next layer; or interpreted as "window commands" to be applied to the window (e.g. specifying arbitrary relative affine transformations of the input matrix, decomposed into separate scaling/translation/rotation elements for each dimension), with the "window bounds" of the next layer then being generated by the host framework by applying the affine transformation to the existing window bounds. (And again, with the output window bounds/windowing command parameters being learned.)
I believe this abstraction would give a feed-forward NN the ability to, once per layer,
1. "focus" on a position on an external "paper";
2. "read" N things from the paper, with each NN node "loading" a weight from a learned position that's effectively relative to the focus position;
3. compute using that info;
4. "write" N things back to new positions relative to the focus position on the paper;
5. "look" at a different focus position for the next layer, relative to the current focus position.
This extension could enable pretty complex internal algorithms. But I dunno, I'm not an ML engineer, I'm just spitballing :)
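For what it's worth, here's a toy, non-neural sketch of just the control flow in steps 1-5, with a 2D grid standing in for the "paper" and long addition as the workload. The "compute" step is hand-written arithmetic rather than a learned layer, so this only illustrates the focus/read/compute/write/refocus loop, not the architecture proposed above:

```python
# A "paper" (2D grid) plus a focus cursor. Each pass of the loop reads a
# small window relative to the focus, computes, writes back relative to
# the focus, and then refocuses. All names here are illustrative.
def add_on_paper(a: str, b: str) -> str:
    width = max(len(a), len(b)) + 1                  # room for a final carry
    paper = [list(a.zfill(width)),                   # row 0: first operand
             list(b.zfill(width)),                   # row 1: second operand
             [" "] * width]                          # row 2: result
    focus, carry = width - 1, 0                      # 1. focus on the rightmost column
    while focus >= 0:
        x, y = int(paper[0][focus]), int(paper[1][focus])   # 2. read relative to focus
        s = x + y + carry                                   # 3. compute
        paper[2][focus] = str(s % 10)                       # 4. write relative to focus
        carry = s // 10
        focus -= 1                                          # 5. refocus one column left
    return "".join(paper[2]).lstrip("0") or "0"

assert add_on_paper("987", "2345") == str(987 + 2345)
```

Given an unbounded grid, the same loop structure extends to unbounded-length inputs, which is the property the windowed secondary-vector idea is trying to recover inside the network.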
Pretty much, yep. There was definitely a more significant jump there in the middle, where 7B models went from being a complete waste of time to actually useful. Then going from being able to craft a sensible response to 80% of questions to 90% is a much smaller apparent increase, but takes a lot more compute to achieve, as per the Pareto principle.
Most data out there is junk, the internet produces junk data faster than useful data, and current GPT-style AIs basically regurgitate what someone already did somewhere on the internet. So I guess the more data we feed into GPTs, the worse the results will get.
My take on improving AI output is to heavily curate the data you feed your AI, much like the expert systems of old (which were lauded as "AI" also.) Maybe we can break the vicious circle of "I trained my GPT on billions of Twitter posts and let it write Twitter posts to great success", "Hey, me too!"
There are multiple companies hiring people on contracts to curate and generate data for this. I do confidential contract work for two different ones at the moment, and while my NDAs limit how much I can say, it involves both identifying issues with captured prompt/response pairs that have been filtered, and writing synthetic ones from scratch aided by models (e.g. come up with a coding problem, and rewrite the response to be "perfect").
The first category has obviously been pre-filtered to put cheaper resources on simpler problems, as these projects sometimes pay reasonable tech contract rates for 1-2 hours of work to improve only 2-3 conversation turns of a single conversation, and it's clear they usually involve more than one person reviewing the same data.
A lot of money is pouring into that space, and the moats in the form of proprietary training data heavily curated by experts are going to grow rapidly given how much cash the big players have.
Thanks for your insights! Would you say this is an approach suited for "general" GPTs (not in the sense of AGI) or more for expert systems like Copilot?
I can't really say I know whether the outcomes are good as I won't be told to what extent the output makes it into production models, and I don't even always know which company it's for. But I know at least some of it is being used for "general" models. I do more code-related work than general purpose as it's the work I find most interesting, but the highest paid contract I've had in this space so far is for a general-purpose model that to my knowledge isn't available yet, for a model from a company you'd know (but I'm under strict NDA not to mention the company name or more details about the work).
Thankfully, we have stalwart and well-known defenders of our security like Apple and Microsoft to protect us. There's nothing of the sort to worry about.
Unlikely to happen. Orgs that use MS products do not want content of emails leaking and LLMs leak. There is a real danger that an LLM will include information in the summary that does not come from the original email thread, but from other emails the model was trained on. You could learn from the summary that you are going to get fired, even though that was not a part of the original conversation. HR doesn't like that.
> However, the more data you feed into an LLM, the smarter it should be in the response
Not necessarily. At some point you are going to run out of current data and you might be tempted to feed it past data, except that data may be of poor quality or simply wrong. Since LLMs cannot tell good data from bad, they happily accept both leading to useless outputs.
Didn't they already do this? A friend of mine showed me his outlook where he could search all emails, docs, and video calls and ask it questions. To be fair, he and I asked it questions about a video call and a doc - but not any emails, we only searched emails.
This was last week and it worked "mostly OK," but having a Q&A conversation with a long email feels inevitable.
Asking questions about a document is one thing; asking questions that synthesize information across many documents — the human-intelligent equivalent of doing a big OLAP query with graph-search and fulltext-search parts on your email database — is quite another.
Right now AFAICT the latter would require the full text of all the emails you've ever sent, to be stuffed into the context window together.
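As a rough back-of-the-envelope (every number below is an assumption picked purely for illustration, not a measurement):

```python
# How big is "all the email you've ever sent" compared to a context window?
emails = 50_000                      # assumed lifetime email count
tokens_per_email = 300               # assumed average email length in tokens
total = emails * tokens_per_email
print(f"~{total:,} tokens of email history")
for window in (128_000, 1_000_000, 2_000_000):   # commonly advertised context sizes
    print(f"  that's roughly {total / window:.0f}x a {window:,}-token window")
```

Even with generous context windows, these made-up numbers leave you one to two orders of magnitude short, which is why the cross-document synthesis case is so much harder than single-document Q&A.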
They definitely added it to the web version of LinkedIn; you can see it when you want to write or reply to a message and it gives you the option to "Write with AI".
> There is a real danger that an LLM will include information in the summary that does not come from the original email thread, but from other emails the model was trained on. You could learn from the summary that you are going to get fired, even though that was not a part of the original conversation. HR doesn't like that.
There could be separate personal fine-tunes per user, trained (in the cloud) on the contents of that user's mail database, which therefore have knowledge of exactly the mail that particular user can access, and nothing else.
AFAICT this is essentially what Apple is claiming they're doing to power their own "on-device contextual querying."
> There could be separate personal fine-tunes per user
Yes, but that contradicts the earlier claim that giving an AI more info makes it better. In fact, those who send few emails or have only just joined may see worse results due to the lack of data. LLMs really force us to come up with ideas for solving problems that did not exist before LLMs.
> Still waiting for Microsoft to integrate an LLM into Outlook so I can get a summary of a 20-email-long chain I just got CCed into.
Still waiting for Microsoft to add an email search to Outlook that isn't complete garbage. Ideally with a decent UI and presentation of results that isn't complete garbage.
…why are we hoping that AI will make these products better, when they’re not using conventional methods appropriately, and have been enshittified to shit?
Think about an AI with a 1-bit model. If you feed that AI data that can't possibly be classified with fewer than 2 bits, it can't get it precisely right, no matter how much data you train it on, or what the 1 bit of the model represents.
For any given size of system, there will be a ceiling on what it can learn to classify or predict with precision.
I used "system" rather than "model" there for a reason:
Memory in any form - such as context, RAG, or API access to anything that can store and retrieve data - raises that maximum: a Turing machine can be implemented with a very small model plus a loop, if there's access to an external memory to act as the tape. But if the "tape" is limited, there will be some limit on what the total system can precisely classify.
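A minimal sketch of that point, using nothing but a constant-size transition table as the "model": the loop plus an external tape does binary increment on inputs of any length, and only the tape grows with the input.

```python
# Tiny fixed "model": three transition rules. The external tape is the
# only thing that scales with the problem size.
RULES = {                               # (state, symbol) -> (write, move, next_state)
    ("inc", "1"): ("0", -1, "inc"),     # 1 + carry -> 0, keep carrying left
    ("inc", "0"): ("1",  0, "halt"),    # 0 + carry -> 1, done
    ("inc", "_"): ("1",  0, "halt"),    # ran off the left edge: new leading digit
}

def increment(bits: str) -> str:
    tape = list("_" + bits)             # external memory ("the tape")
    head, state = len(tape) - 1, "inc"  # start at the least significant bit
    while state != "halt":
        write, move, state = RULES[(state, tape[head])]
        tape[head] = write
        head += move
    return "".join(tape).lstrip("_")

assert increment("1011") == "1100"
assert increment("111") == "1000"
```

The ceiling comes from the tape, not the table: shrink the tape and the same three rules can no longer handle arbitrarily long inputs.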
Both of these already exist. Slack just introduced AI and copilot for M365 products has been available for quite a while now. It works great, I use it every day.
It's a very weird comparison: putting more music tracks on your iPod doesn't make them sound better, while giving an LLM more parameters/computing power makes it smarter.
Honestly it sounds like a typical "I've drawn my conclusion, and now I only need an analogy that remotely supports my conclusion" way of thinking.
No, it makes sense: he's coming at it from the perspective of knowing exactly what task you want to accomplish (something like "fixing the grammar in this document.") In such cases, a model only has to be sufficiently smart to work for 99.9999% of inputs — at which point you cross a threshold where adding more intelligence is just making the thing bulkier to download and process, to no end for your particular use-case.
In fact, you would then tend to go the other way — once you get the ML model to "solve the problem", you want to then find the smallest and most efficient such model that solves the problem. I.e. the model that is "as stupid as possible", while still being very good at this one thing.
If you have no conception of mathematics, do you think you'd get better at solving mathematics problems based on looking at more examples of people who may or may not be solving them correctly?
Only after humanity became a general intelligence. Our earlier ancestors were also pretty smart, but not smart enough to develop technology on their own; you have to be extremely smart to do what humanity did.
Is there a reason memory was used as the example and not compute power? I don't understand how cherry-picking random examples from the past explains the future of AI. If he thinks the business need doesn't exist, he should explain how he arrived at that conclusion instead of giving a random iPod example.
It's an analogy. He's making the point that even though something can scale at an exponential rate, it doesn't mean there is a business need for such scaling.
This. The scaling of compute has vastly different applications than the scaling of memory. Shows once again that people who are experts in a related field aren't necessarily the best to comment on trendy topics. If e.g. an aeroplane expert critiques SpaceX's Starship, you should be equally wary, even though they might have some overlap. The only reason this is in the media at all is because negative sentiment toward hype generates many clicks. That's why you see these topics every day instead of Rubik's cube players criticising the latest version of Mikado.
The business case is absolutely there; it's just that the industry has weirdly latched onto 'chatbot' as the use case, as opposed to where the real value lies.
The pretrained model is where the enterprise gold is at.
But the companies building models past the tipping-point scale at which that value can be derived are walling up their pretrained models behind very heavy-handed fine-tuning that strips away most of the business value.
The engineers themselves seem to lack the imagination for the business cases, and the enterprise market doesn't have access to start discovering the applications outside of 'chatbot,' particularly with large context windows of proprietary data fed into SotA pretrained models.
There's maybe a handful of people who actually realize what value is being left on the table, and I think most of them are smart enough not to currently be in positions to make it happen.
That makes sense from Brooks' perspective. He's done his best work when he didn't try to overreach. The six-legged insect thing was great, but very dumb. Cog was supposed to reach human-level AI and was an embarrassing dud. The Roomba was simple, dumb, and useful. The military iRobot machines were good little remote-controlled tanks. The Rethink Robotics machines were supposed to be intelligent and learn manipulation tasks by imitation. They were not too useful and far too expensive. His new mobile carts are just light-duty AGVs, and compete in an established market.
Close. Brooks once described the Roomba algorithm as something like this:
1. Go in a random direction in a straight line for a while.
2. Then spiral outwards until it hits something.
3. Try turning some random angle. If repeatedly hitting a wall, go into wall following mode for a while. Otherwise, go back to step 1.
If you run this long enough to travel over 2x the actual floor area, the odds of achieving near full coverage are pretty good. Who needs navigation?
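Here's a simplified simulation of that strategy (it drops the spiral and wall-following modes, and every parameter is arbitrary), just to show how far random straight-line runs get you once you've travelled a couple of multiples of the floor area:

```python
# Random straight-line runs in a rectangular room; turn to a random new
# heading whenever a wall is hit; count the 1x1 cells the unit-width
# path has passed through.
import math, random

def coverage(width=40, height=30, passes=2.0, seed=1):
    random.seed(seed)
    visited = set()
    x, y = width / 2, height / 2
    angle = random.uniform(0, 2 * math.pi)
    travelled, area = 0.0, width * height
    while travelled < passes * area:                 # travel `passes` times the floor area
        nx, ny = x + math.cos(angle), y + math.sin(angle)
        if not (0 <= nx <= width and 0 <= ny <= height):
            angle = random.uniform(0, 2 * math.pi)   # bumped a wall: pick a new heading
            continue
        x, y = nx, ny
        visited.add((min(int(x), width - 1), min(int(y), height - 1)))
        travelled += 1.0
    return len(visited) / area

print(f"covered ~{coverage():.0%} of the floor")
```

Run it a few times with different seeds and the covered fraction lands well above half the floor, which is the whole trick: enough random travel substitutes for a map.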
> He uses the iPod as an example. For a few iterations, it did in fact double in storage size from 10 all the way to 160GB. If it had continued on that trajectory, he figured out we would have an iPod with 160TB of storage by 2017, but of course we didn’t.
I think Brooks' opinions will age poorly, but if anyone doesn't already know all the arguments for that they aren't interested in learning them now. This quote seems more interesting to me.
Didn't the iPod switch from HDD to SSD at some point, with Apple focusing on shrinking them rather than upping storage size? I think the quality of iPods has been growing exponentially; we've just seen some tech upgrades on other axes that Apple thinks are more important. AFAIK, looking at Wikipedia, they're discontinuing iPods in favour of iPhones where we can get a 1TB model and the disk size trend is still exponential.
The original criticism of the iPod was that it had less disk space than its competitors and it turned out to be because consumers were paying for other things. Overall I don't think this is a fair argument against exponential growth.
Exponentials have a bad habit of turning into sigmoids due to fundamental physical and geometric, and also practical, constraints. There’s the thing called diminishing returns. Every proven technology reaches maturity; it happened to iPods, it happened to smartphones, it happened to digital cameras and so on. There’s still growth and improvement, but it’s greatly slowed down from the early days. But that’s not to say that there won’t be other sigmoids to come.
In case you haven’t noticed: growth in the storage capacity of HDDs in general screeched to a relative halt around fifteen years ago. The doubling period used to be about 12-14 months; every three or four years the unit cost of capacity would decrease 90%. This continued through the 90s and early 2000s, and then it started slowing down. A lot. In 2005 I bought a 250 GB HDD; at the same price I’d now get something like a 15 EB drive if the time constant had stayed, well, constant.
There is, of course, always a multitude of interdependent variables to optimize, and you can always say that growth of X slowed down because priorities changed to optimize Y instead. But why did the priorities change? Almost certainly at least partially because further optimization of X was becoming expensive or impractical.
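A tiny numerical illustration of that exponential-into-sigmoid point, with made-up constants: the logistic curve tracks the exponential early on, then flattens as it approaches its ceiling.

```python
# Pure exponential vs. a logistic curve with the same starting value and
# growth rate but a hard ceiling. All constants are arbitrary.
import math

ceiling, rate = 1000.0, 0.7
for year in range(0, 21, 4):
    exponential = math.exp(rate * year)                                # no ceiling
    logistic = ceiling / (1 + (ceiling - 1) * math.exp(-rate * year))  # ceiling at 1000
    print(f"year {year:2d}:  exponential {exponential:12.1f}   logistic {logistic:7.1f}")
```

For the first few doublings the two are nearly indistinguishable, which is exactly why it's so easy to mistake the early part of a sigmoid for an exponential that will run forever.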
> quality has been growing exponentially
That’s an entirely meaningless and nonsensical statement unless you have some rigorous way to quantify quality.
> That’s an entirely meaningless and nonsensical statement unless you have some rigorous way to quantify quality.
We do have a quantitative measure of quality - that was where the discussion started: size in bytes.
Storage size in bytes on a pocket iDevice seems to be growing exponentially. Brooks said it stopped doubling and I don't think that is true. iDevices are still doubling their storage size every 2-4 years or so and have been for a while. There was a one-time change to SSDs where they lost a few generations and they are only up to 1TB instead of 160TB (suggesting SSDs are around 7 generations behind HDD which seems reasonable on the face of it to me). But apart from that it has been a pretty steady doubling every few years.
His claim was that iPods didn't get to 160TB because "nobody actually needed more than that", and that observation misunderstands what happened. Apple switched from HDD to SSD because SSD is a much better fit; that set the rate of growth back by a few generations, but the growth is still ongoing and will probably reach hundreds of TB sooner or later.
He was right that iPods didn't need >a few gig of storage ... but that just meant Apple discontinued the iPod brand and replaced it with iPhones, where there is no usage barrier to consuming terabytes of storage. They were obsoleted by the very trend he claimed was over! It appears Apple thinks people did need more, because they aren't making iPods any more.
Most rigorous definitions of quality incorporate a target expectation that is impossible to exceed. You are removing errors, so best case, exponential improvement in quality adds nines - a la the sigmoid function you mention.
> Overall I don't think this is a fair argument against exponential growth.
Do we still need a fair argument against unrestricted exponential growth in 2024? Exponential growth claims have been made countless times in the past, and never delivered. Studies like the Limits to Growth report (1972) have shown the impossibility of unrestricted growth, and the underlying science can be used in other domains to show the same. There is no question that unrestricted exponential growth doesn't exist; the only interesting question is how to locate the inflexion point.
Apparently the only limitless resource is gullible people.
A slowdown already happened once (the original pace was doubling every year), and I believe it's widely accepted that we're not going to keep the current pace for long.
When self-driving cars started driving around in public in 2010, that was a universally recognized success for self-driving cars. It shocked many people, as before then self-driving cars were entirely science fiction.
There was of course never a recognized success of FSD, since we don't have FSD, but when people started seeing cars drive themselves on public streets in 2010 they assumed we would have FSD within 5-10 years. Yet we still barely have restricted self-driving cars today, 14 years later.
Arguing from a lack of personal imagination is not a strong position. It is the people with ideas who are responsible for finding uses for resources. They've succeeded every other time they were given an excess of storage space; I'm still not sure how I manage to use up all the TB of storage I've bought over the years.
Maybe we store a local copy of a personal universe for you to test ideas out in, I dunno. There'll be something.
> I probably would never listen to more than a few GBs of high fidelity music in my life time.
Well from that I'd predict that uses would be found other than music. My "music" folder has made it up to 50GB because I've taken to storing a few movies in it. But games can quickly add up to TB of media if the space is available.
Storage did become cheaper and more compact. Both flash drives and SD cards offer this functionality and showed significant improvements.
Whatever innovative use cases people could come up with to store TBs of data in physical portable format is served by these. And along the way the world has shifted to storing data more cheaply and conveniently on the cloud instead.
The iPod, being a special-purpose device with premium pricing (for the average global consumer) and proprietary connectors and software, would not have made a compelling economic case for over-speccing it in hopes that some unforeseen killer use case might emerge from the market.
> The iPod, being a special-purpose device with premium pricing...
Well, yes, but if we're literally talking about the iPod, it has been discontinued - because it has been replaced by devices (with larger storage, I might note) that do more. I'm working from the assumption that, as far as Brooks' argument goes, iPhones basically are iPods. This is why the argument that the tech capped out seems suspect to me: the tech kept improving exponentially, and we're still seeing storage space in the iPod niche doubling every few years.
> I probably would never listen to more than a few GBs of high fidelity music in my life time.
FWIW, I currently have about 100 GB of music on my phone. And that is in a fairly high quality AAC format. Converted to lossless, it might be about five times that size? I don't even think of my music collection as all that extensive. But still, 160 TB would be a different ball game altogether. For sure, there is no mass market for music players with that sort of capacity. (Especially now that streaming is taking over.)
Don’t dismiss someone offhand because they disagree with you, they may really have never heard your argument.
> AFAIK, looking at Wikipedia, they're discontinuing iPods in favour of iPhones where we can get a 1TB model and the disk size trend is still exponential.
iPods were single-purpose while iPhones are general computers. While music file sizes have been fairly consistent for a while, you can keep adding more apps and photos. The former become larger as new features are added (and as companies stop caring about optimisations) while the latter become larger with better cameras and keep growing in number as the person lives.
The iPod didn't stop growing. It turned into an iPhone - a much more complex system which happened to include iPod features, almost as a trivial add-on.
If you consider LLMs as the iPod of ML, what would the iPhone equivalent be?
Large multimodal models. Maybe using diffusion transformers. Possibly integrated into lightweight comfortable AR glasses via Wifi so they can see everything you can, optionally automatically receiving context such as frame captures. Optionally with a very realistic 3d avatar "teleported" into your space.
Probably the kind of intuitive AI assistant you can find in a lot of science fiction works. A virtual assistant that can search the web for you, write/proofread an essay, analyse social situations, and enrich the right bits of your personal information for you. Basically a smart assistant integrated into a vr headset.
I'd rather consider the current generation of LLMs as the first iPhone (or maybe flip phone, depends on where you stand), given that hundreds of billions of dollars in private money is already going into developing them.
He made an analogy to discuss the business case for scaling an iPod, not whether or not new products and services would be invented. My iPhone still only has about 64gb of storage.
I think there is some kind of "bias" in how we test LLMs. Most (all?) benchmarks, whether those coming from industry and academia or the ones end-users like you or me may run on a couple of examples, seem to compare LLM answers with expected answers. This doesn't capture the extent to which one can augment one's abilities using LLMs in cases where we don't know the expected answer (and this is precisely why we often turn to LLMs).
One instance of this was when I was able to extend the features of a partial std::functional port for the AVR platform, achieving my goals by asking ChatGPT to generate rather complex C++ template code that would have taken me several days to figure out, since I'm not a C++ programmer. In about two hours and several back-and-forths between the code and ChatGPT's interface, I was able to integrate the modifications (about 50 LOC) and spare myself the daunting and frustrating task of rewriting the roughly 2000 LOC I had written for the Espressif platform, which is what I would have had to do if ChatGPT weren't around.
In this context, look-good-but-broken examples are not really a problem when you can identify what is wrong and communicate the problems back to GPT. These cases do not bode well for claims about the correctness and autonomy of AI systems, but they are not as problematic when one seeks to augment one's own abilities.
From my experience it depends on the length of the task.
For ~50 LOC examples ChatGPT can consistently modify the code to add some new parameter or change some behaviour etc.
When using new external APIs it often hallucinates. That said, all of my changes to my static Hugo websites - new shortcodes, or modifying shortcodes with requests like "change the list of random articles to only include articles that have the same 'type'" - are done by ChatGPT without problems and work excellently.
> .com crash look like peanuts.
I think what people get wrong about the .com crash is that it wasn't a technology crash but a crash of overvalued companies and a sudden fear among VCs. Internet usage and new applications just grew and grew. There was no internet technology crash - many successful companies like Amazon and eBay just kept working, while the pets.coms of the world died (and sadly my Wiki/Blog/Ontology startup died too).
Also, the author focuses on the fact that LLMs are not much better at things that robots already do, such as moving stuff in a store. Yeah. But they are surprisingly good at many things that robots couldn't already do, such as writing texts and composing songs.
It's like getting a flying car and saying "meh, on a highway it's not really much faster than the classical car". Or getting a computer and saying that the calculator app is not faster than an actual calculator.
A robot powered by some future LLM may not be much better at moving stuff, but it will be able to follow commands such as "I am going on a vacation, pack my suitcase with all I need" without giving a detailed list.
The iPod analogy is a poor one. Instead of 160TB music players, we got general computers (iPhone) with effectively unlimited storage (wireless Internet).
I don’t need to store all my music on my device. I can have it beamed directly to my ears on-demand.
The music part is just scaling in a different way, but that's not the reason the iPhone is so successful. The leap from iPod to iPhone was definitely not just scaling.
We'll see GPT-5 in a few months and that will be vastly more useful information to update your sense of whether the current approach will continue to work than anyone's speculation today.
In a few months, or in 18 months? Mira Murati said this month that the next "PhD level intelligence" models will be released in a year or a year and a half. Everything points to GPT-5 not happening this year, especially given that they only officially started training it last month.
I feel like "generative" might be the worst possible label, because while the generative capabilities are the most exciting features, they're not the most useful, and in most cases they aren't useful at all.
But the sentiment analysis, summaries, and object detection seem incredibly capable and like the actual useful features of LLMs and similar tensor models.
"Generative" is a great technical term for LLMs because it is what they do. But it a bad marketing term because it doesn't describe what they can do well.
>object detection
This comes from a different class of machine learning models unless I am mistaken?
"He says the trouble with generative AI is that, while it’s perfectly capable of performing a certain set of tasks, it can’t do everything a human can"
This kind of strawman "limitations of LLMs" is a bit silly. EVERYONE knows it can't do everything a human can, but the boundaries are very unclear. We definitely don't know what the limitations are. Many people looked at computers in the 70s and saw that they could only do math, suitable to be fancy mechanical accountants. But it turns out you can do a lot with math.
If we never got a model better than the current batch then we still would have a tremendous amount of work to do to really understand its full capabilities.
If you come with a defined problem in hand, a problem selected based on the (very reasonable!) premise that computers cannot understand or operate meaningfully on language or general knowledge, then LLMs might not help that much. Robot warehouse pickers don't have a lot of need for LLMs, but that's the kind of industrial use case where the environment is readily modified to make the task feasible, just like warehouses are designed for forklifts.
I don't think you and he are in disagreement. I read it as him saying "evaluating LLMs is extremely difficult and a big problem right now is that many people are treating them as basically human in capability".
It's the opposite of the problem with the perception of computers in the 70s: early computers were seen by some as too alien to be as useful as a person across most tasks; LLMs are seen by some as too human not to be as useful as a person across most tasks. Both views are wrong in surprisingly complex ways.
Not sure it can be said in such generic ways. You need to define what "smarter" means. E.g. ChatGPT probably outperforms most people at math. Does that make it smarter than most people?
It might seem silly... but that's likely because you understand them better than most. He is talking about a real human problem - we see a thing (or person) do X well, and assume it's the result of some very general capability. With humans we tend to call it the "halo effect". People are seeing LLMs do some insanely good stuff but don't know how they work, and assume all kinds of things. It's an "AI halo effect".
After using Copilot, which is pretty bad at guessing what exactly I want to do, but still occasionally right on the money and often pretty close: AI is not really AI and it won't kill us all, but the realization is that a lot of work is just repetitive and really not that clever at all. If I think about all the work I did in my life, it follows the same pattern: a new way of doing things comes along, then you start figuring out how to do it and how to use it, and once you're there you rinse and repeat. The real value of the work will increasingly be in why it is useful for the people using it, although it probably was always like this; the geeks just didn't pay attention to it.
Have you ever used gpt-4o, voice mode or Claude 3.5 Sonnet, or any other leading models, or are you really basing your entire assessment on Copilot?
OpenAI hasn't even released their best models/capabilities to the public yet. The text-to-image capabilities they show on their GPT-4o page are mind-blowing. The (unreleased) voice mode is a huge step up in simulating a person and is often quite convincing.
> a new way of doing things comes along, then you start figuring out how to do it and how to use it, and once you're there you rinse and repeat
This happens a billion times a month in ChatGPT sessions. A user comes with a task, maybe gives some references and guidance. The model responds. The user gives more guidance. And this iterates for a while. The LLM gets tons of interactive sessions from which it can learn to rank the useful answers higher. This creates a data flywheel where people generate experience and LLMs learn and iteratively improve. LLMs have a tendency to make people bring the world to them; they interact with the real world through us.
I actually said it can read my mind, which I interpret as my work being repetitive and not that clever. It still reads my mind poorly most of the time, although it's in the ballpark, but on occasion it hammers one home, which feels like magic. Whether it can get good at reading my mind, I don't know. But whether it can get good at doing non-repetitive tasks that haven't been done a billion times before - well, it feels like a no right now.
I like Data, but I never understood the concept. What does he really add over the ship's computer? Why have the ship's computer walk around (I know, feelings and stuff)?
In Iain Banks' universe AIs also have avatars, but solely for the benefit of humans, who like interacting with avatars more than with a voice in their head.
Data & the Ship's Computer are separate entities running on different substrates. I think that Data is an example of "AGI", while the Ship's Computer is excellent at responding to natural language queries.
ChatGPT and other world-knowledge AIs are proven super intelligences in that they have a broader range of knowledge than any human who has ever lived and they are faster than any human who has ever lived.
Anyone who can not acknowledge this lacks the discernment to understand that when they ask a computer program a question and get an answer no human could have given, that is super intelligence.
I use Midjourney for my work here and there. It lets me get the job done faster; however, it would be useless without my years of experience in visual production. It still requires knowledge of postprocessing, drawing, imagination, photomanipulation, creating assets... and taste, which is subjective but helps a lot.
Aesthetics is a complex subject, and it can't simply be generated once the task gets more precise.
Can someone translate what that means? I'm struggling to read past that line, since I just can't wrap my head around what a "Panasonic Professor" is. Does that word refer to something other than the corporation in this context?
I don't know much about machine learning, but what I think I know is that it's getting an outcome based on averages of witnessed data/events. So how's it going to come up with anything novel, or outside of the normal?
You have some input, which may include labels or not.
If it does have labels, ML is the automatic creation of a function that maps the examples to the labels, and the function may be arbitrarily complex.
When the function is complex enough to model the world that created the examples, it's also capable of modelling any other input; this is why LLMs can translate between languages without needing examples of the specific to-from language pair in the training set.
The data may also be synthetic based on the rules of the world; this is how AlphaZero beats the best Go and chess players without any examples from human play.
ML is (smooth) surface fitting. That's mostly it.
But that is not undermining it in any way. Approximation theory and statistical learning have long history, full of beautiful results.
The kitchen sinks that we can throw at ML are incredibly powerful these days. So, we can quickly iterate through different neural net architecture configurations, check accuracy on a few standard datasets, and report whatever configuration does better on arXiv. Some might call it 'p-hacking'.
It can't. Without the ability to propose a hypothesis and then experimentally test it in the physical real world, no ML technique or app can add new information to the world. The only form of creativity possible using AI (as it exists today) is to recombine existing information in a new way -- as a musical composer or jazz artist creates a variation on a theme that already exists. But that can't be compared to devising something new that we would call truly creative and original, especially novel work that advances the frontier of our understanding of the world, like scientific discovery.
It's predicting a function. You train it on known inputs (and potentially corresponding known outputs). You get a novel output by feeding it a novel input.
For example asking an LLM a question that nobody has even asked it before. The degree to which it does a good job on those questions is called "generalisation".
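A minimal sketch of that framing (fit a smooth function to noisy training samples, then evaluate it at an input that was never in the training set):

```python
# "Training" is just fitting a function to labelled examples; asking it
# about a novel input is the generalisation test.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 50)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 50)   # noisy labels of an unknown function

coeffs = np.polyfit(x_train, y_train, deg=7)          # fit a degree-7 polynomial
x_new = 1.2345                                        # an input not seen during training
print(f"predicted {np.polyval(coeffs, x_new):.3f}, true value {np.sin(x_new):.3f}")
```

The same picture scales up: an LLM's "surface" is vastly higher-dimensional, but the novel-input test is still what generalisation means.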
Were one to slice the corpus callosum,
and burn away the body,
and poke out the eyes..
and pickle the brain,
and everything else besides...
and attach the few remaining neurones
to a few remaining keys..
then out would come a program
like ChatGPT's --
"do not run it yet!"
the little worker ants say
dying on their hills,
out here each day:
it's only version four --
not five!
Queen Altman has told us:
he's been busy in the hive --
next year the AI will be perfect
it will even self-drive!
Thus comment the ants,
each day and each night:
Gemini will save us,
Llama's a delight!
One more gigawatt,
One more crypto coin
One soul to upload
One artist to purloin
One stock price to plummet
Oh thank god
-- it wasn't mine!
Don't worry about where it sleeps in your tool cabinet so much. Put that fastidiousness to better use by reaching in with your intellect to see what favorable positions you can tickle the LLM into. None to be found you say? In the tool world an LLM is closer to a mirror than a hammer.
It certainly did take off: it's called "Subsumption Architecture", and Rodney Brooks started iRobot, which created the Roomba, which is based on those ideas.
Subsumption architecture is a reactive robotic architecture heavily associated with behavior-based robotics which was very popular in the 1980s and 90s. The term was introduced by Rodney Brooks and colleagues in 1986.[1][2][3] Subsumption has been widely influential in autonomous robotics and elsewhere in real-time AI.
iRobot Corporation is an American technology company that designs and builds consumer robots. It was founded in 1990 by three members of MIT's Artificial Intelligence Lab, who designed robots for space exploration and military defense.[2] The company's products include a range of autonomous home vacuum cleaners (Roomba), floor moppers (Braava), and other autonomous cleaning devices.[3]
Okay, interesting, but I meant nobody actually ended up sending thousands of little robots to other planets. No doubt the research led to some nice things.
Edit: the direct sensory-action coupling idea makes sense from a control perspective (fast interaction loops can compensate for chaotic dynamics in the environment), but we know these days that brains don't work that way, for instance. I wonder how that perspective has changed in robotics since the 90s, do you know?
About four years before Rodney Brooks proposed Subsumption Architecture, some Terrapin Logo hackers from the MIT-AI Lab wrote a proposal for the military to use totally out-of-control Logo Turtles in combat, in this article they published on October 1, 1982 in ACM SIGART Bulletin Issue 82, pp 23–25:
>At Terrapin, we feel that our two main products, the Terrapin Turtle ®, and the Terrapin Logo Language for the Apple II, bring together the fields of robotics and AI to provide hours of entertainment for the whole family. We are sure that an enlightened application of our products can uniquely impact the electronic battlefield of the future. [...]
>Guidance
>The Terrapin Turtle ®, like many missile systems in use today, is wire-guided. It has the wire-guided missile's robustness with respect to ECM, and, unlike beam-riding missiles, or most active-homing systems, it has no radar signature to invite enemy missiles to home in on it or its launch platform. However, the Turtle does not suffer from that bugaboo of wire-guided missiles, i.e., the lack of a fire-and-forget capability.
>Often ground troops are reluctant to use wire-guided antitank weapons because of the need for line-of-sight contact with the target until interception is accomplished. The Turtle requires no such human guidance; once the computer controlling it has been programmed, the Turtle performs its mission without the need of human intervention. Ground troops are left free to scramble for cover. [...]
>Because the Terrapin Turtle ® is computer-controlled, military data processing technicians can write arbitrarily baroque programs that will cause it to do pretty much unpredictable things. Even if an enemy had access to the programs that guided a Turtle Task Team ® , it is quite likely that they would find them impossible to understand, especially if they were written in ADA. In addition, with judicious use of the Turtle's touch sensors, one could, theoretically, program a large group of turtles to simulate Brownian motion. The enemy would hardly attempt to predict the paths of some 10,000 turtles bumping into each other more or less randomly on their way to performing their mission. Furthermore, we believe that the spectacle would have a demoralizing effect on enemy ground troops. [...]
>Munitions
>The Terrapin Turtle ® does not currently incorporate any munitions, but even civilian versions have a downward-defense capability. The Turtle can be programmed to attempt to run over enemy forces on recognizing them, and by raising and lowering its pen at about 10 cycles per second, puncture them to death.
>Turtles can easily be programmed to push objects in a preferred direction. Given this capability, one can easily envision a Turtle discreetly nudging a hand grenade into an enemy camp, and then accelerating quickly away. With the development of ever smaller fission devices, it does not seem unlikely that the Turtle could be used for delivery of tactical nuclear weapons. [...]
See why today's hackers aren't real hackers? Where are the mischievous hackers hacking Roombas to raise and lower a pen and scrawl dirty messages on their owners' clean floors? Instead what we get is an ELIZA clone that speaks like a Roomba sucked all the soul out of the entire universe.
There will never be, and can never be, "artificial intelligence". (The creation of consciousness is impossible.)
It's a fun/interesting device in science fiction, just like the concept of golems (animated beings) is in folk tales. But it's complete nonsense to talk about it as a possibility in the real world, so yes, the label of 'machine learning' is a far, far better label to use for this powerful and interesting domain.
I'll happily engage in specifics if you provide an argument for your position. Here's mine (which is ironically self-defeating but has a grain of truth): single-sentence theories about reality are probably wrong.
I just went back and added a parenthetical statement after my first sentence before seeing this reply (on refresh).
> The creation of consciousness is impossible.
That's where I'd start my argument.
Machines can 'learn', given iterative training and some form of memory, but they cannot think or understand. That requires consciousness, and the idea that consciousness can be emergent (which, as I understand it, is what the 'AI' argument rests upon) has never been shown. It is an unproven fantasy.
No, you don't have any proof for the creation of consciousness, including human consciousness (which is what I understand you are referring to).
In my view, and in the view of major religions (Hinduism, Buddhism, etc) plus various philosophers, consciousness is eternal and the only real thing in the universe. All else is illusion.
You don't have to accept that view but you do have to prove that consciousness can be created. An 'existence proof' is not sufficient because existence does not necessarily imply creation.
The Buddha did not teach that only consciousness is real. He called such a view speculative, similar to a belief that only the material world is real. The Buddhist teaching is, essentially, that the real is real and ever changing, and that consciousness is merely a phenomenon of that which is reality.
Cheers, I should have been more precise in my wording there. I should have referred to some (Idealist) schools of thought in Buddhism & Hinduism, such as the Yogachara school in Buddhism.
Particularly with Hinduism, its embrace of various philosophies is very broad and includes strains of Materialism, which appears to be the other person's viewpoint, so again, I should have been more careful with my wording.
Look, I don't want to get into the weeds of this because personally I don't think it's relevant to the issue of intelligence, but here's a list of things I think are evident about consciousness:
1. People have different kinds of conscious experience (just talk to other humans to get the picture).
2. Consciousness varies, and can be present or not-present at any given moment (sleep, death, hallucinogenic drugs, anaesthesia).
3. Many things don't have the properties of consciousness that I attribute to my subjective experience (rocks, maybe lifeforms that don't have nerve cells, lots of unknowns here).
Given this, it's obvious that consciousness can be created from non-consciousness, you need merely to have sex and wait 9 months. Add to that the fact that humans weren't a thing a million years ago, for instance, and you have to conclude that it's possible for an optimisation system to produce consciousness eventually (natural selection).
Your responses indicate (at least to me) that you are, philosophically, a Materialist or a Physicalist. That's fine; I accept that's a philosophy of existence that one can hold (even though I personally find it sterile and nihilistic). However, many, like me, do not subscribe to such a philosophy. We can avoid any argument between people who hold different philosophies but still want to discuss machine learning productively by using that term, one that we can all agree on. But if materialists insist on using 'artificial intelligence', then they are pushing their unproven theories - I would say fantasies - on the rest of us, and they expose a divergent agenda, one that does not exist when we all just talk about what we agree we already have, which is machine learning.
If you find it sterile and nihilistic that's on you, friend =)
I think pragmatic thinking, not metaphysics, is what will ultimately lead to progress in AI. You haven't engaged with the actual content of my arguments at all - from that perspective, who's the one talking about fantasies really?
Edit: in case I give the mistaken impression that I'm angry - I'm not, thank you for your time. I find it very useful to talk to people with radically different world views to me.
Every scientist has to be a materialist, it comes from interacting with the world as it is. Most of them then keep their materialism in their personal life, but unfortunately some come home from work and stop thinking critically, embracing dualism or some other unprovable fantasies. If you want people engaging with the world and having correct ideas, you need to tolerate materialism because that's how that happens. It offending your religious worldview is unrelated.
Quite obviously not true (and also untrue in a documented sense in that many great scientists have not been materialists). Science is a method, not a philosophical view of the world.
> it comes from interacting with the world as it is
That the world is materialistic is unproven (and not possible to prove).
You are simply propounding a materialist philosophy here. As I've said before, it's fine to have that philosophy. It's not fine to dogmatically push that view as 'reality' and then extend on that to build and push fantasies about 'artificial intelligence'. Again, we avoid all of this philosophical debate if we simply stick to the term 'machine learning'.
This is a category error, and basically a non-sequitur to the discussion. Whether or not LLMs are conscious is not the topic of discussion (and not a position I've seen anyone seriously advocate).
I like using AI or generative AI. Each term we use signifies the era of the technology. When I was starting, expert systems were the old thing, AI was a taboo phrase that you avoided at all costs if you wanted funding, and ML was the term of the day. Then it was the deep learning era. And now we are in the generative AI era - and also an AI spring which makes the term AI appropriate (while the spring lasts).
AI is what we've called far simpler things for a far longer time. It's what random users know it as. It's what the marketers sell it as. It's what the academics have used for decades.
You can try and make all of them change the term they use, or just understand its meaning.
If you do succeed, then brace for a new generation fighting you for calling it learning and coming up with yet another new term that everyone should use.