Modern LLMs, just like everyone reading this, will instead reach for a calculator to perform such tasks. I can't do that in my head either, but a Python script can, so that's what any tool-using LLM will (and should) do.
Long multiplication is a trivial form of reasoning that is taught at the elementary level. Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper. That was Opus with extended reasoning; it had every opportunity to get it right, but didn't. There are people who can quickly multiply such numbers in their head (I am not one of them).
I tried this with Claude - it has to be explicitly instructed to not make an external tool call, and it can get the right answer if asked to show its work long-form.
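For anyone unsure what "show its work long-form" means here, this is a minimal sketch of schoolbook long multiplication: one partial product per digit, then a sum. The function name `long_multiply` and the sample operands are made up for illustration, and it assumes non-negative integers.

```python
def long_multiply(a: int, b: int) -> tuple[list[int], int]:
    """Schoolbook long multiplication for non-negative integers.

    Returns the list of partial products (one per digit of b,
    least significant digit first) and their sum.
    """
    partials = []
    for place, digit_char in enumerate(reversed(str(b))):
        # Multiply a by a single digit of b, shifted by that digit's place value.
        partials.append(a * int(digit_char) * 10 ** place)
    return partials, sum(partials)


if __name__ == "__main__":
    # Hypothetical operands, chosen only to show the written-out steps.
    partials, total = long_multiply(4821, 3097)
    for p in partials:
        print(f"{p:>12}")
    print("-" * 12)
    print(f"{total:>12}")
```

Each printed row is one line of the written working, which is roughly what a model has to emit token by token when it is told not to call a tool.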
Mathematics is not the only kind of reasoning, so your conclusion is false. The human brain also has compartments for different types of activities. Why shouldn't an AI be able to use tools to augment its intelligence?
There are many examples of current limitations, but do you see a reason to think they are fundamental limitations? (I'm not saying they aren't, I'm curious what the evidence is for that.)
It's because of how transformers work, especially the fact that the output layer is a bunch of weights which we quite literally do a weighted random choice from. My hunch is that diffusion models would have a higher chance of doing real reasoning - or something like a latent space for reasoning.
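To make the "weighted random choice" point concrete, here is a rough sketch of sampling one token from output scores using only the standard library. The candidate tokens, the logit values, and the names `softmax` and `sample_token` are all assumptions for illustration; real inference stacks do this over a full vocabulary and often add top-k or top-p filtering.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token candidates and scores.
    tokens = ["7", "8", "9"]
    logits = [2.0, 1.0, 0.1]
    print(softmax(logits))
    print(sample_token(tokens, logits, temperature=0.8))
```

Whether this sampling step is what rules out "real reasoning" is the disputed part; the sketch only shows the mechanism being described.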
Thinking that LLMs are intelligent arises from an incomplete understanding of how they work or, alternatively, having shareholders to keep happy.
> Furthermore, the LLM isn't doing things "in its head" - the headline feature of GPT LLMs is attention across all previous tokens, all of its "thoughts" are on paper
LOL, talk about special pleading. Whatever it takes to reshape the argument into one you can win, I guess...
LLMs don't reason.
Let's see you do that multiplication in your head. Then, when you fail, we'll conclude you don't reason. Sound fair?
I can do it with a scratch pad. And I can also tell you when the calculation exceeds what I can do in my head and when I need a scratch pad. I can also check a long multiplication answer in my head (casting out nines, checking the last digit, etc.) and tell if there's a mistake.
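For reference, the two mental checks mentioned above can be sketched like this. The function name `plausible_product` is made up, and it assumes non-negative integers. Both checks are necessary but not sufficient: they catch many mistakes, not all of them.

```python
def plausible_product(a: int, b: int, claimed: int) -> bool:
    """Cheap sanity checks on a claimed value of a * b.

    Returns False if the claim is definitely wrong, True if it
    passes both checks (it may still be wrong).
    """
    # Last-digit check: the final digit of a * b depends only on
    # the final digits of a and b.
    last_digit_ok = ((a % 10) * (b % 10)) % 10 == claimed % 10

    # Casting out nines: a number and its digit sum agree mod 9,
    # so the residues mod 9 must multiply consistently.
    nines_ok = ((a % 9) * (b % 9)) % 9 == claimed % 9

    return last_digit_ok and nines_ok


if __name__ == "__main__":
    a, b = 4821, 3097  # hypothetical operands
    print(plausible_product(a, b, a * b))       # correct answer passes
    print(plausible_product(a, b, a * b + 1))   # off-by-one is caught
    print(plausible_product(a, b, a * b + 90))  # an error of 90 slips through
```

An error that is a multiple of 90 leaves both the last digit and the residue mod 9 unchanged, which is why these checks can flag a mistake but cannot confirm an answer.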
The LLMs also have access to a scratch pad. And importantly, they don't know when they need to use it (as in, they will sometimes get long multiplication right if you ask them to show their work, but if you don't ask them to, they will almost certainly get it wrong).
The context is the scratch pad. LLMs have perfect recall (ignoring "lost in the middle") across the entire context, unlike humans. LLMs "think on paper."
Plenty of humans can't do arithmetic. Can they also not reason?
Reasoning isn't a binary switch. It's a multidimensional continuum. AI can clearly reason to some extent even if it also clearly doesn't reason in the same way that a human would.
> Plenty of humans can't do arithmetic. Can they also not reason?
I just pointed out that this isn't valid reasoning ... it's the fallacy of denying the antecedent. No one is arguing that because LLMs can't do arithmetic, therefore they can't reason. After all, zamalek said that he can't quickly multiply large numbers in his head, but he isn't saying that therefore he can't reason.
> Reasoning isn't a binary switch. It's a multidimensional continuum.
Indeed, and a lot of humans are very bad at it, as is clear from the comments I'm responding to.
> AI can clearly reason to some extent
The claim was about LLMs, not AI. This is like if someone said that chihuahuas are little and someone responded by saying that dogs are tall to some extent.
LLMs do not reason ... they do syntactic pattern matching. The appearance of reasoning is because of all the reasoning by humans that is implicit in the training data.
I've had this argument too many times ... it never goes anywhere. So I won't respond again ... over and out.
> Indeed, and a lot of humans are very bad at it, as is clear from the comments I'm responding to.
This is your idea of "conversing curiously" and "editing out swipes," I suppose.
> I've had this argument too many times ... it never goes anywhere. So I won't respond again ... over and out.
A real reasoning entity might pause for self-examination here. Maybe run its chain of thought for a few more iterations, or spend some tokens calling research tools. Just to probe the apparent mismatch between its own priors and those of "a lot of humans," most of whom are not, in fact, morons.
you’re just abstracting it away into this new “systems” definition
when someone says LLMs today they obviously mean software that does more than just text. if you want to be extra pedantic you can even say LLMs by themselves can’t even generate text, since they are just model files if you don’t add them to a “system” that makes use of those model files, doh
> when someone says LLMs today they obviously mean ...
LLMs, if the someone is me or others who understand why it's important to be precise. And in this context, the distinction between LLM and AI mattered--not pedantic at all.