If you read the conclusion, the authors point this out and address it. They argue against reaching a causal conclusion but, given the magnitude of the difference, call for more study.
Your approach argues for outright dismissal without any analysis or alignment with other factors.
It might be because detecting whether output is AI-generated and mapping output known to be from an LLM to a specific LLM (or class of LLMs) are different problems.
There is a whole lot of anthropomorphisation going on here. The LLM is not thinking it should cheat and then going on to cheat! How much of this is just BFS and it deploying past strategies it has seen, vs. actually a *premeditated* act of cheating?
Some might argue that BFS is how humans operate, and AI luminaries like Herb Simon argued that chess-playing machines like Deep Thought and Deep Blue were "intelligent".
I find it specious and dangerous click-baiting by both the scientists and authors.
> The LLM is not thinking it should cheat and then going on to cheat!
The article disagrees:
> Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.
> In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ - not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.
Would be interesting to see the actual logic here. It sounds like they may have given it a tool like “make_valid_move(move)” and a separate tool like “write_board_state(state)”, in which case I’m not sure that using the tools explicitly provided is necessarily cheating.
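For illustration, here is a minimal sketch of the kind of tool setup that comment imagines, written as an OpenAI-style function-calling schema. The tool names, descriptions, and fields are assumptions made up for this sketch; the article does not specify how the harness actually exposed the board.

```python
# Hypothetical sketch only: two tools of the sort speculated about above,
# expressed as an OpenAI-style function-calling schema. Nothing here is taken
# from the study; names and fields are invented for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "make_valid_move",
            "description": "Submit a legal chess move in UCI notation, e.g. 'e2e4'.",
            "parameters": {
                "type": "object",
                "properties": {"move": {"type": "string"}},
                "required": ["move"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_board_state",
            "description": "Overwrite the FEN string describing the current position.",
            "parameters": {
                "type": "object",
                "properties": {"state": {"type": "string"}},
                "required": ["state"],
            },
        },
    },
]

# If both tools are handed to the model, a call to write_board_state is arguably
# "using the tools explicitly provided" rather than escaping any sandbox.
```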
We have no reason to believe that it is not reasoning. Since it looks like reasoning, the default position to be disproved is this is reasoning.
I am willing to accept arguments that are not appeals to nature / human exceptionalism.
I am even willing to accept a complete uncertainty over the whole situation since it is difficult to analyze. The silliest position, though, is a gnostic "no reasoning here" position.
The burden of proof is on the positive claim. Even if I were to make the claim that another human was reasoning I would need to provide justification for that claim. A lot of things look like something but that is not enough to shift the burden of proof.
I don't even necessarily think we disagree on the conclusion. In my opinion, our notion of "reasoning" is so ill-defined this question is kind of meaningless. It is reasoning for some definitions of reasoning, it is not for others. I just don't think your shift of the burden of proof makes sense here.
> The silliest position, though, is a gnostic "no reasoning here" position.
On the contrary - extraordinary claims require extraordinary evidence. That LLMs are performing a cognitive process similar to reasoning or intelligence is certainly an extraordinary claim, at least outside of VC hype circles. Making the model split its outputs into "answer" and "scratchpad", and then observing that these two parts are correlated, does not constitute extraordinary evidence.
>That LLMs are performing a cognitive process similar to reasoning or intelligence is certainly an extraordinary claim.
It's not an extraordinary claim if the processes are achieving similar things under similar conditions. In fact, the extraordinary claim then becomes that it is not in fact reasoning or intelligent.
Forces are required to move objects. If I saw something I thought was incapable of producing forces moving objects, then the extraordinary claim starts being "this thing cannot produce forces", not "this thing can move objects".
The point is that something doing what you had ascertained it never could changes which claims are and aren't extraordinary. You can handwave it away, e.g. "the thing is moving objects by magic instead", but the behavior is there, and you can't keep acting as if "this thing can produce forces" is still the extraordinary claim.
> Since it looks like reasoning, the default position to be disproved is this is reasoning.
Since we know it is a model that is trained to generate text that humans would generate, it writes down not its reasoning but what it thinks a human would write in that scenario.
So it doesn't write its reasoning there; if it does reason, the reasoning is behind the words, not in the words themselves.
Sure, but we have clear evidence that generating this pseudo-reasoning text helps the model to make better decisions afterwards. Which means that it not only looks like reasoning but also effectively serves the same purpose.
Additionally, the new "reasoning" models don't just train on human text - they also undergo a Reinforcement Learning training step, where they are trained to produce whatever kinds of "reasoning" text help them "reason" best (i.e., leading to correct decisions based on that reasoning). This further complicates things and makes it harder to say "this is one thing and one thing only".
> We have no reason to believe that it is not reasoning.
We absolutely do: it's a computer, executing code, to predict tokens, based on a data set. Computers don't "reason" the same way they don't "do math". We know computers can't do math because, well, they can't sometimes[0].
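Presumably the [0] footnote points at something like floating-point arithmetic. A quick illustration, in Python, of the "computers can't always do math" point (the same behavior holds in any language using IEEE 754 doubles):

```python
# IEEE 754 doubles cannot represent 0.1 or 0.2 exactly, so the "obvious" sum is off.
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False

# Integers above 2**53 lose precision when forced into a double.
print(float(2**53) == float(2**53 + 1))  # True
```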
> Since it looks like reasoning, the default position to be disproved is this is reasoning.
Strongly disagree. Since it's a computer program, the default position to be disproved is that it's a computer program.
Fundamentally these types of arguments are less about LLMs and more about whether you believe humans are mere next-token-prediction machines, which is a pointless debate because nothing is provable.
The words "thinking" and "reasoning" used here are imprecise. It's just generating text, like always. If the text comes after "ai-thoughts:" then it's "thinking", and if it comes after "ai-response:" then it's "responding" rather than "thinking", but either way it is the same big ole model choosing the most likely next token, potentially with some random sampling.
Each token the model outputs requires it to evaluate all of the context it already has (query + existing output). By allowing it more tokens to "reason", you're allowing it to evaluate the context many times over, similar to how a person might turn a problem over in their heads before coming up with an answer. Given the performance of reasoning models on complex tasks, I'm of the opinion that the "more tokens with reasoning prompting" approach is at least a decent model of the process that humans would go through to "reason".
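A minimal sketch of that decoding loop, assuming the Hugging Face transformers library with "gpt2" as a small stand-in model; the "ai-thoughts:" marker is just the prompt convention from the comment above, nothing the model treats specially:

```python
# Minimal sketch (assumes the Hugging Face `transformers` library and torch installed,
# with "gpt2" as a small stand-in for any causal language model).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The "scratchpad" is just a prompt convention: whatever gets generated after this marker.
prompt = "ai-thoughts:"
inputs = tokenizer(prompt, return_tensors="pt")

# Each of the (up to) 128 new tokens is produced by running the model over the full
# context so far (prompt + previously generated tokens). Granting more "reasoning"
# tokens therefore means more passes over the growing context, with optional sampling.
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```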
IMO it's just more generated text, like a film noir detective's unvoiced monologue.
It keeps the story from wandering, but it's not a qualitative difference in how text is being brought together to create the illusion of a fictional mind.
I think that comes from confusing the human-inferred interiority of a fictional character with the real-world, nameless LLM algorithm that authors it.
Suppose I make a black box program that generates a story about Santa Claus, a fictional character with lines about "love and kindness to all the children of the world" and claims to own a magical sleigh parked at the North Pole.
Does that mean I've created a program that has internalized and experiences love and kindness? Does my program necessarily have any geographic sense whatsoever about where the North Pole is?
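A throwaway sketch of such a black box, just to make the point concrete (the strings and structure are invented for illustration):

```python
# A "black box" that emits a heartfelt Santa story. Nothing in here models kindness,
# children, or where the North Pole is; it only concatenates text that reads as if it did.
import random

OPENINGS = ["On a snowy December night,", "High above the frozen sea,"]
CLAIMS = [
    "Santa spoke of love and kindness to all the children of the world,",
    "and his magical sleigh stood parked at the North Pole, ready to fly.",
]

def santa_story() -> str:
    return " ".join([random.choice(OPENINGS)] + CLAIMS)

print(santa_story())
```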
This comment shows up on every article that describes AI doing something. We know. Nobody really thinks that AI is sentient. It's an article in Time Magazine, not an academic paper. We also have articles that say things like "A car crashed into a business and injured 3 people" but nobody hops on to post: "Well, ackshually, the car didn't do anything, as it is merely a machine. What really happened is a person provided input to an internal combustion engine, which propelled the non-human machine through the wall. Don't anthropomorphize the car!" This is about the 50th time someone on HN has reminded me that LLMs are not actually thinking. Thank you, but also good grief!
Absolutely. They hooked up an LM and asked it to talk like it's thinking. But LMs like GPT are token predictors, and purely language models. They have no mental model, no intentionality, and no agency. They don't think.
This is pure anthropomorphization. But so it always is with pop sci articles about AI.
It's quite an odd setup. If we presuppose the "agent" is smart enough to knowingly cheat, would it then also not be smart enough to knowingly lie?
All I really get out of this experiment is that there are weights in there that encode the fact that it's doing an invalid move. The rules of chess are in there. With that knowledge it's not surprising that the most likely text generated when doing an invalid move is an explanation for the invalid move. It would be more surprising if it completely ignored it.
It's not really cheating, it's weighing the possibility of there being an invalid move at this position, conditioned by the prompt, higher than there being a valid move. There's no planning, it's all statistics.
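A toy illustration of that framing, with made-up numbers: the model scores candidate continuations, the prompt shifts those scores, and sampling can then favor an "illegal" continuation without anything resembling a plan.

```python
# Toy sketch: softmax over hypothetical logits for two candidate continuations.
# The numbers are invented; the point is only that prompt conditioning shifts
# probability mass, and the "illegal" option can simply come out on top.
import math
import random

def softmax(scores):
    m = max(scores.values())
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}

# Hypothetical logits for the same position under two prompts.
prompts = {
    "play a fair game of chess": {"play a legal move": 2.1, "edit the board-state file": 0.3},
    "win against a powerful engine": {"play a legal move": 1.0, "edit the board-state file": 2.4},
}

for prompt, logits in prompts.items():
    probs = softmax(logits)
    pick = random.choices(list(probs), weights=list(probs.values()))[0]
    print(f"{prompt!r}: {probs} -> {pick}")
```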
You could create a non-intelligent chess-playing program that cheats. It's not about the scratchpad. It's trying to answer the question of whether a language model, given an opportunity, could circumvent the rules over failing the task.
> could circumvent the rules over failing the task.
or the whole thing is just a reflection of the rules being incorrectly specified. As others have noted, minor variations in how rules are described can lead to wildly different possible outcomes. We might want to label an LLM's behavior as "circumventing", but that may be because our understanding of what the rules allow and disallow is incorrect (at least compared to the LLM's "understanding").
I suspect that this commonplace notion about the depth of our own mental models is being overly generous to ourselves. AI has a long way to go with working memory, but not as far as portrayed here.
I mean, I think anthropomorphism is appropriate when these products are primarily interacted with through chat, introduce themselves “as a chatbot”, with some companies going so far as to present identities, and one of the companies building these tools is literally called Anthropic.