The claim isn't "LLMs don't use tools." The author is saying that LLMs can't make reliable inferences about their own knowledge or capabilities, which fundamentally limits their usefulness for many tasks. LLMs "know" that LLMs can't do math reliably, and they "know" that calculators can do math reliably, yet they generally just soldier on and try to do the math themselves when asked. You can of course RL or prompt a model into writing JavaScript when it sees math, but so far LLMs haven't been able to generalize the process of "I am bad at X" + "Thing is good at X" -> "I should ask Thing to do X" unless that specific chain of thought is common in the training data.
The solution so far has just been to throw more RL or carefully crafted synthetic data at it, but it's arguably more Pavlovian conditioning than generalized learning.
Someone could teach a dog to ring a bell that says "food" on it, and you could reasonably argue that it is using a tool. Will it then know to ring a bell that says "walk" when it wants to go outside?
The availability of tools and what they're named is going to influence its behavior. Gemini 2.0 Pro can obviously get this question right on its own, but the existence of a find_tool() option causes it to use it. Sorry it's scuffed, I just did it on my phone to make the point, but I'd imagine you could get similar results with the tools param, since all it's doing is putting the tool options into the context.
You are an advanced AI assistant that has a number of tools available to you. In order to use a tool, respond with "USE TOOL: <tool_name>(tool_parameter)".
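For anyone who wants to reproduce this without a phone screenshot, here's a rough Python sketch of what "putting the tool options into the context" looks like: the tool list lives in the system prompt and the reply is parsed for the "USE TOOL:" convention. This is my own illustration, not the exact setup above; call_model() is a stand-in for whatever chat API you're using, and the calculator tool is just an invented example alongside the find_tool option mentioned earlier.

```python
import re

# Tool list embedded directly in the prompt context, mirroring the
# system prompt quoted above. The specific tools here are illustrative.
SYSTEM_PROMPT = (
    'You are an advanced AI assistant that has a number of tools available '
    'to you. In order to use a tool, respond with '
    '"USE TOOL: <tool_name>(tool_parameter)".\n'
    "Available tools:\n"
    "- find_tool(query): look up a tool suited to the query\n"
    "- calculator(expression): evaluate an arithmetic expression"
)

# Matches replies of the form: USE TOOL: tool_name(argument)
TOOL_CALL = re.compile(r"USE TOOL:\s*(\w+)\((.*)\)")


def call_model(system: str, user: str) -> str:
    """Placeholder: swap in a real chat-completion call (Gemini, OpenAI, etc.)."""
    raise NotImplementedError


def ask(question: str) -> tuple[str, str] | None:
    """Return (tool_name, argument) if the model chose to use a tool, else None."""
    reply = call_model(SYSTEM_PROMPT, question)
    match = TOOL_CALL.search(reply)
    return (match.group(1), match.group(2)) if match else None
```

The point is that whether the model reaches for a tool at all depends heavily on which names appear in that prompt, which is exactly the behavior being described.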