The author addresses this and argues that it misses the point:

> Now, some might interject here and say we could, of course, train the LLM to ask for a calculator. However, that would not make them intelligent. Humans require no training at all for calculators, as they are such intuitive instruments. We use them simply because we have understanding for the capability they provide.

So the real question behind the headline is why LLMs don't learn to ask for a calculator by themselves, if both the definition of a calculator and the fact that LLMs are bad at math are part of the training data.




I’d contest the statement that humans don’t need to be trained to use a calculator. It certainly isn’t instinctive behavior.


I have dyscalculia, and I still have no clue about calculators beyond the one thing I was taught: how to make one give me the answer to a math problem. I'm a bit embarrassed to say that even now I sometimes take a few seconds to boot into being able to use one. We often discuss LLMs as if there were no divergence among humans; I don't know how many people math is intuitive for, but I know plenty of people like me.


I first used a calculator as a kid. Took about 30 seconds. Never had instruction or training. We aren't talking about scientific calculators.


Yeah, the buttons do what the symbol says.

Then about 30 seconds after that somebody showed me I could spell “boobs” if I flipped it upside down.


I do think it's interesting to think about why the LLM needs to be told to ask for a calculator, and when to do that. And not just in individual prompts where humans manually ask it to "write some code to find the answer", but in general.

We often use the colloquial definition of training to mean something to the effect of taking an input, attempting an output, and being told whether that output was right or wrong. LLMs extend that to taking a character- or syllable-sized token as input, doing some computation, predicting the next token(s), and seeing if that was right or wrong. I'd expect the training data to have enough content to memorize single-digit multiplication, but I'd also expect the model to learn that this approach doesn't work for multiplying an 11-digit number by a 14-digit number.
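
That loop is just next-token prediction; as a sketch (PyTorch-style, with a hypothetical `model` and `tokens` assumed given, not any particular lab's code):

  import torch.nn.functional as F

  # tokens: 1-D tensor of token ids for one training example.
  # model(prefix) returns next-token logits, one row per position.
  logits = model(tokens[:-1])                 # predict token t+1 from tokens <= t
  loss = F.cross_entropy(logits, tokens[1:])  # the right/wrong signal
  loss.backward()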

The "use a calculator" concept and "look it up in a table" concepts were taught to the LLM too late and it didn't internalize that as a way to perform better.


> Humans require no training at all for calculators, as they are such intuitive instruments

I don't think that's even true though. If you think this, I would suggest you've just internalized your training on the subject.


> So the real question behind the headline is why LLMs don't learn to ask for a calculator by themselves

They can. They're sometimes a bit cocky about their maths abilities, but this really isn't hard to test or show.

https://gist.github.com/IanCal/2a92debee11a5d72d62119d72b965...

They can also create tools that can be useful.


This still doesn't get at the point. With this example you've effectively constructed a prompt along the lines of: "Note: a calculator is available upon request (wink), here's how you'd use it: ... Now, what's the eighth root of 4819387574?"

Of course the model will use the calculator you've explicitly informed it of. The article is meant to be a critique of claims that LLMs are "intelligent" when, despite knowing their math limitations, they don't generally answer "You'd be better off punching this into a calculator" when asked a problem.


How have I told it there's a calculator? All I've given it is the ability to search for tools and enable ones it wants.

> Of course the model will use the calculator you've explicitly informed it of

I didn't. I also gave it no system prompt pushing it to always use tools or anything.

It searches for tools with a query "calculator math root" and is given a list of things that includes a calculator. It picks the calculator, then it uses it.

The code and trace are right there.
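
In shape it's something like this (a minimal sketch, not the gist's actual code; the registry, tool names, and model alias are my own stand-ins):

  import anthropic  # official Anthropic SDK

  # Stand-in registry the meta-tools search over; the gist's real
  # contents may differ.
  REGISTRY = {
      "calculator": "Evaluate an arithmetic expression.",
      "unit_converter": "Convert a value between units.",
  }

  # The model is told about these two meta-tools only. No
  # calculator is mentioned anywhere in the request.
  tools = [
      {"name": "search_tools",
       "description": "Search a registry of available tools by keyword.",
       "input_schema": {"type": "object",
                        "properties": {"query": {"type": "string"}},
                        "required": ["query"]}},
      {"name": "enable_tool",
       "description": "Enable a named tool from the registry.",
       "input_schema": {"type": "object",
                        "properties": {"name": {"type": "string"}},
                        "required": ["name"]}},
  ]

  client = anthropic.Anthropic()
  response = client.messages.create(
      model="claude-3-5-sonnet-latest",
      max_tokens=1024,
      tools=tools,
      messages=[{"role": "user",
                 "content": "What is the eighth root of 4819387574?"}],
  )
  # A driver loop feeds tool_result blocks back in. When the model
  # calls enable_tool("calculator"), the calculator's full schema is
  # appended to `tools` for the next request, and the model calls it
  # rather than guessing at the arithmetic.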


What do you imagine `tools` is doing if not a system prompt? I suggest reading the documentation: https://docs.anthropic.com/en/docs/build-with-claude/tool-us...

The vast majority of features that Anthropic and OpenAI ship are just clever ways of building system prompts
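
You can see the equivalence by hand-rolling the same thing with a plain system prompt, something like (a sketch; the prompt wording and calling protocol here are mine, not Anthropic's):

  import anthropic

  # Describe the same meta-tools in a system prompt and parse the
  # model's JSON replies yourself -- functionally what `tools` does.
  SYSTEM = (
      "You may call a tool by replying with one JSON object: "
      '{"tool": "<name>", "input": {...}}\n'
      "Available tools:\n"
      "- search_tools(query): search a registry of tools by keyword\n"
      "- enable_tool(name): enable a tool from the registry"
  )

  client = anthropic.Anthropic()
  response = client.messages.create(
      model="claude-3-5-sonnet-latest",
      max_tokens=1024,
      system=SYSTEM,
      messages=[{"role": "user",
                 "content": "What is the eighth root of 4819387574?"}],
  )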


It's told it can search for tools and enable them. It is not told initially that there's a calculator; it asks for one.


I see, that clarifies things for me, it's not quite like the example I gave then.

Even so, doesn't informing the model that some "tools" are available, immediately before asking it a math problem (one that would be virtually impossible for a human to answer precisely), seem like a pretty big hint that it should inquire whether a calculator is available?

Here's what I get from sonnet in response to the plain user-prompt "What is the eighth root of 4819387574?"

""" Let me solve this step by step.

To find the 8th root of 4819387574:

1) The 8th root of 4819387574 means finding x where x⁸ = 4819387574

2) This is a large number, but it's a perfect 8th power.

3) One way to approach this is to find factors: 4819387574 = 13⁸

4) Therefore, the 8th root of 4819387574 is 13.

To verify: 13⁸ = 13 × 13 × 13 × 13 × 13 × 13 × 13 × 13 = 4819387574

The answer is 13. """
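
Worth noting that the verification is confabulated, which a REPL makes plain (13⁸ isn't 4819387574, and the real eighth root isn't an integer at all):

  >>> 13 ** 8
  815730721
  >>> round(4819387574 ** (1 / 8), 3)
  16.232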



