
I am puzzled by the fact that modern LLMs don't do multiplication the same way humans do it, i.e. digit by digit. Surely they can write an algorithm for that, but why can't they perform it?
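To be concrete, here is a quick Python sketch of the digit-by-digit schoolbook algorithm I have in mind (my own toy code, not anything from a model):

    def long_multiply(a: str, b: str) -> str:
        # Schoolbook long multiplication on decimal strings, digit by digit.
        da = [int(d) for d in reversed(a)]
        db = [int(d) for d in reversed(b)]
        result = [0] * (len(da) + len(db))
        # Multiply every pair of digits and accumulate in the right column.
        for i, x in enumerate(da):
            for j, y in enumerate(db):
                result[i + j] += x * y
        # Carry pass, exactly like carrying by hand.
        carry = 0
        for k in range(len(result)):
            carry, result[k] = divmod(result[k] + carry, 10)
        return "".join(map(str, reversed(result))).lstrip("0") or "0"

    assert long_multiply("1234", "5678") == str(1234 * 5678)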



Two reasons:

1. LLMs "think" in terms of tokens, which are usually around 4 characters each. While humans only have to memorize a 10x10 multiplication table to multiply large numbers, LLMs would have to memorize something like a 10000x10000 table, which is much more difficult. (See the tokenizer sketch after this list.)

2. LLMs can't "think in their head", so you have to make them spell out each step of the multiplication, just like (most) humans can't multiply huge numbers without intermediate steps.
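To make reason 1 concrete, here is a quick sketch, assuming the tiktoken library and its cl100k_base encoding (exact token boundaries vary by tokenizer):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for tok in enc.encode("123456789 * 987654321"):
        print(tok, enc.decode_single_token_bytes(tok))
    # Long digit strings typically come out as multi-digit chunks rather than
    # single digits, so the model never directly sees the digits that a
    # schoolbook algorithm would operate on.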

A simple way to demonstrate this is to ask an LLM for the birth year of a celebrity and then whether that number is even or odd. The answer will be correct almost every time. But if you ask whether the birth year of a celebrity is even or odd and forbid spelling out the year, the accuracy will be barely above 50%.
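Here is a rough sketch of that probe, assuming the openai Python client (the model name and prompts are just placeholders):

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Variant 1: the model is free to state the year before judging parity.
    print(ask("What year was Albert Einstein born? Is that year even or odd?"))

    # Variant 2: parity has to be judged without writing the year down.
    print(ask("Without stating the year, answer with one word: is Albert "
              "Einstein's birth year even or odd?"))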


Can't we always tokenize numbers as single digits and give the LLM a <thinking> scratchpad invisible to the user?
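Something like this toy preprocessing step would do the first part (my own illustration; real tokenizers achieve the same effect in their pre-tokenization rules, and some already split digits individually):

    import re

    def split_digit_runs(text: str) -> str:
        # "123456" -> "1 2 3 4 5 6", so a subword tokenizer ends up with one
        # digit per token.
        return re.sub(r"\d+", lambda m: " ".join(m.group(0)), text)

    print(split_digit_runs("compute 123456 * 789"))
    # compute 1 2 3 4 5 6 * 7 8 9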


Yes, but we could also give the LLM access to a Python interpreter and solve a much larger class of problems with correctness guarantees and around a billion times less compute.
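A minimal sketch of that route (names are illustrative, not any particular framework's API): expose a calculator tool and let the model hand the arithmetic off to the interpreter.

    def multiply_tool(a: int, b: int) -> int:
        # Exact integer arithmetic, delegated to Python.
        return a * b

    # In a tool-calling setup the model emits something like
    #   {"tool": "multiply_tool", "args": {"a": 123456789, "b": 987654321}}
    # and the runtime returns the exact product:
    print(multiply_tool(123456789, 987654321))  # 121932631112635269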


Are there many written-out examples of people talking through that algorithm step by step? I’d guess not.



