
Is it a quantisation or tokenisation problem?


Having replicated it at F32, I now suspect tokenization.
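For anyone who wants to poke at it themselves, here's a minimal sketch of inspecting the split directly (assuming agentica-org/DeepScaleR-1.5B-Preview is the source repo behind the GGUF conversion):

  # Inspect how the tokenizer actually splits "strawberry"
  from transformers import AutoTokenizer

  tok = AutoTokenizer.from_pretrained("agentica-org/DeepScaleR-1.5B-Preview")
  print(tok.tokenize("strawberry"))    # sub-word pieces the model sees
  print(tok.tokenize(" strawberry"))   # a leading space often changes the split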


Try bfloat16! We have a bug where the model was saved as fp32.
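For reference, a sketch of what loading the weights in bfloat16 looks like with transformers (assuming the agentica-org/DeepScaleR-1.5B-Preview repo; the torch_dtype argument casts on load rather than keeping the fp32 checkpoint):

  # Cast the checkpoint to bfloat16 on load instead of keeping it in fp32
  import torch
  from transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained(
      "agentica-org/DeepScaleR-1.5B-Preview",
      torch_dtype=torch.bfloat16,
  )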


I just tried it with this 3.6GB F16 model:

  ollama run hf.co/bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF:F16
And this time it didn't get confused with the tokenization of strawberry! https://gist.github.com/simonw/9e79f96d69f10bc7ba540c87ea0e8...


Nice, very glad to see it works! Small models are very sensitive to the dtype :(



