Hacker News
evrimoztamur
12 months ago
on:
DeepScaleR: Surpassing O1-Preview with a 1.5B Mode...
Is it a quantisation or tokenisation problem?
simonw
12 months ago
Having replicated it at F32 I now suspect tokenization.
mluo
12 months ago
Try bfloat16! We have a bug where the model was saved as fp32.
simonw
12 months ago
I just tried it with this 3.6GB F16 model:
ollama run hf.co/bartowski/agentica-org_DeepScaleR-1.5B-Preview-GGUF:F16
And this time it didn't get confused with the tokenization of strawberry!
https://gist.github.com/simonw/9e79f96d69f10bc7ba540c87ea0e8...
mluo
12 months ago
Nice, very glad to see it works! Small models are very sensitive to the dtype :(
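For anyone wondering what "sensitive to the dtype" means at the level of individual numbers, here is a rough sketch using only the Python standard library. It rounds the same value to float32, float16 and bfloat16 and prints the error for each. The helper names (`to_float32`, `to_float16`, `to_bfloat16`) are made up for this illustration, and the bfloat16 conversion is a hand-rolled round-to-nearest-even on the float32 bit pattern, not what any particular framework does internally. It does not reproduce the DeepScaleR issue itself; it only shows that the three formats keep different amounts of precision.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round to IEEE 754 half precision (5 exponent bits, 10 mantissa bits).

    struct raises OverflowError for values outside the float16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to bfloat16 (8 exponent bits, 7 mantissa bits).

    Illustrative only: keeps the top 16 bits of the float32 pattern with
    round-to-nearest-even. NaN and infinity are not handled specially.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded & 0xFFFFFFFF))[0]


# Arbitrary sample values, chosen only to show the rounding behaviour.
for value in (0.1, 1.0 + 2**-10, 3.14159265):
    print(f"value = {value!r}")
    for name, convert in (
        ("float32", to_float32),
        ("float16", to_float16),
        ("bfloat16", to_bfloat16),
    ):
        rounded = convert(value)
        print(f"  {name:9s} {rounded!r}  abs error = {abs(rounded - value):.3e}")
```

bfloat16 keeps the float32 exponent range but has fewer mantissa bits than float16, so for values that fit in both it is the coarser of the two. Whether that difference matters for a given checkpoint depends on the dtype it was trained and saved in.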