Hacker News

hm... well...

It definitely runs. It uses almost 20GB of RAM, so I had to exit my browser and VS Code to keep memory usage down.
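The ~20GB figure lines up with a back-of-envelope estimate: at 4 bits per weight, the weights alone take about half a byte each, and the rest is per-block scales and runtime buffers. A rough sketch of that arithmetic (the overhead is not modeled here, only the raw weight storage):

```python
def quantized_weight_size_gb(n_params: float, bits: int = 4) -> float:
    """Raw storage for n_params weights at the given bit width, in GB.
    Ignores per-block scales and runtime buffers, which add overhead."""
    return n_params * bits / 8 / 1e9

print(quantized_weight_size_gb(30e9))  # 30B weights at 4 bits -> 15.0 GB
print(quantized_weight_size_gb(65e9))  # 65B weights at 4 bits -> 32.5 GB
```

So 15GB of weights plus scales and inference buffers lands near the observed ~20GB.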

But it produces completely garbled output. Either there's a bug in the program, or the tokens are different from the 13B model's, or I performed the conversion wrong, or the 4-bit quantization breaks it.
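For context on what 4-bit quantization does to the weights: schemes in this family split each tensor into small blocks and store one float scale per block plus low-bit integers. A simplified sketch of a symmetric block quantizer (not ggml's exact on-disk format, just the idea):

```python
import numpy as np

def quantize_q4(block):
    """Quantize one block of floats to a shared scale plus integers in
    [-7, 7] (4 signed bits). Simplified illustration, not ggml's format."""
    scale = float(np.abs(block).max()) / 7.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero on an all-zero block
    q = np.clip(np.round(block / scale), -7, 7).astype(np.int8)
    return scale, q

def dequantize_q4(scale, q):
    """Reconstruct approximate floats from the scale and integers."""
    return scale * q.astype(np.float32)

x = np.array([0.1, -0.5, 0.7, 0.02], dtype=np.float32)
s, q = quantize_q4(x)
xr = dequantize_q4(s, q)
# Rounding error per value is bounded by half a quantization step (scale / 2),
# which is why 4-bit models mostly work but can degrade if something else
# (conversion, tokenization) is also wrong.
```

If the quantization itself were catastrophically broken, you'd expect degraded but still vaguely coherent text; fully garbled output points more toward a conversion or tokenizer mismatch.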




I’m also getting garbage out of 30B and 65B.

30B just says “dotnetdotnetdotnet…”


I've finally managed to download the model and it seems to be working well for me. There have been some updates to the quantization code, so maybe if you do a 'git pull && make' and rerun the quantization script it will work for you. I'm getting about 350ms per token with the 30B model.
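For a sense of what 350ms per token means in practice, the implied throughput is easy to compute:

```python
# Throughput implied by the timing above: 350 ms per token.
ms_per_token = 350
tokens_per_second = 1000 / ms_per_token
print(f"{tokens_per_second:.2f} tokens/s")  # about 2.86 tokens/s
```

Roughly three tokens per second, so a 100-token reply takes about 35 seconds on the 30B model.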


Thanks for reminding me! It works now. The difference is striking!



