It definitely runs. It uses almost 20GB of RAM, so I had to quit my browser and VS Code to keep memory usage down.
But it produces completely garbled output. Either there's a bug in the program, the tokens are different from the 13B model's, I performed the conversion wrong, or the 4-bit quantization breaks it.
I've finally managed to download the model and it seems to be working well for me. There have been some updates to the quantization code, so if you do a 'git pull && make' and rerun the quantization script, it may work for you. I'm getting about 350ms per token with the 30B model.
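For reference, the sequence I have in mind looks roughly like this (a sketch assuming the standard llama.cpp layout; the model paths and the quantize arguments are assumptions and may differ in your checkout):

```sh
# Pull the latest quantization fixes and rebuild
git pull && make

# Re-quantize the 30B model from the f16 GGML file down to 4-bit
# (paths assumed; the trailing "2" selects the q4_0 format)
./quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q4_0.bin 2

# Quick smoke test with the freshly quantized weights
./main -m ./models/30B/ggml-model-q4_0.bin -p "Hello" -n 64
```

If the output is still garbled after re-quantizing from a fresh conversion, that would point back at the conversion step rather than the quantizer.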