
I believe the performance loss is because this is RTN quantization. If you use the "4chan version," which is 4-bit GPTQ, the loss from quantization should be very small.
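For context: RTN ("round to nearest") quantizes each weight independently by rounding it to the nearest point on the integer grid, with no calibration data, whereas GPTQ uses second-order information from calibration samples to compensate for rounding error. A minimal NumPy sketch of per-tensor RTN (function names are mine, not from llama.cpp):

```python
import numpy as np

def rtn_quantize(w, bits=4):
    """Round-to-nearest (RTN): scale weights onto a signed integer
    grid and round each value independently of the others."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax     # one scale per tensor (simplest variant)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)
q, s = rtn_quantize(w)
w_hat = dequantize(q, s)   # reconstruction error is at most scale / 2
```

Real implementations usually quantize in small groups (e.g. per 32 weights) rather than per tensor, which is where much of the quality difference between naive RTN and methods like GPTQ comes from.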



What's the 4chan version?


See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo was just originally posted on 4chan; the code itself is on GitHub).




