cypress66 on March 13, 2023 | on: Using LLaMA with M1 Mac and Python 3.11
I believe the performance loss is because this is RTN (round-to-nearest) quantization. If you use the "4chan version", which is 4-bit GPTQ, the performance loss from quantization should be very small.
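Neither comment spells out what RTN does: round-to-nearest quantizes each weight independently to the nearest point on a uniform grid, with no pass to compensate the rounding error (that compensation is what GPTQ adds). Below is a minimal NumPy sketch of per-row 4-bit RTN; the function names and the per-row absmax scaling are illustrative assumptions, not code from llama.cpp or any GPTQ repo:

    import numpy as np

    def rtn_quantize(weights, bits=4):
        # Illustrative RTN sketch (assumed per-row absmax scaling);
        # not actual llama.cpp or GPTQ code.
        qmax = 2 ** (bits - 1) - 1                        # 7 for 4-bit
        scale = np.abs(weights).max(axis=1, keepdims=True) / qmax
        scale = np.maximum(scale, 1e-12)                  # avoid divide-by-zero rows
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        return q.astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    # Toy check of the rounding error RTN leaves behind
    w = np.random.randn(4, 16).astype(np.float32)
    q, scale = rtn_quantize(w)
    print("mean abs error:", np.abs(w - dequantize(q, scale)).mean())

GPTQ, by contrast, quantizes weights one column at a time and updates the not-yet-quantized weights to absorb each rounding error, which is why its loss at 4-bit is much smaller than RTN's.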
xdennis on March 13, 2023
What's the 4chan version?
aseipp on March 13, 2023
See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo just happened to be posted on 4chan first; the code itself is on GitHub).
cypress66 on March 13, 2023
https://rentry.org/llama-tard-v2