
I know this doesn't retrain, but I wonder if approaches like this plus quantization could get back any "lost" quality with some post training.

It's great to see, and to have a sense of how much better performance and cost are likely to be in the future.

I know it's fun to work on, but I also want to say THANK YOU for developing open source.




At first glance, that sounds like it could work. From what I've read, there seem to be two main ways to regain quality lost to quantization: post-training quantization, which happens after training, and quantization-aware training, which quantizes during training but leaves the activations and gradients in full precision.
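
To illustrate the second approach, here's a minimal sketch (assuming PyTorch, and not any specific library's quantization API) of "fake quantization" with a straight-through estimator: the forward pass uses int8-rounded weights, while the optimizer and gradients stay in full precision.

  import torch
  import torch.nn as nn

  class FakeQuantLinear(nn.Module):
      """Linear layer whose weights are quantized to int8 levels on the fly;
      activations and gradients remain float32."""
      def __init__(self, in_features, out_features, bits=8):
          super().__init__()
          self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
          self.bias = nn.Parameter(torch.zeros(out_features))
          self.qmax = 2 ** (bits - 1) - 1  # 127 for int8

      def forward(self, x):
          # Symmetric per-tensor quantization of the weights.
          scale = self.weight.abs().max() / self.qmax
          w_q = torch.clamp(torch.round(self.weight / scale), -self.qmax, self.qmax) * scale
          # Straight-through estimator: forward sees quantized weights,
          # backward treats rounding as identity, so gradients are full precision.
          w_ste = self.weight + (w_q - self.weight).detach()
          return nn.functional.linear(x, w_ste, self.bias)

  # Tiny training loop: the optimizer updates the full-precision "shadow" weights.
  model = FakeQuantLinear(16, 4)
  opt = torch.optim.SGD(model.parameters(), lr=0.1)
  x, y = torch.randn(32, 16), torch.randn(32, 4)
  for _ in range(100):
      loss = ((model(x) - y) ** 2).mean()
      opt.zero_grad()
      loss.backward()
      opt.step()

The point of keeping a full-precision copy of the weights is that the tiny gradient updates would otherwise vanish under rounding; the quantized weights are only materialized in the forward pass.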




