Yes, you're thinking about this exactly right! We shouldn't quantize a model naively to 2-bit or 4-bit across the board; we should do it smartly!
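
To make "smartly" concrete, here's a toy sketch of the general idea (just an illustration, not our actual method - the layer names and sensitivity scores below are invented): give each layer its own bit width instead of one global setting, and keep the sensitive layers at higher precision.

    # Hypothetical per-layer sensitivity scores (higher = quantization
    # hurts this layer more). Real scores would have to be measured;
    # these names and numbers are made up.
    sensitivity = {"attn.q_proj": 0.9, "attn.k_proj": 0.3,
                   "mlp.down_proj": 0.8, "mlp.up_proj": 0.2}

    def pick_bits(score, threshold=0.5):
        # Sensitive layers keep 4 bits; robust layers drop to 2.
        return 4 if score > threshold else 2

    for name, score in sensitivity.items():
        print(f"{name}: {pick_bits(score)}-bit")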


How do you pick which layers should be 2-bit, which should be 4-bit, etc.? Is this secret sauce, or something open?


Oh, I wrote about it here: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs - we might provide some scripts for this in the future!
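
That page is more about the results than the recipe, admittedly. As a very rough illustration of how per-layer scoring can work in general (my own simplified sketch, not our actual pipeline - a real setup would score layers by output error on calibration data, not just weight error): quantize each layer on its own, measure how much that distorts it, and spend the extra bits on the worst-hit layers.

    import numpy as np

    def quant_error(w, bits):
        # Relative MSE introduced by symmetric round-to-nearest
        # quantization of the weight matrix `w` to `bits` bits.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        deq = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        return float(np.mean((w - deq) ** 2) / np.mean(w ** 2))

    # Toy stand-ins for real weight matrices; the heavy-tailed layer
    # has outliers, which round-to-nearest handles badly.
    rng = np.random.default_rng(0)
    layers = {
        "attn.q_proj":   rng.normal(size=(64, 64)),
        "attn.k_proj":   rng.normal(size=(64, 64)),
        "mlp.down_proj": rng.standard_t(df=2, size=(64, 64)),
        "mlp.up_proj":   rng.normal(size=(64, 64)),
    }

    # Score each layer by how badly 2-bit quantization distorts it,
    # then keep the worst-hit half at 4 bits (the 50/50 budget split
    # is an arbitrary choice for this sketch).
    errs = {n: quant_error(w, bits=2) for n, w in layers.items()}
    worst = set(sorted(errs, key=errs.get, reverse=True)[:len(errs) // 2])
    plan = {n: (4 if n in worst else 2) for n in layers}
    print(plan)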


Thanks! But I can't find any details on how you "intelligently adjust quantization for every possible layer" on that page. I assume this is a secret?

I am wondering whether different use cases might require different "intelligent quantization", e.g., quantization for an LLM doing financial analysis might differ from one doing code generation. I am currently doing a postdoc in this area. Interested in doing research together?


Oh, we haven't published it yet! I talk about it in bits and pieces - we might do a larger blog post on it!

Yes, different use cases will be different - oh, interesting! Sorry, I doubt I can be of much help in your research - I'm mainly an engineering guy, so less research focused!



