Oh thanks for sharing this! The llama.cpp fork showing how the dynamic quant is done is here: https://github.com/unslothai/llama.cpp. I also found that setting min_p = 0.05 helps reduce the chance of bad tokens coming up with the 1.58bit quant (I saw them roughly once every 8,000 tokens).
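
For anyone unfamiliar with min_p, here's a rough sketch of what the filter does as I understand it: drop every token whose probability is below min_p times the top token's probability, then renormalize what's left. The function name `min_p_filter` and the example logits are made up for illustration, and this is not the actual llama.cpp code.

```python
import math


def min_p_filter(logits, min_p=0.05):
    """Illustrative min-p filter: returns renormalized probabilities."""
    # Numerically stable softmax over the raw logits.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at least min_p times as likely as the best token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving tokens sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Hypothetical logits: two plausible tokens and two very unlikely ones.
filtered = min_p_filter([5.0, 4.0, 0.0, -3.0], min_p=0.05)
```

With these example logits the two unlikely tokens fall under the threshold and get zero probability, so they can never be sampled. I believe llama.cpp exposes the same setting as `--min-p` on the command line, but double-check the help output of your build.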