I want to emphasize how fascinating I find that the transform from 16 bit to a 4...

stevenhuang · on Feb 20, 2023

A recent discussion I found on int4, definitely looks like this is the new hotness. Very exciting!

https://news.ycombinator.com/item?id=34404859

t-vi · on Feb 23, 2023

In my understanding, at a very high level and omitting many crucial details, the key is that when you have mainly largish matrix multiplications (as in transformers) well-behaved (mean zero uncorrelated random or so) quantization errors cancel out. People do/did experiment with 1 or 2 bit compression of gradients/updates in the context of distributed training, but there it has been generally deemed useful to keep track of compression errors locally.

inciampati · on Feb 20, 2023

Very insightful! Now I'm curious what the bottleneck is.