
>Does the n-gram model really need all those parameters to mimic GPT-4? Yes, it does.

I don't understand what this argument is supposed to demonstrate. Obviously you can compress the 8000-gram model that GPT-4 represents - GPT-4's weights are proof!
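To put rough numbers on the scale gap (these figures are assumed for illustration: a ~100k-token vocabulary and an 8000-token context):

    import math

    vocab_size = 100_000   # assumed vocabulary size, for illustration
    context_len = 8_000    # the "8000-gram" context window

    # An explicit n-gram table needs one row per possible context,
    # i.e. vocab_size ** context_len rows. That number won't fit in
    # a float, so compute its log10 instead:
    log10_rows = context_len * math.log10(vocab_size)
    print(f"explicit table: ~10^{log10_rows:.0f} rows")   # ~10^40000

    # versus a parametric model with on the order of 10^12 weights.

The gap is tens of thousands of orders of magnitude, which is exactly why the existence of GPT-4's weights demonstrates the table is compressible.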



That's right, but once you've done that compression, it isn't an n-gram anymore. What I'm trying to get across is that you could model GPT-4 as an equivalent 8000-gram in an abstract, input-output sense, but that's a poor mental picture of how it actually works. Internally, GPT-4 is no more an 8000-gram than Stockfish is a giant lookup table of chess positions. GPT-4 is learning something like RASP programs (RASP is a programming language whose primitives map onto transformer operations), not raw statistical text correlations.
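Here's a toy way to see the lookup-table-vs-program distinction, written in plain Python rather than RASP. It's a minimal sketch on a simple "predict what followed this token last time" task, not a claim about GPT-4's actual internals:

    def as_lookup_table(corpus, n):
        # n-gram style: memorize every observed context -> next token.
        # The table grows with every distinct context in the data.
        table = {}
        for i in range(len(corpus) - n):
            table[tuple(corpus[i:i + n])] = corpus[i + n]
        return table

    def as_program(context):
        # Program style, roughly what an "induction head" computes:
        # find the previous occurrence of the current token and
        # predict whatever followed it. Constant description size,
        # works for any context length.
        last = context[-1]
        for i in range(len(context) - 2, -1, -1):
            if context[i] == last:
                return context[i + 1]
        return None

On repetitive text the two agree on their predictions, but the program's size doesn't depend on how much data it has seen. That's the sense in which "learned program" and "n-gram" are different mental models even when the input-output behavior matches.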


Does ChatGPT really represent an 8000-gram model? I thought the claim was that it just predicts the next word!



