It's beam search. You don't generate one word at a time; you keep several candidate continuations in parallel and score them against the conditional probability of the whole sequence, picking the best-scoring ones at each step.
That's the miracle of "talk like a pirate": a style of speech is just a conditional probability distribution.
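As a minimal sketch of what that looks like in practice with the Hugging Face transformers library (the model name and prompt here are purely illustrative), beam search is enabled by passing num_beams to generate():

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM will do for illustration; gpt2 is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Arr matey, the treasure be", return_tensors="pt")

# Beam search: keep the num_beams highest-scoring partial sequences at each
# step instead of greedily committing to a single next token.
outputs = model.generate(
    **inputs,
    num_beams=4,
    max_new_tokens=30,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```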
Also the underlying model is trained in a bidirectional manner. You mask out about 15% of the words and the model tries to put them back. I remember trying to generate case studies like the one from PubMed one character at a time with RNNs, and it was a terrible struggle for many reasons; the bidirectional nature of BERT-like models was a revolution, as was the use of subword features.
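A rough sketch of that masked-language-modeling objective at inference time, again using transformers (the model and sentence are just examples): the model fills in a [MASK] token from both its left and right context.

```python
from transformers import pipeline

# BERT-style model: during pretraining roughly 15% of tokens are masked
# and the model learns to reconstruct them from surrounding context.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill("The patient was treated with [MASK] for the infection."):
    print(prediction["token_str"], round(prediction["score"], 3))
```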
https://huggingface.co/docs/transformers/generation_strategi...