Hacker News

I believe GPT-3 has a transformer-based architecture, so it doesn't recursively ingest its own output in each iteration. I believe attention-based transformer models have enough complexity to learn what you are talking about on their own.
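To make the distinction concrete, here is a minimal sketch (not GPT-3's real API; `model` is a hypothetical stand-in) of where the feedback actually lives at sampling time: the generated token is appended by the outer loop, while a single call to the network is one feed-forward pass with no internal recurrence.

```python
# Hypothetical stand-in for a language model: maps a token sequence
# to a "next token" in one non-recurrent forward pass. Here it just
# returns the last token plus one, for illustration.
def model(tokens):
    return tokens[-1] + 1

def generate(prompt_tokens, n_steps):
    """Autoregressive sampling loop: the network's output re-enters
    only here, in the sampling code outside the network itself."""
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        nxt = model(tokens)   # one fixed-depth forward pass
        tokens.append(nxt)    # feedback happens at this level
    return tokens

print(generate([1, 2, 3], 4))  # [1, 2, 3, 4, 5, 6, 7]
```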


GPT-3's transformer layers only recur a fixed, finite number of times: the network has a fixed depth. Attention does a lot compared to a bog-standard RNN, and if the numbers were tokenized sensibly it would probably be enough for most reasonable computations, but eventually you would definitely hit a cap. That's probably a good thing, of course. The network plus the training procedure are Turing complete together, but it would suck if the network itself could fail to terminate.
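A toy contrast of the two compute patterns (an assumed simplification, not real model code): a transformer's sequential depth per forward pass is capped at its layer count, whereas an RNN's recurrence count grows with the input length.

```python
def transformer_sequential_steps(seq_len, n_layers=96):
    # Each layer attends over the whole sequence at once, so the
    # sequential depth is just the (fixed) layer count; 96 is the
    # depth of the largest GPT-3 model. seq_len doesn't matter here.
    return n_layers

def rnn_sequential_steps(seq_len):
    # One recurrent state update per input token: sequential work
    # grows without bound as the input gets longer.
    return seq_len

for n in (10, 10_000):
    print(n, transformer_sequential_steps(n), rnn_sequential_steps(n))
```

This is the "cap" in question: no matter how long the input is, a single transformer forward pass does a bounded amount of sequential computation, so it always terminates.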


Thank you for pointing out the difference. I went and reread about transformers; I had previously thought they were a kind of RNN. (I am not an ML engineer.)



