Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Indeed. Here is a very fresh paper finding that attention is certainly not all you need as sometimes recurrence is necessary.

https://arxiv.org/abs/1906.01603

This is also obvious: Without recurrence you cannot remember information that is not externally visible, but it may be computationally very convenient and often necessary to maintain information that is hidden.

The hard part is learning reps for hidden information as recurrences are plagued by vanishing and shattering gradients.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: