This is from 2017, so probably obsolete by now.


It is. For example, it uses LSTM, which is obsolete now.


People keep saying this with extreme confidence; I’m not sure I buy it.

Certainly recurrent networks in general are not obsolete, even if attention/convolution works better for some applications.

Perhaps one ought to try GRU before LSTM, but there is no reason to suppose that GRU would dominate in all cases.
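
For what it's worth, the two are essentially drop-in replacements for each other in, e.g., PyTorch, so trying both is cheap. A minimal sketch (made-up sizes, not from the linked article):

    import torch
    import torch.nn as nn

    x = torch.randn(8, 20, 32)  # 8 sequences, 20 steps, 32 features each

    # LSTM carries two state tensors (hidden and cell)
    lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    out_lstm, (h, c) = lstm(x)

    # GRU carries a single state tensor and has fewer parameters,
    # which is why it is often worth trying first
    gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
    out_gru, h_gru = gru(x)

    print(out_lstm.shape, out_gru.shape)  # both torch.Size([8, 20, 64])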


Indeed. Here is a very recent paper finding that attention is certainly not all you need, since recurrence is sometimes necessary.

https://arxiv.org/abs/1906.01603

This is also obvious: without recurrence you cannot remember information that is not externally visible, yet it can be computationally very convenient, and often necessary, to maintain exactly that kind of hidden information.

The hard part is learning representations for the hidden information, as recurrences are plagued by vanishing and shattering gradients.
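
To make the hidden-information point concrete, here is a minimal sketch (PyTorch assumed, arbitrary sizes). The tensor h is internal state that the inputs never expose, and gradients must flow backward through every step of the loop, which is exactly where they vanish or shatter:

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=1, hidden_size=16)  # one recurrent step
    x = torch.randn(8, 20, 1)   # 8 sequences of 20 steps
    h = torch.zeros(8, 16)      # hidden state, never observed directly

    for t in range(x.size(1)):
        h = cell(x[:, t], h)    # state persists across steps; the inputs
                                # alone do not carry it

An attention-only model sees all 20 inputs at once but threads no such persistent state through time.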


What is the modern choice instead of LSTM?


Transformer.
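
That is, the architecture from "Attention Is All You Need". A minimal sketch of an encoder in PyTorch (illustrative sizes; it processes the whole sequence in parallel rather than stepping through it):

    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)

    x = torch.randn(8, 20, 64)  # (batch, seq_len, d_model)
    out = encoder(x)            # same shape; self-attention looks at all
                                # 20 positions at once, with no hidden state
                                # carried between time steps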



