This is from 2017, so probably obsolete by now.


It is. For example, it uses LSTM, which is obsolete now.


People keep saying this with extreme confidence; I’m not sure I buy it.

Certainly recurrent networks in general are not obsolete, even if attention/convolution works better for some applications.

Perhaps one ought to try GRU before LSTM, but there is no reason to suppose that GRU would dominate in all cases.
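
For what it's worth, the two are essentially drop-in replacements for each other in, e.g., PyTorch, so trying both is cheap. A minimal sketch (made-up sizes, not from the linked article):

    import torch
    import torch.nn as nn

    x = torch.randn(8, 20, 32)  # 8 sequences, 20 steps, 32 features each

    # LSTM carries two state tensors (hidden and cell)
    lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    out_lstm, (h, c) = lstm(x)

    # GRU carries a single state tensor and has fewer parameters,
    # which is why it is often worth trying first
    gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
    out_gru, h_gru = gru(x)

    print(out_lstm.shape, out_gru.shape)  # both torch.Size([8, 20, 64])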


Indeed. Here is a very recent paper finding that attention is certainly not all you need, since recurrence is sometimes necessary.

https://arxiv.org/abs/1906.01603

This is also obvious: without recurrence you cannot remember information that is not externally visible, yet it can be computationally very convenient, and often necessary, to maintain exactly that kind of hidden information.

The hard part is learning representations for the hidden information, as recurrences are plagued by vanishing and shattering gradients.
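
To make the hidden-information point concrete, here is a minimal sketch (PyTorch assumed, arbitrary sizes). The tensor h is internal state that the inputs never expose, and gradients must flow backward through every step of the loop, which is exactly where they vanish or shatter:

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=1, hidden_size=16)  # one recurrent step
    x = torch.randn(8, 20, 1)   # 8 sequences of 20 steps
    h = torch.zeros(8, 16)      # hidden state, never observed directly

    for t in range(x.size(1)):
        h = cell(x[:, t], h)    # state persists across steps; the inputs
                                # alone do not carry it

An attention-only model sees all 20 inputs at once but threads no such persistent state through time.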


What is the modern choice instead of LSTM?


Transformer.
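
That is, the architecture from "Attention Is All You Need". A minimal sketch of an encoder in PyTorch (illustrative sizes; it processes the whole sequence in parallel rather than stepping through it):

    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)

    x = torch.randn(8, 20, 64)  # (batch, seq_len, d_model)
    out = encoder(x)            # same shape; self-attention looks at all
                                # 20 positions at once, with no hidden state
                                # carried between time steps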



