Writing RNNs in Tensorflow (n-s-f.github.io)
129 points by jacobianjacob on July 14, 2017 | 22 comments



Are RNNs a good fit for processing non-textual time series data?

We are looking to replace our ARIMA models with RNNs, and the results so far have been far from satisfactory.

The use case is: based on the sales quantities from the past year, predict tomorrow's sales quantity.

Regression does not account for weekday/weekend effects or similar bumps, so we thought an RNN with LSTM cells would be well suited to this problem.


I would try to include additional indicator features for weekdays or weekends directly in your ARIMA model.

Also, I believe that RNNs are useful mainly for highly non-linear problems. Non-linearities in a problem such as sales forecasting are best handled through interaction terms or by including non-linear transformations (e.g. the logarithm) of existing features in your model.
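
As a rough sketch of what I mean (my own illustration, not from the article), here is an ARIMA model with weekday dummies as exogenous regressors via statsmodels' SARIMAX; `sales` is assumed to be a daily pandas Series indexed by date:

    # Sketch only: weekday indicator features as exogenous ARIMA regressors.
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    def fit_arima_with_weekday_dummies(sales: pd.Series):
        # One 0/1 column per weekday; drop one level so the dummies are not
        # collinear with the model's constant.
        exog = pd.get_dummies(sales.index.dayofweek, prefix="dow", drop_first=True)
        exog.index = sales.index

        # ARIMA(1, 1, 1) with the weekday indicators as exogenous features.
        model = SARIMAX(sales, exog=exog, order=(1, 1, 1))
        return model.fit(disp=False)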


The problem with ARIMA is that it's closer to an exponential regression than to deep learning. We have already added weekday, weekend, and month variables to it, but it is still fundamentally regression rather than deep learning.

So are you saying that an RNN won't be suited to non-linear sales forecasting?


I don't have the details of your exact use case, but I cannot imagine any complex non-linearities involved in your sales process. If ARIMA models produce decent results for your use case I would try to improve the ARIMA model through additional data, rather than switching to deep learning.

If you are convinced that there are complex non-linearities an ARIMA model cannot describe, then I would use an RNN to look for patterns in your ARIMA model's residuals and augment the ARIMA model that way.


>"I cannot imagine any complex non-linearities involved in your sales process"

Knowing nothing about his sales process and features, I would assume there are many complex non-linearities (which could possibly be leveraged for better predictions). I find this statement bizarre.


What I mean by this statement is that there are lots of "tricks" such as interaction terms, regime switching, and non-linear transformations of features to handle non-linearities in linear models (e.g. different food sold before Christmas).

But if you can give me an example of a non-linearity in sales forecasting that cannot be fit by a linear model but can by an RNN I'd honestly be really interested in that.


> If ARIMA models produce decent results for your use case I would try to improve the ARIMA model through additional data, rather than switching to deep learning.

ARIMA models cannot take special scenarios into consideration, or even multivariate features (like location and time). They're good for simple forecasting, but when you need more "human-like" predictions, we are betting on neural networks.

For example, we instinctively know that turkey sales go up around Thanksgiving. An ARIMA model cannot factor this in, but an RNN/LSTM should be able to (theoretically speaking).


There are techniques such as regime switching and indicator functions that can handle exactly this type of interaction in a linear model.

https://en.wikipedia.org/wiki/Dummy_variable_(statistics) https://en.wikipedia.org/wiki/Interaction_(statistics)
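
As a hedged sketch of the idea (the column names `units_sold`, `is_thanksgiving_week`, and `product` are made up for illustration), an interaction term in a statsmodels OLS formula looks like this:

    import statsmodels.formula.api as smf

    def fit_holiday_interaction(df):
        # `is_thanksgiving_week * product` expands into both main effects plus
        # their interaction, so e.g. turkey gets its own Thanksgiving bump.
        model = smf.ols("units_sold ~ is_thanksgiving_week * product", data=df)
        return model.fit()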


I'd suggest lecture 14 of the fast.ai MOOC for advice on feature engineering for time series data, which makes it possible to model with something like regression (or a non-recurrent neural net, which can work surprisingly well).


Where do I find lecture 14? I was only able to find 7 lectures here: http://course.fast.ai/ .


I'm not sure if it's officially out yet but if you search YouTube, you will find it ;)

https://m.youtube.com/watch?v=6lTyqrrWVQ0


May wanna look at Phased LSTMs: https://arxiv.org/abs/1610.09513


Keras has support for this. We will give this a shot too. Thanks!


I read the article, and it seems to be well written, though lacking in places.

For even more customized RNNs, such as attention mechanisms or beam search as in Seq2Seq, you'll need to skip the tf.dynamic_rnn abstraction and use a symbolic loop directly: tf.while_loop.


I think that's covered in the article - there's a passage on using `tf.scan` when the `tf.dynamic_rnn` abstraction won't cut it. `tf.scan` is more flexible than `tf.dynamic_rnn`, but provides a little more scaffolding for RNNs than using `tf.while_loop` directly.
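
For reference, here's a minimal sketch (TF 1.x style, shapes assumed; my own illustration rather than the article's code) of a vanilla RNN built with `tf.scan`, where scan threads the hidden-state accumulator for you:

    import tensorflow as tf

    num_units, num_features = 64, 10
    # Time-major inputs: [time, batch, features].
    inputs = tf.placeholder(tf.float32, [None, None, num_features])

    W_x = tf.get_variable("W_x", [num_features, num_units])
    W_h = tf.get_variable("W_h", [num_units, num_units])
    b = tf.get_variable("b", [num_units], initializer=tf.zeros_initializer())

    def step(prev_h, x_t):
        # prev_h: [batch, units], x_t: [batch, features].
        return tf.tanh(tf.matmul(x_t, W_x) + tf.matmul(prev_h, W_h) + b)

    batch_size = tf.shape(inputs)[1]
    init_h = tf.zeros([batch_size, num_units])

    # hidden_states: [time, batch, units] -- scan handles the accumulator.
    hidden_states = tf.scan(step, inputs, initializer=init_h)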


Using tf.scan is a bad idea.

scan implements strict semantics, so it will always execute the same number of timesteps no matter what the accumulator contains (even NaN).

while_loop implements dynamic execution (it quits once the condition is no longer met) and at the same time allows parallel execution of ops that don't depend on the accumulator.

If you read the code for `dynamic_rnn` and the contrib legacy_seq2seq model, you'll find while_loop. I have yet to see TensorFlow library code using tf.scan anywhere!

Also, internally, scan is defined using while_loop. In my own code, I find scan lacking for RNNs and always have to fall back to while_loop.

Here is a video of a talk by the RNN/Seq2Seq author himself:

https://youtu.be/RIR_-Xlbp7s?t=16m3s


I don't follow. tf.scan will execute as many time steps as there are elements in the input series, which is the same behavior you'd get with tf.while_loop or tf.dynamic_rnn. It does not execute for a fixed number of time steps, which I think is what you're implying?

The difference from using tf.while_loop directly is that tf.scan handles the logistics of an accumulator to keep track of hidden states, so you don't have to implement that piece yourself.

As you say, tf.scan uses tf.while_loop internally; it's not particularly different from something you might build using tf.while_loop yourself.


In neural translation seq2seq, using while_loop in the decoder RNN saves a lot of GPU time because it can quit early when a sentence ends.
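
Roughly (TF 1.x, with names and shapes assumed; this is a sketch of the pattern, not the seq2seq library's actual code), the early-exit idea is a greedy decoder loop that stops as soon as every sequence in the batch has emitted EOS or max_len is reached:

    import tensorflow as tf

    def greedy_decode(cell, init_state, start_tokens, embedding,
                      proj_W, proj_b, eos_id, max_len):
        batch_size = tf.shape(start_tokens)[0]

        def cond(t, state, tokens, finished, outputs):
            # Keep looping only while some sequence is still unfinished.
            return tf.logical_and(t < max_len,
                                  tf.logical_not(tf.reduce_all(finished)))

        def body(t, state, tokens, finished, outputs):
            x = tf.nn.embedding_lookup(embedding, tokens)
            out, new_state = cell(x, state)
            logits = tf.matmul(out, proj_W) + proj_b
            next_tokens = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
            finished = tf.logical_or(finished, tf.equal(next_tokens, eos_id))
            outputs = outputs.write(t, next_tokens)
            return t + 1, new_state, next_tokens, finished, outputs

        outputs = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
        finished = tf.zeros([batch_size], dtype=tf.bool)
        _, _, _, _, outputs = tf.while_loop(
            cond, body,
            [tf.constant(0), init_state, start_tokens, finished, outputs])
        return outputs.stack()  # [decoded_length, batch]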


I see - you're talking about a use case like this: https://github.com/google/seq2seq/blob/4c3582741f846a19195ac...

I agree that you have to use a tf.while_loop in those cases. But then tf.scan isn't an option, so I don't understand what you mean by 'quit early' or 'saves time'.

When tf.scan is possible, i.e. when you have an input sequence you want to scan over, it is a perfectly good option.


Unless you want to execute the structure on multiple GPUs.


I don't understand how that's related.


Do you know whether using tf.while_loop speeds things up? I'm using dynamic_rnn at the moment and it's _so_ slow. I'm not finding implementations using tf.while_loop; there's dynamic_rnn as you said, but that's so convoluted to read (like most TF code...).



