
The problem with time is that it is (typically) not a causal variable. If you are modeling the price of a stock, for example, time is certainly not what is causing it to go up or down! Yes, it is true that the price at time t+1 is highly correlated with the price at time t, but extrapolating outward requires a more sophisticated model that includes the real causal variables.



so then, setting aside treating time itself as a causal variable, it seems like methods that rely on stationary distributions still treat the data, after pre-processing, as i.i.d., rather than predicting values from their correlated history.

I'm interested in methods that don't "subtract" simple "trends" and "seasonality" from the data (which may work for bog-standard templates such as sales data, but not for what I'm interested in), and instead respond to sequential relationships in the data itself, exploiting exactly the correlations you describe directly.


> I'm interested in methods that don't "subtract" simple "trends" and "seasonality"

a 2nd order difference equation can model a single harmonic frequency - that is, if your data is a pure sine sampled at regular intervals, then

    x_n ≈ a*x_{n-1} + b*x_{n-2}
can model any frequency with the proper a and b values (machine precision limits apply in real-world scenarios, of course). That is, if your data looks like a sine wave with a yearly period, you still need no more than one sample per minute and a 2nd order model to filter it out.

It's likely not a perfect sine wave, so you'd need a higher-order model - but if you are incredibly lucky and your underlying periodic signal admits a (relatively) sparse harmonic decomposition, and the signal riding on it has (very) low amplitude compared to the periodic part, you can model very long periods implicitly just by having enough recent samples.
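For illustration, a minimal numpy sketch of that claim (not from the thread; for a pure sinusoid the exact recurrence coefficients are a = 2*cos(omega), b = -1):

    import numpy as np

    # Pure sinusoid sampled at regular intervals (period chosen arbitrarily).
    omega = 2 * np.pi / 365.0
    n = np.arange(2000)
    x = np.sin(omega * n)

    # Exact 2nd-order recurrence for a pure sine: a = 2*cos(omega), b = -1,
    # since sin(w*n) = 2*cos(w)*sin(w*(n-1)) - sin(w*(n-2)).
    a, b = 2 * np.cos(omega), -1.0

    pred = a * x[1:-1] + b * x[:-2]       # predict x[n] from x[n-1] and x[n-2]
    print(np.max(np.abs(pred - x[2:])))   # ~1e-15: machine-precision agreement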


very interesting, thanks!


A time series isn't i.i.d.

The name itself states this: autocorrelation and autoregression. The series regresses on its own past values, hence the "auto". We're interested in past values, so each value depends on the ones before it.

An example of this is taking blood pressure. You would expect that if you take your blood pressure on two consecutive days, the previous day's reading is highly correlated with the present day's.

Whereas if you compare a reading from a year ago to today's, it won't be as correlated. This is why ARIMA deals with correlation at different time lags/lengths.
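A rough numpy illustration of that lag dependence, using a made-up AR(1) "daily reading" series (hypothetical numbers, just to show the effect):

    import numpy as np

    # Hypothetical daily "blood pressure" series: an AR(1) process around 120,
    # so nearby days are similar but the memory fades over long horizons.
    rng = np.random.default_rng(0)
    n, phi = 3 * 365, 0.95
    x = np.empty(n)
    x[0] = 120.0
    for t in range(1, n):
        x[t] = 120.0 + phi * (x[t - 1] - 120.0) + rng.normal(0, 2.0)

    def lag_corr(x, lag):
        # sample correlation between the series and itself shifted by `lag` days
        return np.corrcoef(x[:-lag], x[lag:])[0, 1]

    print(lag_corr(x, 1))    # consecutive days: strongly correlated
    print(lag_corr(x, 365))  # a year apart: close to uncorrelated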


> it seems like using methods that rely on stationary distributions still treat the data, after pre-processing, as i.i.d, rather than predicting values from their correlated history.

Not sure what you mean exactly. Stationarity in time series is not i.i.d. The whole point of ARIMA modelling is to model, after transforming to stationarity, the remaining temporal correlation. ARMA is just a limited-parameter fit of the autocorrelation function.
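A small statsmodels sketch of that workflow on synthetic data (illustrative only, not a recipe): the d=1 term differences toward stationarity, and the AR/MA terms then fit the correlation that remains:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic non-stationary series: drift + random walk + AR(1) noise.
    rng = np.random.default_rng(0)
    n = 500
    ar_noise = np.zeros(n)
    for t in range(1, n):
        ar_noise[t] = 0.7 * ar_noise[t - 1] + rng.normal()
    y = 10 + 0.1 * np.arange(n) + np.cumsum(rng.normal(0, 0.2, n)) + ar_noise

    # d=1 differences the series toward stationarity; the AR and MA terms then
    # model the temporal correlation left in the differenced series.
    fit = ARIMA(y, order=(1, 1, 1)).fit()
    print(fit.summary())
    print(fit.forecast(steps=10))  # forecasts condition on the correlated history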


>If you are modeling the price of a stock for example, time is certainly not what is causing to go up or down!

Actually, time is a valuable feature. E.g., if a stock goes sideways too long, day traders will get out of the trade even if it didn't reach the levels they were looking for. Also, if the market goes up well beyond a trader's expectations in a short amount of time, oftentimes a trader will wait a little bit longer. Likewise, many of the popular indicators day traders use today to be profitable have time as a key ingredient, e.g. TD.


That's what I get for picking an example from a domain I do not understand well! So perhaps I'll relax my statement. Time is one of a large number of explanatory variables. The amount of information you can extract from it will be limited.


Yah, sorry for taking such a rough shot at you.

Your point is incredibly valuable, and if I hadn't been in a hurry I probably would have brought this up:

Most time series analysis, especially when using ML, works best when the time part is stripped out through clever feature engineering. This isn't always possible, which is why the stock market is a pain and ML often doesn't work on it.

I do time series predictive analytics for a living, and most of what I do is clever feature engineering: trying to strip the time domain out of the data and also reducing the number of points as much as possible. The fewer features, the smaller your training set needs to be and the higher your accuracy will be. Note: this is time series classification, not time series forecasting.
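For what it's worth, a toy sketch of what "stripping the time domain out" can look like for classification (the features here are hypothetical; real ones are domain-specific):

    import numpy as np

    def window_features(x):
        # Collapse a raw window of samples into a few order-free summary features.
        # These particular features are illustrative; real sets are domain-specific.
        t = np.arange(len(x))
        slope = np.polyfit(t, x, 1)[0]           # overall trend inside the window
        spectrum = np.abs(np.fft.rfft(x - x.mean()))
        dominant_bin = int(np.argmax(spectrum))  # strongest periodic component
        return np.array([x.mean(), x.std(), slope, dominant_bin])

    # Each labelled window becomes one small feature vector for an ordinary
    # classifier, with the explicit time axis stripped away.
    rng = np.random.default_rng(0)
    window = np.sin(np.linspace(0, 20, 256)) + rng.normal(0, 0.1, 256)
    print(window_features(window))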


To extend your point, time lag is a feature in ARIMA and other statistical time series models.

What you're doing is looking at autocorrelation based on time lags. That's what ACF and PACF graphs are displaying.
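For example, with statsmodels on a synthetic AR(2) series (purely illustrative):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Synthetic AR(2) series; the PACF should cut off after lag 2, which is
    # how these plots guide the choice of lag orders in an ARIMA model.
    rng = np.random.default_rng(0)
    x = np.zeros(1000)
    for t in range(2, len(x)):
        x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + rng.normal()

    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(x, lags=40, ax=axes[0])   # autocorrelation at each time lag
    plot_pacf(x, lags=40, ax=axes[1])  # partial autocorrelation at each lag
    plt.show()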



