Everyone repeat after me: "we need a baseline model".
You should always try some "dumb" models first. You'd be surprised how hard it is to beat a historical-average model with a more sophisticated method (depending, of course, on your KPIs).
Not to mention the plethora of issues that arise from trying to fit an ARIMA onto an AR(1) process... It's weird that people just jump into using insanely complicated models right off the bat.
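To make the "dumb model first" point concrete, here is a minimal sketch of a baseline harness. The synthetic series, the `warmup` parameter, and the function names are all made up for illustration; the idea is only that any fancier model should have to beat these numbers on the same one-step-ahead evaluation.

```python
import random
import statistics

def historical_mean_forecast(history):
    """Forecast the next value as the mean of everything seen so far."""
    return statistics.fmean(history)

def naive_forecast(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def rolling_mae(series, forecaster, warmup=20):
    """One-step-ahead mean absolute error, re-forecasting at every step."""
    errors = [abs(series[t] - forecaster(series[:t]))
              for t in range(warmup, len(series))]
    return statistics.fmean(errors)

# Synthetic data (an assumption for illustration): noise around a constant level.
random.seed(0)
series = [100 + random.gauss(0, 5) for _ in range(300)]

mae_mean = rolling_mae(series, historical_mean_forecast)
mae_naive = rolling_mae(series, naive_forecast)
print(f"historical mean MAE: {mae_mean:.2f}")
print(f"last-value MAE:      {mae_naive:.2f}")
```

Any candidate model can be passed to `rolling_mae` as a function from history to a single forecast, so the comparison stays apples to apples.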
I've seen this in real time. I don't do statistics as part of my day job, but I have enough experience, and keep up with the field well enough, to know what I'm talking about. I've seen senior engineers try to ram in an over-specified ARIMA model just to claim that they'd "improved" the system. It performed far worse than whatever model we were using before (unfortunately I never got to look under the hood of that one), was prone to wild swings in its forecasts, and was eventually deprecated, at which point we reverted to the old model.
I mean... you can always appeal to “old school” AI. Just dig into the old papers and use their words. Latent semantic analysis (LSA) is an example of a hard-to-beat baseline model for text:
“By inducing global knowledge indirectly from co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren.” (http://www.stat.cmu.edu/~cshalizi/350/2008/readings/Landauer...)
I once had a mentor with clout on a 9 figure investment committee tell me that maximum likelihood estimation is "the dumbest idea" he'd ever heard.
Words like "Cramer-Rao bound" didn't get through. What worked was saying "deep learning is usually just MLE with a mysteriously effective function approximation".
Hard to beat in terms of effort vs. quality of the outcome is more accurately what I meant (it’s two lines of code in scikit-learn [CountVectorizer() + TruncatedSVD()] to go from raw text to document/word embeddings, and the result is often “good enough” depending on what you’re trying to do). See the results on pg. 6 (note LSI==LSA): http://proceedings.mlr.press/v37/kusnerb15.pdf
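For reference, a rough sketch of what those two scikit-learn lines look like in context. The toy corpus and `n_components=2` are assumptions chosen only to keep the example tiny; a real corpus would want far more documents and dimensions, and often TF-IDF weighting instead of raw counts.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for illustration.
docs = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stocks fell as markets slid",
    "markets rallied and stocks rose",
]

vectorizer = CountVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)

# The "two lines": raw text -> term counts -> low-rank document embeddings.
counts = vectorizer.fit_transform(docs)
doc_embeddings = svd.fit_transform(counts)

# Word embeddings: one row per vocabulary term, taken from the SVD components.
word_embeddings = svd.components_.T
cat_vector = word_embeddings[vectorizer.vocabulary_["cat"]]
```

`doc_embeddings` has one row per document and `word_embeddings` one row per vocabulary term, both in the same 2-dimensional latent space.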
Also, at least based on papers I’ve read recently, BERT doesn’t work that well for producing word embeddings compared to word2vec and GloVe (which can be formulated as matrix factorization methods, like LSA). See table on pg. 6: https://papers.nips.cc/paper/9031-spherical-text-embedding.p...
Point being: mastering the old models gives you a solid foundation to build from.
To my amateur eyes, the usual method for dealing with 'time series' is really just finding ways to turn a non-stationary distribution into a stationary one, to which you can then apply classic statistical methods. So you're just finding ways to factor out the time component in the data so you can use standard, non-time-sensitive regression models on the transformed data.
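The most common version of that transformation is differencing. A small sketch, assuming a synthetic random walk with drift (the drift of 0.5 and the series length are arbitrary): the levels wander, so their mean depends on where you look, while the first differences have a roughly constant mean.

```python
import random
import statistics

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def half_means(values):
    """Mean of the first half and mean of the second half."""
    mid = len(values) // 2
    return statistics.fmean(values[:mid]), statistics.fmean(values[mid:])

# Synthetic random walk with drift (an assumption for illustration).
random.seed(1)
level, walk = 0.0, []
for _ in range(1000):
    level += 0.5 + random.gauss(0, 1)
    walk.append(level)

steps = difference(walk)

print("levels, first/second half mean:", half_means(walk))
print("diffs,  first/second half mean:", half_means(steps))
```

Comparing half-sample means is only a crude eyeball check, not a formal stationarity test.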
It seems very challenging either to have time as a first-class component in the model or to somehow treat the data points as not independent. Indeed, most models require independence, so it often seems like we try to force the data to look that way through smoothing and transformations. You can assume this is happening any time an algorithm asks you to provide 'stationarity'. It just seems like looking for the keys (prediction) where the streetlight is (model distributions with nice computational properties).
The problem with time is that it is (typically) not a causal variable. If you are modeling the price of a stock, for example, time is certainly not what is causing it to go up or down! Yes, it is true that the price at time t+1 is highly correlated with the price at time t, but extrapolating outwards requires a more sophisticated model that includes the real causal variables.
So then, setting aside making time itself a causal variable, it seems like using methods that rely on stationary distributions still treats the data, after pre-processing, as i.i.d., rather than predicting values from their correlated history.
I'm interested in methods that don't "subtract" simple "trends" and "seasonality" from the data (which may work for bog-standard templates such as sales data, but not for what I'm interested in), but rather respond to sequential relationships in the data itself and directly exploit exactly the correlations you describe.
> I'm interested in methods that don't "subtract" simple "trends" and "seasonality"
A 2nd-order difference equation can model a single harmonic frequency. That is, if your data is a pure sine sampled at regular intervals, then
x_n ≈ a*x_{n-1} + b*x_{n-2}
can model any frequency with the proper a and b values (machine precision limits apply in real-world scenarios, of course). So even if your data looks like a sine wave with a yearly period and you sample it once per minute, a 2nd-order model is still all you need to filter it out.
It's likely not a perfect sine wave, so you'd need a lot more. But if you are incredibly lucky, and the underlying periodic component admits a (relatively) sparse harmonic decomposition, and the signal riding on it has (very) low amplitude compared to the periodic signal, you can model very long periods implicitly just by having enough recent samples.
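For the pure-sine case this can be checked directly. By the standard trig identity sin(w*n + p) = 2*cos(w)*sin(w*(n-1) + p) - sin(w*(n-2) + p), the coefficients are a = 2*cos(w) and b = -1. The sketch below assumes a yearly period sampled once per minute, with an arbitrary phase and run length, and lets the recurrence run on its own outputs.

```python
import math

# Hypothetical setup: a yearly cycle sampled once per minute.
samples_per_period = 365 * 24 * 60
omega = 2 * math.pi / samples_per_period

# For a pure sinusoid, x_n = a * x_{n-1} + b * x_{n-2} holds with:
a = 2 * math.cos(omega)
b = -1.0

def sine(n, phase=0.3):
    """True signal value at sample index n (the phase is arbitrary)."""
    return math.sin(omega * n + phase)

# Seed with two true samples, then feed the recurrence its own outputs.
x_prev2, x_prev1 = sine(0), sine(1)
max_error = 0.0
for n in range(2, 10_000):
    x_n = a * x_prev1 + b * x_prev2
    max_error = max(max_error, abs(x_n - sine(n)))
    x_prev2, x_prev1 = x_prev1, x_n

print(f"max abs error over the run: {max_error:.3e}")
```

Note that `a` is extremely close to 2 here, which is exactly where the machine-precision caveat bites: the error is small but grows as the recurrence runs longer.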
The name itself states this: autocorrelation and autoregression. The series regresses on its own past values, hence the "auto". We're interested in past values, so each value depends on the ones before it.
An example of this is taking blood pressure. You would expect readings taken on two consecutive days to be highly correlated.
Whereas if you compare a blood test from a year ago with today's, they won't be as correlated. This is why ARIMA deals with correlation at different time lags.
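A quick way to see the lag effect is to compute the sample autocorrelation at a short and a long lag. This is a sketch on synthetic "daily readings" where each value is pulled toward the previous one; the coefficient 0.8 and the series length are assumptions, not anything measured.

```python
import random
import statistics

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[t] * centered[t - lag]
                    for t in range(lag, len(series)))
    return numerator / denominator

# Synthetic daily readings (an assumption): today depends on yesterday.
random.seed(2)
phi = 0.8
reading, readings = 0.0, []
for _ in range(5000):
    reading = phi * reading + random.gauss(0, 1)
    readings.append(reading)

acf_one_day = autocorrelation(readings, 1)
acf_one_year = autocorrelation(readings, 365)
print(f"lag 1 day:    {acf_one_day:.2f}")
print(f"lag 365 days: {acf_one_year:.2f}")
```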
> it seems like using methods that rely on stationary distributions still treats the data, after pre-processing, as i.i.d., rather than predicting values from their correlated history.
Not sure what you mean exactly. Stationarity in a time series does not mean i.i.d. The whole point of ARIMA modelling is to model, after transforming to stationarity, the remaining temporal correlation. ARMA is just a limited-parameter fit of the autocorrelation function.
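To illustrate the difference between stationary and i.i.d., here is a small sketch on a synthetic stationary AR(1) series (the true coefficient 0.7 and the train/holdout split are assumptions). If the data were i.i.d., predicting the mean would be the best you could do; fitting a single autoregressive coefficient and predicting from the previous value does noticeably better.

```python
import random
import statistics

# Synthetic stationary AR(1) series (an assumption): x_t = 0.7 * x_{t-1} + noise.
random.seed(3)
x, series = 0.0, []
for _ in range(4000):
    x = 0.7 * x + random.gauss(0, 1)
    series.append(x)

train, holdout = series[:3000], series[3000:]

# Least-squares estimate of the AR(1) coefficient (no intercept; the mean is ~0).
phi_hat = (sum(train[t] * train[t - 1] for t in range(1, len(train)))
           / sum(v * v for v in train[:-1]))

# One-step-ahead squared errors on held-out data.
# "i.i.d." forecast: always predict the long-run mean, which is 0 here.
mse_iid = statistics.fmean([v * v for v in holdout[1:]])
# AR(1) forecast: predict phi_hat times the previous value.
mse_ar1 = statistics.fmean([(holdout[t] - phi_hat * holdout[t - 1]) ** 2
                            for t in range(1, len(holdout))])

print(f"phi_hat = {phi_hat:.2f}")
print(f"MSE predicting the mean: {mse_iid:.2f}")
print(f"MSE using AR(1):         {mse_ar1:.2f}")
```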
> If you are modeling the price of a stock for example, time is certainly not what is causing it to go up or down!
Actually, time is a valuable feature. E.g., if a stock goes sideways for too long, day traders will get out of the trade even if it didn't reach the levels they were looking for. Also, if the market goes up far beyond a trader's expectations in a short amount of time, the trader will often wait a little longer. Likewise, many of the popular indicators day traders use today to be profitable have time as a key ingredient, e.g. TD.
That's what I get for picking an example from a domain I do not understand well! So perhaps I'll relax my statement. Time is one of a large number of explanatory variables. The amount of information you can extract from it will be limited.
Your point is incredibly valuable, and if I wasn't in a hurry I probably would have brought this up:
Most time series analysis, especially when using ML, works best when the time part is stripped out through clever feature engineering. This isn't always possible, which is why the stock market is a pain and ML often doesn't work on it.
I do time series predictive analytics for a living, and most of what I do is clever feature engineering to strip the time domain out of the data while also reducing the number of points as much as possible. The fewer features you have, the smaller your training set needs to be and the higher your accuracy will be. Note: this is time series classification, not time series forecasting.
Are there any good resources on time series modeling and forecasting that go beyond exponential smoothing and variants of ARIMA? Pretty much every tutorial on the web covers exponential smoothing and ARIMA, or is a lazy LSTM tutorial.
Some good free textbooks are Rob Hyndman's online book https://otexts.com/fpp2/ and Brockwell and Davis' old textbook https://link.springer.com/book/10.1007/978-3-319-29854-2. They focus heavily on ARIMA and exponential smoothers because most time series data sets are pretty small (a few dozen to at most a few thousand samples), so there's really not that much else you can do.
Most of Hyndman's textbook approaches (mostly ARIMA and various exponential smoothers) are implemented in his 'forecast' R package.
ARIMA and exponential smoothers tend to be a bit hard to get working well on daily data (they come from the era when most data was monthly or quarterly). A modern take on classical frequency-domain Fourier regression is Facebook Prophet (https://facebook.github.io/prophet/), which tends to work pretty well if you have a few years of daily data.
FPP is great, but limited to the simplest possible timeseries: a single number recorded at evenly-spaced intervals.
Anyone know of good resources for multivariate, multimodal, irregular timeseries forecasting? I know some great practical tools and tutorials (prophet, fast.ai), but I'd love to inject some statistical knowledge like FPP offers.
- Multi-variate: textbook treatments tend to focus mainly on Vector Auto Regression (VAR) models. Unrestricted VARs scale very badly with vector dimension, so they often end up in some regularized form (dimension reduced by PCA or Bayesian priors). Lütkepohl's textbook is the standard reference.
VAR-type models are, in my view, not very practical for most business time series. You should probably not waste too much time on them unless you're really into macro-economic forecasting, in which case you're wasting your time anyway :). (VAR forecast accuracy in macro-economics is not great, to put it mildly, but we have nothing really better.)
An alternative to VARs for multivariate time series is state space models, which are described mostly in Durbin & Koopman's and Andrew Harvey's time series textbooks. These model types were recently popularized in tech circles by Google's CausalImpact R package (though I think that package only implements the univariate model).
- Multi-modal: if you need to model some generic non-Gaussian time series process, you'll need some slow generic simulation method (MCMC, particle filtering). I can't recommend any good reference since I haven't kept up with the literature for about 15 years. I only remember a bunch of dense journal papers from that era (e.g. https://en.wikipedia.org/wiki/Particle_filter#Bibliography).
- Irregular: if the irregularity is mild (filling in a relatively small number of gaps/missing data points), you can use LOESS, smoothing splines, or Kalman filtering, which should all get you pretty similar results. If your time series are extremely irregular, probably no generic method will do well, and you'll need to invest some days/weeks/months into a fairly problem/data-specific method (probably some heavily tuned smoothing spline).
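For the mild-gaps case, here is a minimal sketch of the Kalman filtering idea using the simplest state space model, a random-walk level observed with noise. The variances, the vague prior, and the example numbers are all assumptions. Missing points are handled by skipping the update step, so the estimate carries forward while its uncertainty grows.

```python
def local_level_filter(observations, process_var=0.1, obs_var=1.0):
    """Kalman filter for a random-walk level observed with noise.

    `None` entries are treated as missing: the state is predicted forward
    but not updated, so the variance keeps growing across the gap.
    """
    level, variance = 0.0, 1e6  # vague prior: the first real observation dominates
    levels, variances = [], []
    for y in observations:
        # Predict: a random-walk level keeps its value, only uncertainty grows.
        variance += process_var
        if y is not None:
            # Update: move toward the observation in proportion to the gain.
            gain = variance / (variance + obs_var)
            level += gain * (y - level)
            variance *= 1 - gain
        levels.append(level)
        variances.append(variance)
    return levels, variances

# Made-up series with a three-step gap.
observed = [10.2, 10.8, 11.1, None, None, None, 12.9, 13.4]
levels, variances = local_level_filter(observed)
for y, level, variance in zip(observed, levels, variances):
    print(y, round(level, 2), round(variance, 2))
```

This is only the forward filter, so it holds the last estimate flat across the gap. Actually interpolating through the gap would need a smoothing pass on top, which is omitted here.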
If you're only talking about forecasting, and not medical data or inference, then most statistical models are those plus GARCH variations.
There are multivariate models, but I don't know much about those. Most of the good resources are in the econometrics domain. Multivariate time series within econometrics, from what I've seen, is mostly about portfolio balancing.
For a general overview of the statistics domain I would recommend:
For GARCH:
Financial Modeling Under Non-Gaussian Distributions
If you want to learn more about statistics and time series in medical data, there are (1) longitudinal analysis and (2) survival analysis. There are nonlinear time series models, but those are rare because most of our tools work within the linear setting. There are also circular time series and spatio-temporal statistics, but I don't have any relevant knowledge of those to give you. I'm sure there are others within statistics that I don't know about.
There are 4 papers now, and most of them are on statistical models, which have traditionally dominated this domain. Data science/ML models are slowly getting in there. In M4, the best model was a highly tailored hybrid of ML and statistical techniques; the person who created it was employed by Uber and wrote an article about it.
The 5th competition, M5, is currently underway and split into 2 contests. I'm eagerly waiting to read the paper on the results.
I can recommend this [0] book. It's focused on financial time series and trading, but the techniques covered in the book are generic enough to apply to all kinds of time series, you can just ignore the finance parts. If you search hard enough you can find the PDF for free online. The way they treat convolution operators and efficiently approximate them with fixed-size EMAs was quite interesting to me. It's definitely a bit dated, but that's some of its charm.
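As a rough illustration of why EMAs are attractive for this (this is a plain regular-interval EMA, not the book's own operators, and the prices and `alpha` are made up): an EMA is a convolution with an exponentially decaying kernel, but it can be computed with a single number of state instead of the full history.

```python
def ema_recursive(values, alpha):
    """EMA with O(1) state: only the previous EMA value is kept."""
    ema, out = values[0], []
    for v in values:
        ema = alpha * v + (1 - alpha) * ema
        out.append(ema)
    return out

def ema_convolution(values, alpha):
    """The same quantity written as an explicit convolution over all history."""
    out = []
    for n in range(len(values)):
        weighted = sum(alpha * (1 - alpha) ** k * values[n - k]
                       for k in range(n + 1))
        # Remaining weight belongs to the initial condition (EMA starts at values[0]).
        weighted += (1 - alpha) ** (n + 1) * values[0]
        out.append(weighted)
    return out

# Made-up prices for illustration.
prices = [100, 101, 103, 102, 105, 107, 106]
fast = ema_recursive(prices, alpha=0.3)
slow = ema_convolution(prices, alpha=0.3)
print([round(v, 3) for v in fast])
print([round(v, 3) for v in slow])
```

The two functions agree up to floating-point error, but the recursive form does constant work per new sample, which is what makes it usable on streaming data.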
It hasn't really, at least not in production. Academics are now publishing a lot of papers using Deep Learning or RL, but you won't usually see those in live systems.
In live systems, latency is usually more important than a "better" model: a model that takes milliseconds to make slightly better predictions is too slow when you're working on nano- to microsecond scales, often on specialized hardware. Really, the "AI" part is less important in HFT than you may think. It's often more about systems/infrastructure.
This is for HFT specifically, perhaps it has had more impact on longer time horizons, or something like portfolio management. My impression is (but I may be wrong) that there aren't that many people doing something in between HFT and much longer (minutes to days) time horizons, something like milliseconds to seconds. Maybe there is an opportunity there for some of the newer AI techniques.
Try looking under the name "signal processing" instead. The toolbox under "time series analysis" is usually a variation on the contents of the old book by Box.
The readers interested in this article are probably able to give me good advice. I've been collecting stats daily on myself for the past year (weight, activity, calories consumed, sleep hours, etc) and I would love to be able to explore and extract interesting trends and relationships from the data.
Is there an easy tool where I can just drop in all the data and it presents me with some sort of dashboard? I would love it if the tool could identify and present interesting relationships (e.g. that weight and calories consumed are strongly correlated).
Does anyone know if something like that exists? Or should I start rolling my own using python/pandas?
What would be some good graduate programs (I'm thinking Master's level) in the US that specialize in time series modeling and forecasting? Any available online?
Penn State has a bunch of their graduate stats courses online [1]. I worked through some of their time series class [2] and found it to be pretty good quality.