> In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting
Prominence, yes. But are they generally better than non-deep learning models? My understanding was that this is not the case, but I don't follow this field closely.
From experience in payments/spending forecasting, I've found that deep learning models generally underperform gradient-boosted tree models. Deep learning models tend to be good at learning seasonality but do not handle complex trends or shocks very well. Economic/financial data tends to have straightforward seasonality with complex trends, so deep learning tends to do quite poorly.
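For concreteness, the kind of tree-based setup I mean is roughly the sketch below: gradient-boosted trees on lagged and calendar features. The feature choices, model settings, and placeholder data are illustrative only, not an actual pipeline.

```python
# Sketch only: gradient-boosted trees on lagged + calendar features for a
# daily spending-style series. Features, settings, and data are placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

def make_features(y: pd.Series, n_lags: int = 14) -> pd.DataFrame:
    df = pd.DataFrame({"y": y})
    for lag in range(1, n_lags + 1):
        df[f"lag_{lag}"] = y.shift(lag)          # autoregressive features
    df["dayofweek"] = y.index.dayofweek          # simple seasonality features
    df["month"] = y.index.month
    return df.dropna()

idx = pd.date_range("2022-01-01", periods=730, freq="D")
y = pd.Series(np.random.randn(730).cumsum() + 100.0, index=idx)  # placeholder data

feats = make_features(y)
X, target = feats.drop(columns="y"), feats["y"]
X_train, y_train = X[:-30], target[:-30]         # hold out the last 30 days
X_test, y_test = X[-30:], target[-30:]

model = HistGradientBoostingRegressor(max_depth=6)
model.fit(X_train, y_train)
pred = model.predict(X_test)                     # one-step-ahead forecasts
```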
I do agree with this paper - all of the good deep learning time series architectures I've tried are simple extensions of MLPs or RNNs (e.g. DeepAR or N-BEATS). The transformer-based architectures I've used have been absolutely awful, especially the endless stream of transformer-based "foundational models" that are coming out these days.
Transformers are just MLPs with extra steps, so in theory they should be just as powerful. The problem with transformers is simultaneously their big advantage: they scale extremely well with larger networks and more training data, better than any other architecture out there. So if you had enormous datasets and an unlimited compute budget, you could probably do amazing things in this regard as well. But if you're just a mortal data scientist without extra funding, you will be better off with more traditional approaches.
I think what you say is true when comparing transformers to CNNs/RNNs, but not to MLPs.
Transformers, RNNs, and CNNs are all techniques to reduce parameter count compared to a pure-MLP model. If you took a transformer model and replaced each self-attention layer with a linear layer + activation function, you'd have a pure MLP model that can model every relationship the transformer does, and more possible relationships besides (at the cost of vastly more parameters). MLPs are more powerful/scalable but transformers are more efficient.
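That thought experiment looks roughly like this in PyTorch (my sketch, with arbitrary sizes): a pre-norm block where the token-mixing step is either self-attention or one big linear layer + activation over the flattened sequence.

```python
# Sketch of the "swap attention for a linear layer" thought experiment.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, seq_len=96, d_model=64, use_attention=True):
        super().__init__()
        self.use_attention = use_attention
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        if use_attention:
            self.mixer = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        else:
            # "pure MLP" variant: every (position, channel) pair can interact
            self.mixer = nn.Sequential(
                nn.Flatten(start_dim=1),
                nn.Linear(seq_len * d_model, seq_len * d_model),
                nn.GELU(),
                nn.Unflatten(1, (seq_len, d_model)),
            )
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        h = self.norm1(x)
        mixed = self.mixer(h, h, h)[0] if self.use_attention else self.mixer(h)
        x = x + mixed
        return x + self.ff(self.norm2(x))

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(Block(use_attention=True)))   # ~50k: mixing cost ~4 * d_model**2
print(n_params(Block(use_attention=False)))  # ~37.8M: mixing cost ~(seq_len * d_model)**2
```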
Compared to MLPs, transformers save on parameter count by skimping on the number of parameters devoted to modeling the relationships between tokens. This works in language modeling, where relationships between tokens aren't that important - you can jumble up the words in this sentence and it still mostly makes sense. It doesn't work in time series, where relationships between tokens (timesteps) are the most important thing of all. The LTSF paper linked in the OP paper also mentions this same problem: https://arxiv.org/pdf/2205.13504 (see section 1)
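The linear baselines in that paper are strikingly simple; roughly something like the sketch below (the decomposition/normalization variants are omitted and the sizes are made up):

```python
# Rough sketch of the one-layer linear baseline family from the linked LTSF
# paper (Zeng et al., 2022); details simplified, sizes are placeholders.
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)  # one weight per (past step, future step)

    def forward(self, x):                         # x: (batch, lookback, channels)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)  # (batch, horizon, channels)

model = LinearForecaster(lookback=336, horizon=96)
y_hat = model(torch.randn(8, 336, 7))             # e.g. a 7-variate input window
```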
Though I agree with the idea that MLPs are theoretically more "capable" than transformers, I think seeing them just as a parameter reduction technique is also excessively reductive.
Many have tried to build deep and large MLPs over the years, but at some point adding more parameters stopped improving their performance.
In contrast, transformers became so popular because their modelling power just kept scaling with more and more data and more and more parameters. It seems like the 'restriction' imposed on transformers (the attention structure) is a very good functional form for modelling language (and, increasingly, some tasks in vision and audio).
They did not become popular because they were modest with respect to the parameters used.
>Compared to MLPs, transformers save on parameter count by skimping on the number of parameters
That is only correct if you look at models with equal parameter counts from a purely theoretical perspective. In practice, it is possible to train transformers to orders-of-magnitude bigger scales than MLPs because they are so much more efficient. That's why I said a modern transformer will easily beat these comparatively puny MLP models, but only in cases where data and compute budgets allow it. That is not even a question. If you look at recent time series forecasting leaderboard entries, you'll almost always see transformers at or near the top: https://github.com/thuml/Time-Series-Library
Transformers reduce the number of relationships between tokens that must be learned, too. An MLP has to separately learn all possible relationships between token 1 and 2, and 2 and 3, and 3 and 4. A transformer can learn relationships between specific values regardless of position.
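A tiny demo of the "regardless of position" point (my example, not the commenter's): without positional encodings, self-attention produces the same (correspondingly shifted) output when the same values show up at different positions in the window, whereas a dense layer over the flattened window ties its weights to absolute positions.

```python
# Demo: self-attention without positional encodings is shift-equivariant.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=1, batch_first=True)
x = torch.randn(1, 6, 8)                         # (batch, timesteps, features)
shifted = torch.roll(x, shifts=2, dims=1)        # same values, moved by two steps

out, _ = attn(x, x, x)
out_shifted, _ = attn(shifted, shifted, shifted)
print(torch.allclose(torch.roll(out, 2, dims=1), out_shifted, atol=1e-6))  # True
```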
In my aviation safety work, deep learning outperforms traditional non-DL models for multivariate time-series forecasting. Among deep learning models, I've seen wide variance in performance between transformers, Bi-LSTMs, regular MLPs, VAEs, and so on.
If you have short time series with low variance, little noise and few outliers, strong prior knowledge, or limited resources to train and maintain a model, I would stick with simpler traditional models.
If DL is a good fit for your use-case, then I tend to like transformers or combining CNNs with recurrent models (e.g., BiGRU, GRU, BiLSTM, LSTM) and optional attention.
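One possible shape of that CNN + recurrent + attention combination, as a PyTorch sketch (the layer sizes and the simple attention pooling are arbitrary choices of mine, not a recommended configuration):

```python
# Sketch: Conv1d front-end -> BiGRU -> attention pooling -> forecast head.
import torch
import torch.nn as nn

class CNNBiGRUAttn(nn.Module):
    def __init__(self, n_features: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden, 1)   # simple attention pooling over time
        self.head = nn.Linear(2 * hidden, horizon)

    def forward(self, x):                            # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)   # local patterns
        h, _ = self.gru(h)                                  # (batch, time, 2*hidden)
        w = torch.softmax(self.attn_score(h), dim=1)        # attention weights over time
        ctx = (w * h).sum(dim=1)                            # weighted summary of the window
        return self.head(ctx)                               # (batch, horizon)

model = CNNBiGRUAttn(n_features=12, horizon=24)
y_hat = model(torch.randn(32, 96, 12))                      # placeholder batch
```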
While I don't have firsthand experience with these models, I recently discussed this topic with a friend who has used tree-based models like XGBoost for time series analysis. They noted that transformer-based architectures tend to yield decent performance on time series tasks with relatively little effort compared to tree models.
From what I understood, tree-based models can usually outperform transformers when given sufficient parameter tuning. However, models like TimeGPT offer decent performance without extensive tuning, making them an attractive option for quicker implementations.
A part of my work is literally building nowcasting and other types of prediction models in economics (inflation, GDP, etc.) and finance (market liquidity, etc.). I haven't yet had a chance to read the paper, but overall the tone of "transformers are great for what they do, but LSTM-type models are still very valuable" completely resonates with me.
No, Graphcast is a graph transformer trained on ERA5 weather reconstructions of the atmosphere, not a general time series prediction model. It, by the way, outperforms all traditional global point forecasts (non-ensembles), at least on predicting large-scale global patterns (Z500 and such, at lead times of 3–10 days or so). ECMWF has AIFS, which is a derivative of Graphcast; they'll probably get it, or something similar, into production in a couple of years.
I'd say that's kind of a different task. I'm not a pro in this, but you could maybe treat it as a multi-variate forecast problem where the targets are probabilities per event if n is really small?
I can't speak for all use cases, but I've done a great deal of work in the space of using deep learning approaches for anomaly detection in network device telemetry. In particular, with high-resolution univariate time series of latency measurements, we saw success using convolutional autoencoders and GANs. These methods lean on reconstruction loss rather than forecasting, but are still effective.
There is some prior art for this that we leaned on [1][2].
RE: transformers — I did some early experimentation with Temporal Fusion Transformers [3] which worked pretty well for forecasting compared to other deep learning methods, but rarely did I see it outperform standard baselines (like ARIMA) in our datasets.
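For a sense of what the reconstruction-loss approach looks like, here is a very rough sketch (mine, not the actual production system): a 1-D convolutional autoencoder over fixed-length windows of a univariate latency series, flagging windows whose reconstruction error crosses a simple threshold. The architecture and the 3-sigma rule are placeholders.

```python
# Sketch: conv autoencoder + reconstruction-error threshold for anomaly detection.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):                     # x: (batch, 1, window)
        return self.decoder(self.encoder(x))

model = ConvAE()                              # would normally be trained on "normal" windows
x = torch.randn(64, 1, 128)                   # placeholder latency windows
recon = model(x)
err = ((recon - x) ** 2).mean(dim=(1, 2))     # per-window reconstruction error
threshold = err.mean() + 3 * err.std()        # e.g. a simple 3-sigma rule
anomalous = err > threshold
```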
There is no such thing as a generally best model, due to the no-free-lunch theorem. What works in hedge funds will be bad in other areas that need fewer or different inductive biases, because those areas have more or less data, and different kinds of data.
Some funds that tried to recruit me were really interested in classical generative models (ARMA, GARCH, HMMs with heavy-tailed emissions, etc.) extended with deep components to make them more flexible. Pyro and Kevin Murphy's ProbML vol II are a good starting point to learn more about these topics.
The key is to understand that in some of these problems, data is relatively scarce, and it is really important to quantify uncertainty.
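As a minimal sketch of that "classical generative model with a full posterior" flavor, here is an AR(1) with heavy-tailed Student-t noise in Pyro, fit with NUTS so the forecast comes with uncertainty. The priors and the data are placeholders of mine, not anything from an actual fund's setup.

```python
# Sketch: AR(1) with Student-t emissions, full posterior via NUTS in Pyro.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def ar1_student_t(y):
    phi = pyro.sample("phi", dist.Uniform(-1.0, 1.0))    # AR coefficient (placeholder prior)
    sigma = pyro.sample("sigma", dist.HalfNormal(1.0))   # noise scale
    nu = pyro.sample("nu", dist.Gamma(2.0, 0.1))         # tail heaviness
    with pyro.plate("time", len(y) - 1):
        pyro.sample("obs", dist.StudentT(nu, phi * y[:-1], sigma), obs=y[1:])

y = torch.randn(300)                                     # placeholder series
mcmc = MCMC(NUTS(ar1_student_t), num_samples=500, warmup_steps=500)
mcmc.run(y)
posterior = mcmc.get_samples()                           # draws of phi, sigma, nu

# One-step-ahead predictive draws, rather than a single point forecast.
pred = dist.StudentT(posterior["nu"], posterior["phi"] * y[-1], posterior["sigma"]).sample()
```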
I know next to nothing about this. How do people make use of forecasts that don't provide an uncertainty estimate? It seems like that's the most important part. Why hasn't Bayesian statistics taken over completely?
Bayesian inference is costly and adds a significant amount of complexity to your workflow. But yes, I agree, the way uncertainty is handled is often sloppy.
Maximum likelihood estimates are very frequently atypical points in the posterior distribution. It is unsettling to hear people are using this and not computing the entire posterior.
For example, satellite imagery of trucking activity correlated to specific companies or industries.
It's all signal processing at some level, but directly modeling the time series of price or other asset metrics doesn't have the alpha it may have had decades ago.
Time series forecasting works best in deterministic domains. None of the published LLM/AI/deep/machine learning techniques do well in the stock market. Absolutely none. We've tried them all.