No, they can't - or at least, this paper doesn't provide any compelling evidence that they can.
I read this paper when it first came out a few years ago and produced an implementation of the signal. It is heavily overfitted to historical data - many plausible alternative assumptions about which keywords are predictive are not profitable even in backtest, let alone useful as a basis for future trading.
This is an unfortunate example of non-finance domain experts, who I'm sure are more than capable in their respective fields, making egregious errors when they try to apply their knowledge in finance.
> I thought the common practice was using part of the historical data for creating the model, and another sizable, non overlapping chunk to validate it.
One problem is that too often, people break the data into a training set and a testing set. Then they train N algos on the training data, test them on the testing data, and then trade on the algo that tested best.
Once you use the testing set for more than one algo, it's really a meta-training set.
Really, you need a training set, a testing set, and a validation set. If you use the validation data set with more than one algo, it's no longer a validation set.
So, you train N algos, test N algos. Pick the best, and validate it. If validation fails, do you have enough discipline to wait for more data to come in and try again? Most people do not and will make hand-wavy arguments about why it's okay to re-shuffle the same data into 3 data sets and try again.
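The discipline described above - select on the test set, then touch the validation set exactly once - can be sketched as follows. This is a minimal illustration, not from the paper: the data is synthetic, and the "N algos" are stand-ins (ridge regression at a few penalty strengths).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=900)

# Chronological three-way split: train / test / validation.
X_tr, X_te, X_va = X[:500], X[500:700], X[700:]
y_tr, y_te, y_va = y[:500], y[500:700], y[700:]

def fit_ridge(X, y, alpha):
    # Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Train N candidate algos on the training set only.
weights = {a: fit_ridge(X_tr, y_tr, a) for a in (0.1, 1.0, 10.0)}

# Picking the winner on the test set makes it a meta-training set...
best_alpha = min(weights, key=lambda a: mse(X_te, y_te, weights[a]))

# ...so the validation set is used exactly once, on the winner only.
# If this number is bad, the honest move is to wait for new data.
val_mse = mse(X_va, y_va, weights[best_alpha])
print(val_mse)
```

The point of the structure is that `X_va`/`y_va` appear on exactly one line; as soon as a second candidate is scored against them, they have become a second meta-training set.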
It's an infinite regression. You keep needing more data to be completely 'fair'. If the data set is finite, you eventually use all of it. Then where do you go?
Another route is to model the data source, and train on the model (which you can run forever to get endless data). Then test on the real-world data. But that's only as good as the model.
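A toy version of that route, with the caveat baked in: fit a generative model to part of the real data (here an AR(1) process, purely an assumed choice), train on as much synthetic data from the model as you like, and evaluate only on held-out real data. The result is only as good as the AR(1) assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "real" series (in practice, market data).
phi_true = 0.8
real = np.zeros(1000)
for t in range(1, real.size):
    real[t] = phi_true * real[t - 1] + rng.normal(scale=0.3)

fit_data, holdout = real[:500], real[500:]

# Model the data source: estimate the AR(1) coefficient and noise scale.
phi_hat = float(fit_data[1:] @ fit_data[:-1] / (fit_data[:-1] @ fit_data[:-1]))
sigma_hat = float(np.std(fit_data[1:] - phi_hat * fit_data[:-1]))

# "Run the model forever": train on a long simulated series (here, the
# training step is just re-estimating the one parameter on synthetic data).
sim = np.zeros(100_000)
for t in range(1, sim.size):
    sim[t] = phi_hat * sim[t - 1] + rng.normal(scale=sigma_hat)
phi_model = float(sim[1:] @ sim[:-1] / (sim[:-1] @ sim[:-1]))

# Test on real-world data only.
test_mse = float(np.mean((holdout[1:] - phi_model * holdout[:-1]) ** 2))
print(test_mse)
```

If the real source isn't actually AR(1), the endless synthetic data just lets you fit the model's errors very precisely - which is the "only as good as the model" problem.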
> This is an unfortunate example of non-finance domain experts, who I'm sure are more than capable in their respective fields, making egregious errors when they try to apply their knowledge in finance.
Full ACK to this statement. I remember this post from when it was written in 2013 (by the way, can that date be put in the title?), alongside a similar paper arguing that Twitter hashtags/likes/retweets could serve as a market signal - mostly for this excellent response:
> I read this paper when it first came out a few years ago, and produced an implementation of the signal. They have heavily overfitted to historical data - many plausible alternative assumptions for which keywords are predictive are not profitable in backtest at all, let alone useful as a basis for future trading.

> This is an unfortunate example of non-finance domain experts, who I'm sure are more than capable in their respective fields, making egregious errors when they try to apply their knowledge in finance.
https://xkcd.com/1570/