"Random data" didn't mean a random data set from a different domain. It meant ra...

scawf · on Jan 21, 2020

> It meant random data from the same domain - simulated price/volume data within a reasonable range

How do you know what a reasonable range is without a hypothesis about the price distribution?

Where do these hypotheses come from? Historical data?

So... is that really valid?

theincredulousk · on Jan 21, 2020

Yes. You know the lower bound on price is 0, and an upper bound of infinity is probably of no practical value, so you can pick something like 10x or 100x the all-time max. Volume is the same: 0 to infinity, but again you can pick a distribution that is much (10-100x) wider than the real one. The wider the better, as it will better uncover tail risks and payoffs for highly unusual or atypical events (see Taleb, The Black Swan, etc.).
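To make the "pick a deliberately wide range" idea concrete, here is a minimal sketch. Everything in it is my own assumption for illustration: the function names, the log-uniform choice of distribution, the `floor` value standing in for the lower bound of 0, and the example all-time maxima are hypothetical, not something from a real trading system.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def synthetic_bars(n, max_price_seen, max_volume_seen, widen=100.0, seed=0):
    """Return n (price, volume) pairs drawn from a range `widen` times wider
    than the observed maxima. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    price_hi = max_price_seen * widen
    volume_hi = max_volume_seen * widen
    # A true lower bound of 0 has no logarithm, so use a small positive floor.
    floor = 0.01
    return [
        (sample_log_uniform(rng, floor, price_hi),
         sample_log_uniform(rng, 1.0, volume_hi))
        for _ in range(n)
    ]


# Hypothetical all-time maxima: price 500, volume 1,000,000.
bars = synthetic_bars(1000, max_price_seen=500.0, max_volume_seen=1_000_000.0)
```

Log-uniform sampling is just one choice; it spreads samples evenly across orders of magnitude, so both tiny and extreme values show up. Any other wide distribution would serve the same purpose.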

I'm not making this up - this is how model testing is actually done, in multiple domains. Simulation is one reason banks, HFTs, hedge funds, etc. use massive compute infrastructure: doing it the right way, with many millions of plausible data sets, requires orders of magnitude more computing resources than back-testing on one data set that just happens to represent one way things could have played out (i.e. reality).

Thinking that one historical data set is somehow special (in itself, without context) is largely a delusion. In fact, you can generate price charts that look almost exactly like real historical ones using nothing but a random walk seeded with an opening price.
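A rough sketch of what I mean by a random walk seeded with an opening price. I'm assuming a geometric walk (Gaussian steps on the log price) so the price stays positive; the function name and the `daily_vol` default are hypothetical choices of mine, not calibrated to any market.

```python
import math
import random


def random_walk_prices(opening_price, n_steps, daily_vol=0.01, seed=None):
    """Generate a price path as a geometric random walk.

    Each step multiplies the previous price by exp(g), where g is a
    Gaussian draw with mean 0 and standard deviation daily_vol.
    """
    rng = random.Random(seed)
    prices = [opening_price]
    for _ in range(n_steps):
        step = rng.gauss(0.0, daily_vol)
        prices.append(prices[-1] * math.exp(step))
    return prices


# Five alternative "histories" of about one trading year each,
# all starting from the same opening price of 100.
paths = [random_walk_prices(100.0, 252, seed=s) for s in range(5)]
```

Plot a few of these next to a real chart and each seed gives a different way things could have played out from the same opening price, which is the point about a single historical path not being special.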