Show HN: EDW, quantitative analytics, machine learning. (mediafederation.com)
70 points by a904guy on Jan 24, 2011 | 18 comments



Looks impressive, particularly if those are real trading results.

One concern I immediately have is overfitting, particularly for claims about how various difficult values have been optimized to be the "best possible". It looks like the parameter space in use is truly enormous and so it would be very easy to come up with hypotheses that perform fantastically on your dataset but terribly in real life. This seems like it would be a first-order concern, while the ability to run tests in a single day seems second-order if those tests are producing garbage outputs.


I accidentally down voted but meant to up vote. I agree.

Also, this appears to have no risk-oriented portfolio construction. You are calculating alphas somewhere, right?

(not to belittle this or anything. I'm just not a big believer in technical analysis, which is what this feels like. You should apply this kind of focus to real stat arb.)


if you use some kind of regularization (e.g., if you think the parameters are sparse) it's possible to fit such a large space without overfitting. this is common in machine learning and statistics (e.g., "N less than p" problems, where N is the number of data points and p is the dimensionality of the parameterization, common in genomics).

also provided the test data (i.e., the data it's bidding on) is not used in training, this should be a fair(ish) test; overfitting on the training data should lead to poor test performance.

however it's not clear to me that the data aren't being used twice, and what machine learning is actually going on... so in the best of worlds it could be ok, but given the scant details it could be all bunkum...
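to make the regularization + held-out-test point concrete, here's a toy sketch (scikit-learn Lasso on synthetic data; this is just an illustration of the idea, not anything EDW describes):

    # sparse (L1-regularized) regression in an "N < p" setting, scored strictly
    # on held-out data; synthetic data, purely illustrative
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    N, p = 200, 1000                                # far more candidate indicators than samples
    X = rng.normal(size=(N, p))
    true_coef = np.zeros(p)
    true_coef[:5] = [1.5, -2.0, 1.0, 0.5, -0.75]    # only 5 of the 1000 actually matter
    y = X @ true_coef + rng.normal(scale=0.5, size=N)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = Lasso(alpha=0.1).fit(X_train, y_train)

    print("nonzero coefficients:", np.count_nonzero(model.coef_))
    print("train R^2:", model.score(X_train, y_train))
    print("test R^2: ", model.score(X_test, y_test))    # the only number that matters

overfitting shows up immediately as a big gap between the train and test scores; if the test data leaked into training, that check is worthless.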


Looking at the web interface, it is claiming a 92% success rate on trades. That clearly indicates to me that those results are "in-sample". In other words, the model was trained on some data and then backtested on that same data. In-sample results are essentially worthless. I used to work for a quant hedge fund, and at least for daily trading, 60% correct would be a great result. There's no way you can get 92%.

Finding a useful financial signal is not primarily a search problem through a giant space of potential indicators. It is all about controlling for overfitting, and ensuring that the signal continues into the future. Also, I saw no mention of transaction costs for the trading strategy, which can often turn a great strategy into a money losing one.
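On transaction costs, even a back-of-the-envelope number makes the point (the figures below are made up for illustration):

    # toy example: a small per-trade edge disappears once round-trip costs are included
    gross_edge_per_trade = 0.0010    # 10 bps average gross gain per trade (hypothetical)
    commission_per_side  = 0.0002    # 2 bps each way (hypothetical)
    slippage_per_side    = 0.0004    # 4 bps each way (hypothetical)

    round_trip_cost = 2 * (commission_per_side + slippage_per_side)
    net_edge = gross_edge_per_trade - round_trip_cost
    print(f"net edge per trade: {net_edge:+.4%}")    # -0.0200%: the "great" strategy loses money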


Yep, it's the same thing as "testing on the training set". With memorization, one can achieve almost 100% on the training set. I'm not saying that EDW is doing this, but I'd be very surprised if they could get a 92% success rate on real, unseen test data. Heck, most Wall St companies would _kill_ for anything approaching 60%.
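A quick toy demonstration of how hollow in-sample accuracy is (a 1-nearest-neighbour "memorizer" on pure noise; my example, not something I'm accusing EDW of doing):

    # a model that memorizes the training set scores ~100% in-sample even when
    # the labels are pure coin flips, and ~50% out-of-sample
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 20))
    y = rng.integers(0, 2, size=1000)        # random up/down labels: zero real signal

    X_train, y_train = X[:500], y[:500]
    X_test, y_test = X[500:], y[500:]

    memorizer = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    print("in-sample accuracy:    ", memorizer.score(X_train, y_train))   # 1.0
    print("out-of-sample accuracy:", memorizer.score(X_test, y_test))     # ~0.5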


Since this article doesn't mention it clearly, the demo is really nice, and worth a look: http://edwardworthington.com/


Why don't the values for year-to-date and month-to-date match? Is it a fake mock or a bug?

Why is Liquid Equity significantly greater than Account Value? Is it another fake mock thing, or are you currently running leverage slightly greater than 3x?


Based on the headline, I was really expecting a post explaining that edw519 had been silently replaced by a bot for the past few weeks, and that we've all been participants in a Turing Test.

For a moment there, I was seriously impressed.


This looks interesting. Even though it's closed-source, the architecture is worth a look.

As an amateur, I'm always stymied by the lack of data. For intra-day trading, where do you get the data?


Isn't that data available if you're willing to pay enough for it? For example: http://nseindia.com/content/research/res_histdata.htm


Anything is available if you're willing to pay enough for it... ;-)

As an amateur, the "enough" in the above statement is close to 0.

Plus, I don't know what I'd do with NSE data. I'm looking for NASDAQ/NYSE data.


What about EOD only? Iirc mildew was only a few hundred a month.


There is data everywhere, and a lot of it is crap. Before you ask yourself what kind of data you need, you must ask yourself what kind of models you want to build.

You can buy data directly from the exchanges. Every exchange will sell you historical "Depth of Book" data. This data is typically a direct, lossless representation of every limit order book message sent throughout the day. It allows you to fully reconstruct every event for every stock that trades on that exchange (a toy sketch of that reconstruction follows the links below). Some examples:

http://www.nasdaqtrader.com/Trader.aspx?id=ITCH

http://www.nyxdata.com/Data-Products/ArcaBook-FTP
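Roughly, "reconstructing the book" means replaying those messages into a price-level structure. A stripped-down sketch (the message format here is simplified and made up; real ITCH/ArcaBook feeds are binary, per-exchange formats with many more message types):

    # toy limit-order-book reconstruction from a simplified, hypothetical message stream
    from collections import defaultdict

    bids, asks = defaultdict(int), defaultdict(int)   # price -> total resting size
    orders = {}                                       # order id -> (side, price, size)

    def apply_message(msg):
        if msg["type"] == "add":
            book = bids if msg["side"] == "B" else asks
            book[msg["price"]] += msg["size"]
            orders[msg["id"]] = (msg["side"], msg["price"], msg["size"])
        elif msg["type"] in ("cancel", "execute"):    # treat a full fill like a cancel
            side, price, size = orders.pop(msg["id"])
            book = bids if side == "B" else asks
            book[price] -= size
            if book[price] <= 0:
                del book[price]

    for m in [
        {"type": "add", "id": 1, "side": "B", "price": 10.00, "size": 300},
        {"type": "add", "id": 2, "side": "S", "price": 10.02, "size": 500},
        {"type": "execute", "id": 1},
    ]:
        apply_message(m)

    print("best bid:", max(bids, default=None), "best ask:", min(asks, default=None))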

This data is pretty expensive: about $1,000 per month per exchange, on average. There are other sources for this type of data as well. The magazine Automated Trader sells some of it, and there are market data vendors that will sell it to you too. They basically do a network capture of the real-time feed and sell you the dumps. If you go through one of these guys you can get the data for 1/2 to 1/3 of what the exchange charges directly--but beware, and make sure you validate it well.

You only need this kind of data if you're measuring time frames in the milliseconds. Typically, you'd be co-locating at an exchange site for connectivity.

If you care about intraday data at the minute level rather than the millisecond level, you have more options. A lot of firms will sell "intraday" or "tick by tick" data, but you really have to be careful. Much of it is crap: for example, Interactive Brokers' "tick data" is sampled. They artificially downsample the tick stream, so what you think is "tick by tick" is really just IB's representation of it. My suggestion: check out Nanex NxCore. They have real-time and historical products. Their prices are reasonable, their data is good, and their product is great and easy to use.

Daily data can be had from numerous sources. Yahoo data is one of the more widely used because they make downloading it easy. When dealing with daily data, finding historical, survivorship-bias-free data is very hard. Purchasing such data from a reputable firm that actually guarantees it is survivorship-bias-free makes it more expensive.
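For the Yahoo route, something like this works (using the unofficial yfinance package; the library choice is just my example, and free data like this only covers currently listed tickers, so it does nothing for the survivorship-bias problem):

    # minimal daily-bar download sketch; yfinance is an unofficial Yahoo client
    import yfinance as yf

    bars = yf.download("SPY", start="2010-01-01", end="2011-01-01")
    print(bars[["Open", "High", "Low", "Close", "Volume"]].tail())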

So, what kind of models do you want to play with? That will set you down a path to find the right kind of data. Just getting depth of book data from the exchanges would be great. However, you'd spend a great deal of time dealing with non-trading things: How do you store it--it's at least 10GB of new data a day? Gotta build feed handlers for it, each one is different. Uh oh, the exchange added a field, gotta update the feed handler. Before you know it you want to design a normalized format and process all your raw data into that. Lots of work.
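For what it's worth, the "normalized format" usually boils down to a single record type that every per-exchange feed handler maps its raw messages into; the field names below are just an example:

    # one hypothetical normalized record so downstream code sees a single schema
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class NormalizedBookEvent:
        timestamp_ns: int   # exchange timestamp, nanoseconds since midnight
        symbol: str
        exchange: str       # e.g. "NASDAQ", "ARCA"
        event: str          # "add" | "cancel" | "execute" | "replace"
        side: str           # "B" or "S"
        price: float        # ideally a fixed-point int in production
        size: int
        order_id: int

    evt = NormalizedBookEvent(34_200_000_000_000, "AAPL", "NASDAQ", "add", "B", 10.00, 300, 1)
    print(evt)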


Applying this kind of engineering effort to building a huge "casino winning" piece of software is a waste. What happens if it works? The super-rich who can build and maintain such machines get richer, and the rest of us are left with debts and taxes. I hope experiments like this accelerate the push toward a Tobin tax: fast speculative machines (nanotrading) will just die away and we will get back to "investment"-based trading rather than a casino.


When modelling hypothetical trades do you account for slippage and transaction costs?


If you look at his P&L sheet he has some largish drawdowns, with a lot of neutralish trades. I'd prefer to see smaller, but more consistent wins.
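For anyone wanting to put a number on that from the P&L sheet, max drawdown is easy to compute (toy equity curve below, not his actual numbers):

    # max drawdown: largest peak-to-trough drop as a fraction of the running peak
    import numpy as np

    equity = np.array([100.0, 104.0, 101.0, 108.0, 96.0, 99.0, 110.0])  # made-up curve
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (equity - running_peak) / running_peak
    print(f"max drawdown: {drawdowns.min():.1%}")    # -11.1% (trough 96 after peak 108)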



Can anyone comment on their experience with OptionsHouse?



