
This looks interesting. Even though it's closed-source, the architecture is worth studying.

As an amateur, I'm always stymied by the lack of data. For intra-day trading, where do you get the data?




Isn't that data available if you're willing to pay enough for it? For example: http://nseindia.com/content/research/res_histdata.htm


Anything is available if you're willing to pay enough for it... ;-)

As an amateur, the "enough" in the above statement is close to 0.

Plus, I don't know what I'd do with NSE data. I'm looking for NASDAQ/NYSE data.


What about end-of-day (EOD) data only? IIRC, mildew was only a few hundred a month.


There is data everywhere, and a lot of it is crap. Before you ask yourself what kind of data you need, you must ask yourself what kind of models you want to build.

You can buy data directly from the exchanges. Every exchange will sell you historical "Depth of Book" data. This data is typically a direct, lossless representation of every limit order book message sent throughout the day. It allows you to fully reconstruct every event for every stock that trades on that exchange. Some examples:

http://www.nasdaqtrader.com/Trader.aspx?id=ITCH

http://www.nyxdata.com/Data-Products/ArcaBook-FTP
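To make "fully reconstruct every event" concrete, here is a minimal sketch of rebuilding one symbol's book from add/reduce/delete messages. The message types, field names, and the `Book` class are hypothetical simplifications, not the actual ITCH or ArcaBook message layouts; real feeds have more message types (replaces, cross trades, halts) and each spec defines its own fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    side: str    # "B" for buy, "S" for sell
    price: int   # price in integer units (e.g. 1/10000 dollar) to avoid float error
    shares: int  # shares still resting on the book


class Book:
    """Hypothetical single-symbol book rebuilt from add/reduce/delete messages."""

    def __init__(self):
        self.orders = {}  # order reference number -> Order
        # Aggregated depth per side: price -> total resting shares
        self.levels = {"B": defaultdict(int), "S": defaultdict(int)}

    def add(self, ref, side, price, shares):
        self.orders[ref] = Order(side, price, shares)
        self.levels[side][price] += shares

    def reduce(self, ref, shares):
        # Handles both partial executions and partial cancels.
        order = self.orders[ref]
        shares = min(shares, order.shares)
        order.shares -= shares
        level = self.levels[order.side]
        level[order.price] -= shares
        if level[order.price] == 0:
            del level[order.price]  # drop empty price levels
        if order.shares == 0:
            del self.orders[ref]  # order is fully gone

    def delete(self, ref):
        self.reduce(ref, self.orders[ref].shares)

    def best_bid(self):
        return max(self.levels["B"], default=None)

    def best_ask(self):
        return min(self.levels["S"], default=None)
```

Replaying a full day of messages through something like this, in order, is what lets you ask "what did the book look like at 10:31:07.123?" The catch is that every message has to be present and in sequence, which is why the data quality issues below matter.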

This data is pretty expensive: on average, about $1,000 per month per exchange. There are other sources for this type of data as well. The magazine Automated Trader sells some of it, and some market data vendors will sell it to you too. They basically do a network capture of the real-time feed and sell you the dumps. If you go through one of these guys, you can get the data for 1/2 to 1/3 of what the exchange charges directly. But beware, and make sure you validate it well.
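One basic validation for a captured feed is checking for dropped or duplicated messages. A minimal sketch, assuming each captured message carries an integer sequence number that should increase by exactly 1 (many exchange feeds number their messages, but check the spec for the feed you buy); `find_gaps` is a hypothetical helper name:

```python
def find_gaps(seq_numbers):
    """Return (missing_ranges, duplicate_count) for a capture's sequence numbers.

    Assumes sequence numbers should increase by exactly 1 per message.
    missing_ranges is a list of inclusive (first, last) tuples of lost messages.
    """
    missing = []
    duplicates = 0
    expected = None
    for seq in seq_numbers:
        if expected is None or seq == expected:
            expected = seq + 1
        elif seq > expected:
            missing.append((expected, seq - 1))  # messages the capture never saw
            expected = seq + 1
        else:
            duplicates += 1  # replayed or out-of-order message
    return missing, duplicates
```

For example, `find_gaps([1, 2, 3, 6, 7, 7, 10])` reports the lost ranges `(4, 5)` and `(8, 9)` plus one duplicate. A clean capture should return no gaps at all; anything else means the reconstructed book will be wrong from that point on.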

You only need this kind of data if you're measuring time frames in the milliseconds. Typically, you'd be co-locating at an exchange site for connectivity.

If you care about intraday data at a resolution of minutes rather than milliseconds, you have more options. A lot of firms will sell "intraday" or "tick by tick" data, but you really have to be careful. Much of it is crap: for example, Interactive Brokers' "tick data" is sampled. They artificially downsample the tick stream, so what you think is "tick by tick" is really just IB's representation of it. My suggestion: check out Nanex NxCore. They have a real-time and a historical product. Their prices are reasonable, their data is good, and their product is great and easy to use.
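If minutes are your time frame, a common first step is rolling trade ticks up into one-minute OHLCV bars. A minimal sketch, assuming trades arrive time-sorted as `(epoch_seconds, price, size)` tuples; `minute_bars` is a hypothetical helper. It also shows why sampled ticks hurt: any print missing from the input simply never reaches the high, low, or volume.

```python
def minute_bars(trades):
    """Aggregate time-sorted (epoch_seconds, price, size) trades into 1-minute OHLCV bars.

    Returns a dict keyed by the epoch second at which each minute starts,
    in chronological order (dicts preserve insertion order in Python 3.7+).
    """
    bars = {}
    for ts, price, size in trades:
        minute = int(ts // 60) * 60  # start of the minute containing this trade
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars
```

Run the same minute through this twice, once with every trade and once with only a sampled subset, and the sampled bar will typically understate volume and can miss the true high or low.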

Daily data can be had from numerous sources. Yahoo data is one of the more widely used ones because they make downloading it easy. When dealing with daily data, finding historical, survivorship-bias-free data is very hard. Buying it from a reputable firm that actually guarantees the data is free of survivorship bias costs more.
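Survivorship bias is easy to see with a toy example. The tickers and returns below are made up purely for illustration: if your historical universe only contains stocks that still trade today, the names that collapsed and got delisted silently drop out, and the average return looks much better than what an investor could actually have earned.

```python
# Hypothetical one-year returns; the flag marks names that were later delisted.
UNIVERSE = {
    "AAA": (0.12, False),
    "BBB": (0.05, False),
    "CCC": (-0.60, True),  # went bankrupt during the year
    "DDD": (-0.30, True),  # delisted after a collapse
}


def average_returns(universe):
    """Return (average over the full universe, average over survivors only)."""
    all_returns = [r for r, _ in universe.values()]
    survivors = [r for r, delisted in universe.values() if not delisted]
    return sum(all_returns) / len(all_returns), sum(survivors) / len(survivors)


full_avg, survivor_avg = average_returns(UNIVERSE)
print(f"full universe: {full_avg:.4f}, survivors only: {survivor_avg:.4f}")
```

A data source built from today's listings gives you only the second number, which is why survivorship-free history has to record the dead names too.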

So, what kind of models do you want to play with? That will set you down a path to find the right kind of data. Just getting depth of book data from the exchanges would be great. However, you'd spend a great deal of time dealing with non-trading things: How do you store it, when it's at least 10GB of new data a day? Gotta build feed handlers for it, and each one is different. Uh oh, the exchange added a field, gotta update the feed handler. Before you know it, you want to design a normalized format and process all your raw data into that. Lots of work.
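A normalized format can start as one record type plus a small adapter per feed. A minimal sketch; both raw formats and every field name below are hypothetical, not any exchange's real layout. Feed A is assumed to deliver dicts with integer prices in 1/10000 dollars and space-padded symbols, feed B comma-separated text with decimal prices.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Event:
    ts_ns: int      # assumed convention: nanoseconds since midnight
    venue: str      # which raw feed the event came from
    symbol: str
    side: str       # "B" or "S"
    price: Decimal  # exact decimal dollars, no float rounding
    shares: int


def from_feed_a(msg):
    # Hypothetical feed A: dict with integer price in 1/10000 dollars
    # and a space-padded symbol.
    return Event(msg["ts"], "feed_a", msg["stock"].strip(), msg["side"],
                 Decimal(msg["price"]) / 10000, msg["shares"])


def from_feed_b(line):
    # Hypothetical feed B: "ts_ns,symbol,side,price,qty". Slicing to the
    # first five fields means a newly appended trailing field won't break it.
    ts, sym, side, px, qty = line.split(",")[:5]
    return Event(int(ts), "feed_b", sym, side, Decimal(px), int(qty))
```

Everything downstream (book building, bars, backtests) then only has to understand `Event`, so when an exchange changes its format you touch one adapter instead of the whole pipeline.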



