The only drawback is the limitation to 1-minute resolution. Still good enough if you are not doing HFT.
If you are doing HFT, they had packages like "all you can eat" feeds. It was around $500/month for the BTCUSD pair on any 5 of the 75 supported exchanges, with different pricing for different latency offers (going all the way to colo!), using a custom client and software integration (for an SLA on latency targets).
I'd have to ask if they now provide historical data.
If you have precise symbols, date ranges and exchanges you're interested in, get in touch and I'll work something out so it's not too taxing on your budget.
Hey this looks really well put together, nicely done! Maintaining a connection to all these exchanges, and managing that data, is no easy task.
A few random questions:
1. Can you speak more to the "synchronized clock" part? Do you augment the data feeds' timestamps with your own?
2. What kind of database did you choose for this? How much data are you managing?
Also (promise you'll give me a discount if I tell you this?), this seems underpriced. I haven't seen anyone sell truly high-resolution depth data going months back yet. For BitMEX, no less. But maybe I'm out of the loop. Anyway, congrats on the launch!
1. Yes, every message also gets a local timestamp (100 ns precision) in addition to the exchange's own - see the sketch after this list.
2. Currently it's around 4-5 TB of compressed data (so 25-35 TB uncompressed - I'd need to check to be sure).
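For illustration, capturing looks roughly like this (a simplified Python sketch - the actual wrapping format and field names in the feed may differ):

    import json
    import time

    def augment(raw_msg: bytes) -> dict:
        # Take the local clock reading in nanoseconds, truncated to 100 ns,
        # before any parsing so it reflects arrival time, not processing time.
        local_ns = time.time_ns() // 100 * 100
        return {
            "localTimestamp": local_ns,      # local receive time, 100 ns precision
            "message": json.loads(raw_msg),  # original exchange payload, untouched
        }

The exchange's own timestamps stay intact inside the payload, so you can always compare them against the local receive time.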
Indeed, pricing is supposed to be very affordable as it's targeted at independent algo traders, so without spending huge amounts of $$$ they can have good data to backtest on a professional level (if you can call crypto trading professional, as some argue). Happy to provide you with the discount - please get in touch with me via email if interested.
Thanks for the details. Flat files in the cloud and heavy caching on the client sounds like the most effective way to go about it to me too. I work at a company that collects data like this for the traditional derivatives markets, and fwiw about half of our data is in S3 too. The other half is in Dynamo.
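The pattern I have in mind, as a rough Python sketch (bucket layout and cache directory are invented for illustration):

    import os
    import boto3

    CACHE_DIR = os.path.expanduser("~/.marketdata-cache")  # hypothetical location

    def fetch(bucket: str, key: str) -> str:
        # Return a local path for an S3 flat file, downloading only on a cache
        # miss, so repeated backtests over the same range never re-download.
        local_path = os.path.join(CACHE_DIR, bucket, key)
        if not os.path.exists(local_path):
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            boto3.client("s3").download_file(bucket, key, local_path)
        return local_path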
I appreciate you making this accessible to indie algo traders, it's def what we need. Will keep an eye on this.
This is a cool service. I worked as a researcher for a trading firm and we had similar internal tools.
Some hopefully not too harsh feedback:
You're capturing data in London. I know nothing about crypto markets, but they probably aren't all colocated in London, and your users won't be either. You should try to collect data at each source, synchronize it well, and let users adjust timings to suit their needs.
Data integrity is critical. You have incidentReports in your API, but I didn't see what goes in there. Ideally, make this machine-readable (begin/end timestamps for each incident interval) or flag the data as good/bad to the user as they stream it.
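For example, something like this would let clients skip bad intervals automatically (a hypothetical shape, not the actual incidentReports format):

    # Hypothetical incident entry - not the actual incidentReports schema.
    incident = {
        "exchange": "bitmex",
        "begin": "2019-06-03T14:02:11.000Z",  # start of affected interval
        "end": "2019-06-03T14:09:48.000Z",    # end of affected interval
        "status": "disconnect",               # e.g. disconnect, gap, degraded
    }

    def is_clean(ts: str, incidents: list) -> bool:
        # ISO-8601 timestamps in a fixed format compare correctly as strings.
        return all(not (i["begin"] <= ts <= i["end"]) for i in incidents)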
To make this more useful as a product, consider building a normalization layer on top of what you have here. It's great that you provide the actual exchange messages for those who need them, but researchers often want to answer questions like "which market has the tightest average bid-ask spread over the past month?" without learning details of a dozen APIs and writing boilerplate code for each.
I'd suggest providing the user with a standardized object representing the limit order book for a market and ticker. Clients would subscribe to it and receive generic events like snapshot, order/price level added/deleted, trade, etc. As the data is being streamed, they could also access the current state of the book at each point in time through this object to get information like the best prices, size and number of orders at each price, spread, etc.
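As a sketch of the kind of interface I mean (all names invented, Python for concreteness):

    class OrderBook:
        # Normalized limit order book fed by generic events, independent
        # of any single exchange's message format.

        def __init__(self):
            self.bids = {}  # price -> size
            self.asks = {}  # price -> size

        def apply(self, event: dict) -> None:
            if event["type"] == "snapshot":
                self.bids = dict(event["bids"])
                self.asks = dict(event["asks"])
            elif event["type"] == "level":
                side = self.bids if event["side"] == "bid" else self.asks
                if event["size"] == 0:
                    side.pop(event["price"], None)        # level deleted
                else:
                    side[event["price"]] = event["size"]  # level added/updated

        def best_bid(self):
            return max(self.bids) if self.bids else None

        def best_ask(self):
            return min(self.asks) if self.asks else None

        def spread(self):
            bb, ba = self.best_bid(), self.best_ask()
            return None if bb is None or ba is None else ba - bb

With one such object per market and ticker, the "tightest average spread" question becomes a few lines of looping over events instead of per-exchange boilerplate.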
Thanks, really appreciate the constructive feedback! Some of the points you've mentioned were already on the roadmap and I'll definitely consider the rest, although crypto markets are quite specific and can't be compared 1:1 to traditional ones, hence my initial choices.
The main thing you offer is order book replay, am I understanding that correctly? I have been interested in something similar, but I am not sure how to justify the extra data actually.
Could you give a scenario where order book data at this granularity might come in handy, as opposed to say a single measure of liquidity (however that would be defined)? Thanks
Yes, you are correct, but it's not only the order book but also trades, liquidations etc. - full market data replay. If you trade on a higher time frame it's not that useful, you can use daily OHLC data, but for intraday and more HFT-like algo strategies it may be handy. The common wisdom is that the order book is mostly noise and fake data, but I disagree - check out https://www.reddit.com/r/highfreqtrading/comments/av5c4m/mar... for some ideas on why such data is useful.
Indeed, this data is available as a real-time stream via public exchange APIs, but you can't "go back in time", subscribe to the data from, say, two months ago, replay it, and recreate the exact market state at that time. Using this API you can - does that make sense?
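Conceptually it looks like this (hypothetical endpoint and parameters for illustration - check the docs for the real ones):

    import requests

    # Hypothetical replay endpoint and parameters.
    URL = "https://api.example.com/v1/replay"
    params = {"exchange": "bitmex", "from": "2019-05-01", "to": "2019-05-02"}

    with requests.get(URL, params=params, stream=True) as resp:
        for line in resp.iter_lines():  # messages in original arrival order
            if line:
                print(line)  # or feed into the same handler your live code uses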
Yes I think I understand. Historical data is freely and publicly available but this lets me replay the market rather than just do static analysis. Not an active trader, just trying to clarify my thoughts about why I would need this/pay for it. Thanks
Looks like a fine service, but as a matter of policy I (and many of my peers) do not partner with crypto services that do not list basic company details on their website. For new services, I'm looking for specific names of founders, location, mission statement/values etc.
Not in general. client_oid is meant to be different for different orders. It is a "cookie" that clients can use to later identify orders placed through the REST API. "Later" here can be via the websocket stream, or after a crash/restart.
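The pattern, sketched in Python (the REST call is left as a comment since the exact endpoint depends on the exchange):

    import uuid

    pending = {}  # client_oid -> our local order intent

    def place_order(side: str, price: str, size: str) -> dict:
        client_oid = str(uuid.uuid4())  # unique per order
        order = {"client_oid": client_oid, "side": side,
                 "price": price, "size": size}
        # Record before sending; persist this map to disk if you need
        # to recognize your orders again after a crash/restart.
        pending[client_oid] = order
        # rest_client.post("/orders", json=order)  # hypothetical REST call
        return order

    def on_ws_message(msg: dict) -> None:
        # Recognize our own orders when they show up on the websocket stream.
        oid = msg.get("client_oid")
        if oid in pending:
            print("matched our order:", pending.pop(oid))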
https://www.kaiko.com/ seems to offer the same data but with far longer historical coverage (Tardis starts from April this year). The drawback of Kaiko is the higher price tag.
Yes, Kaiko provides a similar service, and there are others in this space as well, but it's normalized data only, and only a snapshot of 10% of the top of the order book taken every minute - not streaming order book data (initial snapshot + incremental updates). That works for some use cases, but not all - hence my API, which I hope fills that niche.
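The difference matters because a snapshot plus incremental updates lets you reconstruct the full book at any instant, which per-minute snapshots can't. A sketch (event shapes invented for illustration):

    def book_at(target_ts, snapshot, updates):
        # Rebuild full book state at an arbitrary instant from an initial
        # snapshot plus time-ordered incremental updates.
        bids, asks = dict(snapshot["bids"]), dict(snapshot["asks"])
        for u in updates:
            if u["ts"] > target_ts:
                break
            side = bids if u["side"] == "bid" else asks
            if u["size"] == 0:
                side.pop(u["price"], None)   # level removed
            else:
                side[u["price"]] = u["size"] # level set/updated
        return bids, asks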