Exactly, really my point should have been that you can't expect college to teach you explicit step-by-step knowledge for most real-world problems you'll need to solve. The idea is that they make you solve a small set of problems and hope you can extract a general framework for problem solving and communication. Of course, there is also a lot of rote memorization of reusable axioms.
Yeah, the choice of language is not so easy in this case: C++ seems to be the industry standard, but I'm curious to measure how much Go or Scala would improve development time, and what their performance would be.
Out of curiosity, what was the full timing, from the moment you get the data until the moment you send the request to the exchange?
>> I had a lot of fun trying to optimize round trip time with various datacenters around the world.
To be honest, I don't think crypto exchanges are at that stage yet: many of them rely on public clouds, some still send uncompressed plain-text JSON data, etc.
So I was trading between a market in Ukraine (Liqui.io, now defunct) and one in Japan (Binance.com). My server was in the US Midwest on a cheap provider. I found it was the best place to reach either side. The round trip to each was about 100-120 ms. That's quite high, but my tests showed that moving anywhere else would lengthen one side, so the best option is to have roughly the same time on each side.
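The placement logic above boils down to minimizing the slower of the two legs, since the arbitrage loop is gated by the worse side. A minimal sketch, with made-up RTT numbers purely for illustration:

```python
# Hypothetical RTT measurements (ms) from candidate server locations
# to the two exchanges; the numbers are invented for illustration.
rtts = {
    "us-midwest": (110, 115),   # (to exchange A, to exchange B)
    "eu-west":    (40, 250),
    "ap-east":    (260, 35),
}

def best_location(rtts):
    """Pick the location whose slower leg is fastest, i.e. minimize
    max(rtt_a, rtt_b) rather than the sum or either single leg."""
    return min(rtts, key=lambda loc: max(rtts[loc]))

print(best_location(rtts))  # us-midwest
```

Note that minimizing the sum would pick the same site here, but with lopsided numbers (40/250 vs 110/115) the max criterion is what matches the "same time on each side" intuition.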
I'm sure there is a lot of optimization left to do on my bot, but it had a good buy/sell balance and I would rarely lose money.
>> With arbitrage you also have to beat the fees and the time.
I have to add: which coins you trade is also an important factor. Volatility can lead to a situation where you managed to increase your volume of some alt-coin, but the next day its price drops severely.
This is why you need to build a global balancer that withdraws funds into more stable coins once they reach a certain threshold. A minimum (e.g. Max trade size * 3) should be kept in each active coin for trading only.
It's on npm. It has some simple query abilities, and there is also `tql-cli`. The basic idea is pretty simple: you use log files and query by date/time range and key, so you avoid querying or analyzing the entire dataset at once. Note that since it hits the disk heavily with lots of files, the assumption is that you're on an SSD.
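The underlying idea can be illustrated in a few lines. To be clear, this is not tql's actual API, just a sketch of the partitioning scheme it describes: one append-only file per day, so a time-range query only opens the files for the days it covers:

```python
import os
import glob

LOG_DIR = "logs"  # hypothetical layout: logs/2024-01-15.log, one record per line

def write(day, key, value):
    """Append a key/value record to that day's log file."""
    os.makedirs(LOG_DIR, exist_ok=True)
    with open(os.path.join(LOG_DIR, f"{day}.log"), "a") as f:
        f.write(f"{key}\t{value}\n")

def query(start_day, end_day, key):
    """Scan only the daily files within [start_day, end_day], filtering by key."""
    hits = []
    for path in sorted(glob.glob(os.path.join(LOG_DIR, "*.log"))):
        day = os.path.basename(path)[:-len(".log")]
        if start_day <= day <= end_day:  # ISO dates sort lexicographically
            with open(path) as f:
                for line in f:
                    k, v = line.rstrip("\n").split("\t", 1)
                    if k == key:
                        hits.append(v)
    return hits
```

The win is that query cost scales with the size of the requested date range, not the whole dataset, at the price of many small files, which is why an SSD is assumed.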