Hey folks, I'm the primary maintainer of Gryphon. The backstory: I was one of the founders of Tinker, the trading company that built Gryphon as our in-house trading infrastructure. We operated from 2014 to 2018, starting with just a simple arbitrage bot and slowly growing the operation until our trades peaked above 20% of daily trading volume on the big exchanges.
The company has since wound down (founders moved on to other projects), but I always thought the code deserved to be shared, so we've open sourced it. Someone trying to do what we did will probably save 1.5 years of engineering effort if they build on Gryphon vs. make their own. As far as I know there isn't anything out there like this, in any market (not just cryptocurrencies).
"Ask HN: Why would anyone share trading algorithms and compare by performance?"
https://news.ycombinator.com/item?id=15802834 (pyfolio, popular [Zipline] algos shared through Quantopian)
Well documented. Looks like a nice way to learn about market making in real-life situations with small fractions of bitcoin. Curious whether anyone will take it for a real drive in a production environment.
I had considered adding a 'reading list' to the docs for people who are new to trading, so this could be used as a teaching tool (or just to make using it easier). I'll bump that up the priority list.
Shouldn't be, you just need to write a wrapper for the exchange API with the interface defined in 'gryphon.lib.exchange.exchange_api_wrapper'. I'll add an article to the docs with more about that soon.
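For anyone curious what that looks like, a wrapper might be sketched roughly like this. The class and method names below are illustrative guesses, not Gryphon's actual interface; the real one is defined in `gryphon.lib.exchange.exchange_api_wrapper`:

```python
class MyExchangeWrapper:
    """Hypothetical skeleton for a new exchange integration.

    Each method normalizes the exchange's native API into a common
    shape so strategies can run against any exchange unchanged.
    """

    def __init__(self, api_key, secret):
        self.api_key = api_key
        self.secret = secret

    def get_orderbook(self):
        # Fetch the order book and normalize it into
        # {'bids': [(price, volume), ...], 'asks': [(price, volume), ...]}
        raise NotImplementedError

    def place_order(self, side, price, volume):
        # Submit a limit order; return the exchange's order id.
        raise NotImplementedError

    def cancel_order(self, order_id):
        raise NotImplementedError

    def get_balance(self):
        # Return fiat and crypto balances as a dict.
        raise NotImplementedError
```

The point of the common interface is that a strategy written against it never touches exchange-specific request formats.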
It's possible CCXT could be used to easily wrap other exchanges into gryphon. I'm not familiar with the library so hard to guess if it would be a net win or not.
I find the directory structure a bit confusing: why does each exchange-market get its own file? Wouldn't it be better for each exchange to just define its markets as hashes?
It's perfectly general: arb, market making, signal trading, ml, etc. Whatever strategy class you're thinking of, you can probably implement it on Gryphon.
Can you please explain to me how a tool written in python can be used for HFT or market making?
I'm asking because we generally used ASICs and C++ in the past, or more recently Rust. Even GPUs are often difficult because they introduce milliseconds of latency.
If you want to restrict the definition of HFT to only sub-millisecond strategies you're correct. But then, all HFT is impossible in crypto, since with web request latency and rate limits, it would be very difficult to get tick speeds even in the 10s of milliseconds. It's fine if you want to call this "algo trading" instead of HFT, but I think a common understanding of the term would include Gryphon's capabilities.
In any case Gryphon uses Cython to compile itself down to C, which isn't quite as good as writing in native C but is a good chunk of the way there.
I don't believe true crypto HFT strategies exist (i.e. sub-millisecond tick to trade). It's just not possible with websockets and http requests being the standard for data feeds and order placement on crypto exchanges.
Is there something like that, but in more predictable languages like Rust[1] or OCaml? I wouldn't trust my money to dynamic typing. I recently asked the same question[2] of the Futu broker[3]. Sadly, no answer. I wish more quant companies would pay attention to the predictability of their own code and external APIs, for example by using Imandra[4] to reason about their code. Using their own protocol specification language[5], they created a verified FIX library[6].
P.S. Also you might want to add that framework in Awesome Quant list[7].
A couple of years ago I made a bit of cash by arbitraging within an exchange. It had both BTC and fiat pairs for all other cryptos, and in some cases buying one, converting to the other and then selling back could net a profit even with fees.
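For the curious, the loop described here (fiat → altcoin → BTC → back to fiat) can be sketched as a simple profitability check. The function, prices, and fee level below are all made up for illustration:

```python
def triangular_return(fiat_alt_ask, alt_btc_bid, btc_fiat_bid, fee=0.001):
    """Net multiplier on fiat for the loop: fiat -> ALT -> BTC -> fiat.

    Each leg pays a proportional taker fee. A result above 1.0 means
    the loop is profitable before slippage and order failures.
    """
    alt = (1.0 / fiat_alt_ask) * (1 - fee)   # buy ALT with 1 unit of fiat
    btc = alt * alt_btc_bid * (1 - fee)      # sell ALT for BTC
    fiat = btc * btc_fiat_bid * (1 - fee)    # sell BTC back into fiat
    return fiat

# Hypothetical quotes: ALT asked at 2.00 fiat, ALT/BTC bid 0.00052,
# BTC bid at 4000 fiat.
mult = triangular_return(2.00, 0.00052, 4000.0)
```

With those invented numbers the loop returns roughly a 3.7% gross edge, which is exactly the kind of gap that competition (or the exchange itself) eventually closes.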
They'd occasionally ban my IP (but not my API key, for some reason), I assume due to the volume of order book info requests, even though my bot traded enough volume that my fees should certainly have more than made up for them.
Eventually it stopped earning anything. I assume the exchange started doing it internally themselves and the profit was had before the orders hit the books.
Could also just be competition. The thing about arbitrage opportunities is that they're usually arbitraged out of existence once enough people become aware of them. All it takes is one faster arbitrage bot front-running you and you won't see any more arbitrage opportunities.
I had to stop running my arbitrage bot because it was getting beaten by other bots. All it takes is somebody closer to the exchange to get an edge in the arbitrage game. It was fun, but there wasn't enough revenue to justify colocated hosting.
I've been playing around with using lstm recurrent nets to find patterns in forex trades, with no real expectation of anything other than learning about recurrent nets (and td convolutional nets). I was able to access 15 years of historical tick data. I would imagine lack of historical pricing data would be an issue for any machine learning approach to crypto trading. Even with 15 years of daily prices I only have ~5500 samples per major currency pair. I've toyed with learning off hourly prices rather than daily, and I've also thought about creating more samples by shifting prices up or down, since perhaps the general patterns would be the same.
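The shifting idea mentioned at the end could be sketched like this (a toy helper of my own devising; whether this kind of augmentation is valid for price data is exactly the open question):

```python
def shift_augment(window, shifts=(-0.01, 0.0, 0.01)):
    """Create extra training samples from one price window by shifting
    the whole series by a small fraction of its mean level.

    This only helps if the pattern of interest really is
    level-independent, analogous to translating MNIST digits.
    """
    level = sum(window) / len(window)
    return [[p + s * level for p in window] for s in shifts]
```

Each input window yields one sample per shift, so ~5500 daily samples would become ~16500 here, at the cost of the assumption baked into the shift.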
You could make any decently clever algorithm work in backtests as long as you don't account for spread, order failures and data delays. Years ago I used to test algorithms in a "harsh environment" where all your trades are essentially 10% worse.
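A crude version of that "harsh environment" idea is just a pessimistic haircut on every simulated fill. The helper and the 10% default below are illustrative:

```python
def harsh_fill(side, price, haircut=0.10):
    """Apply a pessimistic haircut to a simulated fill price.

    Buys fill 10% higher and sells 10% lower than quoted, crudely
    bundling spread, slippage and order failures into one knob. A
    strategy that survives this is far more likely to survive reality.
    """
    if side == 'buy':
        return price * (1 + haircut)
    elif side == 'sell':
        return price * (1 - haircut)
    raise ValueError("side must be 'buy' or 'sell'")
```

10% is deliberately brutal for most markets; the point is that an algorithm's backtest edge should survive a penalty much larger than the real frictions you expect.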
That's true, of course. If the technique worked here, it would be because there really are location-independent features that can be learned. I view it as similar to translating and transforming images in the MNIST digit set to account for the various ways the feature you're searching for can be spatially oriented. Of course, I have no idea whether this holds true and will work for pricing data.
Generating data this way is extremely difficult. The true population characteristics of prices are unobservable; we can only work with samples of the population. This means that any attempt at generating data is highly likely to add errors that are difficult to quantify. This is a fundamental difference from the disciplines where ML is most successful: in finance you can’t meaningfully generate new data, you can only work with what has been historically recorded, and that is often not relevant to how the market will behave in the future. You can always generate more cat images for a ConvNet to learn from, or feed it photos from multiple angles or even 3D imagery. None of this is available for market data, unfortunately.
Right, the thing about applications like the digit set and similar OCR problems etc., is that we can independently generate a model of "acceptable" translations/rotations and validate it reasonably easily because we understand the domain well (not that you can't cause trouble this way). This certainly isn't true across data sets.
Ouch. Out of all the possible subjects to learn NNs on, you have picked by far the most difficult possible. Seriously. If you think of an analogue to rocketry, with the easiest being launching fireworks from a bottle and the other being a mission to Mars, you have picked a Moon landing.
I don’t even know where to begin. Financial data has an extremely low signal to noise ratio and is fraught with pitfalls. It is highly non-normal, heteroscedastic, non-stationary and frequently changes behavioural regimes. It is irregular, the information content is itself irregular, and the prices sold by vendors often have difficult-to-detect issues that will taint your results until you actually start trading and realise that a fundamental assumption was wrong. You may train a model on one period, and find that the market behaviour has changed and your model is rubbish. Cross validation and backtesting of black box algorithms with heavy parameter tuning is a field of study in its own right, with so many issues that endless papers have been written on each specific nuance.
Successfully building ML models for trading is an extremely difficult discipline that requires a deep understanding of the markets, the idiosyncrasies of market data, statistics and programming. Most quant shops who run successful ML algos (they are quite rare) have dedicated data teams whose entire remit is to source and clean data. The saying of rubbish in, rubbish out is very true. Even data providers like Reuters or Bloomberg frequently have crap data. We pay nearly 500k a year to Reuters, and find errors in their tick data every week. Data like spot forex is a special beast because the market is decentralized. There is no exchange which could provide an authoritative price feed. Trades have been rolled back in the past, and if your data feed does not reflect this, you are effectively analysing junk data.
I don’t even want to get started about the fact that trying to train an RNN on 5500 observations is folly. Did you treat the data in any way? The common way to regularise market data for ML is to resample it to information bars. This is not going to work on a daily basis, so you should start off with actual tick data.
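For reference, "information bars" resample by trading activity instead of wall-clock time. A minimal dollar-bar sketch (my own toy code, not from any library; real implementations handle threshold overshoot, timestamps and so on):

```python
def dollar_bars(ticks, bar_size):
    """Group (price, volume) ticks into bars of ~bar_size traded dollar value.

    Returns one (open, high, low, close) tuple per completed bar.
    During busy periods bars close faster, so each bar carries a
    roughly comparable amount of market activity.
    """
    bars, prices, accumulated = [], [], 0.0
    for price, volume in ticks:
        prices.append(price)
        accumulated += price * volume
        if accumulated >= bar_size:
            bars.append((prices[0], max(prices), min(prices), prices[-1]))
            prices, accumulated = [], 0.0
    return bars

# Toy tick stream: (price, volume) pairs, with a $500 bar threshold.
ticks = [(100, 3), (101, 4), (99, 5), (102, 2), (103, 4)]
bars = dollar_bars(ticks, 500)
```

Time bars oversample quiet periods and undersample busy ones; activity-based bars like these tend to have statistically better-behaved returns, which is why they're the usual preprocessing step before ML.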
Nearly every starry eyed junior quant goes in with the notion that you can just run some fancy ML models on some market data and you’ll get a star trading algo. That a small handful of statistical tests will tell you whether your results are meaningful, whether your data has autocorrelation or mean reverting properties. In reality, ML models are very difficult to train on financial data. Most statistical forecasting tools fail to find relationships and blindly training models on past data very rarely results in more than spurious performance in a back test.
I don’t want to discourage you by any means, but I’d start off with something easier than what you are proposing. Finance firms have entire teams dedicated to what you are trying to do and even they often fail to find anything.
Seconding this. I'm in touch with a bunch of smart coder/traders trying this and nobody (to my knowledge) is making backtest match forward test. To me, ML isn't optimised for this kind of problem. It might be possible to kludge it, but you won't know what it's doing.
My bot that tracks and trades momentum isn't as sexy, but it works.
Thanks for the great feedback. I have no expectations for this other than the learning, and it's already been successful on that front. Just seemed like a fun thing to poke at when most other hobbyists seem to be doing image analysis and language modeling. I've crawled a couple of forums and I get that there are a lot of people out there who think they can readily use these techniques to make money. I doubt very much that this will be the outcome in my case :).
Where I am now I am just trying to figure out how to treat the data, whether to normalize or stationarize and how to encode inputs, etc. The reason that I am working with daily prices is that the fantasy output of this would be a model that can inform a one day grid trading strategy. It may very well be that daily prices won't work for this.
Whether there's anything like an equilibrium in cryptoasset markets, where there are no underlying fundamentals, is debatable. While there's no book price, PoW coin prices might be rationally describable in terms of (average estimated cost of energy + cost per GH/s + 'speculative value').
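As a toy illustration of that decomposition (the function and every number below are invented, and the speculative term is ignored entirely):

```python
def pow_floor_price(energy_cost_per_kwh, joules_per_hash, hashes_per_coin):
    """Rough marginal electricity cost to mine one PoW coin.

    Ignores hardware amortization (the cost-per-GH/s term) and the
    speculative term, so it is only the energy component of the
    hypothesized price floor.
    """
    kwh_per_coin = joules_per_hash * hashes_per_coin / 3.6e6  # J -> kWh
    return energy_cost_per_kwh * kwh_per_coin
```

With made-up inputs like $0.05/kWh, 1e-10 J/hash and 3.6e21 expected hashes per coin, this gives a $5000 energy floor; real figures for efficiency and network difficulty would have to come from miner specs and chain data.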
A proxy for energy costs, chip costs, and speculative information
Are there standard symbols for this?
Can cryptoasset market returns be predicted with quantum harmonic oscillators as well?
What NN topology can learn a quantum harmonic model?
https://news.ycombinator.com/item?id=19214650
"The Carbon Footprint of Bitcoin" (2019) defines a number of symbols that could become standard in [crypto]economics texts. Figure 2 shows the "profitable efficiency" (which says nothing of investor confidence, speculative information, or how we may overvalue the security, as in 2007-2009). Figure 5 lists upper and lower estimates for the BTC network's electricity use.
https://www.cell.com/joule/fulltext/S2542-4351(19)30255-7
Here's a cautionary dialogue about correlative and causal models that may also be relevant to a cryptoasset price NN learning experiment:
https://news.ycombinator.com/item?id=20163734
Cool stuff, and I didn't mean to discourage you at all. Some of the most interesting challenges in data science arise in finance.
Forex perhaps is just a pathologically tricky beast to trade well, even though it is the easiest to access. I think perhaps cryptos would be an easier start in terms of there being more inefficiencies and autocorrelation in the market.
In terms of data treatment, I recommend starting with Marcos López de Prado's Advances in Financial ML. I don't agree with some of his methods, but it is a practical book that highlights a lot of the issues you'll face. You can then draw your own conclusions about how to treat them.
All of this is true, but I'll point out that for crypto, your competition is "I heard it was hot on Reddit/Telegram", and if it's not that it's often "I buy $X worth of Bitcoin on the first of every month". I suspect that a random algorithm (like, literally buys & sells random amounts of cryptos at random times) would do better than the average crypto trader, simply because the "I heard it was hot" strategy inherently leads towards buy-high-sell-low behavior.
Totally fair to wonder that. All I can say is we make it clear the built-in algos are just for demo/inspiration purposes and shouldn't be run live with any expectation of profit. The point of Gryphon is to use the framework to build/run your own strats much faster than if you had to build everything yourself.
Hope you guys like it!