A high-frequency trading model using Interactive Brokers API in Python (github.com/jamesmawm)
125 points by holoiii on June 9, 2015 | 34 comments



Great stuff.

I think the author would have been better served by not mentioning HFT (maybe "algo trading model" instead?), as it's a lightning rod for controversy.

If anything, I think this is useful to illustrate just how hard it is to write a full-blown trading system.

So maybe we could look at what you could add to this to make it something you could use in production. (Note: please don't use this in production.)

1) Risk system, before you write any algos, before you learn how to ingest market data, before you write anything else, you write the risk system.

In a true trading system, because you always have orders flying between you and the exchange, you never really know what your position is. The risk system allows you to deal with this uncertainty by giving each algo rules as to how many shares it's allowed to be offside.

If you only ever take away one thing let it be this:

Trading Rule #1 Sooner or later your algo/code will make a mistake, it's your risk system that determines if you have a job tomorrow or not.
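To make the "you never really know what your position is" point concrete, here is a minimal sketch of a pre-order risk check. It assumes a simple per-algo share limit and treats every in-flight order as if it might fill, so the check is against the worst-case position rather than the last confirmed one. The names `AlgoState` and `allows` are hypothetical and not taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class AlgoState:
    position: int = 0        # last confirmed net position, in shares
    open_buy_qty: int = 0    # shares resting in unfilled buy orders
    open_sell_qty: int = 0   # shares resting in unfilled sell orders


def allows(state: AlgoState, side: str, qty: int, max_abs_position: int) -> bool:
    """Return True if the new order keeps the worst-case position inside the limit."""
    if qty <= 0 or side not in ("BUY", "SELL"):
        return False
    if side == "BUY":
        # Worst case for a buy: every open buy fills and no open sell does.
        return state.position + state.open_buy_qty + qty <= max_abs_position
    # Worst case for a sell: every open sell fills and no open buy does.
    return state.position - state.open_sell_qty - qty >= -max_abs_position
```

Checking each side only against its own limit means an order that reduces an oversized position is still allowed through, which is usually what you want when an algo is already offside.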

2) The system has no rate limiter, what do you do when the quotes come in too fast for you to deal with?
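One common answer to that question is conflation: keep only the newest quote per symbol and let the strategy drain the buffer at its own pace. The following is a single-threaded sketch under that assumption; `ConflatingBuffer` is a made-up name, and a real feed handler would also need thread safety.

```python
from collections import OrderedDict


class ConflatingBuffer:
    """Holds at most one pending quote per symbol; newer quotes overwrite older ones."""

    def __init__(self):
        self._pending = OrderedDict()
        self.dropped = 0  # number of stale quotes that were overwritten

    def push(self, symbol, bid, ask):
        if symbol in self._pending:
            self.dropped += 1
        # Overwriting keeps the symbol's place in line, so a busy symbol
        # cannot starve itself by constantly moving to the back.
        self._pending[symbol] = (bid, ask)

    def pop(self):
        """Return (symbol, (bid, ask)) for the oldest pending symbol, or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)
```

The `dropped` counter is worth watching: if it climbs steadily, the strategy is acting on quotes it has never fully seen in sequence.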

3) Locking the world, the system retrieves a quote and locks up the entire system while the quote is acted on. Essentially the very design of the system means you can only ever run one strategy for one set of tickers at a time.

If you wanted to tackle this start by looking at this data structure/library:

https://lmax-exchange.github.io/disruptor/
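The Disruptor itself is a Java library, so the following is not it. It is only a rough Python sketch of the same motive, using standard-library queues: the feed hands each quote to a bounded per-strategy queue and never waits for a strategy to finish. `run_strategy` and `fan_out` are hypothetical names.

```python
import queue
import threading


def run_strategy(name, inbox, results):
    """Consume quotes from this strategy's own queue until a None sentinel arrives."""
    while True:
        quote = inbox.get()
        if quote is None:
            break
        symbol, price = quote
        results.append((name, symbol, price))  # placeholder for real strategy logic


def fan_out(quotes, inboxes):
    """Feed side: hand each quote to every strategy without waiting on any of them."""
    for quote in quotes:
        for inbox in inboxes:
            try:
                inbox.put_nowait(quote)
            except queue.Full:
                pass  # a slow strategy loses quotes instead of stalling the feed
    for inbox in inboxes:
        inbox.put(None)  # blocking put, so the shutdown sentinel is never dropped
```

Each strategy runs `run_strategy` in its own `threading.Thread`, so a slow strategy only falls behind on its own queue.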

4) Backtesting. No HFT trade ideas go into production without backtesting. Every HFT firm is different, but I think they'd all adhere to this rule.

5) Closing positions, every algo gets offside at some point. How do you notify a trader to close out a position? How does the trader close the position and notify the algo?

6) Multiple algos over multiple symbols.

7) Real-time PnL. Your PnL is everything: it means you get paid, and it means you can do this again tomorrow. It is the single most important piece of information (risk metrics are a close second) that you can track. This is probably my only quibble with the demo.
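For a sense of what tracking that involves, here is a small sketch of a per-symbol PnL tracker. It assumes average-cost accounting and ignores fees and commissions; `PnLTracker` is a hypothetical name, not something from the repo.

```python
class PnLTracker:
    """Tracks realized and unrealized PnL for one symbol using average cost."""

    def __init__(self):
        self.position = 0      # signed share count (negative means short)
        self.avg_price = 0.0   # average entry price of the open position
        self.realized = 0.0

    def on_fill(self, qty, price):
        """Apply a fill; qty > 0 is a buy, qty < 0 is a sell."""
        if qty == 0:
            return
        if self.position == 0 or (self.position > 0) == (qty > 0):
            # Opening or adding to a position: update the average entry price.
            total = self.position + qty
            self.avg_price = (self.avg_price * self.position + price * qty) / total
            self.position = total
            return
        # Reducing, closing, or flipping the position.
        closing = min(abs(qty), abs(self.position))
        direction = 1 if self.position > 0 else -1
        self.realized += closing * (price - self.avg_price) * direction
        leftover = abs(qty) - closing
        self.position += qty
        if self.position == 0:
            self.avg_price = 0.0
        elif leftover > 0:
            self.avg_price = price  # flipped: the leftover opens a new position

    def unrealized(self, mark):
        return self.position * (mark - self.avg_price)

    def total(self, mark):
        return self.realized + self.unrealized(mark)
```

Calling `total(mark)` on every quote update gives a running figure that a risk check can compare against a daily loss limit.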


It's good that you lead with risk as #1. You know this, but many don't follow so closely. So here's a Wiki entry of a recent spectacular failure:

https://en.wikipedia.org/wiki/Knight_Capital_Group

The first sentence:

   The Knight Capital Group was an American
   global financial services firm
The key word is: was


As an infrastructure engineer currently doing DevOps, I frequently use Knight Capital Group as an example of why it's important to always be in control of your application's environment.


Any good resources regarding building risk systems for someone who is new to the field?


Here are some ideas:

- throttle your orders to the market

- set a threshold for market risk you can take per symbol, per sector, etc.

- take into consideration average daily volume of a symbol for calculating market risk threshold

- implement controls to send cancels for unfilled orders in the event that an algo goes haywire

- reject orders priced at some percentage less or more than current market price per share
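A few of the ideas above can be sketched in a handful of lines. The thresholds (5% price collar, 1% of average daily volume) and all names here are illustrative assumptions, not recommended values.

```python
import time
from collections import deque


def within_price_collar(order_price, market_price, max_deviation=0.05):
    """Reject orders priced more than max_deviation (a fraction) away from the market."""
    if market_price <= 0:
        return False
    return abs(order_price - market_price) / market_price <= max_deviation


def within_adv_limit(order_qty, avg_daily_volume, max_fraction=0.01):
    """Cap order size at a fraction of the symbol's average daily volume."""
    return 0 < order_qty <= avg_daily_volume * max_fraction


class OrderThrottle:
    """Allows at most max_orders within any sliding window of window_seconds."""

    def __init__(self, max_orders, window_seconds, clock=time.monotonic):
        self.max_orders = max_orders
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the throttle can be tested
        self._sent = deque()        # timestamps of recently sent orders

    def try_send(self):
        now = self.clock()
        # Forget orders that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_orders:
            return False
        self._sent.append(now)
        return True
```

An order would only go out if all three checks pass; anything rejected should be logged loudly, since a burst of rejections is often the first sign that an algo has gone haywire.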


As the great Jules Winnfield (http://en.wikipedia.org/wiki/Pulp_Fiction) once said, "Well, allow me to retort."

> I think the author would have been better served by not mentioning HFT (maybe "algo trading model" instead?), as it's a lightning rod for controversy.

Don't be so quick... in a world where keeping score is simple and the odds are tilted for many, any publicity is good publicity.

> 4) Backtesting. No HFT trade ideas go into production without backtesting. Every HFT firm is different, but I think they'd all adhere to this rule.

I'm sure the majority do. But I really didn't. The problem with backtesting low latency (by which I mean switch-to-switch roundtrips of < ~ 50us) is there are so many sources of jitter the data is basically "mean of x, st dev of 6x^3". Too much noise to signal to make it worth it.

So I would run something "in sim" for a while on live market data but simulated execution. I never looked for profitability--I looked for predictability. If you know the knobs on your system, you can make it work in any market. If you don't know the knobs, you have no business trading it. After a run of a week or so with no major problems I'd go into production. BUT:

> before you write anything else, you write the risk system.

Oh Dear Lord yes. Not counting life-supporting, military, etc. tech, these are some of the sharpest tools you can imagine. Knight lost $440mm in less than an hour. And they were decidedly not of average expertise. That failure was a much bigger deal than most realize. Luckily smart people noticed and a lot of risk stuff changed after that (imo that was when people finally started to say "fast enough, I need to generate smarter orders").

> 2) The system has no rate limiter, what do you do when the quotes come in too fast for you to deal with?

I've been out of the guts for ~2 years, but by universal unforgiving law, the volume of quotes has got to be ridiculous now. People talk about "low latency" when they're talking about serving static HTML at 1000/s. So few people have actually seen the nuts and bolts of feed handlers--it's not their fault, this isn't widely available stuff--but the traffic spikes are mind-boggling. Good adapters combined with a tuned network stack will translate signal into "book" data, meaning usable basically, in ~ 5us. Meaning they do that 200 times per MILLIsecond. And it's not enough sometimes. And you and your rival firms are spending a lot trying to make that number 4us. Blah blah blah, I kinda miss it.


I agree with you on backtesting. Trying to model order book dynamics across multiple exchanges accurately is something of a fool's errand (especially in pro-rata markets). Running stuff in sim generally lets you iron out flaws and then you can just plug and play to see if it pays. Heh, I used to just plug and play directly with small size as a test rather than using sim, since my firm didn't have a usable sim set-up at the time. Just put on extremely tight risk limits in terms of order sending rates, cut it off if it loses too much in the experiment, and move on to the next one (yes, this did make strategy choice pretty path-dependent...).

RE rate limiter: For this dude's implementation I don't think it matters. He's using IB (a retail broker) for his data rather than the direct market feed. IB sends a sample of market data rather than every single book update and I think they do it at a rate slower than ~10ms so he probably won't run into problems. Heh, I remember some fun times figuring out the optimal way to handle getting spammed by the exchange. It is a neat industry, but kind of makes you feel a bit like a societal leech some times (I know, we're risk salesmen making markets more efficient and all...).


Agree re: the rate limiter--totally irrelevant in the case of OP.


This guy gets it. great notes and things to work toward for the author and contributors.


The author explains at README.md:

>Sure, I had some questions "how is this high-frequency" or "not for UHFT" or "this is not front-running". Let's take a closer look at these definitions:

> High-frequency finance: the studying of incoming tick data arriving at high frequencies, say hundreds of ticks per second. High frequency finance aims to derive stylized facts from high frequency signals

>High-frequency trading: the turnover of positions at high frequencies; positions are typically held at most in seconds, which amounts to hundreds of trades per second.


Wow, didn't expect my repo to make it this far. It was my first time using Reddit, with the aim of introducing this for research more than actual trading.

Just some thoughts after reading comments:

- I've got a number of Python examples for trading futures, but a third-party app is required as 'the gateway'. Browse on GitHub if you wish: https://github.com/hftstrat/The-Gateway-code-samples

- Backtesting is a topic not to be taken lightly. There are far too many issues to address than just historical simulation. I've written a book covering this topic.

- In my book I've also covered the use of Oanda API with a simple trend-following strategy. Available on Amazon: http://www.amazon.com/Mastering-Python-Finance-James-Weiming...

- Table of contents and source codes used in my book are also available on GitHub: https://github.com/jamesmawm/Mastering-Python-for-Finance-so...

Alright, I'm done with shameless promotion of my links ;p


Pretty cool stuff. Thanks for posting holoiii. Last year, I took some time off to build an IB API client in Java/Scala for options trading.

The biggest challenge I faced personally was dealing with concurrency: maintaining my positions, orders, and the quote ticks that led to orders, since the IB API is based on an asynchronous tick model. I ended up rewriting my code from lots of threads/locks to more code with immutability. Curious how you guys architected your IB client?
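As one possible reading of "more code with immutability" (a guess, not the actual design described here), the state can be an immutable snapshot that each incoming event replaces rather than mutates, so readers never need a lock. `State` and `apply_event` are hypothetical names, and the event tuples are made up for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


def _empty():
    return MappingProxyType({})


@dataclass(frozen=True)
class State:
    positions: MappingProxyType = field(default_factory=_empty)  # symbol -> net qty
    quotes: MappingProxyType = field(default_factory=_empty)     # symbol -> (bid, ask)


def apply_event(state, event):
    """Return a new State reflecting the event; the old State is left untouched."""
    kind = event[0]
    if kind == "fill":
        _, symbol, qty = event
        positions = dict(state.positions)
        positions[symbol] = positions.get(symbol, 0) + qty
        return State(MappingProxyType(positions), state.quotes)
    if kind == "quote":
        _, symbol, bid, ask = event
        quotes = dict(state.quotes)
        quotes[symbol] = (bid, ask)
        return State(state.positions, MappingProxyType(quotes))
    return state  # unknown events leave the state as it was
```

A single thread applies events in arrival order and publishes the latest `State`; any other thread can read a snapshot without seeing a half-updated position.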

Also I hosted my client on the cloud on AWS. I subscribed to maybe a few symbols but always wondered in a more production environment whether bandwidth will kill you (subscribed to 100+ symbols and to a real tick-by-tick quote stream not quote snapshots).

Finally I'd love to hear more about folks out there who might be doing more options or futures trading off of IB API. The equity world is definitely very interesting but I found out early on that I enjoy trading delta-neutral and am not good at predicting directions.

Did you guys find that you have to scale up and write your own execution algo's when your size got too big for the market you are trading? Nowadays I use my IB client mostly as a GUI to view my positions and execute complex option spreads; so it's more of a "gray-box" for me. Do you guys trade pure algorithmically or half-and-half?


Is it possible to get IB API access without opening a 10k account? Do they have a demo endpoint you can use? If you didn't already have an existing system built, having 10k tied up while you build one seems undesirable.


You would have to put in at least $25k to do anything serious, since the pattern day trading rule would definitely be triggered.


The PDT rule only applies to stocks and options afaik, and only if you have a margin account. It doesn't affect futures/forex.


Put in 15; enter some low maintenance positions. Simple ideas: sell some spreads on SPY, hold low vol.


It is so, so infrequently that I get to say this about something HFT-related on the internet: fucking awesome, dude. There is actual information here that is of use and doesn't mislead. It's amazing.

edit: s/dude/guy-who-made-this


The one key part this seems to be missing is a backtesting framework. Did you look at https://www.quantopian.com/?


Tangentially related question. As a relative newbie to (automated/algo) trading systems, it seems most of the posts here focus on systems that deal with large volumes of trades in very short timeframes.

I'd be interested in writing an automated system that would focus on trading (investing) over longer timespans (shares, options), with (average) returns. Are there any resources you would recommend to get into this? Or are automated systems really only employed in HFT?


What is a "longer timespan"? Minutes instead of seconds? Days instead of minutes? Month instead of days?

Automated trading systems only really make sense when you are trading at a pace that is too fast for humans to reliably execute. If you have time for human intervention, it is much better to produce a system that simply spits out reports and recommendations that you can act on (or not), rather than having a system trade automatically. If nothing else, it can save you from losing all your money due to an unfortunate off-by-one error or something.


In terms of timespan, I was thinking days/ months. So it would take the aspect of human intervention and analysis/ research out of it (perhaps the system could still prompt the user for input, the user would just not have to monitor markets and positions actively). I don't know if this makes sense.


The problem with most HFT/algorithmic approaches is that they rely on large volumes to make profits. If you're averaging a very tiny profit per trade (with a large variance), you're going to have to make a lot of big trades and be able to sustain large temporary losses to make money.

That's not to say that computer assisted trading isn't a good idea for smaller time investors, but the focus is on assistance, not automation. You can write models that produce a shortlist of stocks and options that match certain criteria and you can write risk and portfolio models that try to model what might happen if you add or remove a certain asset from your portfolio, then you can combine these and run some sort of optimization algorithms over the whole thing. But at the end of day, the choice should be made by a smart informed human and should be heavily biased by your reading of the news and other external sources.

All that being said, if you decide to go into this, approach it first as a hobby rather than a job, with the assumption and acceptance that most hobbies cost, especially in the short run. If you just want to do something smart with your money, buy some index funds.


Also, I recommend checking out the QuantStart project started by Michael Halls-Moore. Really handy if you are using the Oanda API. Plus, there are some great articles on the QuantStart site about the creation of the code. https://github.com/mhallsmoore/qsforex


I have read the anti-FlashBoys book to try and grasp it (HFT is not bad and has cut spreads in equities by 5/6ths in ten years) but I still do not understand one HFT algorithm (other than the debunked front running idea)

Could someone explain what algo traders / HFT actually do? Is it all looking for a variation off of a relationship between say OrangeGrowersInc and OrangeJuiceBottlers Inc? And why is speed so important then?


What is the "anti-FlashBoys" book?

(Assuming you don't mean FlashBoys)



Yes thank you. Well worth reading. I was quite annoyed by what seems to be such evident lack of critical thinking by Michael Lewis

I picked up (and have even blogged about) the idea that "good" traders were trying to buy 100m USD of BP shares on three different exchanges, and that by the time 1/3 of the order hit London, HFT traders "somehow" worked out that there was another 2/3 coming and sped over microwave towers to front-run the order. But how do you work out that the London order is going to be for 100m? What if it's just for 50m? Suddenly you have bought 50m of BP shares that no one wants. It's an insane risk.

So I still don't know what HFT and algo is about (it seems to be there are arbitrage opportunities between strongly correlated stocks, and presumably arbitrage opportunities between strongly correlated stocks in different exchanges. And the speed issue is having to beat other people who have the same algo / correlation as you)

But honestly I would like some confirmation


It's strange to call this HFT. The author makes a note of it, and I suppose has the right to call it whatever he wants.

However, newbies who use this project to learn automated trading should be aware that this is not considered HFT in the industry.


I would love to hear your definition of HFT. I've worked in the industry and never seen it defined well.


Agree with you that there is no definition of HFT. If you define it by your trading forecast horizon then I've seen some people argue that HFT could be as long as a few hours. Could you argue that HFT on a multiple second or shorter forecast timescale is not HFT? I think HFT is, more broadly, any sort of intraday trading where your signals are derived from intraday tickdata.


It's great that you're sharing, so thanks for that. I get it that it's better than pseudo code but I'm having a hard time seeing the words "high-frequency trading and Python" next to each other


Python can be a good tool to prototype HFT algos but not for trading (you probably want to trade in under 1 ms).


I see a few comments on the speed of Python. Given that the strategy is on a retail platform, I'm not sure choice of language will be the bottleneck.


This is really cool, thanks for sharing it out!



