We've been able to run order matching engines for entire exchanges on a single thread for over a decade by this point.
I think this specific class of computational power - strictly serialized transaction processing - has not grown at the rate that headline metrics like core count would suggest. Adding 31 additional cores doesn't make the order matching engine go any faster (it could only go slower).
If your product is handling fewer than several million transactions per second and you are finding yourself reaching for a cluster of machines, you need to back up like 15 steps and start over.
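For anyone who hasn't seen one: the serial core of a matching engine is not much more than a loop over a price-time priority queue. The sketch below is mine and hugely simplified (no order IDs, cancels, risk checks, or persistence) and is not how any particular exchange does it, but it shows why a single thread goes so far: each incoming order touches a couple of heaps and that's it.

    # Minimal single-threaded price-time priority matcher (illustrative sketch only).
    import heapq, itertools

    class OrderBook:
        def __init__(self):
            self.bids = []                  # heap of (-price, seq, qty); best bid first
            self.asks = []                  # heap of (price, seq, qty); best ask first
            self._seq = itertools.count()   # arrival order breaks price ties

        def submit(self, side, price, qty):
            """Match an incoming order against the book; rest any remainder."""
            trades = []
            if side == "buy":
                # Lift asks while they still cross the incoming bid.
                while qty and self.asks and self.asks[0][0] <= price:
                    ask_px, seq, ask_qty = heapq.heappop(self.asks)
                    fill = min(qty, ask_qty)
                    trades.append((ask_px, fill))
                    qty -= fill
                    if ask_qty > fill:
                        heapq.heappush(self.asks, (ask_px, seq, ask_qty - fill))
                if qty:
                    heapq.heappush(self.bids, (-price, next(self._seq), qty))
            else:
                # Hit bids while they still cross the incoming ask.
                while qty and self.bids and -self.bids[0][0] >= price:
                    neg_px, seq, bid_qty = heapq.heappop(self.bids)
                    fill = min(qty, bid_qty)
                    trades.append((-neg_px, fill))
                    qty -= fill
                    if bid_qty > fill:
                        heapq.heappush(self.bids, (neg_px, seq, bid_qty - fill))
                if qty:
                    heapq.heappush(self.asks, (price, next(self._seq), qty))
            return trades

    book = OrderBook()
    book.submit("sell", 101, 5)
    print(book.submit("buy", 102, 3))   # crosses -> [(101, 3)], 2 remain resting at 101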
> We've been able to run order matching engines for entire exchanges on a single thread for over a decade by this point.
This is the bit that really gets me fired up. People (read: system “architects”) were so desperate to “prove their worth” and leave a mark that many of these systems have been overcomplicated, unleashing a litany of new issues. The original design would still satisfy 99% of use cases, and these days, given local compute capacity, you could run an entire market on a single device.
Why can you not match orders in parallel using logarithmic reduction, the same way you would sort in parallel? Is it that there isn't enough computation being done beyond sorting by time and price?
I think that's allowed, but this is where my meagre expertise runs out. You normally have to process orders serially, or at least with algorithms that yield exactly the same outcome serial execution would give, but only within a single order book.
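To illustrate that point (my own assumption about the usual structure, not a description of any real venue): you keep strict ordering inside each book and take your parallelism across books, e.g. by sharding incoming orders by symbol onto per-book workers. The symbols and structure here are hypothetical.

    # Parallel across order books, strictly serial within each one.
    import queue, threading

    def run_book(symbol, inbox):
        # Stand-in for a serial matching loop: events are applied to this
        # book exactly in the order they arrive on its queue.
        while True:
            order = inbox.get()
            if order is None:
                break
            print(f"{symbol}: matched {order} serially")

    symbols = ["AAPL", "MSFT"]                      # hypothetical instruments
    inboxes = {s: queue.Queue() for s in symbols}
    workers = [threading.Thread(target=run_book, args=(s, inboxes[s]))
               for s in symbols]
    for w in workers:
        w.start()

    # The gateway shards by symbol: books run concurrently, each book is serial.
    for sym, px, qty in [("AAPL", 190, 10), ("MSFT", 410, 5), ("AAPL", 191, 2)]:
        inboxes[sym].put((px, qty))

    for s in symbols:
        inboxes[s].put(None)                        # shut the workers down
    for w in workers:
        w.join()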
You're only able to do that because you're doing simple processing on each transaction. If you had to do more complex processing on each transaction, it wouldn't be possible to handle that many. Though it's hard for me to imagine what that more complex processing would be (I'm not in your domain).
HFT firms would love to do more complex calculations for some of their trades. They often make the compromise of using a faster algorithm that is known to be right only 60% of the time versus a better but slower algorithm that is right 90% of the time.
That is a different problem from yours, though, so it has different considerations. In some areas I/O dominates; in others it does not.
In a perfect world, even end-user software would be tuned to maximize (EV/op) x (ops/sec). How many person-years of productivity are lost each year to people waiting for Windows or Office to start up, finish updating, etc.?
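Toy arithmetic for that (EV/op) x (ops/sec) framing, reusing the 60%/90% accuracy figures above. The payoffs and latencies are made-up numbers purely for illustration; the real HFT edge is mostly about being first, but the same trade-off shows up in a throughput framing.

    # (EV per decision) x (decisions per second), with invented payoffs/latencies.
    payoff_right, payoff_wrong = 1.0, -1.0          # arbitrary units per trade

    fast = {"p_right": 0.60, "latency_us": 1}       # assumed 1 microsecond per decision
    slow = {"p_right": 0.90, "latency_us": 50}      # assumed 50 microseconds per decision

    def ev_per_second(algo):
        ev_per_op = (algo["p_right"] * payoff_right
                     + (1 - algo["p_right"]) * payoff_wrong)
        ops_per_sec = 1e6 / algo["latency_us"]
        return ev_per_op * ops_per_sec

    print(ev_per_second(fast))   # 0.2 EV/op * 1,000,000 ops/sec ~= 200,000
    print(ev_per_second(slow))   # 0.8 EV/op *    20,000 ops/sec ~=  16,000

With these (invented) numbers the 60%-accurate algorithm wins simply because it gets fifty times as many shots per second.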
I work in card payments transaction processing and IO dominates. You need to have big models and lots of data to authorize a transaction. And you need that data as fresh as possible and as close to your compute as possible... but you're always dominated by IO. Computing the authorization is super cheap.
Tends to scale vertically rather than horizontally. Give me massive caches and wide registers and I can keep them full. For now though a lot of stuff is run on commodity cloud hardware so... eh.
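A toy sketch of what "IO dominates" looks like in an authorization path. Everything here is hypothetical (the function names, the 5 ms figure, the scoring rule); the point is just that the request time lives in the data fetch, not the decision.

    # Illustrative authorization flow: IO dwarfs the compute.
    import time

    def fetch_features(card_key):
        # Stand-in for reads against fraud/risk data stores over the network.
        time.sleep(0.005)            # ~5 ms of IO dominates the request
        return {"recent_declines": 0, "avg_ticket": 42.0}

    def score(features, amount):
        # The decision itself is cheap relative to the IO above.
        return amount < 10 * features["avg_ticket"] and features["recent_declines"] < 3

    def authorize(card_key, amount):
        start = time.perf_counter()
        features = fetch_features(card_key)         # IO
        approved = score(features, amount)          # compute
        elapsed_ms = (time.perf_counter() - start) * 1000
        return approved, elapsed_ms

    print(authorize("demo-card", 125.00))           # (True, ~5 ms, almost all of it IO)

Caching or batching the feature reads is what moves the needle; making the scoring itself faster barely registers.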