> Whenever a threat is detected and blocked, our [software] inserts the blocked IP address into a list. This list was then checked sequentially every time a request was evaluated by an agent, increasing the latency of our customers' applications. This was an O(n) operation.
Right, any database where you just go `ALTER TABLE blocklist ADD INDEX ('ip');` is faster than that homebrew O(n) solution. No need for another homebrew solution.
Ended up reading the whole post anyway, because now I'm curious whether this radix tree might beat the tree a database would use, but they never compared the performance of any old index against their radix tree.
The CPU time and memory bandwidth needed to search a 5000-entry radix tree do not leave you much budget to beat it with any sort of network query to a database, not even one on the same machine. Even in the worst case, the radix tree is likely to be done with its lookup before you even get memory allocated to start building the query; you might just barely win if the radix tree has to go all the way down and do all of its memory accesses, but you'll still lose overall. Advantageous early bailouts also mean that the radix tree is often done after just three or four pointer hops, with all of those lookups having a decent chance of being in L1 and almost certain to be in L2, if this is the primary thing the CPU is doing on that machine.
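To make the "three or four pointer hops with early bailout" concrete, here is a minimal sketch of the idea (not the article's actual implementation): a bitwise trie over IPv4 addresses where a node marked as blocked covers every address beneath it, so a lookup can return as soon as it hits a covering prefix.

```python
import ipaddress

class RadixNode:
    __slots__ = ("children", "blocked")
    def __init__(self):
        self.children = [None, None]  # one child per bit value
        self.blocked = False

class BlocklistTrie:
    """Bitwise trie over IPv4 addresses; a lookup is at most 32 pointer hops."""
    def __init__(self):
        self.root = RadixNode()

    def insert(self, cidr: str) -> None:
        net = ipaddress.ip_network(cidr, strict=False)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = RadixNode()
            node = node.children[bit]
        node.blocked = True  # this node covers the whole prefix

    def contains(self, ip: str) -> bool:
        bits = int(ipaddress.ip_address(ip))
        node = self.root
        for i in range(32):
            if node.blocked:  # early bailout: a shorter blocked prefix covers us
                return True
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:  # path diverges from every blocked prefix
                return False
        return node.blocked
```

Blocking a /8 means any address inside it is found after only 8 hops, and a miss often bails out even earlier, which is where the "done after just three or four pointer hops" behavior comes from.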
And that is basically what happens in database software when the query is executed the second and all following times. The mildly interesting difference is just radix tree vs. the various [usually highly optimized] index types and index/search-optimized storage formats available in a DB engine.
While a correctly implemented embedded radix tree would probably be faster on small-to-medium datasets, the gain may not be worth the costs of essentially implementing your own DB engine: when the dataset wouldn't normally sit entirely in memory all the time, when the complexity of the queries forces you to implement your own full-blown query language/API, or when updating the dataset by downloading/reloading the whole thing becomes infeasible and you need to build your own partial-update machinery.
Yes, even if I assume the happy path here, once the prepared query is looked up on the client side, the parameters filled out, the packet constructed to send to the database, the OS kernel switched in to send the packet to the DB process, the DB switched in to select on it, the DB reads the packet, the DB parses the packet to determine what prepared query it's running, the DB finds the correct bytecode (or whatever) to run and passes it to an execution engine to run, yes, at that point the DB will make a highly optimized radix tree or similar lookup itself.
Additional non-"happy path" things the DB may encounter include using a library or DB that doesn't support prepared queries (so you're starting from scratch to construct a query), the DB being across any sort of actual network, and the DB library being thread-safe and having to deal with thread-safety primitives. Any one of these is expensive enough to cover a radix tree's entire lookup operation, honestly, because even 50ns is likely to be longer than the entire radix operation.
My point is that the in-process radix tree is done before "the packet is constructed" and long before it can be sent. Depending on the details the radix tree might very well outrace "the prepared query is looked up on the client side" if that involves retrieving it from any sort of structure because you don't just have it in a variable, though under the circumstances assuming that it's just on hand is reasonable.
This is one of those things where you have to be thinking about those "order of magnitude" maps that are often posted about memory access operations, because computers span so many orders of magnitude that our intuition breaks down. Assuming L1 or L2 cache, the radix tree is made almost entirely of single-digit-nanosecond instructions, and not even all that many of them, whereas reaching a database is going to involve microsecond-scale interactions. (To the extent that the radix tree may have to hit real RAM, well, so might we have to hit it for the DB interaction, and the DB is certainly bringing in a lot more code for the code cache, so even then, it's not a win for the DB-based code.)
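The orders-of-magnitude gap is easy to see for yourself. This hedged micro-benchmark (using a plain Python set as a stand-in for the in-process structure, against an in-memory SQLite table with an indexed column) is just a sketch; exact numbers depend entirely on the machine and runtime, but the nanoseconds-vs-microseconds shape should hold:

```python
import sqlite3
import time

N = 100_000
blocked = {f"10.0.{i >> 8}.{i & 255}" for i in range(1000)}

# In-memory SQLite with a primary-key (B-tree indexed) column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocklist (ip TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO blocklist VALUES (?)", [(ip,) for ip in blocked])

probe = "10.0.1.200"  # an address that is in the blocklist

t0 = time.perf_counter()
for _ in range(N):
    probe in blocked
t_set = (time.perf_counter() - t0) / N

t0 = time.perf_counter()
for _ in range(N):
    conn.execute("SELECT 1 FROM blocklist WHERE ip = ?", (probe,)).fetchone()
t_db = (time.perf_counter() - t0) / N

print(f"in-process set:   {t_set * 1e9:.0f} ns/lookup")
print(f"in-memory sqlite: {t_db * 1e9:.0f} ns/lookup")
```

Even with no network and no disk, each SQLite call still pays for statement dispatch and VM execution, which is the microsecond-scale overhead being described above.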
I'm not arguing about the performance cost of the network. Comparing apples to apples would be more about embedded DB engines, or a DB engine on the same machine using a "local mode" driver rather than a network-mode one.
Possibly! We didn't consider databases because this logic is implemented in agents injected in our users' applications. On top of limiting what we had access to, it also put a lot of pressure on us to minimise our CPU & memory overhead.
Because of that, we believed using a database (even in-memory) would have resulted in an unacceptable overhead.
Note that "database" does not necessarily mean "networked database server" to many people.
For example, using the SQLite library counts as a database for most, and it runs in your process, in your memory. It is essentially a collection of functions for fast data structure manipulation.
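A minimal sketch of that point, using Python's stdlib `sqlite3` binding (the table and function names here are illustrative, not from the article): the whole "database" is just library calls inside your process.

```python
import sqlite3

# In-memory SQLite: no server, no socket, just function calls in-process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocklist (ip TEXT PRIMARY KEY)")  # PK gets a B-tree index
conn.executemany(
    "INSERT INTO blocklist VALUES (?)",
    [("203.0.113.7",), ("198.51.100.23",)],
)

def is_blocked(ip: str) -> bool:
    """Indexed point lookup against the in-process table."""
    row = conn.execute("SELECT 1 FROM blocklist WHERE ip = ?", (ip,)).fetchone()
    return row is not None
```

Whether this is fast *enough* compared with a purpose-built radix tree is exactly the comparison the thread is debating, but there is no network or IPC anywhere in it.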
I doubt that purpose-built data structures optimized for L1/L2 cache will be slower than (or even comparable to) a database running on the same machine. Assuming, of course, that everything needed fits in memory. Databases will do much better when you need to spill to disk.
There is a data structure called Judy arrays, invented by Doug Baskins. Judy arrays were at one point used in Erlang to back its ets tables.
The paper (2003) comparing various data structures for that task is here:

> For unsorted tables with populations of 70,000 keys or more, performance improvement by using the judyeh table type instead of set is easily measurable and significant. This improvement can be seen with keys in sequential order and random order during operations involving table insertion, lookup, and counter updates. The deletion operation is not as fast as set, but deletion's extra cost is smaller than the benefit of insertion, lookup, and update operations combined. Furthermore, the additional RAM required by judyeh tables is quite modest. The judyeh table type requires only about 6% more than set tables, which is smaller than the additional 15% required for the same data in an ordered set table. The only operation this research examines which is significantly worse for judyeh tables than set tables is table traversal using the ets:next/2 function. …
It would be interesting to see how Judy arrays compare to radix trees.