> Whenever a threat is detected and blocked, our [software] inserts the blocked IP address into a list. This list was then checked sequentially every time a request was evaluated by an agent, increasing the latency of our customers' applications. This was an O(n) operation.
Right, any database where you just go `ALTER TABLE blocklist ADD INDEX ('ip');` is faster than that homebrew O(n) solution. No need for another homebrew solution.
Ended up reading the whole post anyway, because now I'm curious whether this radix tree might beat the tree a database would use, but they never compared the performance of any old index against their radix tree.
The CPU time and memory bandwidth needed to search a 5000-entry radix tree do not leave you much budget to beat it with any sort of network query to a database, not even one on the same machine. Even in the worst case, the radix tree is likely to be done with its lookup before you even get memory allocated to start building the query; you might just barely win if the radix tree has to go all the way down and do all of its memory accesses, but you'll still lose overall. Advantageous early bailouts also mean that the radix tree is often done after just three or four pointer hops, with all of those lookups having a decent chance of being in L1 and almost certain to be in L2, if this is the primary thing the CPU is doing on that machine.
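To make the "three or four pointer hops with early bailout" concrete, here is a minimal sketch of the idea (not the article's actual implementation): a bitwise trie over IPv4 addresses where a node marked as blocked covers every address beneath it, so a lookup can return as soon as it hits a covering prefix.

```python
import ipaddress

class RadixNode:
    __slots__ = ("children", "blocked")
    def __init__(self):
        self.children = [None, None]  # one child per bit value
        self.blocked = False

class BlocklistTrie:
    """Bitwise trie over IPv4 addresses; a lookup is at most 32 pointer hops."""
    def __init__(self):
        self.root = RadixNode()

    def insert(self, cidr: str) -> None:
        net = ipaddress.ip_network(cidr, strict=False)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = RadixNode()
            node = node.children[bit]
        node.blocked = True  # this node covers the whole prefix

    def contains(self, ip: str) -> bool:
        bits = int(ipaddress.ip_address(ip))
        node = self.root
        for i in range(32):
            if node.blocked:  # early bailout: a shorter blocked prefix covers us
                return True
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:  # path diverges from every blocked prefix
                return False
        return node.blocked
```

Blocking a /8 means any address inside it is found after only 8 hops, and a miss often bails out even earlier, which is where the "done after just three or four pointer hops" behavior comes from.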
And that is basically what happens in database software when the query is executed the second and all following times. The mildly interesting difference is just radix tree vs. the various [usually highly optimized] index types and index/search-optimized storage formats available in a DB engine.
While a correctly implemented embedded radix tree would probably be faster on small-to-medium datasets, the gain may not be worth the costs of essentially implementing your own DB engine: when the dataset wouldn't normally sit entirely in memory all the time, when the complexity of the queries forces you to implement your own full-blown query language/API, or when updating the dataset by downloading/reloading the whole thing becomes infeasible and you need to build your own partial-update machinery.
Yes, even if I assume the happy path here, once the prepared query is looked up on the client side, the parameters filled out, the packet constructed to send to the database, the OS kernel switched in to send the packet to the DB process, the DB switched in to select on it, the DB reads the packet, the DB parses the packet to determine what prepared query it's running, the DB finds the correct bytecode (or whatever) to run and passes it to an execution engine to run, yes, at that point the DB will make a highly optimized radix tree or similar lookup itself.
Additional non-"happy path" things the DB may encounter include using a library or DB that doesn't support prepared queries (so you're starting from scratch to construct a query), the DB being across any sort of actual network, and the DB library being thread-safe and having to deal with thread-safety primitives. Any one of these is expensive enough to cover a radix tree's entire lookup operation, honestly, because even 50ns is likely to be longer than the entire radix operation.
My point is that the in-process radix tree is done before "the packet is constructed" and long before it can be sent. Depending on the details the radix tree might very well outrace "the prepared query is looked up on the client side" if that involves retrieving it from any sort of structure because you don't just have it in a variable, though under the circumstances assuming that it's just on hand is reasonable.
This is one of those things where you have to be thinking about those "order of magnitude" maps that are often posted about memory access operations, because computers span so many orders of magnitude that our intuition breaks down. Assuming L1 or L2 cache, the radix tree is made almost entirely of single-digit-nanosecond instructions, and not even all that many of them, whereas reaching a database is going to involve microsecond-scale interactions. (To the extent that the radix tree may have to hit real RAM, well, so might we have to hit it for the DB interaction, and the DB is certainly bringing in a lot more code for the code cache, so even then, it's not a win for the DB-based code.)
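The orders-of-magnitude gap is easy to see for yourself. This hedged micro-benchmark (using a plain Python set as a stand-in for the in-process structure, against an in-memory SQLite table with an indexed column) is just a sketch; exact numbers depend entirely on the machine and runtime, but the nanoseconds-vs-microseconds shape should hold:

```python
import sqlite3
import time

N = 100_000
blocked = {f"10.0.{i >> 8}.{i & 255}" for i in range(1000)}

# In-memory SQLite with a primary-key (B-tree indexed) column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocklist (ip TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO blocklist VALUES (?)", [(ip,) for ip in blocked])

probe = "10.0.1.200"  # an address that is in the blocklist

t0 = time.perf_counter()
for _ in range(N):
    probe in blocked
t_set = (time.perf_counter() - t0) / N

t0 = time.perf_counter()
for _ in range(N):
    conn.execute("SELECT 1 FROM blocklist WHERE ip = ?", (probe,)).fetchone()
t_db = (time.perf_counter() - t0) / N

print(f"in-process set:   {t_set * 1e9:.0f} ns/lookup")
print(f"in-memory sqlite: {t_db * 1e9:.0f} ns/lookup")
```

Even with no network and no disk, each SQLite call still pays for statement dispatch and VM execution, which is the microsecond-scale overhead being described above.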
I'm not arguing about the performance cost of the network. Comparing apples to apples would be more about embedded DB engines, or a DB engine on the same machine using a "local mode" driver rather than a network-mode one.
Possibly! We didn't consider databases because this logic is implemented in agents injected in our users' applications. On top of limiting what we had access to, it also put a lot of pressure on us to minimise our CPU & memory overhead.
Because of that, we believed using a database (even in-memory) would have resulted in an unacceptable overhead.
Note that "database" does not necessarily mean "networked database server" to many people.
For example, using the SQLite library counts as a database for most, and it runs in your process, in your memory. It is essentially a collection of functions for fast data structure manipulation.
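A minimal sketch of that point, using Python's stdlib `sqlite3` binding (the table and function names here are illustrative, not from the article): the whole "database" is just library calls inside your process.

```python
import sqlite3

# In-memory SQLite: no server, no socket, just function calls in-process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocklist (ip TEXT PRIMARY KEY)")  # PK gets a B-tree index
conn.executemany(
    "INSERT INTO blocklist VALUES (?)",
    [("203.0.113.7",), ("198.51.100.23",)],
)

def is_blocked(ip: str) -> bool:
    """Indexed point lookup against the in-process table."""
    row = conn.execute("SELECT 1 FROM blocklist WHERE ip = ?", (ip,)).fetchone()
    return row is not None
```

Whether this is fast *enough* compared with a purpose-built radix tree is exactly the comparison the thread is debating, but there is no network or IPC anywhere in it.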
I doubt that purpose-built data structures optimized for L1/L2 cache will be slower than (or even comparable to) a database running on the same machine. Assuming, of course, that everything needed fits in memory. Databases will do much better when you need to spill to disk.
There is a data structure called Judy arrays, invented by Doug Baskins. Judy arrays were at one point used in Erlang to back its ets tables.
The paper (2003) comparing various data structures for that task is here:

> For unsorted tables with populations of 70,000 keys or more, performance improvement by using the judyeh table type instead of set is easily measurable and significant. This improvement can be seen with keys in sequential order and random order during operations involving table insertion, lookup, and counter updates. The deletion operation is not as fast as set, but deletion's extra cost is smaller than the benefit of insertion, lookup, and update operations combined. Furthermore, the additional RAM required by judyeh tables is quite modest. The judyeh table type requires only about 6% more than set tables, which is smaller than the additional 15% required for the same data in an ordered set table. The only operation this research examines which is significantly worse for judyeh tables than set tables is table traversal using the ets:next/2 function. …
It would be interesting to see how Judy arrays compare to radix trees.