I agree with his rant about "CS departments' algorithmic analysis" failing to cover these real-world hardware issues. That was certainly true of my ~1997 undergrad CS course at the University of St Andrews, which was otherwise exemplary.
The only thing I don't understand is why his data structure for finding the least-recently-used item wasn't just a linked list - where each time you touch an object you move it to one end of the list, and when you want to discard an object you remove one from the other end. That'd be o(1) and would cause, at most, one page fault per update and none for the add and remove cases (since the two ends would surely be paged "in"). And it'd be a lot easier than his fancy thing. I guess I missed something.
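A rough sketch of what I mean (illustrative names and types, nothing to do with Varnish's actual code):

```c
/* Minimal sketch of the linked-list LRU idea above: touch moves a node to
 * the MRU end, evict pops the LRU end.  Both are a handful of pointer writes. */
#include <stdio.h>
#include <stdlib.h>

struct node {
    int key;            /* stands in for a cached object */
    struct node *prev;
    struct node *next;
};

struct lru {
    struct node *head;  /* most recently used */
    struct node *tail;  /* least recently used */
};

/* Unlink a node from wherever it is in the list. */
static void unlink_node(struct lru *l, struct node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Push a node onto the MRU end. */
static void push_front(struct lru *l, struct node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* "Touch": move an existing node to the MRU end. */
static void touch(struct lru *l, struct node *n)
{
    unlink_node(l, n);
    push_front(l, n);
}

/* Evict the least-recently-used node; caller frees it. */
static struct node *evict(struct lru *l)
{
    struct node *victim = l->tail;
    if (victim) unlink_node(l, victim);
    return victim;
}

int main(void)
{
    struct lru l = { NULL, NULL };
    struct node *a = calloc(1, sizeof *a); a->key = 1;
    struct node *b = calloc(1, sizeof *b); b->key = 2;
    push_front(&l, a);
    push_front(&l, b);
    touch(&l, a);                 /* a is now most recently used */
    struct node *v = evict(&l);   /* evicts b, the least recently used */
    printf("evicted key %d\n", v->key);
    free(v); free(a);
    return 0;
}
```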
It might be worth pointing out that NVMe or Intel Optane would solve his problem (1) without his fancy data structure too. Some might go as far as to say that while his fancy data structure is outdated now, the stuff that the CS departments teach has become correct again. Maybe St Andrews had the right idea all along.
1) Or huge-pages or NUMA node pinning, to some extent.
> I agree with his rant about "CS departments' algorithmic analysis" failing to cover these real-world hardware issues. That was certainly true of my ~1997 undergrad CS course at the University of St Andrews, which was otherwise exemplary.
I counter with similarly anecdotal evidence from 2012. :)
While we had a normal "Algorithms and Data Structures" course that taught the classical theory, there was also a separate course on "Algorithm Engineering" which was (among other things) about performance in the presence of the memory hierarchy. The upshot is that there are different cost models you have to use in the analysis of your algorithms (e.g., I/O complexity), but the techniques you use for the analysis are the same.
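For example, in the external-memory (I/O) model you count transfers of blocks of size B to and from a fast memory of size M instead of unit-cost operations, and the standard bounds (Aggarwal–Vitter, quoted from memory) come out as:

```latex
% Basic primitives in the external-memory model:
% N items, block size B, fast-memory size M; cost = number of block I/Os
\mathrm{scan}(N)   = \Theta\!\left(\tfrac{N}{B}\right),\qquad
\mathrm{sort}(N)   = \Theta\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\right),\qquad
\mathrm{search}(N) = \Theta\!\left(\log_{B} N\right)
```

The analysis techniques (recurrences, lower-bound arguments, etc.) are exactly the ones from the classical course; only the cost measure changes.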
---
To be honest, I think the system is working as intended here. Undergraduate courses are not really designed to push people to the forefront of current research. They're intended to give you the basic skills you need to catch up to modern developments yourself.
I think what's happening here is that people remember "optimality proofs" from their algorithms courses and take that to mean that there is nothing more to be done (that's the sentiment I got from this blog post anyway). The problem with this is that in this context "optimality" referred to a technical definition - optimal with respect to some specific simplified model of computation. The result is probably not optimal with respect to a more complicated model!
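To make that concrete with an example close to the blog post (standard textbook bounds, not anything taken from the post itself): searching N keys in a pointer-based balanced BST is optimal in the comparison model, but once you count page/block transfers it no longer is:

```latex
% Searching N keys; B = keys that fit in one page/block
\text{balanced BST (pointer-based):}\ \Theta(\log_2 N)\ \text{comparisons},\ O(\log_2 N)\ \text{page transfers}\\
\text{B-tree with fanout }\Theta(B):\ \Theta(\log_2 N)\ \text{comparisons},\ \Theta(\log_B N)\ \text{page transfers}
```

Same comparison count, wildly different number of page faults, which is essentially the gap the article's page-aware heap layout is exploiting, if I read it right.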
> The only thing I don't understand is why his data structure for finding the least-recently-used item wasn't just a linked list
Because he is not talking about expiring the Least Recently Used item, but about expiring the items whose "cache time" has expired. That is, when you add a new item to the cache, you add it with some lifetime (e.g. an integer). Then you have some clock (another integer), and what Varnish wants to do is expire all items whose lifetime is lower than the current clock.
With a (doubly-)linked list you would have to traverse the list on insertion to place the item in the correct spot...
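...which is why the textbook answer here is a priority queue keyed on the expiry time, e.g. a binary min-heap: insertion is O(log n) instead of the sorted list's O(n), and expiring is just popping the root while its deadline is <= now. A bare-bones sketch (this is the classical structure the article starts from and then reworks for VM locality, not his actual variant):

```c
/* Rough sketch: expiry queue as an array-backed binary min-heap keyed on
 * the item's deadline.  Insert is O(log n); the expiry pass repeatedly
 * pops the root while its deadline is <= "now". */
#include <stdio.h>

#define MAXITEMS 1024

struct expq {
    long deadline[MAXITEMS + 1];  /* 1-based heap of expiry times */
    int  n;
};

static void swap(long *a, long *b) { long t = *a; *a = *b; *b = t; }

static void expq_insert(struct expq *q, long deadline)
{
    int i = ++q->n;
    q->deadline[i] = deadline;
    /* sift up: O(log n) */
    while (i > 1 && q->deadline[i] < q->deadline[i / 2]) {
        swap(&q->deadline[i], &q->deadline[i / 2]);
        i /= 2;
    }
}

/* Pop the earliest deadline (caller has checked q->n > 0). */
static long expq_pop(struct expq *q)
{
    long min = q->deadline[1];
    q->deadline[1] = q->deadline[q->n--];
    /* sift down: O(log n) */
    int i = 1;
    for (;;) {
        int child = 2 * i;
        if (child > q->n) break;
        if (child < q->n && q->deadline[child + 1] < q->deadline[child])
            child++;
        if (q->deadline[i] <= q->deadline[child]) break;
        swap(&q->deadline[i], &q->deadline[child]);
        i = child;
    }
    return min;
}

int main(void)
{
    struct expq q = { .n = 0 };
    expq_insert(&q, 300);   /* deadlines in arbitrary order */
    expq_insert(&q, 100);
    expq_insert(&q, 200);

    long now = 250;
    /* Expire everything whose deadline has passed. */
    while (q.n > 0 && q.deadline[1] <= now)
        printf("expired item with deadline %ld\n", expq_pop(&q));
    return 0;
}
```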
I wholeheartedly agree that many—if not most—algorithms classes don’t sufficiently talk about the limitations of the models used and how to handle them. I’m a big fan of Algorithm Engineering, which is best imagined as a cycle of design, analysis, implementation and experiments, with each stage influencing the next. This has led to some exceptional results, e.g. the route planning algorithms used by Google/Bing/Apple Maps, which are around 5 orders of magnitude faster than Dijkstra’s algorithm thanks to clever pre-processing that exploits the structure of road networks (their asymptotic query complexity on general graphs is still the same as Dijkstra’s, iirc).
Nitpick, but your list-based LRU cache’s operations wouldn’t be in o(1) but rather O(1) – little o means strictly slower growth, in this case sub-constant. Probably just a typo though. And it could be further optimised for locality by using a cyclic buffer instead of a linked list (-> no pointer chasing + sequential access), but see kilburn’s answer (the structure in the paper solves a different problem).
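For reference, the definitions I’m using:

```latex
% f \in O(g): f grows no faster than g, up to a constant factor
f \in O(g) \iff \exists\, c > 0,\ n_0:\ f(n) \le c \cdot g(n)\ \ \text{for all } n \ge n_0
% f \in o(g): f grows strictly slower than g
f \in o(g) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0
```

So o(1) would mean a cost that tends to zero, which no real operation can have, while O(1) just means bounded by a constant.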
> I agree with his rant about "CS departments' algorithmic analysis" failing to cover these real-world hardware issues.
I disagree. CS algorithms is about the logical/mathematical understanding of algorithms. If you want to understand hardware implications, you are generally taught that in a computer architecture class. Blaming the algorithms class for the lack of consideration for real-world hardware issues is like blaming the computational theory class because the Turing machine doesn't factor in hardware issues. You abstract it out to make general logical observations.
I disagree. It should at least cover what aspects should be considered when designing an algorithm for use in the real world. You dismiss the entire field of experimental algorithmics as if it were nothing more than a bunch of algorithmicists who know some computer architecture things, but there’s a lot more to it.
I’m not arguing against teaching classical algorithm design and analysis; they’re both incredibly important imho. But we should stop ignoring the shortcomings of the models used, hand-waving them away with “the rest is just a bit of engineering”. It’s not.
There are theoretical models of random-access machines with fixed-size caches, on which data structures like B-trees and others can be (and are) analysed for asymptotic performance. It's usually not taught until late undergrad or graduate level though.
(You may have heard the terms "cache aware" and "cache oblivious" used to describe algorithms. These terms come out of theoretical CS, describing algorithms with good performance on these "RAM with cache" machines when the cache size is known or unknown ahead of time, respectively.)
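A concrete toy example of "cache aware" (a standard textbook technique, not taken from any particular paper): transposing a matrix tile by tile, with the tile size chosen to fit the cache. A cache-oblivious version would instead recurse on quadrants until the subproblem fits in whatever cache happens to be there, without ever naming a tile size.

```c
/* Cache-aware out-of-place matrix transpose done in TILE x TILE blocks,
 * so each block of the source and destination stays resident in cache
 * while it is being worked on.  TILE is a machine-dependent tuning
 * parameter (an assumption here), which is exactly what "cache aware"
 * means: the algorithm knows the cache parameters. */
#include <stdio.h>

#define N    1024
#define TILE 32      /* chosen with the cache size / line size in mind */

static double a[N][N], b[N][N];

static void transpose_tiled(void)
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            /* transpose one TILE x TILE block */
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    b[j][i] = a[i][j];
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * N + j;

    transpose_tiled();
    printf("b[3][5] = %.0f (should equal a[5][3] = %.0f)\n", b[3][5], a[5][3]);
    return 0;
}
```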