[...] choosing the least recently used (LRU) is an obvious choice, since you’re more likely to use something if you’ve used it recently.
There are many other cache replacement policies [1], and they can outperform LRU, especially when the quoted assumption does not hold. It is for example quite common to have two types of data: one that is frequently reused over very long periods, and another that is only reused over a short period after its first use. In those cases a more complex policy like ARC or CAR can provide noticeable improvements.
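To make that concrete, here is a toy segmented-LRU sketch in Go (not ARC or CAR themselves, which additionally adapt their segment sizes on the fly; all names are made up). The point is the separation: items seen once sit in a small "probation" list and age out quickly, while items hit a second time get promoted to a "protected" list where long-lived hot data survives one-shot scans.

```go
package main

import (
	"container/list"
	"fmt"
)

type slru struct {
	probation, protected *list.List               // front = most recent
	where                map[string]*list.Element // key -> list element
	inProt               map[string]bool          // which list a key lives in
	maxProb, maxProt     int
}

func newSLRU(maxProb, maxProt int) *slru {
	return &slru{
		probation: list.New(), protected: list.New(),
		where:   map[string]*list.Element{},
		inProt:  map[string]bool{},
		maxProb: maxProb, maxProt: maxProt,
	}
}

func (c *slru) Access(key string) {
	if el, ok := c.where[key]; ok {
		if c.inProt[key] { // already hot: just refresh recency
			c.protected.MoveToFront(el)
			return
		}
		// second access: promote from probation to protected
		c.probation.Remove(el)
		c.where[key] = c.protected.PushFront(key)
		c.inProt[key] = true
		if c.protected.Len() > c.maxProt { // demote coldest protected item
			back := c.protected.Back()
			k := back.Value.(string)
			c.protected.Remove(back)
			c.where[k] = c.probation.PushFront(k)
			c.inProt[k] = false
		}
	} else {
		// first access: enters probation only
		c.where[key] = c.probation.PushFront(key)
		c.inProt[key] = false
	}
	if c.probation.Len() > c.maxProb { // evict coldest probationary item
		back := c.probation.Back()
		k := back.Value.(string)
		c.probation.Remove(back)
		delete(c.where, k)
		delete(c.inProt, k)
	}
}

func main() {
	c := newSLRU(2, 2)
	for _, k := range []string{"hot", "hot", "a", "b", "c", "hot"} {
		c.Access(k)
	}
	fmt.Println(c.inProt["hot"]) // true: "hot" survived the a/b/c scan
}
```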
Even when discussing CPU caches, there are policies that can significantly outperform LRU, specifically at the Last-Level Cache (LLC).
LLCs are often exposed to access streams from which the temporal locality (the very assumption that makes LRU good) has already been filtered out by the earlier cache levels.
For that reason, newer cache replacement policies such as RRIP, SHiP, and Hawkeye have significant headroom to improve upon LRU (and PseudoLRU).
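As a rough illustration of the RRIP idea, here is a simplified SRRIP sketch for a single cache set, assuming 2-bit re-reference prediction values (RRPVs); the types are my own. New lines are inserted with a "long" prediction of 2 rather than as most-recently-used, a hit resets the RRPV to 0, and the victim is a line whose RRPV has aged to the maximum of 3, so data that is never touched again ages out quickly instead of pushing useful lines out of the set:

```go
package main

import "fmt"

const maxRRPV = 3 // 2-bit counters

type line struct {
	tag  int
	rrpv int
}

type set struct{ lines []line }

// access looks up tag in the set, resetting its RRPV on a hit; on a miss it
// ages the lines until one reaches maxRRPV and replaces that victim.
func (s *set) access(tag int) (hit bool) {
	for i := range s.lines {
		if s.lines[i].tag == tag {
			s.lines[i].rrpv = 0 // predicted to be re-referenced soon
			return true
		}
	}
	for { // find or create a victim with RRPV == maxRRPV
		for i := range s.lines {
			if s.lines[i].rrpv == maxRRPV {
				s.lines[i] = line{tag: tag, rrpv: maxRRPV - 1} // "long" insertion
				return false
			}
		}
		for i := range s.lines { // no victim yet: age every line
			s.lines[i].rrpv++
		}
	}
}

func main() {
	s := &set{lines: []line{{tag: -1, rrpv: maxRRPV}, {tag: -2, rrpv: maxRRPV}}}
	fmt.Println(s.access(1)) // false (miss, fills an aged-out line)
	fmt.Println(s.access(1)) // true  (hit, RRPV reset to 0)
}
```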
> One of the main benefits of LRU is that it only needs one extra bit per cache line.
This might be well known, but why's that? I've recently seen a Go implementation of LRU and it uses a lot of memory. Maybe a better implementation would let us save some of that.
Requiring only a single bit is only true for a two-way set-associative cache, i.e. one where the content stored at any memory address may only be cached in two different cache slots. In that case you can simply flag the other corresponding cache entry, the one you did not access, as least recently used every time you access a cache entry.
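A toy sketch of that one-bit scheme in Go, with made-up names, might look like this (the "bit" is stored as an int here for readability):

```go
package main

import "fmt"

type twoWaySet struct {
	tags [2]int
	lru  int // the "one bit": index (0 or 1) of the least recently used way
}

func (s *twoWaySet) access(tag int) (hit bool) {
	for way := 0; way < 2; way++ {
		if s.tags[way] == tag {
			s.lru = 1 - way // the other way is now the LRU one
			return true
		}
	}
	s.tags[s.lru] = tag // miss: replace the LRU way
	s.lru = 1 - s.lru
	return false
}

func main() {
	s := &twoWaySet{tags: [2]int{-1, -2}}
	s.access(10)                      // miss, fills way 0
	s.access(20)                      // miss, fills way 1
	s.access(10)                      // hit, way 1 becomes LRU
	fmt.Println(s.access(30), s.tags) // false [10 30]: 20 was evicted
}
```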
The implementation in software was probably fully associative, i.e. every item can be cached in every slot. This requires a lot more memory, and the same is true for caches in processors: the more freedom they have about where to cache the content of each address, the more additional bits and logic they require.
To be more precise, you have to keep track of the order in which all cache entries were accessed, which requires at least log2(n!) ≈ n * log(n) bits, unless you only store the access order implicitly by using something like move-to-front.
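For illustration, here is a move-to-front LRU sketch in Go built from container/list plus a map, which is the usual layout in software LRU libraries (names are mine). The access order lives implicitly in the linked list; no per-entry counters or timestamps are needed, but the list pointers and map are exactly where the memory overhead of such implementations comes from:

```go
package main

import (
	"container/list"
	"fmt"
)

type lru struct {
	order *list.List               // front = most recently used
	index map[string]*list.Element // key -> list node
	cap   int
}

func newLRU(capacity int) *lru {
	return &lru{order: list.New(), index: map[string]*list.Element{}, cap: capacity}
}

func (c *lru) Access(key string) {
	if el, ok := c.index[key]; ok {
		c.order.MoveToFront(el) // move-to-front keeps the order implicit
		return
	}
	c.index[key] = c.order.PushFront(key)
	if c.order.Len() > c.cap { // evict the back, i.e. the LRU entry
		back := c.order.Back()
		delete(c.index, back.Value.(string))
		c.order.Remove(back)
	}
}

func main() {
	c := newLRU(2)
	for _, k := range []string{"a", "b", "a", "c"} { // "b" is evicted, not "a"
		c.Access(k)
	}
	_, ok := c.index["b"]
	fmt.Println(ok) // false
}
```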
> It is for example quite common to have two types of data: one that is frequently reused over very long periods, and another that is only reused over a short period after its first use.
I can't help but see the parallels with garbage collection algorithms, which seem to actually be the same problem approached from a different angle. Deciding which objects to scan for cleanup, and how often, for optimal use of time maps very closely to deciding which items to keep in or evict from a cache.
Or when transmitting data packets over a network, you want to favor real-time data like phone or video calls over file downloads. And in case of a failed transmission you want to just forget about the few milliseconds of voice you lost, but you certainly want to retransmit a lost packet from the middle of a binary.
It's all the same problem: identifying objects with different characteristics in order to handle them with tailored algorithms instead of simply using a generic but less optimal algorithm for all of them.
[1] https://en.wikipedia.org/wiki/Cache_replacement_policies