
My first thought upon seeing the prompt:

    If you would build an in-memory cache, how would you do it?

    It should have good performance and be able to hold many entries. 
    Reads are more common than writes. I know how I would do it already, 
    but I’m curious about your approach.
was to add this requirement, since it comes up so often:

    Let's assume that keys accessed follow a power law, so some keys get 
    accessed very frequently and we would like them to have the fastest 
    retrieval of all.
I'm not sure if there are any efficient tweaks to hash tables or B-trees that might help with this additional requirement. Obviously we could make a hash table take far more space than needed to reduce collisions, but with a decent load factor, is the answer just to swap frequently accessed keys to the beginning of their probe chain? And how do we know a key is frequently accessed in the first place? A Count-Min sketch?
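To make the "how do we know it's frequently accessed" question concrete, here's a minimal Count-Min sketch (a sketch of the sketch, so to speak: the width, depth, and hash mixing are arbitrary choices, and a real cache would also age the counters periodically):

    import java.util.concurrent.ThreadLocalRandom;

    // Minimal Count-Min sketch: DEPTH hash rows of WIDTH counters each.
    // An item's estimated frequency is the minimum of its counters, so
    // collisions can inflate the estimate but never deflate it.
    final class CountMinSketch {
        private static final int DEPTH = 4;
        private static final int WIDTH = 1 << 14;  // power of two for cheap masking

        private final int[][] counts = new int[DEPTH][WIDTH];
        private final long[] seeds = new long[DEPTH];

        CountMinSketch() {
            for (int i = 0; i < DEPTH; i++) {
                seeds[i] = ThreadLocalRandom.current().nextLong();
            }
        }

        private int slot(int row, Object key) {
            long h = (key.hashCode() * 0x9E3779B97F4A7C15L) ^ seeds[row];
            h ^= h >>> 32;  // mix the high bits down before masking
            return (int) (h & (WIDTH - 1));
        }

        void increment(Object key) {
            for (int i = 0; i < DEPTH; i++) {
                counts[i][slot(i, key)]++;
            }
        }

        int estimate(Object key) {
            int min = Integer.MAX_VALUE;
            for (int i = 0; i < DEPTH; i++) {
                min = Math.min(min, counts[i][slot(i, key)]);
            }
            return min;
        }
    }

Since collisions can only inflate counts, the estimate never underestimates, which makes it safe as a promotion filter: a key that looks cold really is cold.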

Even with that tweak, the hottest keys will still be scattered around memory. Wouldn't it be best if their entries could fit into fewer pages? So, maybe a much smaller "hot" table containing, say, the 1,000 most accessed keys. We still want a high load factor to maximize the use of cache pages, so perhaps perfect hashing?
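Here's a naive sketch of that two-tier layout, with a fixed-size hot table checked before the main map. The promotion rule and sizes are made up for illustration, and true perfect hashing would require periodically rebuilding the hot table offline:

    import java.util.HashMap;
    import java.util.Map;

    // A tiny direct-mapped "hot" table in front of the main map: the hottest
    // keys live in two compact arrays that fit in a few pages. Promotion is a
    // naive counter threshold standing in for a Count-Min sketch.
    final class TieredCache<K, V> {
        private static final int HOT_SIZE = 1024;      // power of two
        private static final int PROMOTE_AFTER = 64;   // arbitrary hotness threshold

        private final Object[] hotKeys = new Object[HOT_SIZE];
        private final Object[] hotValues = new Object[HOT_SIZE];
        private final Map<K, V> main = new HashMap<>();
        private final Map<K, Integer> hits = new HashMap<>();

        @SuppressWarnings("unchecked")
        V get(K key) {
            int slot = key.hashCode() & (HOT_SIZE - 1);
            if (key.equals(hotKeys[slot])) {
                return (V) hotValues[slot];            // hot path: one probe into compact arrays
            }
            V value = main.get(key);
            if (value != null && hits.merge(key, 1, Integer::sum) >= PROMOTE_AFTER) {
                hotKeys[slot] = key;                   // promote; evicts whoever held the slot
                hotValues[slot] = value;
                hits.remove(key);
            }
            return value;
        }

        void put(K key, V value) {
            main.put(key, value);
            int slot = key.hashCode() & (HOT_SIZE - 1);
            if (key.equals(hotKeys[slot])) {
                hotValues[slot] = value;               // keep the hot copy coherent
            }
        }
    }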



I've been doing this so long that my first thought was "use redis."

Why?

* it works

* it's available now

* it scales

* it's capable of HA

* it has bindings for every language you probably want to use

Why bother writing your own cache, unless it's for an exercise? Cache management is complicated and error-prone. Unless the round trip kills you, just use Redis (or memcached).
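The whole "cache" then collapses to a few calls. A minimal sketch with the Jedis client, assuming a Redis server on localhost:6379 (key name and TTL are arbitrary):

    import redis.clients.jedis.Jedis;

    // Cache a value with a TTL and read it back; Redis handles eviction,
    // expiration, and concurrency for you.
    public final class RedisCacheExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.setex("user:42", 300, "{\"name\":\"Ada\"}");  // 5-minute TTL
                String cached = jedis.get("user:42");               // null once expired
                System.out.println(cached);
            }
        }
    }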


In a typical LRU cache every read is a write, in order to maintain the access order. If the cache is concurrent, those mutations cause contention: the skewed access distribution ends up serializing threads on the atomic operations that maintain the ordering. Concurrent caches avoid most of that work by exploiting the fact that popular items will soon be reordered again anyway, e.g. they sample requests into lossy ring buffers and replay those reorderings under a try-lock. This is what Java's Caffeine cache does, reaching 940M reads/s using 16 threads (vs 2.3B/s for an unbounded map). At that point other system overhead, like network I/O, will dominate the profile, so trying to rearrange the hash table to dynamically optimize the data layout for hot items seems unnecessary. As you suggest, one would probably be better served by a SwissTable-style approach that optimizes the hash table's data layout and instruction mix, rather than mucking with recency-aware structural adjustments.
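A minimal sketch of that lossy-buffer pattern, not Caffeine's actual implementation (the buffer size, drain trigger, and stand-in LRU list are all simplifications):

    import java.util.ArrayDeque;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.atomic.AtomicReferenceArray;
    import java.util.concurrent.locks.ReentrantLock;

    // Reads record their key into a fixed ring buffer with a CAS; if the slot
    // is still occupied the event is simply dropped (lossy), so readers never
    // block. Buffered reorderings are replayed against the LRU order in
    // batches, and only by whichever thread wins the try-lock.
    final class LossyReadBuffer<K> {
        private static final int SIZE = 64;        // power of two
        private static final int MASK = SIZE - 1;

        private final AtomicReferenceArray<K> buffer = new AtomicReferenceArray<>(SIZE);
        private final AtomicLong writes = new AtomicLong();
        private final ReentrantLock evictionLock = new ReentrantLock();
        private final ArrayDeque<K> lruOrder = new ArrayDeque<>();  // stand-in for an intrusive LRU list

        void recordRead(K key) {
            int index = (int) (writes.getAndIncrement() & MASK);
            buffer.compareAndSet(index, null, key);  // lossy: drop on collision
            if (index == MASK) {
                tryDrain();  // the buffer has wrapped; someone should drain it
            }
        }

        private void tryDrain() {
            if (!evictionLock.tryLock()) {
                return;  // another thread is already draining; skip, don't wait
            }
            try {
                for (int i = 0; i < SIZE; i++) {
                    K key = buffer.getAndSet(i, null);
                    if (key != null) {
                        lruOrder.remove(key);   // O(n) here; fine for a sketch
                        lruOrder.addLast(key);  // move to most-recently-used
                    }
                }
            } finally {
                evictionLock.unlock();
            }
        }
    }

The key property is that readers only ever do a CAS into an array slot; the expensive reordering happens in batches, and only when the try-lock is free.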

The fastest retrieval will be a cache hit, so once the data structures are no longer the bottleneck the focus should switch to hit rates. That's where the Count-Min sketch, hill climbing, etc. come into play in the Java case. There's also memoization to avoid cache stampedes, efficient expiration (e.g. timing wheels), async reloads, and so on, all of which can become important. With a dedicated cache server like memcached, one instead has to worry about fragmentation, minimizing wasted space (to maximize usable capacity), efficient I/O, etc., because every cache server can saturate the network these days, so the goal shifts toward reducing operational cost while keeping tail latencies stable. What "good performance" means is really a spectrum, because one should optimize for overall system performance rather than any individual, narrow metric.
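As one example, stampede protection mostly falls out of memoizing the in-flight load rather than the value; a sketch, with all names hypothetical:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Memoizes the in-flight computation rather than the value: concurrent
    // readers missing on the same key all join one load instead of each
    // hitting the backing store (the "stampede").
    final class MemoizingLoader<K, V> {
        private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        MemoizingLoader(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            CompletableFuture<V> future = inFlight.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
            try {
                return future.join();           // every caller shares one load
            } finally {
                inFlight.remove(key, future);   // first caller clears it; later removes no-op
            }
        }
    }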


I think splay trees would be good for this: https://en.m.wikipedia.org/wiki/Splay_tree
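For reference, the core of a splay tree is small; here's a minimal recursive splay over integer keys (illustrative only), where every lookup rotates the accessed key to the root so hot keys cluster near the top:

    // Each access splays the touched key to the root, giving the
    // frequently-accessed keys the shortest paths.
    final class SplayTree {
        static final class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        static Node rightRotate(Node x) {
            Node y = x.left;
            x.left = y.right;
            y.right = x;
            return y;
        }

        static Node leftRotate(Node x) {
            Node y = x.right;
            x.right = y.left;
            y.left = x;
            return y;
        }

        // Moves key to the root if present, else the last node touched on the path.
        static Node splay(Node root, int key) {
            if (root == null || root.key == key) return root;
            if (key < root.key) {
                if (root.left == null) return root;
                if (key < root.left.key) {                    // zig-zig
                    root.left.left = splay(root.left.left, key);
                    root = rightRotate(root);
                } else if (key > root.left.key) {             // zig-zag
                    root.left.right = splay(root.left.right, key);
                    if (root.left.right != null) root.left = leftRotate(root.left);
                }
                return (root.left == null) ? root : rightRotate(root);
            } else {
                if (root.right == null) return root;
                if (key > root.right.key) {                   // zag-zag
                    root.right.right = splay(root.right.right, key);
                    root = leftRotate(root);
                } else if (key < root.right.key) {            // zag-zig
                    root.right.left = splay(root.right.left, key);
                    if (root.right.left != null) root.right = rightRotate(root.right);
                }
                return (root.right == null) ? root : leftRotate(root);
            }
        }
    }

The caveat for this use case: splaying mutates the tree on every read, which is exactly the read-becomes-write contention problem described in the sibling comment about concurrent LRU caches.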


You should check out the FASTER paper from Microsoft. It specifically covers how to create a K/V log that spills to disk for older keys, but keeps recent keys in memory.
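A toy illustration of that spill-to-disk shape (nothing like the paper's actual design, which adds epoch protection, a concurrent hash index, and staged mutable/read-only/on-disk log regions; all names and sizes below are made up):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;

    // Every write is an append; an index maps each key to the logical address
    // of its latest record; the newest records live in an in-memory tail and
    // older records get flushed to a file.
    final class ToyHybridLog implements AutoCloseable {
        private static final int IN_MEMORY_BUDGET = 1 << 20;  // keep ~1 MiB in memory

        private final Map<String, Long> index = new HashMap<>();    // key -> logical address
        private final ArrayDeque<byte[]> tail = new ArrayDeque<>(); // in-memory records, oldest first
        private final RandomAccessFile disk;
        private long headAddress = 0;  // address of the oldest record still in memory
        private long tailAddress = 0;  // address the next record will get
        private int tailBytes = 0;

        ToyHybridLog(String path) throws IOException { this.disk = new RandomAccessFile(path, "rw"); }

        void put(String key, String value) throws IOException {  // assumes keys contain no '='
            byte[] record = (key + "=" + value).getBytes(StandardCharsets.UTF_8);
            index.put(key, tailAddress);
            tail.addLast(record);
            tailAddress += 4 + record.length;  // 4-byte length prefix + payload
            tailBytes += 4 + record.length;
            while (tailBytes > IN_MEMORY_BUDGET) flushOldest();
        }

        String get(String key) throws IOException {
            Long address = index.get(key);
            if (address == null) return null;
            byte[] record = null;
            if (address >= headAddress) {           // recent key: still in the memory tail
                long a = headAddress;
                for (byte[] r : tail) {             // linear scan is fine for a toy
                    if (a == address) { record = r; break; }
                    a += 4 + r.length;
                }
            } else {                                // old key: read it back from disk
                disk.seek(address);
                record = new byte[disk.readInt()];
                disk.readFully(record);
            }
            String s = new String(record, StandardCharsets.UTF_8);
            return s.substring(s.indexOf('=') + 1);
        }

        private void flushOldest() throws IOException {
            byte[] oldest = tail.removeFirst();
            disk.seek(headAddress);
            disk.writeInt(oldest.length);
            disk.write(oldest);
            headAddress += 4 + oldest.length;
            tailBytes -= 4 + oldest.length;
        }

        @Override public void close() throws IOException { disk.close(); }
    }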




