This is great! In my case, the only reason for choosing Redis over Memcached was persisting to disk. Since memcached is multi-threaded while Redis is single-threaded, I see a big win for Memcached in simple use cases.
If the OS is still alive, everything is ok (but the OS may sync different pages at different times and in any order if you don't fsync specific ranges with special calls). If the whole machine crashes (power issue, kernel panic, ...), what you find on disk can be a mess.
I prefer to architect things to use Redis in an ephemeral way. Redis isn't exactly safe, in the same way PostgreSQL wasn't until very recently on newer Linux: the semantics of fsync on Linux have long been esoteric and poorly understood in the error cases. I would try to cause fsync to fail in another process while memcached is shutting down, then immediately recover. I wonder if the authors checked this scenario. Redis kind of does the right thing and will eventually put the right data on disk, but why bother?
Note that that was only an issue in cases where the storage system itself was failing (i.e. IO errors were generated), in contrast to protection against power failures etc., which was/is working correctly.
Well done to them for getting the feature out. I've been a long time user of Memcached.
Lol, interesting. The way we used it at my last place of business was to restart the memcached server every 5 minutes (via cron). Because as we know... there are only really two hard problems in computer science: naming things and invalidating caches :D
> Your system clock must be set correctly, or at least it must not move before memcached has restarted. The only way to tell how much time has passed while the binary was dead, is to check the system clock. If the clock jumps forward or backwards it could impact items with a specific TTL.
Why not write the system timestamp to the memory state file as well? And then "evict" (ie just not load) anything exceeding TTL on state file load?
That's exactly what it does, though since it's resuming a monotonic clock the actual code was a bit more complicated...
To find out how long it was down, it notes system time into the state file on shutdown. On start it checks the current system time and adds the delta to the monotonic timer and resumes. Objects exceeding TTL are removed appropriately.
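A minimal sketch of that scheme (not memcached's actual code; the state file path and JSON format here are invented for illustration):

```python
import json
import time

STATE_FILE = "/tmp/memcached_state.json"  # hypothetical path

def save_state(monotonic_counter):
    """On shutdown: record wall-clock time alongside the monotonic counter."""
    state = {"wall": time.time(), "mono": monotonic_counter}
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def restore_state():
    """On start: add the wall-clock delta (downtime) to the saved counter.

    This only works if the system clock did not step while the daemon
    was dead -- exactly the caveat the documentation warns about.
    """
    with open(STATE_FILE) as f:
        state = json.load(f)
    downtime = time.time() - state["wall"]
    # Clamp: a backwards clock step would otherwise rewind the timer.
    return state["mono"] + max(0.0, downtime)
```

Items whose TTL is exceeded by the restored counter can then simply be dropped on load.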
I might’ve just misunderstood that paragraph. I’d assume that with $snapshot_timestamp, the current time, and TTLs, time wouldn’t be an issue (and of course, don’t futz with the system time while memcached isn’t running).
Not really. Say it's really 4:00:00 but the system clock says 4:20:00 when it exits... then 21 minutes go by and the system clock is corrected. That is indistinguishable from the clock not stepping and 1 minute going by.
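The arithmetic in that scenario can be spelled out (times converted to seconds; the numbers are just the ones from the comment above):

```python
# Clock read 4:20:00 at shutdown (20 minutes fast of the true 4:00:00).
saved = 4 * 3600 + 20 * 60

# Reality A: 21 real minutes pass and the clock is corrected -> 4:21:00.
now_a = 4 * 3600 + 21 * 60
# Reality B: the clock is never corrected and 1 real minute passes -> 4:21:00.
now_b = 4 * 3600 + 21 * 60

delta_a = now_a - saved  # observed downtime: 60 seconds
delta_b = now_b - saved  # observed downtime: 60 seconds
# The daemon only sees the delta, so it cannot tell the two apart.
```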
> (and of course, don’t futz with system time while memcached isn’t running).
Well, yes, that's what the original thing you quoted said: "Your system clock must be set correctly, or at least it must not move before memcached has restarted."
I’ve never worked in environments where real time and system time had that much drift (due to ntp), but I acknowledge it probably happens out there in distributed systems. Accurate time is important!
Before memcached had a monotonic clock, people would end up with immortal objects (underflowed TTLs) because ntp would start after memcached and make a huge adjustment due to the hardware clock being really far off.
With the restart code, people could run a kernel upgrade and reboot while the daemon is down... so if this ends up causing a huge clock adjustment you're screwed.
Presumably it works by mapping that file underneath a regular memory allocator, which means extremely small, random IO patterns. I think "must" is appropriate, as a real backing file would likely mean a slowdown of 1-2 orders of magnitude. Compare to a typical DB system, where the allocation structure always has some kind of large-block IO locality by design.
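The small-random-IO problem can be illustrated with a toy sketch (the file size and write count are arbitrary, and this only models the dirty-page pattern, not an actual allocator):

```python
import mmap
import random
import tempfile

# An allocator sitting on top of an mmap'd file touches bytes at random
# offsets, so the kernel eventually has to write back many scattered
# pages -- the opposite of the large, sequential IO a database's
# page-oriented storage layer is designed around.
SIZE = 64 * 1024 * 1024  # 64 MiB backing file (hypothetical size)

with tempfile.TemporaryFile() as f:
    f.truncate(SIZE)
    mem = mmap.mmap(f.fileno(), SIZE)

    random.seed(0)
    dirty_pages = set()
    for _ in range(10_000):
        off = random.randrange(SIZE)
        mem[off] = 0xFF  # dirtying one byte dirties the whole page
        dirty_pages.add(off // mmap.PAGESIZE)

    # Thousands of non-contiguous dirty pages for the kernel to flush.
    print(len(dirty_pages))
    mem.close()
```

On tmpfs the same pattern costs nothing extra, since the pages never have to reach a block device.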
The alternative to 'must' is 'should', and people ignore that more easily, resulting in bug reports like 'memcached performance abysmal when running with disk', 'why does memcache cause 1MB of IO when I only read 256 4-byte keys', etc
Not every write to a memory-mapped file is flushed to disk immediately, so depending on vm dirty bytes settings etc., you may get away with it for a while.
But the worst case will be dismal and not unlikely.
You can do it at really low load levels, but you'll lose performance consistency. mmap'ing files over a real filesystem of any kind is super complicated.
The access pattern isn't optimized at all for flash or HDD or etc... however it does work super well if that mount happens to be a DAX mount over persistent memory.
Can't you just pass in a raw block device and drop the fsyncs, giving you a very volatile backing store? The OS will write things out as best it can, and if the whole thing fits in memory (it's a memcached instance, so it should), reads and writes will be fast. To restart, just stop the memcached process, run sync, wait for it to finish... and reboot. (This seems simpler than copying the tmpfs, but a lot less safe/deterministic?)
You can try, but it's not going to be very consistent. You're at the mercy of whenever the OS decides to start flushing pages.. head of line blocking, mmap_sem locks, file descriptor locks, etc. My old job had an mmap-on-disk storage engine at scale and it wasn't any fun.
I think the worst part is mmap-on-disk looks fine at first, and only comes out as a problem after you scale up a while. False sense of security. :/
It's an interesting choice, writing to an mmap'd file. I wonder how the performance compares to just mandating that the write be replicated to a nearby memcached instance (like you can do with Redis).
In this case, if you're using tmpfs like we recommend the performance is identical. It's the same as if we used normal shared memory (or so far as I can tell via benchmarks).
Even with an Optane pmem mount the performance is super close.
This is a nice feature. I hope other software can write documentation in a satirical form like Memcached does. [1]
[1] https://github.com/memcached/memcached/wiki/TutorialCachingS...