Depends?

If you have a build farm / CI machines, don't use swap. With swap, if a user schedules too many compiles at once, the machine will slow to a halt and become kinda-dead: not quite tripping the dead timer, but not making any progress either. Instead, set up the OOM priority on the users' processes so they are killed first. If the OOM killer hits, clang is killed, the build fails, and we move on.
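A rough sketch of that priority setup on Linux (`$BUILD_PID` is a placeholder for a build job's PID):

    # Make this process the OOM killer's first choice (+1000 is the max bias)
    echo 1000 > /proc/$BUILD_PID/oom_score_adj

    # Or, under systemd, launch the build with the bias already applied
    systemd-run --scope -p OOMScoreAdjust=1000 make -j64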

If you have a latency-critical server, don't use swap. With swap, some data will get swapped out, and you will see sudden latency spikes when you try to access it. Without swap, the kernel may still evict executable pages, but code is usually much smaller than the data, and it's all pretty hot.

If you occasionally have to process very large datasets, and you are willing to wait a minute or twenty, you may enable swap. But be patient, and make sure your ssh session has no timeout it can hit.
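For the ssh timeout part, client-side keepalives (real ssh_config options; the host alias is made up) keep a session alive while the far end is grinding:

    # ~/.ssh/config
    Host big-batch-box           # hypothetical alias
        ServerAliveInterval 60   # probe every 60 seconds
        ServerAliveCountMax 60   # tolerate up to ~an hour of silence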

If you have a small amount of RAM, enable swap. It will be slow, but at least you'll be able to do something without crashing.




> If you have a build farm / CI machines, don't use swap. With swap, if a user schedules too many compiles at once, the machine will slow to a halt and become kinda-dead: not quite tripping the dead timer, but not making any progress either. Instead, set up the OOM priority on the users' processes so they are killed first. If the OOM killer hits, clang is killed, the build fails, and we move on.

This doesn't really work that well. It's true that if you enable swap and have significant memory pressure for any extended period your machine will grind to a halt, but this will _also_ happen if you don't use swap and rely on the Linux OOM killer.

Indeed, despite the lack of swap, as part of trying to avoid OOM killing applications, Linux will grind the hell out of your disk - because it will drop executable pages out of RAM to free up space, then read them back in again on demand. As memory pressure increases, the period of time between dropping the page and reading it back in again becomes very short, and all your applications run super slowly.

An easy solution to this is a userspace OOM-kill daemon like https://facebookmicrosites.github.io/oomd/ . This works on pressure stall information, so it knows when your system is genuinely struggling to free up memory.
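PSI is exposed directly by the kernel (4.20+), so you can eyeball the same signal oomd consumes; the avg fields are the share of recent wall-clock time tasks spent stalled on memory (the zeros here are just illustrative):

    $ cat /proc/pressure/memory
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0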

On the historical fleets I've worked on, pre-oomd/PSI, a reasonable solution was to enable swap (along with appropriate cgroups), but to tune things so that only brief periods of swap-in/out were allowed (see the sketch after this list). This gives you two advantages:

* allows you to ride out brief periods of memory overconsumption

* allows genuinely rarely accessed memory to be swapped out, giving you more working space compared to having no swap
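A minimal sketch of that kind of setup on a cgroup v2 box (the group name and the numbers are purely illustrative):

    # Create a cgroup for the batch work
    mkdir /sys/fs/cgroup/batch

    # Throttle (rather than kill) above 28G; hard limit at 30G
    echo 28G > /sys/fs/cgroup/batch/memory.high
    echo 30G > /sys/fs/cgroup/batch/memory.max

    # Cap how much of the group may sit in swap
    echo 4G > /sys/fs/cgroup/batch/memory.swap.max

    # Move the current shell (and its children) into the group
    echo $$ > /sys/fs/cgroup/batch/cgroup.procs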


Eh, I’ve never seen a machine actually use any notable amount of swap and not be functionally death spiraling.

I’m sure someone somewhere is able to use swap and not have the machine death spiral, but from desktop to servers? It’s never been me.

I always disable swap for this reason, and it’s always been the better choice. Not killing something off ASAP when you get to that point is a losing bargain.


FreeBSD isn't Linux, but I've had FreeBSD machines fill their swap and work just fine for months. I had one machine that had a RAM issue and started up with a comically small amount of RAM (maybe 4 MB instead of 256 MB; it was a while ago) and just ran a little slow, but it was lightly loaded.

I've also had plenty of machines that fill the swap and then processes either crash when malloc fails, or the kernel kills some stuff (sometimes the wrong thing), or things just hang. Measuring memory pressure is tricky. A small swap partition (I like 512 MB, but limit it to 2x RAM if you're running vintage/exotic hardware with less than 256 MB) gives you some room to monitor and react to memory usage spikes without instantly falling over, but without thrashing for long.

You should monitor (or at least look at) both swap used % and pages/second. If pages/second is low, you're probably fine even with a high % used, and you can take your time figuring out the issue; if pages/second is high, you'd better find it quickly.
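Concretely, on Linux `vmstat`'s si/so columns are the pages-in/out signal (units are KiB/s by default, if I remember right), and `free` gives you the % used:

    # Sample every 5 seconds; watch the si (swap-in) and so (swap-out) columns
    vmstat 5
    # Occasional blips are fine; sustained nonzero si/so is the
    # "find it quick" case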


The issue is specific to Linux. I’ve had Solaris and SunOS boxes (years ago) also do fine.


Don't mistake "every machine I've seen death spiraling was using swap" for "every machine using swap is death spiraling." Notably, how many machines did you never have to look at, because the swap was doing just fine?


That I’ve administered? None under any significant load!

I even finally disabled it on the lab Raspberry Pis eventually, and on an SBC I use to rclone 20+ TB NVR archives, due to performance problems it was causing.

It’s a pretty consistent signal actually - if I look at a machine and it’s using any swap, it’s probably gotten wonky in the recent past.


Apologies. I forgot I had posted something. :(

I am a little surprised that every machine you admin has had issues related to swap. Feels high.

For the ones that are now using swap and likely went wonky before, how many would have crashed outright due to said wonkiness?


There are plenty of workloads which sometimes just spike.

Batch processes, for example.

With proper monitoring you can actually act on it yourself, instead of just restarting, which only leads to an OOM loop.


If you pushed something to swap, you didn’t have enough RAM to run everything at once. Or you have some serious memory leaks or the like.

If you can take the latency hit to load what was swapped out back in, and don’t care that it wasn’t ready when you did the batch process, then hey, that’s cool.

What I’ve had happen way too many times is something like this: the ‘colder’ data paths on a database server get pushed out under memory pressure, but the pressure doesn’t abate (and the kernel will rarely pull those pages back in from swap for no reason) before those cold paths get called again. That leads to slowness, which leads to bigger queues of work and more memory pressure, which leads to doom loops of maxed-out I/O, super high latency, and ‘it would have been better dead’.

These death spirals are particularly problematic because they’re not ‘dead yet’, and may never be so dead that they stop, for instance, accepting TCP connections. They de facto kill services in ways that are harder to detect and repair, and that take way longer to fix, than if they’d just flat-out died.

Certainly won’t happen every time, and if your machine never gets so loaded and always has time to recover before having to do something else, then hey maybe it never doom spirals.


I try to avoid swap for latency critical things.

I do a lot of CI/CD where we just have weird load, and it would be a waste of money/resources to shell out for max memory.

Another example would be something like Prometheus: when it crashes and replays the WAL, memory spikes.

Also, it's probably an unsolved problem to tell applications how much memory they are actually allowed to consume. Java has direct buffers, the heap, etc.
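To illustrate the Java case: even when you do cap things, the knobs are scattered and still don't cover everything (these are real HotSpot flags; the sizes are arbitrary):

    # Heap and direct (off-heap) buffers are capped separately, and neither
    # limit covers metaspace, thread stacks, or the JIT code cache
    java -Xmx4g -XX:MaxDirectMemorySize=1g -jar app.jar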

I have plenty of workloads where I prefer to get a warning alert and act on that, instead of dealing with broken builds etc.


I think the key here is what you mean by using swap. Having a lot of data swapped out is not bad in and of itself - if the machine genuinely isn't using those pages much, then now you have more space available for everything else.

What's bad is a high frequency of moving pages in and out of swap. This is something that can cause your machine to be functionally unavailable. But it is important to note that you can easily trigger somewhat-similar behaviour even with swap disabled, per my previous comment. I've seen machines without swap go functionally unavailable for > 10 minutes when they get low on RAM - with the primary issue being that they were grinding on disk reloading dropped executable pages.
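You can watch this happen (even with swap off) via the kernel's major-fault counter, which counts pages re-read from disk:

    # Sample twice and diff; a rapidly climbing pgmajfault with swap
    # disabled usually means executable pages are being dropped and re-read
    grep pgmajfault /proc/vmstat; sleep 5; grep pgmajfault /proc/vmstat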

I agree that in low memory situations killing off something ASAP is often the best approach, my main point here is that relying on the Linux OOM killer is not a good way to kill something off ASAP. It kills things off as a last resort after trashing your machine's performance - userspace OOM killers in concert with swap typically give a much better availability profile.


100% agree.

In a situation where a bunch of memory is being used by something that is literally not needed and won’t be needed in a hurry, then it’s not a big deal.

In my experience though, it’s just a landmine waiting to explode; someone will touch it and bam, a useless and often difficult-to-fix machine, usually at the most inconvenient time. But I also don’t keep things running that aren’t necessary.

If someone puts swap on something with sufficiently high performance, then obviously this is less of a concern too. Have a handful of extra NVMe or fast SSD lying around? Then ok.

I tend to already be using those for other things though (and sometimes maxing them out, and when I am, it’s almost always when memory pressure is also at its max), so meh.

I’ve had better experience having it fail early and often so I can fix the underlying issue.


When I reenabled swap on my desktop (after running without swap for years assuming it would avoid the death spiral, only to find out it was almost always worse, because there was no spiral: the whole system just froze almost immediately), it would frequently hold about 25% of my RAM capacity with the system working perfectly fine. That is probably an indication of how much memory many desktop apps hold onto without actually using it, more than anything else, but it was useful. In my experience, if you want a quick kill in low memory you need to run something like earlyoom to kill the offending process before the kernel desperately tries to keep things running by swapping out code pages and slowing the system to a crawl.
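For reference, an earlyoom invocation looks roughly like this (flags as I remember them from its README; double-check against your version):

    # Kill the biggest offender once available RAM falls below 5%
    # and free swap below 10%; bias toward/away from certain processes
    earlyoom -m 5 -s 10 --prefer '(^|/)chromium$' --avoid '(^|/)sshd$'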


It's only one datapoint, but at this very moment a server at work is using a notable amount of swap, 1.5 GiB to be more precise, while functioning perfectly normally.

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:          3.9Gi       1.7Gi       573Mi       180Mi       1.6Gi       1.7Gi
    Swap:         4.0Gi       1.5Gi       2.5Gi


I wish you luck! Only time that’s happened before was memory leaks for me, and it didn’t go very long before death spiraling. But if you’re comfortable with it, enjoy.


It's still working just fine, with still the same amount of swap in use (approximately).


> Eh, I’ve never seen a machine actually use any notable amount of swap and not be functionally death spiraling.

For my low-end notebook with solid-state storage I set the kernel's swappiness to 100 and this problem got magically fixed. It's rock-solid now.

I don't know how it works but it does.
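For anyone wanting to try this, the knob is a plain sysctl; it's a weight from 0 to 100 (up to 200 on newer kernels), not a literal percentage:

    # Apply immediately
    sysctl vm.swappiness=100

    # Persist across reboots
    echo 'vm.swappiness = 100' > /etc/sysctl.d/99-swappiness.conf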


It's pretty common for me to see a gig or two in swap, never really wanted back, and that RAM used for disk caching instead.


I think "Linux drops will drop executable pages without swap" is a symptom of machines with small amount of memory, say 4G or less. So it is pretty outdated for regular servers, and probably only relevant when you are saving money by buying tiny VMS.

Those build servers had at least 64GB of RAM, while executables were less than 1GB (our entire SDK install was ~2.5GB, and it had much more stuff than just clang). So a machine would need to be finely balanced on memory pressure: high enough to cause clang to be evicted, but low enough to avoid the OOM killer's wrath.

I don't think this is very likely on machines with a decent amount of memory.


Fair enough - I've seen it more commonly on smaller machines, but they're also more common in the fleets I've observed (and are the ones more likely to run close to the edge memory-wise). I have also seen it on systems with up to 32GB of RAM, so it's by no means a non-issue on somewhat larger systems. The observation that oomd/earlyoom + swap beats no swap still applies.


There are CI/CD builds out there which consume much more in resources and time, where just killing one part of the build would destroy hours of work.

Not sure why you wouldn't want swap for it?

It will allow you to fine-tune the build later and give that build a realistic chance to finish.


Because once swap activates, the build takes hours instead of tens of minutes. So it would time out anyway, but only after wasting lots of resources. And even if you increase the timeout a lot, your machine now has a bunch of things swapped out, so your tests time out instead, which is even worse.

Yes, killing that part of the build did destroy the work of hours. It was still better to disable the swap than try to "ride it out".


Things don't take hours longer just because the Linux kernel throws out a few pages which haven't been used for a while.

And it also totally depends on how much memory is missing.

I still prefer having something take 20 minutes longer over having it fail, and fine-tuning the resources afterwards.



