
My personal rules of thumb for Linux systems. YMMV.

* If you need a low-latency server or workstation and all of your processes are killable (i.e. they can be easily/automatically restarted without data loss): disable swap.

* If you need a low-latency server or workstation and some of your processes are not killable (e.g. databases): enable swap and set vm.swappiness to 0.

* SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

* Disk-backed desktops and other servers and workstations: accept the system/distro defaults, typically swap enabled with vm.swappiness set to 60. You can and likely should lower vm.swappiness to 10 or so if you have a ton of RAM relative to your workload.

* If your server or workstation has a mix of killable and non-killable processes, use oom_score_adj to protect the non-killable processes.

* Monitor systems for swap (page-out) activity.
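
For the monitoring bit, a rough sketch of what to watch (the tools and intervals are just examples; anything that shows page-in/page-out rates will do):

  $ vmstat 1                                   # si/so columns: KiB/s swapped in/out
  $ sar -W 1                                   # pswpin/s and pswpout/s (needs sysstat)
  $ grep -E 'SwapTotal|SwapFree' /proc/meminfo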




For the curious (I was):

* vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

* vm.swappiness = 1 Minimum amount of swapping without disabling it entirely.

* vm.swappiness = 60 The default value.

* vm.swappiness = 100 The kernel will swap aggressively.

https://en.wikipedia.org/wiki/Swappiness
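
If you want to check or change the value, something like this works on most distros (the value and the sysctl.d file name are just examples):

  $ cat /proc/sys/vm/swappiness
  # sysctl -w vm.swappiness=10                                       # takes effect now, lost on reboot
  # echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf     # persists across reboots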


> vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

This is not the case.

It used to be the case, but the behaviour changed in kernel version 3.5-rc1 (2012-ish).

There was a discussion about this on HN a few weeks ago: https://news.ycombinator.com/item?id=13511086

And there's a blog post on the percona website about how this rather bizarre change bit them: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...

I call it bizarre because (as I wrote in that other HN thread) a) it changed the behaviour of lots of production systems in a surprising way, and b) if you wanted to ensure your processes never swapped, you already had the option of not having a swap file or partition.


If you are on the experimental side:

There is also zram (a swap device that lives in RAM, lz4/lzo compressed) and zswap (a compressed in-memory cache for swap pages before they hit disk; it needs a real swap device behind it, but compresses pages before they get written out).

I run zswap on my desktop and on a few servers; it buys you some extra time before the OOM killer shows up, and the system stays responsive a bit longer.

zram is a nice idea but quite a beast in practice (at least on MIPS with 32 MB RAM): sys time constantly at 100% whenever you actually need it, plus other quirks. Maybe it has gotten better, or I did something wrong.

But if you need an in-memory compressed block device it's pretty great: you can just format it with ext4 and effectively get an lz4-compressed tmpfs.
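
Roughly like this, assuming your kernel has lz4 support (device name, size and mount point are illustrative):

  # modprobe zram num_devices=1
  # echo lz4 > /sys/block/zram0/comp_algorithm
  # echo 2G > /sys/block/zram0/disksize
  # mkfs.ext4 -q /dev/zram0
  # mount /dev/zram0 /mnt/zram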


From what I understand, zram results in LRU cache inversion whereas zswap does not (as it intercepts calls to the kernel frontswap API). Then again, if you have a workload that would benefit from MRU, I guess this is just a bonus :)

Zswap maintains the default kernel memory allocation behaviour, with the tradeoff that it needs a backing swap device to push old pages out to (which is why zram tends to be used more often in embedded devices that only have a single volatile memory store, or devices with limited non-volatile storage).


I use zram rather than a regular swap partition on all my laptops (because I'd rather not swap on SSDs) and desktops (same reason, and/or there is an absurd amount of RAM to begin with). I hear that most Chromebooks use zram too (you really don't want to be swapping on that eMMC memory).

I set it up with one zram device per CPU core, for a total space of ~20% of available RAM.
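
Roughly like this (a sketch; the sizing and swap priority are just what I use):

    cores=$(nproc)
    modprobe zram num_devices="$cores"
    # ~20% of RAM (MemTotal is in KiB), split evenly across the devices
    size_kb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 5 / cores ))
    for i in $(seq 0 $((cores - 1))); do
        echo "${size_kb}K" > /sys/block/zram"$i"/disksize
        mkswap /dev/zram"$i"
        swapon -p 100 /dev/zram"$i"
    done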

No performance issues w/ zram so far so I haven't felt the need to change the compression algorithm.


zram has worked fine on my chromebooks. This is with running multiple chroots - and I have hit the oom killer a number of times (when even zram swap wasn't enough).

Until you actually run out of memory, zram seems very much a set-and-forget type of thing. No babysitting required.

tl;dr: it does what it says on the tin, with minimal CPU impact.


First I've heard of either. How would I set these up?


You can set up zram like this. Typically you'll want to make a service for it, since it needs to run on every boot.

  # modprobe zram num_devices=1
  # echo 1G > /sys/block/zram0/disksize
  # mkswap -L zram0 /dev/zram0
  # swapon -p 100 /dev/zram0
Official documentation here: https://www.kernel.org/doc/Documentation/blockdev/zram.txt
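
For zswap (assuming an existing backing swap device and a reasonably recent kernel), it's mostly just module parameters; the same settings can go on the kernel command line as zswap.enabled=1 and so on:

  # echo 1 > /sys/module/zswap/parameters/enabled
  # echo lz4 > /sys/module/zswap/parameters/compressor         # needs the lz4 module; lzo is the usual default
  # echo 20 > /sys/module/zswap/parameters/max_pool_percent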


Wasn't there a Debian/Ubuntu thing recently where vm.swappiness = 0 had a behavior change which increased the number of incidents of the OOM killer stomping on things like database processes?

(Maybe it wasn't so new... https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...)


Thank you for sharing this. There's an interesting conversation thread in the comments on that post. It's a little over my head, but my takeaway is that with the kernel change, in an OOM event, MySQL is unable to be swapped out due to the type(s) of memory pages it's using, so the kernel is forced to kill it (or itself). In practice, it's relatively straightforward to tune MySQL/MariaDB for a certain memory allocation, and if it's on a shared host, oom_score_adj can be set to protect it.


Can you not protect processes from the OOM killer? This is trivial and very useful on FreeBSD:

https://www.freebsd.org/cgi/man.cgi?query=protect&sektion=1


Yes, with oom_score_adj[0], which I've mentioned several times. Setting it to -1000 for a process protects it from OOM killing.

0. http://man7.org/linux/man-pages/man5/proc.5.html
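
For a one-off it's just a write into /proc (mysqld here is only an example, and assumes a single pid):

  # echo -1000 > /proc/$(pidof mysqld)/oom_score_adj
  # cat /proc/$(pidof mysqld)/oom_score_adj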


That looks painful to use. Do Linux distros let you automatically protect services? E.g. on FreeBSD:

    mysql_enable="YES"
    mysql_oomprotect="YES"

Now every time you start the MySQL service it's automatically protected.


Define "automatically" and "services" first ;-) Normally you just set it in the systemd's unit file for each daemon you want to adjust. So for some definitions of the above the answer is yes. (OOMScoreAdjust in https://www.freedesktop.org/software/systemd/man/systemd.exe...)


Personally, I use one tool for both FreeBSD and Linux. Picking up the (imported) rc.conf variable for a service is a mere matter of

    oom-kill-protect fromenv
And the conversion from something like OOMScoreAdjust is quite straightforward. A PostgreSQL systemd unit file that reads OOMScoreAdjust=-625 becomes a run program that contains

    oom-kill-protect -- -625
* http://marc.info/?l=freebsd-hackers&m=145425153624976&w=2

* http://jdebp.eu./Softwares/nosh/guide/oom-kill-protect.html


Cool, thanks for sharing. I would hate to have to muck around in /proc manually to set this.


> * SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

Is this that big of a worry? I have a 5-year-old SSD in my daily-driver laptop, on OS X, which loooves to swap out anything it can to gain memory for disk cache, and I'm still barely 15% into the SSD's wear-out.


How big is the OSX/macOS swap? It's a file and not a partition, right?


It uses a dynamically-sized swap file rather than a dedicated partition.


To elaborate, it uses a series of dynamically-sized swap files (something like 256 MB, then adding a 512 MB file, then 1 GB, then 2 GB, etc)


Nope, just a rule of thumb. :-)


Also... Linux. Not macOS.



