
My personal rules of thumb for Linux systems. YMMV.

* If you need a low-latency server or workstation and all of your processes are killable (i.e. they can be easily/automatically restarted without data loss): disable swap.

* If you need a low-latency server or workstation and some of your processes are not killable (e.g. databases): enable swap and set vm.swappiness to 0.

* SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

* Disk-backed desktops and other servers and workstations: accept the system/distro defaults, typically swap enabled with vm.swappiness set to 60. You can and likely should lower vm.swappiness to 10 or so if you have a ton of RAM relative to your workload.

* If your server or workstation has a mix of killable and non-killable processes, use oom_score_adj to protect the non-killable processes.

* Monitor systems for swap (page-out) activity.
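
For the monitoring bit, a rough sketch of what to watch (the tools and intervals are just examples; anything that shows page-in/page-out rates will do):

  $ vmstat 1                                   # si/so columns: KiB/s swapped in/out
  $ sar -W 1                                   # pswpin/s and pswpout/s (needs sysstat)
  $ grep -E 'SwapTotal|SwapFree' /proc/meminfo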




For the curious (I was):

* vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

* vm.swappiness = 1 Minimum amount of swapping without disabling it entirely.

* vm.swappiness = 60 The default value.

* vm.swappiness = 100 The kernel will swap aggressively.

https://en.wikipedia.org/wiki/Swappiness
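
If you want to check or change the value, something like this works on most distros (the value and the sysctl.d file name are just examples):

  $ cat /proc/sys/vm/swappiness
  # sysctl -w vm.swappiness=10                                       # takes effect now, lost on reboot
  # echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf     # persists across reboots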


> vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

This is not the case.

It used to be the case, but the behaviour changed in kernel version 3.5-rc1 (2012-ish).

There was a discussion about this on HN a few weeks ago: https://news.ycombinator.com/item?id=13511086

And there's a blog post on the percona website about how this rather bizarre change bit them: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...

I call it bizarre because (as I wrote in that other HN thread) a) it changed the behaviour of lots of production systems in a surprising way, and b) if you wanted to ensure your processes never swapped, you already had the option of not having a swap file or partition.


If you are on the experimental side:

There is also zram (a swap device that lives in RAM, lz4/lzo compressed) and zswap (a compressed in-memory cache for swap pages before they hit disk; it needs a real swap device behind it, but compresses pages before they get written out).

I run zswap on my desktop and on a few servers; it buys you some extra time before the OOM killer shows up, and the system stays responsive a bit longer.

zram is a nice idea but quite a beast in practice (at least on MIPS with 32 MB RAM): sys time constantly at 100% whenever you actually need it, plus other quirks. Maybe it has gotten better, or I did something wrong.

But if you need an in-memory compressed block device it's pretty great: you can just format it with ext4 and effectively get an lz4-compressed tmpfs.
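
Roughly like this, assuming your kernel has lz4 support (device name, size and mount point are illustrative):

  # modprobe zram num_devices=1
  # echo lz4 > /sys/block/zram0/comp_algorithm
  # echo 2G > /sys/block/zram0/disksize
  # mkfs.ext4 -q /dev/zram0
  # mount /dev/zram0 /mnt/zram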


From what I understand, zram results in LRU cache inversion whereas zswap does not (as it intercepts calls to the kernel frontswap API). Then again, if you have a workload that would benefit from MRU, I guess this is just a bonus :)

Zswap maintains the default kernel memory allocation behaviour, with the tradeoff that it needs a backing swap device to push old pages out to (which is why zram tends to be used more often in embedded devices that only have a single volatile memory store, or devices with limited non-volatile storage).


I use zram rather than a regular swap partition on all my laptops (because I'd rather not swap on SSDs) and desktops (same reason, and/or there is an absurd amount of RAM to begin with). I hear that most Chromebooks use zram too (you really don't want to be swapping on that eMMC memory).

I set it up with one zram device per CPU core, for a total space of ~20% of available RAM.
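
Roughly like this (a sketch; the sizing and swap priority are just what I use):

    cores=$(nproc)
    modprobe zram num_devices="$cores"
    # ~20% of RAM (MemTotal is in KiB), split evenly across the devices
    size_kb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 5 / cores ))
    for i in $(seq 0 $((cores - 1))); do
        echo "${size_kb}K" > /sys/block/zram"$i"/disksize
        mkswap /dev/zram"$i"
        swapon -p 100 /dev/zram"$i"
    done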

No performance issues w/ zram so far so I haven't felt the need to change the compression algorithm.


zram has worked fine on my chromebooks. This is with running multiple chroots - and I have hit the oom killer a number of times (when even zram swap wasn't enough).

Until you actually run out of memory, zram seems very much a set-and-forget type of thing. No babysitting required.

tl;dr: it does what it says on the tin, with minimal CPU impact.


First I've heard of either. How would I set these up?


You can set up zram like this. Typically you'll want to make a service for it, since it needs to run on every boot.

  # modprobe zram num_devices=1
  # echo 1G > /sys/block/zram0/disksize
  # mkswap -L zram0 /dev/zram0
  # swapon -p 100 /dev/zram0
Official documentation here: https://www.kernel.org/doc/Documentation/blockdev/zram.txt
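
For zswap (assuming an existing backing swap device and a reasonably recent kernel), it's mostly just module parameters; the same settings can go on the kernel command line as zswap.enabled=1 and so on:

  # echo 1 > /sys/module/zswap/parameters/enabled
  # echo lz4 > /sys/module/zswap/parameters/compressor         # needs the lz4 module; lzo is the usual default
  # echo 20 > /sys/module/zswap/parameters/max_pool_percent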


Wasn't there a Debian/Ubuntu thing recently where vm.swappiness = 0 had a behavior change which increased the number of incidents of the OOM killer stomping on things like database processes?

(Maybe it wasn't so new... https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...)


Thank you for sharing this. There's an interesting conversation thread in the comments on that post. It's a little over my head, but my takeaway is that with the kernel change, in an OOM event, MySQL is unable to be swapped out due to the type(s) of memory pages it's using, so the kernel is forced to kill it (or itself). In practice, it's relatively straightforward to tune MySQL/MariaDB for a certain memory allocation, and if it's on a shared host, oom_score_adj can be set to protect it.


Can you not protect processes from the OOM killer? This is trivial and very useful on FreeBSD:

https://www.freebsd.org/cgi/man.cgi?query=protect&sektion=1


Yes, with oom_score_adj[0], which I've mentioned several times. Setting it to -1000 for a process protects it from OOM killing.

0. http://man7.org/linux/man-pages/man5/proc.5.html
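
For a one-off it's just a write into /proc (mysqld here is only an example, and assumes a single pid):

  # echo -1000 > /proc/$(pidof mysqld)/oom_score_adj
  # cat /proc/$(pidof mysqld)/oom_score_adj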


That looks painful to use. Do Linux distros let you automatically protect services? E.g. on FreeBSD:

    mysql_enable="YES"
    mysql_oomprotect="YES"

Now every time you start the MySQL service it's automatically protected.


Define "automatically" and "services" first ;-) Normally you just set it in the systemd's unit file for each daemon you want to adjust. So for some definitions of the above the answer is yes. (OOMScoreAdjust in https://www.freedesktop.org/software/systemd/man/systemd.exe...)


Personally, I use one tool for both FreeBSD and Linux. Picking up the (imported) rc.conf variable for a service is a mere matter of

    oom-kill-protect fromenv
And the conversion from something like OOMScoreAdjust is quite straightforward. A PostgreSQL systemd unit file that reads OOMScoreAdjust=-625 becomes a run program that contains

    oom-kill-protect -- -625
* http://marc.info/?l=freebsd-hackers&m=145425153624976&w=2

* http://jdebp.eu./Softwares/nosh/guide/oom-kill-protect.html


Cool, thanks for sharing. I would hate to have to muck around in /proc manually to set this.


> * SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

Is this that big of a worry? I have a 5-year-old SSD in my daily-driver laptop, on OS X, which loooves to swap out anything it can to gain memory for disk cache, and I'm still barely 15% into the SSD's wear-out.


How big is the OSX/macOS swap? It's a file and not a partition, right?


It uses a dynamically-sized swap file rather than a dedicated partition.


To elaborate, it uses a series of dynamically-sized swap files (something like 256 MB, then adding a 512 MB file, then 1 GB, then 2 GB, etc)


Nope, just a rule of thumb. :-)


Also... Linux. Not macOS.



