Do we really need swap on modern systems? (redhat.com)
291 points by omnibrain on Feb 23, 2017 | 249 comments



My personal rules of thumb for Linux systems. YMMV.

* If you need a low-latency server or workstation and all of your processes are killable (i.e. they can be easily/automatically restarted without data loss): disable swap.

* If you need a low-latency server or workstation and some of your processes are not killable (e.g. databases): enable swap and set vm.swappiness to 0.

* SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

* Disk-backed desktops and other servers and workstations: accept the system/distro defaults, typically swap enabled with vm.swappiness set to 60. You can and likely should lower vm.swappiness to 10 or so if you have a ton of RAM relative to your workload. (A sketch for applying these settings follows this list.)

* If your server or workstation has a mix of killable and non-killable processes, use oom_score_adj to protect the non-killable processes.

* Monitor systems for swap (page-out) activity.
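
As a minimal sketch, on a distro with /etc/sysctl.d (the value 10 is illustrative -- pick one per the rules above):

  # sysctl -w vm.swappiness=10
  # echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf

The first command takes effect immediately; the second persists across reboots.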


For the curious (I was):

* vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

* vm.swappiness = 1 Minimum amount of swapping without disabling it entirely.

* vm.swappiness = 60 The default value.

* vm.swappiness = 100 The kernel will swap aggressively.

https://en.wikipedia.org/wiki/Swappiness


> vm.swappiness = 0 The kernel will swap only to avoid an out-of-memory condition, when free memory falls below the vm.min_free_kbytes limit.

This is not the case.

It used to be the case, but this changed in kernel version 3.5-rc1 (circa 2012).

There was a discussion about this on HN a few weeks ago: https://news.ycombinator.com/item?id=13511086

And there's a blog post on the percona website about how this rather bizarre change bit them: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...

I call it bizarre because (as I wrote in that other HN thread) a) it changed the behaviour of lots of production systems in a surprising way, and b) if you want to ensure your processes never swap you already had the option to not have a swap file or partition.


If you are on the experimental side:

There is also zram (a compressed swap device kept entirely in memory, using lz4/lzo) and zswap (a compressed in-memory cache for swap pages before they hit disk, which needs a real swap device behind it but compresses pages beforehand).

I run zswap on my desktop and on a few servers; it gives you some more time before the OOM killer comes, and the system stays responsive a bit longer.

zram is a nice idea but quite a beast in practice (at least on MIPS with 32 MB RAM): sys time constantly at 100% when you actually need it, and other quirks. Maybe it got better or I did something wrong.

But if you need an in-memory compressed block device it's pretty great - you can just format it with ext4 and have an lz4-compressed tmpfs.
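
A minimal sketch of that use, as root (the size, algorithm, and mount point are illustrative):

  # modprobe zram num_devices=1
  # echo lz4 > /sys/block/zram0/comp_algorithm
  # echo 2G > /sys/block/zram0/disksize
  # mkfs.ext4 /dev/zram0
  # mount /dev/zram0 /mnt/scratch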


From what I understand, zram results in LRU cache inversion whereas zswap does not (as it intercepts calls to the kernel frontswap API). Although, if you have a workload that would benefit from MRU then I guess this is just a bonus :)

Zswap maintains default kernel memory allocation behaviour, with the tradeoff that it needs a backing swap device to push old pages out to (which is why zram tends to be used more often in embedded devices that only have a single volatile memory store, or devices with limited non-volatile storage).


I use zram rather than a regular swap partition on all my laptops (because I'd rather not swap on SSDs) and desktops (same reason and/or there is an absurd amount of RAM to begin with). I also hear that most chromebooks use zram too (you really don't want to be swapping on that eMMC memory).

I set it up with one zram device per CPU core for a total space of ~20% available RAM.

No performance issues w/ zram so far so I haven't felt the need to change the compression algorithm.


zram has worked fine on my chromebooks. This is with running multiple chroots - and I have hit the oom killer a number of times (when even zram swap wasn't enough).

Until you actually run out of memory, zram seems very much a set-and-forget type of thing. No babysitting required.

tl;dr: it does what it says on the tin, and ... with minimal CPU impact.


First I've heard of either. How would I set these up?


You can setup zram like this. Typically you'll want to make a service for it since it needs to run on every boot.

  # modprobe zram num_devices=1
  # echo 1G > /sys/block/zram0/disksize
  # mkswap -L zram0 /dev/zram0
  # swapon -p 100 /dev/zram0
Official documentation here: https://www.kernel.org/doc/Documentation/blockdev/zram.txt


Wasn't there a Debian/Ubuntu thing recently where vm.swappiness = 0 had a behavior change which increased the number of incidents of the OOM killer stomping on things like database processes?

(Maybe it wasn't so new... https://www.percona.com/blog/2014/04/28/oom-relation-vm-swap...)


Thank you for sharing this. There's an interesting conversation thread in the comments on that post. It's a little over my head, but my takeaway is that with the kernel change, in an OOM event, MySQL is unable to be swapped out due to the type(s) of memory pages it's using, so the kernel is forced to kill it (or itself). In practice, it's relatively straightforward to tune MySQL/MariaDB for a certain memory allocation, and if it's on a shared host, oom_score_adj can be set to protect it.


Can you not protect processes from the OOM killer? This is trivial and very useful on FreeBSD:

https://www.freebsd.org/cgi/man.cgi?query=protect&sektion=1


Yes, with oom_score_adj[0], which I've mentioned several times. Setting it to -1000 for a process protects it from OOM killing.

0. http://man7.org/linux/man-pages/man5/proc.5.html
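
On Linux the raw interface is a write to /proc; a sketch, as root (mysqld is just an illustrative process name):

  # echo -1000 > /proc/$(pidof -s mysqld)/oom_score_adj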


That looks painful to use. Do Linux distros let you automatically protect services? E.g., on FreeBSD:

  mysql_enable="YES"
  mysql_oomprotect="YES"

Now every time you start the MySQL service it's automatically protected


Define "automatically" and "services" first ;-) Normally you just set it in the systemd's unit file for each daemon you want to adjust. So for some definitions of the above the answer is yes. (OOMScoreAdjust in https://www.freedesktop.org/software/systemd/man/systemd.exe...)


Personally, I use one tool for both FreeBSD and Linux. Picking up the (imported) rc.conf variable for a service is a mere matter of

    oom-kill-protect fromenv
And the conversion from something like OOMScoreAdjust is quite straightforward. A PostgreSQL systemd unit file that reads OOMScoreAdjust=-625 becomes a run program that contains

    oom-kill-protect -- -625
* http://marc.info/?l=freebsd-hackers&m=145425153624976&w=2

* http://jdebp.eu./Softwares/nosh/guide/oom-kill-protect.html


Cool, thanks for sharing. I would hate to have to muck around in /proc manually to set this.


> * SSD-backed desktops and other servers and workstations: enable swap and set vm.swappiness to 1 (for NAND flash longevity).

Is this that big of a worry? I have a 5-year-old SSD in my daily driver laptop, on OS X, which loooves to swap out anything it can to gain memory for disk cache, and I'm still barely 15% into the SSD wearout.


How big is the OSX/macOS swap? It's a file and not a partition, right?


It uses a dynamically-sized swap file rather than a dedicated partition.


To elaborate, it uses a series of dynamically-sized swap files (something like 256 MB, then adding a 512 MB file, then 1 GB, then 2 GB, etc)


Nope, just a rule of thumb. :-)


Also... Linux. Not macOS.


Swapping should have disappeared years ago. At best, it gives the effect of twice as much memory, in exchange for much slower speed. It was invented when memory cost a million dollars a megabyte. Costs have declined since then. How much does doubling the memory cost today?

What seems to keep swap alive is that asking for more memory ("malloc") is a request that can't be refused. Very few application programs handle an out of memory condition well. Many modern languages don't handle it at all. Nor is it customary to check for a "memory tight" condition and have programs restrain themselves, perhaps by starting fewer tasks in parallel, opening fewer connections, keeping fewer browser tabs in memory, or something similar.

I've used QNX, the real-time OS, as a desktop system. It doesn't swap. This makes for very consistent performance. Real-time programs are usually written to be aware of their memory limits.

Most mobile devices don't swap. So, in that sense, swapping is on the way out.


> Nor is it customary to check for a "memory tight" condition and have programs restrain themselves, perhaps by starting fewer tasks in parallel, opening fewer connections, keeping fewer browser tabs in memory, or something similar.

These aren't mutually exclusive and are actually complementary with swap.

If you have more than enough memory then swap is unused and therefore harmless. The question is, what do you do when you run out? Making the system run slower is almost always better than killing processes at random.

And it gives processes more time to react to a low memory notification before low turns into none and the killing begins, because it's fine for "low memory" to mean low physical memory rather than low virtual memory.

It also does the same thing for the user. "Hmm, my system is running slow, maybe I should close some of these 917 browser tabs" is clearly better than having the OS kill the browser and then kill it again if you try to restore the previous session.


> Making the system run slower is almost always better than killing processes at random.

In practice, heavy swapping (back and forth) makes it impossible to even kill the culprit manually (because I can't open an xterm or whatever), while there is often no benefit to having the processes continue running that slowly.

Also, ideally programs should be written with the assumption that the machine could go down at any instant. Having a few more cases where the program is killed will have the effect that the program is better tested and debugged.


I cannot remember a single occasion where my desktop recovered once it started swapping. Every time, the whole system locks up and I need to reboot. Thus, better to kill some random processes instead of all of them.


Always, really? Perhaps I'm lucky but this happens quite frequently with my system (dev workstation, so browser with lots of tabs, IDE, my own app/server stuff, other "power/mem-hungry" dev tools...), and I always manage to keep it sane/healthy:

- notice that the system starts swapping (if you do not monitor that, to me it sounds as careless as driving on the highway in 2nd gear and ignoring the engine noise -- ideally the OS could proactively help here, but unfortunately I don't know a good "automated" tool)

- find out which process/app uses the most memory (Linux can even tell you which ones use the most swap space [1] -- a sketch follows below)

- decide which one you want to (gently|forcefully)-(quit|restart|whatever). Exercise judgment.

[1] http://stackoverflow.com/questions/479953/how-to-find-out-wh...
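
The approach in [1] boils down to reading the VmSwap field from /proc; a rough sketch:

  for f in /proc/[0-9]*/status; do
      awk '/^(Name|VmSwap)/ {printf "%s ", $2} END {print ""}' "$f"
  done 2>/dev/null | sort -k 2 -n -r | head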


> I cannot remember a single occasion, where my desktop recovered when it started swapping.

...which operating system is that?


Ubuntu


Sounds to me like your swap is not swapon'd. I get the same behaviour when I'm not running swap and memory is depleted.


Swap space is only partially related to virtual memory overcommit, and virtual memory overcommit is extremely common and almost unavoidable on most Unix machines. Part of this is a product of a deliberate trade-off in libraries between virtual address space and speed (for example, internally rounding up memory allocation sizes to powers of two), and part of this is due to Unix features that mean a process's theoretical peak RAM usage is often much higher than it will ever be in reality.

(For example, if a process forks, a great deal of memory is shared between the parent and child. In theory one process could dirty all of their writeable pages, forcing the kernel to allocate a second copy of each page. In practice, almost no process that forks will do that and reserving RAM (or swap) for that eventuality would require you to run significantly oversized systems.)


Plus mobile apps do get, and usually handle, a low-memory notification from the OS.


On iOS, too many low-memory warnings in a set amount of time (Apple won't tell developers how many, or in what time frame, in order to prevent them from gaming the system) will result in your app getting killed.


Until Apple stops soldering on memory, swap will still be alive on the desktop.


Years ago, about 80% of desktop machines were never opened during their life. It's probably higher today.


... for a small fraction of users.


Memory allocation is a non-market operation on (most? all?) operating systems. There's effectively no cost to processes allocating memory, and a fair cost to them not doing so.

I'm not sure whether turning this into a market-analogous operation (bidding some ... other scarce resource -- say, killability?) would make the situation better or worse. And the problem ultimately resides with developers. But as a thought experiment this might be an interesting place to go.


This idea was implemented in EROS, and we've been exploring it for Robigalia as well. Storage is a finite resource which can be transferred between processes, including an "auction" mechanism which allows two processes to examine a trade before agreeing to it.


Doesn't this already exist for processor scheduling?


There's a weighting in many such systems, but ultimately it's still just a queue, usually a FIFO one.

Niceness allows for higher-priority processes to preempt others, but doesn't address the problem of an overwhelmed queue.

And processor scheduling isn't memory allocation. Time is ultimately some percentage of wall-clock (and/or overcommitment). Memory is ... different.

There's also the question of such stuff as garbage collection and the scheduling of that. I had the opportunity to do some JVM tuning "ergonomics" (horrible name) a few years back. Turns out that you get far better behaviour in most cases by decreasing the sweep frequency and increasing the allocation chunks (terminology is escaping me), due to the fact that natural attrition deallocates memory, and running sweeps too frequently simply chews up massive amounts of CPU time with no return in freed memory.

We also identified processes which genuinely did require very large memory allocations, and allocated hardware specific to those.

Specific workflow and process understanding (always idiosyncratic to a particular work assignment) was necessary, and took time to acquire.


I hate swap. My experience with it is that once a disk-backed machine (as opposed to SSD) has started swapping, it's essentially unusable until you manually force all anonymous pages to be paged in by turning off swap ("sudo swapoff -a" on Linux) or reboot.

My hunch is that the OS is swapping stuff back in stupidly. Once memory is available, I'd like it to page everything back proactively, preferring stuff from swap and then from file-backed mmaps. But instead it seems to be purely reactive, each major page fault requiring a disk seek to page in what's needed with little if any readahead. Basically the whole VM space remains a minefield until you stumble over and detonate each mine in your normal operation. Much better to reboot and have a usable system again.

On my Linux systems, I've turned off swap.

On OS X...last I checked, I wasn't able to find a way to do this. I'd like to turn off swap entirely, or failing that, have some equivalent way to force all of swap to be paged in now so I don't have to reboot when I hit swap. Anyone know of a way?


> My experience with it is that once a disk-backed machine (as opposed to SSD) has started swapping, it's essentially unusable until you manually force all anonymous pages to be paged in by turning off swap ("sudo swapoff -a" on Linux) or reboot.

That depends. If your workload exceeds the amount of available memory, you will start "thrashing" the disk, and that can make a system unresponsive.

If you happen to launch a large application, or start working with a big file, unused pages will be evicted to disk to make room and, after some slowdown, the system should become perfectly usable again. YMMV

On OSX, I don't know a way, but I can't recall the last time I had to reboot due to RAM/swap issues, even when I was developing apps on a 4GB Macbook Air. I guess memory compression, which is enabled by default, helps here. Most OSX systems have very fast SSDs as well.


> If you happen to launch a large application, or start working with a big file, unused pages will be evicted to disk to make room and, after some slowdown, the system should become perfectly usable again. YMMV

What is an unused page? One that the foreground, memory-hungry application doesn't need? Okay, fine, but what happens when you switch back to some other application? My experience is that it needs the RAM that was paged out, and it doesn't get paged back in all at once. Every time you hit some 4 KiB of memory that happens to be paged out, you wait another 10 ms. I don't know how much beyond the 4 KiB gets paged in at the same time. Worst-case, there's no read-ahead at all. Let's say the application is using 1 GiB of RAM. Then this can happen 262,144 times, which means 44 minutes of waiting in small bursts as you're trying to use it, rather than the 10 seconds (at 100 MB/s) it'd take to read it all in one go. That's what I mean when I say the machine is unusable.


This is my experience too, and this thread has motivated me to "swapoff" all my desktop systems today, I think. There's no situation in which swap usage for me has ever not led to a reboot, due to the system going unresponsive the moment it starts swapping.


I will posit that your swap is too big then; I have 16GB of RAM and run with 1-2GB of swap. A runaway process hits the OOM killer in about 30s.


To be fair, this is probably ZFS-on-Linux having a bad swap interaction. But I have found that once I get to 95% RAM use, I start hitting the swap fairly frequently and as a side-effect my desktop starts stuttering (usually happens if I kick off an 8-core code compile).

The other culprit I suspect is too many processes potentially blocking Cinnamon's main thread, but I've never figured out a decent way to go after the problem.


Oh, ZoL is terrible with swap; I get OOM all the time when my ARC is still like 4GB.


On my 2015 MBP with 8GB & SSD, I am often stuck for 10-15 minutes unable to do anything while thrashing. And I am someone who has Activity Monitor handy. I do not have this on my much older and weaker Ubuntu X220s doing the same type of development. Not sure why that is.


If it's 3rd party SSD, have you enabled TRIM? I had to do that for my old Mac Mini, made big difference. (2015 MBP of course has factory-installed SSD, but maybe this helps someone else.)

http://osxdaily.com/2015/10/29/use-trimforce-trim-ssd-mac-os...


It's all original Apple hardware.


It can be Spotlight reindexing, or Time Machine creating a backup into /Volumes/MobileBackups (done once per hour). Especially taxing if you have lots of files on your disk.


It's kind of annoying that these processes aren't niced and ioniced (or the macOS equivalent). Especially when typing a password.


Look for hung mds (Spotlight indexer) threads. They can get hung up and fail ungracefully with some files.

I had a client whose app generated PDFs that would cause this to happen.


Memory (8GB) is just suddenly jumping to red, usually by some Safari tab (Gmail etc.) which suddenly jumps to 1GB. And that's it. No indexing or mds processes; just browser tabs which suddenly jump over a point and lock up everything because of memory use. It's like I'm back in the early '90s, when I first touched a non-Amiga and non-SunOS system and asked how people can work with that 'Windows/DOS stuff'. I am not sure why it misbehaves so much though...


Usually when people say 'swapping' they mean page faulting. It's nothing more than a slight annoyance on a single-user machine if you swap for 10 seconds, but on a busy server you are dead in the water.


Take this with a grain of salt since this anecdote is a few years old, from before I upgraded to an SSD to host my swap, and this is on Windows.

I'll always remember when I used to load a large piano instrument in a VST DAW on Windows 7, taking about 3-4GB out of 12GB of RAM. It played perfectly fine, but if I left the application open, invariably on the next day I'd get a barrage of audio dropouts when pressing any new piano key. One trick was to put my arms on the entire keyboard a few times to force the swapped pages back into memory. Another trick, which I ended up relying upon despite occasional low-memory warnings, was to disable the pagefile entirely - that sure fixed the problem.

I'm not sure how/if things have improved since with Windows 10 and SSDs, but I always felt there was something wrong with the algorithms, since even with GBs of memory free at all times, old memory content would tend to end up on disk, without any good reason I could see.

I assume the OS used time to prioritize various caching/pre-fetching techniques over actual application data, and/or once paged, never preemptively loaded data back to RAM even if plenty of memory was available.


"My experience with it is that once a disk-backed machine (as opposed to SSD) has started swapping, it's essentially unusable"

The OS should start swapping very early to avoid bursts of disk I/O rendering the system unusable. On Linux this is somewhat configurable, even if not user-friendly: a combination of swappiness and vfs_cache_pressure can turn it into a usable machine, taking care of inefficient memory usage, memory leaks, unnecessary vfs cache, etc.
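
A sketch of that combination (the values are illustrative, not recommendations):

  # sysctl -w vm.swappiness=10
  # sysctl -w vm.vfs_cache_pressure=50

Lower swappiness makes the kernel prefer reclaiming page cache over swapping anonymous pages out; vfs_cache_pressure below 100 makes it hold on to dentry/inode caches longer.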


> The OS should start swapping very early to avoid bursts of disk I/O rendering the system unusable.

I think you're talking about the I/O of paging dirty things out, but I'm talking about the fact that some memory location is no longer present in RAM, so accessing it will take 10 ms or more to page in.

The system is not only useless while actively swapping. It's useless after it has ever swapped, and you can only recover by disabling swap ("sudo swapoff -a") or rebooting.


Contrarian anecdote: I've recovered from swapping a few times on a desktop machine with an HDD (typical scenario: an ON clause was omitted from a join and Postgres is doing a cartesian join between two tables). I didn't find things unusable instantly; I was able to recover by SIGTERMing the relevant process and then running `swapoff --all` while maybe going for a tea break. YMMV.


On OSX it used to be that you had to disable the pager daemon, then remove the swap file. You probably need to disable System Integrity Protection for it to work, though.

  sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.dynamic_pager.plist

Then:

  sudo rm /var/vm/swapfile*

Disclaimer: haven't tried doing it this way on Sierra.


Something seems to be seriously wrong with the swap implementation on modern systems.

20 years ago on Windows 98 it just started swapping, but it was no big deal. If something became too slow to be usable, you could just press ctrl+alt+del and kill that swapped program and everything worked fine afterwards.

On the other hand, on my modern Linux laptop, it starts swapping, and it swaps and swaps, and you can do nothing, not even move the mouse, till 30 minutes later something crashes.


> on Windows 98 it just started swapping, but it was no big deal.

At that time, swapping out a 4k page reclaimed a meaningful share of memory: 4k of 16 MiB is 1/4096 of memory. Each page swapped out got back a relatively large amount of the memory the program needed. Now swap still works in 4k pages, but memory has expanded a thousandfold. Basically, swap is a thousand times worse today than it was in the time of Windows 98.

For hard drives, swap isn't used now to expand memory; it's used to evict initialization code and other 'dead' memory. Swap should be set to only a tiny fraction of the memory size for this reason, to prevent it from being used to handle actual out-of-memory conditions. But realistically, for most users it's not even worth enabling at all, because of the occasional memory that needs to be swapped back in from disk.

For SSDs the seek speed has improved to match the extra memory so swap can still be used like in the old days to expand the effective memory size. But memory is so large a swap file that's a fraction of memory size to offload 'dead' memory is enough unless there's a specific reason to actually use swap for out-of-memory.


I have been using various operating systems for a while.

I feel like Linux has, in general, from a UX point of view, the worst behaviour when swapping and the worst behaviour in general under memory pressure.

I feel like it has gotten worse over time, which might not be just the kernel but the general desktop ecosystem. If you require much more memory to move the mouse or show the task manager equivalent, then the system will be much less responsive when it thrashes itself.

Honestly, I'd much rather have Linux just crash and reboot; that'd be faster than its thrashing tantrums.

Luckily, there's earlyoom, which just rampages the town quickly if memory pressure approaches. Like a reboot (ie. damage was done), just faster.

In any case, it makes me sad (in a bad way) to see how bad the state of things is when it comes to the basics of computing, like managing memory.


Not an excuse for bad implementations, but since I run i3wm, my feelings of happiness increased rapidly. To such an extent that I do not want to ever run anything else; stability, speed, memory use... It solves (for me) the issues you have.


i3 is magnificent. The same display seems 10x bigger when using i3. As true for netbooks as for big desktops. My old x120 dual boots win7, which is unusably slow and unstable on it. Arch with i3 is still snappy. Unless I'm running a web browser. Web browsers have gone insane.


Use Noscript for browsing on old or resource limited hardware. The problem is the amount of code running on modern websites.


I couldn't figure out how to scale i3 to the high DPI on my Yoga 900, with Wayland on F25.


If you're on Wayland, use Sway instead. It feels so much like i3 that I often forget it's not i3. Hidpi works pretty well: https://github.com/SirCmpwn/sway/issues/797. I use this on a Dell Precision with the 4k display.


Thanks!


Use xrandr --dpi 192 (or whichever value you’d like to use) before starting i3, i.e. typically in your ~/.Xsession.

i3 ≥ v4.13 will pick up this value from the Xft.dpi resource in ~/.Xresources as well, which is the more common way of configuring DPI.

edit: haven’t tested this within Xwayland, though. Note that i3 is only supported on X11.


What do you mean by "scale i3"? Just the text drawn by i3 or also the managed windows?


I meant scale the entirety of the interface. I'll try Sway, as apparently i3 and Wayland isn't really a supported combination.


> Web browsers have gone insane.

Yep. I bought a second computer for full browsers. One for dev, another for 'full' browsing (Javascript on) and on my i3 dev machine, I only have NoScript browsing on for dev stuff.


That's what happens when you run everything through 100 layers of abstraction. Windows, for better or for worse, runs most things closer to the metal.


Because Windows 98 always kept enough resources available to show you the c-a-d dialog. On Linux, however, there is no "the shell must remain interactive at all times" requirement, so a daemon that gobbles memory and your rescue shell have the exact same priority. Modern Windows even has a graphics card watchdog and if any application issues a command to the GPU that takes too long, it's suspended and the user is asked if it should be killed. Probably not what you want on an HPC that does deep learning, but exactly what you want on an interactive desktop.

I suppose it might be possible to whip something up with cgroups and policy that will keep the VT, bash, X and a few select programs always resident in memory and give them ultimate I/O priority, but I haven't tried.


This is the exact opposite of my experience. Back in the Windows 9x days it was a fairly routine experience for the system to soft-lock with the HD grinding away and I'd sometimes end up just hard rebooting the computer after waiting a few minutes for the ctrl-alt-delete dialog to appear. On macOS with a SSD I don't even notice when my system is swapping heavily.


Isn't this related to this change on kernel 4.10? https://kernelnewbies.org/Linux_4.10#head-f6ecae920c0660b7f4...


Possibly, however since the writeback behavior is configurable I expect you could test that thesis by changing the aggressiveness of the writeback draining.


Could this be a reflection of the increasing gulf between RAM speed and HD speed? Even with NVMe drives, which one probably shouldn't be swapping to anyway, RAM is orders of magnitude faster.


I think, among other things, it has to do with the size of the swap space relative to the speed of the swap device. IME high disk i/o combined with large swap space means swap never fills up and the OOM killer doesn't kick in. On systems with less RAM and swap, OOM conditions were hit much sooner, even with slower disks.

Default settings for dirty ratio and dirty background ratio exacerbate the issue: more data is held onto before it is written, and once the dirty ratio is hit, any application writing to disk will block.
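
For reference, the knobs in question (the values shown are common defaults, but they vary by distro):

  # sysctl vm.dirty_background_ratio
  vm.dirty_background_ratio = 10
  # sysctl vm.dirty_ratio
  vm.dirty_ratio = 20

The first is the percentage of RAM that may be dirty before background writeback starts; the second is the point at which writing processes block.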


With SSDs, disk is not that slow.


SSDs are only ~4x faster than magnetic disks, last I checked. If RAM is 100ns per access, and HD access is down from, say, 1ms to 0.25ms, that's still a huge gap. 4x isn't even an order of magnitude.

EDIT: see comment below for more accurate numbers.


From the article:

>A typical reference to RAM is in the area of 100ns, accessing data on a SSD 150μs (so 1500 times of the RAM) and accessing data on a rotating disk 10ms (so 100.000 times the RAM).


reminded me of this...

Latency Numbers Every Programmer Should Know

https://gist.github.com/jboner/2841832


Thank you for the correction. I should have read more carefully. Still, we're talking 3 orders of magnitude for SSD vs RAM.


0. Possibly not true in all cases.

1. Modern systems are much more aggressive about enormous disk caches, which can ironically lead to IO storms when the OS swaps out your application to buffer writes, then has to flush the cache to swap the app back in.

2. Difference in working set size and number of background programs waking up.


I think that's more related to Linux and its prioritization of IO than anything else. Note that the latest kernel release (4.10) contains an IO throttle that should improve this experience.

https://kernelnewbies.org/Linux_4.10#head-f6ecae920c0660b7f4...


I feel you. X and some recovery-critical software should have their own reserved memory cgroup with some guaranteed, safe amount of physical memory and 0 swappiness. I speculate that on Windows this works so well because most of this stuff is in kernel space anyway.


If you have an SSD, try setting vm.swappiness to 1 (not 0).


Just type

  sudo swapoff -a
  sudo swapon -a


Can't type while it is thrashing. Otherwise the offending program could just be killed.


What I've always been specifically confused about is whether there's any point in giving a VM a swap partition inside its virtual disk, rather than just giving it a lot of regular virtual memory (even overcommitting compared to the host's amount of memory) and then letting the host swap out some of that RAM to its swap partition.

Personally, I've never given VMs swap. I'd rather have memory pressure trigger horizontal scaling (or perhaps vertical rescaling, for things like DBMS nodes) than let individual VMs struggle along under overloaded/degraded conditions.


Generally yes. In fact, this is why "balloon" drivers exist, to allow the host to create backpressure and make the guest swap. The guest knows more about which pages are interesting than the host. If you make the host do the swapping, it will pick silly things, like the guest's disk cache, to write to swap.


For clarification to other readers, "Generally yes" was the reply to the originally posed question, which means the above comment actually disagrees with the suggested solution. (I had to read both a few times to get this straight.)


Ah, this is a great idea. It'd also be easier to understand and see service degradation (ie. physical memory being used on the host) directly from something like vCenter instead of relying upon Solarwinds to tell me the host is out of memory.


But does the host actually know what is an appropriate thing to swap? It doesn't know what is contained in that chunk of memory it just swapped. Although ideally, you would just build the system to contain enough memory for each VM so they can each run at full capacity along with whatever else overhead it may need for your hypervisor. You wouldn't want the host swapping out anything related to your VMs because it's just going to kill any performance of the affected VM. Give each VM its own swap space and let the guest figure out what needs to be swapped.


One use of swap on modern systems: hibernation. If you need hibernation, a swap space must exist, either as a swapfile (pre-allocated, as uswsusp requires a fixed offset on the disk to resume) or as a partition.


I've been reading these stories for ten years. About 8 years ago I started taking them seriously and stopped using swap. Turns out not having swap works much better. I'm amazed how slowly the consensus seems to be moving though.


Systems are used for vastly different purposes. With different memory usages and expected operation.

There can be no consensus because there is no one answer.


We reached this same conclusion for our servers generally. The problem with swap is that it's unpredictable. It's better most of the time to have a system that's predictable. However much RAM is available to the system, you can deal with that, by making an appropriate choice of hardware type, or by scaling up, tuning software, etc. It's harder to deal with performance problems related to use of swap in my experience, since it's nondeterministic what will be swapped.


Yeah. I've had issues with this on some systems.

On Windows without swap, when you get even remotely low on RAM, things start going really poorly for some reason - random latency. So even with 16 GB of RAM I couldn't disable swap on Windows without some really strange performance characteristics. I run SSDs, so I really wanted it off, and I just stuffed more RAM in my box - with 32 GB it isn't a problem.

On Linux, however, you can pretty much turn it off and everything will run smoothly until you're actually out; then you lag badly briefly, Linux's oom-killer does its thing, and all is good again within the span of a few seconds.


I've noticed the same thing, Windows just becomes bizarrely cranky if you disable swap entirely. My solution was to instead leave it on, but limit it to just a couple of megabytes. That seems to avoid the VM subsystem freakouts thus far.


Sadly, trying to investigate this is quite hard, since people are outright hostile to questions about it.

If you ASK about swapping on Windows, you get people telling you that "Microsoft engineers are smart, don't disable swap and go <insert expletive here>", even if you asked something that is NOT about disabling swap.

So, I had this gamer laptop: i7, nVidia GPU, 8GB of RAM (when most machines had 2 or 4), but a stupidly slow 5k RPM HDD made for power saving and locked in a "noiseless mode", thus very slow seeks too (i.e. it moves the heads slowly to avoid making noise and for aerodynamic reasons).

I noticed that even right after booting up, RAM usage would jump to 6GB and the HDD would thrash endlessly and make the machine unusable... after some research I found some interesting posts by MS employees about it:

Windows can "preemptively" use swap, it will write on swap things it thinks you might need to swap out. Sounds good on paper.

Also, Windows has several caching systems, that will write to "RAM" random crap.

One day that was particularly bad, I noticed that when I booted, Windows would immediately attempt to copy to RAM a gigantic binary file (the sound files of a game I had played a lot recently). This caused thrashing due to reading the file; then it would attempt to load the other programs it had to, page them out immediately, and enter some crazy loop of thrashing the I/O forever... Every time I opened the task manager and looked at the graphs, disk I/O would be constantly maxed out at 100%...

Disabling the VM made the laptop behave better (despite all the bugs Windows has when you disable VM).

But what I really wanted was to change how the VM works... I wanted to keep the VM and the caching, but change the settings: for example, I would set it to NOT page out anything at all unless more than 80% of RAM was in use, and to never "cache" stuff unless the HDD was actually idle and a good amount of RAM was free. But sadly, it seems this can't be done; I got no useful answer on the Stack Exchange sites when I asked this (but I got a couple of personal messages and e-mails full of expletives in many places where I asked about it -- for some reason people get personally offended when the subject is virtual memory).


Java on Windows used to have a background service which touched the pages of the Java components to keep them in memory and make Java performance look better. This was active even if you hadn't run a Java program in weeks. OpenOffice once had a similar program. Enough things like that and you can't get anything done.


That program used to stall your startup something fierce too. It was really annoying.


Yeah, what you wanted to change is what Linux calls "swappiness", configurable in vm.swappiness. In Windows I can't find any such configuration option.


>One day that was particularly bad, I noticed that when I booted, Windows would immediately attempt to copy to RAM a gigantic binary file that was the sound files of a game I played a lot recently, this caused trashing due to reading the file, then, it would attempt to load other programs it had to

Oh, that's just SuperFetch; it's a service you can disable to reduce the idle thrashing a bit after the desktop has loaded.


On Windows, when you allocate, the OS guarantees it has the memory to fulfill the request at the time of the request. On Linux, no check is made until you try to use the memory.

Because of this, memory pressure will be higher on a Windows box. Paging helps paper over this, as the commit can be billed to the page file, not RAM. Windows is smart enough not to write anything to swap until you actually use the page, so in practice this is rarely a problem.

The benefit to this approach is you actually have a hope of recovering from OOM.


> On windows when you allocate it will gurantee it has the memory to fulfill the request at the time of the request. On Linux no check is made until you try to use the memory.

That's not true, you have to turn vm.overcommit_memory on in Linux for that to happen I believe. Which is off by default in most distros.


https://www.kernel.org/doc/Documentation/vm/overcommit-accou...

The default is to allow "sensible" overcommit, whatever that means. From my experience, whatever "sensible" is really is sensible, and I haven't had issues with it. You can also set it to allow all memory allocations, even "silly" ones (i.e. allocating 100GB on a system with 10GB RAM), or to refuse overcommitting memory.
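
Condensing that document: vm.overcommit_memory=0 is the heuristic "sensible" default, =1 always overcommits (even the "silly" allocations), and =2 refuses to overcommit beyond a ceiling controlled by vm.overcommit_ratio. As a sketch (the ratio value is illustrative):

  # sysctl -w vm.overcommit_memory=2
  # sysctl -w vm.overcommit_ratio=80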


Oh, interesting. Didn't know that. Thanks!


> Linux's oom-killer does its thing

Usually selecting sshd to kill, in my experience, rendering the server inaccessible.


Protect that service against the OOM killer; how to do so is mentioned elsewhere in this thread.


Two examples of why I have swap:

* On a laptop to hibernate, which results in zero power consumption vs suspend which will drain the battery in a day or so

* I use tmpfs for /tmp and using swap as the backing is far more performant than regular filesystems
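
For the /tmp case, the usual fstab line looks something like this (the size cap is illustrative):

  tmpfs  /tmp  tmpfs  size=50%,mode=1777  0  0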


> On a laptop to hibernate, which results in zero power consumption vs suspend which will drain the battery in a day or so

Swap is not strictly needed for this:

(it boils down to vm.swappiness=1)

https://wiki.debian.org/Hibernation/Hibernate_Without_Swap_P...


You still need swap, just not a swap partition. I suppose we can debate if there's a meaningful difference between the two.


I have an encrypted swap partition. Hibernation works well with that. I don't believe it is possible with a swap file since the containing filesystem would also be mounted by the hibernation image.


> I don't believe it is possible with a swap file since the containing filesystem would also be mounted by the hibernation image.

For sure it is, even for swap files inside an encrypted volume.

https://vadim-kirilchuk-linux.blogspot.com/2013/05/swap-file...

The crux is knowing the swap-file offset and passing that argument as resume_offset on boot.

Have a look. It's totally doable and you won't need to have an actual partition.
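
Roughly, the procedure from that post looks like this (paths and size illustrative; see the link for the details):

  # dd if=/dev/zero of=/swapfile bs=1M count=4096
  # chmod 600 /swapfile
  # mkswap /swapfile && swapon /swapfile
  # filefrag -v /swapfile | head -n 4

The physical offset of the file's first extent, as reported by filefrag, is what you pass as resume_offset= on the kernel command line, alongside resume= pointing at the containing device.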


I never realised the swap header on files also provides enough information to locate the other blocks without going through the filesystem. Logical, I guess. This is, however, more work than the "works out of the box" swap partitions.


I could swear there was a way around that but I don't remember the details and can't find them.

Still, it's possible to hibernate without having the drawbacks people have been complaining about in this thread.


> * I use tmpfs for /tmp and using swap as the backing is far more performant than regular filesystems

This seems absurd. You're running an in-memory filesystem backed by memory-on-disk? You weren't comparing to a journalled filesystem or something like that?


Since I do use journalled filesystems for my real data, your comment implies I should create yet another partition for /tmp using a filesystem that optimises performance over durability/integrity, eg without journalling.

I consider files in /tmp to be temporary, and do not expect them to survive a reboot. (Actually I prefer they don't - less administration and housekeeping.) They also have random lifetimes ranging from fractions of a second to several days. And random sizes from zero length to gigabytes (eg making an ISO image).

With tmpfs, RAM is used, which provides the best performance since the filesystem is trivial. Memory pressure will cause swap to be used as needed. Files not accessed will end up in swap, taking up no RAM.

By far the fastest I/O is the I/O you don't have to perform.


> Since I do use journalled filesystems for my real data, your comment implies I should create yet another partition for /tmp using a filesystem that optimises performance over durability/integrity, eg without journalling.

If you're using a swap partition just for the sake of /tmp then it's the same difference, no?


> If you're using a swap partition just for the sake of /tmp then it's the same difference, no?

No. The big difference is that regular filesystems try to do I/O to their backing device - heck that is their point, and what they do the vast majority of the time. tmpfs does not do any I/O. However I/O will happen when there is memory pressure by the swapper, but that is going to be rarer.

ie with tmpfs, swap is a spillover mechanism. With a regular filesystem, the underlying device is the primary mechanism.

Swap can also be used for actual swap on the occasions it is helpful.


> No. The big difference is that regular filesystems try to do I/O to their backing device - heck that is their point, and what they do the vast majority of the time. tmpfs does not do any I/O. However I/O will happen when there is memory pressure by the swapper, but that is going to be rarer.

Yes and no - aren't they just two ways of looking at the same decision? Regular filesystems will buffer, and when the system is low on memory it will flush buffers, using similar criteria to deciding whether to swap.

> Swap can also be used for actual swap on the occasions it is helpful.

Swap enables actual swap, sure. My experience is that it usually hurts more than it helps though.


> ... Regular filesystems will buffer, and when the system is low on memory it will flush buffers ...

That is the bit you are missing. Unwritten filesystem data is regularly flushed; the flush interval is often around 5 seconds. Look up "pdflush" to get the gist, although things have changed since then. Same with laptop mode.

Quite simply if a file is created and lives for at least N seconds then there will be disk activity irrespective of memory pressure. N is 5, perhaps up to 30 seconds in normal use.

Even if the file contents aren't fully flushed, metadata is.
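
Those intervals are visible as sysctls, in centiseconds (the values shown are common defaults):

  # sysctl vm.dirty_writeback_centisecs
  vm.dirty_writeback_centisecs = 500
  # sysctl vm.dirty_expire_centisecs
  vm.dirty_expire_centisecs = 3000

The first is how often the flusher threads wake up (500 = 5 seconds); the second is the age at which dirty data must be written out (3000 = 30 seconds).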


Reminds me of a youngster who thought he could beat the system by putting the swap file on a ramdisk.


> I've been reading these stories for ten years. About 8 years ago I started taking them seriously and stopped using swap.

Not sure what you're referring to here. This story doesn't recommend eliminating swap...


"Systems without swap can make sense and are supported by Red Hat - just be sure the behaviour of such a system under memory pressure is what you want"

So, it doesn't exclusively recommend it, but it concedes that there are use cases where it makes sense.


The quote I included made it sound like he was referring to stories that advocate getting rid of swap.


Yours might have been interpreted as they advocated never getting rid of swap :)


Ditto, and over that period memory has become even cheaper.

I sort of wonder if we'll see a 100% RAM, large memory laptop soon that boots from an SD-card or in a cryptographically secure fashion over 4G wireless networks, aggressively disables RAM for power saving and suspends well.


Aren't there legacy applications which expect swap, whereas with modern applications swap isn't necessary? Or at least that is my current (mis)understanding...


This is by far my biggest pet peeve in the space: the "rule of thumb" that you need 2x RAM as swap. Even 10 years ago this "rule" was ancient and useless, but it was always a constant challenge educating customers as to why, and that yes - we really did know better than your uncle Rob.

Once a server hits swap, it's dead. There is no recovering it other than for exceptional cases. If you are swapping out, you've already lost the battle.

I tend to configure servers with 512MB to 1GB swap simply so the kernel can swap out a couple hundred MB of pages it never uses - but that's really more to make people feel better than it really being useful at all.


Rules of thumb involving more swap than RAM probably date from decades ago, when Unix virtual memory systems were sufficiently primitive that the total amount of virtual memory you could use was just your swap space, not swap space plus (most of) RAM.

(The limitation came about because the simple way to handle swapping is to assign every potentially swappable page of virtual memory a swap address when you allocate it in the kernel. Then the kernel always knows that there's space for the page if it ever needs to swap it out and you're never faced with a situation where you need to swap out a page but there's no swap space left.)


2x RAM as swap is clearly bad, but I like having around 512MB to 1GB (on systems of basically any size); when you do start using more ram than you have, it gives you some buffer (as long as you actually alert on it). If you have a small memory leak, you can recover; if you have a large memory leak, you're going to run out of swap pretty quick anyway.


I wish we had taken the path of EROS [0] rather than "RAM and DISK are separate". A lot of problems stem from that incompatible viewpoint of computing. Computer science is about hiding complexity under layers of abstraction that continually provide safer states and constraints for the things built on top of them. Our abstraction that RAM and DISK are separate is not safer, nor does it provide constraints that are simple to navigate. Thinking about this the other way, where DISK is all you need and memory is just a write-through cache, is much safer in my opinion and leads to some really cool application design.

If RAM and DISK are the same, then writing a file system is just writing an in-memory tree. No need to pull data from the disk; just navigate the tree in your program's memory and pull the blob data out. Want to persist across reboots, protect against power outages, or save user settings? Just set a variable and it'll be there.

The benefits far outweigh the costs.

[0] - https://web.archive.org/web/20031029002231/http://www.eros-o...


The AS/400 (or whatever they call it now) had an approach like that. Everything was on disk and RAM was just a cache of disk. That also meant every "object" had an address and could be accessed by any process with suitable permissions. There are lots of other things they do, with a very different approach than Unix, Windows etc.

Frank Soltis' book is recommended reading: https://www.amazon.com/dp/1882419669/


AS/400 is really an amazing system, in many ways still ahead of its time. Persistent, single-level storage and capability security are ideas that still have yet to catch on in the mainstream—even though more research gets poured into NVRAM every year.

It's a shame hardly anyone knows about it. Those things are a joy to use. You can get a free (limited, but still useful) AS/400 user account to play around with at http://pub400.com/. I really recommend it.

(Disclaimer: I'm slowly working on a system that resembles AS/400 in many ways, but optimized for analyzing and reporting on very large timeseries databases. It's intended for business applications that require a combination of scheduled reports and fast ad hoc analysis of big timeseries data, initially the oil & gas industry (which is where I work in my “real job”).)


The challenge with this is that abstracting away disk in a way that isn't horribly leaky is incredibly hard, as long as one of them lets us manipulate individual bits while the other requires us to write whole sectors.

Note that EROS is not providing a write-through cache. It's providing a write-back cache using checkpointing coupled with a journalling capability and ability to explicitly sync data.

So it's leaky: your application needs to know to structure its writes to memory so that they will make sense if the system comes back up with some of the data missing, and it needs to know how to use the journalling functionality.

It can't just act as if it's running in RAM forever.


I don't know where you get the idea that you can't just pretend you're running in RAM forever. If you look at the main goal of EROS, you can see that is the point.

Check out http://wiki.c2.com/?TransparentPersistence


From the EROS website, which specifically describes the checkpointing mechanism and gives a short description of the journalling support.

If you just pretend you're running in RAM, and the system crashes, you will lose the data between the crash point and the last checkpoint unless you have explicitly used the journalling mechanism. Often that is acceptable. E.g. since you're restoring the program at the same point in time, if the changes are entirely based on data that were in the system at the point of the last checkpoint, it will just redo the work to calculate the changes.

But if there are side-effects, that is often not going to be acceptable. E.g. database updates that the system has said were committed will suddenly disappear.

To solve that, EROS has a journalling mechanism to allow you to give guarantees about specific data that changes in between checkpoints, but that requires applications to explicitly use it to tell the OS what needs to be saved when so that the application can guarantee that a given piece of data has been durably recorded when it promises a client it has been recorded, and that the writes get correctly ordered.

That's a sensible compromise - if you do it right, it only needs to touch the "boundaries" where the system does IO.


> If you just pretend you're running in RAM, and the system crashes, you will lose the data between the crash point and the last checkpoint unless you have explicitly used the journalling mechanism

> But if there are side-effects, that is often not going to be acceptable. E.g. database updates that the system has said were committed will suddenly disappear.

Yes, not every system will be forever recoverable. You can definitely crash at just the right moment to ruin your year. I'd still like the other safety constraints that this provides, because I still think that even if N (where N is the number of threads available on the CPU) processes have a chance of being corrupted, we're still going to be saving the rest of the processes on the system that aren't currently transacting with one another.


I absolutely like many of the ideas behind EROS, including checkpointing, though I think it does have some issues. E.g. we often treat reboots as a "clear all state to recover from weird situations", and so we still need something like that.

The point is more that it doesn't mean you can just treat things like we do RAM now. It ends up being closer to how you'd work in apps that use mmap'd files to back persistent data.

The capabilities model was interesting too.

I think my biggest problem with a persistence model like that, though, is that I suspect it would encourage not thinking seriously about state, and ideally I'd prefer a system where state is minimized. E.g. compare EROS with its virtual opposite, Android, where apps might find themselves killed at any time. Some apps handle it poorly and go through lengthy initialisation processes again when restarted, but many maintain the illusion of being fully persistent to the extent that users can rarely tell if they've started from scratch or not.

I'd love to see more work into allowing the illusion of a persistent process with little developer effort, though. Perhaps OS-level per application checkpointing support that the application can have control over (allowing the app to control exactly what gets checkpointed, and when to ditch the checkpointed state to reinitialise). So "cleanup" can occur by restarting processes transparently for users while hopefully providing most of the benefits of persisting state.

Perhaps coupled with OS-level checkpointing of the information required to bring said "virtually-persistent" apps back up in the same state after a reboot.


You might want to investigate Mumps:

https://en.wikipedia.org/wiki/MUMPS

Setting data in memory is the same as setting data on disk, the only difference is the name of the variable:

  s X=1   ; store 1 in variable named X, in memory.
  s ^X=X  ; store 1 in variable named X, on disk.
  s X=^X  ; load disk to memory.


How is this different than just memory mapped files? I guess it happens a little more automatically, but it doesn't seem to really solve a major problem that I can see.


Have you ever lost power and lost data from a document you were editing? Has a server ever crashed in a datacenter, its data corrupted, and now your company has lost a few hundred thousand to a few million? Have you ever had to wait a long time for processes to start again after fixing a hardware failure?

These are all problems that have been solved on EROS-based systems. They used to do demos where they would set up a system and have someone start working on some code or a text document; they'd pull out the power plug of the system, plug it back in, and the user would be right where they left off. No data loss, no corruption, just back to work.

None of that was handled in user space. That was all opaque and you didn't have to worry about it at all.


How is that not slow as balls when you're hammering memory? Guaranteeing atomic writes to disk for every memory access would seem to be problematic from a performance perspective.


How is paging not slow as balls? When you're done changing your data, or your time quantum is up, you are paged out and saved. The only difference now is that if the system dies for some reason, you come back right where you were when paged off.


So it's not right where you left off, and the program state is unknown?

I'm guessing the system must effectively "checkpoint" your work regularly and sync to disk to avoid partially saving a state and corrupting the data.

This isn't terribly different from working on a memory mapped file except that it also saves the ephemeral state of the running program so it can be restored. But I still don't understand how it's not going to be horribly slow when you start your program and the first thing it does is allocate 4GB of memory for its workspace. Synching all of that data to disk is a massive undertaking, and this isn't an uncommon use case, people start virtual machines all of the time.

And paging is slow as balls. That's what this whole article is about, modern machines are unusable when they start paging.


I've read somewhere that for BeOS demos they used to play a bunch of videos and music, then unplug/replug the machine, and after boot everything was playing again from where it left off. I guess they were using the same design for process persistence.


That was just the media player remembering where it left off and restarting from that place. There may have been some metadata support in Be's filesystem to help that, but it's not technically necessary. It's about as amazing as your web browser reloading your tabs when you restart it.


If "never loose data" isn't a great selling point then I don't really know what is.


It's a trivial problem if you're willing to run your system entirely off of the disk. I mean the performance will be unbearably slow, but you'll never lose your data.


Memory mapped files are incredibly hard to use for consistent, durable storage. I mean, so is POSIX I/O in general, but if you do MAP_SHARED you made your life even more complicated. (MAP_PRIVATE and rewrite-the-whole-thing-for-every-commit works, though, and can have some advantages).


Is iOS a modern system? Because iOS does not have swap.

> Although OS X supports a backing store, iOS does not. In iPhone applications, read-only data that is already on the disk (such as code pages) is simply removed from memory and reloaded from disk as needed. Writable data is never removed from memory by the operating system. Instead, if the amount of free memory drops below a certain threshold, the system asks the running applications to free up memory voluntarily to make room for new data. Applications that fail to free up enough memory are terminated.

https://developer.apple.com/library/content/documentation/Pe...


My desktop at work has 16G of RAM. I didn't bother setting up swap, and I find the old guidance (2x RAM) pretty absurd at this point. I've had the OOM-killer render the system unresponsive a couple of times, but only because I'd written a program that was leaking memory and I was pushing it to misbehave. If you really want virtual memory on purpose, you can still set up a memory-mapped file for your big data structure.


Putting spinning-rust-backed swap on a 16G system is absurd. By the time such a system is into swap, it probably isn't trying to swap three or four megabytes, it's probably trying to swap three or four gigabytes, and that can literally take hours. Simply writing that much data to a hard drive can take a non-trivial amount of time, and swap doesn't generally just cleanly run out to the hard drive with nothing else interfering, it's a lot messier. Given the speeds of everything else involved, a 16GB RAM system trying to swap to a hard drive, even a good one to say nothing of those slow-writing SMR hard drives [1], is basically a system that has completely failed and it might as well just start OOM-killing things.

A system backed by an SSD does degrade more nicely, though. The system visibly slows down but doesn't go to outright unresponsive like it does on a hard drive. You can make a case for letting that happen and having human intervention select the processes to kill, rather than letting the kernel do it. So, even though it still isn't really useful as an extension of RAM, it can still be useful in recovering from systems that you've run yourself out of memory on. Since putting an SSD in my systems I've actually gone back to running with some swap space. Though the fact I like hibernation sometimes is also a reason I run with swap in Linux on my laptop.

[1]: Swap will almost certainly completely blow out the buffers on those things and you'll be stuck with the raw hardware write speeds pretty quickly.


> I've had the OOM-killer render the system unresponsive a couple of times

Use earlyoom instead of relying on oom-killer.

https://github.com/rfjakob/earlyoom

To quote from the description:

> The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think it will do. I have yet to be patient enough to wait for it.

[...]

> This made people wonder if the oom-killer could be configured to step in earlier: superuser.com , unix.stackexchange.com.

> As it turns out, no, it can't. At least using the in-kernel oom killer.

And earlyoom exists to provide a better alternative to the oom-killer in userspace, one that's much more aggressive about maintaining responsiveness.


I don't have swap either. On 8GB it is pretty annoying, because a program I use often will frequently overcommit, and the system hangs.

Is there any way to tell the OOM killer which program to kill first?


> Is there any way to tell the OOM killer which program to kill first?

The fun OOM analogy [1] that comes up when people propose different OOM killer designs:

> An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

[1] https://lwn.net/Articles/104185/


>Is there any way to tell the OOM killer which program to kill first?

From TFA:

>Without swap, the system will call the OOM when the memory is exhausted. You can prioritize which processes get killed first in configuring oom_adj_score.

The linked solution document is only available to registered RH users, though, and the name is actually oom_score_adj and not oom_adj_score.

`man 5 proc` has details, but tl;dr is set /proc/<pid>/oom_score_adj to -1000 to make a process OOM-killer-invincible.
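For illustration, a minimal sketch of both directions (assumes a single target process findable via pidof; mysqld is just a placeholder name):

    # Make a process invisible to the OOM killer
    echo -1000 | sudo tee /proc/$(pidof mysqld)/oom_score_adj

    # Or make it the preferred victim instead
    echo 1000 | sudo tee /proc/$(pidof mysqld)/oom_score_adj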


From the article: Without swap, the system will call the OOM when the memory is exhausted. You can prioritize which processes get killed first in configuring oom_adj_score.


Use earlyoom: https://github.com/rfjakob/earlyoom

By default, it'll start killing processes when free memory drops below 10%, though you can configure the threshold. I had the same problem for years, and then I started using earlyoom and I don't have to deal with it anymore.
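For example, a sketch of a more aggressive threshold (per earlyoom's README, -m sets the available-memory percentage below which it starts killing):

    # Kill the largest process once available memory drops below 5%
    earlyoom -m 5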


Use cgroups or `ulimit -m`.
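A sketch of the cgroup route via systemd (the program name is a placeholder; MemoryLimit maps onto the cgroup memory controller):

    # Run a command in a transient scope capped at 2 GB of RAM
    systemd-run --scope -p MemoryLimit=2G ./my-memory-hungry-program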


My new workstation has 128 GB of RAM. It also has 1 GB of swap (on NVMe) that, AFAICT, has never been touched. I use it as sort of a canary that something abnormal is happening if it starts being used.


I haven't used swap in years, and more recently I've accompanied that by using earlyoom [0] to start killing processes when RAM usage rises above 90%.

Both changes have made my computers much more usable. Systems should be designed to fail fast when memory is low instead of slowing down.

[0] https://github.com/rfjakob/earlyoom


One of the things we did at Blekko was use swap as a 'soft' indicator that something on the system had exceeded its footprint (our machines all had 96GB of RAM, so swapping meant something was using too much of it), and OOM-killer messages in the log were grounds for taking the machine out, rebooting it, and looking for a more serious problem (like sometimes things rebooted and had 32GB less RAM).

That said, the article's recommendation was spot on in terms of making a conscious decision about how you want your system to behave when it's coming close to running out of memory. Large swap spaces were originally how you got things that were too big to fit in memory to run at all, and now they are a way to essentially batch-process very large data sets.


If Linux has no swap, it doesn't quickly and efficiently kill processes when memory is exhausted. Instead, it first evicts executable code from RAM and re-reads it from disk whenever it's needed, because without swap, executable code is the only thing in RAM that is duplicated on disk and can be dropped. This leaves the system completely frozen and unusable.


This is my experience too. I used to run my desktop without swap, but found that the out-of-memory experience was even worse than with swap. There also seems to be enough infrequently-used memory that swap gives a bit more headroom (I will still manage to use up 32GB of RAM).


Last time I tried running a Linux system with zero swap, I ran into huge issues.

It would never actually hit the OOM killer; instead it would just lock up while it still technically had a few hundred MB of memory free.

From what I can tell, it was stuck in a loop evicting something from cache and then immediately pulling it back in from disk. Everything was technically still running, but the UI wasn't responsive enough for me to even kill a program.

Simply adding 200MB of swap changed the behaviour enough that the OOM killer would eventually run.


I never understood the rule of thumb where swap space was proportional to the amount of physical RAM. It seems to me it should be the size of your largest expected allocation (system-wide) minus the amount of physical RAM, or something like that. If you had a nicely configured system and took out half the RAM, it doesn't make sense that you'd want less swap space.


A system that has way more swap than RAM will run out of 'performance is acceptable' way before it runs out of memory.

That was different in the early days, but that was because people accepted worse performance (GC that stops the world for seconds can be better than no GC, even when running a GUI).

Certainly nowadays, if you take out half the RAM, you will want to take out half the processes, too.


But you choose the amount of RAM depending on maximum memory usage. Therefore swap space (being proportional to RAM) becomes dependent on the largest expected allocation as well. It wouldn't be wise to build a system with 2GB of RAM and 4GB of swap when you need 6GB of memory at peaks: such a system would be slooow. And it may not be wise to buy 8GB of RAM when 5GB is the maximum that might be needed.


I think it made a lot more sense in the mid-90s, where a system would have 32 MB of RAM and people read the RAM requirements of the software they'd buy. So the size of your largest expected allocation was proportional to the size of RAM only because if you had more RAM you'd run more RAM-intensive software.

Now, desktops can have 32 GB of RAM, but everyone just uses it to run Chrome.


> Now, desktops can have 32 GB of RAM, but everyone just uses it to run Chrome.

.... which will happily chew away your 32 GB of RAM if you let it run for enough time :)


Chrome, Slack, Spotify and Atom, which each bundle their own Chrome. On my computer right now Chrome is using about 1.5 GB of RAM (on Darwin, much of it is compressed). Slack is using 0.75 GB of RAM, Spotify is using another 0.5.

We write very memory-hungry software to make up for our copious amount of RAM.


It makes supporting hibernate a lot easier - your system can just dump the contents of memory to the swap partition.


The rule made sense when your system had fewer than 128-256 MB of ram. These days it doesn't.


My feeling on swap is this:

1) If you're ok with one machine dropping out of your system, you don't need swap.

2) You should never build a system where losing a single machine is a problem.

3) Therefore, you should never need swap.

4) Perhaps there is an exception for desktop machines, since they don't fit rule 2.


Tend to agree.

A bit of a side ramble: unfortunately, regarding rule 2, sometimes you already have a system where losing a single machine is a problem, and it will take time and resources to improve or replace it to the point where losing a single machine isn't a problem, so "in the meantime" you have to accept and support this.

Also, sometimes "the meantime" is very long. :-(

Also, by the time the system is improved to be more resilient, maybe you'll be working somewhere else or on something else, and, presto, you'll uncover some other horrible legacy system in your dependency chain that isn't resilient either. It seems as if at every organization that has had computers for long enough, there is an infinite supply of legacy systems.

Point being: unless you only work with brand-new things that themselves only work with brand-new things, you can't get out of getting decent at managing services that aren't properly "any single machine can disappear" resilient.


Sure, dealing with legacy systems might mean messing with swap.

However, as pointed out elsewhere, if you're hitting swap your performance will be so bad you might as well have lost the machine.


Doesn't that risk cascading failures?

A cluster of a few machines experiences a bunch of requests that trigger pathological memory usage. One machine OOMs, drops out. Now the rest of the cluster has to take more load, needs more memory, and increases the likelihood that the other machines also run out of memory.


> A cluster of a few machines experiences a bunch of requests that trigger pathological memory usage. One machine OOMs, drops out. Now the rest of the cluster has to take more load, needs more memory, and increases the likelihood that the other machines also run out of memory.

A performance cliff (as you'd inevitably see while swapping) also puts you at risk of cascading failure. It might actually be better to completely drop out if the restart time is reasonably low. This is similar to GC thrashing with Java servers: many people prefer to configure their servers to suicide when GC time is over some threshold rather than try to go on as long as possible. I'm one of those people.

Better ways to avoid cascading failure are overprovisioning (RAM is pretty cheap for servers) and load shedding / graceful degradation at the application layer, coupled with care in client-side retry logic. (Avoiding accidental capacity caches, using exponential backoff on any retry.)


How do you hibernate with no swap? Do you need a special hibernation partition to write to?


The way I've done it is to create a swap file and set vm.swappiness to 0 so nothing actually gets paged out to it in normal operation. Hibernation forces the writes, so it will get used on hibernate.
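Roughly, that setup looks like this (sizes and paths are illustrative; the file must not be sparse, which is why fallocate is used):

    # Create and enable an 8 GB swap file
    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # Discourage paging to it during normal operation
    sudo sysctl vm.swappiness=0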


The main issue I have with not using swap on modern Linux is that it can leave the kernel busy for hours at a time. What happens is that, as the kernel runs low on RAM, it has to spend more time searching for smaller and smaller chunks of RAM to back each request (the smaller chunks are more numerous), and the "kswapd" kernel thread is responsible for this activity. As the system approaches zero free RAM, kswapd will also try to release less important pages, which takes more CPU time. Ultimately you get to the point where allocations take a really long time, and there are lots of allocations.


I recommend using swap together with zswap, and increase swappiness. Zswap is available in mainline kernel. It keeps compressed "swapped-out" pages in memory (so they are accessible quickly on page fault) and only uncompressible pages go to disk. Usually most of memory is compressible and overhead is small, so it is suitable for many workloads. See https://wiki.archlinux.org/index.php/Zswap .
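Enabling it is a couple of sysfs writes (a sketch; the values are illustrative and the chosen compressor must be available in your kernel):

    # Turn zswap on and cap the compressed pool at 25% of RAM
    echo 1   | sudo tee /sys/module/zswap/parameters/enabled
    echo lz4 | sudo tee /sys/module/zswap/parameters/compressor
    echo 25  | sudo tee /sys/module/zswap/parameters/max_pool_percent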


Many applications request memory, do some writes, and then don't use it in typical scenarios, so this memory is effectively wasted. If there's swap, a smart operating system will swap that memory out and use the physical memory for more important tasks, e.g. for disk caching. So using swap allows for more efficient memory usage. E.g. on a small server with 21 days of uptime I have 102/1024MB of memory used and 41MB in swap, which means I got about 5% of my memory almost for free.


My Ubuntu development laptop has been running without swap since I bought it in early 2014. It's got 16 GB of RAM and sometimes hits 11 GB of used memory. No problems whatsoever. If I start hitting the memory limit, I'll buy another 16 GB. I've replaced the HDD with an SSD, but I don't understand why I should use it as swap like in the old days of RAM scarcity.


It's not uncommon for us to buy rack machines with much more RAM than disk. The disk is almost uninteresting, except that we need a place for an OS to boot from, and some other legacy things.

I suspect I would be fine with much of our datacenter being diskless (and put disk -- ahem, I mean storage -- where it is needed). Local disk is a headache more often than not.


I have a somehow different view on swap.

The issue is not swap or swap utilisation; the problem is worst-case latency. Even for a database, an OOM kill is usually better than a latency hit that makes it unusably slow.

As a simple example an app might start allocating and use memory in an infinite loop. How long will that take? How long will your system be unresponsive?

If you have more swap than you can write in 30 seconds, you're most likely doing it wrong (your system can be unresponsive for 60+ seconds).

Another worst case would be allocating all memory, using it, and then performing random reads throughout the memory space. Your swap-to-RAM ratio determines how many misses, and thus how much I/O, you are doing instead of direct memory access. This should stay well below your I/O capacity.

As a result I usually try to use a small swap partition and monitor for swap-ins, not swap usage or swap out.

So that's my thinking around swap mainly due to the fact that I have seen too many servers causing issues due to swap related latency.
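The monitoring part can be as simple as watching vmstat (its si/so columns are swap-ins and swap-outs per second; sustained nonzero si is the red flag described above):

    # Print memory and swap activity once a second
    vmstat 1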


If you want to exec from a process using a large fraction of the physical address space on the machine, you need swap to maintain a reasonable amount of virtual address space. Needing swap and using swap are different things. How swap interacts with the process and memory subsystems is poorly understood.


I tried running systems without swap a few years ago. That wasn't a very good idea. Most applications are very generous in their memory usage (not to mention allocation, which is almost always insane), and normally those pages get swapped out never to be heard from again. So without swap, performance suffers, since fewer pages are available for cache. (And in virtual environments it gets even worse, since the balloon driver isn't really that great.) I didn't have time to see it through and abandoned it.

In light of that this recommendation from Red Hat is very interesting. Just a fifth of memory as swap is probably enough to get real world performance back, without getting completely stuck when something goes haywire. On large memory systems it should probably be even less.


> [Recommended amount of swap] depends on the desired behaviour of the system, but configuring an amount of 20% of the RAM as swap is usually a good idea.

This sounds like good advice compared to the classic "2x RAM" guideline. Back in the HDD era when we already had around 8GB RAM I started wondering how long it would take to actually fill 16GB of swap in terms of raw write speed.

On the other hand SSDs are fast enough that swap might actually make a low-memory system feel faster.

My current Linux laptop has around the same amount of swap as RAM. Am I mistaken in thinking that suspend-to-disk saves RAM contents on the swap partition?


What about to support hibernation? Is that possible via swap file now?


It depends upon what filesystem you're writing it to, but the answer is mostly yes.
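On Linux the usual recipe is roughly this sketch (the device name is a placeholder; the offset trick works on ext4 and some other filesystems):

    # Find the physical offset of the swap file's first extent
    sudo filefrag -v /swapfile | head

    # Then boot with kernel parameters along the lines of:
    #   resume=/dev/sdaX resume_offset=<first physical_offset printed above>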


Answer: yes


I'm not sure if it is still true on Win 10, but earlier versions started to perform terribly if you had no swap on the boot partition, regardless of how much core you had.


This is contrary to my experience in Windows 7. As under Linux, I run Win7 without swap on modern hardware, and I've had no trouble there. Which versions of Windows have you had trouble with?


XP and 7. My current rig had 32GB, though I left the boot pagefile at 1.5GB when 8 was installed, and likewise after the upgrade to 10. My understanding when researching this (long ago) was that Windows likes to purposely put stuff in the pagefile even if there is free RAM.

Even now, with not quite 9GB "in use", 22GB "standby" and 1GB "free", the paging file is at 1.5% use with a peak of about 3%. Granted, that is tiny on a fixed 1.5GB file, but for some reason Windows feels the need to drop about 20-50MB into the pagefile.


That article takes a system with 2GB of RAM as its example. For a modern system that is pretty unrealistic; even laptops have more. My system has 12.

I also miss any mention of zram. Zram can create compressed ramdisks, so it can provide a compressed swap device in RAM, basically making your RAM last longer in case you really run out of memory. In my experience that is a good alternative to keeping a bit of swap space as a reserve, as the article recommends. A rough setup is sketched below.
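The following is a sketch only (sizes and compressor are illustrative; comp_algorithm must be set before disksize):

    sudo modprobe zram
    echo lz4 | sudo tee /sys/block/zram0/comp_algorithm
    echo 4G  | sudo tee /sys/block/zram0/disksize
    sudo mkswap /dev/zram0
    sudo swapon -p 100 /dev/zram0   # high priority, so it's used before any disk swap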


Disk space is cheap. Add a few gigs of swap, or make a swap file! If you only have a few gigs of HDD space left, buy another disk! Having some swap is worth it.

One neat thing swap can be used for is taking sleeping processes that might use a lot of memory and putting that memory on disk, to free up RAM for other, active programs.

How much swap do you need? The square root of your RAM size (e.g. 4GB of swap for a 16GB machine).


I've disabled swap on macOS. It works fine with just 16 GB. That's with Spring STS and at least one 2GB VirtualBox VM running ArangoDB. I use Safari, so it uses less memory than Chrome would.

I shut it off because, in my opinion, OS X is pretty shitty at memory management. It swaps for no good reason: I've had 6 GB free and still 1.6 GB of swap in use. That shouldn't happen.


It swaps out unused memory to free up space for disk cache in order to improve performance. This is actually a good use of memory.


Not knowing a huge amount of OS kernel theory, I can still point to Solaris 10 and its handling of swap as clearly superior. I had a 16GB RAM server giving me problems; I logged in and it was swapping continuously with only 2MB (yes, 2048KB) free, yet my SSH session was not overly laggy.

Under Linux, if there is heavy swapping, forget it; nothing will work well.


I love how the article essentially says that each situation is different and then says 20% of RAM is a good rule of thumb.

In my experience, the most common swap configuration is minimal: perhaps 500MB, with vm.swappiness=1 (sketched below).

I'd say it's more rare to find a system that actually needs swap than one that can do without it.

I have yet to run into an application that for some reason needed swap to be around.
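That configuration is two commands (a sketch; the sysctl.d filename is just a convention):

    # Apply immediately
    sudo sysctl vm.swappiness=1
    # Persist across reboots
    echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-swappiness.conf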


Swap seems like a nice safety valve. Preferable, I think, to suddenly shutting down an important program in use because it's OOM.


It must depend on the use case, but I prefer a program that is about to dip into swap (usually one where I accidentally allocate a way-too-big buffer) to fail automatically, rather than having to try to kill it through the now-unresponsive system UI.


You're right, it depends. Building Firefox from source needs several gigs of RAM, while normal functioning of my system needs no more than 4GB. For a couple of years I used swap just for such big /usr/bin/ld processes. Now I have 8GB of RAM, and linking FF or LibreOffice is not an issue anymore.


I remember when it became impossible to build Firefox on our lab machines with 1GB of RAM and 2GB of swap. Even before that it took literally all day to build.

I got another taste of that lately when I had to build Wireshark from source on a Raspberry Pi model B thanks to broken packages in the repo. At least it was the version with 512MB of memory and overclocked. For the most part Wireshark isn't that bad to build, certainly better than Firefox, but there are a couple of dissectors that have unreasonably large source files.


It never works for me on Windows.

It just slows my system down to a crawl, requiring me to force a reboot.

It probably depends on your hardware.

And if I disable the pagefile, Windows Update stops working, and at 75% memory usage it starts panicking and closing programs.


With an SSD it only slows down my system enough that I know I need to free up some memory.

But it allows me to save, exit a program, or close a tab, without losing any work.


Swap isn't the problem. Swapping is.

Question is: is there a way to identify who the culprit is and take appropriate vengeful action in the instance of swapping? Generally it's "foo large application", though there's also a very strong tendency for "foo large application" to be a critical system element -- either OS or application level.


So, the argument the article makes is:

1. Swap is slow

2. If using swap, your system starts to thrash

3. If thrashing, you can't close programs to free memory

4. If you can't close programs, you have to wait until the task is killed by the OS

5. If you have no swap (or very little), you don't have to wait.

Except with an SSD, swap isn't slow enough to cause that issue. So really this article only seems to apply to servers, not desktops.


It seems like the OS developer's opinion these days is that RAM is cheap, don't swap. In the old days people cared a lot about swap performance because RAM was so tight that you were virtually guaranteed to swap at some point. These days you get 16GB sticks in boxes of Crackerjacks so why would you ever swap?

Of course the trend of making notebooks thinner by ditching the SODIMM slots and soldering insufficient amounts of memory to the mobo may reverse this.


Not entirely true; I've had swap thrashing with SSDs on desktops too.

Though it tends to mean you're boned, or going to be waiting a while as all I/O is dedicated to swapping for minutes at a time.


Do current operating systems ever page out memory to swap when not necessary, in order to make room for more disk cache in RAM?


If OS's stopped doing this at some point, I'd be shocked: this is why having memory and swap is valuable.


You'll be shocked, then. Linux under the default settings won't swap unless there is pressure to do so.

Windows will.

It's a tradeoff: if you swap something out while not under pressure, that could be the very thing you need next, resulting in it just getting swapped back in. Or maybe not, and the extra cache is useful. (But if you're not under pressure, maybe letting go of older cache instead of swapping something out is a better trade: dropping old cache doesn't require writing anything out, since it's by definition already backed by disk.)


Windows' approach is painful if you run it on slow HDDs like I did 15 years ago before switching to linux, full time.


macOS does this aggressively. I'm currently at 8 GB used/16 GB, and still have 5 GB swapped out.


Windows does this very aggressively. It's built in to the I/O subsystem, actually, and (IIRC—it's been a bit since I've messed with this) IOCP calls execute immediately (on the same thread, avoiding any context switches) when requested data is already cached.


In designing Kubernetes, for instance, we document and (mostly) enforce swap off for many of the reasons laid out here. Kubernetes takes over the management of host overcommitment, and being able to react correctly to OOM and near-OOM depends to some degree on having a clear understanding of the actual memory use on the system.


The real issue is not the amount of swap but thrashing.

E.g., several large processes sleeping in memory on a desktop would be fine if only one or two are used at the same time. OTOH, clustered nodes well tuned for a single task may not need swap at all.

In any case, it is a metric for thrashing that should be used to initiate culling.


I have 16 GB of memory with swap off yet still sometimes get lag and freezes due to low memory. An aggressive OOM killer or a performance watchdog should be considered a mandatory feature. On desktop I'd much rather have my programs shut down than get any lag.


Just a note for people playing with vm.swappiness (= 0, for example): what it does differs depending on the distro and kernel version. With one version, = 0 meant "absolutely no swapping"; with another it may mean "try not to swap".


On a modern system with a lot of RAM, I found myself doing the opposite of swap to speed up development: mount a disk in RAM using tmpfs [1] and then change the ccache [2] directory to that RAM disk (see the sketch after the links). With that setup I obviously don't want swap to kick in :) It can make the compilation of C++ programs much faster.

[1] http://ubuntublog.org/tutorials/how-to-create-ramdisk-linux....

[2] https://linux.die.net/man/1/ccache
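A minimal version of that setup (paths and size are illustrative; CCACHE_DIR is ccache's documented way to relocate its cache):

    # Mount an 8 GB RAM-backed filesystem and point ccache at it
    sudo mkdir -p /mnt/ramdisk
    sudo mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk
    export CCACHE_DIR=/mnt/ramdisk/ccache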


I have had /tmp as tmpfs for a couple of years, since Arch switched to this layout. I ended up having the opposite problem, probably because I never had enough RAM.

I have only ever enabled swap when compiling a few projects where "-j8" would take up all the memory. Using fewer threads would usually end up being slower than more threads + swap.


Swap media is closing in on RAM speed at a fast clip. RAM latency has been stalled for 30 years, while storage latencies are improving rapidly thanks to flash. So from a hardware POV, it should be a good time for swapping.


Personally, I have a 100GB swap partition on my system... why? Because I'm a filthy tab hoarder. I just put them in the background; if I need them, sure, it takes a couple of seconds, but I don't have to close 'em.

/shrug


Me too (typing this in my 277th tab).

Using Firefox on Bunsen Linux (Debian Jessie derived).

It's only using half of 4GB of RAM, but there's 20MB in swap.


"The Great Suspender" (browser extension) is your friend.


All the laptops at my workplace have the minimum storage (Apple...). It becomes frustrating when I open Photoshop and almost my entire free space suddenly vanishes, while my 16GB of RAM isn't even 20% utilised.


I've had 32 gigs of memory in my desktop rig for more than a decade. Needless to say, no swap. Some decent points about mitigation like memory priority in the article. Good read.


I have 32G in one machine and 16G in another. I recently moved over to the 16G machine to do my dev work, and I run a few VMs on it.

I've found myself wanting to upgrade it to 32G of RAM, but honestly that's about the only use case (besides production servers) where I would ever consider swap, and at that point I consider it a problem of not enough memory rather than swap being necessary.


Can you change swap settings on macOS? I haven't dug in deeply, but I couldn't find an option anywhere, and I feel that the default is faaaaar too aggressive.


I haven't used swap in years on an 8GiB machine. I might hit the OOM killer once a year. It does its job and the system keeps running.


I once swore that if I ever had at least 200MB of RAM I would turn swap off. Decades and gigabytes of RAM later, that still hasn't happened.


These days the primary use of swap is to get the efficiency of overcommitting RAM without compromising reliability.


Is it possible to hibernate without swap? That's one of the use cases when it's actually useful.


Are these rules also applicable to FreeBSD or other BSD-like systems?


FreeBSD doesn't overcommit the available memory (RAM + swap) by default.

Don't use swap files on FreeBSD, because the filesystem write paths of UFS and ZFS can allocate memory. Both geom_mirror (software RAID-1) and geom_eli (disk encryption) are fine, and I would recommend using GELI to create one-time keys for mirrored swap partitions at boot.

Another good habit to get into is to limit the resources available to your services to some generous upper bound you expect them to require. The most flexible way to enforce such restrictions in FreeBSD is hierarchical resource limits. Use them and monitor resource consumption; that way you get an early warning before a rogue process drives the system into massive swapping.
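A sketch of both ideas (the rule and the fstab line are illustrative; rctl requires kern.racct.enable=1 in /boot/loader.conf):

    # Cap everything run by the www user at 2 GB of resident memory
    rctl -a user:www:memoryuse:deny=2g

    # /etc/fstab: the .eli suffix gives GELI-encrypted swap with a one-time key
    /dev/mirror/swap.eli  none  swap  sw  0  0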


I've found you definitely need swap if you don't have 8GB of memory. I personally have nowhere near the amount of patience required to wait for the OOM killer.


What about OS X? Is it good to disable swap?


Over the past 5 years swapping has always been due to a memory leak.


(For me, I mean.) I should really learn how to disable it.


Just put your swapfile on a ramdisk.



