Another aspect that plays into this is the interaction with GC'ed languages, particularly longer-than-expected pauses that appear to be caused by page faults and thrashing during a critical GC mark phase. IIRC this can even show up in non-blocking sections of the GC, because the GC runs out of available segments while the mark is still running and thrashing through paged-out memory. I believe all of the systems I've personally observed this behaviour on predate kernel 4.0 and didn't have SSDs, so I don't know whether it's less pathological with GC'ed runtimes post kernel 4.0, or whether SSDs would be fast enough to avoid the blocking. It probably also depends on the allocation rate of the application.
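For anyone wanting to check whether a long pause actually lines up with swap activity, here is a rough Python sketch that diffs the swap and major-fault counters from /proc/vmstat around a suspect window. The counter names (pswpin, pswpout, pgmajfault) are real /proc/vmstat fields; the helper names are just mine for illustration, and this only shows system-wide activity, not which process caused it.

```python
# Counters from /proc/vmstat that indicate swap traffic and major page faults.
SWAP_KEYS = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def swap_delta(before, after):
    """Return how much each swap-related counter grew between two snapshots."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in SWAP_KEYS}
```

Take one read_vmstat() snapshot before the window and one after, then pass both to swap_delta(). If pswpin and pgmajfault jump during the same interval the GC log reports a long mark phase, that at least points towards the thrashing described above.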

So I have recommended disabling swap on systems in the past, but in line with the article, these were systems that are largely dependent on application memory and don't benefit from the IO cache being prioritized over application pages. As the article points out, this isn't a hard and fast rule but a tuning decision that depends on the application, and using cgroups to fine-tune it may be a better approach depending on the use case.
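As a minimal sketch of the cgroup route, assuming cgroup v2 with the memory controller enabled: each cgroup has a memory.swap.max file, and writing 0 to it stops that group from swapping while leaving swap available to everything else. The helper name and the cgroup path in the usage note are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path


def set_swap_max(cgroup_dir, limit="0"):
    """Write a swap limit to a cgroup v2 directory's memory.swap.max file.

    limit is a byte count as a string or int ("0" disallows swap for the
    group) or the literal "max" to remove the limit.
    """
    limit = str(limit)
    if limit != "max" and not limit.isdigit():
        raise ValueError(f"invalid swap limit: {limit!r}")
    target = Path(cgroup_dir) / "memory.swap.max"
    target.write_text(limit + "\n")
    return target
```

Something like set_swap_max("/sys/fs/cgroup/myapp.slice") (path made up) would then keep just the GC-heavy service out of swap, rather than turning swap off for the whole box.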