The easy answer is C extensions. PyPy's support for C extensions is limited at best, so it can't run a wide range of native, accelerated libraries. Facebook infra is heavily dependent on wrapping C/C++ libraries (like Thrift) for use in many languages, and not having C extensions (or Cython) would cut off a large portion of our shared codebase.
The more important question is whether PyPy's GC uses intrusive data structures to maintain the metadata for each object. If it does the same thing as CPython, just faster, then the process would run out of memory faster, as the GC writing to CoW pages converts them into owned memory.
As the article says, reference counting was not the problem. They got their improvements by disabling the garbage collector sweeps, but reference counting was still being used.
My point was that CPython does reference counting, resulting in the copy-on-read behavior described in the article. That isn't something that can easily be changed without essentially rewriting CPython (or switching to another Python implementation). Their solution to the problem was the cleanest one available under the circumstances.
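To make that concrete, here's a minimal toy sketch (my own struct, not CPython's actual source) of why a mere read dirties pages under refcounting: the count lives in the object's header, so even borrowing an object to look at it writes to the page the object sits on, and after fork() the kernel has to copy that page for the child.

    /* Toy layout mimicking CPython's object header: the refcount is
     * co-located with the data, so any use of the object writes to its page. */
    #include <stdio.h>

    typedef struct {
        long refcnt;          /* analogous to ob_refcnt in the header */
        const char *payload;  /* the code/data the worker only wants to read */
    } obj_t;

    /* Even a pure read of the payload goes through an incref/decref pair,
     * i.e. two writes to the page holding the object. */
    static const char *borrow(obj_t *o) {
        o->refcnt++;          /* this write breaks CoW sharing after fork() */
        return o->payload;
    }

    static void release(obj_t *o) {
        o->refcnt--;          /* another write to the same page */
    }

    int main(void) {
        obj_t module_code = { 1, "bytecode shared with the parent before fork()" };
        const char *p = borrow(&module_code);
        printf("%s (refcnt now %ld)\n", p, module_code.refcnt);
        release(&module_code);
        return 0;
    }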
Most tracing GC solutions would run into the same problem. A common implementation strategy for the classic three-color collector puts the grey bit in the object header (and often the white/black bit as well). A two-finger (Cheney-style) collector avoids an explicit grey bit, but it moves objects around, which also triggers CoW on shared pages.
If you want a CoW-friendly GC, you need to move your color bits / reference counts out of the object headers and pack them together in the headers of your GC arenas. That way, all of the high-mutation bits sit together in a small number of pages. The pages holding the color bits/counters still end up unshared across processes, but at least they're packed together, minimizing the number of pages affected.
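Roughly what that looks like, as a toy sketch (made-up names, not any real collector): the mark bitmap lives in the arena header and the object payloads are never written during marking, so a full GC cycle only dirties the few pages holding the bitmaps.

    /* Illustrative CoW-friendly bookkeeping: color/mark bits in the arena
     * header instead of in each object's header. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARENA_OBJECTS 4096
    #define OBJECT_SIZE   16

    typedef struct {
        uint8_t mark_bits[ARENA_OBJECTS / 8];               /* all GC mutation lands here */
        unsigned char objects[ARENA_OBJECTS * OBJECT_SIZE]; /* payloads untouched by marking */
    } arena_t;

    static void mark(arena_t *a, size_t index) {
        a->mark_bits[index / 8] |= (uint8_t)(1u << (index % 8));
    }

    static int is_marked(const arena_t *a, size_t index) {
        return (a->mark_bits[index / 8] >> (index % 8)) & 1u;
    }

    static void clear_marks(arena_t *a) {
        /* one memset over a few header pages instead of a write per object */
        memset(a->mark_bits, 0, sizeof a->mark_bits);
    }

    int main(void) {
        arena_t *arena = calloc(1, sizeof *arena);
        if (!arena) return 1;
        mark(arena, 42);
        printf("object 42 marked: %d\n", is_marked(arena, 42));
        clear_marks(arena);
        free(arena);
        return 0;
    }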
The article goes on to point out that their theory about copy-on-read being the problem didn't pan out: when they disabled refcounting on code objects, shared memory usage didn't change. It then identifies the real culprit, which was the GC.
So, no, this article is not an excellent argument against reference counting.
Copy on read doesn't even make sense, since a copy is a read.
Most problems people ascribe to reference counting have far more to do with excessive memory allocation. Instead of fixing that problem and getting massive speedups of 7x or more, people fiddle with memory management schemes to gain 5% here or there, at the expense of large, unpredictable pauses.
Is Python a good arena for evaluating any of these language features? Just as I wouldn't use C++ to show off garbage collection (conservative mark-and-sweep), Python seems like a bad place to show off reference counting, because the compiler can't spend a practically unlimited amount of time on data-flow analysis to elide refcount operations.
I'd say it was probably the optimal solution, allowing them to scale more or less gracefully. It might not be the most elegant, but it definitely works, and at that scale it's an engineering feat.
I get why people hate C, but once you get used to a couple of good libraries, things get much easier. We're not in the '90s anymore; there's plenty to choose from, and some of them are excellent-quality code. And especially in cases like this, when it turns out you can get real benefits from delaying the freeing of memory, C's manual memory management is an asset.
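As an example of what I mean by delayed freeing, here's a toy pool/bump allocator (my sketch, not from any particular library): allocate out of a pool during a unit of work and release the whole thing in one call at the end, instead of paying for per-object frees (or GC) along the way.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *buf;
        size_t used, cap;
    } pool_t;

    static pool_t pool_new(size_t cap) {
        pool_t p = { malloc(cap), 0, cap };
        return p;
    }

    /* bump allocation: no per-object bookkeeping, no free-list churn */
    static void *pool_alloc(pool_t *p, size_t n) {
        if (!p->buf || p->used + n > p->cap) return NULL;
        void *out = p->buf + p->used;
        p->used += n;
        return out;
    }

    /* the "delayed freeing": everything allocated this cycle goes away at once */
    static void pool_release(pool_t *p) {
        free(p->buf);
        p->buf = NULL;
        p->used = p->cap = 0;
    }

    int main(void) {
        pool_t p = pool_new(1 << 20);
        char *msg = pool_alloc(&p, 32);
        if (msg) {
            strcpy(msg, "scratch data for one request");
            printf("%s\n", msg);
        }
        pool_release(&p);   /* one cheap teardown instead of many frees */
        return 0;
    }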