The easy answer is C extensions. PyPy's support for C extensions is limited at best, so it can't run a wide range of native, accelerated libraries. Facebook infra is heavily dependent on wrapping C/C++ libraries (like Thrift) for use in many languages, and not having C extensions (or Cython) would cut off a large portion of our shared codebase.
The more important question is whether PyPy's GC uses intrusive data structures to maintain the metadata for each object. If it does the same thing as CPython, just faster, then the process would run out of memory faster, as the GC writing to CoW pages converts them into owned memory.
As the article says, reference counting was not the problem. They got their improvements by disabling the garbage collector sweeps, but reference counting was still being used.
My point was that CPython does reference counting, resulting in the copy-on-read behavior described in the article. That isn't something that can easily be changed without essentially rewriting CPython (or switching to another Python implementation). Their solution to the problem was the cleanest one available under the circumstances.
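To make that concrete, here's a minimal toy sketch (my own struct, not CPython's actual source) of why a mere read dirties pages under refcounting: the count lives in the object's header, so even borrowing an object to look at it writes to the page the object sits on, and after fork() the kernel has to copy that page for the child.

    /* Toy layout mimicking CPython's object header: the refcount is
     * co-located with the data, so any use of the object writes to its page. */
    #include <stdio.h>

    typedef struct {
        long refcnt;          /* analogous to ob_refcnt in the header */
        const char *payload;  /* the code/data the worker only wants to read */
    } obj_t;

    /* Even a pure read of the payload goes through an incref/decref pair,
     * i.e. two writes to the page holding the object. */
    static const char *borrow(obj_t *o) {
        o->refcnt++;          /* this write breaks CoW sharing after fork() */
        return o->payload;
    }

    static void release(obj_t *o) {
        o->refcnt--;          /* another write to the same page */
    }

    int main(void) {
        obj_t module_code = { 1, "bytecode shared with the parent before fork()" };
        const char *p = borrow(&module_code);
        printf("%s (refcnt now %ld)\n", p, module_code.refcnt);
        release(&module_code);
        return 0;
    }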
Most tracing GC solutions would run into the same problem. A common implementation strategy for the classic three-color collector puts the grey bit in the object header (and often the white/black bit as well). A two-finger (Cheney-style) collector avoids an explicit grey bit, but it moves objects around, which also triggers CoW on shared pages.
If you want a CoW-friendly GC, you need to move your color bits / reference counts out of the object headers and pack them together in the headers of your GC arenas. That way, all of the high-mutation bits sit together in a small number of pages. The pages holding the color bits/counters still end up unshared across processes, but at least they're packed together, minimizing the number of pages affected.
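Roughly what that looks like, as a toy sketch (made-up names, not any real collector): the mark bitmap lives in the arena header and the object payloads are never written during marking, so a full GC cycle only dirties the few pages holding the bitmaps.

    /* Illustrative CoW-friendly bookkeeping: color/mark bits in the arena
     * header instead of in each object's header. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARENA_OBJECTS 4096
    #define OBJECT_SIZE   16

    typedef struct {
        uint8_t mark_bits[ARENA_OBJECTS / 8];               /* all GC mutation lands here */
        unsigned char objects[ARENA_OBJECTS * OBJECT_SIZE]; /* payloads untouched by marking */
    } arena_t;

    static void mark(arena_t *a, size_t index) {
        a->mark_bits[index / 8] |= (uint8_t)(1u << (index % 8));
    }

    static int is_marked(const arena_t *a, size_t index) {
        return (a->mark_bits[index / 8] >> (index % 8)) & 1u;
    }

    static void clear_marks(arena_t *a) {
        /* one memset over a few header pages instead of a write per object */
        memset(a->mark_bits, 0, sizeof a->mark_bits);
    }

    int main(void) {
        arena_t *arena = calloc(1, sizeof *arena);
        if (!arena) return 1;
        mark(arena, 42);
        printf("object 42 marked: %d\n", is_marked(arena, 42));
        clear_marks(arena);
        free(arena);
        return 0;
    }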
The article goes on to point out that their theory about copy-on-read being the problem didn't pan out: when they disabled refcounting on code objects, shared memory usage didn't change. It then identifies the real culprit, which was the GC.
So, no, this article is not an excellent argument against reference counting.
Copy on read doesn't even make sense, since a copy is a read.
Most problems people ascribe to reference counting have far more to do with excessive memory allocation. Instead of fixing that problem and getting massive speedups of 7x or more, people fiddle with memory management schemes to gain 5% here or there, at the expense of large, unpredictable pauses.
Is Python a good arena for evaluating any of these language features? Just as I wouldn't use C++ to show off garbage collection (conservative mark-and-sweep), Python seems like a bad place to show off reference counting, because the compiler can't spend a practically unlimited amount of time on data-flow analysis to elide refcount operations.
I'd say it was probably the optimal solution, allowing them to scale more or less gracefully. It might not be the most elegant, but it definitely works, and at that scale it's an engineering feat.
I get why people hate C, but once you get used to a couple of good libraries, things get much easier. We're not in the '90s anymore; there's plenty to choose from, and some of them are excellent-quality code. And especially in cases like this, when it turns out you can get real benefits from delaying the freeing of memory, C's manual memory management is an asset.
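As an example of what I mean by delayed freeing, here's a toy pool/bump allocator (my sketch, not from any particular library): allocate out of a pool during a unit of work and release the whole thing in one call at the end, instead of paying for per-object frees (or GC) along the way.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *buf;
        size_t used, cap;
    } pool_t;

    static pool_t pool_new(size_t cap) {
        pool_t p = { malloc(cap), 0, cap };
        return p;
    }

    /* bump allocation: no per-object bookkeeping, no free-list churn */
    static void *pool_alloc(pool_t *p, size_t n) {
        if (!p->buf || p->used + n > p->cap) return NULL;
        void *out = p->buf + p->used;
        p->used += n;
        return out;
    }

    /* the "delayed freeing": everything allocated this cycle goes away at once */
    static void pool_release(pool_t *p) {
        free(p->buf);
        p->buf = NULL;
        p->used = p->cap = 0;
    }

    int main(void) {
        pool_t p = pool_new(1 << 20);
        char *msg = pool_alloc(&p, 32);
        if (msg) {
            strcpy(msg, "scratch data for one request");
            printf("%s\n", msg);
        }
        pool_release(&p);   /* one cheap teardown instead of many frees */
        return 0;
    }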