FYI, this was part of a technical liveblog that covered all the talks at GopherCon this year in a similar fashion. If you missed the conference or are looking to review material, you might find https://sourcegraph.com/gophercon helpful. (Disclosure: I'm CTO of Sourcegraph—we organized the liveblog and love Go!)
The suggestion to reuse objects, rather than reallocate temporaries (e.g. inside a loop body), was intriguing. Coming from C/C++, where stack allocations are approximately 'free', I tend to scope stack variables as narrowly as possible for readability and to help the compiler break dependencies. This is an interesting paradigm change that I wouldn't have expected from Go.
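To make the pattern concrete, here is a minimal sketch of what "reuse instead of reallocate" looks like in Go. The `process` function and record format are hypothetical; the point is that the buffer is allocated once and truncated with `buf[:0]` each iteration, rather than allocated fresh in the loop body.

```go
package main

import "fmt"

// process appends a formatted record to buf and returns the extended
// slice. Taking and returning the caller's buffer lets it be reused.
func process(buf []byte, id int) []byte {
	return append(buf, fmt.Sprintf("record-%d;", id)...)
}

func main() {
	// Allocate once with some capacity, then reuse across iterations.
	buf := make([]byte, 0, 256)
	for i := 0; i < 3; i++ {
		buf = buf[:0] // keep the capacity, drop the contents
		buf = process(buf, i)
		fmt.Println(string(buf))
	}
}
```

Truncating with `buf[:0]` keeps the backing array alive, so as long as the capacity suffices there is no per-iteration heap allocation.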
Most generational GCs will work better if you don't reuse objects: reuse extends object lifetimes and increases the probability of promotion to an old generation, which is slower to scan. (Go's GC is not generational today, AFAIK.) I'd treat any recommendations along those lines as last-resort techniques to reduce pressure in critical-path loops. And a more sophisticated runtime may allocate such temporaries in registers or on the stack.
In other words, you're tuning to today's runtime if you reuse heavily. Those techniques may not age well, and will harm readability and maintainability.
It isn't, and probably never will be. The Go runtime developers have actually implemented and tested a generational GC, but found that it wasn't useful, and was even harmful at times, to the goal of having a fast, low-latency, concurrent garbage collector. Mostly because most short-lived objects are allocated on the stack, and escape analysis is getting better with each release.
The generational GC that the Go team implemented does not offer one of the primary benefits of generational GC: bump allocation in the nursery. Without that benefit, it's an unfair comparison.
This benefit is directly relevant to this thread, because bump allocation in the nursery makes the allocation fast path somewhere on the order of 6 instructions.
As far as I understand this Go will always try to keep objects on the stack because it is practically free (just as in C/C++). However, if the object escapes to the heap then allocation gets expensive (just as in any other language) and it can be beneficial to reuse objects instead of allocating new ones for every iteration.
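A small sketch of that distinction, using hypothetical functions: in one case the value can stay on the stack, in the other returning a pointer typically forces it to the heap. Building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions (e.g. "moved to heap: p"), though the exact decisions vary by Go version and inlining.

```go
package main

import "fmt"

type point struct{ x, y int }

// stackOnly uses p locally; it does not escape, so the compiler
// can keep it on the stack (practically free to allocate).
func stackOnly() int {
	p := point{1, 2}
	return p.x + p.y
}

// escapes returns a pointer to a local, which typically forces p
// to be heap-allocated.
func escapes() *point {
	p := point{1, 2}
	return &p // escape analysis: p escapes to the heap
}

func main() {
	fmt.Println(stackOnly(), escapes().x)
}
```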
Allocation is not expensive in every other language. In Java HotSpot, JavaScript V8/SpiderMonkey, etc. it is somewhere on the order of 6 instructions. That is because those language implementations use generational garbage collectors with bump allocation in the nursery.
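To illustrate why the fast path can be that short, here is a toy bump allocator in Go. This is purely an illustration of the technique, not how any real runtime is implemented: the fast path is one bounds check plus one pointer bump.

```go
package main

import "fmt"

// nursery is a toy bump-allocation arena: a flat buffer and a cursor.
type nursery struct {
	buf  []byte
	next int
}

// alloc reserves size bytes and returns their offset, or -1 when the
// nursery is full (a real runtime would trigger a minor GC here).
func (n *nursery) alloc(size int) int {
	if n.next+size > len(n.buf) {
		return -1 // nursery exhausted
	}
	off := n.next
	n.next += size // the "bump": the entire allocation fast path
	return off
}

func main() {
	n := &nursery{buf: make([]byte, 1024)}
	fmt.Println(n.alloc(16), n.alloc(32))
}
```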
Unnecessary allocation is expensive in all languages. Even if the allocation is instantaneous, it creates more work for the GC and thus has a performance impact on the program.
The Go dev team tried multiple alternative approaches as described here: https://blog.golang.org/ismmkeynote and the generational GC didn't compare so well against the current GC.
They did not compare a generational GC with bump allocation in the nursery to their current GC. Without that, it's not a fair comparison.
Furthermore, there isn't much of a relevant difference between stack allocated and nursery allocated objects, because they have to be scanned either way--either as roots or via a Cheney scan. The difference is only in sweeping, which is incredibly fast for nursery objects.
What's really important about generational GC is that heap allocation becomes nearly as cheap as stack allocation. That's a game changer.
There must have been a very good reason they tested the generational GC as they did.
Your comparison of generational-GC heap allocation with stack allocation is not correct. Yes, allocating in the nursery is very fast, a couple of cycles only. The stack frame gets allocated once per function call. And yes, when a GC runs, you have to scan the whole heap, including the stack.
But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform. After a collection of the nursery, all surviving objects are promoted to the older generations. This promotion not only takes work, but grows the older generations, which are more expensive to GC. So, while heap allocation with a generational GC is very cheap, it is not free. A large allocation count causes more frequent GC runs, and objects might be promoted to older generations prematurely. As a consequence, a program that allocates less will perform better. Avoiding a high number of heap allocations is a good way to increase your program's performance, whether via stack allocation or by reusing buffers, for example.
> There must have been a very good reason they tested the generational GC as they did.
The reason was that they didn't have time to implement copying GC, per the talk. That's fair as far as engineering schedules are concerned. It says nothing about how good generational GC is in general.
> But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform.
It causes a minor GC only. Those are very cheap.
Yes, there are potential add-on costs. But it's been repeatedly shown that with a fast generational GC, the benefit of escape analysis for garbage collection is marginal. That's why Java HotSpot took so long to implement it. The main benefit of escape analysis in HotSpot, in fact, is that it allows SROA-like optimizations like lock elision, not that it makes garbage collection faster. Generational GCs really are that good.
In my experience, generations are a nightmare to operate for high-performance servers at scale, because you have to balance the sizes of those heaps manually, and the right balance can change abruptly with code changes or workload fluctuations.
Go allocations are indeed costlier but the performance critical sections of applications can be profiled and optimized accordingly to remove allocations.
I'd rather have Go's amazing low GC latency and slightly higher allocation costs vs the operational nightmare from HotSpot.
Automatic management of generations has never fully worked in Java. Every new JDK version just adds more knobs. Sounds like you have a different experience?
As a heuristic it would be OK. However, there are some scenarios the escape-analysis algorithms don't detect, so a linter may lead you to believe things are not escaping when in reality they are. It's also implementation-specific: something that escaped in Go 1.10 may no longer escape in 1.11, for example.
It's easy enough to profile and benchmark in Go, so I would always treat that as the source of truth.
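For instance, the standard `testing` package can report allocations per operation directly. A sketch (the `concat` function is a made-up example; normally this would live in a `_test.go` file and be run with `go test -bench=. -benchmem`, but `testing.Benchmark` also works standalone):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concat joins parts with a strings.Builder; the benchmark below
// measures how many heap allocations each call actually costs.
func concat(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		parts := []string{"a", "b", "c"}
		for i := 0; i < b.N; i++ {
			_ = concat(parts)
		}
	})
	fmt.Println(res.AllocsPerOp(), "allocs/op")
}
```

Measured allocation counts like this are the ground truth that escape-analysis diagnostics only approximate.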
It would be better to spend this effort to add generational GC with a bump allocator to Go. That way memory allocation becomes faster for everyone, not just the tiny subset of people who use fancy tools.
I think you've made yourself clear. You've jumped into half a dozen threads in this post with some variation of the same comment as you do with every Go GC post. Your comment is interesting, but repeating it over and over is tiresome. Have you reached out to the Go GC team? What is their response?
There are heuristics (and they improve from version to version), but they're necessarily conservative: heap-allocating something which doesn't escape is a performance hit; stack-allocating something which does escape breaks the program.
In C++ if the class has non-trivial constructors, neither stack nor heap allocations are free. So in C++ reusing objects is still often better for performance.
EDIT: Not sure why I'm being downvoted; if this is a contentious topic, I wasn't aware. I'm certainly not trying to make a point one way or another; I only posted this hoping someone who was more knowledgeable than me could respond.
2. Go does not have "a powerful GC"; it has a GC built quite specifically for it and the applications it's intended to support: low latency, middling throughput, and the assumption that heap allocations are long-lived (because short-lived objects don't escape and thus don't get heap-allocated)
3. Which is the issue of providing generic GC strategies: the type of GC you want and how you want to tune it is very much a factor of the language's semantics and its intended use cases, hence the official JVM shipping with something like half a dozen different GCs, and third-party JVMs having their own with their own tradeoffs
1. I'm aware of this, but it seems LLVM's built-in GC is not a concurrent collector, which is quite a big deal.
2. The assumption that allocations are long-lived can often be made because a compiler can perform escape-analysis (and in fact LLVM could provide utilities for this).
3. Yes, I'm not saying that GoLang's GC (or an adaptation thereof) is the holy grail, just that it would be very useful for many other languages. And work in this area could help bring LLVM's suite of GCs to a competitive, world-class level.
> the type of GC you want and how you want to tune it is very much a factor of the language's semantics and its intended use cases
And when some parts of your program (modules, libraries) are best suited to different GC behaviour, things tend to become complicated. And crossing language boundaries is yet another topic.