FYI, this was part of a technical liveblog that covered all the talks at GopherCon this year in a similar fashion. If you missed the conference or are looking to review material, you might find https://sourcegraph.com/gophercon helpful. (Disclosure: I'm CTO of Sourcegraph—we organized the liveblog and love Go!)
The suggestion to reuse objects, rather than reallocate temporaries (e.g. inside a loop body), was intriguing. Coming from C/C++, where stack allocations are approximately 'free', I tend to scope stack variables as narrowly as possible for readability and to help the compiler break dependencies. This is an interesting paradigm change that I wouldn't have expected from Go.
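To make the pattern concrete, here is a minimal sketch of what "reuse instead of reallocate" looks like in Go. The `process` function and record format are hypothetical; the point is that the buffer is allocated once and truncated with `buf[:0]` each iteration, rather than allocated fresh in the loop body.

```go
package main

import "fmt"

// process appends a formatted record to buf and returns the extended
// slice. Taking and returning the caller's buffer lets it be reused.
func process(buf []byte, id int) []byte {
	return append(buf, fmt.Sprintf("record-%d;", id)...)
}

func main() {
	// Allocate once with some capacity, then reuse across iterations.
	buf := make([]byte, 0, 256)
	for i := 0; i < 3; i++ {
		buf = buf[:0] // keep the capacity, drop the contents
		buf = process(buf, i)
		fmt.Println(string(buf))
	}
}
```

Truncating with `buf[:0]` keeps the backing array alive, so as long as the capacity suffices there is no per-iteration heap allocation.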
Most generational GCs will work better if you don't reuse objects: reuse extends object lifetimes and increases the probability of promotion to an old generation, which is slower to scan. (Go's GC is not generational today, AFAIK.) I'd treat any recommendations along those lines as last-resort techniques to reduce pressure in critical-path loops. And a more sophisticated runtime may allocate such temporaries in registers or on the stack.
In other words, you're tuning to today's runtime if you reuse heavily. Those techniques may not age well, and will harm readability and maintainability.
It isn't, and probably never will be. The Go runtime developers have actually implemented and tested a generational GC, but found that it wasn't useful, and was even harmful at times, to the goal of having a fast, low-latency, concurrent garbage collector. Mostly because most short-lived objects are allocated on the stack, and escape analysis is getting better with each release.
The generational GC that the Go team implemented does not offer one of the primary benefits of generational GC: bump allocation in the nursery. Without that benefit, it's an unfair comparison.
This benefit is directly relevant to this thread, because bump allocation in the nursery makes the allocation fast path somewhere on the order of 6 instructions.
As far as I understand this Go will always try to keep objects on the stack because it is practically free (just as in C/C++). However, if the object escapes to the heap then allocation gets expensive (just as in any other language) and it can be beneficial to reuse objects instead of allocating new ones for every iteration.
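A small sketch of that distinction, using hypothetical functions: in one case the value can stay on the stack, in the other returning a pointer typically forces it to the heap. Building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions (e.g. "moved to heap: p"), though the exact decisions vary by Go version and inlining.

```go
package main

import "fmt"

type point struct{ x, y int }

// stackOnly uses p locally; it does not escape, so the compiler
// can keep it on the stack (practically free to allocate).
func stackOnly() int {
	p := point{1, 2}
	return p.x + p.y
}

// escapes returns a pointer to a local, which typically forces p
// to be heap-allocated.
func escapes() *point {
	p := point{1, 2}
	return &p // escape analysis: p escapes to the heap
}

func main() {
	fmt.Println(stackOnly(), escapes().x)
}
```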
Allocation is not expensive in every other language. In Java HotSpot, JavaScript V8/SpiderMonkey, etc. it is somewhere on the order of 6 instructions. That is because those language implementations use generational garbage collectors with bump allocation in the nursery.
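To illustrate why the fast path can be that short, here is a toy bump allocator in Go. This is purely an illustration of the technique, not how any real runtime is implemented: the fast path is one bounds check plus one pointer bump.

```go
package main

import "fmt"

// nursery is a toy bump-allocation arena: a flat buffer and a cursor.
type nursery struct {
	buf  []byte
	next int
}

// alloc reserves size bytes and returns their offset, or -1 when the
// nursery is full (a real runtime would trigger a minor GC here).
func (n *nursery) alloc(size int) int {
	if n.next+size > len(n.buf) {
		return -1 // nursery exhausted
	}
	off := n.next
	n.next += size // the "bump": the entire allocation fast path
	return off
}

func main() {
	n := &nursery{buf: make([]byte, 1024)}
	fmt.Println(n.alloc(16), n.alloc(32))
}
```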
Unnecessary allocation is expensive in all languages. Even if the allocation is instantaneous, it creates more work for the GC and thus has a performance impact on the program.
The Go dev team tried multiple alternative approaches as described here: https://blog.golang.org/ismmkeynote and the generational GC didn't compare so well against the current GC.
They did not compare a generational GC with bump allocation in the nursery to their current GC. Without that, it's not a fair comparison.
Furthermore, there isn't much of a relevant difference between stack allocated and nursery allocated objects, because they have to be scanned either way--either as roots or via a Cheney scan. The difference is only in sweeping, which is incredibly fast for nursery objects.
What's really important about generational GC is that heap allocation becomes nearly as cheap as stack allocation. That's a game changer.
There must have been a very good reason they tested the generational GC as they did.
Your comparison of generational-GC heap allocation with stack allocation is not correct. Yes, allocating in the nursery is very fast, a couple of cycles only. The stack frame gets allocated once per function call. And yes, when a GC runs, you have to scan the whole heap, including the stack.
But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform. After a collection of the nursery, all surviving objects are promoted to the older generations. This promotion not only takes work, but grows the older generations, which are more expensive to GC. So, while heap allocation with a generational GC is very cheap, it is not free. A large allocation count causes more frequent GC runs, and objects might be promoted to older generations prematurely. As a consequence, a program that allocates less will perform better. Avoiding a high number of heap allocations is a good way to increase your program's performance, whether via stack allocation or by reusing buffers, for example.
> There must have been a very good reason they tested the generational GC as they did.
The reason was that they didn't have time to implement copying GC, per the talk. That's fair as far as engineering schedules are concerned. It says nothing about how good generational GC is in general.
> But what you are completely missing is that allocating on the heap eventually triggers a GC, which takes CPU resources to perform.
It causes a minor GC only. Those are very cheap.
Yes, there are potential add-on costs. But it's been repeatedly shown that with a fast generational GC, the benefit of escape analysis for garbage collection is marginal. That's why Java HotSpot took so long to implement it. The main benefit of escape analysis in HotSpot, in fact, is that it allows SROA-like optimizations like lock elision, not that it makes garbage collection faster. Generational GCs really are that good.
In my experience, generations are a nightmare to operate for high-performance servers at scale, because you have to balance the sizes of those heaps manually, and the right balance can change abruptly with code changes or workload fluctuations.
Go allocations are indeed costlier but the performance critical sections of applications can be profiled and optimized accordingly to remove allocations.
I'd rather have Go's amazing low GC latency and slightly higher allocation costs vs the operational nightmare from HotSpot.
Automatic management of generations has never fully worked in Java. Every new JDK version just adds more knobs. Sounds like you have a different experience?
As a heuristic it would be OK. However, there are some scenarios the escape-analysis algorithms don't detect, so a linter may lead you to believe things are not escaping when in reality they are. It's also implementation-specific: something that escaped in Go 1.10 may no longer escape in 1.11, for example.
It's easy enough to profile and benchmark in Go, so I would always treat that as the source of truth.
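For instance, the standard `testing` package can report allocations per operation directly. A sketch (the `concat` function is a made-up example; normally this would live in a `_test.go` file and be run with `go test -bench=. -benchmem`, but `testing.Benchmark` also works standalone):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// concat joins parts with a strings.Builder; the benchmark below
// measures how many heap allocations each call actually costs.
func concat(parts []string) string {
	var sb strings.Builder
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		parts := []string{"a", "b", "c"}
		for i := 0; i < b.N; i++ {
			_ = concat(parts)
		}
	})
	fmt.Println(res.AllocsPerOp(), "allocs/op")
}
```

Measured allocation counts like this are the ground truth that escape-analysis diagnostics only approximate.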
It would be better to spend this effort to add generational GC with a bump allocator to Go. That way memory allocation becomes faster for everyone, not just the tiny subset of people who use fancy tools.
I think you've made yourself clear. You've jumped into half a dozen threads in this post with some variation of the same comment as you do with every Go GC post. Your comment is interesting, but repeating it over and over is tiresome. Have you reached out to the Go GC team? What is their response?
There are heuristics (and they improve from version to version), but they're necessarily conservative: heap-allocating something which doesn't escape is a performance hit; stack-allocating something which does escape breaks the program.
In C++ if the class has non-trivial constructors, neither stack nor heap allocations are free. So in C++ reusing objects is still often better for performance.
EDIT: Not sure why I'm being downvoted; if this is a contentious topic, I wasn't aware. I'm certainly not trying to make a point one way or another; I only posted this hoping someone who was more knowledgeable than me could respond.
2. Go does not have "a powerful GC"; it has a GC built quite specifically for it and the applications it's intended to support: low latency, middling throughput, and the assumption that heap allocations are long-lived (because short-lived objects don't escape and thus don't get heap-allocated)
3. Which is the issue of providing generic GC strategies: the type of GC you want and how you want to tune it is very much a factor of the language's semantics and its intended use cases, hence the official JVM shipping with something like half a dozen different GCs, and third-party JVMs having their own with their own tradeoffs
1. I'm aware of this, but it seems LLVM's built-in GC is not a concurrent collector, which is quite a big deal.
2. The assumption that allocations are long-lived can often be made because a compiler can perform escape-analysis (and in fact LLVM could provide utilities for this).
3. Yes, I'm not saying that GoLang's GC (or an adaptation thereof) is the holy grail, just that it would be very useful for many other languages. And work in this area could help bring LLVM's suite of GCs to a competitive, world-class level.
> the type of GC you want and how you want to tune it is very much a factor of the language's semantics and its intended use cases
And when some parts of your program (modules, libraries) are best suited to different GC behaviour, things tend to become complicated. And crossing language boundaries is yet another topic.