Yeah, I'm being unfair in naming Go & Java specifically. But these stories of 'fixing' garbage collection come up all too often.

I wonder when we'll see a further GC update that trades latency for throughput...

The problem seems to be that no matter how you tweak GC, you will always have a class of program that it performs terribly for (and it seems to impact a large group of programs, never just some obscure corner case). So I suspect that this latest GC tweak will have unexpected results on some other class of program, leading to another tweak, and so on...




> The problem seems to be that no matter how you tweak GC, you will always have a class of program that it performs terribly for

For casual use, most programs can treat GC like magic, but if you are doing serious work in a language with GC, then you should learn about the GC's characteristics. That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Reducing latency in exchange for throughput is the right decision for the vast majority of programs that will be written in Go. It was already a very attractive language for writing a multiplayer game server, so long as I didn't have very large heaps. (Even so, I can still support 150-250 players and tens of thousands of entities.) With the "tweak," that limitation is much relaxed.


> often going to be tons cheaper than doing the manual memory management.

And on top of that, manual memory management is not free. I maintain a simple but high-throughput C++ server at Google, and tcmalloc is never less than 10-15% of our profiles.

Don't get me wrong, I'm not saying that Go is faster than C++ or ever will be. I'm just trying to counter the notion that "GC is expensive, manual memory management is near zero runtime cost."


I bet that if someone who knew what they were doing decided to optimize that, you'd get the cost WAY down, possibly almost to zero. (If you are using std::string, that is your problem right there).

But the very important difference here is that in your case you have a choice: it is possible to optimize the cost away and to otherwise control when and how you pay it. In GC systems it is never possible to do this completely; you can only sort of kind of try to prevent GC. It's not just a difference in magnitude, it's a categorical difference.


Perhaps. The team is a group of seasoned veterans of high performance server engineering. But perhaps there are others who could improve on our efforts by a significant margin.

Of course we do not use std::string.


If you really really want to, you can allocate a buffer for all your data.


This solves little. What do you think the system allocator is doing under the covers?


It's doing a lot less, if you're allocating one buffer for your data instead of many.
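
For illustration, a minimal C++ sketch of the idea (the Entity type and count are invented): one contiguous buffer holds everything, so the allocator is hit once instead of thousands of times.

    #include <vector>

    struct Entity { float x, y; int hp; };   // hypothetical game object

    int main() {
        std::vector<Entity> entities;
        entities.reserve(10000);             // one allocation up front
        for (int i = 0; i < 10000; ++i)
            entities.push_back({0.0f, 0.0f, 100});
        // ... use entities; one free when the vector is destroyed.
    }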


Just curious: Have you tried jemalloc, and what numbers did you get?


We haven't. Google infrastructure uses tcmalloc. Is there a reason to believe it offers a significant win?


I'd expect similar performance, but perhaps less fragmentation, and less memory used by the process if you aren't regularly calling MallocExtension::instance()->ReleaseFreeMemory() as a tcmalloc user.
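
For concreteness, a minimal sketch of the tcmalloc call mentioned above, which comes from gperftools' MallocExtension interface (the wrapper function is just for illustration):

    #include <gperftools/malloc_extension.h>

    // tcmalloc keeps freed memory in its own caches; this asks it to
    // return unused pages to the OS. (Link with -ltcmalloc.)
    void trim_allocator_caches() {
        MallocExtension::instance()->ReleaseFreeMemory();
    }

    int main() {
        trim_allocator_caches();
    }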

The first answer at https://www.quora.com/Is-tcmalloc-stable-enough-for-producti... (by Keith Adams) is completely consistent with what I've seen. Rust went with jemalloc for some reason too.


IIRC jemalloc is somewhat better about releasing memory in a timely fashion, at least by default.


"That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management"

That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success. What it always comes down to is, how much extra physical RAM you're willing to burn to get some sort of predictable and acceptable pauses for GC. It's usually an unacceptable amount.


> That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success.

Patient: Doctor, it hurts when I do this!

Doctor: Don't do that!

Possibly, divide your heap into smaller pieces with their own GC? Restructure your system, such that most of your heap is persistent and exempt from GC? I don't know the details of the system you're trying to build, of course. It sounds interesting and challenging.


"Possibly, divide your heap into smaller pieces with their own GC? Restructure your system"

That's the common recommendation (resisting the urge to call it a "pat answer"). Suffice it to say, this is not always possible. Apart from all the business-related issues with rewriting a complex system from scratch, breaking up a large shared-memory system into smaller, communicating processes multiplies both the software complexity (roughly O(N^2), where N is the number of new components created, since components can interact pairwise) and the hardware requirements in its own right: think of all the overhead of marshalling/demarshalling, communication latencies, thread management, and the extra cache misses from fragmenting that nice giant cache you were hosting in that big JVM heap.


I'm curious how much physical ram is an unacceptable expense to you, given how cheap it is.


Even the amount of RAM parceled out for virtual servers is an embarrassment of riches, provided you pay for something other than the bottom tier!

In the context of games, and other domains as well, I think too much attention is paid to pushing the envelope and not enough to how much awesome can be had with what is readily available.


> That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Calling shenanigans. No it's not, unless the person doing the manual solution is a novice.


Despite the drastic page limit in the category I was submitting to, I made sure to include a paragraph in http://frama-c.com/u3cat/download/CuoqICFP09.pdf about how GC enables sharing, and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues.

(The page limit was 4. Organizers only raised it to 6 after seeing submitted papers.)

I can also confirm the “bit of due diligence” part, and the fact that it's cheaper than the aggravation of not having automatic memory management at all. In the example that I can contribute to the discussion, the due diligence amounted to two more short articles: http://cristal.inria.fr/~doligez/publications/cuoq-doligez-m... and http://blog.frama-c.com/public/unmarshal.pdf


> GC enable sharing and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues

The solution to unclear or shared ownership is generally reference counting. There's a reason why shared_ptr is called that.


Along with the usual set of locks, cache contention, and pauses on cascading deletions of deep data structures that it brings.


You don't need locks to RC immutable structures, just atomics (and not even that if the system is single-threaded)
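
A minimal C++ sketch of that point (the Config type is invented): std::shared_ptr's reference count is maintained with atomic operations, so immutable data can be shared across threads with no lock.

    #include <memory>
    #include <string>
    #include <thread>
    #include <vector>

    // Immutable payload: built once, then only read.
    struct Config {
        std::string name;
        std::vector<int> values;
    };

    int main() {
        // One heap allocation; the control block's refcount is atomic,
        // so copying the pointer across threads needs no lock.
        auto cfg = std::make_shared<const Config>(Config{"prod", {1, 2, 3}});

        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i)
            workers.emplace_back([cfg] {       // copy bumps the count
                auto n = cfg->values.size();   // read-only, lock-free
                (void)n;
            });
        for (auto& t : workers) t.join();
        // The last owner going away frees the Config deterministically;
        // that is also where a deep structure pays its teardown cost.
    }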


Reference counting is a garbage-collection system like the others (and if you are going to use a garbage-collection system, you can for many usecases do better than reference counting).


> Reference counting is a garbage-collection system like the others

Reference counting is a form of automated memory management which can easily be integrated into a manually-managed system, and can be applied to just the subset of in-memory structures that needs it (again, see shared_ptr). Not so for more complex garbage-collection systems, which tend to interact badly with manual or ownership-based memory management. That puts the lie to your assertion that the only way to implement sharing in a non-GC language is "gratuitous copying".


Yes, it's a shame that you were not a reviewer, mid-2009, of my article published in September 2009.


It's not the writing of manual memory management in the usual case/happy path that's the problem. It's the very occasional mistake and the debugging time involved. (Though to be fair, automated static analysis tools have taken great strides, and this is not as big a problem as it used to be.)

What GC often gets you is a program that doesn't crash but instead has performance problems; those are usually easier to profile and track down, and less severe, than a crash. (Manual memory management isn't immune to the same performance problems in any case.)

In other words, GC gets you to "Step 1 -- Get it Correct" faster so you can play with running code faster. The cost/benefit may not fit your situation. In that case, use a different tool.


> I wonder when we'll see a further GC update that trades latency for throughput...

This GC update in Go already trades latency for throughput, because of the added write barrier.

There is no free lunch in GC. Most features that reduce latency reduce throughput. For example, Azul C4 has lower throughput than HotSpot's GC does.
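
For readers unfamiliar with the term: a write barrier is extra bookkeeping on every pointer store so the collector can run concurrently with the mutator. A conceptual sketch in C++ (not Go's actual implementation):

    #include <unordered_set>

    struct Object { Object* field = nullptr; };

    std::unordered_set<Object*> marked;    // toy mark set
    bool collector_running = true;

    // Dijkstra-style insertion barrier, conceptually: while the
    // collector runs concurrently, every stored pointer is "shaded"
    // so the mark phase cannot miss it. This per-store cost is the
    // throughput that was traded for lower pause times.
    void write_pointer(Object** slot, Object* value) {
        if (collector_running && value) marked.insert(value);
        *slot = value;
    }

    int main() {
        Object a, b;
        write_pointer(&a.field, &b);       // the store pays the barrier
    }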


I hope you realize that malloc is far from free in a non-GC world right? (In a GC world allocating is just moving a pointer forward.) You pay the cost somewhere.

The CLR has also done a lot of GC work to enable concurrent GC, thread-local heaps, and "zero pause" (in reality extremely low constant time pauses).

The only way to avoid paying the cost for managing memory is to allocate everything you need once and never release it.


I hope you realize that stack allocation can replace a lot of allocation that would be done by a GC? And that having control over memory layout can lend itself to better performance? And that naively mallocing everywhere is not the only or fastest way to manually manage memory, and sometimes isn't even the easiest.
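
A small C++ sketch of the first point (buffer size and transform are invented): the scratch space never touches the allocator or any GC, it's a single contiguous block, and it's reclaimed for free on return.

    #include <array>
    #include <cstddef>
    #include <cstdio>

    // The scratch buffer lives on the stack: no malloc, no GC,
    // contiguous layout, reclaimed when the function returns.
    void process_packet(const unsigned char* data, std::size_t len) {
        std::array<unsigned char, 1500> scratch;
        std::size_t n = len < scratch.size() ? len : scratch.size();
        for (std::size_t i = 0; i < n; ++i)
            scratch[i] = data[i] ^ 0xFF;   // some transform
        std::printf("processed %zu bytes\n", n);
    }

    int main() {
        unsigned char pkt[4] = {1, 2, 3, 4};
        process_packet(pkt, sizeof pkt);
    }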


Go also has stack allocation for objects, based on escape analysis; basically, if the compiler can prove that a variable doesn't escape, it is allocated on the stack, otherwise on the heap. Improvements to escape analysis in the compiler thus also reduce heap size by allocating more things on the stack.


Many GC'd languages also have stack allocation (see dynamic-extent in Common Lisp, for example); when talking about GC vs malloc (already an over-simplified dichotomy), we should be talking about heap allocations of indefinite extent.


Many GC languages have stack and global static memory allocation as well.

Go being one of them.

Others, Oberon family of languages, Modula-3, D, Eiffel and even .NET to a certain extent.

Having managed heap doesn't mean other allocation types aren't available.


Too true! I've written a couple of different mallocs before, and I'd recommend it as a project to anyone who thinks malloc() is just a simple, lightweight operation.

It's not an either/or choice though, picking malloc or GC. There is a whole spectrum of allocation styles you can do that might be better for a particular application. For example, a server could use per-request memory pools, which effectively can turn related mallocs into a 'move the pointer forward' operation and the whole lot can be free()d together.
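
A toy version of such a pool, for illustration only (capacity and alignment policy are arbitrary):

    #include <cstddef>
    #include <vector>

    // One allocation up front; each "allocation" just bumps an offset.
    // reset() frees everything from the request at once.
    class RequestPool {
    public:
        explicit RequestPool(std::size_t capacity) : buf_(capacity) {}

        void* alloc(std::size_t n,
                    std::size_t align = alignof(std::max_align_t)) {
            std::size_t p = (used_ + align - 1) & ~(align - 1); // align must be a power of two
            if (p + n > buf_.size()) return nullptr;            // pool exhausted
            used_ = p + n;
            return buf_.data() + p;
        }

        void reset() { used_ = 0; }   // 'move the pointer' back to the start

    private:
        std::vector<unsigned char> buf_;
        std::size_t used_ = 0;
    };

    int main() {
        RequestPool pool(1 << 16);
        void* counters = pool.alloc(64 * sizeof(int));
        (void)counters;
        pool.reset();                 // whole request freed together
    }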

I'm not saying GC is worthless. I just have a distaste for GC because it doesn't truly deliver on the promise of removing worries about memory management. You still pay the cost and can be tripped up by nasty GC performance. Even worse, the garbage collector behaviour can change between language versions and a well-tested application can suddenly hit dire performance problems. Once you have to consider GC problems, IMO you might well be better off doing old fashioned app-controlled memory allocation.


"malloc is far from free". Hehe. Pun intended?

I always thought that malloc was further from free with GC :P


> In a GC world allocating is just moving a pointer forward.

which is usually for short lived objects only, and in a non-GC world these get put on the stack anyway.



