In many cases you can trade speed for memory or vice versa. For example, a good garbage collector provides better performance than most manual allocators, but needs two or three times as much memory. Likewise, dynamic programming and aggressive caching of computed values buy speed at the cost of memory.
How do good garbage collectors provide better performance than manual allocators? I'll admit to being fairly under-educated when it comes to garbage collectors, but I don't see how they would outperform manual allocators. Are they faster at the actual allocation, or are they better at the layout and cache coherency of the allocated data? I would have thought it would be incredibly difficult to match the efficiency of good manual allocation strategies.
Disclaimer: I work in games, where controlling and managing memory allocations (particularly heap allocations) is critical to maintain low frame times, which is why I'm interested in this.
I think this is largely a myth perpetuated by benchmarks in which GC'd code beat a badly tuned, system-default malloc. There are gains to be had from copying, generational collectors, and it's impossible to generalize across all workloads, but a basic tenet of optimization is that being faster means doing less work. Garbage collectors do a lot of extra work checking the heap, and they thrash the cache while doing it. Batching up a bunch of allocs/frees isn't an advantage exclusive to GC either, since any allocator could easily be extended to do the same.
And yes, games especially are finely tuned to their allocation patterns. Generational collectors somewhat emulate the kind of region allocation games do. I know I've read war stories about fighting the GC in, for example, C# XNA games to avoid random stuttering when collection kicks in. For games you'd want some kind of incremental background collector, but these are a bit like "sufficiently smart" compilers: they work until they hit the case where they have to fall back to stop-the-world collection.
Unless your memory patterns are horribly irregular to the point that they fragment your heap (completely avoided in e.g. console games) and you need to compact memory heavily, non-GC code will just do less work than any type of GC.
Thanks for the response, and everyone else who contributed.
The idea of copying-GC compaction was something I hadn't considered - that's a good point.
For a game, though, I still can't see that being better than a decently tuned manual setup. Not even an expert one; for the Android game I'm working on currently it's been quite easy for me to batch allocations sensibly into regions, fixed-size pools, and some stack allocators, and I'm far from an expert. A lot can be moved into normal stack allocations (i.e. local buffers, or alloca() if necessary), and then you avoid most of the slowdown. My malloc() implementation is pretty naive at the moment (a simple linked list), but it's called so rarely and in such predictable patterns that it's not worth improving further (though there are plenty of ways I could).
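For anyone who hasn't used this pattern, here is a minimal sketch of the kind of stack allocator I mean (the capacity, alignment handling, and names are all made up for illustration):

    #include <cstddef>
    #include <cstdint>

    // Minimal stack (linear) allocator sketch: hand out memory by bumping an
    // offset into a fixed buffer, and "free" by rolling the offset back to a
    // saved marker. No per-allocation bookkeeping, no heap calls after setup.
    class StackAllocator {
    public:
        explicit StackAllocator(std::size_t capacity)
            : buffer_(new std::uint8_t[capacity]), capacity_(capacity) {}
        ~StackAllocator() { delete[] buffer_; }

        void* alloc(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
            std::size_t aligned = (offset_ + align - 1) & ~(align - 1); // align must be a power of two
            if (aligned + size > capacity_) return nullptr;             // out of space
            offset_ = aligned + size;
            return buffer_ + aligned;
        }

        std::size_t marker() const { return offset_; } // remember the current top
        void rollback(std::size_t m) { offset_ = m; }  // free everything allocated after the marker
        void reset() { offset_ = 0; }                  // free everything, e.g. at end of frame

    private:
        std::uint8_t* buffer_;
        std::size_t capacity_;
        std::size_t offset_ = 0;
    };

Per-frame temporaries then become a marker()/rollback() pair around the work, which is what makes "call malloc rarely" practical.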
I would be interested to see how well a really good JIT doing escape analysis would perform, I suspect there could be a lot of gains there.
Garbage collected systems typically implement allocation with a simple pointer bump. This is possible because the garbage collector moves values in memory and updates references automatically, so the empty space can be compacted during collection, leaving one contiguous free region to allocate from.
This is obviously faster than malloc, which is what people are comparing against when they say allocation is faster with a garbage collector. Collecting the garbage, i.e. de-allocation, can be more expensive though, since it might require scanning large parts of the heap.
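Roughly speaking, the fast path is nothing more than a pointer increment; here is a sketch of the idea (the collector itself, which would trace live objects, compact them, fix up references, and reset the pointer, is reduced to a stub):

    #include <cstddef>
    #include <cstdint>

    // Bump-pointer allocation fast path a compacting GC can use. Because the
    // collector can move live objects and squeeze the free space into one
    // contiguous region, allocation never has to search a free list.
    struct Nursery {
        std::uint8_t* top; // next free byte
        std::uint8_t* end; // end of the region
    };

    // Stub standing in for the real collector: it would collect, compact,
    // reset top, and retry the allocation.
    void* collect_and_retry(Nursery&, std::size_t) { return nullptr; }

    inline void* gc_alloc(Nursery& n, std::size_t size) {
        std::uint8_t* p = n.top;
        if (static_cast<std::size_t>(n.end - p) >= size) { // fast path: bump and go
            n.top = p + size;
            return p;
        }
        return collect_and_retry(n, size);                 // slow path
    }

Compare that with a general-purpose malloc, which has to find and split a suitable free block and keep enough bookkeeping around for the eventual free().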
Since games generally use region-based allocators, the performance gain is probably very small there. If you make lots of calls to malloc, the gain would be larger.
But you pay for it on the back side when it comes time to, heh, collect. There was a paper suggesting it takes five times as much RAM for a GC'd program to equal the corresponding manual-allocation program in performance terms: Hertz and Berger's "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management".
If you care about performance you will avoid the garbage collector like the plague. Best to write it in C, C++, etc.
But you don't actually save that much CPU and it's certainly not worth having to spend five times your working-set size just to equal explicitly-managed memory in performance terms.
Seriously, if you want cheap allocations, use a freaking allocation pool. They're not that hard to implement in C++.
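For concreteness, a minimal fixed-size pool along those lines (slot size and count are arbitrary; a real one would add per-type alignment, debug fill patterns, and so on):

    #include <cstddef>
    #include <cstdint>

    // Fixed-size allocation pool: carve a buffer into equal slots and keep the
    // free ones on an intrusive singly linked list. Alloc and free are O(1),
    // with no heap calls after setup and no fragmentation.
    template <std::size_t SlotSize, std::size_t SlotCount>
    class FixedPool {
        static_assert(SlotSize >= sizeof(void*), "slot must be able to hold a free-list link");
    public:
        FixedPool() {
            // Thread every slot onto the free list.
            for (std::size_t i = 0; i < SlotCount; ++i) {
                void* slot = storage_ + i * SlotSize;
                *static_cast<void**>(slot) = free_list_;
                free_list_ = slot;
            }
        }

        void* alloc() {
            if (!free_list_) return nullptr;         // pool exhausted
            void* slot = free_list_;
            free_list_ = *static_cast<void**>(slot); // pop the head
            return slot;
        }

        void free(void* slot) {
            *static_cast<void**>(slot) = free_list_; // push back onto the list
            free_list_ = slot;
        }

    private:
        alignas(std::max_align_t) std::uint8_t storage_[SlotSize * SlotCount];
        void* free_list_ = nullptr;
    };

Something like FixedPool<64, 1024> for your most frequently spawned object type covers a surprising amount of ground.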
Between malloc and garbage collection sits obstack: allocate by pointer bump and free the complete stack at once. It's used, for example, in compilers for AST nodes and the like.
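A rough sketch of the idea (real obstacks also let you grow the object on top of the stack and free back to an arbitrary object, which is skipped here, and malloc failure isn't handled):

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>

    // Obstack-style arena: bump-allocate out of malloc'd chunks and free the
    // whole thing in one call. Chunk size is arbitrary and each allocation is
    // assumed to fit in a single chunk.
    struct Chunk {
        Chunk* next;
        std::size_t used;
        alignas(std::max_align_t) std::uint8_t data[64 * 1024];
    };

    struct Arena {
        Chunk* head = nullptr;

        void* alloc(std::size_t size) {
            // Round up so the next allocation stays suitably aligned.
            size = (size + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
            if (!head || head->used + size > sizeof(head->data)) {
                Chunk* c = static_cast<Chunk*>(std::malloc(sizeof(Chunk)));
                c->next = head;
                c->used = 0;
                head = c;
            }
            void* p = head->data + head->used;
            head->used += size;
            return p;
        }

        void free_all() { // e.g. throw away every AST node after codegen
            while (head) {
                Chunk* next = head->next;
                std::free(head);
                head = next;
            }
        }
    };

Allocation is the same pointer bump the GC fast path gets, but deallocation is a handful of free() calls instead of tracing the heap.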
For example, an advanced garbage collector can group the surviving allocations together, reducing fragmentation and potentially increasing cache hits.
For one thing, it'll make it a lot more feasible to build the Linux kernel on the Raspberry Pi itself. It's already possible, but with the extra RAM, those of us who want to do such things are going to be happy with the doubling of the distcc pool...
But I have no idea what I'd use the additional 256 MB for; in my experience the bottleneck so far has always been the CPU.