I think this is largely a myth perpetuated by benchmarks in which GC'd code beat some badly tuned system-default malloc. There are gains to be made with copying and generational collectors, and it's impossible to generalize to all workloads, but a basic tenet of optimization is, so they say, that being faster means doing less work. Garbage collectors do a lot of extra work checking the heap, and thrash the cache while they're doing it. Batching a bunch of allocs/frees isn't an advantage exclusive to GC, since any allocator could easily be extended to do the same.
And yes, games especially are finely tuned to their allocation patterns. Generational collectors somewhat emulate the kinds of region allocation games do. I know I've read war stories about fighting the GC in, for example, C# XNA games to avoid random stuttering when a collection kicks in. For games you'd want some kind of incremental background collector, but these are kind of like "sufficiently smart" compilers: they work until they hit the case where they have to fall back to stop-the-world collection.
Unless your memory patterns are horribly irregular to the point that they fragment your heap (completely avoided in e.g. console games) and you need to compact memory heavily, non-GC code will just do less work than any type of GC.
Thanks for the response, and everyone else who contributed.
The idea of copying-GC compaction was something I hadn't considered - that's a good point.
For a game though, I still can't see that being better than a decently tuned manual setup. It doesn't even take an expert system; for the Android game I'm working on currently it's been quite easy for me to batch up allocations sensibly into regions, fixed-size pools, and some stack allocators, and I'm far from an expert. A lot can be moved into normal stack allocations (i.e. local buffers or alloca() if necessary), and then you avoid most of the slowdown. My malloc() implementation is pretty naive currently (a simple linked list), but it's called so rarely and in such predictable patterns that it's not worth improving any further (though there are plenty of ways that I could).
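To make the region and fixed-size pool idea concrete, here is a minimal sketch in C. It is not the code from my game; the names (Region, Pool, region_alloc, pool_alloc and so on) are made up for illustration, and it assumes the caller supplies a suitably aligned backing buffer and, for the pool, a slot size that is a multiple of the pointer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Region (bump) allocator: allocating is a pointer bump, and the
   whole region is released at once by resetting the offset. */
typedef struct {
    unsigned char *base;  /* caller-provided backing buffer */
    size_t cap;           /* size of the buffer in bytes */
    size_t used;          /* bytes handed out so far */
} Region;

static void region_init(Region *r, void *buf, size_t cap) {
    r->base = buf;
    r->cap = cap;
    r->used = 0;
}

/* Returns NULL when the region is full. align must be a power of two. */
static void *region_alloc(Region *r, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)r->base + r->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - (uintptr_t)r->base);
    if (start > r->cap || size > r->cap - start)
        return NULL;
    r->used = start + size;
    return r->base + start;
}

/* Frees everything in the region, e.g. at the end of a frame or level. */
static void region_reset(Region *r) {
    r->used = 0;
}

/* Fixed-size pool: free slots form an intrusive singly linked list,
   so both alloc and free are O(1) and never fragment. */
typedef struct PoolNode {
    struct PoolNode *next;
} PoolNode;

typedef struct {
    PoolNode *free_list;
} Pool;

/* Assumption: buf is pointer-aligned and slot_size is a multiple of
   sizeof(void *), so every slot can hold a PoolNode. */
static void pool_init(Pool *p, void *buf, size_t slot_size, size_t count) {
    assert(slot_size >= sizeof(PoolNode));
    unsigned char *bytes = buf;
    p->free_list = NULL;
    /* Push slots in reverse so the first slot ends up at the head. */
    for (size_t i = count; i-- > 0; ) {
        PoolNode *n = (PoolNode *)(bytes + i * slot_size);
        n->next = p->free_list;
        p->free_list = n;
    }
}

/* Pops a free slot, or returns NULL when the pool is exhausted. */
static void *pool_alloc(Pool *p) {
    PoolNode *n = p->free_list;
    if (n != NULL)
        p->free_list = n->next;
    return n;
}

/* Pushes a slot back onto the free list. */
static void pool_free(Pool *p, void *slot) {
    PoolNode *n = slot;
    n->next = p->free_list;
    p->free_list = n;
}
```

The point is that per-frame or per-level data goes through region_alloc and is thrown away with a single region_reset, while same-sized objects that come and go (particles, bullets) go through the pool. Neither path ever has to scan or compact anything.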
I would be interested to see how well a really good JIT doing escape analysis would perform; I suspect there could be a lot of gains there.