The standard state-of-the-art for garbage collection is two-space copying for the nursery, and mark-and-sweep for the tenured generation. That way you avoid copying the whole heap on every GC; you only get copying for the nursery. You may have a compaction pass for the tenured generation that runs very rarely (much more rarely than a regular minor or major GC). V8, Java, and OCaml all follow this model, and I believe .NET and Haskell do as well. Lua skips the generational part for simplicity's sake (as games care a lot more about latency than throughput); SpiderMonkey does too because Mozilla hasn't implemented generational GC yet.
If fork() performance is important to you, you probably don't want compaction for the tenured generation at all. That's totally fine; while fragmentation will accumulate over time, it's no worse than C++. You'll still have copying for the nursery, but the nursery fills up and gets recycled so often that the COW optimization isn't really helping you there anyhow.