As a Clojure programmer, I'm so happy that I'm getting all these improvements over many years. It's fashionable to complain about Java — but I don't use Java, and yet I benefit from all the fantastic work that is being done on the JVM.
The excellent backwards compatibility rocks too. This weekend I wanted a library to diagram the class hierarchy of a Java project I put together.
I found a 10-year-old library that worked with no issues; I was done in minutes even though I rarely use Java or Clojure, so my familiarity is low. There are very few languages where I would have a similar experience.
That's the right attitude. It always irks me when guest-language communities bash the platform that makes their language possible in the first place.
Everything about it supposedly sucks, yet they gladly take the platform, the ecosystem of libraries, the debuggers and whatnot from the host.
It depends; some languages have more to complain about than others. I'd argue F# on .NET basically had to spend years treading water while Microsoft's various teams figured out the .NET Core/Framework situation. The community feels like it lost a lot of the inertia it used to have.
I feel that the latest JVM developments have really been aimed at the developers: easier syntax, nice threading support, fast garbage collector, etc. On the GUI side I know that JavaFX has also been pushing for more features and more performance to become competitive again.
Is it still fashionable to bash Java? I remember the bashing of Java around 15 years or so ago, IMO justifiably because it could be slow to the point of making programs unpleasant to run.
But nowadays… I mean Java can be a bit verbose, but other than that it seems like a fine language and the JVM is pretty great. Plus we don’t have annoying Java applets in our browsers anymore (perhaps one of the main sources of legitimate Java hatred, and something so bad that we’ve intentionally all forgotten about it).
I say this all the time to people that say they want to use Node because they hate Java. There are so many languages that you can execute on the JVM. It’s almost unfortunate that it’s still called the JVM.
I think the JVM needs a rebranding, even if they just change the meaning of the J. I suppose that is what GraalVM is doing, but it's an awful name and its purpose isn't being conveyed well. The JDK should just switch to this VM and drop the name JVM.
I haven't been in that world since Java 7. What's the startup time on the JVM these days? It used to be a pretty hefty cost, and it didn't seem like it would get much better because of the way class loading was done.
Not sure why startup time matters for a persistent service, and it's obviously tied to hardware. You can get applications up and running in less than a second or so (we were using Micronaut to resolve dependencies at compile time). If you use GraalVM you can compile to native code, and then we're talking a few tens of milliseconds to get an application up and running (or to execute and exit).
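To make that concrete, a minimal sketch of the GraalVM route (assuming native-image is installed and on the PATH; the class name is just an example, and the exact executable name can vary by version):

    javac Hello.java
    native-image Hello      # builds a standalone native executable from the class
    ./hello                 # starts in milliseconds, no JVM warm-up

The Micronaut part is a separate optimization: resolving dependency injection at compile time cuts startup work even on a regular JVM.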
That was always the story back then, too — the assumption that no one writes anything other than services with long uptimes. No CLI tools, no lambdas, no quick-launching GUIs. Sounds like that bias hasn't really changed over the years?
Not sure if it's a "bias". The JVM does start up slowly. This has never been an issue for me in all the years I've been using Clojure.
If I need "scripts" with short start-up times, I'll use Babashka (GraalVM) or ClojureScript (Node). Same language, instant start-up. But then I don't expect to get good performance from those VMs for long-running services with large memory allocations.
You pay for what you use. If you bring in a litany of libraries to connect to some endpoint via every conceivable IPC mechanism then sure, it will take some time (but let's not forget that it would also take considerable time with native machine code). But aside from very frequently run CLI tools like ls, the problem is entirely overblown. Come on, quick-launching GUIs? I hardly even remember that being a thing.
The JVM still has some limitations. https://stackoverflow.com/a/2682962/31667 Those actually influence either the design or the performance of the higher-level languages built on it.
And the last modification was only wording, with the same 13-year-old content. [0]
Always check what changed in SO posts; editing old answers without updating the substance is a very common practice.
The improvement in throughput is tantalising. I'd love to see a comparison to the G1GC throughput. I wonder if it's getting close enough that ZGC might some day become a candidate to replace G1 as the default collector?
I recently did a deep dive on GC issues at work, and compared the default collector and ZGC.
Our app was collecting pretty heavily, with an average collection time of about 50ms. With ZGC and Linux huge pages it was collecting in under a millisecond. This came at the cost of a bit more CPU, as ZGC uses more threads for collection.
Using Gatling for performance testing, we saw a small increase in requests per second, but the real benefit was smoother response times, with fewer spikes in the high p9X measures.
Using Flight Recorder alongside this is a great way to get details on how the GC is performing.
Addendum: As noted below, ZGC uses more memory; I only found it viable on heaps of 16GB or above, with it really improving performance around 32GB.
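For reference, a sketch of the kind of setup described above (the flags are real HotSpot/JFR options, but the heap size and recording settings are only illustrative, and app.jar is a placeholder):

    java -XX:+UseZGC -XX:+UseLargePages -Xms32g -Xmx32g \
         -XX:StartFlightRecording=duration=10m,filename=gc.jfr \
         -jar app.jar

Opening the resulting .jfr file in JDK Mission Control makes it easy to compare pause times and allocation rates against a baseline run with the default collector.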
There are some politics involved here. ZGC is competing against Shenandoah for the role of next-gen GC; ZGC is from Oracle and Shenandoah is from Red Hat. Because they are pretty evenly matched, I find it unlikely that one would be promoted as the default over the other.
As far as I know, ZGC uses a lot more memory than G1, and for as long as that's the situation, it's unlikely to replace G1 as the default configuration. My other understanding is that ZGC is best used for large workloads, compared to G1, which is more generalized and doesn't perform as well on larger workloads.
But I'm no JVM expert and might have understood it incorrectly. Happy to be corrected!
Yes, on small heaps you are still forced to use a 64-bit pointer instead of a 32-bit pointer, which means your per-object overhead is higher. So on "normal" heap sizes, < 32GB, you end up paying a noticeable penalty.
That said, for a lot of use cases this is still a reasonable trade-off to make, but it probably won't be promoted to the "standard" until it's better in all use cases.
ZGC relies on the spare bits of 64-bit pointers (and the size of the 64-bit virtual address space) for its pointer-coloring algorithm, so it is by design incompatible with UseCompressedOops even at small heap sizes.
I suppose it's possible to fix this with a change to the design in some way. But if you have a heap smaller than 32GB, the existing G1 would probably work well too. ZGC shines much more at very large heap sizes, since that's where long-tail GC latency often hits hardest.
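One way to see this on your own machine (assuming a recent OpenJDK with ZGC included) is to dump the final flag values:

    java -XX:+UseZGC -XX:+PrintFlagsFinal -version | grep UseCompressedOops

With ZGC selected, UseCompressedOops comes out false; under G1 with a sub-32GB heap it defaults to true.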
If the logo didn't already make it clear which Java inspired the name: the island of Java was (and is) also instrumental in growing coffee, so Java the language was still a bit inspired by Java the island :)
It seems ZGC's autotuning is following in the footsteps of Azul's proprietary C4 (Continuously Concurrent Compacting Collector) which came a decade or two earlier. Both GCs are low-latency and their main tuning option is just the heap size.
Is the next generation of GCs going to use Reinforcement Learning and Machine Learning? The problem with a lot of GCs is that there's a currently unavoidable tradeoff between memory usage, latency, and throughput.
Predicting memory usage patterns could help the allocation strategy; e.g. if you get a flood of traffic every day at 9AM, you could time GC to run before the rush and efficiently preallocate space for the objects you predict you'll need.
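Nothing that smart exists out of the box today, but as a hedged sketch of the "run a collection before the rush" half of the idea (the 8:55 trigger time and class name are purely illustrative, and System.gc() is only a hint the JVM may ignore, especially with -XX:+DisableExplicitGC set):

    import java.time.Duration;
    import java.time.LocalTime;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PreRushGc {
        public static void main(String[] args) {
            // In a real service this would live alongside the application, not in its own main().
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Minutes until shortly before the expected 9AM traffic spike (illustrative only).
            long delayMinutes = Duration.between(LocalTime.now(), LocalTime.of(8, 55)).toMinutes();
            scheduler.schedule(() -> {
                // Ask the JVM to collect before the rush; this is a request, not a guarantee.
                System.gc();
            }, Math.max(delayMinutes, 0), TimeUnit.MINUTES);
        }
    }

The "efficiently preallocate" half is harder; about the closest you get today is sizing the heap generously up front with -Xms.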
Crazy to me that garbage collection systems are a subject of continual evolution and discussion. You'd think it would be "solved" by now... Or is that just in Java world?
Not an expert, but bear in mind that there can't really be one single thing called a "good" garbage collector, because the behaviour of the programs that produce the garbage can be hugely varied, the garbage so produced can likewise be hugely varied, and how you want to recover it can depend on the platform you're running on (if embedded, you probably want reference counting because of the tight memory; in other circumstances you might be very happy just to leak everything and let the OS clean up at the end. Edit: yes, it's sometimes OK to leak).
As somebody says elsewhere, it's a multidimensional issue, and a garbage collector tuned for a single dimension can't be a brilliant match for everything.
The book "Garbage Collection" by Richard Jones, recently discussed on HN, is something I'd highly recommend you buy and read.
It’s just as “solved” as regular memory allocators are for C/C++ — it’s constantly evolving, with many competing solutions. TBB, tcmalloc, jemalloc, mimalloc, etc etc etc.
Memory management is hard, and in a world where memory sizes keep increasing, new techniques being developed is expected and perfectly normal.
I'm not sure what "solved" would mean for GC. Can't there always be improvements in maintainability, pause duration, correctness, throughput, etc.? There are many dimensions to continually improve on.
Unless someone comes up with a no-pause, no-compute, no-code, fully correct GC?
There are some dimensions yes, but I do wonder if GC research is starting to approach the point of being "solved" well enough that we'll see a drop in investment. Not yet, clearly, but approaching that territory.
If you look at the very latest JVMs you have three GCs, spread over throughput-optimized (Parallel), a middle-ground jack of all trades (G1), and latency-optimized (ZGC). They're a lot more self-configuring than in the past, so no matter what needs your app has, it's usually just one CLI switch to get something fairly optimal.
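For anyone who hasn't looked in a while, that one switch really is the whole story (standard HotSpot flags; app.jar is a placeholder):

    java -XX:+UseParallelGC -jar app.jar   # throughput-optimized
    java -XX:+UseG1GC       -jar app.jar   # balanced default
    java -XX:+UseZGC        -jar app.jar   # latency-optimized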
But ZGC is going generational, at which point it will have significantly higher throughput than before. Also, all the JVM GCs can handle massive heaps these days (like, terabyte scale). So at that point the choice might become even simpler: just use ZGC all the time unless you can tolerate higher pauses and want to reclaim some CPU for the app, in which case go with parallel. And in many cases the right tradeoff between latency and performance could be inferred automatically from the frameworks and libraries that are on the classpath, meaning that for most apps even needing to make that choice could go away.
If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code, there are really very few metrics left to optimize after that. It seems like HotSpot is nearly there, especially when combined with GraalVM for language support.
> If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code
That's basically ZGC Generational + GraalVM. So, code that's available now albeit you'd have to compile it from version control (graalvm stable releases don't support zgc yet, released zgc isn't generational yet). Give it another year or so and that'll be shipping out of the box in ordinary JVMs.
The hardware still evolves, so at the least, garbage collectors will have to be tuned or self-tune themselves for that.
I also think it’s unlikely that a GC can be written that handles both terabyte sized heaps on machines with hundreds of cores and megabyte sized heaps on single-CPU systems well.
Single GC, no. Single VM, yes. HotSpot serial GC is designed for the latter use case. I forgot to mention it before but it's still there and supported, I think even selected automatically in some cases.
But there aren't so many single core systems with megabyte sized heaps anymore. Even watches are multi-core these days.
Are there any production-ready GCs that have strictly bounded latency? I'm not aware of any, and that means that GC is still too immature for use in about 95% of the computers in my house, the deeply embedded ones.
Metronome did this for embedded hard realtime use cases. I don't know much about it. ZGC has bounded pause latency at least but it's not really meant for very small machines. Android has used a pauseless GC for a while.
Strictly bounded anything (hard real-time) is an entirely different category, with very few uses for the average person - and contrary to what one might think from its name, it is not “fast” at all. It’s mostly about “calculating the trajectory of that incoming projectile should definitely happen in 0.3s”, which is many orders of magnitude more than what a modern computer would take to do 3 multiplications.
But to answer your question, there is JamaicaVM and many niche JVM implementations with hard real time GCs — but it’s simply false that a GC would be somehow a problem for most of your devices.
Hard real-time has a huge amount of uses for the average person. I just made toast -- the software-controlled thermostat in my toaster oven has (slow) hard bounds on its sense-to-control latency to avoid dangerous overheating. Before that, I brushed my teeth -- the brushed motor controller detects a motor stall by increased current if I brush too hard and stall it, shuts down the motor to avoid overheating (with a hard real-time sense-to-actuate constraint), and beeps to inform me. Then I spent some time on HN, with a laptop with PMICs and other microcontrollers that have hard real-time constraints all over the place. By far the majority of my computer interactions so far today have been with computers that have hard real-time constraints, and I haven't even gotten in my car yet!
I think it was a tongue in cheek reply, as it is indeed a no-code, no-pause, no-compute power GC that does eventually reclaim the garbage (at process exit).
It's meant for short-lived apps where you simply wouldn't get an OOM either way. I don't know of an objective definition of what a GC is; one I quite like is that it "gives the illusion of infinite memory", and by that definition it isn't a GC.
But there are also conservative GCs, RCs, etc., which fail other definitions as well (leaving behind reference cycles and having false positives, mapped to that list in reverse order). As with many things in CS, there isn't an objective definition.
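For what it's worth, OpenJDK does ship a collector in exactly this spirit: Epsilon, a no-op GC that allocates but never reclaims, intended for benchmarking and very short-lived processes. Whether it counts as a "GC" is precisely the definitional question above. The flags are real; app.jar is a placeholder:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -jar app.jar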
Theoretically someone could perhaps one day discover the theoretically optimal GC algorithm, and someone else could write the TeX of GCs using that algorithm, and then perhaps we'd have the final GC.
It's extremely unlikely that such an algorithm exists, of course, and even if it did, looking at sorting quickly suggests that improvements would keep being found even after the best asymptotic complexity had been achieved.
Lots of things in CS are actually cyclic: RAM is expensive, implement swapping from HDD. RAM gets cheaper, load all data there and instead optimize your code for CPU. Data gets bigger, need to go back to swapping again. Suddenly we get SSDs and hard drive access is lots faster, invent something new but similar to the old stuff.
And also in system design, computer architecture, and engineering. T. H. Myer and I. E. Sutherland dubbed this cyclical behaviour the "wheel of reincarnation" in their paper "On the Design of Display Processors." IMHO a paper that is still well worth reading:
Optimal memory behavior and lifetime is highly application specific, so the behavior of an automatic system likewise will be better for some applications than others.
In Java, the goal is to have something that performs reasonably well, without degenerate cases, for a wide variety of applications and services. It also has to handle cases where a monolithic code base results in a multitude of different memory behaviors. Even thinking too hard about the ideal behavior will negatively impact application code.
The research field of garbage collectors is a study of trade-offs.
A highly tuned, application-specific memory strategy will likely outperform Java, but the goal is to not require anyone to build such a thing for most software.
> You'd think it would be "solved" by now... Or is that just in Java world?
I'm not sure what you mean but I'll try to relate. Automatic garbage collection is a problem not just in the Java world. It's also needed in JavaScript, Python, Go, Haskell, etc.
Citation needed, but I suspect the Java Virtual Machine ecosystem has the most advanced GC algorithms in the world. Even if that's not true, I can tell you with certainty that other language environments have adopted features many years after Java did, such as JavaScript's V8 introducing concurrent generational copying in 2017 ( https://v8.dev/blog/orinoco-parallel-scavenger ). I'm sure there are examples in Go as well where their GC implemented features that Java had a decade earlier.
Memory management for C is not itself a solved problem: not only is there a lot of performance to squeeze out of malloc itself (the benchmarks at https://github.com/microsoft/mimalloc exemplify the variance between implementations), but it's also up to the programmer to implement memory management in the large in an efficient way, which is not an easy task. One sure mark of a slow C program is a ton of mallocs and frees strewn all over.
If we can know exactly where a particular object’s lifetime ends, then indeed we can write more efficient programs than possible with a GC, but that in the general case is not something we can determine at compile time (Rust’s lifetimes are a strict subset of the general case).
Anything more dynamic will require some “ad hoc, informally-specified, bug-ridden, slow implementation of a GC”.
Rust also relies on a GC for the more dynamic lifetimes in Rust programs (reference counting); it just doesn't call it a GC. And while Rust's GC has a lower footprint than tracing GCs, it has worse performance and suffers from cycle problems. I wouldn't call it "bug ridden" (perhaps "limited" is a better term), but it is slow.
Some think that reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly not compared to the generational ZGC presented here. If you could determine when the collection would take place with a refcounting GC, i.e. when the last reference is cleared, you wouldn't need a refcounting GC. It is unpredictable pretty much by definition.
> Rust's GC (...) has worse performance [than tracing GC]
That's an unsubstantiated claim.
While reference counting may be slower than tracing in the general case, Rust's reference counting is not the same thing as general reference counting in languages like e.g. Swift or Python. There are two reasons:
- The good majority (99%+) of objects in Rust don't need to be managed by reference counting at all.
- Rust relies heavily on move semantics to avoid updating the ref-counts, so the good majority of refcounted objects have very few (sometimes just 1) refcount updates.
> reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here.
Where does ZGC claim nanosecond latency?
But even if it did, it would still be worse, because refcounting doesn't do pauses at all. A pause means pausing all application threads; spending X nanoseconds freeing memory on one thread does not count as a pause, because the system can still make progress.
But reference counting is just a tiny subcomponent of Rust's GC. The majority of Rust's automatic memory management is based on affine types and static code analysis, not refcounting. You're just picking subcomponents to make the Java solution look better, but what matters for developers is the whole picture.
And BTW, I wouldn't be so sure about throughput either.
ZGC is capable of allocation rates of the order of only ~5 GB/s, which is way below the theoretical memory bandwidth. I bet jemalloc can do much better.
I thought I made it clear in my last comment that I'm not claiming that Java's memory management is somehow "better" than Rust's. I was merely pointing out that even Rust has a GC, and it's not a good one, but it might certainly be good enough for Rust's needs as it doesn't depend on it. Overall, obviously Rust and Java choose very different tradeoffs in their memory management strategy.
> ZGC is capable of allocation rates of the order of only ~5 GB/s
That was the non-generational ZGC. Indeed, ZGC's max throughput was well below that of ParallelGC and G1. One of the main goals of generational ZGC is to increase the throughput, and as you can see in the video, they got a 4x increase in throughput.
The statement was that ZGC (or any JVM GC for that matter) is better than Rust’s GC, which is just naive ref counting. This is true - if we were to create a program that allocates the same amount in both implementations, Java would win by a huge margin.
Have you ever tested it yourself? I tested it, and no, it didn't win by a huge margin. malloc+free was only a tad slower than allocation under the default Java GC (stop-the-world, parallel), but the default GC is much faster than G1 and ZGC, so I would not be surprised if malloc could beat those modern ones in an allocation-rate benchmark.
Do you have a public benchmark for that? It's quite hard to write a benchmark that actually measures what we are interested in. Not doubting you; I'm just interested.
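For anyone who wants to poke at this themselves, here is a minimal JMH sketch of the Java side of such an allocation benchmark (class and field names are invented for illustration; writing the C counterpart so it measures genuinely comparable work is the hard part):

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.infra.Blackhole;

    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public class AllocationBench {

        // A small, short-lived object standing in for a typical allocation.
        static final class Node {
            final long a, b, c;
            Node(long a, long b, long c) { this.a = a; this.b = b; this.c = c; }
        }

        @Benchmark
        public void allocate(Blackhole bh) {
            // Consuming the object keeps the JIT from eliding the allocation entirely.
            bh.consume(new Node(1, 2, 3));
        }
    }

Run it under different collectors (e.g. -XX:+UseParallelGC vs -XX:+UseZGC via JMH's -jvmArgs) to get numbers relevant to the comparison above.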
It does give better performance predictability! Decrementing a refcount may either be a no-op or expensive but crucially the op can happen only during that particular call, nowhere else.
It does not. Maintaining the counter requires more work than tracing GCs do, and you don't know in advance when the last reference will be dropped. True, you get more promptness in deallocation (not the same as predictability), which reduces the footprint but comes at the expense of compaction, the lack of which in most refcounting GCs also adds unpredictability. In contrast, in the common case, allocation with a compacting, tracing GC is a pointer bump, and there's no overhead at all to copying references or other memory operations. During collection cycles, GC barriers are triggered, which add overhead to things like mutating pointers or reading pointers from objects in the heap.
There's a great paper (linked from this post https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/unifie...) showing how refcounting GCs can match the performance of tracing GCs -- in fact, how they can each have the properties of the other -- but that requires sophistication in the implementation of refcounting GCs that most of them don't have. So it is true that a sophisticated refcounting GC and a sophisticated tracing GC can exhibit similar behaviour, but while OpenJDK's tracing GCs are very sophisticated (especially ZGC and G1), most refcounting GCs in the wild are quite unsophisticated. It may certainly be true that unsophisticated refcounting GCs have more desirable properties than unsophisticated tracing GCs, but in practice the choice offered is between unsophisticated refcounting GCs and very sophisticated tracing GCs. The latter beat the former on most counts except for footprint overhead.
Ah, it's that topic again. The whole point "but refcounting requires more maintenance per object than GC" is totally moot when there are only 0.01% of objects being refcounted (in Rust or C++) vs 100% objects traced by GC in Java.
I never claimed otherwise. Java's and Rust's overall memory management strategies are far too different to compare. I was talking specifically about refcounting vs tracing.
> But what you call promptness is predictability from the PoV of the programmer, no?
I don't think so. You don't know when memory will be freed or how long that will take, only that it will be freed as soon as possible. The end result is a lower footprint, but that's about it.
Rust's lifetimes are a strict subset of the general case, but they cover a good majority of use cases with affine types, which means trivial allocation and deallocation. The number of times I've had to use lifetime specifiers or dynamic lifetimes (RefCell/Rc/Arc) is extremely low, so low that it isn't even a blip on the profile. Also, unlike most GC languages, Rust favors stack allocation over heap allocation, which makes most Rust programs' allocation rates several orders of magnitude lower than any Java program I've seen (at the expense of more inter-stack copying, but that is extremely cheap because the top of the stack is almost always in L1 cache).
In my experience most applications do need an escape hatch from a strictly tree-like lifetime pattern, it really is limiting.
Object layout is the primary reason for the performance differences between low-level languages and managed ones. Java uses a thread-local allocation buffer, which is for all practical purposes just as hot as the stack. What is more expensive is arrays of references and the like, but that will hopefully be solved by value types at some point.
EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
> EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
I've been working on high-performance Java software (databases) for many years now, and the overhead of creating short-lived objects is bearable only because we go to such lengths to avoid creating those objects, writing Java in a kind of C style: no objects, just primitives as much as you can.
The total overhead is not just bumping the pointer and zeroing memory, but everything that happens when the GC runs out of the slab. By allocating short-lived objects you're making the GC collect earlier and do more work overall. Also, the faster you allocate, the more objects survive each collection and the more objects must be copied. That process is very expensive and also very complex.
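To make that style concrete, a small sketch of the difference (names are invented for illustration). The second method is what "C-style" Java tends to look like: operate on primitives directly, with no per-element allocation on the hot path:

    public class PrimitiveStyle {
        // Object style: one short-lived Point allocated per iteration.
        record Point(double x, double y) {}

        static double objectStyle(double[] xs, double[] ys) {
            double s = 0;
            for (int i = 0; i < xs.length; i++) {
                Point p = new Point(xs[i], ys[i]); // garbage the GC must eventually reclaim
                s += p.x() + p.y();                // (escape analysis may rescue this trivial case)
            }
            return s;
        }

        // "C-style" Java: primitives only, zero allocation per element.
        static double primitiveStyle(double[] xs, double[] ys) {
            double s = 0;
            for (int i = 0; i < xs.length; i++) {
                s += xs[i] + ys[i];
            }
            return s;
        }
    }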
This could be an interesting inquiry. Sad that you have to ruin it with snark.
I guess in some sense it was "solved" more than a decade ago in Azul's C4 (Continuously Concurrent Compacting Collector). I wonder how a generational ZGC will compare against it.
Java was released in 1995. I wonder if you have ever updated any software that came out before then? Surely you don't update your OS. Windows/Linux/Mac all came out before 1995, and must surely be "solved" by now. In fact you probably still use Windows 95. Surely Windows 98 or any later version can't be any better?
What's crazy to me is the sort of cognitive dissonance that you must have to hold a view like that, and still apply software updates at all.
Since garbage-collection-based memory models are much more complex and require a lot of software engineering, it is not a surprise that we have many iterations.
That's backwards. One proof of Rice's theorem relies on the fact that, if you could decide any interesting property of programs in general, then you could solve the halting problem; it doesn't follow that, if you could solve the halting problem, then you could decide any interesting property.
It happens to be true that, if you could solve the halting problem, then you could decide any interesting property of programs; but this is simply because the antecedent is false.
Being able to turn an interesting property solver into a halting problem solver is not the same thing as being able to turn a halting property solver into an interesting problem solver.
My claim, or at least my intent, was that you can solve an interesting-property decider about as well as you can solve the halting problem, i.e. not in a satisfying way (you only avoid false negatives and false positives with unlimited iterations).
You can solve it to some extent (e.g. does the program terminate in X steps).
But this is vacuous and unrelated to the halting problem or Rice's Thm. A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
"Interesting property", as defined by Rice's Thm, is "any semantic property that is not either always true or always false for all program executions." It does not mean "interesting" as in "interesting to humans."
And I'm talking about the decider, not the program being fed as input.
I'm not a computer scientist, but I am aware that "interesting" doesn't mean interesting to humans.
> And I'm talking about the decider
EDIT: If by decider you mean the program tested for having property P, that's a particular case and not applicable in general, ergo unimportant (a program that's a no-op can be shown to leak no memory as well).
So the property P you are testing always returns true? Then it's not interesting per your definition. Leaking memory, however, is interesting.
No that's not what I mean by decider. The decider is the program that takes the other program as input and tells you whether that input program has the semantic property you want.
The only noninteresting semantic properties of programs that exist are "this is a program" and "this is not a program."
> A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
Followed by
> any semantic property that is not either always true or always false for all program executions
So what you said is your decider always outputs true. That's not an interesting property, because it's true for all inputs.
Btw, what is the point of your argument? It still doesn't refute my initial statement. If you want to determine an interesting property of a program, like leaks, automatic lifetime calculation, or whether the output is the square of the input, you run into Rice's theorem, which makes the problem undecidable. In other words, it's impossible to get correct yes and no answers algorithmically.