As a Clojure programmer, I'm so happy that I'm getting all these improvements over many years. It's fashionable to complain about Java — but I don't use Java, and yet I benefit from all the fantastic work that is being done on the JVM.
The excellent backwards compatibility rocks too. This weekend I wanted a library to diagram the class hierarchy of a Java project I put together.
I found a 10-year-old library that worked with no issues; I was done in minutes even though I rarely use Java or Clojure, so my familiarity is low. There are very few languages where I would have a similar experience.
That's the right attitude. It always irks me when guest-language communities bash the platform that makes their language possible in the first place.
Everything about it supposedly sucks, yet they gladly take the platform, the ecosystem of libraries, the debuggers and whatnot from the host.
It depends; some languages have more to complain about than others. I'd argue F# on .NET basically had to spend years treading water while Microsoft's various teams figured out the .NET Core/Framework situation. The community feels like it lost a lot of the inertia it used to have.
I feel that the latest JVM developments have really been aimed at the developers: easier syntax, nice threading support, fast garbage collector, etc. On the GUI side I know that JavaFX has also been pushing for more features and more performance to become competitive again.
Is it still fashionable to bash Java? I remember the bashing of Java around 15 years or so ago, IMO justifiably because it could be slow to the point of making programs unpleasant to run.
But nowadays… I mean Java can be a bit verbose, but other than that it seems like a fine language and the JVM is pretty great. Plus we don’t have annoying Java applets in our browsers anymore (perhaps one of the main sources of legitimate Java hatred, and something so bad that we’ve intentionally all forgotten about it).
I say this all the time to people that say they want to use Node because they hate Java. There are so many languages that you can execute on the JVM. It’s almost unfortunate that it’s still called the JVM.
I think the JVM needs a rebranding, even if they just change the meaning of the J. I suppose that is what GraalVM is doing, but it's an awful name and its purpose isn't being conveyed well. The JDK should just switch to this VM and drop the name JVM.
I haven't been in that world since Java 7. What's the startup time on the JVM these days? It used to be a pretty hefty cost, and it didn't seem like it would get much better because of the way class loading was done.
Not sure why startup time matters for a persistent service, and it's obviously tied to hardware. You can get applications up and running in less than a second or so (we were using Micronaut to resolve dependencies at compile time). If you use GraalVM you can compile to native code, and then we're talking a few tens of milliseconds to get an application up and running (or to execute and exit).
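To make that concrete, a minimal sketch of the GraalVM route (assuming native-image is installed and on the PATH; the class name is just an example, and the exact executable name can vary by version):

    javac Hello.java
    native-image Hello      # builds a standalone native executable from the class
    ./hello                 # starts in milliseconds, no JVM warm-up

The Micronaut part is a separate optimization: resolving dependency injection at compile time cuts startup work even on a regular JVM.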
That was always the story back then, too — the assumption that no one writes anything other than services with long uptimes. No CLI tools, no lambdas, no quick-launching GUIs. Sounds like that bias hasn't really changed over the years?
Not sure if it's a "bias". The JVM does start up slowly. This has never been an issue for me in all the years I've been using Clojure.
If I need "scripts" with short start-up times, I'll use Babashka (GraalVM) or ClojureScript (Node). Same language, instant start-up. But then I don't expect to get good performance from those VMs for long-running services with large memory allocations.
You pay for what you use. If you bring in a litany of libraries to connect to some endpoint via every conceivable IPC mechanism then sure, it will take some time (but let's not forget that it would also take considerable time with native machine code). But aside from very frequently run CLI tools like ls, the problem is entirely overblown. Come on, quick-launching GUIs? I hardly even remember that being a thing.
The JVM still has some limitations. https://stackoverflow.com/a/2682962/31667 Those actually influence either the design or the performance of the higher-level languages built on it.
And the last modification was only wording, with the same 13-year-old content. [0]
Always check what changed in SO posts; editing old answers without updating the substance is a very common practice.
The improvement in throughput is tantalising. I'd love to see a comparison to the G1GC throughput. I wonder if it's getting close enough that ZGC might some day become a candidate to replace G1 as the default collector?
I recently did a deep dive on GC issues at work, and compared the default collector and ZGC.
Our app was collecting pretty heavily, with an average collection time of about 50ms. With ZGC and Linux huge pages it was collecting in under a millisecond. This came at the cost of a bit more CPU, as ZGC uses more threads for collection.
Using Gatling for performance testing, we saw a small increase in requests per second, but the real benefit was smoother response times, with fewer spikes in the high p9X measures.
Using Flight Recorder alongside this is a great way to get details on how the GC is performing.
Addendum: As noted below, ZGC uses more memory; I only found it viable on heaps of 16GB or above, with it really improving performance around 32GB.
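For reference, a sketch of the kind of setup described above (the flags are real HotSpot/JFR options, but the heap size and recording settings are only illustrative, and app.jar is a placeholder):

    java -XX:+UseZGC -XX:+UseLargePages -Xms32g -Xmx32g \
         -XX:StartFlightRecording=duration=10m,filename=gc.jfr \
         -jar app.jar

Opening the resulting .jfr file in JDK Mission Control makes it easy to compare pause times and allocation rates against a baseline run with the default collector.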
There are some politics involved here. ZGC is competing against Shenandoah for the role of next-gen GC; ZGC is from Oracle and Shenandoah is from Red Hat. Because they are pretty evenly matched, I find it unlikely that one would be promoted as the default over the other.
As far as I know, ZGC uses a lot more memory than G1, and for as long as that's the situation, it's unlikely to replace G1 as the default configuration. My other understanding is that ZGC is best used for large workloads, compared to G1, which is more generalized and doesn't perform as well on larger workloads.
But I'm no JVM expert and might have understood it incorrectly. Happy to be corrected!
Yes, on small heaps you are still forced to use a 64-bit pointer instead of a 32-bit pointer, which means your per-object overhead is higher. So on "normal" heap sizes, < 32GB, you end up paying a noticeable penalty.
That said, for a lot of use cases this is still a reasonable trade-off to make, but it probably won't be promoted to the "standard" until it's better in all use cases.
ZGC relies on the spare bits of 64-bit pointers (and the size of the 64-bit virtual address space) for its pointer-coloring algorithm, so it is by design incompatible with UseCompressedOops even at small heap sizes.
I suppose it's possible to fix this with a change to the design in some way. But if you have a heap smaller than 32GB, the existing G1 would probably work well too. ZGC shines much more at very large heap sizes, since that's where long-tail GC latency often hits hardest.
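One way to see this on your own machine (assuming a recent OpenJDK with ZGC included) is to dump the final flag values:

    java -XX:+UseZGC -XX:+PrintFlagsFinal -version | grep UseCompressedOops

With ZGC selected, UseCompressedOops comes out false; under G1 with a sub-32GB heap it defaults to true.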
If the logo didn't already make it clear which Java inspired the name: the island of Java was (and is) also instrumental in growing coffee, so Java the language was still a bit inspired by Java the island :)
It seems ZGC's autotuning is following in the footsteps of Azul's proprietary C4 (Continuously Concurrent Compacting Collector) which came a decade or two earlier. Both GCs are low-latency and their main tuning option is just the heap size.
Is the next generation of GCs going to use Reinforcement Learning and Machine Learning? The problem with a lot of GCs is that there's a currently unavoidable tradeoff between memory usage, latency, and throughput.
Predicting memory usage patterns could help the allocation strategy; e.g. if you get a flood of traffic every day at 9AM, you could time GC to run before the rush and efficiently preallocate space for the objects you predict you'll need.
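Nothing that smart exists out of the box today, but as a hedged sketch of the "run a collection before the rush" half of the idea (the 8:55 trigger time and class name are purely illustrative, and System.gc() is only a hint the JVM may ignore, especially with -XX:+DisableExplicitGC set):

    import java.time.Duration;
    import java.time.LocalTime;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PreRushGc {
        public static void main(String[] args) {
            // In a real service this would live alongside the application, not in its own main().
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Minutes until shortly before the expected 9AM traffic spike (illustrative only).
            long delayMinutes = Duration.between(LocalTime.now(), LocalTime.of(8, 55)).toMinutes();
            scheduler.schedule(() -> {
                // Ask the JVM to collect before the rush; this is a request, not a guarantee.
                System.gc();
            }, Math.max(delayMinutes, 0), TimeUnit.MINUTES);
        }
    }

The "efficiently preallocate" half is harder; about the closest you get today is sizing the heap generously up front with -Xms.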
Crazy to me that garbage collection systems are a subject of continual evolution and discussion. You'd think it would be "solved" by now... Or is that just in Java world?
Not an expert, but bear in mind that there can't really be one single thing called a "good" garbage collector, because the behaviour of the programs that produce the garbage can be hugely varied, the garbage so produced can likewise be hugely varied, and how you want to recover it can depend on the platform you're running on (if embedded, you probably want reference counting because of the tight memory; in other circumstances you might be very happy just to leak everything and let the OS clean up at the end. Edit: yes, it's sometimes OK to leak).
As somebody says elsewhere, it's a multidimensional issue, and a garbage collector tuned for a single dimension can't be a brilliant match for everything.
The book "Garbage Collection" by Richard Jones, recently discussed on HN, is something I'd highly recommend you buy and read.
It’s just as “solved” as regular memory allocators are for C/C++ — it’s constantly evolving, with many competing solutions. TBB, tcmalloc, jemalloc, mimalloc, etc etc etc.
Memory management is hard, and in a world where memory sizes keep increasing, new techniques being developed is expected and perfectly normal.
I'm not sure what "solved" would mean for GC. Can't there always be improvements in maintainability, pause duration, correctness, throughput, etc.? There are many dimensions to continually improve on.
Unless someone comes up with a no-pause, no-compute, no-code, fully correct GC?
There are some dimensions yes, but I do wonder if GC research is starting to approach the point of being "solved" well enough that we'll see a drop in investment. Not yet, clearly, but approaching that territory.
If you look at the very latest JVMs you have three GCs, spread over throughput-optimized (Parallel), a middle-ground jack of all trades (G1), and latency-optimized (ZGC). They're a lot more self-configuring than in the past, so no matter what needs your app has, it's usually just one CLI switch to get something fairly optimal.
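For anyone who hasn't looked in a while, that one switch really is the whole story (standard HotSpot flags; app.jar is a placeholder):

    java -XX:+UseParallelGC -jar app.jar   # throughput-optimized
    java -XX:+UseG1GC       -jar app.jar   # balanced default
    java -XX:+UseZGC        -jar app.jar   # latency-optimized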
But ZGC is going generational, at which point it will have significantly higher throughput than before. Also, all the JVM GCs can handle massive heaps these days (like, terabyte scale). So at that point the choice might become even simpler: just use ZGC all the time unless you can tolerate higher pauses and want to reclaim some CPU for the app, in which case go with parallel. And in many cases the right tradeoff between latency and performance could be inferred automatically from the frameworks and libraries that are on the classpath, meaning that for most apps even needing to make that choice could go away.
If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code, there are really very few metrics left to optimize after that. It seems like HotSpot is nearly there, especially when combined with GraalVM for language support.
> If you have a GC that doesn't pause, can handle terabyte sized heaps, has great throughput, is largely or completely self-configuring, is portable, open source, supports dozens of languages and has clean code
That's basically ZGC Generational + GraalVM. So, code that's available now albeit you'd have to compile it from version control (graalvm stable releases don't support zgc yet, released zgc isn't generational yet). Give it another year or so and that'll be shipping out of the box in ordinary JVMs.
The hardware still evolves, so at the least, garbage collectors will have to be tuned or self-tune themselves for that.
I also think it’s unlikely that a GC can be written that handles both terabyte sized heaps on machines with hundreds of cores and megabyte sized heaps on single-CPU systems well.
Single GC, no. Single VM, yes. HotSpot serial GC is designed for the latter use case. I forgot to mention it before but it's still there and supported, I think even selected automatically in some cases.
But there aren't so many single core systems with megabyte sized heaps anymore. Even watches are multi-core these days.
Are there any production-ready GCs that have strictly bounded latency? I'm not aware of any, and that means that GC is still too immature for use in about 95% of the computers in my house, the deeply embedded ones.
Metronome did this for embedded hard realtime use cases. I don't know much about it. ZGC has bounded pause latency at least but it's not really meant for very small machines. Android has used a pauseless GC for a while.
Strictly bounded anything (hard real-time) is an entirely different category, with very few uses for the average person - and contrary to what one might think from its name, it is not “fast” at all. It’s mostly about “calculating the trajectory of that incoming projectile should definitely happen in 0.3s”, which is many orders of magnitude more than what a modern computer would take to do 3 multiplications.
But to answer your question, there is JamaicaVM and many niche JVM implementations with hard real time GCs — but it’s simply false that a GC would be somehow a problem for most of your devices.
Hard real-time has a huge amount of uses for the average person. I just made toast -- the software-controlled thermostat in my toaster oven has (slow) hard bounds on its sense-to-control latency to avoid dangerous overheating. Before that, I brushed my teeth -- the brushed motor controller detects a motor stall by increased current if I brush too hard and stall it, shuts down the motor to avoid overheating (with a hard real-time sense-to-actuate constraint), and beeps to inform me. Then I spent some time on HN, with a laptop with PMICs and other microcontrollers that have hard real-time constraints all over the place. By far the majority of my computer interactions so far today have been with computers that have hard real-time constraints, and I haven't even gotten in my car yet!
I think it was a tongue in cheek reply, as it is indeed a no-code, no-pause, no-compute power GC that does eventually reclaim the garbage (at process exit).
It's meant for short-lived apps where you simply wouldn't get an OOM either way. I don't know of an objective definition of what a GC is; one I quite like is that it "gives the illusion of infinite memory", and by that definition it isn't a GC.
But there are also conservative GCs, RCs, etc., which fail other definitions as well (leaving behind reference cycles and having false positives, mapped to that list in reverse order). As with many things in CS, there isn't an objective definition.
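For what it's worth, OpenJDK does ship a collector in exactly this spirit: Epsilon, a no-op GC that allocates but never reclaims, intended for benchmarking and very short-lived processes. Whether it counts as a "GC" is precisely the definitional question above. The flags are real; app.jar is a placeholder:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -jar app.jar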
Theoretically someone could perhaps one day discover the theoretically optimal GC algorithm, and someone else could write the TeX of GCs using that algorithm, and then perhaps we'd have the final GC.
It's extremely unlikely that such an algorithm exists, of course, and even if it did, looking at sorting quickly suggests that improvements would keep being found even after the best asymptotic complexity had been achieved.
Lots of things in CS are actually cyclic: RAM is expensive, implement swapping from HDD. RAM gets cheaper, load all data there and instead optimize your code for CPU. Data gets bigger, need to go back to swapping again. Suddenly we get SSDs and hard drive access is lots faster, invent something new but similar to the old stuff.
And also in system design, computer architecture, and engineering. T. H. Myer and I. E. Sutherland dubbed this cyclical behaviour the "wheel of reincarnation" in their paper "On the Design of Display Processors." IMHO a paper that is still well worth reading:
Optimal memory behavior and lifetime is highly application specific, so the behavior of an automatic system likewise will be better for some applications than others.
In Java, the goal is to have something that performs reasonably well, without degenerate cases, for a wide variety of applications and services. It also has to handle cases where a monolithic code base results in a multitude of different memory behaviors. Even thinking too hard about the ideal behavior will negatively impact application code.
The research field of garbage collectors is a study of trade-offs.
A highly tuned, application-specific memory strategy will likely outperform Java, but the goal is to not require anyone to build such a thing for most software.
> You'd think it would be "solved" by now... Or is that just in Java world?
I'm not sure what you mean but I'll try to relate. Automatic garbage collection is a problem not just in the Java world. It's also needed in JavaScript, Python, Go, Haskell, etc.
Citation needed, but I suspect the Java Virtual Machine ecosystem has the most advanced GC algorithms in the world. Even if that's not true, I can tell you with certainty that other language environments have adopted features many years after Java did, such as JavaScript's V8 introducing concurrent generational copying in 2017 ( https://v8.dev/blog/orinoco-parallel-scavenger ). I'm sure there are examples in Go as well where their GC implemented features that Java had a decade earlier.
Memory management for C is not itself a solved problem: not only is there a lot of performance to squeeze out of malloc itself (the benchmarks at https://github.com/microsoft/mimalloc exemplify the variance between implementations), but it's also up to the programmer to implement memory management in the large in an efficient way, which is not an easy task. One sure mark of a slow C program is a ton of mallocs and frees strewn all over.
If we can know exactly where a particular object’s lifetime ends, then indeed we can write more efficient programs than possible with a GC, but that in the general case is not something we can determine at compile time (Rust’s lifetimes are a strict subset of the general case).
Anything more dynamic will require some “ad hoc, informally-specified, bug-ridden, slow implementation of a GC”.
Rust also relies on a GC for the more dynamic lifetimes in Rust programs (reference counting); it just doesn't call it a GC. And while Rust's GC has a lower footprint than tracing GCs, it has worse performance and suffers from cycle problems. I wouldn't call it "bug ridden" (perhaps "limited" is a better term), but it is slow.
Some think that reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly not compared to the generational ZGC presented here. If you could determine when the collection would take place with a refcounting GC, i.e. when the last reference is cleared, you wouldn't need a refcounting GC. It is unpredictable pretty much by definition.
> Rust's GC (...) has worse performance [than tracing GC]
That's an unsubstantiated claim.
While reference counting may be slower than tracing in the general case, Rust's reference counting is not the same thing as general reference counting in languages like e.g. Swift or Python. There are two reasons:
- The good majority (99%+) of objects in Rust don't need to be managed by reference counting at all.
- Rust relies heavily on move semantics to avoid updating the ref-counts, so the good majority of refcounted objects have very few (sometimes just 1) refcount updates.
> reference counting gives better performance predictability than a tracing GC, but that is no longer true, certainly compared to the generational ZGC presented here.
Where does ZGC claim nanosecond latency?
But even if it did, it would still be worse, because refcounting doesn't do pauses at all. A pause means pausing all application threads; spending X nanoseconds freeing memory on one thread does not count as a pause, because the system can still make progress.
But reference counting is just a tiny subcomponent of Rust's GC. The majority of Rust's automatic memory management is based on affine types and static code analysis, not refcounting. You're just picking subcomponents to make the Java solution look better, but what matters for developers is the whole picture.
And BTW, I wouldn't be so sure about throughput either.
ZGC is capable of allocation rates of the order of only ~5 GB/s, which is way below the theoretical memory bandwidth. I bet jemalloc can do much better.
I thought I made it clear in my last comment that I'm not claiming that Java's memory management is somehow "better" than Rust's. I was merely pointing out that even Rust has a GC, and it's not a good one, but it might certainly be good enough for Rust's needs as it doesn't depend on it. Overall, obviously Rust and Java choose very different tradeoffs in their memory management strategy.
> ZGC is capable of allocation rates of the order of only ~5 GB/s
That was the non-generational ZGC. Indeed, ZGC's max throughput was well below that of ParallelGC and G1. One of the main goals of generational ZGC is to increase the throughput, and as you can see in the video, they got a 4x increase in throughput.
The statement was that ZGC (or any JVM GC for that matter) is better than Rust’s GC, which is just naive ref counting. This is true - if we were to create a program that allocates the same amount in both implementations, Java would win by a huge margin.
Have you ever tested it yourself? I tested it, and no, it didn't win by a huge margin. malloc+free was only a tad slower than allocation under the default Java GC (stop-the-world, parallel), but the default GC is much faster than G1 and ZGC, so I would not be surprised if malloc could beat those modern ones in an allocation-rate benchmark.
Do you have a public benchmark for that? It's quite hard to write a benchmark that actually measures what we are interested in. Not doubting you; I'm just interested.
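For anyone who wants to poke at this themselves, here is a minimal JMH sketch of the Java side of such an allocation benchmark (class and field names are invented for illustration; writing the C counterpart so it measures genuinely comparable work is the hard part):

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.infra.Blackhole;

    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public class AllocationBench {

        // A small, short-lived object standing in for a typical allocation.
        static final class Node {
            final long a, b, c;
            Node(long a, long b, long c) { this.a = a; this.b = b; this.c = c; }
        }

        @Benchmark
        public void allocate(Blackhole bh) {
            // Consuming the object keeps the JIT from eliding the allocation entirely.
            bh.consume(new Node(1, 2, 3));
        }
    }

Run it under different collectors (e.g. -XX:+UseParallelGC vs -XX:+UseZGC via JMH's -jvmArgs) to get numbers relevant to the comparison above.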
It does give better performance predictability! Decrementing a refcount may either be a no-op or expensive but crucially the op can happen only during that particular call, nowhere else.
It does not. Maintaining the counter requires more work than tracing GCs do, and you don't know in advance when the last reference will be dropped. True, you get more promptness in deallocation (not the same as predictability), which reduces the footprint but comes at the expense of compaction, the lack of which in most refcounting GCs also adds unpredictability. In contrast, in the common case, allocation with a compacting, tracing GC is a pointer bump, and there's no overhead at all to copying references or other memory operations. During collection cycles, GC barriers are triggered, which add overhead to things like mutating pointers or reading pointers from objects in the heap.
There's a great paper (linked from this post https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/unifie...) showing how refcounting GCs can match the performance of tracing GCs -- in fact, how they can each have the properties of the other -- but that requires sophistication in the implementation of refcounting GCs that most of them don't have. So it is true that a sophisticated refcounting GC and a sophisticated tracing GC can exhibit similar behaviour, but while OpenJDK's tracing GCs are very sophisticated (especially ZGC and G1), most refcounting GCs in the wild are quite unsophisticated. It may certainly be true that unsophisticated refcounting GCs have more desirable properties than unsophisticated tracing GCs, but in practice the choice offered is between unsophisticated refcounting GCs and very sophisticated tracing GCs. The latter beat the former on most counts except for footprint overhead.
Ah, it's that topic again. The whole point "but refcounting requires more maintenance per object than GC" is totally moot when there are only 0.01% of objects being refcounted (in Rust or C++) vs 100% objects traced by GC in Java.
I never claimed otherwise. Java's and Rust's overall memory management strategies are far too different to compare. I was talking specifically about refcounting vs tracing.
> But what you call promptness is predictability from the PoV of the programmer, no?
I don't think so. You don't know when memory will be freed or how long that will take, only that it will be freed as soon as possible. The end result is a lower footprint, but that's about it.
Rust's lifetimes are a strict subset of the general case, but they cover a good majority of use cases with affine types, which means trivial allocation and deallocation. The number of times I've had to use lifetime specifiers or dynamic lifetimes (RefCell/Rc/Arc) is extremely low, so low that it isn't even a blip on the profile. Also, unlike most GC languages, Rust favors stack allocation over heap allocation, which makes most Rust programs' allocation rates several orders of magnitude lower than any Java program I've seen (at the expense of more inter-stack copying, but that is extremely cheap because the top of the stack is almost always in L1 cache).
In my experience most applications do need an escape hatch from a strictly tree-like lifetime pattern, it really is limiting.
Object layout is the primary reason for the performance differences between low-level languages and managed ones. Java uses a thread-local allocation buffer, which is for all practical purposes just as hot as the stack. What is more expensive is arrays of references and the like, but that will hopefully be solved by value types at some point.
EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
> EDIT: What I’m trying to say is that there is very little overhead to creating short-lived objects in Java.
I've been working on high-performance Java software (databases) for many years now, and the overhead of creating short-lived objects is bearable only because we go to such lengths to avoid creating those objects, writing Java in a kind of C style: no objects, just primitives as much as you can.
The total overhead is not just bumping the pointer and zeroing memory, but everything that happens when the GC runs out of the slab. By allocating short-lived objects you're making the GC collect earlier and do more work overall. Also, the faster you allocate, the more objects survive each collection and the more objects must be copied. That process is very expensive and also very complex.
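To make that style concrete, a small sketch of the difference (names are invented for illustration). The second method is what "C-style" Java tends to look like: operate on primitives directly, with no per-element allocation on the hot path:

    public class PrimitiveStyle {
        // Object style: one short-lived Point allocated per iteration.
        record Point(double x, double y) {}

        static double objectStyle(double[] xs, double[] ys) {
            double s = 0;
            for (int i = 0; i < xs.length; i++) {
                Point p = new Point(xs[i], ys[i]); // garbage the GC must eventually reclaim
                s += p.x() + p.y();                // (escape analysis may rescue this trivial case)
            }
            return s;
        }

        // "C-style" Java: primitives only, zero allocation per element.
        static double primitiveStyle(double[] xs, double[] ys) {
            double s = 0;
            for (int i = 0; i < xs.length; i++) {
                s += xs[i] + ys[i];
            }
            return s;
        }
    }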
This could be an interesting inquiry. Sad that you have to ruin it with snark.
I guess in some sense it was "solved" more than a decade ago in Azul's C4 (Continuously Concurrent Compacting Collector). I wonder how a generational ZGC will compare against it.
Java was released in 1995. I wonder if you have ever updated any software that came out before then? Surely you don't update your OS. Windows/Linux/Mac all came out before 1995, and must surely be "solved" by now. In fact you probably still use Windows 95. Surely Windows 98 or any later version can't be any better?
What's crazy to me is the sort of cognitive dissonance that you must have to hold a view like that, and still apply software updates at all.
Since garbage-collection-based memory models are much more complex and require a lot of software engineering, it is not a surprise that we have many iterations.
That's backwards. One proof of Rice's theorem relies on the fact that, if you could decide any interesting property of programs in general, then you could solve the halting problem; it doesn't follow that, if you could solve the halting problem, then you could decide any interesting property.
It happens to be true that, if you could solve the halting problem, then you could decide any interesting property of programs; but this is simply because the antecedent is false.
Being able to turn an interesting property solver into a halting problem solver is not the same thing as being able to turn a halting property solver into an interesting problem solver.
My claim, or at least my intent, was that you can solve an interesting-property decider about as well as you can solve the halting problem, i.e. not in a satisfying way (you only avoid false negatives and false positives with unlimited iterations).
You can solve it to some extent (e.g. does the program terminate in X steps).
But this is vacuous and unrelated to the halting problem or Rice's Thm. A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
"Interesting property", as defined by Rice's Thm, is "any semantic property that is not either always true or always false for all program executions." It does not mean "interesting" as in "interesting to humans."
And I'm talking about the decider, not the program being fed as input.
I'm not a computer scientist, but I am aware that "interesting" doesn't mean interesting to humans.
> And I'm talking about the decider
EDIT: If by decider you mean the program tested for having property P, that's a particular case and not applicable in general, ergo unimportant (a program that's a no-op can be shown to leak no memory as well).
So the property P you are testing always returns true? Then it's not interesting per your definition. Leaking memory, however, is interesting.
No that's not what I mean by decider. The decider is the program that takes the other program as input and tells you whether that input program has the semantic property you want.
The only noninteresting semantic properties of programs that exist are "this is a program" and "this is not a program."
> A program that emits "Yes" on every input will decide some property of programs with zero false negatives.
Followed by
> any semantic property that is not either always true or always false for all program executions
So what you said is your decider always outputs true. That's not an interesting property, because it's true for all inputs.
Btw, what is the point of your argument? It still doesn't refute my initial statement. If you want to determine an interesting property of a program, like leaks, automatic lifetime calculation, or whether the output is the square of the input, you run into Rice's theorem, which makes the problem undecidable. In other words, it's impossible to get correct yes and no answers algorithmically.