I remember the 2x rule from 20 years ago - do you know if things have changed? If locality is more important now, tracing GC might never be as performant as reference counting. Either you use 2x the memory and thrash your cache, or you use less and spend too much CPU time collecting.
Java has had AOT compilation for a while, so traditional GC and its massive overhead are no longer a strict necessity. Even AOT Java will probably stay behind Swift or any other natively compiled language in terms of memory usage, but it shouldn't be that drastic.
As for performance and locality, Java's on-the-fly pointer reordering/compression can give it an edge over even some compiled languages in certain algorithms. Hard to say if that's relevant for whatever web framework Apple based their service on, but I wouldn't discount Java's locality optimisations just because it uses a GC.
"For a while" means since around 2000, although toolchains like Excelsior JET and WebSphere Real Time, among others, were only available to companies that cared enough to pay for AOT compilers and JIT caches.
Nowadays, to add to your comment, all major free-beer implementations (OpenJDK, OpenJ9, GraalVM, and the ART cousin) do AOT and JIT caches.
Even without Valhalla, there are quite a few tricks possible with Panama: one can manually create C-like struct memory layouts.
Yes, it is a lot of boilerplate; however, one can get around the boilerplate with AI (maybe), or just write the C declarations and point jextract at them.
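To make the Panama trick concrete, here's a minimal sketch using the finalized java.lang.foreign API (JDK 22+). The Point layout, field names, and class are all invented for illustration:

```java
import java.lang.foreign.*;
import java.lang.invoke.VarHandle;

// Minimal sketch: a C-like 'struct Point { int x; int y; }' laid out off-heap
// with Panama (java.lang.foreign, finalized in JDK 22). Names are invented.
public class PointLayout {
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));
    static final VarHandle X = POINT.varHandle(MemoryLayout.PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(MemoryLayout.PathElement.groupElement("y"));

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Ten structs allocated contiguously: packed data, no pointer chasing.
            MemorySegment points = arena.allocate(POINT, 10);
            for (long i = 0; i < 10; i++) {
                MemorySegment p = points.asSlice(i * POINT.byteSize(), POINT.byteSize());
                X.set(p, 0L, (int) i);
                Y.set(p, 0L, (int) (2 * i));
            }
        }
    }
}
```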
> Java has had AOT compilation for a while, so traditional GC and its massive overhead are no longer a strict necessity.
You mean it does escape analysis and stack-allocates what it can? That would definitely help, but not eliminate the GC. Or are you thinking of something else?
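Something like this, where the temporary object never escapes the method? (A toy example; whether HotSpot actually scalar-replaces it depends on inlining and which JIT tier compiles it, so treat it as illustrative rather than guaranteed.)

```java
// Toy example: 'p' never escapes distance(), so HotSpot's escape analysis can
// scalar-replace it and avoid the heap allocation in JIT-compiled code.
// Class and method names are invented for illustration.
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

class EscapeDemo {
    static double distance(double x, double y) {
        Point p = new Point(x, y);  // allocation candidate for scalar replacement
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }
}
```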
Thinking about it more, I remember that Java also has some performance-hostile design decisions baked in (e.g. almost everything's an Object, arrays aren't packed, dynamic dispatch everywhere). Swift doesn't have that legacy to deal with.
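A toy sketch of the array point, assuming current, pre-Valhalla Java (Vec2 is an invented stand-in):

```java
// Toy illustration of "arrays aren't packed": this array holds 1000 references,
// and each Vec2 is a separate heap object with its own header, so iteration
// chases pointers instead of streaming contiguous memory the way a packed
// layout (a Swift struct array, or a C array) would.
class Vec2 {
    double x, y;
    Vec2(double x, double y) { this.x = x; this.y = y; }
}

class ArrayLayoutDemo {
    public static void main(String[] args) {
        Vec2[] vs = new Vec2[1000];               // 1000 pointers, not 1000 vectors
        for (int i = 0; i < vs.length; i++) vs[i] = new Vec2(i, i);
        double sum = 0;
        for (Vec2 v : vs) sum += v.x;             // one dereference per element
        System.out.println(sum);
    }
}
```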
Java also has a strong culture of writing optimization-resistant code, so there’s the question of whether you’re talking about the language itself or various widespread libraries, especially if they’re old enough to have patterns designed around aesthetics or now-moot language limitations rather than performance.
I’ve replaced Java code with Python a few times, and each time, even though we did it for maintenance reasons (more Python devs available), we saw memory usage more than halved while performance at least doubled, because the code used simpler functions and structures. Java has a far more advanced GC and JIT, but at some point the weight of code and indirection wins out.
That’s why I said “culture” - by all rights the JVM should win that competition. I wrote a bit more about the most recent one in a sibling comment but I’d summarize it as “the JVM can’t stop an enterprise Java developer”.
> Enterprise developers and architects would be the same, regardless of the programming language.
This is true to some extent, but the reason I focused on culture is that there are patterns which people learn and pass on differently in each language. For example, enterprise COBOL programmers didn’t duplicate data in memory to the same extent, not only due to hardware constraints but also because there wasn’t a culture telling every young programmer that this was the exemplar style to follow.
I totally agree about C++ having had the same problems, but most of the enterprise folks jumped to Java or C#, which felt like it improved the ratio of performance-sensitive developers in the community of people still writing C++. Python had a bit of that, especially in the 2000s, but a lot of the Very Serious Architects didn’t like the language and so they didn’t influence the community anywhere near as much.
I’m not saying everyone involved is terrible; I just find it interesting how we like to talk about software engineering when a lot of the major factors are basically things people want to believe are good.
> I’ve replaced Java code with Python a few times ... while performance at least doubled
Are you saying you made Python code run twice as fast as Java code? I have written lots of both. I really struggle to make Python go fast. What am I doing wrong?
More precisely, when deploying the new microservice, it used less than half as much CPU to process more requests per second.
This is not “Java slow, Python fast” – I expected it to be the reverse – but rather that the developers who cranked out a messy Spring app somehow managed to cancel out all of the work the JVM developers have done without doing anything obviously wrong. There wasn’t a single bottleneck, just death by a thousand cuts with data access patterns, indirection, very deep stack traces, etc.
I have no doubt that there are people here who could’ve rewritten it in better Java for significant wins but the goal with the rewrite was to align a project originally written by a departed team with a larger suite of Python code for the rest of the app, and to deal with various correctness issues. Using Pydantic for the data models not only reduced the amount of code significantly, it flushed out a bunch of inconsistency in the input validation and that’s what I’d been looking for along with reusing our common code libraries for consistency. The performance win was just gravy and, to be clear, I don’t think that’s saying anything about the JVM other than that it does not yet have an optimization to call an LLM to make code less enterprise-y.
Okay, I understand your point. Basically, you rewrote an awful (clickbait-worthy) enterprisey Java web app into a reasonable, maintainable Python web app. I am sympathetic. Yes, I agree: I have seen, sadly, far more trashy enterprisey Java apps than not. Why? I don't know. The incentives are not well-aligned.
As a counterpoint: look at Crazy Bob's (Bob Lee, R.I.P.) Google Guice, Norman Maurer's Netty.IO, or Tim Fox's Vert.x: all of them are examples of how to write ultra-lean, low-level, high-performance modern Java apps... but they are frequently overlooked in favor of hiring cheap, low-skill Java devs to write "yet another Spring app".
> but they are frequently overlooked in favor of hiring cheap, low-skill Java devs to write "yet another Spring app".
Yeah, that’s why I labeled it culture, since it was totally a business failure, with contracting companies basically going “why hire these expensive people when we get paid the same either way?” No point in ranting about the language; it can’t fix the business, but unfortunately there’s a ton of inertia around that kind of development and a lot of people have been trained that way. I imagine this must be very frustrating for the Java team at Oracle, knowing that their hard work is going to be buried by half of the users.
IMO the “Spring fever” is the most horrible thing that has happened to Java. There genuinely are developers and companies that reduce the whole language and its ecosystem to Spring. This is just sad. I’m glad that I have been working with Java for 15+ years and have never touched any Spring stuff whatsoever.
It all depends, but one major advantage of the way the JVM GCs is that related memory tends to be colocated. This is particularly true of the serial, parallel, and G1GC collectors.
Let's say you have an object graph that looks like A -> B -> C. Even if A, B, and C were allocated at very different times, with other allocations in between, the next time the GC runs it will traverse the graph and, assuming A is still live, place them in memory as [A, B, C]. That means even if memory originally looks something like [A, D, B, Q, R, S, T, C], the act of collecting and compacting has a tendency to colocate.
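Here's a toy semispace-style copying collector to make that concrete. Everything below is invented for illustration and is nothing like HotSpot's real implementation, but it shows how copying in discovery order colocates the live A -> B -> C chain:

```java
import java.util.*;

// Toy copying collector: live objects are copied into "to-space" in the order
// the trace discovers them, so a live A -> B -> C chain ends up contiguous
// regardless of how fragmented the old heap was.
class ToyCopyingGC {
    static class Obj {
        final String name;
        Obj ref;                                      // single outgoing reference
        Obj(String name) { this.name = name; }
    }

    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();        // index = new "address"
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (forwarded.containsKey(o)) continue;   // already copied
            Obj copy = new Obj(o.name);
            forwarded.put(o, copy);
            toSpace.add(copy);
            if (o.ref != null) pending.push(o.ref);
        }
        // Fix up references so the copies point at each other.
        for (Map.Entry<Obj, Obj> e : forwarded.entrySet())
            if (e.getKey().ref != null) e.getValue().ref = forwarded.get(e.getKey().ref);
        return toSpace;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        a.ref = b; b.ref = c;
        // Imagine from-space laid out as [A, D, B, Q, R, S, T, C]; only A's
        // chain is live, so after collection to-space is exactly [A, B, C].
        for (Obj o : collect(List.of(a))) System.out.print(o.name + " ");
    }
}
```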
That's the theory -- a compacting collector will reduce fragmentation and can put linked objects next to each other. On the other hand, a reference count lives in the object, so you're likely using that cache line already when you change it.
I don't know which of these is more important on a modern machine, and it probably depends upon the workload.
The problem is memory colocation, not RC management. But I agree, it'll likely be workload-dependent. One major positive aspect of RC is that the execution costs are very predictable. There's little external state which can negatively impact performance (like a GC currently running).
The downside is fragmentation and the CPU time required for memory management. If you have an A -> B -> C chain where A is the only owner of B and B is the only owner of C, then when A's count hits 0, it has to do 2 pointer hops to deallocate B and then deallocate C (plus arena management for the deallocs).
One of the big benefits of the JVM's moving-style collectors is that when A dies, the collector does not need to visit B or C to deallocate them. The collector only visits and moves live memory.
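A toy sketch of that contrast, with invented names: releasing A under refcounting walks the whole dying chain, touching B and C on the way down, which is exactly the work a moving collector skips.

```java
// Toy reference-counting cascade: when A's count hits zero, teardown must hop
// through each dying object (B, then C) to decrement and free it. A tracing,
// moving collector never visits these dead objects at all.
class ToyRC {
    static class Node {
        int refCount = 1;       // one owner
        Node next;              // this node is the sole owner of 'next'
    }

    static void release(Node n) {
        while (n != null && --n.refCount == 0) {
            Node dead = n;
            n = n.next;         // pointer hop into the dying object
            dead.next = null;   // simulated free (plus arena bookkeeping in real RC)
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.next = b; b.next = c;
        release(a);             // visits A, B, and C to tear the chain down
    }
}
```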
> The downside is fragmentation and the CPU time required for memory management. If you have an A -> B -> C chain where A is the only owner of B and B is the only owner of C, then when A's count hits 0, it has to do 2 pointer hops to deallocate B and then deallocate C (plus arena management for the deallocs).
I suspect this puts greater emphasis on functionality like value types and flexibility in how objects are composed. You can trend toward larger objects rather than nesting inner objects for functionality. For example, you can use tagged unions to represent optionality rather than pointers.
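A sketch of the tagged-union idea in Java, with a hypothetical Order type: instead of a nullable or Optional field that adds a heap object and a pointer hop, the tag and payload are inlined into the parent.

```java
// Sketch of "tagged union instead of pointer" for optionality. The flag and
// payload live inline in the parent object, so reading the discount is a field
// access, not a pointer chase. Order and its fields are invented for
// illustration; Valhalla-style value types would make such flattening cheaper
// and more general.
class Order {
    // Nested-object style (commented out): an extra allocation and indirection.
    // Optional<Long> discountCents;

    boolean hasDiscount;    // the tag
    long discountCents;     // the payload; meaningful only when hasDiscount is true

    long effectivePriceCents(long baseCents) {
        return hasDiscount ? baseCents - discountCents : baseCents;
    }
}
```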
The cost of deep A -> B -> C relationships in Java comes during collections, which by default still stop the world. The difference is that a reference-counting GC does its work on these chains while tearing down dead objects, while a tracing GC does its work visiting live objects.
So garbage collection is expensive for ref-counting if you are creating large transient datasets, and expensive for a tracing GC if you are retaining large datasets.