This isn't a bug in the compiler or the JVM. The Java spec explicitly allows this optimization to happen. From The Java Language Specification (SE 7), section 12.6.1:
> Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Also, the "this" keyword is treated like a parameter. (See the end of section 2.6.1 of the JVM spec.)
Fully agree. This is a well-known issue that users of native interop from managed languages commonly run into when they first start out. One of the first bugs I found in .NET was related to this: the Ping component didn't expect that Finalize could be called on an instance before a method on that same instance had finished executing.
What is less convenient is that Java doesn't have a handy method for keeping an object reachable, one that the JVM won't optimize away or eliminate. .NET has GC.KeepAlive; AFAIK, Java has no equivalent. I asked a question about this on SO[1] some time ago. An approximate equivalent can be cobbled together with probabilistic assignments to a volatile variable.
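For illustration, a minimal (non-probabilistic) sketch of such a cobbled-together equivalent, assuming only that the JIT cannot eliminate volatile stores; the class and method names here are made up:

    final class Reachability {
        // A volatile store cannot be optimized away, so the argument must
        // remain strongly reachable at least until the store completes.
        private static volatile Object sink;

        static void keepAlive(Object ref) {
            sink = ref;
            sink = null; // don't actually retain the object afterwards
        }
    }

Calling Reachability.keepAlive(this) as the last statement of a method would keep `this` reachable to that point, at the cost of two volatile writes per call; the probabilistic variant only pays that cost occasionally.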
I've done a fair bit of work interfacing memory managed languages like Java & Python with "unsafe" native code and found that this problem actually crops up often.
The first issue is that finalizers should NEVER be relied upon to clean up system resources the way you might rely on deterministic destruction in a language like C++. Instead, expect clients of interfaces that hold system resources to call some 'close'-like method explicitly, and use finalizers only to warn clients that they are leaking. Josh Bloch has an excellent writeup on this[†].
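A minimal sketch of that pattern (illustrative names, not from the article): an explicit close() does the real cleanup, and the finalizer only complains if a client leaked the object.

    public class NativeFrame implements AutoCloseable {
        private long nativeHandle;   // e.g. a pointer handed back by JNI
        private boolean closed;

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                // free the native resource deterministically here
            }
        }

        @Override
        protected void finalize() throws Throwable {
            try {
                if (!closed) {
                    // The client leaked this object: warn loudly instead of
                    // quietly relying on the finalizer to clean up "someday".
                    System.err.println("NativeFrame leaked; close() was never called");
                    close();
                }
            } finally {
                super.finalize();
            }
        }
    }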
The second thing, as I believe @btown is saying, is that the child object seems to depend on memory that it doesn't own a strong reference to. I bet the whole memory graph here is just a tangled mess. I used to think that was simply the nature of graphs. When I first started working with a GC that couldn't handle these kinds of cyclic structures without being explicitly told, I thought it was an absurd premature optimization. Then a wise old hacker peered at my code, smacked me on the back of the head, and grumbled that if I couldn't clearly explain the reference hierarchy, it wasn't just the software that would be befuddled; it was too complicated for his old brain as well.
I'm sure that this can't be true, but I am starting to suspect that all the tremendous work that has gone into fast pause-less concurrent GC has been for naught. I wish that when GC encountered a cycle it would just tell the programmer that they now have a leak, maybe even suggest where it could be fixed with one of those neat little IDE quick fixes, and continue chugging along.
tl;dr When you have a hard problem let someone else deal with it.
[†] Granted, Bloch does make an exception for 'native peers'. My understanding of that pattern, however, is that the key is not letting some other object walk off with a reference to your peer, which is exactly what has gone wrong here.
(Aside: Josh Bloch's Effective Java & Java Puzzlers are fantastic. They are worth reading for programmers who use any language.)
> the child object seems to depend on memory that it doesn't own a strong reference to
Exactly. Having built similar robotics interfaces myself, I suspect OP's memory graph isn't as complicated or cyclic as you might think: multiple types of frames (at various stages of visualization processing) look into each other's memory buffers for efficiency. It's probably best modeled as an acyclic directed graph where frame objects reference (many-to-many) memory buffers. The problem is that unless you abstract those buffers out into garbage-collectible objects themselves, it's very tempting to pass just a weak reference (like a raw pointer from JNI) to the buffer owned by an EarlyStageFrame into the processing code of a LaterStageFrame. Thus the segfault when the EarlyStageFrame is garbage-collected.
That is wonderfully clear. I haven't touched much Java post-NIO, but could the app just use something like java.nio.ByteBuffer to model the raw memory at the bottom of the graph, have EarlyStageFrames muck with those buffers via static native methods that take the ByteBuffers where needed, and then have LaterStageFrames play with the EarlyStageFrames, reaching inside them to get at their ByteBuffers only when absolutely needed for performance?
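Something like the sketch below is what I have in mind (class and method names are hypothetical, not from the original post): the direct ByteBuffers sit at the bottom of the graph, and later stages hold a strong reference to the earlier frame whose buffer they peek into.

    import java.nio.ByteBuffer;

    class EarlyStageFrame {
        static final int WIDTH = 640, HEIGHT = 480;   // illustrative dimensions
        final ByteBuffer pixels = ByteBuffer.allocateDirect(WIDTH * HEIGHT * 4);

        void process() {
            processNative(pixels);   // native code gets the buffer object, not a raw pointer
        }

        private static native void processNative(ByteBuffer pixels);
    }

    class LaterStageFrame {
        private final EarlyStageFrame source;   // strong reference keeps the buffer reachable

        LaterStageFrame(EarlyStageFrame source) { this.source = source; }

        ByteBuffer rawPixels() {
            return source.pixels;                // reach inside only when performance demands it
        }
    }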
> It's probably best modeled as an acyclic directed graph
My pet theory is that a good representation for most systems should maybe always be an acyclic directed graph?
Fantastic examples! I would also add möbius strips, toroidal spaces, and of course, the venerated circle. Sorry, my thoughts above are poorly worded. Perhaps what I am trying to say is that cycles should be created with intentionality, instead of willy nilly spilled throughout your code base. My complaint is really with all the mad spills of Java that I created when I thought that GC was some magic wand that freed me from having to consider the structure of the memory I was manipulating.
Someone should go around sabotaging regular expression parsers and replacing them with PEGs:
It is a crime against honest words to try to cram so much logic into so few characters.
I was once told that regular expressions were originally designed for creating AIs. The image I have in my mind is that of graduate students in a dungeon somewhere banging on typewriters, creating some sort of God box.
>I wish that when GC encountered a cycle it would just tell the programmer that they now have a leak, maybe even suggest where it could be fixed with one of those neat little IDE quick fixes, and continue chugging along.
Are you saying that cyclic data structures are always a bad idea? So no one should ever make a doubly-linked list?
Or, to take a case from a game I'm working on: I'm writing a set of GUI controls for the game's UI. There will almost always be at least a few controls on screen, so I have a WindowManager object that lives for the entire duration of the application and does, among other things, tasks like telling all of the controls to draw themselves, determining which control has focus, etc. The controls all have a reference to the WindowManager object (or whatever their direct parent is, once I start supporting more complex controls) so they can implement methods like onClick() { parent->removeGroupOfControls(); }
Even though the WindowManager is persistent, the individual controls come and go quite often. So if I was using a GC (I happen to be doing this in C++) that simply threw up its hands whenever it encountered a cycle, none of the controls would ever get garbage collected, leading to a huge memory leak over time.
Is this a case where you think cycles are appropriate? Or is there a better way to refactor this so it doesn't use cycles? (If someone has ideas for a better design, I would absolutely love to know.)
I cannot imagine creating any sort of complex system that does not, at some conceptual level, contain cycles. What I am suggesting is that the author of a complex system should, whenever possible, untangle as many of these cycles as possible by differentiating between the different classes of references. Think of it like the difference between your directory tree and your symbolic links in unix.
Regarding your example, I have generally found it effective to model UI hierarchies so that the top-level components refer to their children 'strongly' while the children refer 'weakly' to their parents. I am not familiar with the recent smart pointer stuff in C++, but I believe these concepts have been cleanly modeled with abstractions such as this:
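For the Java side of this thread, a rough sketch of that strong-down/weak-up scheme (illustrative names, not from any particular library): the manager owns its controls, while each control only holds a WeakReference back to its parent, so the ownership direction stays explicit and the structure maps directly onto a reference-counted implementation.

    import java.lang.ref.WeakReference;
    import java.util.ArrayList;
    import java.util.List;

    class WindowManager {
        private final List<Control> controls = new ArrayList<>();   // strong: owns the children

        void add(Control c) { controls.add(c); }
        void removeGroupOfControls() { controls.clear(); }          // children become collectable
    }

    class Control {
        private final WeakReference<WindowManager> parent;          // weak: only points back up

        Control(WindowManager parent) { this.parent = new WeakReference<>(parent); }

        void onClick() {
            WindowManager wm = parent.get();
            if (wm != null) {
                wm.removeGroupOfControls();
            }
        }
    }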
N.B. A more difficult conundrum I have encountered when dealing with UI hierarchies is whether child components should require a fixed reference to their parent passed at creation, or whether you should be able to 'reparent' them or even start them out parentless. My current thinking is that reparenting is a mistake and that interactions such as drag and drop are better handled by sending a gesture up to your data layer, which then regenerates the component in its new place. This might get ugly, however, if you have a heavily direct-manipulation-based UI that doesn't really need a data layer on top.
By the way, what has led you down the road of UI toolkits? They are a fetish of mine. Feel free to shoot me an email if you'd like to connect.
"I wish that when GC encountered a cycle it would just tell the programmer that they now have a leak..."
Wow. Overreact much? :-)
Garbage collection works great for one use case: managing heap memory. Please, let's not return to a universe where you need to free each chunk of memory piecewise. That makes freeing memory O(n) in the number of chunks freed, whereas modern collectors can free unlimited numbers of objects in O(1) time. Compared to this, malloc and free are a little like keeping score in soccer by tracking every action that does not result in a ball entering a net.
The trouble is finalizers. Finalizers suck. They tie the management of one resource (say native memory or file handles) to an unrelated resource (heap memory). This has the terrible consequences highlighted by this story, plus others. It keeps the native resource tied up until the collector decides to run, for one thing. It also harms performance in secondary ways even when the finalizer isn't running. For example, finalizers typically defeat escape analysis, meaning objects with finalizers can't be stack-allocated.
It's a good idea to use finalizers mostly for detecting invalid final states of objects (like leaking resources), as long as you stay aware that finalizers can slow down your program in unexpected ways.
[These are my personal opinions, not those of my employer.]
>That makes freeing memory O(n) in the number of chunks freed
A GC can't outdo that. For everything the GC frees, it has to do some work to answer the question, "Should I free this or not?" Even if that question can magically be answered with a single CPU instruction, that's still O(n) answers in the number of chunks freed.
I think you're referring to a mark and sweep collector. Java's garbage collector uses better algorithms. The collector never even looks at objects that are not referenced from the root set, and an allocation operation is nothing but a pointer bump. Even Cheney's algorithm from the early 1970s works this way, and collectors have only improved since then.
The JVM doesn't release the memory to the OS when garbage is collected; only when the heap shrinks. Any zeroing the OS might do is proportional to the size change in the heap, not to the number of objects freed.
However, that brings up a related issue - namely that the JVM requires all (or close enough to all) object memory to be initialized. And given that the amount of memory required to be initialized is at minimum linear w.r.t. the number of objects required to be initialized...
Work done is still O(n) minimum w.r.t. the number of objects allocated regardless.
Well, it's hard to deny that the work done by the program is O(n) (or even Ω(n)) in the number of objects created by the program. That's almost tautological.
The thing that is interesting about the asymptotic complexity of garbage collection, though, is that it gives you a peek into the future of GC overhead as heaps get larger.
Consider a program with some bounded steady-state memory consumption, and a GC that takes O(1) time to free unlimited amounts of memory. In such a system, you can reduce your GC overhead arbitrarily simply by enlarging the heap. A larger heap gives a larger time interval between collections, yet doesn't make the collections any slower. Don't like 6% overhead? Fine. Make the heap 10x larger and you have 0.6% overhead.
This time-space trade-off is always available to the end user of such a system. It's certainly not available with more primitive systems like malloc+free.
[These are my personal opinions, not those of my employer.]
It doesn't matter what the time to free memory is, as long as it is O(n) (or less) w.r.t. the number of objects; the limiting factor is allocation. In steady state (as you are assuming), all memory freed will be reused, (effectively) all memory used will be written to at least once, and the memory used by a group of objects is at least O(n) w.r.t. the number of objects.
So no, I don't see the advantage you describe. I see that, in the best case, it's on par with non-GC'd systems, asymptotically speaking.
And yet, I see plenty of disadvantages. To start:
I have yet to see a GC that's O(1) w.r.t. the amount of memory to free. Best case? I'll grant you that, for an extremely naive copying collector. But the chance of that best case occurring is... not significant. Especially with the more sophisticated algorithms.
And again, as above, it doesn't matter how long it takes to free memory. As long as it's linear or sublinear, it'll be dominated, asymptotically speaking, by how long it takes to initialize that memory again. And in steady-state operation it will be reinitialized.
Not to mention the practical effects.
For instance: GC thrashes cache. Badly. It completely kills the concept of a working set. You can't work around that by making your heap larger - if anything, that makes it worse. It makes the time period between thrashings longer, yes, but makes them worse when they do happen.
GC also doesn't scale well to large numbers of cores. In particular, your worst case "always" will be when the GC has to check with every core to see if it still has a reference to something. And given that processor numbers are where most of the performance improvements seem to be coming in... You mentioned memory size improvements but didn't mention increasing numbers of cores.
GC also (almost always) adds header field(s) to every object: a minimum of O(1) overhead per object initialized.
Which means that, generally speaking, the overhead doesn't asymptote to zero. It asymptotes to a constant - and non-negligible, at least from what I've seen - amount of overhead.
I seem to have hit a nerve here. Perhaps I haven't been clear about some of the points I'm making, which I don't think are all that contentious.
I'm not saying a fast GC can reduce the cost of allocation. I'm saying a fast GC can reduce GC overhead. I think that's an uncontroversial statement. I can only insist so many times that Java's GC doesn't touch dead objects at any time during its scan. If you want to disagree with me, that is your prerogative.
I agree that a GC walk can thrash cache when it happens. However, a copying GC can also rearrange object layout to improve locality. Which effect is more significant depends on the workload.
I didn't mention the number of cores because it didn't occur to me that it was significant to you. GCs can scale just fine to many cores. No core has to "check" other cores: each core can check its own local data, depending on the tactics employed to deal with NUMA. There are always challenges in scaling any parallel workload, of course, and there are always solutions.
Our GC (in IBM's J9 JVM) uses 8 bits in each object. With classes aligned to 256 bytes, there are eight free bits in each object's class pointer. Hence, J9's object model has one word of overhead on top of the user's data, which is what malloc/free typically needs to track each object's size anyway. No disadvantage there.
[These are my personal opinions, not those of my employer.]
>and a GC that takes O(1) time to free unlimited amounts of memory
If the unlimited amounts of memory are all in one contiguous block and ALL of them are ready to be free'd (and determining that all of them are ready, takes O(1) time somehow) then what you say is plausible.
In any other case it's fantasy. Suppose I have n variables and I point each of them at a distinct new object (not pointed to from anywhere else). Then I re-set a random subset of them to null. Based on which ones I randomly set to null, there's 2^n possibilities for what the GC must do. If your claimed GC O(1) is long enough for the processor to take k conditional jumps, there are at most 2^k possible results of running it. For n>k, there are necessarily subsets in the above scenario which your GC fails to handle.
>This time-space trade-off is always available to the end user of such a system
People routinely run servers on virtual machines with very limited RAM.
Your argument proves that the time taken can be no less than O(n) in the number of object references in live objects. The (potentially billions of) unreachable objects are never touched.
I agree that limited RAM environments are important. Asymptotic complexity only matters when you're effectively operating at the asymptote.
Yeah, managing memory with malloc and free directly was a nightmare. Back when I worked with Java, however, GC still suffered from unpredictable performance as well as a requirement to have much larger heaps. Are these now mostly solved problems? If not, what do you think of something like this (syntax aside):
My personal experience with similar such systems was that even when performance was not a concern the cost of having to untangle and document my reference structure was well worth the time.
Has any language ever gotten the balance of tradeoffs for finalizers right for strong GC?
Why is that? I would have thought they would work really well together - if an object such as a file stream doesn't escape you could inline and run the finaliser after the last use of the object goes out of scope.
Or do you mean they defeat current implementations of escape analysis?
It's hard to say. The finalization spec is subtle.
To make this work, the JIT would need to perform escape analysis on the finalizer too (since the finalizer can make an object escape). If that analysis succeeds, then I suppose the object could be stack-allocated and the finalizer called directly. Even then, though, you would have to think carefully about all the subtle ramifications, like the consequences of running a finalizer on an application thread instead of the finalizer thread. (What if the finalizer grabs a lock in an attempt to protect itself from the application code? Now this becomes a recursive lock by one thread, and so both the application code and the finalizer can enter it at the same time!)
If this ever became important enough, it's possible the JIT developers could get this right in every case with enough effort invested. I wouldn't count on that happening any time soon.
[These are my personal opinions, not those of my employer.]
(Whoops, I've misunderstood you twice. I think my other reply answered the wrong question. Let me try again.)
I didn't say synchronization can prevent EA. I said that calling a finalizer on an application thread can defeat the mutual exclusion that would usually be guaranteed by Java's locking. If the app thread holds a monitor when the finalizer is called, that would usually make the finalizer block until the app exits the monitor. If they're on the same thread, that becomes a recursive monitor enter on the same thread, which doesn't block.
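A contrived sketch of the hazard (hypothetical class, not from the thread): if a JIT ever stack-allocated this object and ran the finalizer inline on the application thread once `this` was no longer needed, the synchronized finalize() would re-enter the monitor already held by use() instead of blocking, so closeNative() could run before the touchNative(handle) call even though both methods are synchronized.

    class NativeHandle {
        private long handle = openNative();   // imagine a resource obtained via JNI

        synchronized void use() {
            // The monitor is held here, and the last use of `this` is the field
            // load below. A finalizer run on a separate finalizer thread would
            // block on this monitor; run inline on the *same* thread, it simply
            // re-enters, and the lock protects nothing.
            touchNative(handle);
        }

        @Override
        protected synchronized void finalize() {
            closeNative(handle);   // assumed unsafe to interleave with use()
            handle = 0;
        }

        private static native long openNative();
        private static native void touchNative(long h);
        private static native void closeNative(long h);
    }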
Even if you and I solve this particular problem, these are only the problems that occurred to me off the top of my head. With these kinds of subtle, thorny issues lurking around every corner, I wouldn't expect JIT development teams to give this any real attention unless there's some workload where it matters.
[These are my personal opinions, not those of my employer.]
As I recall, the memory hierarchy was actually simple--the issue was that it called some structure_free(parent) in C++, which freed the entirety of that part of the hierarchy--the parent and the children; which would obviously cause problems if the children weren't done working.
That said, I'm looking over my backups from that era; I can't quite figure out what real code corresponds to the pseudo-code in the post (I didn't have the real code when I wrote the post).
    public type1 getFrame() {
        type2 child = this.getChild();
        type3 var = this.something();
        // `this` may now be garbage collected
        return child.somethingElse(var); // segfault comes here
    }
> Where the destructor method of this calls a method that will free() native memory that is also accessed by child; if this is garbage collected before child.somethingElse() runs, the backing native code will try to access memory that has been free()ed, and receive a segmentation fault.
If child needs some block of shared memory contained in the parent (this)... shouldn't there be a shared object referenced by both parent and child that owns the block of shared memory? Then as long as child isn't GC'd, even if the parent/this is GC'd, the shared memory-owner won't be GC'd, and the memory will be safe.
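A sketch of that refactoring (hypothetical names): both frames hold a strong reference to an object that owns the native block, and the block is only freed once that owner itself becomes unreachable.

    class SharedNativeBuffer {
        final long address = allocateNative();    // owns the native block

        @Override
        protected void finalize() {
            freeNative(address);                   // freed only when no frame references the owner
        }

        private static native long allocateNative();
        private static native void freeNative(long address);
    }

    class ParentFrame {
        final SharedNativeBuffer buffer = new SharedNativeBuffer();

        ChildFrame getChild() { return new ChildFrame(buffer); }
    }

    class ChildFrame {
        private final SharedNativeBuffer buffer;   // strong reference: the parent may now be GC'd safely

        ChildFrame(SharedNativeBuffer buffer) { this.buffer = buffer; }

        void somethingElse() {
            // Pass the owner object itself down to native code; the JNI argument is a
            // local reference, so the owner stays reachable for the duration of the call.
            useNative(buffer);
        }

        private static native void useNative(SharedNativeBuffer buffer);
    }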
For the C++ equivalent of this pattern, you'd want to scope the shared memory to a reference-counted RAII resource (say, a shared_ptr), and make sure that any actor that interacts with it holds a reference for the actor's lifetime. It's unreasonable for Java to "guess" that you meant anything different from this pattern - what if you actually wanted the parent object to be eligible for GC (say, if it's hogging other resources the child doesn't need) between this.something() and child.somethingElse()?
So I'd consider this a WontFix if I were a JVM developer, and maybe write a blog post about it and put it on HN if I had the time. Something something feature not a bug, etc.
EDIT: To be clear, the child is trying to have low-level access to a resource it has no ownership stake in, implicitly or explicitly. So there's no reason for that resource to stick around for the child. It's like a spouse taking the house in a divorce - the "prenup" in Java is that only reference-holders have any expectation for an object to be accessible to them. If the "child" didn't want to find itself segfaulted on the streets, it should have thought about writing that "house" (the shared memory) into its prenup, hmm?
EDIT 2: The analogy kind of fell apart, I think. But my point still stands. Apologies to any divorced HN'ers I may have offended.
This is not a bug. The author doesn't seem to understand how the JVM or managed memory works.
He uses memory from the parent object in child code without holding a reference to it in the child object. If his code were pure Java, this simply wouldn't be possible; the only reason he can do it at all is via native code.
Even in native code, it is wrong to hold a pointer to managed memory without using the global/local reference functions (NewGlobalRef and friends) that the JVM provides.
In summary, the JVM is behaving correctly with respect to GC. The author doesn't know how to use JNI and has caused the crash himself.
>The issue was that Java was making an unsafe optimization (I never bothered to figure out if it is the compiler or the JVM making the mistake, I was satisfied once I had a work-around).
He says it was the JVM or the compiler's fault. He doesn't blame the code.
IMO, whether he thought this explicitly or not, he was really just disagreeing with the Java spec. Of course, effectively any bug you encounter could be dealt with by blaming the spec writers, but in this case I may agree with him. It depends on how much this particular optimization saves.
I've faced similar issues in the .NET world. There is a built-in method, `GC.KeepAlive()`, that does nothing other than keep the reference alive in the eyes of the garbage collector.
It completely defeats the whole point of a garbage collector if you have to resort to hacks like this. I understand why it is done: it's a lot easier to use the workaround than to fix the root problem.
Unfortunately, these types of bugs have a nasty way of showing up in other places where they can be even harder to debug. Long term, these bugs need to be fixed, otherwise your code is just a ticking bomb. The worst part is when they combine with other bugs and become even harder to track down.
I wouldn't say "completely defeats". The hacks are usually there only when you need to manage the lifetime of something that's not managed by the garbage collector. I've seen it used for mutexes, for example, to keep a mutex alive and locked.
You could say that the problem is that people abuse the garbage collector to manage something it shouldn't manage, or you could say that the problem is that garbage collectors are bad at managing anything that isn't memory. (I would lean towards the latter.)
Nope: when you have handles on resources outside managed memory, the GC is out of the picture and you need an acquire/release mechanism. The resource will probably have a bit of state visible on the managed side, so there will be some pinned memory.
I am very disappointed that this.getSize(); at the end fixes the issue. Since it has no side effects and the value is never used anywhere, I would expect the compiler to remove that call.
Me too. I would fully expect the JIT compiler to remove this call unless it's more complicated than it sounds. I don't think this is a real fix.
I work on IBM's JIT compiler. We had to do something similar to fix this problem in DirectByteBuffer a few years ago, by adding a call to a magic method called keepAlive. To make this work, we had to teach the JIT to handle this method specially, or else it would have ripped it out, and the fix would have had no effect.
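The shape of that fix, roughly (a sketch; keepAlive here is a stand-in for the magic method described above, whatever its real name and location): the extra call does nothing at runtime, but the JIT is taught to treat it as a use of the buffer, so the buffer cannot be collected before the raw-pointer access finishes.

    class DirectAccessExample {
        // Stand-in for the JIT-recognized no-op described above; an ordinary empty
        // method like this would simply be optimized away, which is the whole problem.
        static void keepAlive(Object referent) { }

        static void readThroughRawPointer(java.nio.ByteBuffer direct, long rawAddress) {
            readNative(rawAddress);   // `direct` is not otherwise used here, so it may already be dead...
            keepAlive(direct);        // ...unless something downstream still counts as a use
        }

        private static native void readNative(long rawAddress);
    }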
[These are my personal opinions, not those of my employer.]
The Java compiler does almost no optimization. I think it's an idealistic thing: All optimization is supposed to be done by the JVM. Personally I think that sounds stupid.
This is a very sound approach. The main benefit is that if the optimizations live in the JVM, all JVM languages can take advantage of them, so logic duplication is reduced.
Also, please note that both Clang and GCC use the same approach, except that in their case the programmer typically never sees the intermediate code (LLVM IR for Clang, and a family of intermediate languages such as GIMPLE and RTL for GCC).
Except that, as evidenced by this case, the JVM isn't doing the optimization either!
(Also: it's not a stupid approach. Due to the way the JVM loads things, most of the time you don't know that any particular method won't be overridden later by something that does, in fact, have side effects. There are very few optimizations that could actually be done by the Java compiler.)
Yes, actually I find it hilarious that the JVM at the time had an aggressive garbage collector, but a relatively weak optimizer (it should have optimized out the call to this.getSize()).
Since the VM apparently does optimization now, perhaps that won't get eliminated until the code path is hot, i.e., after a few thousand calls to the method.
Ah, yes, the instability of fake fixes to compiler bugs. I recall a friend of mine once had a bit of C code that would compile and run correctly only if debugging was enabled. His fix, to get it to run with full optimizations without segfaulting, was what he referred to as "the God code": int god = 1; To this day no one knows why that did anything at all, but at some point the compiler bug was fixed without anyone ever filing it, as far as we know.
That's a classic symptom (and "fix") for an array that was overrun or underrun by one (or some small number of) elements: the extra allocated space on the stack may have prevented it from walking into unallocated memory (and segfaulting).
Now, if he'd run it under valgrind and it showed no problems of that sort, and it still segfaulted...
My understanding is that it was reversed from that: with stack canaries and debug tracing nothing odd was happening, but when things were run without those defenses it _stopped_ working, instead of starting to work because you're paying less attention. That was why it was so odd; an array overrun (or passing (void*) &local_var into subroutines where it's cast to a different type, etc.) would otherwise have been the obvious candidate.
I'm curious. Was it something that existed across many JVM updates? Given the stability of the language, whenever I have seen an issue (rare), there's always been a choice of 10-20 JVM versions to pick from.
The competition gives us six weeks to build the robot, before we have one or two three-day competitions. Apart from possibly attending unofficial off-season competitions, that is the only time we would use the code before the next year's competition, which is completely different, and for which we are required to write entirely new code[1].
Once we got a hack working, we ran with it and rarely tried to clean up. In this case, we didn't bother trying a different JVM.
[1] Unless the code is open source (as ours was), in which case we were explicitly allowed to use it.
One more reason not to use "finalize". Did you know that there is no guarantee that it ever gets called? That's because there is no guarantee that objects will be garbage collected as long as there is enough memory.
I only have an amateur interest in JVM development, but doesn't this bug break some rules around GC 'object roots'? Even if 'child' and 'something' are referentially copied to local variables on the calling stack, I believe 'this' should continue to act as a root for both of them. I think.
"this" can be collected in the body of a method, if the last reference dies before the method returns. It is treated no differently from the explicit method parameters.
Not necessarily. Those "getChild()" and "something()" methods could, for example, return new objects. 'this' is only the root of child and something if it retains references to those objects (i.e., as member variables), which I'm guessing was not the case in this example.
Ah, yes, I didn't consider that. But what about 'this'? For what reason could the JVM choose to GC an object whose method the calling thread is currently executing? The example method exists on an instance and isn't static.
Execution in Java does not have a concept of "current object". 'this' is passed in a magic way, but it does not exist in a magic way. It's like any other variable, and can be collected as soon as it's not needed. Which can be extremely early, because optimizations can reduce the 'need' for a variable to only its side effects.
Go pick apart aggressively optimized and inlined code and I bet you can find situations where 'this' never even exists.
There have been 116 updates since then (33 public). Given that segfaults almost never happen in the JVM (I've never seen one) I can imagine it would be the highest priority to be fixed.
> Given that segfaults almost never happen in the JVM (I've never seen one)
I've seen 5 segfaults over the last week in our system. Unfortunately, we don't enable core dumps in our system (yet), so we haven't really had a good sense of what might be causing them. We've seen about 30 SIGSEGVs over the last three months, but we haven't been able to root-cause them yet.