Creating a memory leak with Java

TimJYoung · on April 18, 2017

So, a question: is it better or worse that the usage of GCs has changed the mindset of developers to stop worrying about allocation/freeing of resources ?

Given that developers still need to worry about freeing/releasing native resources, I think it might have inadvertently created a false sense of security and, in the process, resulted in developers that don't even possess this type of thinking anymore. We ran into this with the .NET Data Provider for one of our database engines. A good portion of .NET developers that used it were surprised to find out that they needed to call the Dispose method on certain database objects in order to free/release their native handles/resources. This was followed by a period of confusion about what to dispose of, and what not to dispose of (like this: http://stackoverflow.com/questions/2024021/does-the-dispose-...).

jerf · on April 18, 2017

"resulted in developers that don't even possess this type of thinking anymore."

The good old days weren't. What it resulted in on average was developers who just bashed on their manual allocation until it "seemed to work", not glorious hand-constructed shimmering perfect crystals.

At least now the code that "seems to work" usually does, and is memory-safe now, so yes, it's an improvement. If you still want to be careful and do it by hand in such a language, they're still there, and there's even better options arising.

mschaef · on April 18, 2017

> At least now the code that "seems to work" usually does,

Some of that may come down to better hardware. It's possible to be incredibly naive with today's hardware and still get results that would've taken high level engineering 20 years ago.

Edit: This is down-modded, but it's fundamentally good news. Increases in our capacity to do productive work with these machines come from both software and hardware improvements. Not sure what else you'd expect.

sgift · on April 18, 2017

> Edit: This is down-modded, but it's fundamentally good news. Increases in our capacity to do productive work with these machines come from both software and hardware improvements. Not sure what else you'd expect.

The downvotes are probably from the realization that this is good and sad news at the same time. On the one side, it is good news because many software projects wouldn't have been realized if you couldn't be that naive, on the other side who knows what would be possible with a bit more "discipline" (for lack of a better word)

kpil · on April 18, 2017

I think you could leak a little 20 years ago too, or use unnecessary amounts of memory - in most cases.

It's just a few applications and parts of applications that really are resource constrained and critical enough to really matter...

mcguire · on April 18, 2017

Certainly. All those Unix-style filter programs had little need to be too careful since they didn't run for long.

On the other hand, anyone know if the leak in the X server ever got fixed? IIRC, it was small enough that it was only a problem running the server for months on a resource constrained machine.

gopalv · on April 18, 2017

> At least now the code that "seems to work" usually does, and is memory-safe now, so yes, it's an improvement.

The most significant part of that statement comes from accepting contributions into your project.

Java has made it much saner to just look at a patch and LGTM +1, over any work I've ever done in C/asm.

Trust in the careful handling of memory by a stranger has been replaced by trust in the system to complain if they don't ... which has made for bigger and wider communities of strangers sending each other patches.

mcguire · on April 18, 2017

The power of a notation is not in what it lets you think about, but in what it lets you not think about.

mschaef · on April 18, 2017

Agreed... the notational convenience of GC is easy to overlook but important. Particularly when anonymous closures, etc get involved, it's very useful to not have to save off a reference and pass it around to the point where it can be manually freed.

pbh101 · on April 18, 2017

Also, many more tasks are now economically feasible whereas before, the high fixed upfront cost of programming would have made them never done. In other words, we're making more programs now to do more more-trivial things because the cost of doing so is lower, and that's more or less a good thing.

mikeash · on April 18, 2017

I don't think it has really changed the mindset of developers. All it has done is lessen the consequences of not caring, or of caring and getting it wrong.

I once took over an app that did a live search of a dictionary as the user typed. Looking over the code, I noticed that the memory management was a little funky, so I did a leak analysis on it. Turned out it was leaking something like 3,000 objects with every single keystroke. It still "worked," because the OS acts as the ultimate GC by cleaning up all of an app's memory when it terminates. Users mostly didn't leave it running long enough for the objects to build up, and if they did then it would die on its own eventually.

There's a famous story of Microsoft's heroic efforts to maintain backwards compatibility in Windows. The original version of SimCity would free memory and then read from it. This worked fine on Windows 3.x because the newly freed memory would be left undisturbed until something else allocated new memory. Windows 95 didn't work that way, so SimCity started crashing. Microsoft added code to check if SimCity was running and change the behavior of the memory allocator to stop it from crashing.

I've been programming on Apple platforms for a long time. The current standard languages are Objective-C and Swift, both with automatic reference counting (which is mostly a GC, although you have to manually break cycles). Automatic reference counting is a relatively new addition to Objective-C, and before 2011, everyone did manual reference counting instead. For a long time, the state of the art was to follow the official rules for writing refcounting code and occasionally look for leaks and crashes by running memory debugging tools on the live app. About eight years ago, the Clang project put out their static analyzer which could look at Objective-C code and tell you when you weren't following the rules properly. The reaction of every Objective-C programmer I knew, even the really good ones, upon running the static analyzer was something like, "holy shit, I had no idea my code was so full of bugs!"

algesten · on April 18, 2017

> The current standard languages are Objective-C and Swift, both with automatic reference counting (which is mostly a GC, although you have to manually break cycles).

I don't want nitpick, but isn't there a _world_ of difference between ARC and your standard GC? In a GC I expect various different "spaces" for different generations of memory allocation and an assortment of algorithms for detecting unreferenced memory at various times. "Stop the world" events etc.

I may be wrong, but I always thought of ARC as a somewhat glorified pointer structure where each allocated memory got a counter of how many places are currently pointing to it, and when that counter reaches 0, deallocation happens. No fancy generation spaces, no complicated cycle finding algorithms and no stopping the world.

mikeash · on April 18, 2017

ARC is indeed just a matter of giving every allocation a counter of the number of strong references it has, and deallocating that chunk when it reaches zero.

But nothing says GC must be fancy. A standard tracing garbage collector can be really simple too. All it needs to do is periodically do a recursive tree walk starting from roots and seeing which allocations it can reach, and then destroy all the allocations it can't reach. Different spaces, different generations, and all the rest are enhancements that are added to improve performance, not something fundamental to garbage collection.

Ignoring performance, there are really only two hard parts to creating a garbage collector. You need some function that can return all of the root pointers, and some function that, given a pointer, can give you all of the pointers contained within that object. Given those two things, a competent programmer can write a basic garbage collector in about five minutes.

The only thing that makes ARC possibly "not a garbage collector" (depending on your definitions) is its inability to collect cycles. You can even fix that by tracing references each time you decrement an object's reference count and seeing if it's participating in a cycle with no outside references. This is sort of an inversion of the traditional tracing garbage collector, and can run entirely synchronously, although the scan can also end up looking at all of memory if it's run on something that's highly connected.

algesten · on April 18, 2017

Thanks!

hyperpape · on April 18, 2017

Reference counting is garbage collection: http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf

bluGill · on April 18, 2017

I think the right answer is what we are missing a middle ground. All the "leaks" are weird special cases. I think modern C++ is getting closer to the right answer: you always have to think about object lifetime, but sometimes the thought is easier than others.

One of the things C++ gets right is deterministic destruction. 5-10% of the time this is important. In particular C++ has the ability to manage resources other than memory along with object lifetime. (Python has a simplified for of this, I don't know about other languages). Of course the inverse is: 90% of the time deterministic destruction is not important.

80% of the time object lifetime is easy: this object is tied to the stack lifetime (if you have a garbage collector is should be a fatal error if the object is accessible outside of the stack lifetime). Of this 5% of the time deterministic destruction when the stack frame goes out of scope is important. Generally stack based destruction uses less CPU than real garbage collection. (though real garbage collection may move destruction outside of the performance critical parts)

Then 15% of the time the object is shared outside of the stack, but reference counted garbage collection will take care of it. Of that maybe 2% is cases where reference counting (deterministic destruction) is important, the other 13% other garbage collection is better.

4% of the time you have a shared object outside of a stack frame that cannot be reference counted, but other garbage collection algorithms will work just fine.

The last 1% of the time something weird and complex is going on. You will need your best engineers to spend a lot of effort making sure you have all the cases right, and this effort will continue as long as the project lives.

Note the the above percentages are educated guesses. They seem right based on my experience, but every problem domain is different. I'm not aware of any formal study of the problem. (I can't even think of a useful way to run one)

btschaegg · on April 18, 2017

> In particular C++ has the ability to manage resources other than memory along with object lifetime.

Such a simple statement, yet I can't tell you how much I miss this property every time I have to deal with some hacky IDisposable in C#.

danbruc · on April 18, 2017

A good portion of .NET developers that used it were surprised to find out that they needed to call the Dispose method on certain database objects in order to free/release their native handles/resources.

If the dispose pattern is implemented correctly, you strictly speaking don't have to call Dispose, the finalizer will eventually free the unmanaged resources. But there are not many guarantees when this will happen and so you may hit resource limits before unused unmanaged resources are freed and therefore it is of course a good idea to explicitly free them by calling Dispose explicitly or implicitly.

TimJYoung · on April 18, 2017

Yeah, I should have clarified that, so thank you for doing so. With our database engine, it was very important that Dispose be called in order to avoid some nasty surprises due to the GC finalizing the outer .NET class instance from a different thread. You can also run into this with a plain ODBC driver using the ODBC.NET data provider.

hdhzy · on April 18, 2017

The problem is even worse than it looks on the surface (from the client side). If you're implementing a library that holds native resources in .NET it's better to familiarize yourself with CriticalFinalizerObject [0], SafeHandles and Constrained Execution Regions. Writing correct finalizers that work even when out of memory exceptions are thrown is no easy task...

[0]: https://msdn.microsoft.com/en-us/library/system.runtime.cons...

osd · on April 18, 2017

To be fair, the correct use case for Dispose is extremely confusing in .NET. Dispose feel like a language construct to novice developers, but it's left to library designers to properly implement it. There are multiple cases of Microsoft designed API's that don't properly implement dispose, and therefore can't properly be used with 'using.' I'm sure the amount of non-microsoft-blessed code with this issue is far larger.

marcosdumay · on April 18, 2017

I do agree, but I think it means we need urgently languages that automatically enforce the release of generic resources, not that we should make manual memory management mainstream.

And we are moving there. We are getting higher level constructs for managing resources all the time, on every language.

iopq · on April 18, 2017

You already have Rust, which uses RAII, but also has memory safety.

fny · on April 18, 2017

Don't get too comfy with Rust! Memory leaks are memory safe, so a logic error can still lead to a leak. RAII doesn't guarantee that no leaks are possible.

Just take a gander at std::mem::forget[0], which is part of "safe" Rust:

> "Leaks a value: takes ownership and "forgets" about the value without running its destructor. Any resources the value manages, such as heap memory or a file handle, will linger forever in an unreachable state."

[0]: https://doc.rust-lang.org/std/mem/fn.forget.html

dbaupp · on April 18, 2017

You're correct but that's kind-of missing the point: I imagine the parent was trying to say you can have the deterministic management of non-memory resources like C++, without the downside of loosing memory safety.

It's low-signal to say that memory leaks are possible, since, as the article in question demonstrates, it's perfectly possible to get leaks in languages like Java.

On `forget`, it's equivalent to the following Python, which can be translated into pretty much any language one wants:

  holds_reference = []
  def forget(x):
      global holds_references
      holds_reference.append(x)

There are cases where one needs to think about avoiding leaks, but no more than C++.

marcosdumay · on April 18, 2017

> I imagine the parent was trying to say you can have the deterministic management of non-memory resources like C++, without the downside of loosing memory safety.

At least, that's what I was talking about.

But the GP has a point. It will leak other resources the same way it leaks memory. That is, it will rarely do so, but it's something you have to keep in mind.

marcosdumay · on April 18, 2017

Rust is certainly my next language to learn. It seems to get a pretty nice balance between high and low level features.

problems · on April 18, 2017

This is why I prefer the Rust and modern C++ approach more personally. Put everything on the stack by default for fast, easy access, it automatically gets destroyed when you leave the scope. Then use reference counted shared pointers or other tactics when you need to store large amounts of data in memory.

This keeps users thinking about lifetimes but at the same time encourages safety and performance.

MartinBeckCop · on April 18, 2017

I'm a hiring manager and when I see a resume with a few years of Java I ask the same question.

It's all about objects being "unintentionally reachable" which generally all boils down to putting something in a collection and never taking it out, or completely forgetting about object lifecycle. Just because you have a runtime and a GC doesn't mean you can forget any kind of teardown. (If you register an event listner and never unregister you are ignoring lifecycle.)

People who have a few years of Java on their resume should have some idea of when an object is eligible for collection by the GC and the common programmer errors shown by other posts in this thread that can cause leaks.

daemin · on April 19, 2017

That's the classic example of a memory leak in a GC language. Where you add objects to some list or map so that they can be looked up quicker than being created again. Some people call it a cache. What people forget though is some way of removing the items from the cache, and so it leaks these objects for the lifetime of the application.

Not really as sophisticated as other ways of leaking memory, but it can also be easily done in all other languages.

lightlyused · on April 19, 2017

I see this a lot, especially with crud apps and not closing result sets.

fwefwwfe · on April 18, 2017

There's also off-heap space usage.

msluyter · on April 18, 2017

Seems like there's some debate over what "memory leak" really means. On the one hand, you can just inadvertently grow some resource -- say, by appending to a list indefinitely -- until your sever crashes. But in this case, if the list goes out of scope it should be garbage collected, so such memory is in practice reclaimable.

The more interesting case is allocation of memory which is not in practice reclaimable (as described in the first answer.)

The former seems pretty trivial to understand, but if the original interviewers meant the latter I doubt I would have been able to come up with a good example off the top of my head.

tannhaeuser · on April 18, 2017

Memory leaks on the server side are also a consequence of long-running single-process server software. What was traditionally run as a CGI (eg. Apache creating a fresh process for each request) or was started from inetd is now frequently implemented in a multi-threaded or evented fashion, so that using the O/S as garbage collector for memory and temp files doesn't work anymore. Not only does this put additional strain on getting memory management 100% right, it also creates memory fragmentation when done naively. While the consensus and narrative seems to be that evented I/O is more performant because the O/S doesn't have to schedule loads that have "nothing to do anyway most of the time" so to speak, I've yet to see a benchmark comparing traditional process-per-request vs. multithreaded and/or evented approaches; it's not that user-space memory management doesn't introduce additional overhead.

My suspicion is that in many cases memory management strategies of desktop software have been used for server-side software without evaluation, or out of a habit because Java introduced GC into server software. But OTOH even OpenBSDs "new" (3-5 years old) httpd integrates traditional CGIs via the so-called "slowcgi" bridge, and uses evented I/O and asynchronous APIs natively so maybe I'm wrong on this one. If even OpenBSD developers themselves don't use ASLR and other techniques of OpenBSD to ensure non-deterministic memory allocation and go to great length to get equivalent protection within single-process request processing containers, they sure must be on to something.

algesten · on April 18, 2017

> Memory leaks on the server side are also a consequence of long-running single-process server software.

On a tangent to this. I noticed that one benefit my company reaped from embracing a micro service architecture was that we get away with _a lot_ of shit code.

Suddenly a NodeJS process can have glaring memory leaks and work fine for weeks because whenever it crashes, it restarts in 0.5 second and no one even bothered to investigate.

Compare that with the good-ol-JBoss monolith with a 3-4 minute startup time.

On the one hand, crap code can go unnoticed for weeks, which feels wrong and probably will bite us somewhere in the end, on the other hand, we can focus on function not code quality.

koolba · on April 18, 2017

Plus if you build for this from the get go and assign a max life to your processes with auto restart, it becomes a native part of of their lifecycle.

With multiple processes listening to the same socket it's easy to create a continuously on service that under the covers respawns with no apparent loss of service.

roguecoder · on April 19, 2017

This is the Google approach to scaling to thousands of developers: build rock-solid infrastructure so none of the code can go too badly, and you can worry a lot less about the quality of most of the code. For most functionality, writing amazing code would just be wasted effort when a less-amazing developer could have implemented something good enough.

hinkley · on April 18, 2017

Any time the memory necessary to perform a similar action on similar data increases over time, people will call it a memory leak.

It's just shorthand now for "we are using up all our memory and getting nothing in the bargain"

bluGill · on April 18, 2017

Actually the more interesting case is your first: some list is appended to indefinitely. The memory isn't leaked in that you can access it, but by the act of not cleaning it up your run time performance slowly goes down over time: each list traversal hits elements in the list that presumably never need to be considered again. (if you are not traversing the list from time to time why are you keeping a list?) Several real world bugs have been root caused to some variation of this.

sqeaky · on April 18, 2017

I think I see where you are coming from and I disagree. I don't think that whether or not the memory is "reclaimable" is an important detail.

When going to fix a performance issue or otherwise reduce memory usage the process to fix it is largely the same. Some programmer needs to understand the code in question at least well enough to mentally model its use of resources. Then that programmer needs to find a flaw and either prevent the allocation or clean up after the allocation.

Whether or not it is "reclaimable" doesn't play a major role in the problem solving process. Both can be hard to find and both with bring an application down if left unchecked. Both need cleanup or prevention. There are tools to find both kinds of issues.

Which tool is one of the first differences a programmer might see, but I think it is hard to argue that choosing between valgrind and a memory profiler is a deeply interesting issue that warrants separating these kinds of issues.

tyingq · on April 18, 2017

The most common ones I've seen are unclosed sockets or other connections due to bad logic in try/catch/finally blocks.

djsumdog · on April 18, 2017

This was my first major memory leak; specifically not closing all my JDBC result sets. Fun fact, if you get table metadata, that creates a result set. :-P

mschaef · on April 18, 2017

Easy to overthink this: create a memory leak by retaining un-needed references.

I forget where I read it, but I once saw GC characterized as changing memory management from managing the nodes of graph to managing the edges. It's more automatic and safer, but there is still a finite heap and there are still pitfalls.

rbanffy · on April 18, 2017

Just create a Vector and keep adding things to it, indefinitely.

I once encountered that in a production environment (circa 2000). The server kept stats while it was running, appending them to a vector on every iteration of its main loop. On the development environment, on x86 processors, it took days to exhaust the memory and the issue was never spotted. On the production environment, on an 8-way ridiculously fast SPARC machine, it took a couple hours to completely lock up the machine.

sorokod · on April 18, 2017

Vectors haven't been used that much this century.

What you describe is not a leak. A leak would be when you loose all references to your Vector and memory is not reclaimed by the GC.

coding123 · on April 18, 2017

Yes, it's not a leak but many Java devs are aware of these things and instead of typing out "somewhere in my graph I've retained a reference to something that shouldn't" has been colloquially shortened to "GC Leak" and if you see those two words together, they know what it really means.

the_af · on April 18, 2017

I'd say even in Java, infinitely adding elements to a collection is not a memory leak, but merely forgetting that memory is finite. No Java programmer I know would call it a memory leak.

A leak is when you think you no longer hold references to an object, but you still do, and therefore the GC cannot free its memory. An example is the accepted answer in the StackOverflow link.

joncrocks · on April 18, 2017

It's important to note that in Java you don't have leaks in the same way that you do in C.

When someone refers to Java 'leaks', they actually mean "Objects that are kept in memory by references that are never used again, but in theory could be." It's like your code is a hoarder, keeping all those old back-issues of newspapers 'In case I need them again' when in fact they are just taking up space.

daemin · on April 19, 2017

I'd argue that a memory leak is when you've allocated memory that you are not using any more but it has not been freed for some reason. If this happens because the reference has been truly lost, or if there's something else (like a helpful cache) that is keeping it alive, it doesn't matter. It is still leaked and will only be cleaned up when the program exits.

This applies to resources in general, if they are only reclaimed by the OS when the program exits, then it is a leak.

sorokod · on April 19, 2017

A GC is unable to guess that a programmer is not using a given object. To be eligible for GC the object must not be (strongly) referenced - that is it. Different GCs are tuned for different scenarios - typically balancing throughput and latency. I know of no GC that uses a "strategy" of waiting til program exits to reclaim memory.

krapp · on April 19, 2017

>if they are only reclaimed by the OS when the program exits, then it is a leak.

If this is true, then memory leaks have become a paradigm. I've seen it argued that you should never bother freeing memory yourself, and just let the OS take care of it. It always bothered me.

daemin · on April 20, 2017

I think the statement refers to when already shutting down the program. In that case you don't need to worry about freeing OS managed resources as the OS will do that for you when your program exits. Which kind of makes sense but it isn't easy to do without some global flag that you can check and a non-normal destruction path for your objects/data.

rbanffy · on April 18, 2017

The answer in SO is just a more elaborate instance of this same thing: allocating stuff and forgetting to delete all references to it.

fleg · on April 18, 2017

This would also make a leak in languages without a GC, possibly making it quite hard to spot.

With traditional leaks, where you're allocating memory and not freeing it, it's quite easy to find the problem using a sanitizer or valgrind. But with a memory wasted like this, which would get nicely cleaned when the program is exiting - well, you need to start a profiler and find the leak by yourself. Can be annoying.

nathan_f77 · on April 18, 2017

That's funny that this would come up on HN right now.

I've just spent the last day or two struggling with a potential GC bug on Android. It might be somewhere in JavaScriptCore, Android, or Genymotion, but I really have no idea. The last few hours have consisted of repeatedly clicking through my app, and trying to figure out why it crashes every 5 minutes or so. In this case it's not a memory leak, it's memory that is getting wiped for no good reason.

I've also posted about it on StackOverflow [1].

So far I've really enjoyed working with React Native. iOS has been almost perfect, but all my problems seem to be happening on Android. There was one case where my components were just disappearing randomly during animations (even without native drivers). The only workaround was to rotate the component "keys" so that they were regularly destroyed and recreated. The native animation drivers for Android are also incredibly unstable and buggy [2]. Ah well, I would still recommend React Native, but only if you're already comfortable with Cocoa, ObjC, and Java. You're going to face a ton of roadblocks if you can't get your hands dirty with native code from time to time.

[1] http://stackoverflow.com/questions/43470160/in-a-react-nativ...

[2] https://github.com/facebook/react-native/issues/13530

[3] https://github.com/react-native-community/react-native-blur

BoorishBears · on April 18, 2017

Because I've worked so much with native Android bugginess (versions of Android that crash if you try and pause an HTML5 video, crashes after X number of MediaPlayer loops where X is a large number around 5000, etc.), it can hard for me to use React Native because there's so many workarounds you're hoping the people behind components are implementing unless you want to start augmenting them yourself

nathan_f77 · on April 18, 2017

That's actually one of the reasons why I like React Native, because I'm hoping that it can be like "jQuery" for Android apps (in terms of jQuery's cross-browser compatibility). I'm assuming that the core RN code already contains a lot of workarounds for different Android versions, so we can just use one consistent API via JavaScript.

But you're right that it can get very tricky when you want to use a lot of third-party libraries. For example, I thought I would be able to just drop in 'react-native-blur' [1] and everything would work out of the box. I didn't realize that the Android component has been broken for a very long time, so I had to dive into the code and fix it. And then I ended up basically rewriting the whole library for iOS and Android, including the example apps, the README, and even the GIF in the README. But the nice thing is that other people can use it now, and they won't have to go through all of that. The other nice thing is that I learned a ton about developing native modules for React Native, so I don't regret it at all.

Just think that you're going to have to implement those workarounds anyway, so it makes sense to contribute them to an open source library that everyone can use. For your video examples, it would be awesome if you could take a look at react-native-video [2], and see if they already include those workarounds. (I would personally appreciate that a lot, because I'm working on a cross-platform RN app that plays a lot of videos.)

[1] https://github.com/react-native-community/react-native-blur

[2] https://github.com/react-native-community/react-native-video

BoorishBears · on April 18, 2017

The fear for me is running into a situation where I unexpectedly have to come up with the workaround. With native I generally know where the bugginess is and plan for it, there's a small layer of indirection when you use RN. I say small because to be fair, in a world without deadlines I'd just vet every library I want to use upfront. But sometimes things slip, and finding out the component you used is failing on <insert specific android version or Samsung model> before right before release is a little scary (but again, even with native that's a risk).

benmmurphy · on April 18, 2017

i would try creating a minimum test case that reproduces the problem. you are probably not going to be able to get much help with your issue if others aren't able to reproduce it.

to me it doesn't look like a memory corruption issue because you are seeing zero data as the corruption instead of trash data. but it could just be that whatever is corrupting the data is writing zeros :/

i'm not familiar with android but if its possible to turn on any runtime memory protections i would try that. if something is doing use-after-free or overflowing a buffer it might be able to catch where it is happening.

https://source.android.com/devices/tech/debug/asan

AldousHaxley · on April 18, 2017

I get the value of garbage collectors, but honestly until Go I wasn't too fond of them, as I'd find myself spending about as much time thinking about memory management in Java as in C. At least with C you don't have the same lack of determinism as you do with traditional GC languages.

I find recent language trends interesting. Moving away from VMs to native binaries, away from Java-style GCs and embracing ARC and smart pointers. I have a strong bias toward minimalism, so I like the trend. If it serves to remind people that CPU time and memory, while incredibly cheap, isn't free, then all the better.

mabbo · on April 18, 2017

Threads. They solve so many problems, and they also let you really mess yourself up.

Every service request would new some worker object which would in turn spin up an FixedThreadPoolExecutorService (or something like that) to handle some concurrent work it needed before returning.

But I failed to properly close or dispose or whatever it was with the executor service. So the threads? They stuck around, doing nothing. And every service call that came in, another K threads were created.

Thank god we had a good metrics system tied into the JVM stats or we might not have even noticed, except that processes seemed to die from time to time.

mschaef · on April 18, 2017

> Every service request would new some worker object which would in turn spin up an FixedThreadPoolExecutorService

I've recently reviewed (and failed) code that did exactly this. Maybe it's a more common misunderstanding than I'd have thought.

mmimica · on April 18, 2017

People deal with memory leaks in Java on daily basis, eg: https://medium.com/@milan.mimica/everybody-leaks-f210631f13e... http://www.evanjones.ca/java-bytebuffer-leak.html http://www.evanjones.ca/java-native-leak-bug.html

wruza · on April 18, 2017

Almost the same method for e.g. Lua:

  debug.getregistry()[{}] = big_value
  debug = nil

These aren't really "true" leaks, because from GC point of view the referencing object in both cases (ClassLoader, Registry, etc.) is just a hidden gc-root that cannot be reset from regular code. It is the same way you cannot un-leak this C++ pseudocode:

  main()
  {
    p = malloc(1_000_000 * sizeof(int));

    App app;
    return app.run();
  }

... because active stack frame is also a gc-root. From practical point of view though, this definitely leaks, since leaked data cannot be [re]used anywhere.

wyldfire · on April 18, 2017

I don't see how this example relates. Your C++ example is a real puzzler. Even if `p` were in use during run(), it could still be (and should still be!) free()d before returning from main(). It's a bit tricky but there's potentially LOTS of code that will execute after "`return app.run()`" but before the process exits.

bpicolo · on April 18, 2017

seems like something that optimizer would get rid of.

wyldfire · on April 19, 2017

No, the static destructors for example cannot be optimized away because their effects are intentional and required. In C++, close braces have potentially limitless implicit code that gets executed.

haddr · on April 18, 2017

I'm amazed that nobody mentioned in comments that this memory leak is not a "typical" memory leak, but rather quite an obscure one.

It happened to me once (through some 3rd party library), and before that I had no idea that PermGen leaks even exist...

I'm not sure if you can spot this memleak by only monitoring the heap space. The good news is that you can observe it in VisualVM when you watch the "classes" chart (loaded classes count, bottom left) and "PermGen" tab (top right, but you have to click the PermGen tab).

mcculley · on April 18, 2017

There are a lot of answers here and on Stack Overflow that are a lot more complicated or esoteric than they need to be. The simplest example in my mind is a vector of references (e.g., java.util.Vector, java.util.ArrayList). In the implementation of removing a tail element or truncating the vector, one must set the removed references to null, otherwise the objects they point to cannot be collected and yet they are unavailable to users of the vector.

jdmichal · on April 18, 2017

I'm not quite sure where you're getting that interpretation... Of course, any removed element must have its reference replaced with `null`, because otherwise that's a live reference. This is true for any object, not just collections. Your program not being able to reach it is a function of the interface, not the object.

I think that a separation needs to be made between purposefully and unpurposefully unreachable or retained references. For instance, it's perfectly valid for a method to return one of two internal references based on a boolean flag. So both those objects are still reachable, even if only one reference is "purposefully" reachable at once. That isn't a memory leak; it's just programming logic.

mcculley · on April 18, 2017

One can implement a class with an interface like java.util.List and have remove() just decrement an internal counter when the item to be removed is at the end of the list. This would be a correctly functioning implementation. One must go further and set the reference to null to prevent a leak.

jdmichal · on April 18, 2017

My point is that it's only a "leak" because the List interface makes it clear that the program no longer intends to reach that object. So this would be "unpurposefully reachable" -- The program did not intend for the object to be reachable, but it is. This has nothing to do with the List interface, and everything to do with how a interface is used and what intent that usage proves.

mcculley · on April 19, 2017

I think we are arguing over the definition of "leak". When you have one developer producing a data structure used by another developer, this kind of leak is an easy mistake to make. I think in the sense of answering the Stack Overflow question, that makes it a more useful answer than the examples about ClassLoaders and non-heap resources.

aliakhtar · on April 20, 2017

You have to work pretty hard at it, the sample code they provided is pretty long: https://gist.github.com/dpryden/b2bb29ee2d146901b4ae

And it extends ClassLoader which you wouldn't normally ever do.

ruleabidinguser · on April 18, 2017

Would you really expect a programmer you're hiring to know this? Is this useful knowledge?

Spivak · on April 18, 2017

Getting the answer right isn't really important, but the candidate's answer to this question potentially reveals a lot of information.

* Most likely they've never had to think about this before. How to they approach unfamiliar territory?

* Do they have a deep enough knowledge about Java that they could at least make a conjecture about how such a thing could be created?

* Do they understand that Java has a GC and how it works?

* Are they the rigorous definition type and declare that such a thing is impossible without a bug?

* Will they try to interpret the interviewer's meaning and talk about unintended long-lived references or ways to accidentally consume a lot of memory?

Overall I like questions like this that require some creativity to answer.

MartinBeckCop · on April 19, 2017

Yes! I am a hiring manager and this is exactly why I ask this question.

It's a very tellin question if they claim Java experience and can't say anything coherent about leaks or "unintentionally reachable" objects.

It means both that they haven't had to attack memory pressure issues, haven't used heap dump tools, etc., and haven't had the intellectual curiosity to know how the platform works.

stlHusker · on April 21, 2017

Bull.

The question as-is ("How do you create a memory leak in java?") is poorly constructed and will, a significant portion of time, lead down the wrong path when interviewing. Both you and the interviewee have to make assumptions.

If you want to know if they have had to deal optimizing memory usage, simply ask. What was the problem? How did you detect it? How did you solve it? No ambiguity, no assumptions.

"It's a very tellin question if they claim Java experience and can't say anything coherent about leaks or "unintentionally reachable" objects."

You didn't answer the root of the OPs question -- why is that knowledge important to the application you are building or the work you are doing? Sorry, you failed his interview...

smsm42 · on April 18, 2017

My experience always has been if you run any sizeable Java app for a while, you've created it :) Now, "how you find and eliminate a memory leak in Java" is a million-dollar question. In some cases, it may be literally.

lightlyused · on April 19, 2017

If you have a leak that you are having trouble finding, it pays to search the issue queue of any library you are using to see if they have caught something.

louithethrid · on April 18, 2017

Distributed systems, holding references to objects on other distributed systems. Loosing the connection, the refrence persists- the memory leaks.

rahilb · on April 19, 2017

would sun.misc.Unsafe.getUnsafe.allocateMemory(Long.MaxValue) work?

Is an intentional memory leak still a _leak_?

gildas · on April 18, 2017

What about using JNI for this?

OhHeyItsE · on April 18, 2017

import org.apache.commons.logging.Log;