Being able to automate tests for memory leaks is very helpful, but when actually debugging memory I would much rather lean on the browser dev tools. Doing it purely in JavaScript feels very janky (weak maps, attaching giant array objects to things, waiting an indeterminate amount of time for the automatic GC to run).
I was debugging memory leaks in some pretty large JavaScript products, and while I did initially start with methods like the ones in this post, I only started to make reliable progress when I became familiar with the heap snapshot tool. Diagnosing the problem is trivial (just filter the heap snapshot by a class name and count the instances), and it is The Tool To Use to actually resolve it (tracing the chain of references in the retainers view).
In the future, if I need to write automated tests to identify memory leaks, I will look into automating the dev tools rather than fumbling with WeakMap or performance.memory.usedJSHeapSize.
The devtools protocol is super simple. It's very easy to automate everything devtools can do. You can do it with something high level like puppeteer/selenium, or just directly with a websocket to the browser.
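For instance, a rough sketch of that kind of automation with Puppeteer might sample page.metrics() around a repeated action. The URL and selectors below are placeholders, and heap numbers will bounce around with GC timing, so you look for a trend rather than an exact figure:

    // Sketch: watch for suspicious heap growth while repeating an action.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com/app'); // placeholder URL

      const before = (await page.metrics()).JSHeapUsedSize;

      // Repeat the interaction suspected of leaking.
      for (let i = 0; i < 500; i++) {
        await page.click('#open-dialog');   // placeholder selectors
        await page.click('#close-dialog');
      }

      const after = (await page.metrics()).JSHeapUsedSize;
      console.log(`Heap grew by ${((after - before) / 1048576).toFixed(1)} MiB`);

      await browser.close();
    })();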
It'd be better if the committee would just stop pretending that JS has some sort of holy determinism that would go away if we were allowed weak references. They're going to have to put it in soon anyway in order to support wasm refs.
It's a shame that WeakMap and WeakSet aren't enumerable. Ephemeron tables in Lua are enumerable, so creating a data structure that detects when an object is gone is possible. And because Lua supports finalizers, you can trigger an event the moment the object is garbage collected.[1][2]
[1] Lua also supports reviving objects. For example, if your finalizer leaks the object to the outside environment. Implementing this can be tricky, which is perhaps why JavaScript doesn't permit it. So many limitations in JavaScript exist because a particular feature would be too difficult to implement in existing implementations.
[2] Of course, because Lua supports finalizers, it's possible to detect GC without using ephemeron tables. But if you can't control the target object's finalizer, then ephemeron tables are handy for doing this. And you can use your own finalizer on the ephemeron table's value object to detect GC, though it'll fire one or two GC cycles after the key was collected.
If you can observe GC from content then site behaviour will start to depend on the specifics of one GC implementation. Not only will this be a disaster for cross-browser interop, it will prevent that single engine improving its GC design going forward because doing so will break working sites.
Note that this is somewhat different from a language like Lua where it isn't necessary for all existing script to work with every new iteration of the runtime without any changes.
The fact that implementation details like GC timing have remained behind the veil of abstraction is one of the only reasons that JS engines have managed to evolve from being relatively slow interpreters to highly optimised JIT compilers.
The key to understanding Lua is that the C API is a first-class citizen, and the Lua authors strive to keep the Lua scripting and Lua C API semantics symmetric (my characterization). As a general rule, anything you can do in Lua script you can do from the C API--closures, coroutines, etc--with the same logical semantics. (And using the standard C-compliant Lua C API--no C compiler extensions required.) The VM implementation, language semantics, and Lua C API strongly reflect each other and channel language design and VM implementation. Finalizers are critical for a good C API, and whether the finalizer is a lua_CFunction or an in-language function should be and is irrelevant. And a rule that didn't permit leaking would be rather brittle and error prone from the perspective of a C module author.
Of course, relying on ephemerons to merely detect GC collection (as opposed to their primary purpose of caching and memoizing data associated to a particular blackbox object instance) is a rather obtuse hack, and not something I've seen in practice for anything other than debugging.
I don't agree that JavaScript's semantics are what have permitted JavaScript optimizing compilers to work so well. Details like the above aren't necessarily difficult to implement efficiently; they're difficult, if not impossible, to implement if the implementation architecture didn't contemplate their existence, or if the architecture was predicated on their non-existence. For a while LuaJIT outperformed mature JavaScript engines despite having to deal with more complex semantics and more abstract constructs (e.g. stackful coroutines that were independent of the C stack, fidelity to the Lua C API, goto, variable return lists, guaranteed TCO, etc), as well as all the same dynamic typing headaches as JavaScript and Python. And it did so with an appreciably smaller VM. And I have no doubt it could do it again with the benefit of another 10 years of knowledge and experience.
What Lua has going in its favor is 1) the authors are only weakly committed to backwards compatibility--there are always breakages with each new version, usually small but they add up over time; and 2) a strong commitment to a symmetric C API, which means the authors have to think long and hard about language constructs and architectural details. #1 provides them the liberty to experiment with better semantics, including discarding constructs or behaviors that don't work out. #2 is a constraint that prevents them from taking certain shortcuts, such as intermingling the "C stack" with the logical language stack.
The reason JavaScript only supports explicit "async" methods rather than a more seamless control flow construct is that all implementations (except DukTape?) mingle C stack semantics with the in-language stack, including especially the JIT compiler components. This chumminess isn't necessary--LuaJIT isn't particularly handicapped by avoiding it--but it's not something you can fix after the fact; it'd require starting an implementation from scratch, and that's probably not ever going to happen again, at least not merely for the purposes of providing better control flow semantics. VM authors tend not to appreciate how architectural details limit the range and scope of future language constructs. The Lua C API gives the Lua authors fewer degrees of freedom, which paradoxically results, I think, in them avoiding implementation dead-ends.
I do agree that it's problematic to rely on certain aspects of a GC, like absolute timing in the case of mark & sweep style GC. But other aspects, like finalizers and object revival, are deliberate guarantees focused on language semantics. Another guarantee that Lua provides is that objects are destroyed in reverse order of creation, which is a useful guarantee from the perspective of a C module writer. Again, because of the nature of the Lua C API and the need to provide more than just a blackbox GC, Lua is far more deliberate about which aspects of the GC can be relied upon (and how), and which aren't their problem. Other languages avoid this--the very thought of providing any GC-specific language features is anathema--but in time invariably find themselves providing ad hoc guarantees and interfaces, or locked into accidental semantics.
> That it’s a bad idea to put multimedia apps on the web?
Yes? Obviously? The only reason Flash ever worked for anything useful (-ish depending on your opinion of video games) was that it was in practice isolated from the rest of the web browser. Javascript... isn't.
I do not buy into this argument. Finalizers are generally non-deterministic, regardless of implementation. Sure, sometimes that causes issues, but that is not a good argument against having them.
I don't think the problem is the semantics, per se. It's knowing or anticipating the semantics before you ever write the first line of VM code. LuaJIT was written to provide near perfect semantic and C API fidelity to PUC Lua 5.1; semantics that were and still are quite sophisticated relative to popular languages. (Lua is a well-disguised functional language, notwithstanding that dynamic typing is no longer en vogue.)
Today's modern JavaScript engines were written with the primary purpose of making then-existing JavaScript constructs fast, and those constructs were few and mostly simple. (Exception: JavaScript was an early adopter of lexical closures, notwithstanding its block scoping quirk. Prototypes are conceptually simple but, like closures, rather complex from the perspective of the VM and especially a JIT engine.) They took liberties and shortcuts--all useful, but few without similarly performant alternatives--that had the effect of constraining their ability to implement newer constructs. Because nobody would demand that the major browser implementations introduce drastic architectural changes, newer language constructs were and continue to be tailored to fit the design constraints of existing implementations.
Lua has accumulated so many sophisticated constructs because it's a tiny VM that is substantially rewritten with each new major version (5.1 -> 5.2 is a major version bump). And Lua isn't beholden to a strong backward compatibility guarantee; they can discard things that don't work, keeping the VM relatively simple and small and thus easier to refactor. It's notable that while LuaJIT is much faster than PUC Lua, LuaJIT is stuck at 5.1 + some 5.2 extensions. Most newer constructs and semantics added to PUC Lua since 5.1 are relatively difficult to add back into LuaJIT.
These changes are not really important though, in my opinion. The Lua authors do research first, not engineering first (still good and old-school, as you can see from the sources). LuaJIT and 5.1 found themselves embedded in many environments because that combination was one of the most successful variants. The later additions in 5.2/5.3 that are incompatible with LuaJIT/5.1 were disputed on the mailing list. The biggest incompatible and LuaJIT-unimplemented changes were _ENV and integers. _ENV was a cute alternative, but not a huge revolution. Integers... well, I hope Roberto and Luis know what they're after with that.
In the context of JS vs. LuaJIT, LuaJIT is a single variant of an always-compatible language, which I think makes it share more similarities with JS in this regard than with the Lua 5.x series.
It's bollocks. JavaScript is the only GC'd system in widespread use that doesn't allow programmers weakrefs. JVM languages all have it, .NET has it, Smalltalk, various collectors for C++, Cocoa (ARC as well as the briefly-viable GC), even ActionScript had it. Very few universes were torn asunder.
> Some browsers, including Chrome but not Firefox, have the ability to check the amount of Javascript memory used. So the solution to test if an object is there, is to make it sufficiently large so that it has a noticeable impact on memory.
There's a decent chance that JavaScript will eventually copy these features and their design. So the current Java features offer a glimpse of what could happen.
Use the profiling tools to figure out where your memory leaks exist. Doing it in code has a lot of negative consequences. Weak maps and sets, for example, will slow down your GC.
The typical usage of weak maps and sets is caching. They let the GC reclaim cached entries to free up room on the heap, at the cost of potentially another round trip or recalculation later.
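A minimal sketch of that caching pattern (computeLayout is a made-up stand-in for an expensive calculation):

    // Cache derived data keyed by an object without keeping that object alive.
    const layoutCache = new WeakMap();

    function getLayout(node) {
      let layout = layoutCache.get(node);
      if (layout === undefined) {
        layout = computeLayout(node);   // hypothetical expensive calculation
        layoutCache.set(node, layout);
      }
      return layout;
    }
    // Once `node` becomes unreachable elsewhere, the GC may drop the cache
    // entry along with it -- at worst you recompute the layout later.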
This doesn't make any sense. If you don't know how to create one, and you don't know how to test that you don't have a leak, how can you know you don't have leaks?
A simple Python memory leak:

    a = get_a_huge_object()
    use_huge_object(a)
    # f() is a function that takes a long time to run;
    # it does not use `a`, but allocates more resources
    f()

You can fix this by doing:

    # here get_a_huge_object() returns a context manager
    # that deallocates the object at the end of the block
    with get_a_huge_object() as a:
        use_huge_object(a)
    f()

or even just:

    def g():
        a = get_a_huge_object()
        use_huge_object(a)

    g()
    f()

if you can rely on the built-in deallocators.
I hardly believe javascript code is divinely protected against something like this.
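For what it's worth, a hedged JavaScript analogue of the same shape (getHugeObject, useHugeObject and f are placeholder names) would be a long-lived scope that keeps a large value reachable while unrelated work runs:

    async function handleRequest() {
      const huge = getHugeObject();   // hypothetical: allocates a lot of memory
      useHugeObject(huge);
      // `huge` is no longer needed here, but it is still reachable through
      // this scope, so depending on the engine it may not be collectable
      // while the long, allocation-heavy work below is in flight.
      await f();
    }

Scoping the big value inside a block or helper function, as in the Python version, removes the doubt.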
I don't consider this to be correct, since the main computation in that function is done in `f()`, which has no access to `a`, yet `a` cannot be deallocated while it runs. This is similar to mallocing `a` and then not freeing it. Obviously when the program dies Python will deallocate everything, duh, but that's not a useful definition of a memory leak, since even in a memory-leaking C program the OS will clean up the memory after the program dies. While the program is running, though, it's going to keep leaking memory this way even in Python.
I am sorry but you are wrong. Your understanding of the concept is not complete yet. Close but not complete.
Memory leak has a specific meaning. A memory leak is memory that has been allocated but can no longer be accessed or deallocated. Whether that memory would still be useful is not the point. The Wikipedia article explains it much better than I can: https://en.m.wikipedia.org/wiki/Memory_leak
This is a useful definition: some software needs to run for a very long time, and it cannot afford to leak memory.
In languages where memory is automatically managed it is quite hard to create a memory leak.
Indeed, it is the original article that simply used the wrong terminology.
If you have more questions let me know! Happy to help!
I understand this definition; I explained it above in my comment. The problem is that this definition is not useful. Detecting memory leaks this way is easier, but it doesn't help us understand all the problems our program can have with leaking resources.
First and foremost, the problem the original definition points out doesn't exist any more. When a program dies, all the resources it allocated, file descriptors, memory, ports etc will safely be cleaned by the OS. Any modern OS that doesn't do so will consider this a bug. So there is no practical reason to sweat the original problem. It's still useful to sweat it as an indication of a deeper problem, which is that your program fails to deallocate the resources it allocates. However, this failure is not only important at the moment your program dies; it remains important for the entire lifetime of the program.
As I explained above, if your program runs in a loop and needs O(N) resources over those runs but could reuse constant memory each time, then all you really need is O(1) memory, since you can deallocate at the end of each iteration. The problem arises when you fail to do so and use O(N) resources. That will make your program break on large inputs when it really could work. What's more crucial is:
(1) Analyzing your program's minimum asymptotic resource need
(2) Observing your program's real asymptotic resource need
(3) Optimizing your program in a way (2) is closer to (1)
I just believe we work in very different environments.
We had problems with compilers having memory leaks. The software I write runs for weeks or even months without being restarted. Yeah, the original problem of memory leaks, or even resource leaks in general, is still very, very real in some fields.
Now, of course if you use Python or Golang or JavaScript, you will basically never have a real memory leak. But this is not a good reason for calling bad use of resources "a memory leak".
BTW:
> First and foremost, the problem the original definition points out doesn't exist any more. When a program dies, all the resources it allocated, file descriptors, memory, ports etc will safely be cleaned by the OS. Any modern OS that doesn't do so will consider this a bug. So there is no practical reason to sweat the original problem.
A memory leak never concerns the OS, like never. When the OS allocates memory to a program, it is the program's responsibility to deallocate it, returning it to the OS.
> Any modern OS that doesn't do so will consider this a bug.
This is true but it is not what we are discussing.
Anyhow, I am just trying to help you understand what people usually mean by "memory leak", because you seem a little confused.
But if you are sure that you are definitely not confused and you think that it is me who is wrong, I am not going to engage in any further discussion.
In JavaScript, where there's often a single thread running a long-lived process, I think the common case is when you keep allocating stuff through event handlers and never release the references to it. A single wasteful build-up of memory is not really what I see discussed when people talk about memory leaks in JS. As another commenter said, the classic case is adding UI event handlers and then removing the DOM object without unregistering the handler. Doing this over and over will eventually make the app unresponsive, but not immediately.
Like I said in another reply, I don't think I have leaks because memory sits at a fixed percentage even though I get hundreds of requests a second. If there was a memory leak wouldn't the memory be eaten up eventually?
If there is a memory leak, then yes, memory usage would increase as your program continues making progress.
That's like saying: "I run my program and it works, how can I have bugs??"
Unless you know how to test properly, of course you won't be able to find memory leaks and memory usage will appear to stay the same. Do you click the same thing 5000 times and observe each time that memory is cleaned up? Do you have unit tests for this?
Idk, long story short, "I don't know how to make memory leaks in JavaScript, so I must not have them" is a non sequitur.
> Do you click the same thing 5000 times and observe every time memory is cleaned?
I don't know how to check that, but I'm not sure I understand. If it really were a memory leak, the hundreds of requests a second I get would surely make the app consume the whole server's memory eventually. I don't need to click stuff 5000 times to test that. It's happening in production as we speak. Just to clarify, it's a single-threaded Node process and its memory consumption has stayed the same for weeks, since the last time I restarted it.
It seems highly unlikely you have a memory leak as it doesn’t sound like you are observing continually increasing memory usage.
I’ve worked a lot in C# and JavaScript and I’ve only had to watch out for memory leaks with event aggregators for messaging between UI components.
So the UI has a button to start a use case. Each time the user goes into the use case a new object is created which on construction registers a callback against an event type with the event aggregator.
The memory leak occurs if this callback is never de-registered on completion of the use case, since the callback is holding onto a reference to the new object created at initiation of the use case.
If the use case is called repeatedly (think something like a check-in terminal), eventually the memory is exhausted.
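A minimal sketch of that shape (EventAggregator, subscribe/unsubscribe and the use-case class are illustrative names, not any particular library):

    class CheckInUseCase {
      constructor(aggregator) {
        this.aggregator = aggregator;
        this.onScan = this.onScan.bind(this);
        // Registering makes the aggregator hold a reference to this instance...
        aggregator.subscribe('passenger-scanned', this.onScan);
      }

      onScan(event) { /* update the UI, etc. */ }

      dispose() {
        // ...so forgetting to call this keeps every instance (and whatever it
        // references) alive for as long as the aggregator lives.
        this.aggregator.unsubscribe('passenger-scanned', this.onScan);
      }
    }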
On backends I’ve only ever worried about potential memory exhaustion from things like unbounded caches.
It depends on your application. I'm working on an application in Node right now that keeps hundreds of megabytes of configuration in memory since each model can take seconds to generate, and due to recent changes the # of models doubled.
It's not a good design. I've been trying to get buy in on using an LRU cache...
Also, I've worked in a few environments (Java and Node) where a spike of requests comes into the system and prevents the GC from running - and then you run out of memory. I think that mostly went away with G1. We had a lot of stuff that ran on our customers' sites and was prone to traffic spikes...
I've learned that if you're not thinking about GC it usually bites you later. Better to think about performance, at least a little, up front.
Maybe it's more common on the frontend? Complex UIs are a hotbed for leaks, where it can be a nightmare to figure out that they're happening and what the cause is, e.g. a previous screen or UI component not being destroyed properly, so a table of data you generated for display doesn't get collected.
Just search the React and Angular GitHub issues for "memory leak".
> (1) I have no leaks and (2) I don't know how to create them -- These don't really square with each other.
If he runs his service for a long time, the memory usage doesn't keep growing, and no resource exhaustion problem occurs, then he can reasonably say he has no leaks. At least, no significant leaks.
I don't have to know how to create a nuclear bomb to know I don't have one.
I was confused by his description of WeakMap, because I hadn't looked at the spec in a while. For reference, the reason why it's not applicable to the task at hand is because it's not primarily meant to have the values weakly referenced. Rather, it's meant to not lock the keys in. I'm not sure what he means by "best used to link objects together". WeakMap simply allows you to keep data about an object outside of it and without keeping it from being freed when it's not needed anywhere else.
That's what he means by linking objects together. Though I would describe it as you did; WeakMaps are normally used for associating data with an object, without having the WeakMap entry itself keep the object (aka the WeakMap key) alive. If the object goes away otherwise, the entry will go with it. (Or if the WeakMap goes away, then all of its entries will go away at once.) In terms of "liveness", a WeakMap entry is an "and" edge between the key and the map: the entry is only kept alive if both the key and map are still alive. (And the entry keeps the value alive, though it may not be the only thing doing so.)
WeakMaps (ephemeron tables) were designed specifically to make garbage collection not be observable, for the reasons jgraham described above.
If you really want to know, such as for testing, then WeakRef and FinalizationRegistry are the newer ways to do this. But know that if your app needs this stuff as part of its normal functioning, then there's at least a 90% chance you're doing something wrong. The GC does not run because you think it should run. If anything, it tries to not run, as much as possible. And the more we optimize it, the better it will get at not running. "Running slow code" is worse than "running fast code" is worse than "not running code at all".
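As a rough test-only sketch of that newer approach (with the caveat the parent gives: the callback runs at the GC's leisure, if at all, and in Node the manual gc() call only exists behind --expose-gc):

    let collected = false;
    const registry = new FinalizationRegistry((heldValue) => {
      collected = true;
      console.log(`${heldValue} was garbage collected`);
    });

    let widget = { name: 'widget-under-test' };
    registry.register(widget, 'widget-under-test');

    widget = null;     // drop the last strong reference
    global.gc?.();     // hint only; skipped unless Node runs with --expose-gc
    setTimeout(() => console.log('collected yet?', collected), 1000);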
Is there anything like VisualVM for NodeJS? It's nice in JVM land that you can just see how many objects of a particular class are on the heap.
Chrome's devtools kind of work, but it's not nearly as good.
I'd love to be able to see how many objects exist with a certain signature, or prototype. I found some mdb_v8 thing but it seems it's built for a very old version of V8.
> If you are writing an application in Javascript, soon you will have to worry about memory leaks.
What? No! Memory leaks are impossible in fully garbage collected environments. A memory leak is when you manually allocate memory and lose the pointer, thus making it impossible to access the memory. (Because there are no pointers in JavaScript, there's no memory to leak.)
What happened is not a memory leak; it was an event handler that was never unregistered. These are best diagnosed with profilers, not gymnastics with weak references. (It's not a memory leak because there is still a valid reference through the event handler, i.e., no lost pointer.)
Even if there is a more correct technical term for this, I think you’re trying to fight back an ocean here.
If you have a function which each time inadvertently creates a new object in memory that it never releases (and which is thus never GC'd), repeated use of the function will ultimately result in memory exhaustion and termination of the process.
Everyone I’ve ever met would refer to that as a memory leak.
I was assuming a WeakMap is just a map (hash table) from a key to a WeakRef? What is it if it isn't that (explain like I'm a 5 year old who doesn't understand dynamic types or JS)? I read here that WeakRef is still only a proposal, so clearly a WeakMap/WeakSet can't make use of a WeakRef...
Not quite. First, a map from a WeakRef to a value wouldn't do anything useful, since a WeakRef's target dying does not magically blow away the WeakRef itself. Second, you would have no way of getting from an object to its WeakRef(s), so you couldn't do a lookup anyway. But those criticisms are unfair; you're just using loose language to mean a map whose entries hold their keys weakly and values strongly.
But it's not that either. If it were that, and your key died but your map didn't, then the value would still be kept alive. And it doesn't hold both key and value weakly; in that case, you could have both map and key alive and yet the value could die; WeakMaps won't allow that.
It's something subtly different. It's a collection where both the key and map have to be live for the WeakMap entry to keep the value alive. "Weak" in the name is something of a misnomer, in my opinion. Weak normally means "something that can refer to an object without keeping it alive". WeakMap entries are not weak, they are normal strong references that very much keep their values alive -- but only if both the map and key are both alive.
Thanks. That does feel a bit backwards for the purpose of GC tracking, as you can't store id->obj.
I assume if/when a WeakRef is exposed in the language, then a normal map can be used to store id->WeakRef(obj), so no real need then for also having a "mapWithWeakRefValues"?
A WeakMap works with objects as keys. You can do it with objects you don't control, without adding some sort of id field. It can't be fully simulated with a WeakRef.
One common use is to associate extra data with objects, without attaching it to those objects directly. In your map, you add an obj->extradata mapping. Having the target be a WeakRef would just mean you'd lose your extradata while the source obj is still around.
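To make the contrast concrete (node and metadata are just illustrative names): with a WeakMap the extra data lives exactly as long as the source object does, while a WeakRef-valued map gives no such guarantee:

    const extra = new WeakMap();

    function tagNode(node, metadata) {
      // The entry lives while both `extra` and `node` are alive, and it holds
      // `metadata` strongly, so the data can't vanish while `node` is in use.
      extra.set(node, metadata);
    }

    // By contrast, a plain Map with WeakRef values could lose the data early:
    //   fragile.set(node, new WeakRef(metadata));
    //   fragile.get(node).deref();   // may already be undefined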