Hacker News new | past | comments | ask | show | jobs | submit login
Borrow Checking, RC, GC, and the Eleven () Other Memory Safety Approaches (verdagon.dev)
189 points by todsacerdoti 7 months ago | hide | past | favorite | 68 comments



I think it’s better to view reference counting and gc as two kinds of garbage collection: https://courses.cs.washington.edu/courses/cse590p/05au/p50-b...


That is an excellent paper. One of those things that is obvious in hindsight, but no one wrote it down clearly before.


IMO it's not remotely obvious. Not to me anyway.


It’s obvious that is the same problem. It’s not obvious that it’s a more continuous linear tradeoff between the approaches


I have to disagree? Lumping them together is like lumping together formal verification with machine learning...

Two fundamental characteristics of garbage collection (in pretty much every programmer's experience) are that (a) it can handle cycles (read: more general programs), and (b) it does not provide hard performance guarantees in the general case. Reference counting is literally the opposite, and that's exactly what people love about it.


First of all, I recommend giving the paper a read, because I think you're misunderstanding the claim (plus, it is a very good paper). The claim is not that they are equivalent, but that tracing GC and reference counting are dual solutions to the same problem, two ends of the same spectrum if you will, with hybrid solutions existing in between.

Second, what you seem to consider to be fundamental characteristics of tracing GC and RC is not in fact so fundamental.

For starters, RC absolutely can handle cycles (e.g. through trial deletion). Such implementations may be difficult or impossible to implement as pure library solutions, but there is nothing that says it can't be done. The most prominent example of a programming language that uses such an approach is probably Python.

Nor does the claim that tracing GC cannot provide hard performance guarantees in the general case (while RC does) hold up under closer examination. Leaving aside the problem that it's already non-trivial to provide hard real time performance guarantees for malloc()/free() and ignoring the issue of cascading deletions, it doesn't hold under the more relaxed assumptions discussed downthread.

For starters, we have such predictability only for the single-threaded case, not for arbitrary multi-threaded situations. And even in the single-threaded case, there are real use cases where predicting performance becomes functionally intractable. Examples are implementations of binary decision diagrams or certain persistent data structures, where the presence of shared subgraphs of arbitrary size make predicting performance of individual deallocations impractical.

In contrast, in the single-threaded case we can absolutely bound individual operations of a tracing GC by either a constant or (in the case of arbitrarily sized allocations) make them linear in the number of bytes allocated (e.g. Baker's treadmill).

What is true is that in the absence of cycles, (naive) reference counting will free memory at the earliest opportunity, which is not something we can say for tracing GC.


> First of all, I recommend giving the paper a read

I'll put it on my list, thanks for the recommendation, but it really has no impact on my point (see next point).

> because I think you're misunderstanding the claim (plus, it is a very good paper)

Note: I wasn't criticizing the paper. I was criticizing the comment, which claimed "it’s better" to view these as special cases.

If it's not obvious what I mean, here's an analogy: it's the difference between having a great paper that reduces ~everything to category theory, vs. claiming "it's better" for the audience I'm talking to to view everything in terms of category theory. I can be impressed by the former while still vehemently disagreeing with the latter.

> For starters, RC absolutely can handle cycles (e.g. through trial deletion).

"Can handle" is quite the hedge. You "can" walk across the continent too, but at what cost?

> The most prominent example of a programming language that uses such an approach is probably Python.

You're saying Python uses RC to handle reference cycles, and doesn't need a GC for that? If so please ask them to update the documentation, because right now it specifically says "you can disable the collector if you are sure your program does not create reference cycles". https://docs.python.org/3/library/gc.html

> [...] hard real time [...]

Nobody said "real time". I just said "hard guarantee".


> Note: I wasn't criticizing the paper. I was criticizing the comment, which claimed "it’s better" to view these as special cases.

I didn't assume you were. My note about it being a good paper was just a general "this is worth reading" recommendation.

> "Can handle" is quite the hedge. You "can" walk across the continent too, but at what cost?

It's not a hedge. You claimed that (tracing) GC can handle cycles, while RC was "the opposite", which I read to mean that you believe it cannot.

While we are at it, let's go through the basics of trial deletion.

Trial deletion first looks at possible candidates for objects involved in a cycle (in the original algorithm, those were objects whose RC got decremented without reaching zero). Then, you do a recursive decrement of their children's (and their children's children's, and so forth) reference counts.

Unlike with regular reference counting decrements, you visit children even if the reference count doesn't reach zero. The net result is that reference counts are reduced only along internal paths, but that objects that are still reachable from external paths have reference counts > 0 after that.

Thus, any object with a reference count of zero after this step must be part of an internal cycle and can be deleted. All other objects have their original reference counts restored.

Because trial deletion operates on reference counts differently, it's not something that you can easily implement as a library, which is why you don't see it much except when a language implementation chooses to go with reference counting over a tracing GC.

> You're saying Python uses RC to handle reference cycles, and doesn't need a GC for that? If so please ask them to update the documentation, because right now it specifically says "you can disable the collector if you are sure your program does not create reference cycles". https://docs.python.org/3/library/gc.html

This is a terminology thing. Python uses a variant (generational) trial deletion approach [1]. It's not a traditional tracing GC, and it's also not inaccurate, because GC can mean more than using a traditional tracing GC.

> Nobody said "real time". I just said "hard guarantee".

I was not sure what you meant, so I answered both, as you may have noticed.

[1] https://github.com/python/cpython/blob/796b3fb28057948ea5b98...


Reference counting does not provide hard performance guarantees. If you disagree consider what happens when a large tree is freed at the root.


> Reference counting does not provide hard performance guarantees. If you disagree consider what happens when a large tree is freed at the root.

Sure it does. One hard guarantee in that case is that freeing M objects (out of N total that are alive) takes O(M) time. (Another is that you can access all the other objects in parallel without any impact. Etc.)


There's no hard guarantee that freeing an object runs in constant time. Reference counting tends to provide more predictable performance, but I don't think the phrase "hard guarantee" is appropriate.


You're conflating a bunch of things, but the biggest one here is that "hard performance guarantee" is not a synonym for "constant-time performance". Nevertheless, feel free to substitute your own preferred vocabulary. I'm just trying to get the underlying point across about how different GC and RC fundamentally are.


With optimized RC (which deffer deallocation) you have no "constant-time performance guarantee", because you don't know at which time the deffered deallocation will happen... And all modern RC implementations are heavily optimized - especially in Apple's ecosystem...


Yeah I think generally what people mean here is, "if I'm aware that freeing objects takes time, I expect to spend that time when I free the objects, not at some undetermined point in the future". That's not the case if you do any kind of deferring, at which point you're pretty close to a "classic" GC system.


Semispace GC takes O(N) for a live set of size N. You're also ignoring the work to actually free which is generally slightly super linear. I'm also not sure what hard means in that case.


> Semispace GC takes O(N) for a live set of size N

Is this a rebuttal? How? You have a million live objects, you just want to free 1 object, and this awesome GC accesses O(1 million) objects, and that's somehow comparable to refcounting's O(1)?

> You're also ignoring the work to actually free which is generally slightly super linear

That's like arguing that everyone who claims binary search is O(log n) is "ignoring" the fact that page-table traversals and/or bit-complexity "actually" make it O((log n)^2).

It's "ignored" because the performance of the underlying system is entirely beside the point being made about the comparison of 2 algorithms on top of it. You analyze an algorithm based on the number of calls to the APIs provided to it, which in this case would be mmap/munmap (or VirtualAlloc/VirtualFree, or whatever), and you assume each of those operations takes constant time, unless specified otherwise. It doesn't make sense to dig into the underlying system, especially when nobody has specified what the underlying system is to begin with.


> One hard guarantee in that case is that freeing M objects (out of N total that are alive) takes O(M) time.

Cheney's algorithm gives you O(N-M) time which really is in the same order of time complexity.


> O(N-M) time which really is in the same order of time complexity.

No, it's not. M could be 1 and N could be a million. The entire point of O(M) is that it's independent of N.


So, yes, if you have a million objects and need to release all of them except 1 then Cheney's algorithm does exactly 1 operation compared to 999'999 operations the reference counting does. Ergo, GC is faster than reference counting. It can even be faster than stack allocation [0]!

[0] https://www.sciencedirect.com/science/article/abs/pii/002001...


It's so frustrating and time consuming replying to non-rebuttals like these. I'm going to have to cut it off after this one.

> if you have a million objects and need to release all of them except 1

That literally only happens at most 1 time for each allocation during its lifetime. But the GC can still waste its time traversing that object a billion times before then, despite nobody adding or removing references to it. Whereas RC would only touch each allocation when you're actually doing something to the object.

It's like arguing for wasting several decades of your life, just to make sure your funeral is quick...


I am still not sold that e.g. using an arena as a scratch space for real-time frame rendering of the scene is worse than refcounting all of the temporary data, especially if the workload fits comfortably into the available memory.

Also, generations are a thing, so no need to "waste several decades of your life". And why is that metaphor applicable anyhow? Why not something like "stop constantly opening and closing the warehouse door during the work, and reinventorize it once a year instead of making a note every time you take something out of it or put something back in it" instead?

Anyway, thanks for you time and exchange.


One other advantage of RC that I don't think you've brought up yet, is that you know exactly when the work to free M objects will take place.


No you don't. If you did you could just call free. You know one of the decrements will do it, but not which one.


Good point.

I was thinking of a lot of cases where RC is being used as a general solution for lifetime management with automatic refcounting; but most objects have a well-known lifetime with their refcount being 1 95% of the time. And another 4% have their refcount bump up to 2 briefly when they're temporarily shared, or transferred to a new owner (think returning a shared_ptr<>).

But, yeah, the "genuinely shared" case - either the 1% in the scenario above, or when RC is only being used for objects with complex lifetimes - is definitely a significant use-case I overlooked.


Nitpick on usage of term “hard guarantee” in the form of rephrasing for GC: “One hard guarantee is that GC takes O(N) (N number of alive objects)”.

“O(M) guarantee” and “you can access all the other objects in parallel” are valid points, but they come with caviats that to utilize the first, one must know how many objects will be freed, and to utilize the second, one must know that two objects are not reachable from each other, which is not always the case e.g. objs=find(scene,criteria)

(Obviously you are aware of that. This comment is meant to inform other readers and prevent further idealisation of RC)


> to utilize the “O(M) guarantee”, one must know how many objects will be freed

No -- unless I'm misunderstanding you(?), this is incorrect. The following pseudocode (I'll just use Python syntax for simplicity) has no idea how many objects are being deleted:

  while True:
    print("[Progress] Deleting: " + node.name)
    prev = node
    node = next(node)
    del prev   # Assume this happens M times
Assume each node owns O(1) blocks of memory.

With GC, the user can see random arbitrarily-long stutters, depending on the shape of the rest of the object graph, and the phase of the moon.

With RC, you can rely on the user seeing your progress indicators consistently... without knowing or caring what M was.

> to utilize “you can access all the other objects in parallel”, one must know that two objects are not reachable from each other, which is not always the case e.g. objs=find(scene,criteria)

Again, this is false, and (subtly) misses the point.

The claim wasn't "you can access all objects in parallel". That's not true in any system, be it GC or RC.

The claim is "you can still access some objects in parallel" with RC. The crucial difference here is that, under a GC, all threads are at risk of getting throttled because of each other arbitrarily. They simply cannot do any work (at least, nothing that allocates/frees memory) without getting throttled or interrupted at the whim of the GC.


> With RC, you can rely on the user seeing your progress indicators consistently... without knowing or caring what M was.

What does that mean? How is that helpful for the user to know you've free-d 56,789 objects so far if he doesn't know if there's 60k in total or 600k?

> The crucial difference here is that, under a GC, all threads are at risk of getting throttled because of each other arbitrarily. They simply cannot do any work (at least, nothing that allocates/frees memory) without getting throttled or interrupted at the whim of the GC.

And that's also the case for all thread which share the same RC-d object. They will all be throttled when the memory is freed.

The biggest benefit of RC is how it seamlessly interact with native code, that's why it's a great fit for languages like Obj-C and Swift or Rust (opt-in), but in terms of performance, be it latency or throughput, it's not a particularly good option (the trade off it makes is comparable to copying GC but with higher max latency than copying GC and even lower throughput, and doesn't have fast allocation and memory compaction as a side benefit).


> If you disagree consider what happens when a large tree is freed at the root.

Isn't that an implementation detail of the GC itself? For example this RC algorithm [1] (which detects cycles) can be implemented by tracking free and potentially free objects in their own special structure that can be evaluated incrementally.

[1] https://dl.acm.org/doi/10.1145/1255450.1255453


It guarantees that the referenced memory will be available for subsequent operations. Knowing precisely how much memory is available at any particular instant in time is pretty critical for performance engineering.


Only if one ignores memory fragmentation.


Pretty sure everything they wrote is true even with fragmentation.

But, also, if your point is that defragmentation is a selling point... I mean, the last time I saw someone who moved from RC to GC because of defragmentation was... never? Probably because you can just solve it by throwing money at it and buying more memory to permanently solve the problem. Whereas people go in the other direction all the time... probably because you can't just perpetually throw more money at the problem and make compute time go N times faster.


Only on the cloud and classical desktop PCs.

You never saw anyone move from RC to GC, because there are very few high performance pure RC implemenations to make them a first option anyway, and in most cases that requires a rewrite in another language.


Fragmentation is always a possibility unless you've handled it specifically, like with a moving garbage collector.


Which reference counted implementations never make use of, when they do, they slowly evolve into a tracing collector when fully implemented.


The more optimized RC is, the more similar it is to GC - and this includes properties like cycle handling and performance guarantees.


RC is and will be slower than GC. Even in C++ I can have pointers managed by GC that are faster than shared_ptr.


This paper is referenced in this post too. As "a whole spectrum between the two." (RC and tracing GC)


Reference Counting is one way of GC and you can clearly read in great detail about it in the book "The Garbage Collection Handbook", chapter 5 [1]

[1] https://gchandbook.org/contents.html


Never free and periodically crashing by OOM killer is rarely used but can be useful in limited circumstances. There are some shops that arbitrarily kill any worker over X hours old under the hypothesis that a memory leak is present or probably present.


> Never free

AFAIK it's how you get memory safety in missile guidance software: you just have to put enough RAM on the board so it never run out of memory before the end of its flight.

Edit: TFA just talks about that in the “never free”


I'm aware of several Java service backends that just disable the GC entirely and take a once-a-day crash as a cost of doing business. As long as you have enough nodes that the crashes aren't correlated, your overall service can maintain decent uptime


Best part

>I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customers chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.


> "Linear reference counting is an elusive concept, where we can completely eliminate the counter integer, and do all of the reference counting at compile time."

Right. I've occasionally thought about that for Rust, as a way to do back references. You can have an Rc owning an object, which can own another object, which can have a weak RC reference back to the original. This allows you to find an object's owner when needed, so you can do inheritance-like things when you need to. The single-ownership and dangling-pointer checks are made at run time and will panic if they fail.

Now, the question is, can you statically check access behavior and determine that those panics will never occur? If you can, you can eliminate the reference counts and checks. That looks 1) possible, and 2) a PhD thesis sized problem.



It's weird to see no mention in this post of separation logic, which would seem to provide a basic, general foundation for most of these techniques if perhaps not all of them. For instance, the RustBelt paper gives a description of Rust borrow checking that can arguably be understood as relying on some form of separation logic.


I am surprised that call stacks aren't mentioned. The call stack is a great way to free up memory that's no longer being used by a function. Maybe it was too obvious to include?


That doesn't solve the "use after free" problem, so not memory safe (as long as not combined with something else).


What is the "use after free" problem? A function leaves the computed result on the stack or modifies data higher in the stack. There are no pointers, nothing to free.


If you return a pointer from a function that points to a stack local variable, it'll be invalid. Generally this is easy ish to detect though.


In this stack only scheme there are no pointers. Function leaves its returned data at the top of the stack.


Most of the fancier runtime-less non-GC techniques fail when it comes to graph data structures (or even linked lists) without a tremendous amount of upfront work from the programmer.


Are you just trying to throw shade on Rust?

https://doc.rust-lang.org/std/collections/struct.LinkedList....

> NOTE: It is almost always better to use Vec or VecDeque because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.

https://docs.rs/petgraph 78 M downloads


> And as it turns out, reference counting can be blended with immutable region borrowing to greatly reduce its cache misses and make it data-race safe, something no language has done yet.

Rust prevents data races, so I'm not sure what this is referring to.


I think it’s saying that no language has blended reference counting with immutable region borrowing, not that no language is data-race safe.


One kind of data races, in-process memory not shared with other processes.


Isn't the "regions" approach similar/same in principle, as Rust's memory model?

A region's model is described as:

> Pony's iso keyword: an iso'd object (and its contents) are only reachable from your pointer; nobody else points at that object or anything the object contains [...] we can "temporarily open" an isolated subgraph for a given scope, and afterward it would still be properly isolated.

In Rust, AFAIR, one gets a mutable reference to an object, and the borrower can independently operate on its fields (subgraphs), while the root mutable reference can't be accessed from anything else other than the borrower.


Early Fortran worked in a way that every function was allocated a global static memory area for its arguments, variables and the result. No stack, no heap, no worries.



A compile-time reference counting and reference count ellision scheme would be awesome for jq. It could only happen with an escape hatch to reference counting. We had an interesting contribution that added a piece of this, but meant for libjq-using applications, not for jq itself.


The language the development of which this blog discusses is worth checking out too: https://www.uiua.org

It is an interesting APL successor I was planning to learn after getting current projects out of the way.


I thought the language was https://vale.dev?


Oh. You are, of course, right.

Damn pattern recognition, the logo looks almost the same if you squint!


Java supports the never free strategy with Epsilon GC


TFA is pretty awesome, not least because of all the links. I'm going to be reading for a while.


what about "self freeing" pointers?

you allocate and must / are extending its lifetime all the time or else it frees itself! very predictable behaviour of memory collection.

you get pointer and keep it allocated by "extending" its lifecycle.


This is interesting but exhausting to read. I get the joke but come on, after the first few of these can I just read the information in line.


The explanation for the the chosen style is given near the end of the piece: the author says that they’ve found ‘imagining a better solution already exists and is just out of reach’ (my paraphrase) is an effective way to break the mental habit of assuming we’ve discovered everything there is to know about a problem.


Agreed. I don't really have time for informational articles that don't just present their information in a straight-forward way.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: