
A garbage collector that knows that all values are immutable will be rather interesting, I think. Typically, a garbage collector will stop the world (i.e. halt execution) to do heap defragmentation of the old generation. When all values are immutable, though, you can defragment the heap by copying a value to another fragment and just swap the internal pointer to the value.

This might actually have significant performance benefits. When everything is immutable, GC can also be inherently incremental. You just do your defrag copy faster than new object creation. Then you can limit your stop-the-world to a small "catch-up" so your copy-from can become a copy-to. Incremental GC nirvana!



Haskell's generational garbage collector is able to use some tricks to take advantage of the fact that values in Haskell are (almost all) immutable:

http://www.haskell.org/haskellwiki/GHC/Memory_Management


I don't see how immutability makes much difference to garbage collection. Garbage collectors don't look at the value inside memory they are collecting, only whether something live is still retaining that part of memory. Whether the value is mutable or immutable has absolutely no bearing on the performance or logic of the GC. Either something is still using the memory, or it is free to collect.

Also, you've just described existing GC systems - the Java G1 collector is very similar to that (it's a lot more complicated, though).


Generational GC usually depends on write barriers to detect creation of references to newer generations inside older generations.

Immutability prevents the mutation of older generations, so no new references can occur.

So I expect generational GC to be easier to implement. I wouldn't expect it to be faster or slower, though, because programs will need to be written differently to cope with the limitations on mutability: O(1) algorithms will likely be replaced by O(log n) ones.
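The write-barrier point above can be sketched concretely. This is a toy model, not any real collector's implementation; all names here are invented for illustration:

```python
# Hypothetical sketch of a generational write barrier. On every pointer
# store into an old-generation object, the runtime records that object in
# a "remembered set" so a minor GC can find old-to-young references
# without scanning the whole old generation.

class Obj:
    def __init__(self, gen):
        self.gen = gen          # "young" or "old"
        self.fields = {}

remembered_set = set()

def write_field(obj, name, value):
    """A pointer store guarded by a write barrier."""
    if obj.gen == "old" and isinstance(value, Obj) and value.gen == "young":
        remembered_set.add(obj)  # old object now points into young gen
    obj.fields[name] = value

old = Obj("old")
young = Obj("young")
write_field(old, "child", young)
assert old in remembered_set

# With immutable values, an object is fully initialized before anything
# can reference it and never mutated afterwards, so this kind of store --
# and hence the barrier and the remembered set -- cannot occur.
```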


In the case of Clojure, log is log base 32, so it's close enough to O(1) as to make no difference in most circumstances.
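The "log base 32" point is easy to check with a little arithmetic (the 32-way branching factor comes from Clojure's persistent vector/map tries; `trie_depth` is just an illustrative helper, not a real API):

```python
import math

# Depth of a 32-way branching trie for various element counts:
# log base 32 grows so slowly it is nearly constant in practice.
def trie_depth(n, branching=32):
    return max(1, math.ceil(math.log(n, branching)))

assert trie_depth(32) == 1
assert trie_depth(1_000) == 2          # 32^2 = 1024
assert trie_depth(1_000_000) == 4      # 32^4 ~ 1.05 million
assert trie_depth(1_000_000_000) == 6  # 32^6 ~ 1.07 billion
```

So even a billion-element persistent vector is only about six hops deep.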


Many immutable algorithms replace arrays with trees. Trees are pointer-heavy, and pointers are less cache-friendly than arrays.

Memory access is often over 100x the cost of an L1 cache hit. It doesn't take too many of those to make a big difference if you're CPU bound.

My comments are general, I'm sure Clojure has access to arrays where necessary for interop at least.


Again, true, but the big problem I'm facing isn't raw performance so much as it's guaranteed latency.


But the constant factor is Z, a very large number. A number so large it has a color and name. It pays taxes, has lunch and sees movies on the weekend.


Not true in the context of dynamic languages. The constant factor for persistent maps in that context is quite small.


Actually freeing memory is indeed possible (and common) without stopping the world. Heap defrag is another story, though: it involves moving objects to different locations in memory, and if the memory is mutable, measures like stop-the-world are required, since a multi-byte move is not atomic.


Doesn't matter. In an immutable model, the old copy is guaranteed not to change, so you can safely copy it non-atomically. (That's one of the big benefits of immutability.)
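A minimal sketch of that safety argument, with invented names (`Handle` standing in for whatever indirection cell the runtime uses):

```python
# Relocating an immutable object without stopping the mutator. The copy
# is non-atomic, field by field -- safe only because the source can never
# change mid-copy -- and then a single reference swap publishes the move.

class Handle:
    """Indirection cell: all readers go through .target."""
    def __init__(self, target):
        self.target = target

def relocate(handle):
    old = handle.target
    new = list(old)          # non-atomic, element-by-element copy
    # ... the mutator may still be reading `old` here; that's fine,
    # since `old` is immutable and `new` is not yet visible ...
    handle.target = new      # single atomic pointer swap publishes the move
    return old, new

h = Handle([1, 2, 3])
old, new = relocate(h)
assert h.target == [1, 2, 3]             # readers see the same value
assert h.target is new and h.target is not old   # but a new location
```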


Which makes really good incremental collection much easier to write.


But not faster.


True. But what I find myself wishing for is not more throughput, but seemingly pauseless operation.


> This might actually have significant performance benefits.

Not really. On the other hand, it makes the implementation of GCs much simpler.

(immutability semantics also tends to make memory churn much higher, so what little gain you get from a simpler implementation is often lost in the GC having to do more work)

Also, it's not like these things are unstudied FFS.


It's not just GC, but virtually pause-less GC that is the possible benefit. Yes, these things are studied, but I would challenge you to find a free high-level language implementation with a GC suitable for soft real time. Right now, I know of Erlang, which isn't the fastest language, and of running Clojure on the proprietary Azul VM.


I'm not sure what you're trying to say. If no such implementation exists, even though there are a number of languages predicated upon immutability, one would conclude "virtually pause-less GC" isn't such an obvious or easy "possible benefit" of immutable structures.

Unless you're using refcounting: since immutable structures can't create cycles, a refcounting GC won't need a cycle-breaker, and thus won't need to ever pause. But you'll be paying for it in throughput (especially for allocation, I'd guess).

Also not sure what your point is re. Azul, immutability is not what enables it since Zing is first and foremost a java VM.
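The refcounting point above can be sketched in a few lines. This is a toy model with invented names, not a real runtime: the key property is that an immutable object can only reference objects that already existed when it was built, so the heap is a DAG and plain counting reclaims everything:

```python
# Toy refcounting sketch. Immutable cells fix their children at
# construction time, so no cycle can ever form and no cycle collector
# is needed: dropping the last reference cascades through the DAG.

class Cell:
    def __init__(self, *children):
        self.children = children   # fixed at construction: no mutation
        self.rc = 0
        for c in children:
            c.rc += 1

freed = []

def decref(cell):
    """Drop one reference; free and cascade when the count hits zero."""
    cell.rc -= 1
    if cell.rc == 0:
        freed.append(cell)
        for c in cell.children:
            decref(c)

a = Cell()
b = Cell(a)
root = Cell(a, b)        # a DAG: `a` is shared by `b` and `root`
root.rc = 1              # the program's single root reference
decref(root)             # dropping the root reclaims the whole structure
assert freed == [root, b, a]
```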


I was thinking about refcounting, actually, but not exclusively. My point is that there aren't that many systems where you can build in a high level language, deal elegantly with concurrency, but still have really good latency guarantees. This project might improve on that.


> immutable structures can't create cycles

Perhaps I'm misunderstanding something, but doesn't "tying the knot" in Haskell do just that? (Create a cycle from immutable structures.)


Tying the knot is possible only if you have pervasive laziness.


Which is precisely why an immutable OS that's Clojure all the way down might be desirable.


I can't tell whether you're implying that Clojure has pervasive laziness or not. The contract of lazy sequences is too loose to make tying the knot anything but a brittle hack, and I think that's a good thing.


I'm referring to the article. The Clojure-inspired OS would, I suspect. If no one has referred to the unrealized parts, what would be the problem?


"When all values are immutable, though, you can defragment the heap by copying a value to another fragment and just swap the internal pointer to the value."

Wouldn't that turn shared objects into not-shared objects? Theoretically, that doesn't make a difference in a system without mutable data (assuming you leave out 'identity check', too), but in practice, it likely will blow up memory usage tremendously.

Edit: I think the way around this is to replace the moved object with a "this object moved" object. The trick, then, will be to know when it is OK to discard that "this object moved" object. That probably is the case after a full heap scan.


Cheney's algorithm (http://en.wikipedia.org/wiki/Cheney's_algorithm) is quite common in such systems and uses "forwarding pointers".
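A toy simulation of Cheney's two-finger copying loop, to show how forwarding pointers keep shared objects shared. The object representation here (dicts with a `"fwd"` slot, integer addresses into to-space) is invented for illustration:

```python
# Cheney's copying collector in miniature. A "fwd" entry is the
# forwarding pointer left behind in from-space, so every reachable
# object is copied exactly once; unreachable objects are never visited.

def cheney_collect(roots):
    to_space = []                      # new heap; index = new address

    def copy(obj):
        if "fwd" in obj:               # already moved: follow the
            return obj["fwd"]          # forwarding pointer (stays shared)
        to_space.append({"data": obj["data"], "refs": list(obj["refs"])})
        obj["fwd"] = len(to_space) - 1 # leave forwarding pointer behind
        return obj["fwd"]

    new_roots = [copy(r) for r in roots]
    scan = 0                           # the "two fingers": scan chases
    while scan < len(to_space):        # the allocation pointer
        node = to_space[scan]
        node["refs"] = [copy(child) for child in node["refs"]]
        scan += 1
    return new_roots, to_space

shared = {"data": "s", "refs": []}
x = {"data": "x", "refs": [shared]}
y = {"data": "y", "refs": [shared]}
garbage = {"data": "g", "refs": []}    # unreachable: never copied
roots, heap = cheney_collect([x, y])
assert len(heap) == 3                  # x, y, and one copy of shared
assert "fwd" not in garbage            # garbage was never visited
assert heap[roots[0]]["refs"] == heap[roots[1]]["refs"]  # still shared
```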


Agreed. The potion/QISH implementation of the two-finger Cheney loop is extremely efficient even without register values (volatile): typically a 3ms pause time on a mid-sized heap. Compare that to typical Java or Ruby pause times, which start at 25ms and typically run 150-250ms. You need more memory, though, as most lisps use copying, i.e. compacting, collection rather than simple mark-and-sweep. Incremental is doable, but you then need to add GC state, which is a bit tricky with threads. Actually, libgc is pretty good.




