Inside D's GC (olshansky.me)
144 points by arunc on June 20, 2017 | 42 comments



The D language is currently heading towards deterministic and safe memory management, making it possible to avoid the GC altogether, so further work on GC improvements has been deprioritized. http://dlang.org/blog/2017/06/16/life-in-the-fast-lane/

The problems are known, and indeed a full rewrite of this ancient GC would be in order. Since D 2.072.0 it's possible to link and use different GCs, so a faster one could be written as an external library, which would be a very welcome effort. https://dlang.org/changelog/2.072.0.html#gc-runtimeswitch-ad...
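
For anyone who hasn't seen the mechanism: the collector and its options can be chosen when the program starts, either with --DRT-gcopt on the command line or by embedding rt_options in the program. A minimal sketch; the exact set of gc:... selector names depends on the runtime version, so treat the value below as illustrative:

    // Select and tune the GC at startup by embedding runtime options.
    // Equivalent to passing --DRT-gcopt=... on the command line.
    extern(C) __gshared string[] rt_options = [
        "gcopt=gc:conservative profile:1"  // pick a registered GC, enable profiling
    ];

    void main()
    {
        auto data = new int[](1024);  // allocations go through the selected collector
    }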

I'd be interested in experimenting with a Connectivity-Based GC (https://www.cs.purdue.edu/homes/hosking/690M/cbgc.pdf), leveraging type information to do partial collections. Write barriers come with a performance penalty (~3-5%) that we don't want to impose on people using deterministic memory management, and most GCs capable of performing partial collections, e.g. generational GCs, do require write barriers; Connectivity-Based GCs do not.

For the time being, the pragmatic advice is to replace major sources of GC allocations with deterministic memory management when the GC heap grows too big (~1 GB) and performance becomes a problem.
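
As a rough illustration of what that replacement looks like in practice (a minimal sketch, not a prescription of specific APIs), a hot path can take its buffer from the C heap and release it deterministically:

    import core.stdc.stdlib : malloc, free;

    // The buffer never touches the GC heap and is freed on every exit path.
    void processChunk(size_t n)
    {
        auto buf = cast(ubyte*) malloc(n);
        if (buf is null) assert(0, "out of memory");
        scope(exit) free(buf);

        buf[0 .. n] = 0;  // ... do the actual work here ...
        // Note: if the buffer stored pointers to GC-managed objects,
        // it would also need GC.addRange/removeRange.
    }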

So far this hasn't prevented various people from writing extremely fast D programs.


> The D language is currently heading towards deterministic and safe memory management, making it possible to avoid the GC overall

Is this really true?


Yes. I gave a presentation on this at DConf 2017.

http://dconf.org/2017/talks/bright.html


But is it really possible to completely avoid the GC? I'm thinking of memory blocks with cyclic references, and stuff like that.


Yes, but the D community isn't structured enough to have such a specific goal. The main effort that I can think of is DIP 1008, which aims to add reference-counted exceptions (thus eliminating one of the few truly difficult things to work around: @nogc exceptions currently require user cooperation, or memory gets leaked).
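
To make that pain point concrete: without DIP 1008 you can't `new` an exception inside @nogc code, so the usual workaround is to throw one that was allocated once, outside the @nogc path. A minimal sketch of that idiom:

    __gshared Exception tooBig;

    shared static this()
    {
        tooBig = new Exception("value out of range");  // GC allocation, at startup
    }

    @nogc void check(int x)
    {
        if (x > 100)
            throw tooBig;  // throwing an existing object allocates nothing
    }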


Write barriers are cost-effective in a language that makes heavy use of heap allocation, like Java. They make a lot less sense for D, because D uses a lot of stack allocation and embedded instances instead.


Write barriers (or related techniques) are also important for incremental garbage collection.
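
For readers unfamiliar with the term, here is a conceptual sketch of what a compiler-inserted card-marking barrier does on every pointer store; the names and sizes are purely illustrative, this is not druntime code:

    enum cardShift = 9;             // 512-byte cards, arbitrary for this sketch
    __gshared ubyte[] cardTable;    // one dirty byte per card (sized to cover the heap at startup)
    __gshared void* heapBase;

    void writeBarrier(void** slot, void* newValue)
    {
        *slot = newValue;  // the actual store
        // Record which region was mutated so an incremental or generational
        // collector can rescan only the dirty cards.
        auto card = (cast(size_t) slot - cast(size_t) heapBase) >> cardShift;
        cardTable[card] = 1;
    }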


D is the most promising "one to rule them all" language for me, given its expressive power, speed, compilation speed, and support for multiple paradigms. That said, D severely lacks manpower in its development.

I'd like to port the author's effort to Windows, but I don't have OS-level programming experience, so I don't even know what the differences are or how they affect the GC algorithms described.


Immix is a copying collector. It's late and I may be misreading, but glancing at the code I don't see any precise object layout info for the heap in D's GC. That is, D seems conservative for both stack and heap. How can you make Immix work with this kind of conservative GC?


One simple "solution" is to not implement evacuation, thereby not moving objects. In my opinion this somewhat defeats the purpose of using Immix, but some projects decided to take this route (e.g. Rubinius has/had an Immix GC but never implemented evacuation if I'm not mistaken).


Presumably adding precise stack maps and object maps would be the first step of adding Immix. I mean, it's the compiler that chooses where to place things in the heap and on the stack, so it already knows where they are.


Since I don't know D, I'm curious whether D code is written in such a way that objects are assumed to be non-moving, especially when interfacing with C.


The spec says the GC may move objects, though the current GC doesn't do that:

https://dlang.org/spec/garbage.html#obj_pinning_and_gc

There's also been some work on a precise GC for D:

https://forum.dlang.org/post/hdwwkzqswwtffjehenjt@forum.dlan...


> written in such a way that objects are assumed to be non-moving

Not generally, but you can take the address of anything and shove it into a union or pass it off to C code that stores it somewhere. Because you can, it would be very dangerous to move objects around.
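
When you do hand a GC-managed object off to C, the root API in core.memory is the usual way to keep the collector honest about it. A minimal sketch (the C callback name is hypothetical):

    import core.memory : GC;

    extern(C) void c_store_callback(void* ctx);  // hypothetical C API that retains ctx

    class Context { int state; }

    void registerWithC()
    {
        auto ctx = new Context;
        GC.addRoot(cast(void*) ctx);  // keep it alive while only C references it
        // GC.setAttr(cast(void*) ctx, GC.BlkAttr.NO_MOVE);  // would matter for a moving GC
        c_store_callback(cast(void*) ctx);
        // Call GC.removeRoot(cast(void*) ctx) once C is done with it.
    }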


Ah, yes maybe FFI and also perhaps inner pointers.


Good luck with this. I find GC programming really hard as any small mistake snowballs and becomes super hard to find during debugging.

Some of the mistakes seen by the author could just be the original programmers being tired of painful debugging.


The mistakes seen by the author aren't bugs. They're just less performant algorithmic choices.

One charge not leveled at D's GC is bugginess. The GC has proven to be pretty robust.


What are you talking about? Programming with a GC is usually the easiest way to avoid small mistakes.

Circular references? GC will collect them! (Reference counting does not handle circular references.)

Usually the two mistakes with a GC are forgetting to empty collections (which is also a problem with manual memory management) and relying on finalizers to clean up resources.

GC will not "read your mind" and magically close your open sockets, release your file handles, and dereference your static references. It just frees you from the tedium of releasing transient memory, like strings; and it frees you from tracking lifetimes of memory when you don't need to.

In my experience, the programmers who have trouble with GC are the ones who don't understand the difference between memory management and resource management.
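
To put that distinction in D terms: let the GC own the memory, but close the OS resource yourself instead of hoping a finalizer runs. A minimal sketch:

    import std.socket : TcpSocket, InternetAddress;

    void fetch()
    {
        auto sock = new TcpSocket(new InternetAddress("example.com", 80));
        scope(exit) sock.close();  // deterministic resource cleanup; the GC frees the memory later

        sock.send("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n");
        // ... read the response ...
    }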


They meant implementing the GC itself.


If D manages to side-step the GC completely then Rust suddenly has a formidable competitor.


As a former Oberon user and a Modula-3 fan, I would rather see the GC being improved.

History is full of GC enabled systems programming languages since the Xerox PARC days, the latest example being System C# for Midori.

Most projects failed more due to political reasons than technical ones.

Which is why I appreciate, to a certain extent, Apple's, Google's, and Microsoft's attitude of forcing automatic memory management on developers for their OSes.


Garbage collection is very hard to combine with real time, and at the kernel level it is unacceptable (because it introduces a pathway with very hard to bound latency).

Since that is the exact use case that interests me when it comes to languages such as C, Rust, and D, I would rather see the whole thing taken care of at compile time, with very strict rules about how long it can take to traverse a certain piece of code, especially when it runs on the other side of the syscall interface.


Garbage collection at kernel level is just yet another kernel daemon with the same constraints as all the others.


Rust already had a much more formidable competitor in C++. :P It would make no sense for Rust to bother competing directly with D, and likewise it would make no sense for D to bother competing directly with Rust. If you want market share, you have to pile on the leader rather than waste your energy cannibalizing the minnows.


More important than competing is to make it easy to call libraries written in one language from the other. extern(Rust) would be a nice addition to D.


Rust just needs to add extern(C++) like D has and then they can communicate.


or extern(C)...?
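
For what it's worth, C linkage already works in both directions today; a minimal sketch (the function names are made up for illustration):

    // D code calling into C, and exposing a D function back to C.
    extern(C) int c_add(int a, int b);  // implemented in some C object file

    extern(C) int d_triple(int x)       // callable from C as "d_triple"
    {
        return 3 * x;
    }

    void main()
    {
        import std.stdio : writeln;
        writeln(c_add(2, 3));
    }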


I find Rust's lack of a GC to be its biggest weakness; the language takes on a lot of complexity to avoid using a GC.

That being said, I anticipate that I'll write a lot of Rust in the future!


This is somewhat subjective, but I'd argue you've got it exactly backwards. I already know "more than enough" GCed languages and use them frequently. That niche is well covered for me - add a GC, and remove the language tools designed to help deal with the lack thereof, and Rust would be just another language for me to ignore. Even within GCed programs, I sometimes need to reach for manual control over memory or other resources, control lifetimes, control ownership, manage shared access, etc.

And sometimes those needs are so pervasive that I want or need to effectively avoid using a GC myself - and so reach for a non-GCed, non-JITed, "manual control over everything" language. This I do not have well covered - traditionally my choice has been the undefined behavior infested, data race prone, "didn't even have threads in the standard library until ~2011", source of bugs and misery - C++.

Rust's lifetime and ownership semantics are complex, yes. But on the other hand, they can be used as a much more uniform, clean, and flexible version of e.g. https://clang.llvm.org/docs/ThreadSafetyAnalysis.html , which is but the tip of the iceberg of static and dynamic analysis, unit and fuzz testing, etc. that I've already been applying to my C++ programs to try to keep the bugs in check. So Rust is simpler than what I've been doing in practice.


Which complexity in Rust is due to the lack of GC, and not due to, say, the desire for thread safety?


> Which complexity in Rust is due to the lack of GC, and not due to, say, the desire for thread safety?

The whole lifetimes business, which is done more generally for memory safety. It adds a lot of complexity.

On the other hand, there's clearly a niche need for a language with no GC at the systems level, so I can appreciate the reasons for Rust's decision. But after playing around with Rust for quite a bit, I've come to the conclusion that, at least for me, Rust's lifetimes are too complex, and (partly as a result of having no GC) the overall language is just not functional enough to be a general-purpose programming language. Scala is currently filling that role for me.


Thread safety (specifically, freedom from data races) is memory safety.

Additionally, lifetimes are also part of the system that statically avoids problems that appear in GC'd languages like iterator invalidation (for instance, ConcurrentModificationException).


Freedom from data races is not hard and has nothing to do with managing lifetimes or avoiding GC; Concurrent Pascal did it in the 1970s for a language without shared references, and Eiffel did it in the 1980s for references, too (and with GC). Rust just uses one specific model (the ownership approach, see in particular Boyapati's and Flanagan's papers), but it's not the only one. There's a huge body of research, especially from the 1990s, that covers this area in considerable detail.

The problem is that a lot of mainstream programming languages pretty much ignored the state of the art when implementing their concurrency models. See Per Brinch Hansen's rant about Java [1], for example.

[1] http://brinch-hansen.net/papers/1999b.pdf


I didn't say it was the only model, nor did I say that Rust was the origin of its style of data race prevention. You seem to be annoyed that Rust is using techniques similar to older research to avoid data races and also annoyed that other languages are ignoring the same older research??

Thank you for the references (although I think you meant Flanagan without an h). From a quick glance at one of the papers (without any guidance from an expert I have no idea where to start for a good summary), it seems like the retrofitted PRFJ has to fight the pervasive sharing that comes with having a GC, resulting in a much less orthogonal system with a higher annotation burden. In any case, at least two of the Rust team did PhDs in the areas of parallelism/concurrency, so I'm sure they can give a more academic comparison of Rust and the research.


> You seem to be annoyed that Rust is using techniques similar to older research to avoid data races and also annoyed that other languages are ignoring the same older research??

Neither. I am just unhappy that these days avoidance of data races often gets reduced to the ownership approach, and I wanted to provide a more complete picture (even if a quick summary is still woefully incomplete).

(And thanks for pointing out the typo – now fixed.)


> Additionally, lifetimes are also part of the system

That "additionally" is exactly why I said "more generally". I didn't say that lifetimes were not for thread safety. I just said that they were more generally for memory safety (I would count iterator invalidation etc as memory safety, would you not?).


Iterator invalidation can lead to memory unsafety in C++, usually due to a pointer or reference becoming dangling, but most other languages just treat it as a correctness problem: GC avoids the problem of invalidating pointers. As my parent comment hinted, some languages, like Java, try to detect (some occurrences of) these problems at runtime.


Some languages throw exceptions if their iterators are invalidated, right? That seems separate from memory safety to me.

Or put another way, having memory safety doesn't guarantee iterator safety.


Getting rid of the GC completely will never happen: it's too useful. Not having to manually annotate memory lifetimes (shared_ptr, unique_ptr) is very useful. I don't think having it done automatically in the compiler is ever going to happen (politically, rather than technically).


The GC also makes CTFE (Compile Time Function Evaluation) particularly nice.
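
A minimal sketch of why: inside CTFE you can build up strings and arrays with ordinary GC-style allocation and hand the result to the compiler, e.g.

    import std.conv : to;

    string squaresTable(int n)
    {
        string s;
        foreach (i; 0 .. n)
            s ~= i.to!string ~ " -> " ~ (i * i).to!string ~ "\n";  // allocation, fine in CTFE
        return s;
    }

    enum table = squaresTable(4);  // forces compile-time evaluation
    static assert(table == "0 -> 0\n1 -> 1\n2 -> 4\n3 -> 9\n");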


Last time I tried Dlang, it had the option to turn off the GC completely.


It still does, but that doesn't mean you aren't able to create garbage. There is an effort both to improve the GC and to make it easier to avoid allocating GC-managed memory.
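
For reference, both approaches exist today: collections can be disabled at runtime, and @nogc lets the compiler reject GC allocations outright. A minimal sketch of each:

    import core.memory : GC;

    void main()
    {
        GC.disable();               // no automatic collections from here on
        scope(exit) GC.enable();

        auto a = new int[](1_000);  // allocations still work, they just pile up
        GC.collect();               // collections can still be forced explicitly
    }

    // Or mark code @nogc and the compiler rejects GC allocations at compile time:
    @nogc int sum(const int[] xs)
    {
        int s;
        foreach (x; xs) s += x;
        return s;
    }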



