Three bad things: threads, garbage collection, and nondeterministic destructors (apenwarr.ca)
53 points by bensummers on Aug 16, 2010 | 28 comments



The author spends a lot of time claiming garbage collection is bad because it can't solve problems it wasn't designed to solve. Garbage collectors are for freeing memory allocations. If you're using them to manage scarce resources (file descriptors and sockets, especially), your design is already broken.

Ditto destructors -- you should write your code such that a destructor closing a resource is considered a programming error. assert(FALSE) in a debug build, log it in production, but don't just let such a bug creep silently by. The only exceptions are C++ RAII wrapper classes, which are designed to have magical destructors.
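In Python terms, that pattern might look like this (a sketch; the class and the logging policy are invented for illustration):

    import logging

    class ManagedFile(object):
        def __init__(self, path):
            self.f = open(path, 'w')
            self.closed = False

        def close(self):
            self.f.close()
            self.closed = True

        def __del__(self):
            if not self.closed:
                # Debug runs fail loudly; under 'python -O' the assert is
                # stripped, so production falls through to log-and-close.
                assert False, 'leaked handle: close() was never called'
                logging.error('leaked file handle: %s', self.f.name)
                self.f.close()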

C#'s using(), Python's with statement, Haskell's with* functions -- they're all designed to solve the problem that a generic garbage collector doesn't and shouldn't, which is "I need precise control over the lifetime of this resource allocation". Java doesn't have such a mechanism, but then, Java doesn't have much of anything for a supposedly high-level language. C doesn't have such a mechanism, but that's OK, because it also doesn't have exceptions.
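For the record, the Python version of that precise control (a trivial sketch; the path is made up):

    # The file's lifetime is bounded by the block, not by whenever a
    # collector happens to notice the object is unreachable.
    with open('/tmp/example.txt', 'w') as f:
        f.write('released at the end of this block\n')
    # f is closed here, whether the block exited normally or by exception.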

Additionally, programming with threads is much less complicated than the author is making it out to be. Sure, it's hard if you're manually (un)locking mutexes like a damn caveman -- but every major high-level language available today (except, again, Java) provides better abstractions. Some, like Erlang and Haskell, make concurrent and parallel programming especially easy.
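(One such abstraction, sketched in Python -- the worker function is invented:)

    import multiprocessing

    def crunch(n):
        return n * n

    if __name__ == '__main__':
        # No mutexes in sight: the pool owns the locking and the queues.
        pool = multiprocessing.Pool(4)
        print pool.map(crunch, range(10))
        pool.close()
        pool.join()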

Finally, if you can get by with a single-threaded, refcounted, deterministic program -- go ahead. Have fun. The rest of us, here in the real world, have applications more complicated than a high-school midterm to work on. And customers won't pay you if the user interface is sluggish because you were too ignorant to write a proper thread pool.


> Finally, if you can get by with a single-threaded, refcounted, deterministic program -- go ahead. Have fun. The rest of us, here in the real world, have applications more complicated than a high-school midterm to work on.

At a previous employer I worked on a single-threaded, refcounted program that served a website with over a million dynamic pages/hour and which cleared over $30 million profit each year. OK, we parallelized - we ran our service with prefork, and a standard reverse proxy configuration, and split the load across something like 10 machines. (2 of which were able to take the load by themselves.) Nor is this close to the largest such website that I know of.

Don't knock the "high-school midterm" projects.


Another point worth mentioning is the recurring, bogus argument about GC, non-determinism, and the comparison with malloc. If you think malloc is any more deterministic than GC, you are kidding yourself: think about what happens when malloc gives you a block that contains pages which will be pulled from disk when you access them. When you want to program something with low, guaranteed latency, you lock your pages in memory, and you never call malloc in that thread.


> you should write your code such that a destructor closing a resource is considered a programming error.

Not disposing your resources properly is an error. Your language should make it as easy as possible to make this impossible (or at least highly unnatural).

> The only exceptions are C++ RAII wrapper classes, which are designed to have magical destructors.

They are the (only?) correct form of resource management: automatically clean it up when you're done with it.

> C#'s using(), Python's with statement, Haskell's with* functions -- they're all designed to solve the problem that a generic garbage collector doesn't and shouldn't, which is "I need precise control over the lifetime of this resource allocation". Java doesn't have such a mechanism, but then, Java doesn't have much of anything for a supposedly high-level language.

using() is the same as try{}finally{}, which Java does have. The problem with these is that you can forget them.
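Roughly, in Python terms, a with block expands to something like this (simplified: the real protocol also passes exception details to __exit__):

    mgr = open('/tmp/example.txt')
    f = mgr.__enter__()
    try:
        data = f.read()
    finally:
        # The compiler emits this for you; with try/finally alone,
        # it only exists if the author remembered to write it.
        mgr.__exit__(None, None, None)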

The problem with garbage collection is that it seems to always come with the baked-in assumption that memory is the only resource that really matters, so handling of other resources is a bolted-on afterthought. GC impedes correctness, not because of anything fundamental, but because it is seen as a substitute for proper resource management.


The "With" style construct is the correct way to handle this. Embedding resources like file-descriptors in objects is a Bad Idea.

Python added the "with" construct so it can move away from ref-counted implementations: the amortized cost of refcounting is higher than that of tracing GC, and even if you don't care about that, there is still the circular-reference problem.

[edit] I should make it clear I think tying an FD's lifetime to that of an object is a Bad Idea; I'm not suggesting a blanket ban on ever having an FD stored in an object.
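One way to keep an FD's lifetime away from any object is a small context manager that owns it outright (a sketch; the function name and path are invented):

    import contextlib
    import os

    @contextlib.contextmanager
    def raw_fd(path):
        fd = os.open(path, os.O_RDONLY)
        try:
            yield fd          # objects may borrow the FD inside the block...
        finally:
            os.close(fd)      # ...but its lifetime is the block, full stop.

    with raw_fd('/etc/hostname') as fd:
        print os.read(fd, 64)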


Correct resource management is to clean it up when everyone is done with it. As one book put it, liveness of an object is a global, not local, property. If a reference to an object escapes the function which created it, it rapidly becomes infeasible to statically prove it is ever safe to destroy the object, and the penalty for almost getting it right is undefined behavior. That's why memory deserves to be handled specially. Leaving files and connections unclosed can cause errors, but they can't make correct code go unpredictably insane.


using isn't quite the same as try/finally. using is a construct explicitly for freeing resources. With a try/finally construct it's up to the user to fill in the finally block correctly.


I don't see any sign you're telling the author anything he doesn't already know. At least, it looks to me like his complaint is precisely that using and with are awkward solutions, because they break object encapsulation in a particularly ugly way.

Or are you trying to argue that objects should never contain file descriptors or sockets?


The author is arguing that reference counting is superior to garbage collection, because it provides deterministic resource deallocation.

I say this is wrong -- object lifetime and resource deallocation should be entirely separate. He should use with and using(), not depend on a quirk of a particular implementation of the language.

If an object's lifetime ends while it still holds an open handle to a resource, this should be considered a programmer error.


Java was an innovative language when it came out. But future-proofing the bytecode put a very tight lid on innovation (resulting in generics hampered by type erasure, for example), and the language has very much stagnated while many other languages have innovated like crazy.


As long as Oracle doesn't decide to scrap the set of promises that Sun had made for Java 7, that release will resolve many complaints and bring the language forward a bit, at least. It's not going to make Java the most powerful language around by any means, but it will help, and I think we'll start to see much tighter Java code coming out, at least from the people that know what they're doing.

For one thing, it will finally add automatic resource management blocks, which by my understanding do exactly what is being complained about here - they let objects deterministically clean up after themselves without relying on finalization.

What I'm still skeptical about is whether closures will actually make it in or not...it's much harder to add them on after the fact (while still maintaining backwards compatibility) than it is to design a language around them from the start, and I'm not really convinced that Oracle will care enough to see the efforts through to the end; at the very least, I don't see it as likely that it will happen without a serious schedule slip.


Java's chief weakness has been its lack of first-class functions. If closures don't make it into Java 7, then it's game over; it'll be sealed as the COBOL of our times.


It sounds like you haven't looked at Java in at least four years. finally{} may be less convenient than using(), but it does the job of controlling lifetimes just fine. And java.util.concurrent provides all the sophisticated synchronization constructs you could ask for.


finally { } doesn't really do the job. You have to write it all the damn time! Bad developers keep making mistakes writing try { } finally { } blocks. Some have absurd religions about where and how things should be initialized. In C#, IEnumerator<T> : IDisposable, which means you can iterate over resources using foreach loops. try { } finally { } blocks have a higher mental cost -- they push more things out of short-term memory. finally { } does the job of being evaluated when exiting the block just fine, but that's all.
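The closest Python analogue is a generator whose finally clause runs when iteration finishes or the generator is closed (a sketch; the path is made up):

    def lines(path):
        f = open(path)
        try:
            for line in f:
                yield line
        finally:
            f.close()   # runs when iteration completes or close() is called

    for line in lines('/tmp/example.txt'):
        print line.rstrip()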


" Sure, it's hard if you're manually (un)locking mutexes like a damn caveman -- but every major high-level language available today (except, again, Java) provides better abstractions"

... java.util.concurrent?


I, on the other hand, tend to agree with the author. It would be nice to have an alternative to the JVM/.NET hammer. Note that I am not saying those things are useless, just that I am tired of them being used as the one true way to develop "business" apps.

Additionally, I have had to do a lot of "batch" type work in my career, such as "extract, transform, load" kind of work, and having to deal with GC sucks. It might be cheap to create and destroy temp objects, but holding a map/hash/dictionary in memory gets expensive, since the GC is always stomping around, and paging memory in and out of cache outside of the region doing the "real work".

I want options, not religion. http://roboprogs.com/devel/2009.07.html -- OK, I happen to agree with this guy's post.


"Then you can create a file, write to it, and close it, all in one line. The closing is implicit, of course; it happens right away as soon as there are no more references to the file. Because, obviously, nothing else would make sense."

This behavior isn't even guaranteed in CPython (and never has been as far as I know). There's a problem with deterministic destructors and refcounting: if there's a reference cycle, which destructor gets called first?

    import gc

    class Something(object):
        def __init__(self, other):
            self.other = other

        def __del__(self):
            print '__del__ called!'

    s1 = Something(None)
    s2 = Something(s1)
    s1.other = s2     # reference cycle: s1 -> s2 -> s1
    del s1
    del s2
    gc.collect()      # neither __del__ runs; the cycle is uncollectable
    print gc.garbage  # both objects are stranded here instead
Even the cyclic garbage collector won't pick this up. These kinds of issues come up more and more if you rely on Python's __del__ methods (aka destructors). The solution? Quit complaining and just use a with block. It isn't that bad.
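For contrast, a with-block version of the same cleanup -- no reliance on __del__ or the cycle detector (a sketch; close() is an invented method):

    import contextlib

    class Something(object):
        def __init__(self, other):
            self.other = other

        def close(self):
            print 'closed, cycle or not'

    s1 = Something(None)
    s2 = Something(s1)
    s1.other = s2                  # same reference cycle as above
    with contextlib.closing(s2):
        pass                       # close() runs right here, deterministically
    # The cycle still sits in memory until the GC gets to it, but the
    # resource itself was released at a known point.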


With reference counting, those issues are never caught.


It's not just about reference counting though. It's a weakness of deterministic destructors. When you have a reference cycle, it's impossible to call destructors deterministically even if you aren't using reference counting.


D seems to be more sensitive to these concerns than other languages, at least as of late. Classes can be marked as "auto", which indicates they are to be destructed deterministically. I think this is a more elegant solution than requiring users to opt-in to proper semantics, though requiring the auto specifier at the use-site would make things more explicit.

Also, most languages have some sort of thread-based tasking framework that can make it easier to implement async message-passing based concurrency.


It's still bad, since it will lead to the effect of "just write every class with auto."

C++ combined memory allocation and object destruction into one mechanism, and now we have too many people who think those should be one primitive. They shouldn't. Once you recognize that you have to care about both separately, you start to see that even GC is an answer to the wrong question.

For really "cool" programs (ones that use most of the machine's resources, etc.) you simply have to manage memory yourself. Objects should not manage their own allocation; they should just "know" who is in charge of their allocation.

Some Bloomberg guys recognized that and made some libraries for C++, and some work was done to get something like that into the C++ standard, but I don't know if the result is useful.

I also don't know if such ideas would look nicer in D.


I'm somewhat confused by the D documentation. The documentation refers to "auto objects" in the prose somewhere, but it seems to refer to the feature as "scope classes" and uses the scope keyword to do so. Here is the page I am referring to: http://www.digitalmars.com/d/2.0/class.html . The reference also says the use of the scope keyword is required at the declaration site for references to scope classes.


The problem I can see with specially tagging types like that is that you can't compose them within classes that use GC. The determinism is viral.

You could make it an error to rely on destruction for resource release, but what stops the programmer from wrapping the resource in an object of his own making, and freeing the resource in the wrapper's destructor? Walk the stack to check if the code is called from ANY destructor?


My favorite bit:

So far, the vast majority of the programs I write have been single-threaded, refcounted (or manually memory managed), and deterministic. I think I'll continue to do it that way, thanks.

Well, gee, I'm so glad that's been cleared up for me -- here I was, writing multi-threaded/non-deterministic code all over the place, just because I like it so much. Tracking down race conditions is a choice; we can choose not to get involved with any of it, and we'll be just fine! Some guy on the Internet says so, it must be true!


Well, you can write deterministic code in Java using threads; just use well-defined constructs that give you guarantees about how, and in what order, things will be executed -- ExecutorService and BlockingQueue are your friends.
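The same idea in Python, sketched with the standard library (the worker and the sentinel are invented):

    import threading
    import Queue

    q = Queue.Queue()

    def worker():
        while True:
            item = q.get()
            if item is None:   # sentinel: time to shut down
                break
            # Items arrive in the order they were enqueued, so this
            # thread's behavior is reproducible from run to run.
            print 'handled', item

    t = threading.Thread(target=worker)
    t.start()
    for i in range(5):
        q.put(i)
    q.put(None)
    t.join()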


Real life shows that ref-counting, when done manually, is more error-prone than GC. Programmers make errors. A single missing release keeps your object locked in memory; one missing add-ref crashes your program. Microsoft learned this lesson soon after the introduction of COM and quickly added smart pointers -- still not a good solution. For their follow-up technology (.NET) they used GC. Lesson learned.


Meh. Locking is going to be a requirement if you have mutable shared state. Whether it's threads or events that are driving it, the possible classes of problems look almost identical.

Merely having a drawback doesn't make something bad. So garbage collection cannot run destructors deterministically. That is indeed unfortunate but it in no way outweighs the benefits -- for example, that the vast majority of classes which don't hold these kinds of resources can just be discarded and forgotten. It's kind of a stupid thing to say that this alone makes them bad. It would be like if I said, "Feet suck because they can step on nails." Wait, that's not quite right.


Please, this is ridiculous. Find me a professional programmer who doesn't consider garbage collection and threads to be a useful thing.



