Language Design Deal Breakers (sebastiansylvan.wordpress.com)
51 points by modeless on May 25, 2013 | 49 comments



As a language designer, I’m glad to hear my goals might resonate with somebody. :)

Dynamic typing is easy to implement, and that’s about as far as its value goes, as far as I’m concerned. If your static type system is getting in your way, you need a better static type system, not a dynamic one. The same goes for dynamic scope—much as I like having it available in Perl and Emacs Lisp, it’s best used sparingly.

Memory safety is also kind of a no-brainer. I enjoy working at all levels of abstraction, and having control over memory layout is great, but I don’t miss it when working in a higher-level language. Compilers and runtimes for statically typed languages can usually be trusted on matters of value representation.

Efficient storage reclamation is something I’m particularly concerned with. In the absence of mutability and laziness, refcounting is an excellent GC replacement because you can’t get cycles: a value can only ever refer to values that already existed when it was constructed. :)

I came to that choice from working with Haskell—it’s wonderful, but it suffers from performance issues due to the GC–laziness combination. Both immensely useful features, to be sure, but they come at a cost. OCaml hits a certain sweet spot: it uses GC, but also eager evaluation by default; has convenient access to mutable values; and the runtime uses unboxed primitives.
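
To make that concrete, here’s a tiny sketch (in plain OCaml, with made-up names) of what I mean by eager evaluation plus convenient mutation:

    (* refs give cheap in-place mutation; ints are immediates, not heap-allocated *)
    let sum_array (a : int array) : int =
      let total = ref 0 in
      Array.iter (fun x -> total := !total + x) a;
      !total

    (* evaluation is eager: the array and its sum are computed right here,
       not deferred behind a thunk as they would be in Haskell *)
    let () =
      Printf.printf "%d\n" (sum_array (Array.init 1_000_000 (fun i -> i)))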

Windows support is…well, an unfortunate truth. The state of programming on Windows isn’t great outside the Microsoft stack—MinGW and Cygwin notwithstanding, as “using Windows to emulate Unix” isn’t the same as “using Windows”. Language designers have a sad tendency to neglect Windows as a platform, and I can’t quite tell why. Is it just because they work primarily on Linux, Unix, or OS X?


Refcounting?? I thought that idea died years ago.

Refcounting is an extremely poor choice for memory management on modern machines. Even putting aside the fact that it can't handle cycles: constantly writing to memory to update reference counts can be a huge performance hit (you often have to do it even when you're only reading objects). It also requires extra memory per object, doesn't play nicely with concurrent threads, and isn't cache-friendly.

I'd expect any halfway-decent modern GC implementation to be significantly more efficient than reference counting.

About the only valid justification I can see nowadays for not using a GC is a hard-realtime latency requirement. Yep, GC pauses suck. But for efficient memory management in nontrivial systems, GCs are definitely the state of the art.


> Refcounting is an extremely poor choice for memory management on modern machines.

Tell that to Linus:

    Data structures that have visibility outside the single-threaded
    environment they are created and destroyed in should always have
    reference counts.  In the kernel, garbage collection doesn't exist (and
    outside the kernel garbage collection is slow and inefficient), which
    means that you absolutely _have_ to reference count all your uses.
https://www.kernel.org/doc/Documentation/CodingStyle

Just to balance your downsides of refcounting, some downsides of GC: requires halting the program periodically and trashing all the caches, which is inconvenient and sometimes impossible. And performance-wise it's unpredictable and can require lots of tuning, as Java programmers who work on allocation-heavy workloads can attest.


Reference counting is surely not state of the art, and very definitely un-fancy, but what it has going for it is that it strikes a good balance: reasonably efficient, predictable and bounded memory usage, plays nicely with others (no worrying about scanning other heaps / object pinning / etc). Where reference counting shines is that its memory usage has a lower high-water mark than any GC algorithm.

Garbage collection has a lot going for it too, but any garbage collecting algorithm will suffer from at least one of the problems you mention, to varying degrees. Reads and writes should not require memory management code, like updating a refcount? That rules out a large class of concurrent and generational GC algorithms that depend on read and write barriers. No extra memory on a per-object basis? This rules out many incremental mark-and-sweep algorithms that store the object color. (True, the color may be implemented with a side table, but so may reference counts.) Play nicely with concurrent threads? This rules out a large class of stop-the-world collectors. Cache friendly? This rules out semispace copying collectors, which may even page in memory from disk during a collection.

I'd say the evidence shows that GC performs best when there's lots of memory available and low interactivity, so it's a good choice for server-type applications. For interactive applications with limited memory, especially mobile apps/games, GC is more problematic and the case for it is much less clear. I don't know if the GCs in XNA or Android are "halfway-decent", but they both have significant overhead that causes real performance issues.


You’re right in the general case. My language does not have mutation, so refcounting is[1] just an optimisation to avoid copying and allow in-place mutation when the refcount is 1; it only applies to boxed values. I should have said “copy on write”.

And I’m well aware of modern GC implementations, but one of my goals is deterministic memory behaviour, a feature lacking in other functional languages. I swear I know what I’m doing. :)

[1]: Read: will be; the current implementation is interpreted but I’m working with a friend on an x86_64 VM target.


Amazing ... you just restated the article in half as many words, and avoided sounding like a prima donna. This is a useful skill if you actually want people to listen to you.


HN comments and blog articles are how I practice writing. Haven’t been writing any blog articles lately, so.


I intended my comment to be a compliment ... in fact, I'd written and discarded what I decided was a "too nasty" comment about the OP's article.


> but really people who think dynamic typing is suitable for large scale software development are just nuts

I would argue that ITA's software (http://www.itasoftware.com/solutions/travelers/flightsearch....) is larger than most software efforts, and it is written in Lisp. Not statically typed.


Because everything is a list.


No, not really.


Everything is a graph.


I don't get the following requirement:

    Great Windows support
That doesn't have anything to do with the programming language. That's all about the libraries and toolchains that the language ships with. Sure, when I evaluate a language, I also take into account the maturity of the libraries and tools that the language includes. But I don't think lack of library support (especially for a single platform) should be a criterion in judging the design of the language.


Yeah, this point is just plain dumb. He says Windows is an extremely popular OS, and it may be in general, but not for software engineering. A Unix-like environment is far superior for software development, unless you stay in Visual Studio (or Eclipse) all day.

The other points are largely flame bait too. A competent developer can manage just fine in Java, Python, C++, or a host of other languages that fail his tests.

A classic case of the bad worker who blames his tools. My response to this kind of blog post is: grow up, deal with the reality of the languages that are in everyday use, and stop whining about things like null pointers!


That’s not what the post is about. He’s saying that if you’re designing a new language, you should meet these criteria because they are deficiencies in existing languages.

We live with the flaws of old tools because they have other, more important value to us—a new language has no value without adoption, so it had better improve on its predecessors in order to convince people to adopt it.


I think his title is more to blame than the content. He's not really just talking about the design, he's talking about the whole package, about doing actual work and getting money for it. He mentions other environmental concerns in the first paragraph as things that keep him in C++. Design just happens to be the most noticeable factor.


He works on videogames, so Windows is non-negotiable.

I had the same reaction, but given that his code has to run on other people's desktops, it makes sense.


I love this post, but I don't wholly agree with the null-purity system. I think that the existence of the 'null pointer exception' is a sign that global, total error handling in anything but trivial applications is really hard.

So let's remove null pointer exceptions, and let's require that anything that could return a failure be wrapped in an option type. That way you can determine whether there is Some x or None inside the possibly-None value. Well, then you still need to decide how to fail in the None case.

Even the best functional programmers I've seen fall back to some kind of exit-or-exception semantic in the None case. So we've kind of re-invented the null pointer exception, by having cases where the easiest thing to do in our program in response to some failure is to just abort the program.

I think that this is fine from a memory safety perspective. The terrible thing about the 'NULL' pointer is that it is a sentinel that is overlaid into the domain of valid addresses to the machine. When we add a layer that checks if the address is valid without relying on its literal value, then we gain a lot of safety in terms of not generating a hardware exception (that is difficult to reason about) or scribbling on some other object (which could be impossible to reason about). Terminating the program, to me, is a fine tradeoff to avoid corrupting and modifying the program in an unknowable way...
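
To make the trade-off concrete, here's a minimal OCaml sketch of that option-wrapping pattern (find_widget is hypothetical); the None branch is exactly the explicit exit-or-exception choice described above:

    type widget = { id : int; label : string }

    (* hypothetical lookup that may legitimately find nothing *)
    let find_widget (id : int) : widget option =
      if id = 42 then Some { id; label = "answer" } else None

    let label_of id =
      match find_widget id with
      | Some w -> w.label
      | None -> failwith "no such widget"  (* explicit abort, not a stray null dereference *)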


>Even the best functional programmers I've seen fall back to some kind of exit-or-exception semantic in the None case. So we've kind of re-invented the null pointer exception, by having cases where the easiest thing to do in our program in response to some failure is to just abort the program.

The difference here is that the programmer is making a conscious choice to violate the option type by writing a partial function which escapes it. It also makes very plain and clear (via type annotations) which functions are allowed to fail and which are not (pure functions). With NULL pointers you're basically making every pointer in the entire program an option type with the potential to crash if you haven't written the boilerplate to handle it.

Oh and by the way, Haskell's Maybe and Either types are monads which can be handled very elegantly via do notation, simple combinators or monad transformers (MaybeT and ErrorT).


Yeah, I just think that at every point in your program where you have a Maybe, the failure case can be just as inelegant for program execution as raising an uncaught NullPointerException, and escaping it involves a lot more work than just using a language without a NullPointerException would.

Maybe I should write more Haskell code, but this has definitely been my experience with OCaml...


I've never used OCaml, but Haskell isn't as bad as you describe. When you use Maybe, you do not have to deal with the Nothing (i.e. null) case until you want to escape the monad (even then, you could still ignore it and just crash if you do have Nothing).

If an intermediate calculation results in Nothing, then the final value will be Nothing. Handling this is as simple as:

    foo :: Maybe Int -> Int
    foo Nothing  = 0
    foo (Just x) = x


I don't think the fact that doing something exception-like is a reasonable response to a null value implies that it's a good thing for any statement involving something nullable to potentially do so.

It's just that if aborting is the right thing to do, you should have an active hand in making it happen.


I'm kind of willing to debate this.

Is the big difference between like:

    m.foo_meth() // may throw NullPointerException

and

    m = getSomeM()
    match m with
    | Some x -> x.foo_meth()
    | None -> raise NullPointerException

that in the second case the exception raising is explicit, and in the first it is implicit? I kind of get that, I guess; it just seems that it's a similar level of verbosity to deal with it whether or not you allow null pointers, and I think you will always bottom out at some similar kind of code pattern for error handling...


The difference is that

(a) in a case where failure is supposed to be possible, you are strongly pushed to check for null when it first comes up, rather than just ferrying around the returned object and perhaps getting a NullPointerException five levels into a subsequent function call;

(b) in a case where failure is not supposed to be possible, which is probably the majority of cases where NullPointerExceptions come up in practice (since failure is more commonly represented by exceptions or error codes than just null), you have the type system verifying that there is no path through your code that accidentally fails to initialize a variable or otherwise introduces a null.

edit: Of course, initialization is not always predictable enough for the type system to verify it, but here functional-ish type systems can help: if your class data looks like "bool connected; Socket socket;", it's possible that you could accidentally end up with connected == true but socket == null, but if you write it as (pseudocode) "state : data Disconnected | Connected (socket : Socket)", with Socket non-nullable, you're assured that it's impossible.
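
That pseudocode translates almost directly into an OCaml variant (socket here is just a stand-in type); the "connected == true but socket == null" state simply cannot be constructed:

    type socket = Socket of int  (* stand-in for a real socket handle *)

    type connection_state =
      | Disconnected
      | Connected of socket      (* a Connected value always carries a socket *)

    let describe = function
      | Disconnected -> "not connected"
      | Connected (Socket fd) -> "connected on fd " ^ string_of_int fd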


Those are both big advantages, but neither of them makes dealing with the errors easier or more predictable, as far as I can tell. It would definitely constrain the locations in your code where there could be surprises, though, and maybe that's enough?


Isn't having fewer surprises almost by definition more predictable?


    Even the best functional programmers I've seen fall
    back to some kind of exit-or-exception semantic in
    the None case.
I offer two responses to this. I'm going to use Jane Street's alternative Core standard library for OCaml (http://janestreet.github.io/) in my examples, because that's what I'm most familiar with.

First, consciously raising an exception is far, far better than having an exception raised without you being aware of it. I can search OCaml code for all of the instances of raise, failwith, and _exn, and end up with a reasonable understanding of when my code might conceivably fail. I can't do the same thing in Java, because many of the functions I'm calling might return null.

And it's really not that hard to deal with optional values when you have them. Sure, it's slightly more verbose than not having to deal with null, but in practice you have to write a bunch of if statements checking if things are null everywhere anyway.

Also, most languages don't have good abstractions for dealing with nulls despite them being everywhere, so they're always painful. Languages like OCaml and Haskell do have good abstractions, so you can often avoid a lot of the verbosity: if you have an optional value and really do want to raise an exception (usually only in scripts/smaller programs), wrap your value in Option.value_exn. If you have a reasonable default value, call Option.value ~default:(...). If you have a bunch of optional values together, you can sequence them together with Option.(>>=), because Option is a monad.
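
A rough sketch of those idioms (assuming a recent Core is open; find_user is a made-up lookup that can fail):

    open Core

    let find_user id =
      match id with
      | 1 -> Some "alice"
      | 2 -> Some "bob"
      | _ -> None

    (* deliberately crash when the value is missing -- fine for scripts *)
    let user_exn id = Option.value_exn (find_user id)

    (* fall back to a default instead *)
    let user_or_guest id = Option.value (find_user id) ~default:"guest"

    (* sequence optional results, since Option is a monad *)
    let both id1 id2 =
      Option.(find_user id1 >>= fun a ->
              find_user id2 >>= fun b ->
              Some (a, b))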

Second, there are alternatives to raising exceptions in the middle of big programs. Consider the Or_error library (see https://github.com/janestreet/core/blob/master/lib/or_error....), which is a part of Core. Rather than having a type like Option everywhere:

    type 'a option = Some of 'a | None
It also has types like Or_error (defined approximately like this):

    type 'a or_error = Ok of 'a | Error of Error.t
Error.t is a very useful type that lets you lazily construct error messages from values, so that you don't pay the (often high) cost of serializing things into a human-readable form until you actually decide you need the error message.

What this lets you do is write your code inside the Or_error monad. Because it's a monad, it's very easy to sequence error-generating operations together, and you don't have to check for errors unless you explicitly decide you want to.

Often, you can do something useful like log the error to a log file or display an alert, rather than killing off the entire program. Sometimes, of course, an error is bad enough that it should cause the program to die, and then you can make it terminate with Or_error.ok_exn. But that's relatively rare. If you're trying to write robust systems, many errors are things you can deal with intelligently, and having those errors encoded in the type system ensures you don't forget to do it.
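
For instance, a small hypothetical sketch of working inside Or_error (again with Core open):

    open Core

    let safe_div x y : int Or_error.t =
      if y = 0 then Or_error.error_string "division by zero"
      else Ok (x / y)

    (* chain error-generating steps without explicit checks in between *)
    let compute a b c = Or_error.(safe_div a b >>= fun q -> safe_div q c)

    let () =
      match compute 100 5 2 with
      | Ok n -> Printf.printf "%d\n" n
      | Error e -> Printf.eprintf "%s\n" (Error.to_string_hum e)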


Thanks for this reply, it's super informative! I've done a fair amount of OCaml programming, but not as much as I would like with Core. Or_error looks really nice.

I think I underestimate the benefits that come with reducing the surprise in code. I find lots of matching on option types and throwing about as distasteful as comparing to null and throwing; Or_error would definitely help with that distaste, I think.

You still have to give a lot of consideration to exactly what the error means, I think. In a lot of programs I write, the presence of an error cascades a lot, and doing things like unwinding transactions and ensuring that the fault leaves a component in a consistent state is much more difficult than identifying all of the locations in a component where a failure could arise. Maybe I'm doing something much higher-level incorrectly, but I've read a lot of other people's code, and it either has 10x the code to deal with, for example, many potential null-return scenarios, or it "works most of the time" and presents extremely critical consistency problems if a "stars align" error occurs...


Yep, this is a hard problem. I wasn't super enthusiastic about the verbosity of option types when I first started programming OCaml, but almost two years later I'm quite convinced that they're the right way to do things. I really don't think there's a whole lot of downside, and being able to basically just not think about a large and hard-to-identify class of bugs is incredibly freeing.

This may not cover your use cases, but for cascading errors you can often write code something like the following:

     match
       foo_err ()
       >>=? bar_err
       >>=? baz_err
       [... etc ...]
     with
     | Error e -> Log.error e
     | Ok result ->
       [...]
This ends up working pretty well when you're working with pure functions because there's not anything to unwind, but you're right that it doesn't solve all problems.

If you're interested in talking to more people about this kind of thing, the Core mailing list might be a good place to try: https://groups.google.com/forum/?fromgroups#!forum/ocaml-cor...


I'd just like to add that I definitely think I've seen the light and believe that this is the way forward. I just don't think that having nullable values is a dealbreaker when it comes to memory safety, because even if you don't have nullable values you still have huge problems to deal with in your code to make your system robust.

Exception safety, maybe, but that's a superset of memory safety?


OCaml? Static types, garbage collected, not sure about Windows support, and it still has null (IIRC). http://programmers.stackexchange.com/questions/62685/why-isn...

It's C-fast.


Approximately C-fast, anyway. You still have a GC, tagged integers, and boxed floats by default. Also not sure about Windows support, as I haven’t been using it long, and then only on Linux and OS X.


I think the trick is that any programming language that lets me operate on bytes (and hence makes me care about things like memory addressing) shouldn't prevent me from having nulls.

A NULL (usually defined as 0) is a valid memory location, especially if you are doing certain types of systems work--the fact that we've used it as a sentinel value in C/C++ is a separate problem entirely.

Maybe we need to use a taxonomy where languages that do not express anything about layout in memory (as bytes, qubits, or whatever) are kept separate from other languages.

The issue, of course, is that eventually you need to move out from algorithm space and start dealing with real hardware, which speaks bytes. Then again, a model where you only interface over JSON blobs might obviate the need for that, provided you don't talk to anything that speaks in something other than numbers and human-readable strings.


Whether you have null pointers in the language is orthogonal to having control over memory layout. You can have nullable pointers at the machine level without introducing them into the type system. (The typical way to do that is Maybe/Option types.)

What the author is talking about is the idea that a pointer dereference operation will never result in a failure (exception, segfault, panic, etc). That's a separate concern from control over memory layout.


I think you are talking about a separate (and identical issue). "NULL" is not a valid memory location. 0 is a valid memory location. Because we have to express NULL as the same primitive type as valid memory locations, we have no way to distinguish between NULL and 0. In a language with a type system that prevents you from dereferencing NULL, this problem goes away. Now that the type system can distinguish between NULL and a pointer, we can once again use 0 as a valid pointer.
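
In OCaml-ish terms the distinction might look like this (addr is made up for illustration); Addr 0n is a perfectly valid pointer, and Null is a different thing entirely:

    type addr =
      | Null                (* "no location", a constructor of its own *)
      | Addr of nativeint   (* any machine address, including zero *)

    let describe = function
      | Null -> "no address"
      | Addr a when a = 0n -> "the (perfectly valid) address zero"
      | Addr a -> Printf.sprintf "address %nd" a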


> I think you are talking about a separate (and identical issue).

Somehow, this reads very strangely to me. Are you saying something like "I think you are talking about the same issue from a different point of view"?


Save for the Windows requirement, I think the author would actually really like Objective-C. The compiler checks types. Messages to `nil` resolve to `nil` (0), so no null-pointer exceptions. Automatic Reference Counting gives you memory safety.


I don't think Objective-C satisfies any of those constraints. It kinda gets close, but I don't think it does.

The compiler checks types... but there are no generics. Given that the author is coming from C++, they probably want a type system that can represent 'vector of ints' vs 'vector of strings'.

Nil doesn't cause null pointer exceptions... but it causes silent bugs, where you thought something was happening but it wasn't. The real problem (all types have an extra misbehaved value) is still present.

Automatic reference counting gives you memory safety... but it can't handle cycles automatically, and obviously it doesn't apply to the C subset.


> but it causes silent bugs, where you thought something was happening but it wasn't.

In Objective-C, it could be a silent bug or just intended behavior to send messages to a nil object. It is explained here: http://developer.apple.com/library/ios/#documentation/cocoa/...


Of course, the behavior may be intended. I've certainly taken advantage of nil doing nothing.

I've also accidentally had a nil where it wasn't expected, and lost time figuring out why particular effects weren't occurring. More time than I would have lost if the mistake had caused an immediate exception.

The same type of mistake is still possible (unexpected value), though the consequences are different, so I don't think it makes sense to say nil solves the null problem.


I find it fascinating that programmers working in C++ seem to spend a lot of time thinking about language design.

There is a lot of responsibility on the shoulders of language designers. There is the notion of technical debt that we all deal with as programmers, but the meta-debt the language designer accumulates is astounding.

Let's meditate a while. As we read this, some young, well-meaning C++ programmer is writing another String class.

I don't even know if talking about Tony Hoare's NULL story is really that relevant. After all, that's a single feature of a programming language. In the context of C++, talking about NULLs at the language level is like arguing about the colour of your Edsel.


I had to take the whole article with a massive grain of salt. Clearly the author has a lot of experience writing Windows desktop applications in C++. Equally clearly, the author has very limited experience with other domains.


I used C# for a long time and consider it to be one of the best examples of a statically typed language. The generics are very well done. Implicit typing from 'var' makes half of your statements almost dynamic.

  var list = new List<int>();
Many of the base types are pass-by-value (int, bool, dates, etc.).

There is terrific functional support through LINQ.

It is a great language. I just don't want to work on Windows, and it only has full support there. The library support and community are also pretty bad.


> Finally, performance.

Wouldn't Strongtalk and the research around it be the counterexample to this claim?


but really people who think dynamic typing is suitable for large scale software development are just nuts

Flame-bait. I happen to like static typing-- and I'd be excited to see more progress with optional static typing in Clojure-- but I disagree.

First of all, you can pretty easily get principled strong typing with a language like Clojure. One of the first things I write on a large Clojure project is defn-typed. If types were a major part of what I was doing (e.g. theorem proving) that wouldn't be acceptable, but for web programming it's generally enough. It's pretty easy using a few macros to make it so that you get almost all of the guarantees (from unit testing) that you'd get from compile-time type checking.

Remember: dynamic typing doesn't mean you don't have types. You can include types in your contract system. It means that type sanity isn't checked at compile-time. For certain problems that require extremely high levels of correctness, you need static error-checking/analysis because runtime unit testing isn't good enough. For every large system? No, not the case. What you need, in general, are good programmers and sane development practices; those matter a lot more than static vs. dynamic.

My experience is that static typing performs best when you're trying to solve a known problem extremely well, and that's not an uncommon thing in software. Dynamic typing is better-suited to a more chaotic world where you don't know the problem yet, because it makes it easier to "poke" at things in the REPL and build prototypes.

Also, improperly used, static and dynamic typing both fall down disastrously on large or distributed systems. For example, if you have circular dependencies in a large Scala project ("don't do that", you might say; but we're talking multi-developer now, with the full spectrum of skill) you're going to see gargantuan compile times because incremental compiles will fail you, and while Scala's compiler is an extremely powerful machine, it's not fast.

All this said, Haskell is still a very cool language and I think everyone who wants to be a serious computer scientist needs to get some exposure to what it has to offer. Real World Haskell is a great book, for starters. I also think Scala is great if you have strong developers who won't be tempted to OOP things up.


I've found that static types help with prototyping because you can make changes to the types used in your program and the compiler lets you know when you missed a spot; those spots could go unnoticed for months in a dynamically typed project.
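
For example (an OCaml sketch with a hypothetical shape type):

    type shape =
      | Circle of float
      | Square of float
      (* adding  | Rect of float * float  here makes the compiler warn
         that [area] below no longer matches every case *)

    let area = function
      | Circle r -> 3.14159 *. r *. r
      | Square s -> s *. s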

Static type systems work as a force multiplier on top of good programmers and sane development practices. Dynamic type systems just become something that requires you to have good programmers.


I've found that automated testing helps with prototyping because you can make changes to the tests for your program and the tests let you know when you missed a spot; those spots could go unnoticed for months in a project without tests.


Yes, but this requires the additional step (and work) of making the changes to your tests. Tests give a good approximation of what static analysis gives you, but they are more cumbersome. (Of course, tests can also do things static analysis can't.)


I usually write a macro (defn-typed) that generates pre- and post-conditions (contracts, essentially) for crucial APIs when I'm working in Clojure.

It's not as good as static analysis at catching all type errors-- if that's what you need, I wouldn't recommend any dynamic language-- but it does mean that most type errors will be caught in unit testing, with minimal recurring work.



