Some of these are not so much "human centered" issues as a matter of not cutting corners in the compiler front end.
Many years ago I wrote the front end of a compiler-like system (it was for formal specifications, not for runnable code) and dealt with some of these problems. Whenever a type problem was detected, the error was reported and the type of the failed object was changed to an internal error type. For any error in which an error type was involved, no message was generated. This avoided error cascading, something GCC inflicted on its users for decades.
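(A minimal sketch of that trick, written in Haskell rather than the original system and with made-up names: anything that already failed gets an internal error type, and any check that sees the error type stays silent, so one mistake doesn't fan out into dozens of follow-on messages.)

    data Ty = TInt | TBool | TError deriving (Eq, Show)

    -- Type-check an addition; TError operands mean the problem was
    -- already reported upstream, so we add no new diagnostics.
    checkAdd :: Ty -> Ty -> (Ty, [String])
    checkAdd a b
      | a == TError || b == TError = (TError, [])
      | a == TInt && b == TInt     = (TInt, [])
      | otherwise                  = (TError, ["operands of '+' must both be Int"])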
For parse errors, display the line in error and correctly mark the item involved. Don't just display the index into the source stream at the point the error was detected; that's often a token or two beyond the problem. Work back to the point at which things stopped making sense to the parser. You have to carry source position info with each token, but it's worth it.
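(A hedged illustration of carrying positions on tokens, with names invented for the example: every token records where it started, and the reporter points back at the last token that still made sense rather than the one where the parser finally gave up.)

    data Pos = Pos { posLine :: Int, posCol :: Int } deriving Show

    data TokenKind = TIdent | TNumber | TLParen | TRParen | TSemi deriving Show

    data Token = Token
      { tokKind :: TokenKind
      , tokText :: String
      , tokPos  :: Pos       -- every token remembers where it started in the source
      } deriving Show

    -- Format a message against a specific token's position.
    reportAt :: Token -> String -> String
    reportAt t msg =
      show (posLine (tokPos t)) ++ ":" ++ show (posCol (tokPos t)) ++ ": " ++ msg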
For errors which represent an inconsistency between several parts of the source code, show all the places that conflict, not just one side of the conflict. Rust is good about this. They have to be, because the borrow checker reports inconsistencies between different code sections, not just declarations and uses.
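(A sketch of the shape such a diagnostic can take, hypothetical rather than Rust's actual data structures: keep every span involved in the conflict so the reporter can show all sides.)

    data Span = Span { spanFile :: FilePath, spanLine :: Int, spanCol :: Int } deriving Show

    data Diagnostic = Diagnostic
      { message :: String
      , primary :: Span     -- where the conflict was detected
      , related :: [Span]   -- the other places that participate in it
      } deriving Show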
It also helps if your syntax supports error recovery. For example, if 'if' statements end with 'fi', 'do' statements end with 'od', and function definitions end with 'end', and you get 'if ... do ... fi', you can report that an 'od' is missing and recover parsing after the 'fi'.
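(A toy resynchronization sketch under that assumption of distinct closers, with everything simplified to strings: expecting 'od' but finding 'fi' or 'end' yields a targeted message and leaves the found closer for the enclosing construct to consume.)

    expectCloser :: String -> [String] -> ([String], [String])
    expectCloser want [] =
      (["missing '" ++ want ++ "' at end of input"], [])
    expectCloser want toks@(tok : rest)
      | tok == want                    = ([], rest)               -- the expected closer: all good
      | tok `elem` ["od", "fi", "end"] = (["missing '" ++ want ++ "' before '" ++ tok ++ "'"], toks)
      | otherwise                      = expectCloser want rest   -- skip ahead to resynchronize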
C is horrible in this respect; all nesting looks the same. Because of that, it took clang a lot of effort to get good error recovery, for example to correctly report a missing semicolon at the end of a header file instead of reporting an error in the file including it (http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-re... (NB: that page is 5 years old, and does not represent the current state of gcc))
gcc makes that job even harder by allowing nested function definitions. That means that accidentally forgetting a single '}', as in
    void f( int i) {
        if( i > 2) {
            exit(EXIT_FAILURE);
        }
    void g() {}
    [...]
    void h() {}
will make the compiler think that g() and h() are nested function definitions. So, the error you get is a "missing closing brace" on the last line of your file. Without nested function support, the error would be reported on the 'void g() {}' line. Still incorrect, but potentially thousands of lines closer to the source.
The Algol compiler I used years ago (helped by Algol's syntax) was way better. It frequently gave errors of the form
"x undeclared. Assumed real"
(if, say, you call sin(x) without declaring x) or
"semicolon missing after 'end' (inserted)”
Compilation would still fail after such errors, but you often would get meaningful error messages for the entire program. That was quite a boon when compilations were run in batch, and it's still a good idea nowadays.
If you follow a sane style guide, it'll usually be obvious where the error is from the first error's line number anyway, and it won't matter that you got garbage errors.
I apologize in advance but I couldn't not bring this up: it's too hilarious.
About a year ago there was a bug in the elm compiler that occasionally caused the types in a type mismatch error message to be swapped. For example the compiler told you:
    Something weird is happening with this value:
    x
    Expected Type: number
    Actual Type: String
At the time I was trying to learn elm (and functional programming) and I didn't know it was a bug, so it gave me a lot of trouble: I could never fully trust the compiler.
Then I left elm and focused on haskell, but now I think I see how he solved the bug:
    As I infer the type of values flowing through your program, I see a conflict
    between these two types:
    String
    number
Anyway, these new messages are very helpful, especially when compared to Haskell errors like:
    Could not deduce (b ~ c)
    from the context (Num b, Num c)
    bound by the type signature for
    f :: (Num b, Num c) => [a] -> (b, c)
    at restriction.hs:(4,1)-(5,32)
    `b' is a rigid type variable bound by
    the type signature for f :: (Num b, Num c) => [a] -> (b, c)
That's a common problem with globally inferring compilers: you strictly cannot know who's right about a type.
I think that's a good reason to prefer a language with local type inference, but no type inference across function boundaries. It saves you from typing too much boilerplate and has (IMHO) better developer ergonomics.
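(For what that looks like in practice, here is a sketch in Haskell, only because it's the language already under discussion: the signature at the boundary is explicit, the locals are still inferred, and a conflict stays pinned to this definition instead of surfacing somewhere else in the module.)

    average :: [Double] -> Double
    average xs = total / count
      where
        total = sum xs                     -- inferred as Double
        count = fromIntegral (length xs)   -- inferred as Double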
In Jay (an unfinished language) there is a prefix for the parameter that carries the 'master' type (if memory serves, '$'). This has the following benefits:
- In fun f(a:$T, b:T), given the call f(1, 'hello'), you can say: "Error: parameter 'b' is a string but generic type T is int, as defined by parameter 'a'."
- No need for a separate declaration of generic types as in fn f<T>(a:T); fn f(a:$T) already tells you which type is generic.
Not entirely fair. The Haskell example is more complicated code (using type classes and type variables), and the error message provides more detail. But Haskell would benefit from better wording and from not relying on symbols like ~.
> It is kind of shocking how much better things get when you focus on the user.
So true. We've seen this in Rust as well: every bit of time we spend working on better diagnostics makes Rust so much more wonderful to use. Even a small reduction in jargon helps new people out a ton. Diagnostics aren't particularly exciting, but they're really important and your users will love you for them.
> I have met a few folks who switched from gcc to clang mostly because of a feature like this!
I'm in that camp. Both compilers function properly for what I need to do, but clang's output is a lot friendlier. The choice to switch was easy to make.
Are template-related errors also better? I remember being deathly afraid of messing up something that uses std::string because I would just get back pages and pages of errors (or pages and pages of a single error).
I sure hope so. C++ templates are the worst thing ever for debuggability. And all the most useful C++ containers etc are templates. That's the Achilles heel of C++.
Somewhat. You still get error novels, but anecdotally, it seems like the error messages are more likely to identify the important details first before giving you all the information you might possibly want.
Based on my [limited] understanding, better error messages are one of the goals of Perl 6 and something the Racket folks put a lot of thought into when creating the Student Language series. At the other end of the spectrum are JVM languages that fail spectacularly with references to Java classes that the program's author has never heard of.
Databases are another example of this, with Postgres actually giving useful hints, whereas Oracle just gives you cryptic numbered messages. If I never have to read "missing right parenthesis" again…
Heh, a long time ago I wrote a script using BeautifulSoup which given an ORA-XXX would return a detailed explanation. Amazingly, it still works: https://gist.github.com/bluetech/f9aa4ede5a25c765c6e6 (don't hold me to the code though :)
This is some very impressive stuff, I look forward to trying it out. Question: Have you run into any cases where the hints have actually caused significant roadblocks to discovering the underlying problem? I can't count the number of times a compiler error has sent me down the wrong path entirely, I wonder how much worse/better this is with more human-readable hints.
I haven't yet. Before this was released I was using the bleeding edge compiler at home and the stable version at work[0], and the bleeding edge version with the nicer messages was always strictly faster for debugging.
In one particular case I recall racking my brain to figure out what I was doing wrong at work, and then I went home and rebuilt with the new error messages, and immediately saw the problem.
Highly recommend! :D
[0] We use Elm in production at http://noredink.com - and by the way, we're hiring!
I'm a bit sad that "editor takes you to the location of the compile error" is actually touted as a feature! It's one thing to have to do this occasionally, but I'm sorry to hear that people are used to doing it all the time :(
Nevertheless the error message point is good. It's ridiculous how atrocious error messages can be. (And they probably seem even worse if you have to find the file+line yourself.) This is actually pretty straightforward to implement, and supported directly with flex+bison, so it's a shame that it's not more commonplace.
Perhaps in the early days of gcc it was seen as too much overhead? Anyway, it should be de rigueur for anything being embarked upon today.
Very impressive work. I feel like this kind of helpful error message will become the norm within 10 years, and people will look back at core dumps, C++ exceptions, and even Ruby/Python exceptions as confusing and archaic.
Core dumps are at a completely different layer, so they don't really belong in this comparison. Unless we phase out the traditional conception of an OS process in 10 years, which is extremely doubtful.
Horse feathers. Running a debugger, logging, and printf are all much more effective than core dumps.
A core dump may be the most effective way to debug a program that dumped core when crashing in a way that you can't reproduce, but that's because it's the only way to debug a program that dumped core when crashing in a way that you can't reproduce.
What I meant was debugging a C application that has already crashed, without debugging or logging in place. Assuming you don't add any specific debugging or exception handling code, in dynamic languages you're guaranteed to get at least a vaguely helpful exception message, and you will sometimes get one (for varying definitions of "helpful") in C++, too, if it did not segfault. You do not get that in C by default.
Sorry for the lack of clarity, my statement was incorrect as written.
If you meant that and didn't state it perfectly, then I was too harsh. A mistaken idea may deserve a sharp reply; an incorrectly stated point does not.
In dynamic languages you get an instant burst of maybe-helpful text. (Instant gratification)
With C coredumps, you get a snapshot of the entire system at the moment it crashed. You can literally walk through everything that wasn't corrupted beyond reading. If the system is, for example, a multiplayer game, you can literally navigate the entire gameworld like a frozen snapshot. This can be very helpful for finding well-hidden bugs.
We're professional developers. It is worth the tiny bit of learning curve in order to get more productivity out of our tools!
You're right, my statement makes no sense given the way I wrote it. Sorry. I clarified in my other reply that I meant "out-of-the-box" debugging and error messages, like what Elm is providing here at compile time.
Can you give an example of a nice Elm program (maybe 50 lines or less) that illustrates this? I haven't tried Elm, but I'm guessing it has to do with state management, which IMO is indeed the hardest part of programming.
Will there ever be an Elm which compiles to real machine code? I'm currently learning Haskell, and find it a combination of mindblowingly amazing (pattern matching, monads, STM) and mindnumbingly crazy-making (namespaces, debugging, strictness). Elm looks like a different more modern take on the same principle. But I don't want to have to run everything inside a Javascript VM.
I think so! I'd like to get Elm running way faster than JS in browsers (which is possible thanks to some design choices) and that'd involve getting all this together. That opens things up on lots of different platforms, including servers. I also expect the next release to make things nicer on node.js as a first step in this direction. Point is, I think we'll start seeing folks doing server stuff, and it's a goal of mine to generate machine code for lots of reasons! It's a big project so I'm not setting a timeline at this point though.
Also, thanks for taking a look at Elm! I think of it as a member of "the ML-family" of languages and I draw from a lot of lessons from working with these tools and seeing what issues have come up for other folks, both within the typed functional world and not.
Actually, while it's still not direct machine code, if you can target Javascript as a backend I bet you can target Lua, which has very similar but less weird semantics; that means you get to run it on LuaJIT. That gives you a tiny, very very fast VM and garbage collector.
You need a garbage collector either way, so doing one implies you have pretty much succeeded at the other. So it's like two stepping stones that are right next to each other and quite a long jump away :P I think we'll get there though!
Probably not in the next few years. However, this sort of thing is in the back of people's minds. There was a question of dropping the Int type a while back, and the objections (besides semantics, e.g. indices are always ints) included wanting to be able to compile to machine code, where (unlike JS) ints and floats were different.
I'd say Haskell is modern. As well as having many years of research and development behind it, compared to Elm. Would having Elm server side really be useful? We have Haskell already, with all the benefits of Control.Concurrent, Control.Parallel, STM, FFI, Cloud Haskell, etc. Not to mention proper monads.
I would be delighted if the same level of developer-friendliness were present in other pieces of our toolchain, most importantly in our version control systems. I routinely feel that while we have version control data structures suitable for the 2010s, the user interfaces on version control systems are about two decades behind where they should be.
I'll opine: git's sense of semantics seems to rival JavaScript's ("sure I'll add an int and a string"). I checkout branches but also checkout files to undo unstaged changes, but staged changes are a reset. How about "undo" as a word people would understand? Nope, doesn't do anything. Can I get a log function that shows the commit history by default? And when I check out a branch it says I'm up to date with origin/branchname but it's lying because that's just the local copy and it hasn't done a fetch. And of course in 2.0 they didn't try to clean anything up, they just changed a few defaults.
I think I deserve a medal for understanding enough git to get by. Specifically, a purple heart.
Sadly, this is not as common as it should be, because in order for it to happen, the idea of focusing on new-user friendliness needs to be baked into the culture of the team building a tool. Otherwise, improvements get dismissed as superficial, especially when there is an existing user base that has already gotten used to how things work, which the maintainers can point to and say "see, things work fine. We should not change things because that could introduce errors, and the existing codebase is already tested."
Automated testing helps implement these things too.
Is there an option to make it print the file:line:column? Cause I use emacs and rely on the compiler printing out error messages in that standard format so I can jump to them easily; I could certainly hack together a solution for other output formats, but it's a tad more annoying. Not sure what kind of users would be complaining about having that and it seems like stronger editor support for standard compiler output would fix some (definitely not all) of the problems you identify and solve in this post.
`elm-make --report=json MyFile.elm` gives the error messages as JSON; it's very easy to parse out line & column. That's how the vim plugin shown in the video in the post was made.
Beyond that, I'd love to see locations output for everything involved in determining the types of the things that wound up incompatible. With type inference, the actual detected collision can drift a bit from the error. It's usually quick to pin things down with annotations and get enough localization to find the error, but it would be even nicer to be able to skip that step.
Elm has interested me for a while now; particularly as I get more and more into Haskell (one of the languages that inspired Elm). While I generally find Haskell's error messages to actually be quite helpful, there are certainly times where they could be a little more useful.
Well done, so far! I hope this kind of effort is eventually undertaken by many compilers (and their authors), as everyone benefits from simplifying the debugging/refactoring process.
Haskell error messages have come a long way. These days, they're generally (not quite uniformly) helpful and specific, at the cost of being a bit verbose.
I can eyeball-grep the relevant parts of a verbose message, if it actually has useful information in it. That is, I'll take verbose-but-specific-and-helpful over terse-but-not-useful-information any day.
I agree, for sure, on balance. Terse and equally useful is of course ideal, but often impossible.
That said, "eyeball-grepping" a specific set of messages, along with which bits are important, is a learned skill. I have found myself many times trying to fix the wrong thing because I skimmed and wound up with a wrong understanding of what the error was. And then found a more careful read told me precisely what was needed.
I tried making an application in Elm last summer, and the hardest part of the entire experience was the error messages! I'm glad that they've gone ahead and fixed the problem. I'll be trying it out again soon.
I remember changing a compiler message in HP aC++ to quote the C++ standard verbatim, because at the time aCC was the only compiler doing template-dependent name lookup according to the standard, so we kept receiving bug reports.
Moral of the story: sometimes, explaining why you are doing something may require hyperlinks, just saying it's wrong is not convincing enough.
The error messages in XL and Tao3D are terrible. I'm ashamed :-)
Nice, but it feels quite unnatural that the compiler tries to speak to you like a human being ("As I infer types of values flowing through your program…"). I cannot say why I'm against it, or even whether I'm against it, but I remember some PostgreSQL error message guidelines where it was explicitly said not to do so; I cannot remember why, though. That's peculiar.
I'm a big fan of the coming anthropomorphisation of computers. It seems so clear (to me) that the future is heading that way.
Apple have already hired writers for Siri; how many more years before its personality starts to change to match the tone and manner in which you ask questions? Young children have already started to mis-identify Siri as a real person! There was also the film Her the other year, which seems quite prescient. Regardless of whether we ever get to real AI, it seems quite clear we're going to face intelligent-seeming computers in the near future, and we're going to start to interact with them almost as if they were intelligent. It's going to be an interesting future.
This is a huge tangent, I know. It just seems like compilers speaking to you like a human being is just one more facet of this change.
> Software isn't "cautious" or "ambitious", those are qualities of alive beings. But maybe it serves us to think so.
Also this:
> Our minds respond to speech as if it were human, no matter what device it comes out of. Evolutionary theorists point out that, during the 200,000 years or so in which homo sapiens have been chatting with an “other,” the only other beings who could chat were also human; we didn’t need to differentiate the speech of humans and not-quite humans, and we still can’t do so without mental effort.
The future may be heading that way, but I'm personally not a fan of it.
As I understand it, we use different regions of our brain for interacting with tools, people, animals, etc. I enjoy using the tool part of my brain when computing. It feels "quieter" and contemplative. It feels like I am doing something, not like I am asking an assistant to do the work on my behalf.
I like social interaction a lot too, but as an introvert, it's kind of draining. I would have to leave computing if using a computer felt like being in an intense conversation with a computer-person all day. Even if the conversation is pleasant and fulfilling, it would be too much social interaction.
> I enjoy using the tool part of my brain when computing. It feels "quieter" and contemplative.
I totally get that. Personally though, I see conversational UI as an additional parallel mode of interaction, that you can use simultaneously. I can imagine e.g. coding, and interacting and thinking analytically about that, while still verbally being able to have a conversation with my computer for e.g. responding to email or calendar appointments. Not for everybody, of course.
Edit: I suppose, what I really mean, is that I'd like to interact with human things (email, meetings, messages, etc.) in a human way (conversationally, as I would a person), and I'd like to interact with machine things (coding, simulations, etc.) in a machine way. Trying to interact with human things in a mechanical way seems unintuitive, really, except it's all current technology affords, so we all have to learn these machine interfaces instead. And maybe that's why so many (non-techy) people find computers needlessly complicated.
I have tried to talk Evan (Elm's creator) out of the first person, but as you can see he likes it. But I think you bring up an important point: for some people, being able to use the social parts of our brain is like offloading to the GPU, and for others it's like firing up a generator when the power goes out.
Part of it might be that it obfuscates the structure of the error message. Error messages, even without error cascading, and especially when taking a top-down approach to writing code, are long and meant to be skimmed. Human languages are great for reading top to bottom, but not so great for skimming. Terse (but not cryptic) formatted text, as found in programming languages, is much better for skimming.
You'll only ever actually read "As I infer types of values flowing through your program…" a few times while getting used to the compiler, after that you'll recognize it by shape and never read it again, and at that point all the anthropomorphized text is doing is taking up screen real estate that could have been used to display more errors.
If you mean parsing by machine, the solution is to have a separate machine-readable output (JSON perhaps). If you mean comprehension, that's a valid point that is being discussed elsewhere on the page.
As I said, I don't have a strong opinion on the matter. It just struck me as something I remembered being discouraged by guidelines that are considered a good example of an error message standard, yet violated here, in another post about "good error messages". But let me play devil's advocate anyway.
A compiler is a relatively simple piece of machinery; it isn't really smart. So maybe it shouldn't pretend it is? It doesn't have a personality, it doesn't think anything, it doesn't actually even do anything: it is just a set of transformations from one set of bytes to another. In the worst case it has some heuristic, statistical optimizations, but it is still a process that can be accurately described in a relatively short amount of time. I don't want it to have an opinion, and actually I know it doesn't have one. I know all that has happened is that I made a mistake, and I just want to know exactly what went wrong. I don't want to read the compiler's essay about how it spent its holidays, because I know it cannot write essays and didn't have holidays anyway.
At the moment I don't use Elm, but if I did, I would be getting such messages tens or hundreds of times a day. Every time I have to search the larger text for something one could call "the real problem", I do work. I spend energy. Sometimes I have to do so while mentally exhausted, or in a hurry, or both. The shorter, more exact, and more concise the message, the easier it is to understand the problem, and the greater my gratitude towards the author of that message. And it's not only about the length of the message: human-like natural language constructs are inherently more complicated and diverse than short messages and labels; that's the reason we invent labels and such in the first place.
Good answers are direct answers to the question one is actually asking. When I'm about to read a compiler error message, I'm not thinking "Dear friend Compiler, what have you been doing in the meanwhile and what do you think about life?". No, I'm asking "What the hell happened?". What happened, not what "the compiler did" or "how things work" or whatever.
One more problem is that the Elm compiler might work just fine, but it won't always be right. I don't believe it, and you probably shouldn't rely on it either. The less creative it is in its search for "what might have happened", the less likely I am to be distracted by its opinion. Even as skeptical and suspicious as I am, I am prone to getting used to how a tool behaves. If it is usually right and tries to deceive me by behaving smarter and more human-like than it is, one day I will believe it and spend far more time figuring out the problem than I would have if I had treated it as a compiler and not as my mate.
So, finally, what is the real purpose of the error message? I love artistic people, but unfortunately it isn't to show how thoughtful and creative the author of this piece of technology is. The real purpose is to help me find my mistake. There are two general types of mistake I can make: a simple one, like a typo, a syntax error, a forgotten type or variable definition, you know what I mean; and a complicated one, like unexpectedly finding a bug in the compiler itself, or some weird Rust-lifetime problem. It's hard to make up an example, but the point is that it requires some thinking and research. The help I need in the case of a simple mistake is generally just as precise a location of the mistake in the code as possible. Maybe a function signature with a good description of the parameters. Something I can easily grep (ack) or google (DuckDuckGo) in the case of more "internally-oriented" errors. I don't usually find "typo suggestions" useful, but OK, why not. The thing is, 3 times out of 5 I will fix my mistake quicker than I finish reading the message (even in languages with less verbose error messages), literally. I would thank my compiler for being as concise and straightforward as possible to shorten that gap even more. I don't want to read what I don't need to.
In the case of "the complicated mistake" it is not likely compiler will actually guess what's the matter. I'll need stacktraces (if any), precise and unique exception names (error codes), maybe some additional info, but nothing that compiler can actually handle to discover itself.
So the best compromise I can imagine, which I think would help in both cases, would be short, concise, "machine-like" messages; concrete things like function signatures and "expected/found" diffs; and links to the related section of the documentation (web-based or local, the latter being even better) where everything is described in detail. And no playing at being human.
> Folks who prefer dynamicly-typed languages are generally of the opinion that working with compiler error messages sucks.
This is not at all why I prefer dynamically typed languages. Try doing some JSON parsing in Haskell for a non-trivial payload (like something with nested objects) without wanting to pull your hair out.
I can't tell from those examples whether it's good or not; the JSON shown is all trivial, one-level-deep JSON that you rarely encounter in real life. How it handles nested objects and arrays, deeply nested objects and arrays, is what I'm interested in. Not the ability to handle a Point payload. I know in a dynamic language it's typically a one-liner. I'll endure a little more work than that, but my experience with Haskell was quite painful.
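(For reference, a hand-written aeson sketch for a two-level payload looks roughly like this; the types and field names are invented for the example, and generic deriving can remove the instances entirely.)

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson
    import Data.Text (Text)

    data Address = Address { street :: Text, city :: Text } deriving Show
    data User    = User { name :: Text, address :: Address, tags :: [Text] } deriving Show

    instance FromJSON Address where
      parseJSON = withObject "Address" $ \o ->
        Address <$> o .: "street" <*> o .: "city"

    instance FromJSON User where
      parseJSON = withObject "User" $ \o ->
        User <$> o .: "name" <*> o .: "address" <*> o .: "tags"

    main :: IO ()
    main = print (eitherDecode "{\"name\":\"ann\",\"address\":{\"street\":\"elm st\",\"city\":\"boston\"},\"tags\":[\"x\"]}" :: Either String User)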
That's much less a function of static typing, and much more a function of immutability. There are programming constructs to help with that (such as lenses, mentioned in another comment), but there's also nothing stopping you from writing very compact statically typed code that mutates a value three levels down in a JSON tree, if you had a mutable data structure.
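(And for the mutation-shaped update described above, the lens route over an untyped Value is close to a one-liner; this sketch uses lens-aeson, and the field names are invented for the example.)

    {-# LANGUAGE OverloadedStrings #-}
    import Control.Lens ((&), (.~))
    import Data.Aeson (Value)
    import Data.Aeson.Lens (key, _String)

    -- Set a field three levels down in a JSON tree without spelling out
    -- the surrounding record structure.
    moveUser :: Value -> Value
    moveUser v = v & key "user" . key "address" . key "city" . _String .~ "Boston"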
I like Elm, but it would be nice to have more practical elements in it. I have a problem with the elm-html syntax:
    div [ class "profile" ] [ img [ src user.picture ] [], span [] [ text user.name ] ]
Even indented, it looks unreadable in the Elm docs, especially for a beginner.