Most of these errors are not really interesting. Almost all of them are simple typos or syntax errors.
I get that these are the errors that are simple to analyze. Reference/value confusion would, for instance, be more interesting, but I guess that's harder to autodetect.
Isn't that very interesting? It means the IDE / environment they are using allows them to make typos and syntax errors. Why weren't those corrected and/or highlighted as they typed?
The first programming language I used had magic keys which produced whole keywords (e.g. pressing "P" produced "PRINT"). Typos were impossible! If you still managed to type a line with a syntax error, the cursor changed to a flashing $ over the error and you literally couldn't submit the line of code until you'd fixed it. And that was on a machine with 1 kilobyte of RAM and a < 4 MHz processor!
Yeah, I am pretty excited about the idea of IDEs just presenting ASTs instead of text (of course, you can always save as text). There's a Haskell structured mode that tries to do a similar thing.
It might seem silly, but the biggest impediment I can think of is copy-pasting code from elsewhere. If your variable names are different, the IDE should reject the code (since you shouldn't be able to commit variables that don't exist), but then what do you do? Open it in Notepad and "fix" it?
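To make the "variables that don't exist" check concrete, here is a rough sketch of what such an editor might run on pasted code. It uses Python's standard `ast` module; the function name `undefined_names` is made up for illustration, and the check is deliberately naive (it ignores scoping and ordering, so it is nowhere near a real linter).

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read somewhere but never bound in the snippet.

    Hypothetical helper for illustration: it ignores scopes and the order of
    definitions, so it only approximates what a structured editor would check.
    """
    tree = ast.parse(source)
    bound = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                # Store or Del context: the name is bound by this snippet
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.alias):
            # "import a.b" binds "a"; "import a as c" binds "c"
            bound.add((node.asname or node.name).split(".")[0])
    return used - bound - set(dir(builtins))


pasted = "total = price * qty\nprint(total)\n"
missing = undefined_names(pasted)
print(sorted(missing))  # ['price', 'qty']
```

An editor could accept the paste but flag `price` and `qty` as holes to fill in, rather than rejecting the snippet outright.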
Brings to mind Smalltalk's concept of images (which comes from Lisp, I believe) and Forth.
Of course in Smalltalk, you do work with code-as-text, but then it becomes code-as-program (technically byte-code, with among other things a text representation). Does seem more reasonable than the anachronistic insistence on text->parse-to-AST->(whatever magic, and however many transforms the actual compile-part is).
Then again, we all know what a rich editor is. It's MS Word. And MS Word eats documents and deprecates file formats.
On the other hand, I think Emacs Org-mode, Gimp images and the literate editor in Python, Leo[l] are examples of more pleasant "rich format" editors. It is a bit odd that what we take for granted for image files (edit history, binary format++) we fear in our IDEs.
> Overlord is the monitor of the LISP System. It controls the handling of tapes, the reading and writing of entire core images, the historical memory of the system, and the taking of dumps.
> Why weren't those corrected and/or highlighted as they typed?
That's such a horrible expectation. Do we expect children to spell correctly right off the bat? Or do we teach them, slowly & patiently, instead? Or are you saying we should just let autocorrect be their guide?
No? Then why would programming languages be different? It's part of the learning process. And it's by far one of the easier parts.
Aren't they? The fact that confusion between = and == is the 4th most frequent is quite interesting, for example. It's possible to disallow assignments in places where comparisons are most likely (expressions that are known to coerce to a boolean); Python does it, with presumably significant productivity benefits (less time spent dealing with that common mistake).
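For anyone who hasn't seen the Python behaviour, a small sketch: a plain `=` inside an `if` condition is rejected at compile time, so the typo never reaches runtime. The helper name `compiles` is made up for this example.

```python
def compiles(source):
    """Return True if Python accepts the source, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


typo = "if x = 5:\n    pass\n"       # assignment where a comparison was meant
intended = "if x == 5:\n    pass\n"  # the comparison the programmer wanted

print(compiles(typo))      # False
print(compiles(intended))  # True
```

Note that nothing here is executed; the mistake is caught while parsing, before `x` even needs to exist.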
D can arguably be mitigated by differentiating bitwise operators more from their logical counterparts (for example by having the logical operators in plain English).
Many more are better handled at the editor/IDE level, but this seems like a really interesting read for anyone involved in PL design.
The Alto thread posted a ref to BCPL, and it was interesting that the original symbols for the comparison operators were text: eq, ne, gt, lt.
C broke BCPL syntax that was clear, memorable, and consistent and replaced it with a math-like syntax that must have wasted millions of person hours of debugging time in the decades since - for no good reason.
Similarly & (pointer address) and && (logical and) are too close and too easy to typo.
Language design really should have considered human factors much more than it did.
The historical reason behind that is the > and | symbols were used as redirection operators to follow shell conventions so they used -gt and -bor. All of the other operators followed for consistency. There was also a big kerfuffle about statements like "$a = 1" or "$a++" not being able to return values like in C because if they were used stand-alone the return value would be printed out by the host.
The optional parens remind me that this is a deliberate assignment. But as someone who now teaches Python, I thank the Python gods who disallowed such syntactic sugar. I've found it impossible to overestimate how difficult it is for beginners to grok "a = b"...and can you blame them, after years of math instruction telling them that that equals sign means something else? (Never mind the clusterfuck that occurs when trying to teach SQL -- which also uses a single equals sign for equality -- in tandem with a scripting language. The cognitive difficulty is so high that I've considered switching to teaching R, which at least has the optional arrow operator, "a <- b".)
That would not pass review with me. Code should be as obvious and simple to read as possible. It doesn't matter whether or not the reader is a beginner; what matters is that it reads like it could be a typo. So there is more cognitive overhead than strictly necessary, and quite possibly there are bugs hidden in those 'clever' bits that you love so much.
I did courses at two universities (I transferred) and in both cases one would need to compile the assignment source code in order to actually do the homework. I think that a compiler, especially javac, would catch almost all of these and warn about some of the others.
Moreover, good syntax highlighting or an IDE with some static analysis in it would help a lot too. I think that might be a useful thing to put in intro programming classes. Eclipse is free right? I use IDEA-based editors for most of my work, but even the syntax highlighting in, say, emacs without installing packages (at least on the OS/distros I'm familiar with) would go some fair distance to this goal. I assume the same would be true of vi.
After having a short look on the compilation, it shows one thing very accurately IMHO: C syntax is very error prone.
To sad, that so many new programming languages chose to use exactly this syntax that is so error prone.
C was a very good programming language and the short syntax might have some appeal -- but for learning programming, this syntax is not the best option, as long as you don't use it as type of intellectual test to find the best computer-people ...
English is also error-prone, as you have demonstrated—perhaps unwittingly—but we still use it around the world. I would argue that this fact makes C syntax a particularly good choice for one learning to program: Computers don't think like humans, unless we program them to think like humans.
Though it is in this case worth noting that there is enough redundancy in English that you can both a) tell that "to bad" is wrong and b) know they meant "too bad", which isn't always the case for =/==
Error D would be particularly confusing, since it would still produce the result the programmer intended (at least with my setup, where int implicitly converts to bool).
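A quick sketch of why the bitwise/logical mix-up can go unnoticed, using Python's `&` and `and` as stand-ins (the paper looked at Java, so treat this as an analogy, not a reproduction). On plain booleans the two agree; they only diverge on other integers and on short-circuiting. The helper `noisy` is made up to record which operands get evaluated.

```python
# On booleans, bitwise and logical "and" give the same answer,
# so the wrong operator still "works".
same_on_bools = all((a & b) == (a and b)
                    for a in (True, False) for b in (True, False))

# On general integers they diverge: 1 & 2 has no bits in common.
bitwise_result = 1 & 2    # 0, which is falsy
logical_result = 1 and 2  # 2, which is truthy

# Only the logical operator short-circuits.
calls = []


def noisy(value):
    """Record that this operand was evaluated, then return it."""
    calls.append(value)
    return value


noisy(False) and noisy(True)  # right operand is skipped
logical_calls = list(calls)
calls.clear()
noisy(False) & noisy(True)    # both operands are evaluated
bitwise_calls = list(calls)

print(same_on_bools, bitwise_result, logical_result)  # True 0 2
print(logical_calls, bitwise_calls)                   # [False] [False, True]
```

So the mistake stays invisible until an operand has a side effect or isn't a clean 0/1 value.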
A different perspective:
The paper shows:
(1) Humans quickly learn to avoid simple syntax mistakes after they compile code and get an error message. These messages often pinpoint the error location and suggest the fix, so this result is hardly surprising (e.g., Invalid token '}', did you forget ';').
(2) The authors assume every type error is unintentional. This may not be true: consider transitioning from using a String to represent a number (e.g., a command line argument) to a numeric type. This transition may be made to check for errors upfront and to avoid parsing the number in multiple locations. After the programmer changes the type, all these locations will be pointed to by type errors.
> Knowledge about students’ mistakes and the time taken to fix errors is useful for many reasons. For example, Sadler et al [10] suggest that understanding student misconceptions is important to educator efficacy. Knowing which mistakes novices are likely to make or finding challenging informs the writing of instructional materials, such as textbooks, and can help improve the design and impact of beginner’s IDEs or other educational programming tools.
In other words, yes, understanding what types of errors novice programmers make can be very interesting and useful.