The arguments in favor of Perl largely privilege writability of code over readability. Code needs to be read, and more importantly understood, many more times than it is written, which is the whole reason Perl has fallen out of favor.
Elastic tabstops look interesting, but they won't solve any user experience problems until all major editors support them.
Using actual characters instead of substitutes like >= makes sense, except when I look down at my keyboard and try to imagine where all the extra buttons would go. I suppose you could still type the substitute and your editor could convert it to the actual symbol (LaTeX modes in Emacs do something like this for you), but you'd have to get all editors to use the same substitutes to prevent another usability nightmare.
An interesting example of going too far the other way (favoring readability without enough thought to writability) is AppleScript. It's very easy to read, even for those without much programming experience. But it turns out that it's very difficult to write AppleScript scripts.
Easy solution: you enter your source with ASCII approximations, and the language implementation can prettify it in-place with the canonical Unicode characters if you choose.
No need to touch the language, it can be done on the fly by the editor.
It's already possible to do such things in Emacs, for instance: haskell-mode can be configured to replace `\` by `λ`, `>=` by `≥`, `.` (the function composition operator) by `∘`, etc…[0] in the visual display of the file (the underlying source keeps the original tokens).
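As a sketch of what this looks like in practice, the built-in `prettify-symbols-mode` (Emacs 24.4+) can be configured along these lines; the alist entries below are illustrative, not haskell-mode's actual defaults:

```elisp
;; Display-only prettification: the file on disk keeps the ASCII tokens.
;; A minimal sketch using the built-in prettify-symbols-mode (Emacs 24.4+).
(add-hook 'haskell-mode-hook
          (lambda ()
            (setq prettify-symbols-alist
                  '(("\\" . ?λ)
                    (">=" . ?≥)
                    ("<=" . ?≤)
                    ("->" . ?→)
                    ("." . ?∘)))  ; caution: this naive entry also composes
                                  ; the dot in qualified names like Data.Map
            (prettify-symbols-mode 1)))
```

haskell-mode ships its own, more careful variant of this feature, which is the one referenced above.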
Of course you could add these new characters in the language's definition itself as well, so the editor can perform direct source replacements.
Good point. Plus there’s something to be said for separating content from presentation, which seems to be a pervasive problem in programming languages, given all the petty arguments about formatting.
> Plus there’s something to be said for separating content from presentation
Indeed, I would enjoy a world where this is done better (and more often), for instance having things like alignments, line-breaks or indentation be completely independent from the actual source.
Of course, this generates a number of other problems in turn. Different programs on a given machine would all have to use the same configuration (or be configured independently to match). Communication becomes much harder, since the concepts of line and column become fuzzy unless new ways of pointing into the source are invented. Some tools and concepts become impractical, to say the least: a context diff, for instance, would have a hard time working correctly unless it were token-aware and language-aware, with diff/patch reader software able to apply the language's global formatting configuration to display the diff as the user expects. And shared source display (e.g. a GitHub source view) would be rife with strife.
At the end of the day, I'm not sure this wouldn't create as many problems as it solves, if not more.
As a native English speaker who's tried to learn Japanese, the distinction between Perl, APL, and their ilk and other languages to me seems more akin to ideography vs. syllabary/alphabet (i.e., characteristics of the writing systems) than to natural language vs. computer language.
My big problem with reading and writing Perl is the same as my problem reading and writing Japanese: there's just so much you have to flat-out memorize. Look at the index to the Perl book: it begins with a vast collection of arbitrary symbol pairs. Sort of like written Japanese words are collections of often somewhat arbitrary ideographs, because many words were borrowed from Chinese over many centuries, while the Chinese writing system was evolving.
Of course, people still learn to read and write Japanese fluently; it just takes a long time (12 years of schooling). I bet I'd be fluent in Perl if I spent 12 years learning it as well. But I'd rather not have to do that to use a programming language well.
This is off-topic, but I don't think it's fair to say that it takes 12 years to read and write Japanese fluently. That's just the time it takes to graduate from high school in Japan.
In fact, mandatory education in Japan comprises only elementary and junior high school, which takes 9 years in total. By the time you graduate, you're supposed to know all of the 2000-ish 常用漢字 (Jouyou Kanji = Common Use Kanji). If you add a couple of hundred place and personal names, you'd have a good approximation of "fluent reading and writing Japanese," as far as kanji is concerned.
If you're learning on your own, there's no reason it should take 9 years. If you learn it at a rate of 10 characters a week, you could learn 2000 in 4 years.
That said, I think the emphasis people put on learning a lot of kanji, without being able to speak the language, is misguided.
Except that it's not, because there are many good alternatives to Perl that have equivalent, if not greater, power, and require very little memorization. You can be up and running with a full understanding of the syntax for most flavors of Lisp in a few hours...
Actually, Feynman had exactly this criticism of the way biology was taught: that the students memorized way too many things that they could just look up (and this was pre-Google, mind you!).
This article is clearly a troll, specifically on the issue of monospace. Also, some monospace fonts are better than others. I was using Consolas for a while, and have recently switched to Ubuntu Mono for the sake of beauty and clarity. I would never use a proportional font for programming, in part because the positions of things matter and inform your reasoning. Here's a good blog post about beautiful programming fonts: http://hivelogic.com/articles/top-10-programming-fonts/
In particular, this claim is totally unsubstantiated: "Fixed-width fonts are a typographical oddity that survived through tradition and little else."
As a pretty young programmer, I don't use monospace or really anything only for the sake of tradition. It's the best tool for the job, even if it was by accidental means. I don't think a language optimized for proportional fonts would be superior in any way for it. In fact, monospace is superior in that you can choose to convey information with STRUCTURE. This capability becomes muddied and confused with proportional fonts.
Stroustrup's C++ book stands out in my mind as one of the few programming books that I own (possibly the only one) that does not use a fixed-width font for code examples, and I find it very unpleasant to look at. Accordingly, I have avoided variable-width fonts for programming altogether.
But the font in that book is by no means the only variable-width font. Could anyone here recommend a variable-width font that you have had good experience coding with?
It was (is?) pretty fashionable in European programming books for some time. For example, all of Niklaus Wirth's books use variable-width fonts. I also find it unpleasant or even irritating at times.
It's useless to compare programming languages to natural languages. Programming languages are formal languages and require some amount of precision for proper tokenization and parsing. If you misspell a word in English, or forget to capitalize something, or miss a space, people can still read it; if you do the same thing in a programming language, you often get an error. Typographic errors in code stand out better in a fixed-width font than they do in a variable-width one.
I'd like to see a study, too. Unfortunately, I don't know of any.
I do, however, vaguely recall an ergonomic study finding that it's easier (or at least faster) for humans to read text in variable-width fonts. I'll try to find a reference when I get off work.
That being said, code isn't the same as text. My own experience has been that a missing or wrong character is easier to notice with a fixed-width font (especially narrow characters), and that extraneous or missing spaces are less obvious. Where those things don't matter too much for human readers, they do matter for computer readers. It's obvious to a human that BubleSort is probably supposed to be BubbleSort, but in a more formal setting there is a distinction.
My concern for using a variable-width font is knowing how deeply indented I am at any point. With a fixed-width font, I can simply look up and see how many characters deeper I am compared to the previous line. I don't know how that would work with a variable-width font. I'd probably have to move my cursor back and forth, which would be a pain.
I don’t understand what you mean—it doesn’t really matter how many characters or columns you’re indented, and can’t you discern depth visually in the same way regardless of typeface?
I've been using DejaVu Sans (proportional) for programming (including Python) for the last few months, and found it very easy to get used to.
There are just a few glyphs that could be adjusted to line up better (closing braces in JS/PHP are a few px off), or in the case of quotation marks, two single quotes render the same as one double quote.
I've started editing DejaVu (it's OSS) to correct these small number of issues, I don't think they are insurmountable.
Well, I'm not sure it's actually a problem in practice, but with a W taking up two ' ' spaces, I can imagine a scenario where you over-correct and end up at the wrong level.
No, it doesn’t. Your indentation level matters, but provided your indentation is consistent, it doesn’t matter whether you use tabs or spaces, and the notion of “column” isn’t relevant to proportional-width fonts. If you’re worried about invalid indentation hiding from you, don’t use spaces for indentation with a proportional font—it doesn’t make sense anyway.
It matters more when you've got a statement broken over multiple lines. If you've got a function which takes a lot of arguments for example, lining them all up vertically can make the code a lot easier to read. This can be difficult or impossible to do with a proportional width font (and would almost certainly look very strange to anyone else looking at the code in a fixed-width font).
That's understandable. I agree that tabs would work much better than spaces with a proportional font. However, in the case of Python PEP 8 states that you should use 4 spaces. Not doing so will make it very hard to collaborate with other devs.
It's not that I think proportional fonts couldn't work (even with Python); it's just that I have concerns about visually assessing indentation. Maybe it's time I try one and find out.
I think part of the problem is that C++ was not designed with proportional-width typefaces in mind, but I’ve had good experiences with broadly spaced sans-serifs, even the basics such as Arial Unicode and Lucida Sans Unicode. http://programmers.stackexchange.com/questions/5473/19828#19... suggests Ubuntu Sans, which I think looks excellent. I think serif typefaces look nicer for more mathy languages such as Haskell.
I remember reading one of Douglas Hofstadter's books - Metamagical Themas - and liking the bold Helvetica font he used for all the Lisp code. But I doubt he coded with it...
Each space in a proportional font is a set width, so the indentation structure is just as clear as with a fixed-width font. Am I missing something? I can only imagine a problem if there is a mix of tabs and spaces.
It's not about getting adjacent lines of equal depth to line up. It's when you are indenting and trying to put yourself four spaces in. In a fixed-width font, you can just line yourself up with the fourth character of the previous line. In a variable-width font, it's harder to tell how many spaces in you've gone.
Any reasonable editor can be set up so that the tab key will take care of that: ideally, you would also indent with tabs not spaces, and then you could set any tab width (in pixels, or better, points) you wanted and your code would still look right.
The problem comes if you want to left-align some code with a character on a previous line which is not the first non-space character. People often do this for run-on lines, for instance. Then there's no guarantee that spaces or tabs will align you correctly.
The ideal solution to the latter problem is:
(i) Don't Do That, Then.
(ii) If you want to align code in some carefully prescribed circumstances such as run-on lines, your editor should handle it for you when it displays on the screen (and the alignment hints need never make it into the saved file).
Ah, ok. This problem doesn't manifest in emacs, as the tab key indents you correctly by 4 spaces in most modes. I can't imagine it's much of a problem to line yourself up with the first character of the previous line and press the spacebar 4 times though.
I've not read Metamagical Themas, but your words remind me that one of my favorite programming books, The Little Schemer, does in fact use a variable-width font for the code.
On errors: yes, this is the main point I agree with. Fortunately most languages have frameworks which do this now. The main thing I would wish for would be a more configurable 'warning' or 'notice' level which would flag e.g. type coercion, addition of strings, or other nonsense when debugging.
On typography: seems like a lot of trouble for no substantial gain.
On input methods: people know `x - y` is read left to right. `x = y` is easy enough. These points take a couple of minutes each to memorize.
I am intrigued by nested quotes, however. In some languages such as Chinese they use arrow quotes -《》 - which would be easily nested and visually distinguishable.
Lost me at suggesting Perl is the exception because of its use of sigils.
I'd advocate for Pascal et al. over C-like languages simply because of case-insensitive identifiers. The way Modula-2 handles block endings (IF ... END) also helps make code much more readable.
I've programmed Perl for ages, and at this point I actually like the sigils, as they force you to keep track of your data types. They are definitely a plus for Perl, and for Perl's readability, for me.
But starting out, yeah, they were confusing and felt unintuitive. I think they just grow on you (if you stick with it long enough).
> I'd advocate for pascal et al over C-like languages simply because of case insensitive identifiers.
Then what happens when a German programmer has an identifier called weiß? Will accessing it as WEISS work? When dealing solely with US-ASCII, case-insensitivity might be useful, but Unicode and the use of non-English non-Latin writing systems opens a whole new can of worms.
How about NOT using a German identifier? The code could someday end up in the global community, where the meaning would be lost, while almost every programmer understands basic English (it's kind of a prerequisite).
Whenever I try to learn e.g. Haskell, the small things stand in my way; for example, everything looks like a series of identifiers separated by whitespace, and the only way to understand it is to manually parse it in my mind, knowing the arity of every function, data constructor, and the like. Probably a minor thing to most Haskellers, but still an obstacle to me.
Besides being able to easily visually parse a source file, a usability issue in some languages for me is 'voicing' the code in my head or reading it to another programmer; a problem I also have with mathematical notation sometimes.
When creating my own language for teaching programming, I decided to have a 'canonical reading style' so that teachers and students always know how to read a given code snippet out loud or in their head, facilitating understanding and communication.
Well, Haskell functions are curried, so you could think of the arity as always being 1. ;-)
If you're writing code, you can always use redundant parentheses at first. Then you refactor, gradually removing the parentheses or replacing them with an appropriate combination of ($) and (.).
Another useful thing to learn is how to write point-free code. An example:
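The code sample appears to have been lost in formatting; a typical illustration (the function name `shout` is my own choice, not necessarily the original's) would be:

```haskell
import Data.Char (toUpper)

-- Pointed style: the argument is named explicitly.
shout :: String -> String
shout s = map toUpper s

-- Point-free style: the argument is dropped entirely.
shout' :: String -> String
shout' = map toUpper
```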
These are essentially the same function, but the latter is often preferred. Note also that they have the same type, namely String -> String.
If you're reading code and you find it hard to parse, you probably need to read easier code (for instance, from LYAH or RWH), but more likely you need to write more code of your own.
I know about currying, but there's still a certain meaning to arity even in this case.
It might be argued that in the long run the current syntax is better for the veteran Haskeller, but it's still a barrier to entry. I wonder if a Haskell IDE could have a 'clarify calls' mode that shows where each function argument goes or comes from.
There are plenty of things to criticize in the Haskell syntax rules — they're hard to get used to and they're not so beneficial that the learning curve seems worthwhile — but parsing isn't as ambiguous as it seems at first. When you see a sequence of identifiers separated by spaces, the first one has to be a function accepting all the others as arguments.
It's different (and complicated to explain) if there are newlines involved, though. Haskell isn't a whitespace-insensitive language.
This reminds me of a quote which I believe I read in Programming Languages: Application and Interpretation, but can't seem to find right now in the PDF version: "Syntax is the Vietnam of programming languages."
Absolutely. The idea that a compiler/interpreter should give reasonable and intelligent errors is a good one. It's probably the only reasonable critique in the entire blogspew.
Programming notation is the way it is primarily because of the fallacy that programming is mathematics. In reality, writing software is also very much like, well, writing.
Programming is much more like mathematics than writing. In fact, I'd argue that the only thing programming shares with literature is the fact that programs are expressed using character strings rather than with more specialized notation. Literature can be ambiguous, and can rely on the reader to infer what the author meant. Program must not (and indeed, in many cases cannot) be ambiguous.
Larry Wall is a notable (partial) exception to the norm, appropriate considering he studied linguistics back in the day. Perl is what you might call a semi-naturalistic programming language.
That is Perl's greatest strength and its greatest weakness. It's very hard to write Perl that doesn't run. The interpreter will go to great lengths to parse your program in the most generous way possible (if you let it). However, most of the time, if you've written something ambiguous, it's wrong, and should be flagged as such. This has led to a host of addons (like use strict) which attempt to make Perl less generous.
Another thing Perl mimics in natural language is implicit reference. $_ is like the pronoun “it”, a default thing, the current subject of discussion, which in many situations can be assumed. Programming languages use explicit reference almost exclusively. In order to perform a series of operations on a value, the programmer must explicitly name that value for every operation.
Explicit references are unambiguous. Has the author ever read a legal document? In legalese, just like in programming, all nouns, save for the most common ones, are defined before they are used. The reason for this is the same as in programming: it is best to be unambiguous when attempting to communicate in a formal manner.
In fact, when I'm dealing with Perl, I absolutely hate the use of $_. It adds massively to the state that I have to carry in my head when I'm reading the code, because it's so easy to miss cases in which $_ gets modified.
Naturalistic programming languages will never be pretty. They are not minimal, or elegant, or simple, but despite all that, they are intuitive and useful. More importantly, they meet users’ expectations about how language is supposed to work.
I don't care about how "expressive" a language is. I care about how easy it is to write 1) correct and 2) maintainable programs in that language. These two criteria argue for minimal syntax (less syntax means less syntax to screw up) and a level of formalism that catches errors as close as possible to the point where they occur.
Erroneous Errors
I'm not even sure where the author is trying to go with this section. We've gone from talking about programming language design to... compiler error messages? The language is not the compiler and the compiler is not the language. Most programming languages have more than one compiler or interpreter. These compilers will give differing error messages. As another comment demonstrates, Clang gives better error messages than gcc. Is C somehow a better language if you use it with Clang? Is it somehow a worse language if you use it with gcc? No! The language remains the same, regardless of what you use to compile it.
Terrible Typography
Programmers prefer monospace fonts because there are fewer variables, and therefore, fewer things to mess up. With a monospace font, I don't have to worry about kerning. I don't have to try to guess how far my code is indented. If I use spaces for indentation, I don't even have to worry about my code looking different on different computers. Wherever I go, my code should look exactly, unambiguously, the same. For me, that concern trumps readability a thousand times over. I'll take an ugly font that's unambiguous over an ambiguous pretty font any day.
Impossible Input Methods
APL tried to have funny characters as operators[1]. It was a bad idea. I don't want to have to switch my keyboard around to match the programming language. I don't want to have to type ALT+<code> to get an operator. It's much easier if programming languages use standard characters that are guaranteed to be present on almost all keyboards. Indeed, the C preprocessor defines trigraphs[2] for keyboards that don't have characters like '{' and '#'.
Even worse, when I'm reading code, I don't want it to be peppered with blank boxes just because I don't have the particular font for this programming language installed.
Perhaps the biggest problem with programming language design is that, because it is so bad, people are afraid to use tools that can help them.
Either that, or because many of these ideas have been tried and rejected when hard experience proved that they didn't enhance productivity very much and hurt maintainability immensely.
Well said! I don't think I found an argument in that article that I would agree with. The mention of Perl as a great example is probably the worst. The purpose of a programming language is to express things as precisely as possible, both to a computer and to a human. Perl's extreme context-sensitivity is what I dislike most about it.
Regarding Unicode support, I see it having some advantages, but current editors don't support Unicode input well. Emacs has an input mode where you can type \alpha and it replaces it with the appropriate Unicode character, but I haven't seen any other editors with support for that. For some of the more obscure symbols, it can also be difficult to find out how to input them.
I'm wondering whether that article is actually meant seriously -- some of these claims are just so absurd.
Yeah, I was reading the article and saw the mentions of Perl and was like -_-.
I think the legal writing analogy is interesting. I think legal writing and programming are closer to each other than either are to math or literature.
There are pleasant, non-awful ways of doing characters in programming languages that don't involve memorizing the hex codes for Unicode characters or weird keyboards. For example, the Fortress language has three representations—a pretty, LaTeX-ish image form with operator symbols and the like, a plain Unicode form, and an ASCII form; I am under the impression that you type it up in ASCII, and then it gets prettified for printing. I'm actually as I type procrastinating from writing some Agda code, and Agda also uses Unicode in its source (it is admittedly heavily tied to its emacs mode, and inputting characters is done by typing in the LaTeX character entity, which the mode converts to the Unicode equivalent.) Agda is still ASCII-tolerant, as well; for example, it understands both → and its ASCII equivalent -> as being an arrow.
The issue is, a lot of languages don't necessarily need this. In a close cousin of Haskell like Agda, it makes sense, because it's very much like writing mathematics on a page, so using Greek letters and operator symbols is expected and will be understood by Agda programmers, especially if there's an assumption that it will be widely printed or read, as Fortress would be. But I honestly can't look at C++ and say, "Oh, it would be almost infinitely better if everyone updated their compilers so I could overload the × operator! That's exactly what I want—to be forced to use a Unicode terminal font so I can read source code over SSH!"
UX implies usability as well as aesthetics. Any time spent worrying about the aesthetics of code is time not spent solving problems. In terms of usability, off the top of my head, two things that I find useful are:
1) Editors with syntax highlighting. I can't be bothered to worry about what typefaces or encoding it supports.
2) Languages that force a consistent approach to implementing a given task. I'd have to throw out Perl and its Tim Toady ethos.
Perhaps I wasn't clear, but I was referring specifically to the aesthetics of the typeface. Good code in monospace is no less beautiful than good code in font X.
I very much believe in beautiful and clear monospace. This can translate into benefits for the time it takes your eyes and brain to interpret what you see. Time is money.
If you're going to generalize across programming languages and assert that they suck at something, I expect to see evidence of knowledge of many, many programming languages. Not to be a jerk, but I don't see that sort of evidence here. (Then again, I don't have the kind of knowledge that would qualify me to make such assertions, or contrary ones, either.)
"Bad" typography isn't a big deal IMO. Yes, it's slightly embarrassing and ugly, in this Unicode-enabled era, that we use <= for "less than or equal to" and != for "not equal to", when Unicode has given us better glyphs. But honestly, I don't see how that costs us much time to get used to. It's like the parentheses in Lisp: more time is spent complaining about things like that than just adapting to it and making the problem go away. Parentheses wouldn't even crack the Top 5 of my complaints about Clojure (which is, despite some flaws, a great language).
Regarding monospace fonts, monospace is a huge asset. Sometimes you want to align a column of numbers, and you don't want 11111 to be narrower than 8888. Code isn't supposed to be "pretty" in the same sense as a Word document or marked-up LaTeX file. It's supposed to be easy to read and understand. Aesthetics are a part of this, but beauty at the expense of clarity is a no-go. Technical book writers could easily use non-monospace fonts for code examples and they don't, for good reasons.
What matters is how the language looks after you've been using it for months. After 6 months using it, does the language still piss you off on a regular basis (note: no language is perfect and all will piss you off sometimes) or do you generally enjoy working in it? I've found that statically typed functional languages (Scala, Ocaml, Haskell) perform the best in this regard. The worst I've seen are Java and C++. But until you get comfortable with the languages, they all look the same (hideous, confusing, foreign).
We need to get rid of that kind of thinking—not just so that language design can move forward, but so that we can quit worrying about languages and get shit done.
Sorry, but... language differences aren't really about superficial appearances. Ocaml could be re-written with S-expression syntax and it wouldn't be much different. These differences are much deeper than that. The shape of code that emerges in a Haskell codebase is dramatically different (more "foreign" compared to conventional imperative programming, but generally less verbose and easier to maintain) from what appears in a C++ or Java codebase.
>"Bad" typography isn't a big deal IMO. Yes, it's slightly embarrassing and ugly, in this Unicode-enabled era, that we use <= for "less than or equal to" and != for "not equal to", when Unicode has given us better glyphs. But honestly, I don't see how that costs us much time to get used to.
<= and != are easier to type than using alt-codes for Unicode characters. Unless you want your programming language to come with a keyboard driver for every OS, used only for that language, there is no reason to use them.
Many well-designed proportional fonts have fixed-width numerals (if not by default, then available as alternates) to address alignment.
I don’t think tech writers necessarily use monospace fonts for good reasons—most likely they’re simply accustomed to it, and it doesn’t occur to them that there’s something better out there.
And this isn’t just about superficial appearances—I focused on poor legibility because it’s a readily apparent detriment to usability, but there are plenty of other things wrong with programming languages—enough to fill a book, which I should probably write. Strictly imperative programming, for instance, is a degenerate style of problem-solving that forces developers to exert far more effort than necessary to get the same amount of work done. Haskell is better, and more productive for me, than any procedural or object-oriented language I’ve used.
Thanks for the comments, though. I really appreciate this kind of long-form feedback because it helps me improve my future writing.
> Technical book writers could easily use non-monospace fonts for code examples and they don't-- for good reasons.
Actually, Stroustrup did use a proportional font (and an italic one at that) for the code samples in The C++ Programming Language.
I've never seen anyone doing serious research on the effect of this style on comprehension or reading speed. Subjectively, I didn't find it unduly difficult to read. I suspect any concerns over alignment could be overcome by using the same typographical tools we use in other contexts: tabs/tables, proportional vs. lining figures, etc. We already use specialist editors and fonts for programming routinely anyway, so I don't see any inherent reason we shouldn't make them look good. :-)