Syntax is the last thing you should design (boxbase.org)
74 points by pcmonk on March 4, 2017 | 73 comments



The author is right that syntax isn't the be-all and end-all, but syntax often drives semantics and vice versa. More than one language has crashed on the rocks of unforgivable syntax, and even successful languages have small syntax annoyances that will never go away.

My growing belief is that if you can't express your language using a Pratt parser you need to rethink what you are doing. Once you grok them and get your first one up they're more extensible than anything else - far simpler than a parser generator and very easy to write in any language.

http://javascript.crockford.com/tdop/tdop.html

http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...

http://www.oilshell.org/blog/2016/11/01.html

http://effbot.org/zone/tdop-index.htm

Several of them link to each other. Thorsten Ball's book, where you write an interpreter in Go, uses one too, to parse his own language Monkey.
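To give a flavour, here's a minimal sketch of the technique in Python - my own toy, not taken from any of the links above, with a made-up token format and arbitrary precedence values:

    import re

    def tokenize(src):
        # Toy lexer: integers and single-character operators/parens only.
        return re.findall(r"\d+|[+\-*/()]", src)

    PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20}

    def parse(tokens, min_prec=0):
        tok = tokens.pop(0)
        if tok == "(":
            left = parse(tokens)
            tokens.pop(0)  # consume ")"
        else:
            left = int(tok)
        # The Pratt loop: keep consuming infix operators that bind
        # tighter than the current minimum precedence.
        while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] > min_prec:
            op = tokens.pop(0)
            left = (op, left, parse(tokens, PRECEDENCE[op]))
        return left

    print(parse(tokenize("5 * 2 + 1")))  # ('+', ('*', 5, 2), 1)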


What might be an example of a syntax that can't be expressed (parsed) with a Pratt parser?


The right question is probably what syntax can't be expressed cleanly with a Pratt parser. Operator precedence parsing in general can be extended to parse almost anything relatively easily by introducing different kinds of operators with special treatment. But if your grammar can't be cleanly expressed in a way that naturally yields a sensible parse tree by applying operator precedence (e.g. "*" should bind tighter than "+", so that "5 * 2 + 1" is parsed as (+ (* 5 2) 1) rather than (* 5 (+ 2 1))), you can end up with ugly special casing.

An example that makes it more complicated (not impossible):

In Ruby "x" depends on context. It can be a method call or a local variable reference. If "x" has been assigned to prior to the reference, it's a local variable. Except if it has a parameter list. "x 1" is a method call. Except if it's within a literal hash. "{ foo: x 1 }" is a syntax error - the parser wants a "," after "x".

Even with Ruby I handle a substantial part of the syntax in my compiler with an operator precedence parser, where the operator precedence parser class itself is about 170 lines of Ruby (excluding the table of operator priorities, which adds another 145 lines). But on top of that I have so far ended up with about the same again to "massage" the resulting parse tree into something that's nicer to work with.

The challenge with these types of parsers, though, is that they are often harder for people to reason about. If I express the rules I started with (for "+" vs "*") in a BNF-type syntax, I could do it like this:

    expr     = plusexpr
    plusexpr = mulexpr ("+" plusexpr)*
    mulexpr  = simple ("*" mulexpr)*
    simple   = number|identifier
It's quite clear, if you've seen some variant of BNF before, that "1 + 2 + 3" will go expr -> plusexpr -> mulexpr -> simple, return 1, then fail to find "*", find "+", parse a second "plusexpr" finding 2 + 3, and exit with (+ 1 (+ 2 3)). And "1 * 2 + 3" meanwhile will go expr -> plusexpr -> mulexpr -> simple, find "1", then find "*", parse a second "mulexpr" which will find "2", return up to plusexpr, find "+" and eventually "3", and result in (+ (* 1 2) 3).

But the grammar above, in terms of an operator precedence parser might be expressed by a table like this:

    +, INFIX, 10
    *, INFIX, 20
The rest is "obscured" in the parser function. As long as it's this simple you can probably guess what's going on (the last number is the priority; the values are arbitrary - only their relative size, and whether the parser treats high or low values as binding tighter, matters). But when you add things like different types of brackets and parentheses, function calls, or operators with different priorities when used as prefix vs. infix, it can easily become hard to understand intuitively from the table of operators how the priorities interact.
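To illustrate, here's a hypothetical way such a table might grow a prefix/infix distinction, sketched in Python (names and priority values invented):

    # Separate priorities for prefix and infix use of the same token.
    PREFIX = {"-": 30}
    INFIX  = {"+": 10, "-": 10, "*": 20}

    def parse(tokens, min_prec=0):
        tok = tokens.pop(0)
        if tok in PREFIX:
            # Prefix position: nothing parsed yet on the "value stack".
            left = ("neg", parse(tokens, PREFIX[tok]))
        else:
            left = int(tok)
        while tokens and tokens[0] in INFIX and INFIX[tokens[0]] > min_prec:
            op = tokens.pop(0)
            left = (op, left, parse(tokens, INFIX[op]))
        return left

    # parse(["-", "1", "*", "2"]) == ("*", ("neg", 1), 2)
    # because prefix "-" (30) binds tighter than "*" (20).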



I agree with you that operator precedence parsers are somewhat opaque in terms of how they function (at least I can say Pratt parsers are, with which I am most familiar). One has to juggle a lot of mechanics in their head. Also, they have to be able to imagine the recursive nature of the expression parsing without a nice BNF-style definition that resides in one location, as you mentioned.


Thorsten Ball's


Funnily enough syntax is one of the most important factors for me. Call me superficial but that was my main reason for choosing Python.


And unfortunately, it's also the biggest reason I hear for not liking Python.

I agree with the article. I work with ideas, incidentally written left-to-right top-to-bottom as text.


That's only true if you don't simply go with C-like syntax.

For example, Dart prioritized familiarity. They wanted to make it easier for people who already know JavaScript, C#, Java, or ActionScript.

JavaScript was also made to look somewhat like Java and they even put "Java" right in the name to make it look more appealing to the masses.

Creating completely new syntax is a very risky move which will always hinder adoption.


Sorry, I thought Java was in the category of "C-like languages"?

Maybe my teachers were wrong here. Possible.


Java also uses C-like syntax.


Going to join the "I strongly disagree" camp with Swift as my example. The expressive syntax in combination with the strict type system quite often challenges me to design succinct app architecture.

I personally prefer to omit type information from naming, so for example I would declare a username text field of a view controller like so: `let username = UITextField()`. Other devs might declare the text field as a `usernameTextField`, and somewhere else declare a variable called `username` to represent the string from the text field, but now you have a view controller concerned with both the text field and the data from the text field. By naming the text field simply `username`, I force myself not to have a `username` value anywhere else in this particular view controller, which forces me to entirely separate these concerns. I can elaborate on this if someone is interested in trying to get this working in practice.


Thanks for the link to cheery/chartparser. It is the first complete and readable implementation of the parser type with right recursion optimization and parse forest generation that I've been able to find, and I have been looking as I've been trying to implement this (and failed!).


Irony: g++ has an (evidently) hand-written parser that's well over a megabyte of code in one file.


Parser generators really shine when you're experimenting with a new language whose syntax hasn't stabilized. At this stage, you want to be able to change your grammar nearly for free. You don't want to be trading enhancements to your final grammar for parser development time; and you don't care a lot if your parser is slow or your syntax error messages are imprecise.

I agree that the syntax of C++ is still evolving, but nobody would qualify g++ as a test bed for experiments on C++ syntax. The benefits of using a parser generator here are lower - but I certainly agree that having one megabyte of code inside one file is undesirable!

See: http://www.drdobbs.com/architecture-and-design/so-you-want-t...


In addition to the inherent complexity, parsers do not split well in my experience. You can probably somehow split expressions and statements, provided that your language distinguishes them and they do not interact with each other (but it is very common that they do interact, especially when you want robust error recovery), but I cannot think of other easy split points.


I disagree with this intensely.

For starters, syntax drives how I interact with a language as much as - maybe more than - semantics. How expressions are laid out is intensely important to me, as it affects how I remember and visualise the code. I can visualise the layout of code I have not worked with in years when the syntax is clear, and the code is well formatted.

I can work around painful semantics and find ways to pretend they don't exist by avoiding features or picking patterns that work better; but painful syntax usually stares me in the face every moment I work with a language.

I have more than once rejected or picked languages based on syntax. E.g. I can't look at a Python program without getting annoyed with the syntax, and I avoid the language whenever possible because of it, while I work with Ruby whenever I can for the same reason (though the language geek in me wants to cry whenever I think about the Ruby grammar).

I also reject the idea of avoiding hand written parsers to start with. I sympathise a bit with the idea. I can see quickly testing changes with a parser generator. And certainly, if you hand write a parser, you need to avoid the temptation of adding all kinds of awful exceptions.

E.g. I love Ruby as a user of the language, but the MRI parser is beyond awful, and I think the syntax could have kept most of the nice aspects and avoided most of the awful syntactical warts with a bit more discipline ("favorite" wart at the moment: '% x ' parses to the literal string "x" - "%", when not preceded by an operand that would make it the infix operator "%", starts a quote-sequence where the following character indicates what the quote character should be - with the exception of a few special characters, most characters will set the quote character to its identity. So in '% x ', the quote character is space).

MRI uses a Bison parser, but it contains thousands of lines of handwritten exceptions, demonstrating both the bad parts of hand-writing irregular exceptions into parsers, and how easily you can mess things up even when using a parser generator, if it isn't strict enough.

But to me, if your hand written parser becomes big and/or problematic to maintain, you're designing a language that will be problematic to parse cleanly, and it's probably worth revising your grammar (I wish this rule had been adhered to for Ruby).

Nice, regular, clean grammars tend to lend themselves very well to small, compact hand-written parsers. In practice I've never run into a situation where a grammar change required major rewrites of a parser in any project I've worked on for this reason, unless the rule deviated majorly from what I'd consider good practice in language design in ways that would cause problems for most parser generators too.

Modularising a hand written parser along the lines of the grammar rules is easy, and few changes cut so deeply across grammar rules as to make this difficult.

But what a hand written parser tends to get you over a parser generator is a better ability to do clean error reporting, and more ease of introspecting how parser changes actually change the processing in ways that are meaningful to mortals. To me at least, this is a lot more difficult with every parser generator I've tried (and I keep hoping to be proven wrong; I've tried writing my own too, to try to prove myself wrong, and so far I've failed to come up with something I consider a usable replacement for handwritten parsers - you certainly can come up with something expressive enough, but it tends to end up being verbose enough to lose most of the benefit over clean code in the target language, which saves you from having to deal with idiosyncrasies of the generator).

To me the "solutions" offered demonstrate exactly why syntax matters to me:

I deeply admire Forth and Lisp and descendants on a technical level, but the syntax has always been a massive barrier to me for both language classes. I chose a s-expression inspired syntax to kick off my own compiler project by basically treating it as a serialization format for the parse tree, and first adding a parser on top later, but I did that first to be able to toy with semantics of something I didn't intend to make into its own language, and then to act as the "guts" of my in-progress Ruby compiler, not because I'd be willing to work with it more than that.

If anything, I've found it incredibly painful to work with, and I'd never have "held out" for very long without bolting a more human-friendly parser on top very early on. The experience has made me more insistent - not less - on if not starting with the syntax, then at the very least co-evolving semantics and syntax from the outset.


> But what a hand written parser tends to get you over a parser generator is a better ability to do clean error reporting, and more ease of introspecting how parser changes actually change the processing in ways that are meaningful to mortals.

Here is the thing: in a hand-written parser, we can more or less have a function corresponding to a grammar feature. And we can trace the entry and exit of that function and put a breakpoint on it. And that's "'nuff said", pretty much.
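A toy sketch in Python, reusing the plusexpr/mulexpr grammar from upthread - each function mirrors one grammar rule, so a breakpoint on plusexpr is a breakpoint on that rule:

    def plusexpr(tokens):           # plusexpr = mulexpr ("+" mulexpr)*
        left = mulexpr(tokens)
        while tokens and tokens[0] == "+":
            tokens.pop(0)
            left = ("+", left, mulexpr(tokens))
        return left

    def mulexpr(tokens):            # mulexpr = simple ("*" simple)*
        left = simple(tokens)
        while tokens and tokens[0] == "*":
            tokens.pop(0)
            left = ("*", left, simple(tokens))
        return left

    def simple(tokens):             # simple = number
        return int(tokens.pop(0))

    # plusexpr(["1", "*", "2", "+", "3"]) == ("+", ("*", 1, 2), 3)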


You can do that with parser generators too. But I agree, in that too many parser generators do things like generate tables instead. In the past I've written generators that generate recursive-descent parsers that mimic how I'd handwrite them, and that is a lot easier to debug than table based generated parsers.

But the reason I've not pursued that more is that once you've added in various exceptions, and tree building, and error checking etc., you're not saving all that much over just hand writing the thing entirely.


One could consider what the Ruby parser would have looked like if it had been built around a more powerful parser engine, though. Would the developers have abused the added power to add even more exceptions, or would the added power have made a lot of the exceptions go away?

(By more powerful I mean more powerful in the way the chart parser from the article is more powerful than an LR-based parser, i.e. that it can parse a bigger subset of CFGs in linear time.)


It's hard to tell. The current Ruby grammar doesn't really need a very advanced engine per se. It certainly could have been written more cleanly even with Bison. It seems more like a case of painting yourself into a corner.

Maybe a better engine would have led to more discipline, but I doubt it.

E.g. one of the things people tend to like about Ruby is the ability to leave out parentheses around arguments in method calls. People often disagree on where/when to use it, but it allows for DSLs that look more "integrated" with the language, for example.

But it causes all kinds of weirdness around the edges. E.g. for starters, you need to look at the symbol table for the current scope to know whether to treat a given identifier as a reference to a local variable or a method call. In itself that's a tiny little infraction, but the reason it is necessary is to prevent big surprises in common cases, and the result is to leave a variety of surprises in more unusual cases.

If a valid local variable name has been assigned to, it's a local variable, except in contexts where it would be a syntax error unless you treat it as a method call. If we assume "x=42" earlier in the scope, coupled with the "fun" of the optional parentheses, you get stuff like this:

* "x" is a local variable.

* "x 1" is the method call self.x(1)

* "x + 2" is the method call x.+(2) (so on the object held in the local variable x)

* "[x 1]" is a syntax error.

* "y[x 1]" is a nested method call: y.[](self.x(1))

* "z(x 1)" is a nested method call: self.z(self.x(1))

* "z(x 1, x 2)" is a syntax error.

* "z(x 1, x(2))" is a nested method call, that tends to surprise people: self.z(self.x(1, self.x(2)) (most people tends to expect it will parse as self.z(self.x(1), self.x(2)))

* "z(x 1, x)" is a nested method call where "x" refers to two different things: self.z(self.x(1),42)

(with the caveat I've had too little sleep, so I might very well have messed up one or the other of these...)

It's not hard to parse, and not all that hard to understand and reason about, but it is messy. Tack on a number of them and the parser gets ugly, and you create odd "dark corners" of the grammar that most people never run into but that surprise them whenever they do - and that parsers for alternative implementations need to get right.


> For starters, syntax drives how I interact with a language as much as - maybe more than - semantics. How expressions are laid out is intensely important to me, as it affects how I remember and visualise the code. I can visualise the layout of code I have not worked with in years when the syntax is clear, and the code is well formatted.

A good syntax could easily hide semantic warts that cause far more problems in real programming. If you focus on syntax too early, you're liable to introduce too many syntactic shortcuts to hide semantic ugliness when you should really address the problematic semantics.

For instance, Rust had lots of syntactic shorthands to hide GC'd pointers, owned pointers, borrowed pointers and so on. These small syntactic conveniences produced some pretty code at small scale, but trying to compose independent programs using these concepts into larger systems caused all kinds of problems because of the non-compositional semantics. They ended up clarifying the semantics, which led to dropping most of the special syntax for a clean core language, and now Rust is enjoying surging popularity.


Is it a good syntax if it hides important details? To me that sounds like a bad syntax. I want a readable and beautiful and clean syntax, but that to me also implies making important details clear.

The most important purpose of syntax is to ensure semantics are presented in a clear, readable and consistent manner. If it hides important information, it fails in that.

Note that the languages whose syntax I've complained about in this discussion, such as Lisp and Forth, have very minimal syntaxes. It may be that they are too minimal, but it is not the minimalism in itself I take issue with; it's that the minimalism sacrifices readability. There are languages with small syntaxes I admire, such as Oberon (the grammar fits in a page and a half of BNF), where the syntax is very focused on clarity - my ideals for syntax are orthogonal to size/complexity.


> "favorite" wart at the moment: '% x ' parses to the literal string "x" - "%" when not preceeded by an operand that makes it the infix operator "%" starts a quote-sequence where the following character indicates what the quote character should be - with the exception of a few special character, most characters will set the quote character to its identity. So in '% x ', the quote character is space.

How is this different, in principle, from any other unary/binary operator like plus? In most languages, when `+` is preceded by an expression, it's a binary operator, otherwise it's a unary operator.

The same seems true here: when `%` is preceded by an expression it's binary `%`, otherwise it's a unary operator with the semantics you describe.


It's not horribly hard to parse. It is however ugly and surprising to almost everyone that sees it.

The difference between this and "+" is that if I present you with "+ x ", you know that this represents two tokens: "+" and "x". If you don't know what preceded it, you don't know if "+" is a prefix or infix operator, or what parse to return, but assuming the string starts at a token boundary, you can unambiguously tokenize it lexically without additional knowledge.

But for "% x ", you don't know in isolation if it represents the single token representing the literal string "x" or if it represents the infix operator "%" and the identifier "x".

It's an example of one of the features that prevents you from doing bottom up lexical analysis of Ruby without doing a full parse and pushing information down from the parser.

As I said, it's not hard - in the case of an operator precedence parser, if your value stack is not empty when you see "%", then you need to parse what follows as an expression. In the case it is empty, you need to parse it as a quoted string. There are a variety of ways to do that. But it's relatively uncommon for languages to be impossible to unambiguously tokenize without doing higher level processing.
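Sketched in Python (invented names - Ruby's real lexer is far hairier), the idea is roughly:

    # '%' is the infix operator only when the parser already holds a value;
    # otherwise it starts a quoted string delimited by the next character.
    def lex_percent(src, i, have_value):
        if have_value:
            return ("op", "%"), i + 1
        quote = src[i + 1]                 # e.g. ' ' in '% x '
        end = src.index(quote, i + 2)      # naive: no escape handling
        return ("string", src[i + 2:end]), end + 1

    # lex_percent("% x ", 0, have_value=False) -> (("string", "x"), 4)
    # lex_percent("a % b", 2, have_value=True) -> (("op", "%"), 3)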

I've also yet to see a single example of Ruby code where this freedom to pick pretty much any quote character has been used in a sensible way - or at all (I certainly have seen cases where "expected" quote characters have been used). Thankfully. So it's an unnecessary wart.


I am beginning to think there must be something about Ruby that poisons the mind in this way.

There is nothing (major) wrong with Forth, Lisp, or Python syntax, and if you think there is, it's just a prejudice that is slowing you down. Stop being so delicate and get over it.


Might as well use COBOL then, if you're that syntax-agnostic. I, on the other hand, can't stand the syntax of Python or COBOL, and it's one of the major reasons I avoid these languages.


If COBOL was the right tool for the job I wouldn't let the syntax, of all things, stop me. But that is hardly the greatest of COBOL's weaknesses at this point. An odd example.

If Python is otherwise a good choice for whatever you're doing and you're avoiding it just because of the syntax, that's an example of what I'm talking about and you should just get over it. It's just a change of perspective.


I think you're being kind of mean. I love Forth, Lisp, and Python, and I'd encourage people prejudiced against them to give them more of a chance, but I can also understand how a language can just rub you the wrong way. (Personal example: Go. It has a lot to like pragmatically, but it wants you to code in a particular style. I'd rather code in mine.)


The thing is I can see tremendous value in both Forth and Lisp in terms of the semantics, but in some cases we just have to accept that people favour different styles and cross-fertilise the ideas rather than try to make people use the same languages.

One of my old pet peeves that I wish I had time to pursue was to experiment with a language that tried to separate presentation from the semantic model - I know there has been other attempts. I wish that got more attention. It's a tremendously hard thing to get right without ending up hampering communication more than you enable it, but the need for high fidelity interchange of programs is one of the biggest barriers to more rapidly iterating on improvements to the presentation of code.


You can't design a language with Haskell's semantics and Ruby's syntax. So if you want to use Haskell, you just have to get over your dislike of the syntax. It has a beauty of its own, but you'll never see it until you get familiar with it.


> You can't design a language with Haskell's semantics and Ruby's syntax.

Not identical, of course, no, but the same style would not be a problem. There's nothing in Haskell's grammar that can't easily be adapted to a less terse style. It wouldn't solve all the issues I have with Haskell by far, but it would go a long way towards making me consider it readable.

> but you'll never see it until you get familiar with it.

I am familiar with it. It is how I came to detest the syntax.


I look at it like someone who wants to learn French. Maybe they live in a French-speaking country and have French-speaking friends. When it comes to learning French, they decide they just can't stand the way it is spelled, and immediately give up. Wouldn't that be ridiculous?


French is a bad example, because the syntax is familiar to English users.

How about you consider Mandarin instead.


Having learned Mandarin, I can tell you that giving up because of the way it is written would be a perfectly sensible thing to do.

However, we are comparing apples and oranges. Learning any language takes years, Mandarin a few more. The syntax of a programming language takes a few days at worst.

When someone invents a programming language where each of 30,000 library methods is a unique character that must be memorized separately, I'll be prepared to grant your point.


No, I will not "stop being so delicate". I know what I like to work with, and I know what slows me down, and I am in the lucky position of not having to suffer through languages I don't want to work with. (yes, suffer)

If you think this has to do with Ruby, you are showing your own prejudice.

I first played with Lisp, Scheme and Forth about 35 years ago, and didn't pick up Ruby until about 12 years ago.

If anything it was the opposite. I picked up Ruby because it was finally a language that provided similar power to a lot more austere languages packaged in a syntax I can mostly enjoy, and where the syntax mostly fit my way of thinking.

Yes, the syntax fits the way I'm thinking - how I write and structure the text of the code affects how I think about things, so I care very deeply about how I lay things out. To me the goal with any code is that it must in the end flow like a story or a poem, or I will be as frustrated as when working my way through a badly written novel.

To me this is a major difference in how people engage with code. You have the camps that treat code as maths - you see it in extremes in languages like Haskell - and think of it in the abstract, and you have people (like me) who engage with it as exposition.

On top of that, as another dimension, you have the issue of to what extent people care about visualisation and aesthetics of the code. To me that is also important. I don't think of code in concepts, I think of it as visual blocks of exposition, carefully laid out to draw attention to specific words or phrases, like a poem, where shape can also matter.

Telling me to "get over it" is about as useful as telling me I should just start liking a dish I have already tried and know I hate the taste of. No. I have no reason to suffer for no reason.


If there is truly no reason, then I wouldn't tell you to get over it. But if there are other languages you'd be learning and using, and the only reason you don't is because of syntax, then you have a reason.

To take your example, Haskell can be poetic too, but until you become familiar with the syntax you'll never see it.


You keep assuming I'm not familiar with the semantics, and jumping to conclusions about my exposure to these languages without any basis. On the contrary, it was through learning them that I came to dislike the syntax. I have no doubt you can find Haskell poetic, but it is not to me.


Still, though, you're basically backing the article's point, "design syntax last". You like Ruby syntax and it has semantics you like; but the semantics were there first, then Matz came in with the syntax.

"Design the syntax last" does not mean that syntax is an unimportant afterthought.


I'm not saying that you shouldn't have semantics in mind, but no, I disagree with designing syntax last as much as I would with designing syntax first.

The syntactical decisions you make influence how you think about a problem as much as the semantic choices you make, and it needs to be a back and forth.

E.g. Lisp's homoiconicity fundamentally alters the freedom you have with respect to syntactic sugar (though I'd argue that it makes less difference than some want to think). If you leave the syntax considerations to the end, you will potentially have made semantic choices that constrain your syntax in ways you don't want.

This also applies in reverse.

You can't divorce the two.

The semantic concepts that made it into Ruby largely already existed, sure, but not the mix of them, and that mix is strongly affected by the syntax.

One effect of this interplay which I find a bit annoying (though I've looked long and hard for a solution that wouldn't be a grammar-nightmare), for example, is that while Smalltalk can (re)define core language features in a transparent way, Ruby can't, because while Ruby-style blocks can be very unintrusive, they are not invisible, so you can't simulate something like "if" or "while" without creating syntactic "noise".

These are conscious tradeoffs in Ruby - the cleanness of the core syntax was seen as more critical than retaining those features from Smalltalk. If the semantics had been designed first and dictated those features, then the current Ruby syntax wouldn't have been possible. We'd have gained some marginal utility, but we'd have lost readability.


OK: let's go with: design syntax any damn time you feel like it. Design it first and put finishing touches on it last. Design it while brushing your teeth or going to work, etc.

Don't give up too easily on Lisp. It also has picked up various syntax which has leveraged people's abilities.

As an example, take the backquote notation. Without it, it would be very painful to write macros at just one level of expansion. There is tremendous value in being able to do `(a b c (,@foo)) and so on, and it is a character syntax, like infix.

The possibilities for adding notations to Lisp without disturbing its syntax are not exhausted.

In my own Lisp dialect, I have introduced a modicum of notations which make quite a difference:

   obj.slot.a.b.c   ->   (qref obj slot a b c)

   obj.bar.(method arg)  -> (qref obj bar (method arg))

   a..b  ->   (rcons a b)   ;; construct range object

   [seq from..to]  ;; slice

   (fun arg . rest)   ;; dot notation with atom serves as apply

   (. blah)  -> blah  ;; dot not preceded by anything reduces to atom

   #"word list literal" -> ("word" "list" "literal")

   `quasi @literal @{blah[3]}`
All such little things make the "little code" you write daily a lot nicer.

It's understandable that coders might be put off by verbiage like (concatenate 'string ...) or (slot-value obj 'slot) or (aref array index).


> Don't give up too easily on Lisp

You assume I gave up on it easily. I didn't. I simply found nothing important enough to keep suffering the syntax. Not because there's nothing of value in Lisp, but because most things of value have been copied by many other languages by now. I'm not saying that Lisp doesn't have value any more, but what it provides that other languages don't is not sufficient to outweigh the inconvenience of the syntax to me.

> The possibilities for adding notations to Lisp without disturbing its syntax are not exhausted.

The problem with this is that while you can do it for yourself, until/unless it gains traction you still need to deal with the surrounding ecosystem too.


Furthermore, the semantics of Ruby are not unusual at all.

Being a scripting language in a class with Python and many others, Ruby basically competes on syntax and libraries/ecosystem. As long as the ecosystem is good enough, it's reasonable to choose Ruby over, say, Python for developing web apps purely because you like the syntax.

What I'm getting at is how people will avoid Scheme, Prolog, Haskell, etc. because of "syntax". Which is a loss, and a bit ridiculous, as sufficiently different semantics require different, unfamiliar, "scary" syntax. Getting over that syntax prejudice is just a matter of adjustment.


> Furthermore, the semantics of Ruby are not unusual at all.

Depends on what you consider "unusual". It's not unusual if you're used to Smalltalk, Self or Lisp, with Smalltalk being its largest semantic influence. Given how niche they are, there are plenty of areas that are "unusual" to a lot of people.

> Which is a loss, and a bit ridiculous as sufficiently different semantics require different, unfamiliar, "scary" syntax.

Except they usually don't. There are aspects of this with the Lisp family, where homoiconicity is important, that make different syntax harder, but even there, s-expressions were not the originally intended syntax, and "modern" variations show that there is room for other variants even with that property.

Heck, my Ruby compiler uses s-expressions as an internal representation (that you can "escape into" from source - it's used to implement low level details).

EDIT: You keep falling in two traps: One is to assume that rejecting a language means rejecting the lessons in semantics it brings. Ruby is proof that this is not true - it's a collage of ideas taken from other languages, and predominantly from Smalltalk, another language where syntax is one of the things people balk at. And it contains plenty of lessons from languages that went before - e.g. it has lambdas, it has continuations. It even has call/cc. It's not perfect - that's not the point at all - but it demonstrates that you can "lift" these ideas from other languages and package them in another syntax just fine.

The other trap is to assume that because syntax is unimportant to you, it's unimportant to others, and we just can't be bothered.

This is like being deaf and jumping to the conclusion that just because you do fine without hearing, everyone else should just ignore the extra signals they get from sound.

To me the use of syntax signals intent, gives hints about semantics (when well done) and outlines the model of a piece of code. It provides extra information that is lost if the syntax doesn't let me express it. To sacrifice that would be like losing my hearing: I'd still be able to function, but why in the world would I willingly give that up?

If there was something that offered sufficiently compelling semantic advances, I might. This is one of the reasons I've spent thousands of hours learning additional languages. But here's the thing: of all the several dozen languages I've spent time on, very few have been all that different when it comes down to it, and especially modern languages are increasingly borrowing most of the important semantic concepts. Most of the semantic concepts that have been "left behind" have been left behind for a reason (consider INTERCAL's wonderful "COME FROM", an early example of support for aspect-oriented programming).

This leaves syntax as a much more important differentiator for most people than you might think. You've written off Ruby as "not that unusual" on the basis of syntax, while not considering that Ruby's main lineage is niche languages like Smalltalk. That Ruby doesn't seem all that unusual is a testament to syntax.

E.g. a concept like being able to execute code in a class definition (not just being able to - even "def" is conceptually just syntactic sugar; you can write Ruby code using just the method call define_method instead) is something that seriously messes with people's minds. That 'require' in most people's code is not a keyword but a method call, one that has in most cases been overridden by either rubygems or bundler, is something most people aren't even aware of.

It's not that these concepts are "unusual" in the sense that they're not there in other well known languages. Ruby certainly isn't another Befunge or False or Aardappel [1]. These concepts certainly are present elsewhere - this is part of Ruby's Smalltalk heritage. But they're not what most developers are used to. Ruby has brought those concepts into the mainstream in a way languages like Smalltalk never managed. That Ruby is not "unusual" is a feature, not a bug.

[1] http://strlen.com/aardappel-language


You fell into a trap yourself! Did I say somewhere that syntax is unimportant to me? I am as particular about syntax as anyone you are likely to find.

What I actually am saying is that missing out on the good things a language has to offer because of an inability to get over your dislike of the syntax is a weakness. If you have no taste, there is nothing to get over. That's not what I'm saying.

Yes, yes, I know how you "suffer". That's something you can get over.

I don't know your goals--if you program only to satisfy your own creative urges then it doesn't matter how you choose a language.

Assuming you work on things that other people also work on, then there is a cost to rejecting a language only because of syntax.

For example, if you are doing some work in an area where everyone is using Python numeric libraries, and you refuse to use Python because of fussiness about syntax, then you can't collaborate with those people. There's no good reason for that.

Reasonable people should be able to see when certain choices are arbitrary, and get over their attachment to arbitrary choices when it gets in the way of more important things, like collaboration.

Failure to do what I just described is endemic among programmers, to our shame.


It is possible to completely avoid syntax, period, by putting boxes of various colors on a whiteboard and labeling or connecting them with pictographic symbols representing what you want to do.

As soon as you do so you will realize that all modern programming languages are as dumb as a sack of rocks, and the functionality you are coding is trivial.

You can then design the syntax, which is what separates your language from every other dumb as a sack of rocks language.

Quick, name a language where I can do something simple like mention "Whenever this function is called, make sure there is enough free memory (at least ---- MB) before actually calling it; if there isn't, first swap out any objects from memory to disk, starting with the ones that had been used the longest time ago, and after the function has finished running, swap those back into memory."

That's pretty straightforward and well-specified. Name a language I can do that in?

Or how about this: "analyze this library and include a logically simplified version that only needs to address these cases:" (a list of conditions). What language will even try to simplify included source code?

Or take debugging. Name a language I can add this line to: "if variables A and B ever both change as a result of the same function call, print the following debugging message:". Trivial to state - name a language that can do it.

Languages are dumb. They do almost nothing. I can't wait for the future, it can't get here fast enough.


>Quick, name a language where I can do something simple like mention "Whenever this function is called, make sure there is enough free memory (at least ---- MB) before actually calling it; if there isn't, first swap out any objects from memory to disk, starting with the ones that had been used the longest time ago, and after the function has finished running, swap those back into memory."

The first fallacy is thinking that this is "simple" merely because it is "well specified".

The second fallacy is thinking that this should be the task of a language.


Hey you're right, a language shouldn't let me just describe what my executable should do - and then give me that executable. My mistake!

A language should just be a super tiny syntactic framework that elegantly lets library writers rewrite the same thing they already wrote in C++, Python, Perl, Go, Java etc libraries for - but this time in my language!

It shouldn't really "do" anything. Who needs a programming language with any abilities! That's not what languages are for!

/s


Here's the problem with that idea: How do you convert your abstract high-level description into a set of specific operations? First you need to parse your specification, and parsing natural language is already hard. Secondly, you have to infer everything else that was requested and will be needed but was not originally specified.

Alright, now you have your list of requirements. Firstly, you must break each requirement down into a long series of small steps that each perform exactly one operation - such as adding 1 and 1 together. You must also find the correct order for all of the steps, as well as the operations to store and manage all required data in memory as needed.

Finally, the list must either be translated to the native language of your computer's CPU or run by a program running on the CPU.

As you can see, this is actually a large and difficult task, even if you manage to not encounter any unsolvable problems along the way. I would think the following comes naturally: The difference between your language and the 'dumb-as-rocks' ones is that the latter are implementable and usable.


>Here's the problem with that idea: How do you convert your abstract high-level description into a set of specific operations? First you need to parse your specification, and parsing natural language is already hard. [my emphasis]

If that were true it would be impossible to make a flow-chart based interface to a "compiler" that produces an executable based on the flowchart (without any keywords, variable names, etc). But plainly it is possible to make such a flowchart->executable compiler+visual IDE, so you are incorrect regarding the claim that the specification needs to be parsed lexically. Functionality does not depend on lexical parsing. It's that simple.


> A language should just be a super tiny syntactic framework that elegantly lets library writers rewrite the same thing they already wrote in C++, Python, Perl, Go, Java etc libraries for - but this time in my language!

The rewrite issue is exactly why it makes sense to avoid putting things in the language if it doesn't need to be part of the language.

Consider that a more generic way of offering what you suggest is to punt most of it to the OS and have the language/runtime instead "just" offer you a way of having it pre-allocate sufficient space from the OS. Many do in some way or other, either by getting out of the way (e.g. C, C++, any other language with "manual" memory management) or by offering switches or known behaviour to coerce the allocators to act how you want (e.g. in most VM based languages you'll find options to tune the allocation/GC patterns in various way).

In practice there are ways of doing what you want in most language/OS combinations, if not always as cleanly as you might like. If you want functionality like that as part of the language, a more flexible approach would be to e.g. support cross-cutting (effectively inserting hooks before/after function/method calls) or method/function aliasing, and provide a means to switch out the object/memory allocator. Doing so would be far more generic, and would allow the functionality you want to be written as library code instead of adding specialised language constructs for a specific memory allocation strategy.


>It shouldn't really "do" anything. Who needs a programming language with any abilities! That's not what languages are for!

Everything you have available in software (and/or hardware) is the result of some of those languages that "don't do anything" and don't have "any abilities".

You want a "a language that lets [you] just describe what [your] executable should do - and then give [you] that executable".

That's exactly what existing languages already do -- the description happens in the language's syntax.

If you want a declarative description in natural language, you actually want a programming team, not a language. And the description is called a "design document" or "functional requirements".

You will get those when AI replaces programmers. And even then only insofar as any set of loose language can be translated to a precisely behaving program.

Don't hold your breath for either.

For all the misplaced sarcasm I don't think you understand how computers work.

I, and many others in here, can write a language like one of those you mention (Perl, C, etc). Can you write one like you described you want? You know, to show us all how it should be done, and how we're idiots for not offering it to you?

Reminds me more of those freshmen that hear about assembly being faster and want to rewrite everything, OS, apps, etc in assembly, because faster.


You can return to this comment in 20 years and compare the state of languages then with the features programming languages implement in 2017. Today it is my opinion that languages differ most markedly in their syntax and some of the primitives, but generally speaking implement largely similar functionality.

For this reason there are source-to-source compilers (transpilers) and it is very common to host one language in another, in the way that WordPress was ported to .NET: https://news.ycombinator.com/item?id=13753445 The way in which this was done was to expand .NET to run PHP.

Languages just don't do very much today, in my opinion. Everything is hard. If you disagree with me then we can return to this in 20 years and compare the state then and now.

I am not designing another language at the moment. The above is just my opinion.


"The second fallacy is thinking that this should be the task of a language."

Yes. This should be the job of the OS, not the programming language.


In my opinion a language should do whatever I want. And in the future languages will.


Your opinion is more about magic than computer science.


You know the quote I'm going to put here :)


By the time languages have the DWIM feature, AI will have already taken the whole job.


This is very insightful. I'll let AI take the whole job, if it means it will just do whatever I want after. Small loss if that means lots of programming jobs disappear.


> Whenever this function is called, make sure there is enough free memory ...

Lisp? Forth? Smalltalk? And that's just off the top of my head. But really, an operating system with a virtual memory system will handle that mess for you.

> analzye this library and include a logically simplified version that only needs to address these cases

I think SynthesisOS (an operating system, not a programming language) could pull that off to a degree (the kernel was JIT'ed and therefore, customized system calls could be developed).

> if variables A and B ever both change as a result of the same function call ...

I wouldn't be surprised if GDB could do this. I know it can watch both reads and writes, so this should be possible. How easy it is is a different question.

And you know, the best way to get to the future is to invent the future ...


> I think SynthesisOS (an operating system, not a programming language) could pull that off to a degree (the kernel was JIT'ed and therefore, customized system calls could be developed).

I keep meaning to spend more time looking at SynthesisOS [1] - it deserves more attention. The JIT'ing as a means to speed up system functions by specialising them could be useful all over the place, and I think one of the most important contributions is that it doesn't need to be done everywhere - you can specialise JIT'ing of very small portions of hot codepaths and get a lot of the benefit at very low cost.

For those not familiar with Synthesis, chapter 3 of [1] is a good place for a quick summary of the techniques used. Section 3.3.1 gives concrete examples.

(The old Amiga user in me is very pleased Synthesis used M68k - makes the assembler examples very nostalgic...)

[1] Alexia Massalin, "Synthesis: An Efficient Implementation of Fundamental Operating System Services" - http://valerieaurora.org/synthesis/SynthesisOS/


If I remember correctly it had no memory protection, and was incompatible by design, so...


The specific case of pre- and post- memory management could be handled by hooks (like in emacs lisp) or automatically through an abstraction along the lines of https://en.wikipedia.org/wiki/Resource_acquisition_is_initia...

I'd argue that some languages have some quite powerful syntactic conventions. Pattern matching and algebraic data types are very clear and concise (and statically-checkable) ways to write code. A whiteboard can't compete in the concrete case of covering all the possibilities and proving it. There's clearly more wit in those constructs than in a simple AST, which seems to me to contradict your claim.
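For instance, a small sketch of that point in Python 3.10+, standing in for languages with real algebraic data types (a type checker such as mypy or pyright can warn if a variant is left unhandled):

    from dataclasses import dataclass

    @dataclass
    class Circle:
        radius: float

    @dataclass
    class Rect:
        w: float
        h: float

    Shape = Circle | Rect

    def area(shape: Shape) -> float:
        # A checker can verify these cases cover every Shape variant.
        match shape:
            case Circle(radius=r):
                return 3.14159 * r * r
            case Rect(w=w, h=h):
                return w * h

    print(area(Circle(2.0)))  # 12.56636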


> Quick, name a language where I can do something simple like mention "Whenever this function is called, make sure there is enough free memory (at least ---- MB) before actually calling it; if there isn't, first swap out any objects from memory to disk, starting with the ones that had been used the longest time ago, and after the function has finished running, swap those back into memory."

Why would you need a language for that? Isn't memory management what a kernel will do for you, including this specific scenario (the swap back to memory part will be done if pages are needed in memory)?


Why would you need any new language? Likely anything you would want has been done in some language already.

I was simply giving an example of something simple and well-specified that isn't being done.

-

In response to the other comment (sorry, this is breaking the comment ordering, but I am going to sleep soon and triggered the "submitting too much" message):

>And you know, the best way to get to the future is to invent the future ...

Nobody is going to accuse you of being an idea man/woman if you just stand around and wait. (My original comment was downvoted to -3, though this surprised me).

Meanwhile I can pick my battles (what to invent) carefully, away from the dumb as a sack of rocks status quo. Which I was just reporting a part of.

Don't shoot the messenger. (But also, as a practical matter, this story is about how syntax is the last thing you should design - which I kind of disagree with.)

-

I also wish I had added, in my original comment, a parenthetical after "pictographic symbols": "(or evocative post-modern landscapes that have meaning only to you, the language designer, with a brief sentence underneath each one, of the type that you could explain if you were to ever revisit it or have to explain it to anyone else)". Though now that I see the backlash against my comment, it doesn't really matter. I can't add those parenthetical words because I am outside my edit window.


> I was simply giving an example of something simple and well-specified that isn't being done.

But as has been explained elsewhere in this thread, it is being done - just by the OS or the VM rather than as part of the language semantics.

Improvements in the explicit handling of memory allocation are both possible and desired in programming languages (eg. Rust has some interesting memory-related syntax and semantics), but this particular suggestion of yours just isn't well thought out.


You picked one of three examples I gave - it doesn't matter if they were well-thought-out, I just gave three examples of things that aren't easy in any of the languages I code in. That's because every language I code in is ALMOST the same (there are translators that translate one to the other robotically, for example) - their biggest difference is syntax. Of course they differ a lot in their functionality - but none of them have much of that to begin with, in my opinion. Today's languages are tiny and hardly do anything. In my opinion. (Check this comment in twenty years and I think you will find I was right.)


> Quick, name a language where I can do something simple like mention "Whenever this function is called, make sure there is enough free memory (at least ---- MB) before actually calling it; if there isn't, first swap out any objects from memory to disk, starting with the ones that had been used the longest time ago, and after the function has finished running, swap those back into memory."

Can you imagine trying to debug a program that used such a language feature? What a nightmare.


Yeah it's a lot easier to reason about languages that are as dumb as a sack of rocks - they're so well-defined!

Basically just syntax. Add a bit of syntax highlighting and you know EXACTLY what's going on.

Which is my point.

I look forward to languages that make it trivial to do super hard stuff. If debugging them is hard, make it easy to ask them what's going on. (Also difficult today, again, because they are as dumb as a sack of rocks.)

Every example I gave was super well-defined.


"I'm going to remodel my nasal cavity to accommodate a swarm of bees" is a well-defined goal - it just isn't something you'd want to do, even if you've got access to tooling that makes it straightforward.


Sure it is well defined, but the resulting performance of the program isn't necessarily deterministic, it will be super sensitive to whatever else is going on in the machine as well as whatever else the program is doing.

Why would you want to add a syntactical feature that duplicates the services the OS is already providing and makes the performance of the program less predictable and consistent?


> Quick, name a language where I can do something simple like mention [...]

Every single language out there is capable of doing all these things. Like, even Python fits the bill (a rough sketch follows the steps below):

> Whenever this function is called

Use decorators to add pre and post processing to any function.

> make sure there is enough free memory (at least ---- MB)

Call out to the appropriate OS function (http://stackoverflow.com/questions/2513505/how-to-get-availa...) via ctypes. Or use Python's own `gc` module, if you're only interested in your process' memory.

> if there isn't, first swap out any objects from memory to disk

Use `pickle` to serialize your objects, use advanced meta-programming to create proxies for the serialized objects which you can then use (store in dicts, list, pass to functions, etc.) which defer loading from disk until needed again in the future. Remember to initiate gc cleanup after you're done dumping things to disk.

> starting with the ones that had been used the longest time ago

Store access time on objects, sort by that time when serializing.

> and after the function has finished running, swap those back into memory.

Touch every proxy object so that it reads and unpickles the object it replaced earlier.
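Put together, a rough sketch of the decorator skeleton; free_memory_mb, swap_out_lru and swap_in are hypothetical helpers standing in for the ctypes/pickle/proxy machinery described above:

    import functools
    import gc

    def ensure_free_memory(min_mb, free_memory_mb, swap_out_lru, swap_in):
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                if free_memory_mb() < min_mb:
                    swap_out_lru()   # pickle least-recently-used objects to disk
                try:
                    return fn(*args, **kwargs)
                finally:
                    swap_in()        # unpickle them again afterwards
                    gc.collect()
            return inner
        return wrap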

--

So, let me repeat: every existing language lets you do all of that and more. The amount of work needed differs by language, of course, but you can do all that if you want, in any language you choose.

I don't know what you're thinking. Given that the above is obvious to any competent programmer, could it be that you're a simple troll?


Everything you said is much more complicated than the simple directive I gave (as an example of functionality).

You should not accuse others here of trolling, you can read the guidelines here:

https://news.ycombinator.com/newsguidelines.html

Please be civil on HN.

I further gave you an extremely difficult piece of functionality, which I phrased as "analyze this library and include a logically simplified version that only needs to address these cases:" (a list of conditions). You ignored this.

This is obviously not functionality any language includes. It is not easy or anything like that.

If you don't like my examples, you can come up with your own. Pick something hard that programming languages don't do.

You are correct that most programming languages have most of the same functionality, and differ only in their syntax: this was my point. For this reason there are transpilers (source to source compilers) which output a different language.

If you don't like my examples, pick your own. Anyway, in this comment I have already quoted myself giving such an example, which you ignored.

My overall point is that most languages don't strive for such features. They most markedly just differ in syntax. You confirmed this at the top of your comment.

Thanks for your additional thought, and please remember to be civil.


> Please be civil on HN.

Please don't use phrases like "dumb as a sack of rocks" when referring to someone's work.

Such a waste of time...



