There are several technical advantages to the types-to-the-right style. For one, it's often easier to write a parser for: having an explicit let or val keyword makes it immediately obvious to a parser without lookahead that the statement in question is a declaration, and not an expression. This is less of a problem in languages like Java, where the grammar of types is simpler, but in C and C++, if you begin a line with
foo * bar
then it's not yet clear to the parser (and won't be until after more tokens are observed) whether this is a variable declaration for bar or a statement multiplying foo and bar. This isn't a problem for name-then-type syntaxes.
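To make the ambiguity concrete, here's a minimal sketch (the names are placeholders); which parse is correct depends entirely on what foo refers to at that point:
void as_declaration() {
    typedef int foo;   // foo names a type here...
    foo * bar;         // ...so this declares bar as a pointer to int
    (void)bar;
}
void as_expression() {
    int foo = 6, bar = 7;  // foo and bar are variables here...
    foo * bar;             // ...so this multiplies them and throws the result away
}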
On a related note, it's also often advantageous for functions to indicate their return type last, as well, especially when the return value of the thing might be a function of an earlier argument. There are plenty of examples of this in functional or dependently-typed languages, but even C++ (which historically has listed the return type of a function first) has added an alternate (slightly clunky) syntax for function types where you can specify the return type after the arguments for this reason:
template<typename Container, typename Index>
auto
foo(Container& c, Index i)
-> decltype(c[i])
{ ... }
The parsing problems have long been solved. The `foo * bar;` issue is solved in D by the observation that the * operator returns a value, and no use of the value is made here, so it must be a declaration. No symbol table lookup is necessary. (D explicitly does not rely on a symbol table to parse.)
It is possible that the * operator is overloaded and the overload has side effects that are relied upon here, but D discourages overloading arithmetic operators for non-arithmetic uses, and considers the side-effect-only case particularly wretched, so it has no problem not supporting it.
Lookahead issues are also trivial to solve, so shouldn't be a factor in modern language design.
That's why C is usually parsed with a symbol table, which is the way it was meant to be parsed. When parsing "foo * bar;", it should already be clear after parsing "foo" whether the statement is a variable declaration or an expression. (It's a variable declaration if "foo" is registered as a type in the symbol table.)
The advantage of this style is that types align nicely to the left and no additional keywords (like let) are required. The disadvantage is that parsing becomes highly contextual. We can't start with parsing somewhere in the middle of a file.
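A rough sketch of what that contextual parsing looks like in practice (hypothetical names, nothing like a real compiler): the parser keeps a table of identifiers it has seen declared as type names, and consults it when a statement begins with an identifier.
#include <string>
#include <unordered_set>
struct ParserContext {
    std::unordered_set<std::string> type_names;  // grows as typedefs/struct tags are parsed
    // After reading the first token of "foo * bar;", pick which production to use.
    bool is_declaration_start(const std::string& first_identifier) const {
        return type_names.count(first_identifier) != 0;
    }
};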
In C, for example dollar-sign and at-sign are unused and could be used for pointers. But I don't like those. Both are visually clunky characters.
Personally I like how it is in Pascal: the caret for pointer types, and a postfix unary caret for dereference, as in like^.this, which needs no additional syntax (like->this is just (*like).this in C). But nowadays the caret is usually also the XOR operator.
The deeper problem is that Pascal cannot use the caret as an XOR binary operator, since postfix operators cannot easily be distinguished from binary operators with only one token of lookahead - it would take more lookahead, or type resolution, to determine that it's not a binary operator. In C, by contrast, the dereference operator is a prefix operator, which can be distinguished from binary operators without lookahead.
Surely you can have such a keyword in either case. Both of these are certainly possible:
var int x = 1
var x int = 1
I don't see that as an advantage of types to the right so much as an advantage of choosing to define the grammar to include a terminal symbol that makes parsing easier.
It is true that newer languages seem more likely to have a keyword like 'var', but that has less to do with types left or right and more to do with type inference. If types are optional, then something needs to remain that identifies it as a variable declaration.
OK, I see what you're saying, and I have to partially agree.
You could say there are two parsing challenges: knowing that you are inside a declaration (as opposed to an expression, statement, etc.), and parsing the parts of the declaration once you know you are in it.
If you have a (required) keyword like "var", then the first challenge is solved in any case. (And that is the challenge that was actually mentioned in the above comment.)
But yeah, the second challenge exists, and since variable names are exactly one token long (in any language I know of), it's easier for them to come first. Sometimes. It depends on the language, though: if the language has a separator terminal symbol ("var int : x"), then I don't see how the order matters. Some languages do and some don't.
Not at all. C/C++ grammar is explicitly contextual. `foo * bar` is a declaration if-and-only-if `foo` was previously declared as a type. Otherwise it is an expression. The subsequent tokens have zero bearing on this.
Nor is this a good argument for types-to-the-right. If C named pointer types like `⋆foo` instead of `foo⋆` [stars used to avoid formatting glitches], then putting the type to the right as `bar * foo` would be just as ambiguous as your example.
I recently updated my editor (https://github.com/DigitalMars/med) to highlight in yellow all search matches on the screen. It works so damn well I am disgusted with myself for not thinking of this decades ago.
It neatly finds the first use, etc. It also does a word search so `it` does not highlight when searching for `i`.
Easier for the machine means easier for people to write tooling for the language. Imagine if you could ast.parse() C++ code the same way you could for Python code.
There's something that's unsatisfying about most of these examples which is that a sane person wouldn't write code that multiplies two objects and throws away the result, so people might think that if you added extra constraints to prevent depending on silly side effects, the issue would go away. This is definitely false, but it's not exactly obvious if you don't think about it too hard... I wish people actually paid attention to it when constructing counterexamples.
So, for anyone else feeling similarly unsatisfied, here's another example that might be more satisfying in that respect:
y = f(g<T, U>(x));
In conjunction with the fact that templates are already Turing-complete, we can see that detecting whether this is a call to a templated g or two comparisons is a Turing-complete question.
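To see both readings side by side, here is a sketch (all names are made up); both namespaces compile, and the same token sequence means something entirely different in each:
namespace as_template_call {
    using T = int;
    using U = long;
    template <typename A, typename B> int g(int v) { return v + 1; }
    int f(int v) { return v * 2; }
    int x = 3;
    int y = f(g<T, U>(x));   // one argument: the result of calling g<int, long>
}
namespace as_comparisons {
    int g = 1, T = 2, U = 3, x = 0;
    int f(bool lhs, bool rhs) { return lhs && rhs; }
    int y = f(g<T, U>(x));   // two arguments: (g < T) and (U > (x))
}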
Note that a similar parsing issue arises in C# as well, but I don't think generics there can be used to perform Turing-complete computation... though I'm not sure.
You can use generics and overload resolution to solve satisfiability problems in C# and Java. But not in the way like templates in C++ allow for compile-time computation.
But you can very easily get the AST.
Obviously it may happen that you have to wait until hell freezes over to get a result, but generally that doesn't happen in code you see day to day.
Oh, but you do, because the parser will invariably leak into the way you write code. If your code takes twelve times longer to compile [1], or you need to add a space because the parser thinks that ">>" at the end of a nested template is ambiguous [2], or any number of "hacks" we have internalized as idioms because they make the parser happy, it's not really a problem with your machine anymore.
[1] doesn't have anything to do with parsing (the problem would exist even if the compiler were fed the AST directly; the parser does not perform type inference).
Please reconsider your tone. It sounds like you were disappointed that you didn’t get to show off something you know, so you chose to insult someone else instead.
The site would be better without such comments. This might make you feel better but it’s just noise that hundreds of people have to skip over.
I don't think it has much to do with type inference. It's more that type systems became more complicated, and so did type names. And with a long composite type name, the name of the variable gets pushed too far out and obscured. It worked great in Algol, and still works pretty well in C (although that is partly because it splits the declarator to keep array and function syntax to the right of the variable name), but in C++ with templates it's already hard to read.
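For example, a contrived sketch using common standard-library types (not anything from the thread):
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
// Type on the left: the long template name pushes the variable name to the far right.
std::unordered_map<std::string, std::vector<std::pair<int, double>>> samples_by_label;
// With inference (or a trailing annotation in other languages), the name leads.
auto samples_by_label_2 = std::unordered_map<std::string, std::vector<std::pair<int, double>>>{};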
There are also a variety of issues with parsing it that way, most of which go away entirely if the name is first.
"One merit of this left-to-right style is how well it works as the types become more complex." ... "Overall, though, we believe Go's type syntax is easier to understand than C's, especially when things get complicated." ... "Go's declarations read left to right. It's been pointed out that C's read in a spiral! See The "Clockwise/Spiral Rule" by David Anderson."
I am still of the opinion that not putting some sort of visual break between a variable name and the type is somewhat unpleasant. Almost every other language uses a colon to make these easier to visually parse.
I'm curious if you also believe the same about key/value separation in map literals - it could also be done with juxtaposition, as in e.g. Clojure, but Go still uses a colon there.
In C the type of a variable is "variable-centric" rather than "type-centric": it's denoted by syntax akin to the one you would use to use the variable. So if you have
int (*arr)[2]
that means "if you dereference arr, the expression represents an array of 2 elements".
So, for example, if you want a pointer to a function that returns a pointer to an array of two elements, how do you denote it? You'd go step-by-step based on how you would use it:
int a[2]            /* a: an array of 2 ints */
int (*f)[2]         /* f: a pointer to an array of 2 ints */
int (*(*fp)())[2]   /* fp: a pointer to a function returning a pointer to an array of 2 ints */
Once you understand this, the typing syntax should make sense, and you see that any description like spirals or whatever misses the big picture.
The main issue here is that such syntax is write-only. Every time you read the expression later you have to go through the mental gymnastics you've just described to understand it.
So how do I declare a function pointer for a function that takes two arguments and returns a value, all of which are function pointers of that same type?
I mean if you take the grammar to be part of the official language specification and you believe it prohibits recursive typedefs then isn't that sufficient? It seems a bit like saying it causes you trouble on occasion that they never explicitly spelled out that 'deftype' isn't a synonym for 'typedef'? I would've assumed that if something is not valid in the grammar then it's not valid in the language described by it.
Well the hope is that somehow I'm misunderstanding the grammar... because there simply must be a way to do this. How can the language committee leave this flaw unfixed?
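For what it's worth, the fully self-referential form genuinely can't be spelled out, since the type would have to contain itself; the usual workaround is to route the recursion through a named struct, roughly like this (hypothetical names):
struct FnBox;                        // forward declaration breaks the cycle
typedef FnBox (*Fn)(FnBox, FnBox);   // pointer to function taking and returning boxes
struct FnBox { Fn f; };              // the box just wraps one such pointer
It's not literally "a function pointer whose arguments and result are that same pointer type", but it's as close as the declaration syntax lets you get.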
I think that's a strange rationale and it doesn't fit the philosophy promoted by golang otherwise, which is that we should optimize for the 80% or 90% cases.
Well, the parsing issues go away mostly because the actual pattern is "keyword on the left, type on the right". `let a: My_Type` is a whole lot easier to parse than `My_Type a`, and this goes even further when instead of My_Type you have something more complex, due to the C-family syntax trick of having "declaration mimic use".
`a: My_Type` or even `a My_Type` (assuming that juxtaposition is not used for something else) is easier to parse, since you only need one token of lookahead here to know that it's a declaration, and that the rest is a type constructor - and this is true regardless of how complicated that type constructor is.
For an extreme example on the other end of the spectrum, in C++, for something like this:
a<>::b<c>d;
It's impossible to even say whether it's an expression statement or a variable declaration, because it depends on what exactly b is. In fact, it might be impossible to determine even if you have the definition of b in pure C++. Consider:
#include <cstddef>

template<std::size_t N = sizeof(void*)> struct a;
template<> struct a<4> {
    enum { b };                 // here b is a value
};
template<> struct a<8> {
    template<int> struct b {};  // here b is a class template
};
enum { c, d };

int main() {
    a<>::b<c>d;
}
Now whether it's a declaration or an expression depends on pointer size used by your compiler.
Needless to say, this is all very fun for tools that have to make sense of code, like IDEs. And I think that's another vector for a different syntax - as "smart" tooling (code completion etc) became more common, PL designers have to accommodate that, as well. C++ helped by first becoming popular, and then teaching several painful lessons in that department.
Looks to me like the world had mostly settled on types-on-the-right already in the 1970s, except C was an anomaly and languages which imitated its syntax in other ways often imitated that too.
Types are moving to the right because having them on the left makes the grammar undecidable. Languages like C or C++ can only be parsed when semantic information is passed back into the parser.
For example, consider the following C++ statement:
a b(c);
This is a declaration of `b`. If `c` is a variable, this declares `b` of type `a` and calls its constructor with the argument `c`. If `c` is a type, it declares `b` to be a function that takes a `c` and returns an `a`.
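A quick sketch of both readings, with made-up names; each version compiles, and only what c refers to differs (compilers may warn about the vexing parse in the second case):
struct a { a(int) {} };
namespace c_is_a_variable {
    int c = 42;
    void demo() { a b(c); }   // b is an object of type a, constructed from c
}
namespace c_is_a_type {
    using c = int;
    void demo() { a b(c); }   // b is declared as a function taking a c and returning an a
}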
> Types are moving to the right because having them on the left makes the grammar undecidable.
This is just an artifact of how they designed the syntax in some languages like C++ or C#. You can easily put types on the left in a way that makes this false.
I disagree that types are moving over to the right side for readability's sake or anything like that. Modern theorem-proving languages explicitly treat type declarations as set membership, which makes the type-specification operator (usually :) equivalent to the membership symbol (∈). This influence rubbed off onto the modern MLs and, by extension, inspired many of the ML-inspired/type-safe languages that have come out recently.
Type annotations on the right reads way more naturally to me e.g. "customerNameToIdMap is hash map of strings to UUIDs" over "there's a hash map of strings to UUIDs called customerNameToIdMap".
int num_entries = ..
float div = ..
float result = num_entries // div
vs
num_entries: int = ..
div: float = ..
result: float = num_entries // div
And that's an extremely simple example. I tried out Nim but dropped it because, while writing some code with lists of lists of different types and so on, it became completely impossible to parse quickly.
I don't see much difference here. It's probably not a big deal when the type annotation is short but if it's something longer reading the short variable name first helps give some context first. When scanning code I think I read the variable name first as well.
Type inference should be able to deal with the simple cases anyway.
It's from Pascal; at least I can't think of any earlier language that had it. It's very noticeable when you compare it to ALGOL W, which Wirth did just before Pascal.
Nitpick, but in the graphic at the beginning, C# is presented as a language designed in the 21st century when it was in fact designed in, and released in the last year of, the 20th.
Just a note, nowadays people tend to use : to denote the type of a value in type theory, while ∈ is used in set theory to denote that a value is a member of a set.
Is there a name for side on which the type declaration sits? Left/right typed? If that's the case, I'd argue for back/forward or start/end to account for right-to-left languages.
That first chart really annoys me because rather than giving us the names of the languages it uses logos, which don't tell you what the language is if you don't recognize them. And generally it's an extra cognitive layer and load on your brain. It makes the chart pretty unreadable.
Interesting observation. Is a wide enough sample of languages considered? What scares me is that we still use text files to program in 2019 such that we are having this conversation. The author makes a good observation nonetheless.
Why would it scare you? Computers have keyboards. The majority of program code consists of identifiers, which are 100% text. Each line can contain several AST nodes without severe complexity issues in the average case. Unless someone invents a brain-to-machine interface, computer languages will be based on text for the foreseeable future.
There's no such thing as writing a conceptual entity like an AST "directly", you always have to use the mediation of some UI, be it text or a GUI. And you can definitively have ambiguous GUIs.
> That is essentially the way it is done in Scala (2004), F# (2005), *ActionScript 3 (2006)*, Go (2009), Rust (2010), Kotlin (2011), TypeScript (2012), and Swift (2014) programming languages.
The latest TIOBE index [0] puts both ActionScript and TypeScript in "The Next 50 Programming Languages", from rank 51 to rank 100.
Pascal is also mentioned, at rank 204.
Based on that, yes, ActionScript does make the cut in terms of popularity; or you could also remove TypeScript and Pascal from the article.
Also, hypocrisy much? In one of your other comments:
> You are welcome. D did not actually made it to the list based on the language-popularity-selection criteria I've used, but I thought it is prominent enough to be included and overrode my criteria for it.
So that "prominent enough" part is completely subjective, right?
is pretty clear. Although I'm not convinced that declaring multiple variables like this should be allowed at all. Imo it's clearer to declare them separately when they're used.
For example:
for (let i in 0..n) {
    for (let j in 0..i) {
        // do stuff
    }
}
It's not so much that it's clearer, it's that it reads more naturally in English: "integer i" is very much akin to, say, "president Lincoln". Which is probably why it was adopted first.
I suspect it's also why the colon is often used when the order is reversed, even when it's not strictly necessary to disambiguate parsing - you want something that reads "i is integer", and colon already kinda sorta has that meaning.
In JavaScript with prototypical inheritance, where you inherit from an instance rather than a type, it becomes very confusing what Foo would mean in a line saying Foo bar. So there some more explicit distinction is needed which is easiest to do with colon and moving to the right.
So much easier to read and write code with inferred types.
Though my co-workers obsessed with writing ‘good’ code refuse to use them. They also refuse to write comments because their code is so good ‘it documents itself’.
While C++ auto is nice to save on some typing, I think having to go look at the function definition to figure out exactly what type you're dealing with sometimes makes it inconvenient.
Admittedly, I don't use any sort of IDE that does fancy mouseover type checking and only occasionally poke at C++ but there's one lib I wrote[0] where I had to remove a bunch of autos so I could figure out exactly what was going on (and, ironically, make the code "self documenting").
--edit--
[0] OK, it was translated from Java to C++, so I also had to figure out the logic behind it to make any sort of modifications.
I think most programmers end up in this place and I think they are right, at least concerning the reading part (which is often more important since most code is read more often than written)... How long have you been programming?
I disagree - there are places where explicit types help, but for most local variables they're just noise. An accurate name for a variable is much more important than a type when it comes to figuring out what the code does that involves it, and types are a mouse hover away in any decent tooling anyway.
(I've been programming for 20 years, and mostly in statically typed languages.)
20 years. I find reading code with inferred types much much easier. The signal is what the code is doing. The typings are noise.
Edit: In a big complicated system, knowing the type like ‘user’ or ‘account’ doesn’t tell you much. There’s always more information and context. A small type definition is only the tip of the iceberg.
This is one of those aspects where a good editor makes a big difference.
I've recently moved from writing javascript in vim, to writing typescript in vim, to writing typescript using VSCode. Being able to hover over a variable and see its type makes it easier to read complex code. This is useful irrespective of whether or not you're using type inference, since you don't need to hunt down the variable's declaration to figure out its type. And if you're reading back the compiler's type information through your IDE while programming, you may as well take advantage of type inference.
I'm not a big fan of types, either. As a C programmer I use types pretty much only as much as they are required (indicate to the compiler how information is represented). It's mostly integers of various sizes, plus a few floats and structs and typedefs. Still, types in general convey more information to me than the variable name. For example, it's important to me to group fields in structures by their type to prevent superfluous padding and to get a feel of cleanliness. I tend to think of types as the relevant information that says what actually happens, and of names I tend to think as secondary information that gives an impression why it should happen.
int[] UserIDs; If you had a choice between knowing the variable was an int[] or knowing its name was UserIDs, the latter would be much more useful.
This is why new languages are trending towards inferred types. Much easier to write and read (you get your padding taken care of for free as well when you use var)
Also ‘what’ is happening is the actual flow and logic of the code. ‘Why’ it is happening is in your comments.
> (you get your padding taken care of for free as well when you use var)
How so? Padding is inserted when the layout (order of fields) is bad. There are no languages that reorder structures, are there?
> Also ‘what’ is happening is the actual flow and logic of the code. ‘Why’ it is happening is in your comments.
That's certainly one way to put it. But I was alluding to the feeling behind the "show me your tables..." quote. So, to be clearer, it's more about "what is possible" vs. "what the authors have currently thought of / implemented".
I wouldn't under-estimate the influence of Pascal here. Noted language designer Anders Hejlsberg for instance has a strong Pascal background (Delphi) and type location in C# was supposedly a heavy debate precisely because of that. Typescript benefited from the repercussions/learnings/findings of that debate and decades of C# usage.
That said, it's probably a "both" situation: ML and Pascal are both clear founders of type descriptions, and the fact that they both settled on roughly the same type syntax within just a few years of each other is likely an indicator of convergent evolution.
… which got it, I think, from Milner's exposition; see, for example, p. 351 of https://www.sciencedirect.com/science/article/pii/0022000078... . This notation makes particular sense for function types, where a mathematician would write `f : A → B` even if not thinking type-theoretically.
I'm not convinced. The article found one situation where it makes more sense for the type to be on the right, but in most situations it makes more sense for the type to be on the left. The reason being that it reads more naturally. It's the difference between saying "Golfer Tiger Woods adopted a cat" and "Tiger Woods, golfer, adopted a cat." Nobody speaks the latter.
It stops reading naturally as soon as you get composable type constructors, and people start actively using them. Even in English, once there's enough qualifiers, we move it to the right - e.g. "Tiger Woods, a famous but controversial golfer, adopted a cat".
> "Tiger Woods, a famous but controversial golfer, adopted a cat".
"Famous but controversial golfer Tiger Woods adopted a cat" might be awkward, but it's hard for me to buy an argument that it's either unnatural or even particularly hard to understand.
>but in most situations it makes more sense for the type to be on the left. The reason being that it reads more naturally
Not really. And the Tiger Woods example is a whole sentence with a verb and a noun (describing what he did to whom), so it's not representative of a type declaration (which just describes what kind of thing something with a specific name is).
So, it's more like:
Tiger Woods: a golfer.
vs:
A golfer: Tiger Woods.
Which of course is an argument for types on the right -- the first reads much better.
'Naturally' is almost always a red herring in programming language design, because what is 'natural' tends to be 'whatever I was previously familiar with'. There are a lot of unstated assumptions here:
· That what's best for natural language is also the best for programming languages. This is still a debated topic, but my personal feeling is that the two are different enough in both mechanism and purpose that what's good for one isn't necessarily good for the other. (For an illustrative example, look at how sigils worked in Perl 5: they were explicitly designed to work like English demonstrative words like "that" or "these", and many programmers, when first exposed to Perl, felt they were 'illogical'. They weren't, but in this case, mirroring a natural-language convention tended to obscure, rather than clarify, what was going on in the language!)
· That phrase ordering in English is necessarily 'natural'. Lots of naturally-occurring spoken languages feature word order that differs tremendously from English!
· That the English-language phrase you're describing as 'unnatural' is in fact unnatural: it's actually a common convention, especially when you're introducing a new fact to a conversation! "Tiger Woods, a famous golfer, adopted a cat."