Syntax across languages (rigaux.org)
116 points by catern on Feb 1, 2015 | 39 comments



Something that recently caused me a surprising number of problems was the simple fact that the modulo operation on negative numbers behaves differently depending on the programming language: http://rigaux.org/language-study/syntax-across-languages/Mth...

For a more complete list: http://en.wikipedia.org/wiki/Modulo_operation#Remainder_calc...
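
To make the difference concrete, here is a minimal Python sketch of the three common conventions for the sign of the result. The helper names `truncated_rem`, `floored_mod` and `euclidean_mod` are made up for illustration. Python's own `%` and `//` happen to follow the floored convention, which the second and third helpers rely on.

```python
def truncated_rem(a: int, n: int) -> int:
    """Remainder of division truncated toward zero (C99, Java, C#): sign follows a."""
    q = abs(a) // abs(n)
    if (a < 0) != (n < 0):
        q = -q
    return a - n * q


def floored_mod(a: int, n: int) -> int:
    """Modulo of division floored toward -infinity (Python, Ruby): sign follows n."""
    return a - n * (a // n)  # Python's // already floors


def euclidean_mod(a: int, n: int) -> int:
    """Euclidean modulo: result always lies in [0, |n|)."""
    return a % abs(n)  # % with a positive divisor is never negative in Python
```

For a = -7 and n = 3 the three give -1, 2 and 2; they only disagree when an operand is negative. In Racket, Common Lisp and Clojure terms, the truncated version corresponds to `remainder`/`rem` and the floored one to `modulo`/`mod`.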


When we had to implement the modulo operation in assembler as part of an assignment, several other students and I started discussing this topic, ultimately settling on "do whatever you want, just write down what it’s supposed to do", because we couldn’t really agree on which would be the better solution.


Actually, if you go by strict nomenclature, modulo should be non-negative, since it's a mathematical concept and that's what its mathematical definition requires.

The fact that it has been (ab)used as a shorthand for the remainder of integer division is unfortunate, but that suggests there ought to be a "remainder" keyword (or operator).

We've gotten into a bad habit of calling things by the name of their implementation rather than by what we actually expect of them. This confusion is just a side effect of that problem.


This is why Racket has both.


And Common Lisp, despite this page not having it listed.


And Clojure too (mod and rem).


There's a great guide like this for Lisp dialects that I often refer to here: http://hyperpolyglot.org/lisp

They also have a number of similar comparative guides on their main page now: http://hyperpolyglot.org/


Interesting, but it seems to be unaware of some Clojure libs. It suggests,

  (.trim " foo ") ;; Java interop
instead of,

  (require '[clojure.string :refer [trim]])
  (trim " foo ") ;; With clojure.string
clojure.string has been available since Clojure 1.2, which is ostensibly the version used in the comparison, yet for some reason it isn't actually used. (By the way, we're approaching Clojure 1.7 at the moment.)


I did notice it seemed to use the Java fallbacks a lot, yeah, but I don't use Clojure much so didn't realize it was that out of date on the CLJ side.


I just skimmed this, but a large fraction of their information was simply wrong for Common Lisp, even if you assume a standard readtable.

One example: they state "anything without a space and is not a number" for identifiers; it's hard for that to be more wrong.

The actual rule is: a single token that could either be a symbol or a number should start with any non-macro character, and can be made up of any combination of non-macro and non-terminating macro characters, as well as any escaped characters. For example:

    a\ b
Is a valid symbol.

Now that we have a token that is either a symbol or a number, we need to distinguish which; that's slightly more complicated, so I'll link the spec (which includes an EBNF): http://www.lispworks.com/documentation/HyperSpec/Body/02_ca....
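
A heavily simplified Python sketch of the token-reading step described above, assuming the standard readtable: the token stops at whitespace or a terminating macro character, `#` is treated as non-terminating, and `\` (single escape) and `|...|` (multiple escape) are handled. `read_token` is a hypothetical helper, not part of any Lisp implementation. It skips case conversion and the symbol-versus-number decision, and it throws away which characters were escaped, which the real reader needs for both of those steps.

```python
WHITESPACE = set(" \t\n\r\f")
TERMINATING_MACROS = set("\"'(),;`")   # standard-syntax terminating macro chars
NON_TERMINATING_MACROS = {"#"}         # may appear inside a token


def read_token(text: str, pos: int = 0) -> tuple[str, int]:
    """Read one token starting at pos; return (token_text, next_pos)."""
    if pos >= len(text):
        raise ValueError("end of input")
    first = text[pos]
    if first in WHITESPACE or first in TERMINATING_MACROS or first in NON_TERMINATING_MACROS:
        raise ValueError(f"{first!r} would not start a token")
    chars = []
    in_multiple_escape = False
    while pos < len(text):
        ch = text[pos]
        if ch == "\\":  # single escape: take the next character literally
            if pos + 1 >= len(text):
                raise ValueError("escape at end of input")
            chars.append(text[pos + 1])
            pos += 2
            continue
        if ch == "|":  # multiple escape: toggle literal mode
            in_multiple_escape = not in_multiple_escape
            pos += 1
            continue
        if not in_multiple_escape and (ch in WHITESPACE or ch in TERMINATING_MACROS):
            break  # token ends; '#' is not in this set, so it stays in the token
        chars.append(ch)
        pos += 1
    if in_multiple_escape:
        raise ValueError("unterminated |")
    return "".join(chars), pos
```

With this sketch, `a\ b` comes back as the three characters `a b`, a single token, and `foo#bar)` reads as `foo#bar` because `#` does not end a token while `)` does.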


Now I kind of want to cross-reference this with the TIOBE index and make a frankenlanguage with the most popular syntax.


Didn't PL/1 try that? (Or maybe they made the classic error of "worst syntax" instead... not sure. Been a few years since I looked at it =)


Disappointed there's no integer division, or explanation of the different division operator behaviours (some languages have separate integer and float operators, some use floats if indivisible but otherwise integers, some do integer or float depending on the operand types, etc.)
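
A Python sketch of the three behaviours listed above, with hypothetical helper names. The first two emulate other languages' rules; the third is just Python 3's own pair of operators. Note that C and Java truncate integer division toward zero while Python's `//` floors, so they disagree on -7 / 2 (-3 versus -4).

```python
def c_style_div(a, b):
    """Result type follows the operands, as in C or Java:
    int / int truncates toward zero, anything else is float division."""
    if isinstance(a, int) and isinstance(b, int):
        q = abs(a) // abs(b)
        return -q if (a < 0) != (b < 0) else q
    return a / b


def exact_or_float_div(a: int, b: int):
    """Integer result when b divides a evenly, float otherwise."""
    return a // b if a % b == 0 else a / b


def separate_operators(a, b):
    """Python 3 style: '/' is always true division, '//' floors."""
    return a / b, a // b
```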


I personally find the site "Rosetta Code" [0] much more useful for this sort of thing.

[0]: http://rosettacode.org/wiki/Rosetta_Code


Fun read. A nit, if the author is reading:

C# allows Unicode letters as identifiers, so the variable regexp should reflect that. This might be handy: http://www.regular-expressions.info/unicode.html#category

Also, would love to see Go in there. Similar identifier regex.
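
Python's built-in `re` module has no `\p{...}` classes, so here is a rough sketch that checks Unicode general categories directly with `unicodedata`. The category sets follow my reading of the Go spec (a letter from Lu, Ll, Lt, Lm, Lo or `_`, then letters or Nd digits) and the C# spec (start with Lu, Ll, Lt, Lm, Lo, Nl or `_`; continue with those plus Nd, Pc, Mn, Mc, Cf). Keywords, C#'s `\u` escapes inside identifiers, and normalization are ignored, so treat it as an approximation; the function names are hypothetical.

```python
import unicodedata

GO_LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo"}
GO_PART = GO_LETTER | {"Nd"}
CS_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CS_PART = CS_START | {"Nd", "Pc", "Mn", "Mc", "Cf"}  # Pc already covers '_'


def is_go_identifier(name: str) -> bool:
    if not name:
        return False
    if not (name[0] == "_" or unicodedata.category(name[0]) in GO_LETTER):
        return False
    return all(ch == "_" or unicodedata.category(ch) in GO_PART for ch in name[1:])


def is_csharp_identifier(name: str) -> bool:
    if name.startswith("@"):  # verbatim identifier prefix, e.g. @class
        name = name[1:]
    if not name:
        return False
    if not (name[0] == "_" or unicodedata.category(name[0]) in CS_START):
        return False
    return all(unicodedata.category(ch) in CS_PART for ch in name[1:])
```

One visible difference: `x` followed by a combining accent (category Mn) passes the C# check but not the Go one.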


So does C, although compiler support is limited.


A cursory skim leaves the impression of just how alone Haskell is on many syntactic choices (though supposedly with a lot in common with "Merd"). I never took the time to learn Haskell, but it always looks beautiful on an aesthetic level, which differentiates it from the other loners in the list.

Can anyone point me to any papers/documentation/discussion on WHY certain syntax decisions were made? Many could have been arbitrary choices to match the constructs, but the way the community presents itself makes me think there's more to it.


The best part of Haskell is that function application is just whitespace.

    f x    -- Haskell for what C-style syntax writes as f(x)
I would attribute almost 90% of its elegance to that idea alone. (This is not an exact number or measured estimate.)

I also have a major thing for pattern-matching, and it always seems nice when you know that's what you're looking at. For instance, it is far easier to understand what `int **var` is in C if you see it as a kind of pattern destructuring.


>The best part of Haskell is that function application is just whitespace.

It's the worst part, in my opinion. Is A B C D equal to A(B(C(D))), A(B,C,D), A(B,C(D)), A(B(C),D) or A(B(C,D))? Often you need to disambiguate with parentheses, so I guess there is some default interpretation that you can assume when reading code, but in the end it just looks like an unnecessary mental translation step.

Lisp has weird syntax, but lisp has a reason to have that syntax. Haskell has a weird syntax because it wants to either be different for different's sake or to just be plain inaccessible.


Consider that, in Haskell:

1. all functions take a single argument (i.e., are curried)

2. function application is left-associative

So "a b c d" is just applying the function "a b c" to the argument "d", the same as "(a b c) d". "a b c" is the application of the function "a b" to "c", which, in turn, is the application of "a" to "b". So "a b c d" is the same as "((a b) c) d".
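
A small Python sketch of the same idea: a curried function is a chain of one-argument functions, and "((a b) c) d" is just applying the arguments one at a time from the left, which `functools.reduce` expresses directly. `curry3`, `apply_left` and `volume` are hypothetical helpers for illustration.

```python
from functools import reduce


def curry3(f):
    """Turn a 3-argument function into nested 1-argument functions."""
    return lambda x: lambda y: lambda z: f(x, y, z)


def apply_left(func, *args):
    """Apply args one at a time, left to right: ((func a) b) c ..."""
    return reduce(lambda g, arg: g(arg), args, func)


def volume(w, h, d):
    return w * h * d


a = curry3(volume)
```

Here `apply_left(a, 2, 3, 4)` and `a(2)(3)(4)` are the same computation, and stopping early, as in `apply_left(a, 2, 3)`, leaves a function still waiting for its last argument.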

I personally find using the space symbol for a core concept of the language (function application) very elegant, and not dissimilar to other languages that strive to have simple and consistent syntax, like Smalltalk, where "obj meth1 meth2 meth3" is the same as "((obj meth1) meth2) meth3", even though Haskell and Smalltalk might be as different as programming languages can get :)


How does currying help? You still have to figure out which argument belongs to which function. GHC demands parentheses if you try to write A(B(C(D))) as A B C D, so I'm pretty sure I'm right.


No, you're wrong. `A(B(C(D)))` is different from `(((A B) C) D)` (which is the parenthesized version of `A B C D`).

Currying allows simple partial application; if `a` is a function that "takes 3 arguments" (i.e. it has type `w -> x -> y -> z`, which means that it takes an argument of type `w` and returns a function with type `x -> y -> z`, which takes an argument ...), you can say e.g. (lowercase here, since capitalized names in Haskell denote constructors):

  a' = a b c
  e1 = a' d1
  e2 = a' d2
where `a'` is the partially applied function `a`.
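
Python functions are not curried, but the partial application described here can be approximated with `functools.partial`, which fixes the leading arguments of an ordinary multi-argument function. `combine` is a hypothetical stand-in for the three-argument function; the difference from Haskell is that the partial step has to be requested explicitly instead of falling out of ordinary application.

```python
from functools import partial


def combine(w, x, y):
    """Hypothetical three-argument function."""
    return f"{w}-{x}-{y}"


# Fix the first two arguments, then supply the third later.
a_partial = partial(combine, "b", "c")

e1 = a_partial("d1")
e2 = a_partial("d2")
```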


> No, you're wrong.

And you and he both gave evidence to the initial assertion that "it's confusing".


I'm not sure if "confusing" is fair -- just different than C-family languages.

The rule is simply "The first expression is a function, all following expressions are arguments to that function"

It's no different than "The first symbol is a function, all comma-separated, parentheses-bound expressions are arguments to that function"

Both are pretty easy syntactic rules, I think.


It's arguably confusing because, without parentheses, it isn't obvious how a list of functions factors into one or more invocations. How is the nesting done here? It's more subtle. A lifetime of reading parenthesized function calls is going to be confounded by a change of convention, which equals 'confusing'.


I think you're probably overestimating the difference. Here is an algorithm for mechanically transforming a C function call to a Haskell function call:

    for all arguments in the function
        if the argument is not a symbol surround it with parentheses
    delete all commas
    delete the parentheses that enclosed the original argument list
And an algorithm for converting Haskell calls to C calls:

    put an open parenthesis between the first expression and second expression
    put a close parenthesis at the end of the expression
    put commas between all expressions between these two parentheses
The fact that it's so easy to convert between them highlights to me that they are both pretty easy.
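
The two recipes above can be sketched in Python for simple calls. This assumes prefix-only calls with no infix operators at the top level, treats identifiers and numeric literals as "symbols", and, like the recipes, leaves nested arguments in whatever syntax they were written in. The helper names are hypothetical.

```python
import re

ATOM = re.compile(r"\w+")  # identifier or numeric literal


def split_top_level(s: str, sep: str) -> list[str]:
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def c_to_haskell(call: str) -> str:
    name, rest = call.strip().split("(", 1)
    inner = rest[: rest.rindex(")")]  # drop the call's own parentheses
    args = split_top_level(inner, ",")  # commas disappear here
    wrapped = [a if ATOM.fullmatch(a) else f"({a})" for a in args]
    return " ".join([name.strip()] + wrapped)


def haskell_to_c(expr: str) -> str:
    func, *args = split_top_level(expr, " ")
    return f"{func}({', '.join(args)})"
```

For example, `c_to_haskell("f(g(x), 1 + 2)")` gives `f (g(x)) (1 + 2)`, and the two functions round-trip a flat call such as `f(x, y, z)`.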


I understand currying and partial function application. That's not the problem. The problem is that using whitespace for function application demands an arbitrary application precedence and obscures the function's signature from the reader.


ah, that. Well, you get used to it. Also, it helps that application has the highest precedence, so it's quite easy to read expressions like `f x y + g z` (i.e. `(f x y) + (g z)`).


You don't have to figure out which argument belongs to which function.

The syntax is "The first expression is a function, all following expressions are arguments to that function". Fullstop.

It would be unfair of me to say "C has confusing and ambiguous calling conventions, the compiler complains if I write A(B(C(D))) as A(B,C,D)"

Because you would respond "Well of course, that's just the calling conventions of the language!"

To which I say "Exactly"


In Haskell, A B C D necessarily corresponds to A(B,C,D) in C-family syntax due to the precedence rules; there is no ambiguity.

The reason whitespace is more elegant is because your functions can return other functions, which due to function currying and partial application is equivalent to your function simply taking more arguments.

I'll use Python as an example.

    def flip(f): return lambda a,b : f(b,a)
That is, it's a function that takes a function of two arguments, and returns a function that does the same thing, but with the argument order flipped.

To use this function, it would look like:

    flip(f)(a,b)
In Haskell, a direct translation with the lambda would be:

    flip f = \a b -> f b a
The type of which could be written as:

    flip :: (a -> b -> c) -> (b -> a -> c)
That is, a function which takes one argument, that argument is a function that takes two arguments of types a and b and yields a value of type c, and returns a function that takes two arguments of types b and a and yields a value of type c.

We could use this function like:

    (flip f) a b
But since function application is left-associative, we could write

    flip f a b
This segues into an equivalent (and arguably better) way to write the function taking advantage of partial application and currying:

    flip :: (a -> b -> c) -> b -> a -> c
    flip f a b = f b a
Which would be used the exact same way:

    flip f a b
This illustrates the elegance of currying.

    a -> b -> c
Is equivalent to

    a -> (b -> c)
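
A Python rendering of the curried version, where every function takes exactly one argument and returns another function. `flip(f)(x)(y)` is Python's spelling of `flip f x y`, and `flip(f)(x)` on its own is a perfectly good function, which is what reading `a -> b -> c` as `a -> (b -> c)` buys you. `subtract` is a hypothetical curried helper.

```python
def flip(f):
    """Curried flip, mirroring flip :: (a -> b -> c) -> b -> a -> c."""
    return lambda x: lambda y: f(y)(x)


def subtract(x):
    """Curried subtraction: subtract(x)(y) == x - y."""
    return lambda y: x - y


flipped = flip(subtract)
minus_one = flipped(1)  # partially applied: still waiting for one argument
print(minus_one(10))    # subtract(10)(1), prints 9
```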


All Haskell functions are curried, so there's no ambiguity.


>Lisp has weird syntax, but lisp has a reason to have that syntax.

Lisp has that syntax because it makes parsing the language uniform, and that's the same thing Haskell offers.

If I write

    a = b + c
You might as well parse that as `a(=(b(+(c))))` with the appropriate semantics.

In that sense, Haskell is just trying to do something like what Lisp does without all the extra parentheses.


That's... actually really neat. So there's no operator precedence?


Yes, there is operator precedence, but it only matters for infix notation. Or rather, function application has highest precedence, so there is no ambiguity using it.

My broader point was that the space is the most common symbol in English because the function that it performs is the most fundamental action of the language. And since Haskell is a functional language, function application is fundamental, and thus you gain a lot of terseness by using spaces to denote it.

Lisp does the same thing with `(+ a b)`; it just puts parentheses around everything to make the parsing rules simpler. If you parse it differently, you don't need as many parentheses.
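
To illustrate the point about parsing rules, here is a minimal Python sketch of a reader for fully parenthesized prefix expressions. Because every list is delimited by its own parentheses, one short recursive function handles any nesting with no precedence table. `tokenize` and `parse` are hypothetical helpers, not a real Lisp reader: atoms stay plain strings, and strings, quotes and comments are not supported.

```python
def tokenize(src: str) -> list[str]:
    """Split source text into '(', ')' and atom tokens."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]):
    """Read one expression from the front of tokens, consuming them."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while True:
            if not tokens:
                raise SyntaxError("missing )")
            if tokens[0] == ")":
                tokens.pop(0)  # discard the closing paren
                return expr
            expr.append(parse(tokens))
    if token == ")":
        raise SyntaxError("unexpected )")
    return token  # an atom such as a symbol or number
```

With this sketch, `(+ a (* b c))` parses to `['+', 'a', ['*', 'b', 'c']]`.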


Haskell inherits a lot from Standard ML. This paper explains a lot of the design decisions of Haskell, syntax amongst them:

http://haskell.cs.yale.edu/wp-content/uploads/2011/02/histor...


"NB." for comments in J made me laugh.


To me it makes it more obvious that succinct != more readable: (top two) http://rigaux.org/language-study/syntax-across-languages/Fnc...


I'm really missing TeX. Could someone add it?


The seed for random numbers in CL: (make-random-state &optional state)



