
Nice article. I especially liked:

> program3 fails because runFunction can only run first-order functions and runFunction is a second-order function – a function that takes a first-order function as a parameter.

Here I had no idea that JavaScript implicitly typed `runFunction` that way. That's cool.

Also, I never thought of "higher-order functions" as breaking into a countable hierarchy of nth-order functions, which is an interesting thought. In Haskell the hierarchy would start off as

    zerothOrderFunction :: a
    firstOrderFunction :: a -> b
    secondOrderFunction :: a -> (b -> c)
    secondOrderFunction' :: (a -> b) -> c
In general, an nth-order function is a function which either

1. Takes an (n-1)th order function as an input and returns a simple type.

2. Takes a simple type as an input and returns an (n-1)th order function as an output.

The upshot is that typed programming languages not only catch bugs, but also prevent you from legally expressing many nonsensical expressions (analogous to the set-theory paradoxes). For example, consider the expression

    (\x -> x x) (\x -> x x)
If you expand this expression, it reduces to itself:

    (\x -> x x) (\x -> x x) == (\x -> x x) (\x -> x x)
In a purely untyped language, this expression would be legal. But what would be its meaning? Arguably, it is nonsense, and should be excluded from the set of legally expressible expressions. The way to do this is through a type system. And indeed, if you typed this expression into a Haskell REPL you would get a type error; the expression can actually be proven to be untypable (IIRC).
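
For reference, this is roughly what GHC says if you try (a sketch; the exact wording varies by GHC version):

    ghci> :t \x -> x x
    error: Occurs check: cannot construct the infinite type: t ~ t -> t1

The self-application x x forces the type of x to contain itself, which the occurs check rules out.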

On the other hand, this means that typed systems are in some sense strictly less expressive than their untyped counterparts. It would therefore be interesting if somebody found an expression which was both (i) meaningful and (ii) only expressible in an untyped language. You would then have an argument for untyped languages :-)




> > program3 fails because runFunction can only run first-order functions and runFunction is a second-order function – a function that takes a first-order function as a parameter.
>
> Here I had no idea that JavaScript implicitly typed `runFunction` that way. That's cool.

In case it's not clear, it's just that the program eventually ends up with a runtime error (the talk about runFunction being a second-order function is just an explanation of why you should expect it to end up with a runtime error). The evaluation is

       program3()
    == runFunction(runFunction)
    == runFunction(1)
    == 1(1)
    == Error: func (with value 1) is not a function


Depending on your system, 1(1) is quite sensible: applying 1 to a list of parameters just returns the 1st parameter. It is a matter of the semantics one chooses for application against any value.

    1(1) -> 1

Just because the common idea of application is that it only applies to functions does not mean that application applied to other types is nonsensical and should end up with a runtime error.

If your language allows application to be defined for different types, then the type system should be capable of determining the type that returns from application.

A practical example of a language in which application is valid for integers is Unicon/Icon. Both functions and integers have specific semantics with regards to application. And failure is an option.
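
For illustration, here is a rough Haskell model of those semantics (the Value type and apply function are invented for this sketch; real Icon also allows negative indices that count from the end):

    -- values of a small dynamic language in which integers are applicable
    data Value = IntV Int | FunV (Value -> Value)

    -- applying an integer n to an argument list selects the nth argument;
    -- an out-of-range application fails (Nothing) rather than crashing
    apply :: Value -> [Value] -> Maybe Value
    apply (FunV f) (v:_) = Just (f v)
    apply (IntV n) args
      | n >= 1 && n <= length args = Just (args !! (n - 1))
      | otherwise                  = Nothing  -- failure is an option
    apply _ _ = Nothing

Under this model, apply (IntV 1) [IntV 1] evaluates to Just (IntV 1), i.e. 1(1) -> 1.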


I see. The error makes sense now.


    foo :: a -> (b -> c)
Not saying it's wrong, but "foo is a second-order function" is a distinctly minority opinion.

To a useful first approximation, Haskell works as a cartesian-closed category, which basically means we have tuples and that tuples harmonize with functions so that foo behaves just like

    uncurry foo :: (a,b) -> c
which is a first-order function.

So the majority opinion calls foo a first-order arity-2 function.

Here's a bar function that's truly second-order:

    bar :: (a -> b) -> c
In general, the order of a function is the count of parens nesting the leftmost innermost arrow. It is invariant to currying/uncurrying as well as permutation of arguments.
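
Spelling out the counting on a few signatures:

    f0 :: a                       -- order 0: not a function at all
    f1 :: a -> b -> c             -- order 1: arity 2, but still first-order
    f1' :: (a, b) -> c            -- order 1: uncurried f1, same order
    f2 :: (a -> b) -> c           -- order 2: takes a first-order function
    f3 :: ((a -> b) -> c) -> d    -- order 3: takes a second-order function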

The same counting principle applies to other invariants such as ranks and kinds.

"How high can the order of a function go?" is the question explored in this blog post [0]. Note how the author's definition of order -- essentially the same given here -- got left and right mixed up.

[0] http://math.andrej.com/2006/03/21/interesting-higher-order-f...


I think it is not just a minority opinion, but just plain wrong.


> On the other hand, this means that typed systems are in some sense strictly less expressive than their untyped counterparts. It would therefore be interesting if somebody found an expression which was both (i) meaningful and (ii) only expressible in an untyped language. You would then have an argument for untyped languages :-)

Here's an example: The W combinator `\f x -> f x x` can be expressed in Haskell but it doesn't work everywhere it should, unlike in an untyped universe:

http://us5.campaign-archive1.com/?u=4937a9a2eb9eee0a26e1e0a2...

In other words, types can sometimes make you lose reuse.
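
Concretely, a sketch of what happens in Haskell:

    -- the W combinator, with the type Haskell infers for it
    w :: (a -> a -> b) -> a -> b
    w f x = f x x

The inferred type forces both copies of x to agree, so self-applications such as w w are rejected by the occurs check, even though they are perfectly good terms in the untyped lambda calculus.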

That's amply compensated by types removing massive amounts of junk in the untyped world. If you don't believe me, try hacking in an untyped calculus sometime.


> Here's an example: The W combinator `\f x -> f x x` can be expressed in Haskell but it doesn't work everywhere it should, unlike in an untyped universe:

That's an example in one type system, not a feature of all type systems. There's a good argument that typed languages are more expressive than dynamically typed languages: you can always add a dynamic type to your type system and recover all dynamically typed language features, but you can't go the other way around and recover all the features of a statically typed language. For instance, the performance benefits.
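
GHC, for instance, ships such a dynamic type in its standard library. A minimal sketch:

    import Data.Dynamic (Dynamic, toDyn, fromDynamic)

    -- a heterogeneous, "dynamically typed" list embedded in Haskell
    mixed :: [Dynamic]
    mixed = [toDyn (1 :: Int), toDyn "hello", toDyn True]

    main :: IO ()
    main = print (fromDynamic (head mixed) :: Maybe Int)  -- prints Just 1

You opt into dynamism locally and keep the static guarantees (and the performance that comes with them) everywhere else.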


>On the other hand, this means that typed systems are in some sense strictly less expressive than their untyped counterparts. It would therefore be interesting if somebody found an expression which was both (i) meaningful and (ii) only expressible in an untyped language. You would then have an argument for untyped languages :-)

Any sound type system for a Turing-complete language will reject some correct programs. As a trivial example, consider:

    if f():
        1 + "a"
    else:
        1
This is a perfectly fine expression iff the result of f() is always false.
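
And indeed the Haskell analogue is rejected outright, even when the condition is literally False (a sketch; the exact error text varies by GHC version):

    ghci> if False then 1 + "a" else 1
    error: No instance for (Num String) arising from a use of '+'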


Interesting. It could still be argued this program isn't meaningful (or perhaps "not useful").


At this point, you start hitting a fundamental problem in the theory of computation: when are two programs equivalent? Is it when they behave equivalently under any implementation, or is it when there is a chain of rewriting rules that turns one into the other?

The issue with the first definition is two-fold: what set of implementations are you quantifying over, and what do you do with weird programs that crash occasionally? This is similar to what happens in the given example. The program is equivalent to 'return 1' when f is always false. Otherwise, it might crash.

In the second definition, 'return 1' and the given program are just different programs, until you actually substitute an f and continue reduction.


Well, it is a trivial example. In reality, such things would be hidden in the complexity of the code. One could argue that if the correctness of the code cannot be accepted by the type system, then it would also be confusing for a human to look at, and should therefore be refactored; which is why (in practice) this is a non-issue.

For a less trivial example, consider the expression "f(x) + g(x)" where both f and g can return either a number or a string. It is conceivable that for any given x, they would always return the same type, but a type system cannot detect every such case.

Or consider a program like:

    x = 1
    print_int(x)
    x="a"
    print_string(x)
I have seen code just like this in dynamically typed languages. You can also run into a similar situation that is not obviously correct if, for example, x is a global variable while the program is running a state machine. You can then view each state as transitioning the type of x. Determining if such a program is correct would involve knowing each state transition, what type of x is expected when control enters a state, and what type x has when control leaves a state (and a given state might have multiple output, or even input, types). I have seen one program that actually worked like this; and it was not fun to modify.

Of course, a type system is no guarantee that you do not have this sort of problem. For example, you could say, in the first example, that x is a union type of Int and String (and print_int and print_string are defined to operate on such a union). In this case, the program is well typed, but has partial functions, so the type system cannot guarantee that there will not be a runtime error when you call them (it won't, however, be a type error, since the type system missed it).
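
In Haskell terms, that tagged-union version looks roughly like this (names invented for the sketch):

    -- x as a tagged union of Int and String
    data IntOrStr = I Int | S String

    -- well typed but partial: each printer can still fail at runtime
    -- on the "wrong" variant, just not with a type error
    printInt :: IntOrStr -> IO ()
    printInt (I n) = print n
    printInt (S _) = error "printInt: not an int"

    printString :: IntOrStr -> IO ()
    printString (S s) = putStrLn s
    printString (I _) = error "printString: not a string"

    main :: IO ()
    main = do
      printInt (I 1)       -- fine
      printString (S "a")  -- fine; printString (I 1) would crash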

You could also have x be an untagged union. In this case, I don't see a way of doing any better than undefined behavior.

A completely different type of example is C code such as:

    *0x01 = 0
which means "set the memory at address 1 to 0", but is poorly typed.


    "f(x) + g(x)"
This example is way too trivial to show why typing such code is undecidable in general. Let's imagine:

    def f(x):  # g is defined analogously
        if isinstance(x, str):
            return 6
        elif isinstance(x, int):
            return 'a'
It is decidable for any type of x that is previously known. More importantly, type systems were essentially created to be able to prove whether a given expression is typable or not.

    x = 1
    print_int(x)
    x="a"
    print_string(x)
This is solved by A-normal form (https://en.wikipedia.org/wiki/A-normal_form). Or by a more general concept that is available outside functional languages, SSA (https://en.wikipedia.org/wiki/Static_single_assignment_form).

You can make the point that this code is hard to modify in dynamic languages, but that is an issue with the code style rather than with type systems. For example, IntelliJ IDEA would highlight such Pythonic code as incorrectly typed, since you have changed the type, and therefore the variable itself.

I find the general argument against complexity in type systems such as Haskell's to be quite amusing, to be fair. Most of the time you are not dealing with code that is undecidable - and that's why type systems are interesting.


The real world is not strictly typed; I think types are for the compiler rather than for humans. I'm, however, a big fan of runtime bounds checking, so that the program always has a sane state.


> For a less trivial example, consider the expression "f(x) + g(x)" where both f and g can return either a number or a string. It is conceivable that for any given x, they would always return the same type, but a type system cannot detect every such case.

You can encode this quite easily in a language with dependent types.
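
A rough sketch of such an encoding, using Haskell GADTs and a closed type family as a stand-in for full dependent types (all names invented):

    {-# LANGUAGE DataKinds, GADTs, TypeFamilies #-}

    data Tag = TInt | TStr

    -- the argument carries a type-level tag...
    data X (t :: Tag) where
      XInt :: Int    -> X 'TInt
      XStr :: String -> X 'TStr

    -- ...and the return type of f and g is computed from that tag
    type family Ret (t :: Tag) where
      Ret 'TInt = Int
      Ret 'TStr = String

    f :: X t -> Ret t
    f (XInt n) = n * 2
    f (XStr s) = s ++ "!"

    g :: X t -> Ret t
    g (XInt n) = n + 1
    g (XStr s) = reverse s

    -- for any given x, f and g must return the same type, so
    -- f x + g x typechecks exactly when that type is Int
    sumBoth :: X 'TInt -> Int
    sumBoth x = f x + g x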


for the case of

    x = 1
    print_int(x)
    x="a"
    print_string(x)
the type system could simply instantiate a new variable, x: string, that shadowed the old x: int. this is perfectly valid ocaml, for instance:

    let x = 1 in
    Printf.printf "%d\n" (x + 1);
    let x = "hello" in
    print_endline (x ^ " world")


You could, but that seems like a transformation of the program itself, not a feature of the type system.


the point is that just because two variables have the same name doesn't mean they are the same variable; you need a (name, scope) pair to fully disambiguate them. some languages have the scope of a name be the enclosing function, so that all references to x within a function body are the same x, and the type therefore attaches to (x, function). however that is not the only way to do it; in ocaml the let statement rebinding x introduces a new scope, so that (x, line 1..) is a genuinely different variable from (x, line 3..), and has a new type. you can even say something like

    let x = string_of_int x

and the rhs will take the value of x (a well-typed integer variable from the previous scope), and the lhs will introduce a new x (a well-typed string variable) in the newly created scope.

this is an orthogonal thing to static/dynamic typing, incidentally; for instance in ruby you can say

    a = 10
    (3..5).each do |a|
      puts a
    end
    puts a
and you will get

  3
  4
  5
  10
the a within the do loop being a new variable that happened to have the same name as the a in the outer scope, but referring to a different actual variable.


Yes, my point is that the fact that "x" refers to two different variables is a feature of the language, not the type system.


    f(x) + g(x)
The set of types that x may take is enumerable. What that means is that for this specific expression, you can enumerate all of the types that x may take, and then enumerate all of the types that each function would return, given those types (if you can). You can then decide whether this expression is well-typed or not.

Now it depends on the implementations of f and g. And type systems, in most practical cases, would be able to deal with this.

    x = 1
    print_int(x)
    x="a"
    print_string(x)
The second example is always decidable, if we agree that the semantics of '=' are generally a question of equivalence.


I think the argument for untyped languages is their simplicity. There are lots of things that are trivial to express in a dynamic language but that require extra features in the type system to be expressible in a statically typed language (parametric polymorphism, subtype polymorphism, datatype introspection, and so on). Sure, there are type systems for all of these things, but it is hard to have a single type system that does everything at once, because things quickly become very complicated and type inference gets harder.


Your definition of order is not the standard one!

Usually one would have:

    zerothOrderFunction :: a
    firstOrderFunction :: a -> b
    firstOrderFunction' :: a -> (b -> c)
    secondOrderFunction :: (a -> b) -> c
After all, the functions firstOrderFunction and firstOrderFunction' are the same up to currying.
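
The Prelude's curry and uncurry witness that equivalence:

    curry   :: ((a, b) -> c) -> (a -> (b -> c))
    uncurry :: (a -> (b -> c)) -> ((a, b) -> c)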

The standard definition of order is given here (at the end of the section):

https://en.wikipedia.org/wiki/Simply_typed_lambda_calculus#T...


> On the other hand, this means that typed systems are in some sense strictly less expressive than their untyped counterparts.

This happens quite often. Take data structures. It is often very useful to store different types of data together in a common structure. Now in most cases, you don't know in advance which type will go where in the data structure, as the data might only be known or derived at runtime.

In fact, I'd say dynamic behavior is where the major expressiveness loss is. Granted, I'm not sure how runtime vs. static type systems relate to type theory. Can anyone educate me?


> It is often very useful to structure different types of data together in a common structure.

"Very useful" often just means: coding a properly typed, easily-understandable solution takes longer than doing it untyped.

I argue that is the case only when using a plain-text editor. When you have a good IDE for a typed language at hand that can refactor, complete, analyze, and follow code at the press of a button, you lose this "advantage" of untyped languages.


How do you define expressiveness? To me, it means saying more with less. Applied to code it means doing more with less work. Which directly translates to productivity gains.

Now, what IDE and language are you referring to? The best one I've used was Visual Studio for C#. While I'd say typing wasn't adding too much overhead in it, its type system is also poor, and it's arguable whether it really prevents many bugs. Now, it does help make C# faster.


IDEA with Kotlin or Java is fantastic.


It’s partly just coincidence that dynamically typed languages tend to have less expressive type systems. You could run any typechecker at runtime if you wanted. There’s no reason you couldn’t have a Python-like language where the “list” type dynamically keeps track of the types of its elements, function types keep track of the types of their inputs and outputs, and so on, to give you more precise & useful dynamic type errors. That would let you do some interesting things like creating a type from runtime data, such as a database schema.

It’s just that historically people have found it markedly more useful to have that information available statically, because it lets you prove things about your code that hold “for all” cases (universals), not just “for some” that are executed (existentials). And static information can be used for optimisations because of those guarantees—if you have proof, you don’t need a fallback.


Can you explain what you mean by comparing universal and existential quantification to dynamic languages? Existential quantification is often used in statically typed languages and can be valuable in some cases (no pun intended). In Rust, for instance, you can say fn f() -> impl Fn(u8) -> u8 or something similar. This is existential quantification, because you're saying that f returns a type that is some Fn(u8) -> u8. This expressiveness can be powerful in a type system. I don't see how it relates to dynamic languages.


I’m not referring to existentially quantified types—which are definitely useful & interesting in statically typed languages. What I mean is that with static types you can prove universal properties of functions.

If you give me an unknown function f : ∀T. T → T in a statically typed language, then I know it always returns a value of the same type as its argument. With some assumptions (purity, parametricity, halting) I even know it must be the identity function.

Whereas doing what I’m suggesting in a dynamically typed language, running inference dynamically, I could only recover existential properties: that there are some types for which f returns a value of the same type as its input. So I could recover more and more information (in the form of an intersection type) by calling the function with different inputs:

f : int → int

f : (int → int) ∧ (string → string)

f : (int → int) ∧ (string → string) ∧ (Foo → Foo)

But I could never arrive at ∀T. T → T, because that’s essentially an infinite intersection.

ETA: actually I might not even be able to get this much information; I only know the output type for each input value, not each input type.


I see what you were getting at now, thanks.


You create a reference cell containing an empty list. Because the cell is mutable, it can't have a polymorphic type - it must have a monomorphic one. How do you determine at runtime the type of this cell, before the first time you mutate it?


What I’m suggesting is that you could literally just run HM or another typechecker at runtime. You’d give an empty list the type “list<T>” for a fresh type variable T; appending an integer would introduce the constraint T = int; appending a string would introduce T = string and raise a runtime error. (Or not—it’s up to the typechecker if it wants to degrade to “int|string” or something.)
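
A toy sketch of that in Haskell (all names invented; a real implementation would run full unification over type terms, while this only tracks a single monomorphic element type):

    import Data.IORef
    import Data.Typeable (Typeable, TypeRep, typeOf)

    -- the element type starts out unconstrained, like a fresh type variable T
    data ElemTy = Unknown | Fixed TypeRep

    appendChecked :: Typeable a => IORef ElemTy -> a -> IO ()
    appendChecked ref x = do
      constraint <- readIORef ref
      case constraint of
        Unknown -> writeIORef ref (Fixed (typeOf x))  -- first append: T = typeOf x
        Fixed t
          | t == typeOf x -> pure ()                  -- unifies with T, fine
          | otherwise     -> ioError (userError       -- e.g. T = Int vs. String
              ("type error: " ++ show (typeOf x) ++ " vs " ++ show t))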


At runtime, the original program (i.e., the syntax tree, or some other representation from which the syntax tree is recoverable) may well not exist anymore. If you don't have a syntax tree, you can't type-check anything.

Now, you may say “okay, the original program doesn't exist anymore, but my language implementation is a term rewriting engine, so I have a current program that I can submit to the type checker”. Alas, most type checkers, Hindley-Milner included, operate on self-contained program fragments (i.e., not containing unbound variables), so you can't just submit any arbitrary program fragment. And, if you didn't want to type-check your original program, how likely is it that you will want to type-check the entire syntactic representation of the current program state?


Who says you need a syntax tree? You could just store a byte of metadata next to each variable with its type. That should be more than enough data to recover the parts you can't store that way and run type inference on them.


Why is that required? There's nothing stopping you from only giving the list a precise type after an element has been added. This mirrors the way statically typed languages with inference handle the same situation (except that they wait for evidence that the list is being used as a list of Ts, rather than for the actual act).


> Why is that required?

Say, because you want to use that reference cell in two different threads.


I think I see what you’re getting at. If I create an empty mutable list and send a reference to it to two different threads A and B, where A adds an integer to the list and B adds a string, then it’s nondeterministic which thread raises a type error.

And degrading the type to a union (int ∨ string) as I mentioned is still nondeterministic even though unions form a lattice—I can recover determinism when writing to the list in both A and B, but supposing A writes first, B writes second, and then A reads expecting only integers, it gets a type error because it hasn’t checked for the string case.

But I’d argue that 1. this is bad form and you want a type error somewhere and 2. you have this problem already with type-changing mutable references in a dynamically typed multithreaded language. (Say just “int” and “string” instead of “list of int” and “list of string”.)


I need a container type that can be empty, because I need the monomorphic type of the cell to be undecided until the reference has been handed out to the worker threads.


If your program doesn't have a race condition, then you're fine. And if it does, then the list being monomorphic is not going to help (try your scenario with a C++ vector or Java ArrayList).


If your standard for a programming language that supports concurrency is that no data races are possible, then there are very few languages to be had.


This particular discussion wasn't about ruling out data races (although, of course, that is important too). It was about ruling out trying to read a list of strings from a reference cell containing a list of ints.


You're the one who brought up threads as a reason runtime monomorphisation doesn't work. If access to the list is properly synchronized, then what I proposed works just fine (you get your runtime error in whatever thread accesses the list second). If it's not, then unless you're using a concurrent data structure you're already screwed. The type system will at most change exactly how you get screwed.



