I'm still not quite sure what GADTs are (after skimming the WP article, reading this article, and skimming the code in it), but, very surprisingly to me, it looks like OCaml's GADT implementation tackles a lot of the memory-fragmentation problems that come with object-graph-heap memory-model languages (Lisp, Python, Java, JavaScript, and previously OCaml), in exchange for a little more type-declaration overhead. (Although, in my small experience, debugging OCaml type errors usually involves adding some redundant type declarations anyway.)
This is really interesting and I hope I understand it someday soon!
Generalized Algebraic Data Types, GADTs, are algebraic data types which include both (a) existential types and (b) type equalities. The combination allows for a certain kind of interaction between the specific kind of value within a type and the type itself.
For instance, we might have a little expression language using normal ADTs [0]
type exp =
| Int of int                (* integer constants *)
| Bool of bool              (* bool constants *)
| Add of exp * exp          (* addition *)
| Neg of exp                (* negation *)
| Ifte of exp * exp * exp   (* if then else *)
| Lte of exp * exp          (* less than or equal to *)
but there's no particular guarantee that a value of `exp` is well-typed. For instance, the following is a valid value of `exp`
let x : exp = Add (Int 3, Bool true)
GADTs allow us to fix this situation by reflecting the type of the expression value into the type of the expression. Here's some syntax
type _ exp =
| Int : int -> int exp
| Bool : bool -> bool exp
| Add : int exp * int exp -> int exp
| Neg : int exp -> int exp
| Ifte : bool exp * 'a exp * 'a exp -> 'a exp
| Lte : int exp * int exp -> bool exp
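With the indexed type, the ill-typed value from earlier is now rejected at compile time. A quick sketch (the exact error wording varies by compiler version):

```ocaml
(* This no longer typechecks: Add demands two int exps,
   but Bool true has type bool exp. *)
let x = Add (Int 3, Bool true)
(* Error: This expression has type bool exp
   but an expression was expected of type int exp *)
```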
The way to read this syntax is to see it as giving the "type" of each constructor. Not only do we now parameterize the `exp` type with an index type corresponding to the type of the value this expression denotes, we also create very non-trivial type equations like
Ifte : bool exp -> 'a exp -> 'a exp -> 'a exp
and
Lte : int exp -> int exp -> bool exp
The upshot is that while the type of an interpreter for the first exp would have to look like
type value = VInt of int | VBool of bool
val interpret : exp -> value option
the type of the second `exp` is so constrained as to eliminate (a) the need for the separate `value` type and (b) the possibility of failure:
val interpret : 'a exp -> 'a
let rec interpret : type a. a exp -> a = function
| Int i -> i
| Bool b -> b
| Add (i1, i2) -> interpret i1 + interpret i2
| Neg i -> - (interpret i)
| Ifte (c, t, e) -> if interpret c then interpret t else interpret e
| Lte (i1, i2) -> interpret i1 <= interpret i2
Such a construction is known to be type-error free because GADTs allow us to ensure that type constructors follow certain type equations and it's possible to write such a clean evaluator because GADTs keep around enough type information so that the compiler can "trust" them.
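As a small usage sketch (assuming the definitions above, with `Neg` taking a single `int exp` and the interpreter written with a locally abstract type, `let rec interpret : type a. a exp -> a = ...`):

```ocaml
(* Lte (Int 1, Int 2) evaluates to true, so we take the Add branch. *)
let result : int =
  interpret (Ifte (Lte (Int 1, Int 2), Add (Int 3, Int 4), Neg (Int 5)))
(* result = 7, with no value wrapper and no option to unwrap *)
```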
Finally, the short answer of how they work is something like they interpret a constructor like
Ifte : bool exp * 'a exp * 'a exp -> 'a exp
as one with some existential types
Ifte : 'x exp * 'a exp * 'a exp -> 'a exp
and some type equalities
Ifte : 'x == bool | ...
then when typechecking it just has to collect all the unique new type variables and solve the system of equalities which arises.
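The type equalities themselves can be reified as a GADT, which makes a nice small illustration of the mechanism; a minimal sketch in OCaml:

```ocaml
(* A value of type ('a, 'b) eq can only be built when 'a and 'b
   are the same type, so Refl is a runtime witness of the equation. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Matching on Refl brings the equation a = b into scope for the
   branch, which is exactly the "solving" the typechecker performs. *)
let cast : type a b. (a, b) eq -> a -> b = fun Refl x -> x
```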
[0] Yeah, the author pooh-poohs examples like this, but I think they're really particularly clear.
I've come to think that these toy interpreter examples are a bit questionable unless they contain a variadic structure like an arbitrary tuple - they give the impression that programming with GADTs is simpler than it really is.
Problem with toy AST examples is they're great if you're doing something like that, but don't give much motivation if you're not. (Unless you start seeing all programming as AST manipulation, which can be an interesting viewpoint.)
More important than using GADTs for one thing or another is to start looking at the type checker as something you use to help you write the kind of code you want to write, and avoid the kind of code you want to avoid. GADTs are one among many useful tools in the modern toolbox for that.
That’s a really fascinating post! Thank you! I’m still not sure I understand GADTs (in part, here, because I know very little Haskell indeed) but this helps.
Do GADTs allow Turing-complete computation at compile time, like C++ templates? Are there limits to what kind of correctness properties you can use them to get compile-time machine-checked proofs of?
GADTs by themselves do not allow arbitrary computation[1]. But Haskell also allows you to write functions at the type-level ("type families" in the Haskell jargon), so you can directly write functions to compute types without any trickery, and the Propeller example uses this feature.
GADTs let you write down basically the same things that you could write as an inductive datatype in a dependent language like Coq or Agda. So (with some patience) you can make a function return a "proof" of anything that can be witnessed by a first-order inductive type, e.g. equations between expressions, list membership, proofs that a term in an object language is well-typed, etc. The main limitation is that Haskell doesn't have any termination checking, so if you consider it as a dependently typed language there is no way to write a trustworthy proof that uses induction. Also it's not very user-friendly; see [2] for some examples.
[1] Basically, GADTs are equivalent to having equations between types. If all the equations are written out in full, it is reasonably straightforward to check whether two types are provably equal. If you want to do type _inference_, i.e. figure out what equality assumptions a given function should make in order to become well-typed, then you run into various undecidable problems (e.g. "rigid E-unification"). My guess would be that if you wrote down a completely general type system featuring GADTs, then you would be able to encode arbitrary computations as type-inference problems, but the encoding would be very clumsy. The Haskell and OCaml implementations deliberately don't try to be very general in the type inference, and instead try to pick a decidable subset, so the programmer has to add enough type annotations to guide it.
data Rec (f :: k -> *) (ts :: [k]) where
RNil :: Rec f '[]
(:&) :: f t -> Rec f ts -> Rec f (t ': ts)
infixr 9 :&
data ElField f where
Field :: KnownSymbol s => t -> ElField '(s, t)
x :: Rec ElField '[ '("foo", Int), '("bar", String) ]
x = Field 3 :& Field "bar" :& RNil
It's an embedding of (variadic) typed records.
Anyway, I don't actually use GADT-driven embedded ASTs myself. They're just a pretty clear example.
[0] This is Jon Sterling's `vinyl` package in Haskell
I've been experimenting with a way to combine GADTs with polymorphic variants, for example to give invariants that all fields of a record use the same polymorphic variant tag (even if the type parameters are different).
For example to have the validation status as a parameter of a record, and have functions that only accept validated records (you could achieve something similar and probably simpler with functors):
type validation = [`Valid | `Invalid]
type valid = [`Valid]
module V : sig
type ('a, 'b) t
val of_string : validate:(string -> 'a) -> string -> ('a, [> validation]) t
val valid_or_exn : ('a, validation) t -> ('a, [> valid]) t
val valid : ('a, valid) t -> 'a
end = struct
type (_,_) typ =
| Invalid: exn -> ('a, [> `Invalid] as 'b) typ
| Valid: 'a -> ('a, [> `Valid] as 'b) typ
type ('a, 'b) t = { raw: string; value: ('a, 'b) typ }
let of_string ~validate raw = { raw; value = try Valid (validate raw) with e -> Invalid e }
let valid_or_exn s = match s.value with
| Valid _ as v -> { s with value = v }
| Invalid e -> raise e
let valid ({ value = Valid v; raw = _ } : ('a, valid) t) = v
end
type 'b t = {
x: (int, 'b) V.t;
y: (string, 'b) V.t;
}
let (v:valid t) =
let raw = {
x = V.of_string ~validate:int_of_string "4";
y = V.of_string ~validate:String.escaped "test\n"
} in
{ x = V.valid_or_exn raw.x; y = V.valid_or_exn raw.y }
Looks like the compiler knows that the 'valid' function can't take 'Invalid' types given the proper type annotations on the polymorphic variant (without type annotations it gives a warning that the match is not exhaustive).
Thank you for explaining! While I think I understand how your example works and how it's useful, I think I need to read more to understand what “existential types” are, since I’m pretty sure you’re not talking about dudes like Camus. And I’m not totally sure I understand them well enough to write a type checker for them. Which I think means I still don’t really understand GADTs.
Universal types occur in polymorphic types. Like length
val length : 'a list -> int
It works for all types `'a`. Really, this is better denoted by an explicit binder
val length : forall a . a list -> int
and this is what the preceding quote implies: this variable is "bound" by a universal binder. What it means is that the user of `length` gets to make a choice about what the actual value of `a` is.
"Existential types" is short for "existentially bound type variables". For this we might write something like
val x : exists a . a list
which indicates that we have a list containing some values of type `a`... for some, unknown type `a`! In other words, the following are both, simultaneously, valid values for `x`
let x : exists a . a list = [1; 2; 3]
let x : exists a . a list = ["foo"; "bar"; "baz"]
The type system has "forgotten" what the type `a` is.
In GADT syntax we write these like
type ex = Ex : 'a -> ex
Because the type `'a` does not get exposed in the final `ex` type, it must be "forgotten" and therefore existentially bound. To make this clearer, we can determine the type of the following function:
let build a = Ex a
which is
val build : 'a -> ex
and should really drive home the idea of just "forgetting" a type.
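A bare `Ex : 'a -> ex` is nearly useless, since nothing is known about the forgotten type. A common pattern (my sketch, not from the article) is to pack the value together with an operation on it:

```ocaml
(* The type 'a is existentially hidden, but the packed function
   still knows how to consume the packed value. *)
type showable = Show : 'a * ('a -> string) -> showable

let show_one (Show (v, f) : showable) : string = f v

let strings = List.map show_one
    [ Show (42, string_of_int); Show ("hi", fun s -> s) ]
(* strings = ["42"; "hi"] *)
```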
At runtime, GADTs are actually represented the same way as plain ADTs (that is, a tagged union). The tighter types make it possible to remove some unused branches in pattern matching, so you may remove a few tests, but memory-wise it should be the same.
In my experience, a memory win takes place because you can use something lighter weight (ADTs) to model your data as opposed to something that consumes additional memory (objects or a record of bound functions) simply to get the type system to accept your code.
Furthermore, when using ADTs to store your data (and constructing modules that operate on them), I believe the resolution of the functions that operate on them is done entirely at compile time, not run time. Whether or not that is being capitalized on is a different story, but I believe it has benefits now and will have even more as the compiler authors continue to add optimizations.
I can't tell if you are serious or not. OP goes through them by example.
Original type:
type 'a t = | Array of 'a array
| Bytes of bytes
GADT'd (same functionality as before):
type 'a t = | Array : 'a array -> 'a t
| Bytes : bytes -> 'a t
Now using power of GADT to add constraint:
type 'a t = | Array : 'a array -> 'a t
| Bytes : bytes -> char t
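The payoff is that matching on `Bytes` refines the type parameter to `char`, so one generic accessor can use the compact `bytes` representation for the char case. A sketch assuming the constrained definition above (`get` is my name, not from the article):

```ocaml
(* The locally abstract type (type a) lets each branch refine it:
   in the Bytes branch the compiler learns that a = char. *)
let get : type a. a t -> int -> a = fun t i ->
  match t with
  | Array a -> a.(i)
  | Bytes b -> Bytes.get b i  (* Bytes.get returns char, and a = char here *)
```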
Just jumping into it seems like a fine choice to make. Allows for a (narrow) introduction to GADTs without getting bogged down in their full complexity/limitations. Shows that it's useful before/without a theoretical discussion.
Still, I wouldn't have minded a little discussion introducing GADTs.
The specific performance aspect is, essentially, immaterial. The idea, in over-simple terms: say you have Foo<T>, but for some particular T=X there is a faster, specialized way to implement Foo<X>. With GADTs you can express this in OCaml's type system, still write generic type-safe code for Foo<T>, and the compiler does the right thing when it happens that T=X.(+)
So you get to have your cake and eat it too.
(+) without brittle, verbose, horrific re-factoring required to do similar things in simpler type systems.
Yes, having a faster specialized way to implement Foo<X> is great. But the reason it's great is "the specific performance aspect". If the cost of using the faster specialized way is, say, massive interpretive overhead, then it may not actually be faster, and then there's no reason to use it.
GADTs actually seem to go in the other direction, though — they seem to enable the compiler to prove more at compile-time and thus generate code with less run-time tag checking.
But some hard performance numbers would go a long way toward proving that.
Especially since GADTs don't do too well with type inference. My take on "premature optimization is the root of all evil" is that I will take any "cheap" premature optimization which doesn't impact readability, but I'd like to be sure there is an actual performance improvement first.