Hacker News new | past | comments | ask | show | jobs | submit login
Exotic Programming Ideas: Module Systems (stephendiehl.com)
157 points by rwosync on Nov 13, 2020 | hide | past | favorite | 56 comments



Coming in without much OCaml experience, I don't really think this is a great demonstration of why this construct has value.

I don't really want to read a long form description of the OCaml implementation of modules. I want a comparison to the languages he dismissed at the beginning of the article, and a discussion of why this feature has some value that isn't provided by those languages.

Basically - This feels like a different take on generics to me. There might be a lot of value in how this is implemented when compared to generics in a language like Java/C#/Typescript, but I didn't find that content anywhere in the article...


Say you'd like to have an interface for things that are 'mappable'. For example, for arrays we could write:

    interface Mappable<Array> {
      map<A, B>(f: (a: A) => B, fa: Array<A>): Array<B>
    }
Likewise, for `Promise`s we could write:

    interface Mappable<Promise> {
      map<A, B>(f: (a: A) => B, fa: Promise<A>): Promise<B>
    }
But in order to generalize this interface to an arbitrary type constructor such that `F: * -> *`, we would need to write

    interface Mappable<F> {
      map<A, B>(f: (a: A) => B, fa: ?): ?
    }
which is not possible in TypeScript since it does not support higher-kinded types or type parameters that take type parameters or parametrized modules.


Not possible, but an approximation:

  interface Mappable<P extends Mappable<P, unknown>, T> {
    flatMap<U>(f: (x: T) => Mappable<P, U>): Mappable<P, U>;
  }

  class Maybe<T> implements Mappable<Maybe<unknown>, T> {
    x: T | undefined;

    public flatMap<U>(f: (x: T) => Maybe<U>): Maybe<U> {
        if (this.x) {
            return f(this.x);
        }
        return Maybe.nothing();
    }
  }


Yes, in fact this reminds me of the HKT implementation[1] found in fp-ts[2][3]

    interface HKT<F, A> {
      _URI: F
      _A: A
    }
    
    interface Mappable<F> {
      map<A, B>(f: (a: A) => B, fa: HKT<F, A>): HKT<F, B>
    }
where F is a unique identifier representing the type constructor and A its type parameter.

[1]: https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-ki... [2]: https://github.com/gcanti/fp-ts [3]: https://gist.github.com/gcanti/2b455c5008c2e1674ab3e8d5790cd...


gcanti’s work in Flow and Typescript is amazing and I use it daily :)


Seems like something covered by typeclasses in Haskell, right?


Seems like it, but typeclasses are inherently anti-modular, they provide a globally coherent unique instance. consider the Ord typeclass giving a single ordering for a specific type. To reverse the order you need to create a new type with a new instance of ord which reverses it.

Where in a module system its perfectly fine to have multiple instances of Ord for a given type. One gives global consistency where the other gives local consistency.


When you put it like that, modules sound more powerful than typeclasses. Is there any situation where typeclasses are superior to modules?


I think the extra power comes with cognitive overhead, I.e. operators are now relative to some module, and you're stuck with that overhead whether or not you actually use the power or not.

Trying to stay neutral, I think its no suprise that some people prefer typeclasses...


In addition to @ratmice's comment, check out this post[1] on Existential Type. I've dabbled in Haskell myself with not much experience in ML, so I found it interesting to see how ML modules differ from Haskell typeclasses. Though they seem to be equi-expressive for the most part.

[1]: https://existentialtype.wordpress.com/2011/04/16/modules-mat...


It's absolutely a different approach to generics. Or, rather, that's the ringer. I want to say first: OCaml's take on modules is just a really nice way of doing namespacing as well.

Secondly, generics depend upon (a) having a means to discuss functionality which abstracts over one or more types and certain behaviors those types must support, (b) having a means to bundle up one or more types along with some behaviors, and (c) being able to combine those two.

In Typescript/Java/C# this is mostly carried out by classes and subtyping. Abstraction occurs when we ask not for a specific type but instead for something a little less than that specific type, one of its supertypes; bundling occurs in classes; and the combination occurs naturally as subtypes are transparently upcast to their supertypes.

There are two practical drawbacks to this approach:

First, it's hard to abstract over behavior that doesn't merely consume your abstract type but also returns it. When we do (c) via subclassing we have to upcast and it's not always clear or possible to re-downcast things back to the appropriate type. OO has tons of workarounds for this issue and related ones.

Second, it's hard to abstract over multiple interrelated types at once. For instance, a generic graph implementation might want to be abstract both in the types of nodes and the type of edges. The generic implementation can thus handle annotations at either the edges or the nodes. In OO abstraction, you might do something like have the edges be an associated type of the nodes, but this creates an unnecessary asymmetry.

The solution is a classic one. Instead of having the class represent an object, have the class represent a bundle of operations which act on abstract objects (the C++ vtable approach). For example, in pseudocode

    class GRAPH

      type Graph
      type Node
      type Edge

      # These are hard to do with subclassing since Graph will often be upcast on return
      def emptyGraph(): Graph
      def simplify(g: Graph): Graph

      # These represent non-trivial interactions between multiple types abstracted simultaneously
      def addNode(g: Graph, n: Node): Graph
      def neighbors(g: Graph, n: Node): List<Node>
And this, with the appropriate type discipline, is what OCaml does. Unfortunately, what you'll find is that OCaml's type discipline is critical and difficult to emulate. Making this sort of modularity work consistently involves some notions of equivalences and transparency that are natural to discuss when talking about modules but rarely show up in OO systems.


All the languages you mention have parameterized types, so I don't see why anyone would be tempted to use subtyping rather than generics. The only reason I could see is wanting to parameterize at runtime, but it's not immediately obvious to me that graphs with runtime parameterized edge and nodes are something you'd want on a regular basis. Am I missing some subtlety?


Parameterized types can help here a lot. I didn't want to speak to them too quickly so I blurred a few lines, but it's a good point.

Parametric types help with part (a) by allowing you to specify only part of the structure of your type. That can help enormously, though they also force some amount of concretion in your type which isn't always good. Ultimately, OCaml's module system is pointed in the direction of ad hoc polymorphism where you pass in behavior with your abstracted types.

Subtyping supports this passing of behavior as it lets you specify a whole space of types abstractly. In that way, it's a little more supportive of the pathway to ad hoc polymorphism.


I love this example. I have a feeling that I'll be borrowing it often in the future.


Zig's compile time execution lets you do similar things I believe.

In Zig, structs and modules are equivalent, and type declarations can be manipulated at compile time just like any other value. That, among other things[1], lets you write:

  fn LinkedList(comptime T: type) type {
      return struct {
          pub const Node = struct {
              prev: ?*Node,
              next: ?*Node,
              data: T,
          };
  
          first: ?*Node,
          last:  ?*Node,
          len:   usize,
      };
  }

I wonder if there's anything that OCaml functors can do but this can't.

[1] for example, you can implement a very efficient printf that gives an error at compile time when the format string is invalid. See https://ziglang.org/documentation/master/#comptime for more details.


One thing is that OCaml modules are first class objects, so can be constructed at runtime and passed around as variables. But this Zig example, I think, will cover the most common case of functors for creating generics.

(I'm also not sure if Zig does compile-time check if the type T is compatible with LinkedList. OCaml does this type of check, making sure the signature of T fits the requirement of the functor LinkedList. The module T may have certain functions, for example a compare function associated with T to ensure sorting of the elements in your data structure.)


(Not even close to a zig expert, but I've been playing with it a little bit, find it very interesting, and think what I'm about to say is correct:)

The zig equivalent of a module, a struct, is also a first-class object in zig, so should be the same as OCaml there.

However, Zig does not have anything like a signature -- in that sense, it's "dynamically typed" at compile time (i.e., like type args to C++'s templates, however not nearly as bad because the errors you get are immediate and understandable -- think of it as more like "I fucked up the types and got a python-style error" than I "I fucked up the types and got a C++ templates error").


One limitation is privacy and abstraction. You can hide implementation details with most ML module systems - eg. hiding the underlying type of `Node`. You can also make local definitions in the module private. It's pretty challenging to get this stuff right - there's lots of research about it.

Also, as noted in other comments, Zig's type parameters are also dynamically typed. This leads to implementation details leaking out. This is not an issue in OCaml and other ML-style languages.



Amusingly, I was precisely working today on extra-nice error messages for module type errors in OCaml. :)

My reaction to the title was "But they are not exotic, I use them every day!"

It's definitely the feature I miss the most every time I work in other languages, even presumable "advanced" ones, like Haskell. One notable attempt to add them elsewhere is "modular C"[1].

[1]: http://cmod.gforge.inria.fr/


Have you heard of Backpack for Haskell? https://gitlab.haskell.org/ghc/ghc/-/wikis/backpack


Great read. It's too bad modules are pretty much an afterthought in most langs.

I really like research language 1ML's approach to modules[0]. This allows monomorphic types to be treated like values, avoiding all of OCaml's module syntax (which can be a bit complex and verbose).

https://people.mpi-sws.org/~rossberg/1ml/1ml-extended.pdf


Is there a 'readable' code sample of how that would work?


Not much I don't think. This has a few examples:

https://github.com/rossberg/1ml

There isn't special module syntax. Modules are more-or-less just structs, and the structs can contain types.



Other than the syntax, I’m not sure I understand how this is different from any other object-oriented class definition. I suppose the ability to project the module into the local or top-level scope, but that seems more like syntactic sugar than anything meaningful. What am I missing here?


For me, the value of a module is that you can describe a set of multiple types and the functions that operate on these types, all in one place. In OO, there is essentially one type that is "special", the type of the class you define.

The power of the Ocaml module system is really in the functor, which the article only touches upon. You define a module and refer in its definition to types/functions of other modules that must be provided at the time you construct the module. Even better, you can define relations between the types and do type substitutions, e.g.: to construct this module, you need to give me 2 other modules, each with a specific set of functions, and for which type t1 of the first module, matches type t2 of the second module, while the actual type of t1/t2 does not matter.

See https://dev.realworldocaml.org/functors.html for more examples.


I'm trying to wrap my head around this

> Even better, you can define relations between the types and do type substitutions, e.g.: to construct this module, you need to give me 2 other modules, each with a specific set of functions, and for which type t1 of the first module, matches type t2 of the second module, while the actual type of t1/t2 does not matter.

Isn't this essentially the same as generic type arguments in other languages? Like in this pseudo TypeScript:

    class CustomModule<T1 extends Module1Interface, T2 extends Module2Interface> {
      constructor(t1: T1, t2: T2) { ... }
    }


It's not exactly the same thing because, for example, there is no subtyping or inheritance. In Ocaml you don't say that "type T1 is an Orderable".

However, it does serve a similar purpose. For example, if you want to create a datatype for an OrderedList then you'd create a higher-order-module (functor) that receives as an argument another module containing all the necessary comparison functions for the list elements. For example, if you apply the OrderedList functor to the IntOrdering module the result would be an OrderedListOfInt module that provides an abstract data type implementing an ordered list of integers.

In the Ocaml standard library the names would be different but that's the basic idea.


This makes it seem like modules are strictly inferior, if you can't assign names to common requirements.


I don't see how you come to that conclusion. Assigning a name to a requirement (~ an interface) is the most simple usage of the module. E.g. to define something like IComparable, you could write:

  module type Comparable = sig
    type t
    val compare : t -> t -> int
  end


You can still give a name to the interface. However, interfaces apply to modules, not to types. In Ocaml you'd say "this module implements the Comparable interface" while in an OO languague you'd say "this type is a subtype of Comparable". Sorry for the confusion.


Ah, thanks for the clarification.


In your example, there is no relation between anything in Module1Interface and Module2Interface.

Probably closer would be (not sure if this is possible in Typescript):

    class CustomModule<S, T1 extends Module1Interface<S>, T2 extends Module2Interface<S> > {
      constructor(t1: T1, t2: T2) { ... }
    }
Meaning that, for example, within Module1Inteface, there is some function f1 that returns an S, and within Module2Interface, there is some function f2 that takes an S as argument.

This does become a bit tedious notation-wise, if possible at all. In Ocaml, this would look like:

  module CustomModule(M1: Module1)(M2: Module2 with type s = M1.s)


Yes and no, because a module can be a much more complicated thing than a class. A module allows you to define not just one type but several types and how they interact. In regular OOP you can have a "FooInterface<A, B, C>" where you are defining a type of object, and defining it's behaviors in the context of types A, B, and C. A module defining only one type is pretty much an interface but when you define a module in terms of multiple types it takes on a different shape. While you can probably always replace a module with a series of interfaces (ignoring the part about constructors), those interfaces will be more unwieldy and awkward.

Here's an example. This example is a bit contrived, because I couldn't think of a better example that was simple and yet demonstrated the power of modules. So this example can be re-phrased and re-structured into a more natural OO fit, but try to look past that. Suppose you want to make a module or set of interfaces that describe a classic board game (i.e. Chess or Checkers). So you have a Board. A Board has a series of Pieces, and each Piece has a set of valid moves on the board. And a move can be applied to a Board to modify the state of the game. Again, very much glossing over the details here to get to the meat of it. So you could write a series of interfaces

    interface Board {
        List<Piece> getPieces();
    }
    interface Piece {
        List<Move> getValidMoves(b: Board);
    }
    interface Move {
        void apply(b: Board);
    }
But on it's own that isn't enough, because you don't want to be able to mix-and-match different interfaces for different games, like trying to find the set of valid moves for a chess piece on a checker board. So you need to apply generics.

    interface Board<P> {
        List<P> getPieces();
    }
    interface Piece<B, M> {
        List<M> getValidMoves(b: B);
    }
    interface Move<B> {
        void apply(b: B);
    }
Only that's not enough either, because on it's own these parametric definitions don't enforce that the set of pieces a board returns are actually valid pieces for that game. With constraints you end up with this (in a psuedo-language where you can use a special Self type, I don't know typescript and don't think this can actually be implemented in plain Java)

    interface Board<P extends Piece<Self,Move<Self>>> {
        List<P> getPieces();
    }
    interface Piece<B extends Board<Self,M>, M extends Move<B>> {
        List<M> getValidMoves(b: B);
    }
    interface Move<B extends Board<?>> {
        void apply(b: B);
    }
And so you have a bunch of these weird circular definitions to get these components to play together nicely. Meanwhile, you can define a module for this without using Functors or type substitutions or anything terribly complicated (Note the below is sort of mixing OCaml with more Java-like syntax just because I'm not super familiar with OCaml):

     module type Game = sig
         type Board
         type Piece
         type Move
         val getPieces: Board -> List<Piece>
         val getValidMoves: Piece -> Board -> List<Move>
         val apply: Move -> Board -> unit
    end
And I think that's the really interesting thing you can do with modules that is more awkward with the traditional OOP interfaces. It makes it more natural to talk about multiple different data types all working together.

The only way to implement that as nicely with an OOP interface would be to wrap everything in a top-level object

    interface Game<B, P, M> {
        List<P> getPieces(b: B);
        List<M> getMoves(p: P, b: B);
        void apply(m: M, b: B);
    }
Though now you have to make a singleton Game object and pass that around everywhere you need it, which may or may not be idiomatic or obvious depending on the language and your preferences.


Module signatures may contain types.

So you can write a signature like:

  module type VectorSpace = sig
    module Field : Field
    type t
    val zero : t
    val (+) : t * t -> t
    val (*) : Field.t * t -> t
  end
In OOP it becomes hard to write an interface like this simple example, and more complex examples become harder still.


Can you give an example of the value this would provide that isn't provided by generics?


One thing the example shows is the classic problem with OO's mechanism, namely binary functions. For a given class, something like `+` is impossible to implement elegantly, it has to bias one operand over the other -- after any syntactic sugar, it's always `arg1.op(arg2)`. Module functors resemble true ADTs more, in that they do not dispatch from a live instance of the datatype.


Module parameterization by type is a kind of a big deal. This enables reuse and composability opportunities that can become pretty impressive.

C++ templates can be used in a similar way, but it's not perfect. The closest match I can think of is a template class with static members.

D has one more mechanism that is conceptually very different, but could be used to express somewhat similar patterns: template mixins. These are best described as parameterized template sections of code worh one or more declarations that have to be reasonably well formed on their own and are virtually pasted into the code at the location where they are instantiated. They exist somewhere in the space between templates and C preprocessor macros.


As another answer pointed out, a class in OOP is supposed to implement only one type and operations on it. It works in many cases, but sometimes it does not fit naturally the problem at hand.

Modules relax this single-type requirement, and let you define multiple types in it, which makes certain things more natural to express, without a need to create multiple shallow classes or manager-like classes.

Also, functors (i.e. "module functions") in the OCaml module system allows generics, when a module is essentially parameterized by one or more other modules.


I think Lisp's object system (CLOS) removes this limitation? (Actually a question - I believe this is what multiple dispatch does, but not certain)


I'm not really talking about multiple dispatch, but more like a module that store two or more types, like:

    module Data = struct
      type pair = int * float
      type numbers = int list
      
      let make_pair () = (0, 0.0)
      let make_numbers () = []
    end
This example is not very illustrative, but just explain the idea what I mean. We have two constructors: make_pair and make_numbers. So in a way, we can have multiple types in the module, if they are meant to be tightly related. We are not forced to make three classes here (Pair, Numbers, Data), everything is in one module.

EDIT: OOP classes have a primary type in your class, and sometimes this creates artificial chicken or egg type of problem, where you cannot decide what is more fundamental (message vs receiver vs sender). Modules don't force this on you.


Except where first-class modules are used, the application of a functor to a module happens at at (or before) compile-time. You can have a fully resolved type before runtime, but which can still be abstract at design time. They can be seen as a zero-cost abstraction.

Modules are structurally typed, so the more typical sub-typing behavior expected from mainstream OOP languages is not applicable to them.


Module functors and first-class modules are probably the big feature. Those are more akin to allowing functions between classes, rather than just objects.


A module, by definition, is a singleton (for parametrized modules, a “once per type” object)

C has (unparametrized) modules, calling them “compilation unit”.


On the topic of modules, I recommend reading about modules in Newspeak: https://bracha.org/newspeak-modules.pdf

Modules (which are just top level classes and contain nested classes) have no import statement and no hard linked external dependencies. When you instantiate the module (~class) you pass in dependencies it needs.


> Modules as a language feature were first developed in Modula-2 and Pascal, which were developed as a way to demarcate units of compilation.

Actually Mesa.


Right. Wirth developed Modula after his first PARC sabbatical, where he worked with the Alto and Mesa. Modula supported even nested modules. And Pascal didn't have modules at all.


You can do something similar in Rust to ML modules using traits and generic impls as module functors!

https://play.rust-lang.org/?version=stable&mode=debug&editio...

This can be really useful especially as traits with differing concrete types diverge, you can create a unified interface trait object to allow trait objects for things like container classes.


Looking forward to the series! I hope, in one of his "presents", he talks about Lisp conditions and signals, which have also inspired PL/I conditions, and go really back to the idea of an error handler in an operating system.

Unfortunately, Unix (and C) really botched signals by limiting their number in the user space (in particular, they cannot be stacked), and so the idea largely fell out of favor as an error-handling paradigm.


I worked for Prime (minicomputers) as a youngster in the early 80's. They supported PL/I and many parts of the OS (Primos) were written in a systems dialect of PL/I they called PLP.

The PL/I condition mechanism was an integral part of the OS and fault handling. For example, a floating point exception or integer overflow was detected by hardware, caused a fault, the information from the fault was packaged into a condition frame (an extended stack frame), and a condition was raised. The OS looked backward through the stack looking for an "onunit" (aka condition handler) that handled this condition - basically like try & except.

Condition handlers were very flexible, with the option to partially handle the condition then let it continue with older frames, ignore the condition in this handler and let it continue, or completely handle it.

If no running program had a handler for a condition, the default OS handler would run, which often raised a new condition and printed an error message. If the problem causing the error was corrected (for example, some files were deleted from a full disk), you could use the REN (re-enter) command to return from the disk full condition and the system call causing the condition would be automatically re-tried, sort of like when EINTR occurs on a system call and it has to be (manually) retried.

Instead of using numbers, conditions had arbitrary names and arbitrary data could be associated with a condition.

In hindsight it was very similar to Python's try/except/finally error handling mechanism. There was a condition called "cleanup$", which was executed whenever a procedure was aborted because of a stack unwind to allow it to close files, delete temp files, etc.

I wrote an emulator for the Prime and you can see all this with:

telnet em.prirun.com 8001

There are all kinds of manuals online too, including a full PL/I implementation, all from the 80's and early 90's.


One thing I like about Perl is the module system.

A big helper is when the module interface uses named parameters and has sensible defaults for unspecified parameters. This allows the module designer to add features without breaking existing code and makes it easier for someone to integrate the module in the first place. Having the documentation built into the module itself is also a huge win.


I feel like I keep saying this every single PLT-related HN thread. Sorry for preaching, but I just can't resist. I think Agda is an excellent programming language, and it's a joy to write anything in. Its learning curve is steep at the beginning but once you gain experience how to write coinfinite programs, it's such a joy.

I think its module system is a huge bonus. It makes code a lot easier to organize and reason. It forces toplevel module X to be defined in X.agda so there is no argument what the module should be named or where it should be found.


I'm not your goto java guy, but the module system of java and how modules can provide implementations for interfaces is really great.


Exotic Modules System looks like new C++


Nice (altered) Anna Karenina quote.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: