Memories: Edinburgh ML to Standard ML

azdavis · on Oct 5, 2022

I’ve been working on a language server for Standard ML called Millet: https://azdavis.net/posts/millet/

There was some past discussion about it on HN: https://news.ycombinator.com/item?id=32512715

For my part, I think it’s unfortunate that virtually no mainstream language has a formal semantics in the style of SML’s Definition. I wrote a bit about formal programming language semantics in a series of posts: https://azdavis.net/posts/define-pl-01/

zasdffaa · on Oct 5, 2022

I've had a quick look over your posts, and I will look over the more carefully as a refresher on these kinds of things (I have looked at Hoare logic which this resembles, and a couple of other related things). The problem is to me, the formal semantics aren't actually very different from what you would write in English, and the real problem, and this is never addressed, how do I use this (which includes: how do I manipulate these things symbolically to some useful end)? The reason I don't get more deeply stuck into these formal semantics is because they seem to have no way for me to actually do anything useful with them. This lack of a 'bridge' is a huge problem to me because I can't justify learning that which appears of no use.

Any comments appreciated, and don't be put off by the negative tone of this, it's just frustration.

azdavis · on Oct 5, 2022

The biggest reason why you'd want a formal semantics for your PL, as opposed to just a "human" language specification, is so that you can do proofs about the formal semantics. You can prove your PL, as defined, satisfies certain "safety" properties. And the point of doing these proofs is is to check that the definition of the PL "makes sense".

PLs that don't have a formal semantics (aka, most mainstream PLs) are more likely to be in situations where implementors realize, after the standard has been written, that e.g. some of the PL's features interact in a way that doesn't make sense, such as this example with C++ coroutines and lambdas: https://news.ycombinator.com/item?id=33084431

Two of the most common safety properties of interest are progress and preservation. (I touched on this in the last above linked post near the bottom.) At a high level:

1. Progress states that if you have a program that type-checks, then either that program is "done" or it can continue evaluating.

2. Preservation states that if you have a program that type-checks and that can continue evaluating, then as it continues to evaluate, it continues to type-check.

Note that the conclusion of preservation "feeds back" into progress: the program type-checks. And vice versa: progress may state as its conclusion that the program can continue evaluating, which then lets you apply preservation. This means you can keep applying the progress and preservation theorems in a "loop" until the program is done evaluating.

For each of the 4 posts in my series about formal semantics, I duly translated the rules presented in the blog post into Lean code, and then proved that the rules do satisfy the safety properties. For example, for the first post linked above:

- The syntax of the language: https://github.com/azdavis/hatsugen/blob/part-01/src/syntax....

- The static semantics (the "typechecker"): https://github.com/azdavis/hatsugen/blob/part-01/src/statics...

- The dynamic semantics (the "runtime"): https://github.com/azdavis/hatsugen/blob/part-01/src/dynamic...

- The proofs of safety: https://github.com/azdavis/hatsugen/blob/part-01/src/safety....

zasdffaa · on Oct 6, 2022

I have a background in formal methods and PLs, and never used it. I understand the supposed benefits but IME they never got used.

Clearly you have used it, so I'll take time looking over your stuff (though I don't know lean).

I appreciate you taking the time to write a comprehensive answer, as I said, I'll try to do it justice!

eatonphil · on Oct 5, 2022

The section on team size and funding w.r.t. Poly/ML, SML/NJ and OCaml is something.

Regarding Poly/ML (I'm quoting):

  * good performance and fantastic debugging tools
  * not merely “saving an image”, but sharable executable units, either including the Read-Eval-Print loop or standalone;
  * support for one hardware architecture after another (including Apple Silicon)
  * support for multi-threading (also here); they say that OCaml is finally catching up, 15 years later, thanks to having 100 times as much funding.

And:

> A mystery I shall never understand is the difference in performance between SML/NJ and Poly/ML. The former enjoyed vastly greater funding and had a strong team of developers compared with DCJM’s one-man show. Benchmarks I ran for my ML book consistently gave SML/NJ a big performance advantage. But with Isabelle, the strong performer was Poly/ML. Once David got parallelism working, Poly/ML’s advantage was so overwhelming that we dropped our long-standing policy of supporting both compilers.

frou_dh · on Oct 5, 2022

As much of a gem as Poly/ML might be, I just use MLton* exclusively, because its .mlb aka MLBasis build-system seems by far the most sane way to actually make use of multiple source files when writing SML code.

*another gem

cannam · on Oct 5, 2022

I also use .mlb files exclusively and agree with your general point - but I do make use of Poly/ML and occasionally SML/NJ against the same sources, using scripts to drive them from .mlb files: https://github.com/cannam/sml-buildscripts

These scripts are really limited. They only support the simplest .mlb files that are really just lists of other files, and the semantics are technically incorrect because they have no way to reset the environment when beginning each new .mlb file in, as the spec says they should. I use them scrappily during development and then generally use MLton for production builds. But you can get quite a lot done within those limitations, and it's very nice to have the faster compilation of Poly/ML, as well as the option of seeing error messages from more than one compiler.

ratmice · on Oct 5, 2022

I have always done the opposite using SML/NJ's CompilationManager, then deriving mlb with cm2mlb -D MLton (in the mlton repo), with mlton specific code then wrapped in #if (defined(MLton)).

That you can use CM as a library has always been really handy, but otherwise the same, using MLton for production builds.

frou_dh · on Oct 5, 2022

And you offer a nice selection of SML libraries on your GitHub https://github.com/cannam?tab=repositories&type=source&langu... - Thanks for that!

shwestrick · on Oct 5, 2022

MLton also has perhaps the most impressive performance of any existing functional language implementation. It generates code that easily competes with hand-optimized low-level C/C++/whatever.

cannam · on Oct 5, 2022

MLton can't do SIMD auto-vectorisation, which unfortunately gives it a ceiling significantly below C++ compilers for the sort of thing I do. I generally use FFI calls for chunky, frequently-used vector stuff. It's a good compiler generally.

hajile · on Oct 6, 2022

http://mlton.org/LLVMCodegen

MLton can do it's whole program optimization then output to LLVM which I believe should then be able to do auto-vectorization.

cannam · on Oct 6, 2022

It can also output to C, similarly. But vectorisable code isn't recognisable as such in the output. I'm pretty sure the compiler that handles MLton's output never sees anything to vectorise.

hawk_ · on Oct 6, 2022

What's the FFI overhead like? And can you pass SML function pointers into C code as callbacks?

cannam · on Oct 6, 2022

> What's the FFI overhead like?

I don't have measurements to hand, but vaguely speaking, low. Low enough to be worth using for even quite modest calls into vector functions or optimised libraries.

> And can you pass SML function pointers into C code as callbacks?

Yes. At least, you can declare a function statically for export as a callback. You can't pass arbitrary function values.

You can call C/C++ from SML, and call SML from C/C++ from SML - though some neat potential uses are ruled out as MLton (unlike Poly/ML) doesn't support native threads and you can't call back from a different thread from the caller.

You can also compile to a library using MLton and use that from a C/C++ program - again only from a single thread, but it doesn't have to be the program's main thread.

yawaramin · on Oct 6, 2022

> support for multi-threading (also here); they say that OCaml is finally catching up, 15 years later, thanks to having 100 times as much funding.

Yeah, because OCaml has millions of lines of production code in many deployments (including Xen i.e. Amazon EC2) so it can't just break backward-compatibility to introduce multicore, nor can it afford to slow down existing code. These are constraints that the various MLs don't have :-)

isomorph · on Oct 5, 2022

The author is Larry Paulson - my (and I'm sure many others' on HN) Foundations of Computer Science lecturer at Cambridge back in 2009. I wonder if I still have a little credit at the front of the lecture notes (for spotting some typo or other). It's foreboding to see him writing up his "memories" like this. Being woken up into the world of functional programming (via Moscow ML) - after learning to code as a kid in JavaScript and Visual Basic 6 - was priceless. And in a way it led to my fascination with Scala, which has made the transition to Swift much easier.

msoloviev · on Oct 6, 2022

Hah, +1 for HN users who attended that course in 2009 here. (I usually post on a pseudonymous account though.) I surely must know you, though I couldn't tell who you are from very briefly stalking your comments!

While we're reminiscing, I imagine you also took Concepts in Programming Languages in second year (https://www.cl.cam.ac.uk/teaching/1011/ConceptsPL/ ?), which (after a tour de force through all sorts of paradigms that were consigned to the fossil record for the time being) did advertise Scala pretty heavily. I found that course to be a true gem and haven't seen anything like it at any of the other universities I've passed through since then.

hanoz · on Oct 5, 2022

Still have my copy of ML for the Working Programmer, my one surviving book from back in the day (early 90s in my case).

Being hit with all that stuff from day one, fresh from a childhood of Spectrum Basic and Turbo Pascal, was a real shock to the system.

Still don't understand most of it to be honest, but it does still jar when I see the initialism used to mean Machine Learning.

MexicanJoe · on Oct 6, 2022

Ha ha.. I don't have it anymore, but I also went through ML for the Working Programmer. At my university, some web apps were written in ML, but then our principal was Mads Tofte - He is a good guy, learned a lot from him.

deltasepsilon · on Oct 5, 2022

> It’s sad that well into the 21st Century, Computer Science has so regressed that people no longer see the point of distinguishing between a programming language and its implementation.

Amen. I guarantee that Rust will regret not addressing this sooner. It's mature enough now that defining a semantics is more than appropriate.

devmunchies · on Oct 5, 2022

I REALLY want to use a native ML language[1] but they all seem to be catered towards academics.

If you want success within companies and not universities, then you need to treat the language as a product, serve the customer by investing time into documentation, guides, and branding just like with any product. I say this as a fan of ML languages, posts that complain about another language (OCaml) being more popular even though it is an inferior doesn't get new users.

[1]: I currently use F# daily but would be happy to switch to a variant of SML if it made sense.

yawaramin · on Oct 6, 2022

Effectively, the native ML language that is the actual modern-day industrial-strength language fit for today, is OCaml.

bmitc · on Oct 6, 2022

More so than F#? I’m not sure.

yawaramin · on Oct 6, 2022

Yes, absolutely more so, F# discards one of the most critical parts of ML–functors.

bmitc · on Oct 6, 2022

That’s just one component. Yes, F# does not have SML’s functor/module system, but that doesn’t negate F# being an industrial strength ML.

You can do a lot with F#’s generics and interfaces.

yawaramin · on Oct 8, 2022

Can you do a lot with it, sure. Is it an ML language without functors? No.

mhd · on Oct 5, 2022

I understand the need for good documentation and tooling, but "languages as a product" are one of the more horrible aspects of the current dev environment. First of all, this mostly means that compiler and language are the same product. Second, it leads to the release early/often dev cycle which seems ill-suited to a language and its core library (remember core libraries?).

And language "branding"? What are we talking about here? TIOBE-SEO? Cute logos? Cute in-group names ("pythonist", "rustacean", "gopher" ...)

By the way, wasn't there a "ML for the Working Programmer" aeons ago?

cannam · on Oct 5, 2022

> By the way, wasn't there a "ML for the Working Programmer" aeons ago?

There was. It was written by the author of this very submission and you can download it free from his site (follow the link trail from author's name at the bottom of the article). Great book, a real favourite, though I'm not sure it totally captures the anxiety-driven copy-paste frenzy of the true working programmer.

There is one SML compiler that has an extremely cute logo - SML# (https://smlsharp.github.io/en/)

devmunchies · on Oct 6, 2022

> And language "branding"? What are we talking about here?

“Branding Power” would be where given the choice between two products, the customer will choose the more expensive (or equally priced) product based solely on perception of brand quality or other brand characteristics. Think buying store brand vs buying name brand, an expensive name brand electronic item vs a cheap knock-off from a no-name Chinese manufacturer (the Chinese one might actually be better but that’s the point).

So not just a logo or cute drawings. It’s part of a more holistic experience, but especially the perception which is where logos and the like come into play. We engineers can stomp our feet but those are the rules of the game.

c-cube · on Oct 6, 2022

I think calling OCaml "inferior" is misleading in 2022. OCaml kept evolving (easier when not bound by a standard) and grew many features way beyond what SML offers. Things like more expressive pattern matching, GADTs, first class modules, etc. The tooling (lsp, dune, ocamlformat) is also stronger these days. That's why I think SML and OCaml are not strictly better than one another.

3836293648 · on Oct 5, 2022

I thought F# was native these days?

yawaramin · on Oct 6, 2022

Not really, at last check it just bundles the entire .Net runtime with the compiled image. It's basically the equivalent of https://github.com/vercel/pkg or similar tools.

3836293648 · on Oct 8, 2022

I was referring to this

https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...

Have I missed something? Because that page says pretty clearly that it's fully precompiled. It's not default, but it's an option

yawaramin · on Oct 8, 2022

From that page:

> The app will be available in the publish directory and will contain all the code needed to run in it, including a stripped-down version of the coreclr runtime.

Also check out the limitations section on the page. Some pretty big ones, including 'Limited diagnostic support (debugging and profiling).'

agumonkey · on Oct 5, 2022

Timely topic, I was asking on a lisp/plt discord if there was a lineage between FP/algebraic types and lisp structural thinking (patterns of nested lists of symbols representing and/or) with destructuring bind.

I see a big parallel between function to turn params into structs and accessors, both being mirrors of each others (cons/car/cdr as the root example) and the fact that a type definition encodes enough information to generate both at the semantic layer, with pattern matching as sugar.

So if any early LCF team member can shed some light if I'm imagining things or if LCF/ML were a rewrite of older recursive lisp practices into a cleaner formalism, I'd be glad.

ps: the fact that some LCF impl. were done in lisp also reinforces my belief.

c-cube · on Oct 6, 2022

Caml was also initially implemented in LeLisp if I recall correctly. :-).

An advantage of ML for theorem proving à la LCF is that you can use the type system to enforce some invariants ; namely you can prevent the user from ever building a value of type `theorem` that is not actually a valid theorem. I don't know how that would work in a dynamic language where you have access to everything.

gorgoiler · on Oct 5, 2022

In the late 90s, for me and a bunch of people here I’m guessing, Larry Paulson’s course on ML was the first time any of us had written a computer program. Certainly our first introduction to the theories of computation.

I feel very lucky to have been in the right place at the right time, and ML’s structure was a pleasure then as it is now. The history of the language is indeed rather interesting.

jp0d · on Oct 6, 2022

I was just browsing this man's webpage (http://lucacardelli.name/) and it's just fascinating what he has achieved in his life!

lower · on Oct 6, 2022

It's interesting that OCaml's syntactic quirks go back to Edinburgh ML. I had never known about this bit of history.