A deep dive into Multicore OCaml garbage collector

rixed · on July 16, 2017

I never managed to understand why so many people were apparently longing for multicore support. I just can't believe many are impatiently waiting for this feature to start using the OCaml language for some projects requiring multicore support, and I became impatient myself just to see what those projects are.

In recent years, where servers are expected to scale out to many machines, I found myself using fewer and fewer kernel threads. Also, it seems GUIs are mostly web these days. In most of the remaining fields I can imagine where you would want kernel threads, OCaml would not fit anyway because of the GC and the lack of control over memory layout.

So I really wonder...

rbehrends · on July 16, 2017

The most common reasons for requiring shared memory concurrency (instead of message passing) are:

1. It's just plain easier to write a lot of stuff with a shared memory approach.

2. There are algorithms where you get the best speedup using shared memory.

But yes, there are also downsides.

1. Shared memory concurrency can be a lot harder on the runtime. In particular, writing an efficient single-threaded GC is an order of magnitude easier than an efficient concurrent GC.

2. You eventually run into scaling issues with shared memory only (number of cores, limited memory bandwidth). That said, a hybrid shared memory/message passing solution can still be superior to a pure message passing one, and for "desktop" parallelism that's not an issue.

3. Shared memory concurrency requires programming language support (at least if you want to keep a modicum of sanity), whereas message passing concurrency can be done as a library. And many languages have really, really screwed up their handling of shared memory concurrency where it's extremely difficult to reason about correctness.

My perceptions may be colored here because I cut my teeth on parallel programming with distributed computing in the 1990s, but I also find the lack of shared memory concurrency to be less of an issue in actual practice (though, obviously, I'd hardly pass up on having the option).

_0w8t · on July 17, 2017

Erlang essentially uses that hybrid model. The runtime uses a shared heap for strings and some other immutable data that can be reference counted. On top of that, each thread has its own GC heap, and message passing copies everything into the private heap.

I am puzzled why other languages do not even try to explore this. It would be a nice fit for OCaml or Node.js. And even with Go, I wish there were an option to run several Go processes with a shared heap, with an explicit API to write/read things there and channels working across processes.

rixed · on July 16, 2017

Indeed, I welcome the option as well, provided it doesn't make the runtime noticeably slower for non-concurrent programs, of course, but I'm confident the authors will make all this a no-op in that case.

kcsrk · on July 16, 2017

Performance backwards compatibility has been a key goal in the Multicore OCaml project. Currently, the overhead for running legacy sequential OCaml programs on the multicore GC is a few percentage points on average. The overhead is low enough that we may not need to provide a "sequential-only" compilation flag.

kcsrk · on July 16, 2017

Multicore support isn't for everyone, but there are some important use cases where it is very useful. One example is Facebook's Hack language type checker [0], which offers good opportunities for parallelism, but the lack of multicore support in OCaml means that they had to implement a custom off-heap lock-free hash table shared by multiple processes. There are many industrial uses of OCaml, including static type checkers (FB's Hack, Flow, Infer, etc.), program verification tools (Coq, why3), math libraries (Owl) [1], etc., all of which benefit from multicore support.

[0]: https://www.youtube.com/watch?v=uXuYVUdFY48

[1]: https://news.ycombinator.com/item?id=14751236

atombender · on July 16, 2017

In Go you can just do "go doWork()" and the scheduler runs the function as a coroutine on some kernel thread, thus allowing your program to make use of all cores.

The ability to just throw goroutines at the scheduler changes how you approach writing programs. For example, say you want to process an input that consists of lots of individual records. You can run the input stream in one goroutine, then spawn a bunch of worker goroutines that each receive records on a channel for processing. Since Go uses real threads and doesn't have a global lock, you can get nearly linear improvement in throughput here.

You wouldn't do this in, say, Ruby or Python, where the concurrency situation is about the same as in OCaml. Ruby has threads, like OCaml, but their slowness means they're not really usable in the same way as goroutines.

Parallelizing apps by forking child processes and communicating input and results via pipes is something I've done a lot in Ruby, and it's a really awkward, heavy-handed concurrency model.
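For comparison, the fan-out shape described above (split the records, hand them to several workers, combine the results) can be sketched in OCaml as well. This is only a sketch under assumptions: it needs a runtime that provides the `Domain` module (OCaml 5 or later, which postdates this thread), `process` and `parallel_sum` are made-up names, and the input is cut into fixed chunks instead of being fed through a channel, so it illustrates the parallelism and not goroutine-style scheduling.

```ocaml
(* Hypothetical per-record work; stands in for whatever processing is needed. *)
let process (record : int) : int = record * record

(* Split [records] into [workers] contiguous chunks, process each chunk on its
   own domain, and add up the partial results. Assumes [workers >= 1]. *)
let parallel_sum ~(workers : int) (records : int array) : int =
  let n = Array.length records in
  let chunk = (n + workers - 1) / workers in
  let work i () =
    let lo = i * chunk in
    let hi = min n (lo + chunk) in
    let total = ref 0 in
    (* The loop body is skipped when the chunk is empty (lo >= hi). *)
    for j = lo to hi - 1 do
      total := !total + process records.(j)
    done;
    !total
  in
  let domains = List.init workers (fun i -> Domain.spawn (work i)) in
  List.fold_left (fun acc d -> acc + Domain.join d) 0 domains

let () =
  let records = Array.init 1_000 (fun i -> i) in
  Printf.printf "sum of squares: %d\n" (parallel_sum ~workers:4 records)
```

Each worker only reads the shared array and writes to its own local accumulator, so no locking is needed in this particular sketch.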

rbehrends · on July 16, 2017

Go doesn't really have a good shared memory story, either. Shared memory support exists, but it's not really any safer than in C or Java. If that were good enough, then Ocamlnet's netmulticore library would already fit the bill. But most people who want shared memory support want something that's both safer and higher level.

Go really is designed around message-passing as its primary means of coordination; the key part here is having nice abstractions, which is where Go's strengths lie and which revolve primarily around `select`. But writing higher-level abstractions over processes and FIFO channels isn't terribly hard, as long as you have serialization built in (which Python, Ruby, and OCaml all do).

JacobiX · on July 16, 2017

In OCaml there is Lwt. It provides very light-weight cooperative threads. The context switches are very fast and composing cooperative threads allows the writing of highly asynchronous applications. For instance, in the versions of Go prior to 1.5 goroutines used to run on a single core. And as of 1.5 the variable GOMAXPROCS can be set to the number of cores. EDIT:clarify the difference between concurrency and parallelism.

jzelinskie · on July 16, 2017

>For instance, in the versions of Go prior to 1.5 goroutines used to run on a single core. And as of 1.5 the variable GOMAXPROCS can be set to the number of cores.

This is not true. Prior to Go 1.5 GOMAXPROCS defaulted to 1, but could always be set to use any number of cores. In Go 1.5, the default value for GOMAXPROCS was changed to the number of "cores" available on the machine. Prior to then many applications were setting GOMAXPROCS based on CLI flags or just hard-coding the future behavior with `runtime.GOMAXPROCS(runtime.NumCPU())`.

JacobiX · on July 16, 2017

My bad, you are correct. I just wanted to point out that one can have concurrency without parallelism.

atombender · on July 16, 2017

Lwt looks nice, but also seems a lot more intrusive to me than, say, Go. It's based on promises (so it's very explicit), and if I remember correctly, in order to do any I/O you have to use the Lwt mechanisms instead of the standard ones (which block).

How does Lwt interact with third-party libs that aren't written to use Lwt? I know from Ruby that libevent-type stuff such as Eventmachine doesn't combine all that well with non-async code.

Go made a good call in making all I/O blocking. That means code is easy to read and write. It comes with downsides, such as that you have to use goroutines, but for most developers it's a more productive concurrency model. (Unfortunately there's no alternative; you can build a blocking system out of a non-blocking one with no overhead, but the opposite is not true. When I first started with Go I was surprised that you can't "select{}" on a file descriptor or socket, or indeed any custom implementation, in the same way that "range" is magical and only works on built-in types.)

kcsrk · on July 16, 2017

With algebraic effect handlers (the concurrency story of multicore OCaml), you can write your I/O code in direct-style, but retain the advantage of non-blocking I/O. As such, you have the advantage of goroutines, but this is also an opt-in. The asynchronous I/O is implemented as a library and is not baked into the language, and hence you can write your own I/O library. For more details, see our recent draft paper: http://kcsrk.info/papers/system_effects_may_17.pdf.

JacobiX · on July 16, 2017

Another great concurrency library in OCaml is the Async library. It's very convenient compared to Lwt. You can find here how it behaves with I/O operations https://realworldocaml.org/v1/en/html/concurrent-programming...

_0w8t · on July 16, 2017

Go's story becomes rather bad the moment one wants to support IO cancellation without memory leaks in the form of goroutines that hang forever. As one cannot cancel them, a proper shutdown requires a lot of rather non-trivial code with things like sending channels over channels etc. The recently introduced Context was supposed to address it, but since one cannot use that to cancel a blocking system call, it is only helpful with high-level libraries that try hard to work around that.

Lwt does have proper cancellation support. There are some pain points, but it is straightforward to work around them. The real problem with OCaml is that there are 2 incompatible libraries for promise-based IO, Lwt and Async. That makes it really hard for third-party libraries, as they now need to support not one, but two frameworks...

ice109 · on July 16, 2017

I wholeheartedly agree with you on the convenience of goroutines. I haven't been following the OCaml concurrency story; are they implementing a similar concurrency model?

kcsrk · on July 16, 2017

Multicore OCaml comes with native support for concurrency through algebraic effect handlers which generalize common control flow abstractions such as exceptions, async/await, generators, non-determinism, backtracking. All of these mechanisms can be implemented directly in OCaml. [0] introduces the model, [1,2] has examples, and for further reading see [3].

[0]: http://kcsrk.info/ocaml/multicore/2015/05/20/effects-multico...

[1]: http://kcsrk.info/ocaml/multicore/effects/2015/05/27/more-ef...

[2]: https://github.com/kayceesrk/effects-examples

[3]: See recent pubs here http://kcsrk.info/.
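To make the "generators via effect handlers" claim a bit more concrete, here is a minimal sketch. It is written against the `Effect` module of the OCaml 5 standard library, which is an assumption on my part and differs from the dedicated syntax used in the 2017 Multicore OCaml branch; `Yield`, `count_to`, and `collect` are made-up names for illustration.

```ocaml
open Effect
open Effect.Deep

(* A hypothetical effect: hand one integer to whoever is handling us. *)
type _ Effect.t += Yield : int -> unit Effect.t

(* Direct-style producer: it simply performs [Yield] in a loop and knows
   nothing about how the values are consumed. *)
let count_to (n : int) () : unit =
  for i = 1 to n do
    perform (Yield i)
  done

(* Handler: run [producer], record every yielded value, then resume the
   producer where it left off. *)
let collect (producer : unit -> unit) : int list =
  let acc = ref [] in
  try_with producer ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Yield v ->
          Some (fun (k : (a, unit) continuation) ->
            acc := v :: !acc;
            continue k ())
        | _ -> None) };
  List.rev !acc

let () =
  collect (count_to 3)
  |> List.iter (fun v -> Printf.printf "yielded %d\n" v)
```

The producer is plain direct-style code; what `Yield` means is decided entirely by the handler, which is the opt-in property described above.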

_ph_ · on July 16, 2017

The short answer is: single-core CPU performance has not gone up much in recent years, while the core count is still increasing every year. That is why graphics cards still see a strong rise in total performance: their usual workload is inherently parallel.

For typical server tasks, like web serving, we can just run several processes. Unix has been doing this for decades. But even web serving is often CPU-bound if your SQL query runs on only one CPU.

The obvious way to make programs faster is to utilize as many CPUs as possible to perform the computations. There is no silver bullet, but the recent years brought quite some progress. Go makes it easy and cheap to spawn off goroutines; the channels and GC help to make this a reasonable concept. Functional programming languages are an even more obvious candidate for parallel execution, as they have no side effects.

pencilhappen · on July 16, 2017

> but the recent years brought quite some progress. Go makes it easy ...

It actually feels like the recent years have been pretty disappointing. Go is one of the few places where the language developers are making a real effort around multicore. Erlang came with a bunch of opinions about how to do this in the 90s.

But every other mainstream language is taking a head-in-the-sand attitude about this. Python squandered their backwards compatibility break without addressing this at all. The JVM developers are making no effort to get away from the global heap. Ruby and Node are ignoring it entirely. PHP, from its request/response web history, actually does well, except its non-web story is as dismal as ever. Lua has coroutines, which are a reasonable abstraction, but no multicore VM story...

_0w8t · on July 16, 2017

Given that Go is memory-unsafe on multicore, I do not think the language developers have made that strong an effort. The answer in Go is to use runtime checkers, but even those do not catch all memory safety issues.

girvo · on July 16, 2017

So about the only reason I have for it is some certain parallel algorithms I'd love to use, that are mostly best done at the thread level.

But these days, using Unix sockets with some cute flags that let multiple processes bind to the same listening socket means it's sort of less needed!

Though, doing a lot of work with Nim lately and exposing thread pools via Futures makes for some pretty lovely easy parallel code, so I dunno. Not a deal breaker but a nice to have, and having threads in OCaml will make things just that bit nicer

logicchains · on July 16, 2017

Even if a particular application doesn't require any multicore support, usually at work I won't be working on just one application as part of a project, there'll be multiple applications involved, at least some of which will require shared memory parallelism. To me it doesn't make sense to pick a language that can only be used for a few applications not all, as this leads to unnecessary duplication in library code and the like compared to just using one language (in my case, C++) for all of them (similar to how people prefer to use the same language for frontend and backend to reduce duplication). Especially when the lack of multicore doesn't bring any compelling advantages: it saves a few percentage points of single-threaded performance at the expense of completely ruling out most use-cases that require shared memory parallelism (or at least rules out any ways of doing them that aren't incredibly un-ergonomic), which seems like an absolutely terrible trade-off to me.

Here's a concrete example: comparing the Isabelle proof assistant, written in PolyML (a multi-core supporting SML implementation developed mostly by one guy) with Coq (which is written in OCaml). Interactive theorem proving in the former is a lot nicer as it takes advantage of multithreading not only for faster concurrent processing of proofs, but also to do things like running Quickcheck and Nitpick in a separate thread automatically to identify trivially falsifiable lemmas, and automatically finding stdlib lemmas that exactly solve a particular proof.

I think PolyML demonstrates that a lack of manpower isn't what stopped OCaml implementing multithreading support. Over the years there have actually been a few proposals/branches implementing some form of support, but all were rejected/abandoned. There was even one that just made the runtime reentrant (passing the runtime around directly instead of having it as a global variable), meaning the OCaml runtime could now be stopped and restarted when embedded in another application (e.g. C calling OCaml), a feature already present in Haskell, but this relatively simple improvement was also rejected (which personally bothers me a lot on a subjective level as I hate globals so it seems like a worthy improvement for its own sake). I remember waiting excitedly four or five years ago for it to be merged, a small but significant step on the path to multicore support, only for that hope to fade away as contributions to the branch slowed to a trickle then dried up completely. Without such extreme focus on avoiding decreases in single-core performance, it would have been much easier for a change like this to have been merged.

I think in recent times there is even less justification for rejecting multicore for affecting single-threaded performance. For two reasons: firstly, the recent addition of FLambda has brought performance improvements in many cases of over 10%, easily enough to compensate for any loss from multi-threading. Secondly, HFT now basically requires FPGA to compete, so single-threaded performance would presumably be of less concern to Jane Street (OCaml's biggest industrial user) now as they really shouldn't be using a software execution engine anyway (disclaimer: I say that as someone working at a competing HFT firm). And if Jane Street isn't doing HFT, then a drop of a few percentage points in single-threaded performance shouldn't affect them much anyway.

Finally, OCaml could always do what Haskell did and add a flag to toggle whether or not multi-threading is enabled, allowing single-threaded users to avoid any performance regression.

tom_mellior · on July 16, 2017

> things like running Quickcheck and Nitpick in a separate thread

In my experience with Isabelle, these tools provide feedback on the order of seconds (because they do expensive things). Maybe several hundreds of milliseconds, for simple cases. At these scales, it makes no noticeable difference what kind of parallelism solution you use. Spawning separate processes and sending messages would work just as well.

So the reason there is no parallel Quickcheck for Coq is more likely due to the fact that there is no Quickcheck for Coq, period.

JacobiX · on July 16, 2017

One can have the kind of unsafe shared memory found in C++ in OCaml by using a library like netmulticore. But OCaml's formalised memory model offers the same kind of OCaml safety in multicore programs. As for your first point, I'm not an advocate of using a single programming language for all kinds of things. But that's just my opinion.

silisili · on July 16, 2017

OCaml has always intrigued me as something to learn, but it seems I always read equally as many reasons that it's not good, not ready, or why I should try Haskell instead. Does anyone have experience to say yay/nay on worthwhile learning, know of any companies using it heavily, or know of any large cleanly written codebases from which to study?

rbehrends · on July 16, 2017

I think as a computer scientist you should really have at least some familiarity with both Haskell and one of the ML languages, just because it'll expand your thinking. Whether to invest deeply in one of them is a matter of whether you have the time (especially if it's just as a hobby) and how well one or the other matches your preferences.

I don't generally evangelize OCaml – because as a language it can be a bit of a mixed bag – but end up using it a lot because in the end it's a pragmatic language and a really solid workhorse. The niche that it fills for me is that of a statically typed language with native compilation and a GC, a niche that has historically not been served all that well.

Things I like:

* The language itself is not too opinionated. It's a multi-paradigm, functional-first language, that allows me to use imperative or OO styles when that's the best fit.

* The compiler is fast.

* While the ecosystem has some gaps, it's very solid in those areas that I need (such as systems programming or language tooling).

* The language does not prioritize speed at the expense of correctness; this includes the compiler. There are languages where, let's say, I have trust issues. Code quality is still pretty darn good.

* I really like ML functors for parametric polymorphism. While more verbose than other approaches, they are simple and powerful.

* Having a time-traveling debugger can occasionally be very helpful.

* For all their warts, the language and the compiler are mature.
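As a small illustration of the functor point in the list above, here is a sketch of a functor that is parameterized over any ordered type. `ORDERED`, `Max_of`, and `By_length` are made-up names; `Int` and `String` are the standard library modules, which happen to provide the required `t` and `compare`.

```ocaml
(* What the functor needs from its argument: a type and a total order on it. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A tiny, hypothetical functor: builds a "maximum of a list" function for
   any ordered type. *)
module Max_of (O : ORDERED) = struct
  let max_of (xs : O.t list) : O.t option =
    match xs with
    | [] -> None
    | x :: rest ->
      let pick best y = if O.compare y best > 0 then y else best in
      Some (List.fold_left pick x rest)
end

(* Instantiate it with standard library modules... *)
module Int_max = Max_of (Int)
module String_max = Max_of (String)

(* ...or with a custom order: strings compared by length only. *)
module By_length = struct
  type t = string
  let compare a b = Int.compare (String.length a) (String.length b)
end
module Longest = Max_of (By_length)

let () =
  match Longest.max_of ["aa"; "bbbb"; "c"] with
  | Some s -> Printf.printf "longest: %s\n" s
  | None -> print_endline "empty list"
```

The extra verbosity is the explicit module type and the instantiation lines; in exchange, the choice of ordering is always visible at the point where the module is built.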

flunhat · on July 16, 2017

IMO the best part about OCaml is that it's just so fun to write code in. The type system is a million times more expressive and elegant than anything the world of Java/C# offers and more rigorous to boot. And unlike Haskell, you can cheat a little and write imperative code if you really need to. The library ecosystem could be better with OCaml though, but this is a chicken-egg problem and not an indictment of the language itself.

vog · on July 16, 2017

> you can cheat a little and write imperative code if you really need to

I'm not sure if this is cheating.

Yes, this can be misused to introduce side-effects where there should be none.

However, the strict evaluation by default (and lazy evaluation only when explicitly marked) has a huge benefit that you can not only reason easily about correctness (as in Haskell), but also reason easily about performance. The latter you can't do in Haskell without huge assumptions about the optimizer. In that regard, Haskell is similar to SQL: For small SQL queries it is pretty clear how it will be executed, but for large SQL queries it isn't clear at all which route the query planner will choose.

Having said that, reasoning about performance is even better in Rust, which is heavily inspired by OCaml but allows you to reason about memory access, closing the gap that you can reason about OCaml's performance only while ignoring the GC.
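The "strict by default, lazy only when explicitly marked" point can be seen in a few lines. This is a minimal sketch with made-up names (`runs`, `expensive`, `first`, `first_lazy`); the counter exists only to show when the computation actually runs.

```ocaml
(* Counts how many times the "expensive" computation actually runs. *)
let runs = ref 0

let expensive (x : int) : int =
  incr runs;
  x * x

(* Strict: the second argument is evaluated before [first] is entered,
   even though [first] never uses it. *)
let first (a : int) (_unused : int) : int = a

(* Lazy: the second argument is a suspended computation that nobody forces. *)
let first_lazy (a : int) (_unused : int Lazy.t) : int = a

let () =
  let _ = first 1 (expensive 10) in
  Printf.printf "after strict call: %d run(s)\n" !runs;
  let suspended = lazy (expensive 10) in
  let _ = first_lazy 1 suspended in
  Printf.printf "after lazy call:   %d run(s)\n" !runs;
  (* Forcing runs the computation once; the second force reuses the result. *)
  let v = Lazy.force suspended in
  let _ = Lazy.force suspended in
  Printf.printf "after forcing:     %d run(s), value %d\n" !runs v
```

Because every suspension is spelled `lazy` and every evaluation is spelled `Lazy.force`, the cost of a call is visible in the source.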

mercurial · on July 16, 2017

> Having said that, reasoning about performance is even better in Rust, which is heavily inspired by OCaml but allows you to reason about memory access, closing the gap that you can reason about OCaml's performance only while ignoring the GC.

Rust is a really different topic. It's an imperative low-level language with functional trappings, not the reverse. First off, "reasoning about memory access" really means "think about what part of code owns which data", which is non-trivial, especially considering the various ways for sharing data in Rust.

Secondly, trying to do idiomatic OCaml in Rust is a terrible idea. Idiomatic OCaml is all about immutability and recursion. Immutability means allocation, and idiomatic Rust is pretty much all about avoiding allocations. And recursion is also a bad idea, especially considering Rust does not have TCO. It's not a dig at Rust, which is an exceedingly interesting language of its own, but Rust and OCaml have completely different philosophies.

darksaints · on July 16, 2017

Idiomatic rust avoids heap allocations, but it very much encourages immutable stack allocation...something that OCaml also tries to encourage.

Rust started out a lot like OCaml, and gradually moved away from it whenever they needed to make a compromise to improve its position as a systems language. You start to see it with big things like how the pattern matcher works, but you also notice small things like idiomatic capitalization (lower_snake_case for functions, UpperCamelCase for types). The original compiler was even written in OCaml. The language borrowed a lot from OCaml and the ML subculture is still really strong on the core development team. That shows through enough to persuade ML-inclined engineers to use it even when they don't need a systems language. I'd say there's a really strong reason the comparison still exists even if they've moved apart over time.

mercurial · on July 16, 2017

> Idiomatic rust avoids heap allocations, but it very much encourages immutable stack allocation...something that OCaml also tries to encourage.

No doubt, but most of your OCaml data structures will, in a typical program, still be on the heap, while idiomatic Rust will avoid Box whenever possible.

Apart from that, I know about the links between the two (and I think the blend between the OCaml/Java (for the generics) and Ruby syntax (lambdas) works really well in practice). But I think the underlying philosophy, and the way of structuring code are really different. This is a much more fundamental concern than surface-level considerations.

rbehrends · on July 16, 2017

> Idiomatic rust avoids heap allocations, but it very much encourages immutable stack allocation...something that OCaml also tries to encourage.

I'm not sure I'm following you here; one of the more common criticisms of OCaml is generally that it (unavoidably) puts too many values on the heap. I mean, even floats get boxed by default, unless the compiler can avoid that.

CyberDildonics · on July 16, 2017

> even floats get boxed by default, unless the compiler can avoid that.

Yikes, that seems like it would destroy performance.

rbehrends · on July 16, 2017

It's one of OCaml's biggest weaknesses, but it's also not as bad as it sounds.

One, records and arrays of floats don't suffer an additional level of indirection.

Two, the compiler is actually pretty good at unboxing floats these days. Arguments of non-inlined functions are generally the major reason for boxing to happen (because of ABI requirements). Inner loops should generally be free of unnecessary boxing.

Three, even when boxing happens, keep in mind that OCaml uses a very fast bump allocator (which is also inlined). This means that heap allocation of short-lived values is more like alloca() than malloc().

OCaml is still not the language that you'd want to write performance-sensitive numeric code in, but it's generally fine when floating point math is only one aspect of your code or when you're using it as a way to interface with C/Fortran libraries (such as with Owl [1]), sort of like how Python uses NumPy.

[1] https://github.com/ryanrhymes/owl
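The first point (float arrays and all-float records are stored flat) can be observed by peeking at runtime tags. This is a sketch only: it uses the unsafe `Obj` module, which is meant for inspecting the runtime and not for application code, it assumes a compiler built with the default flat float array representation, and `is_flat_float_block` plus the two record types are made-up names.

```ocaml
type point = { x : float; y : float }         (* all-float record *)
type labelled = { name : string; v : float }  (* mixed record *)

(* True when the heap block holds raw doubles directly, i.e. there is no
   extra pointer per float. Do not call it on empty arrays. *)
let is_flat_float_block (value : 'a) : bool =
  Obj.tag (Obj.repr value) = Obj.double_array_tag

let () =
  Printf.printf "float array flat:       %b\n"
    (is_flat_float_block [| 1.0; 2.0 |]);
  Printf.printf "all-float record flat:  %b\n"
    (is_flat_float_block { x = 1.0; y = 2.0 });
  Printf.printf "mixed record flat:      %b\n"
    (is_flat_float_block { name = "a"; v = 1.0 });
  (* A lone float passed around as a generic value is its own boxed block. *)
  Printf.printf "single float is boxed:  %b\n"
    (Obj.tag (Obj.repr 1.5) = Obj.double_tag)
```

In the mixed record, the float field is a pointer to a separately boxed double, which is the extra indirection the comment above is talking about.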

pjmlp · on July 16, 2017

Personally I think it is a mistake to use Rust instead of OCaml if the use case doesn't require the tight memory management that Rust supports.

If the target domain can be done in a language with GC, enjoy productivity with sails taking full wind, instead of having to deal with lifetime annotations.

Now if the GC induced delays are too much for the target domain, then for all means use Rust.

After all, the tool for reasoning about performance is a profiler, not the compiler itself.

_0w8t · on July 16, 2017

In addition to the lack of GC, Rust also requires type annotations for top-level functions. As types are often rather non-trivial due to borrowing etc., it really takes more effort to write middle-level glue code in Rust.

grumpyprole · on July 16, 2017

You are offering a one-sided argument with regard to strict versus lazy!

Lazy evaluation can make space usage difficult to reason about, but time complexity should be unchanged.

Lazy evaluation, by default, is also a huge benefit to Haskell. Strict evaluation is often a huge impediment to function reuse.

thedufer · on July 16, 2017

Time _complexity_ is unchanged, but laziness makes it difficult to reason about when the time will be taken. This, more so than the space usage, is the reason laziness is avoided when run time is important.

> Strict evaluation is often a huge impediment to function reuse.

This reads like an argument for having laziness as a language construct, rather than for having it as a default.

logicchains · on July 16, 2017

OCaml has many of the good qualities people attribute to Go (simplicity, speed, relatively low-latency GC, well-engineered compiler&runtime) and none of the downsides. Its weakness is terrible support for multithreading (like CPython it has a global lock, such that only one thread can execute OCaml code at a time), but there's work in progress to fix that. Personally I'm of the opinion that if the developers had taken multicore more seriously a decade ago, and improved Windows support, it would/could be used now for most of the things for which Go is used.

mercurial · on July 16, 2017

People often don't like the syntax too much (which is why Facebook invented Reason). There are other weaknesses as well:

- the stdlib is good for data structures and not much else, which in turn gave rise to at least three different stdlib extensions

- it does not have typeclasses, so implementing a generic "to_string" for your data type is not a thing

- it has some really unfortunate ideas like "universal compare"

- in general, there is little focus on creature comforts (this is 2017 and despite often relying on exceptions for error handling, there is no "finally" clause in the language)

- in the same spirit, strings are simply bytes, it's up to the developer to ensure they have the right encoding

- the language separates the code itself from the type declaration, a feature which, combined with its type inference, can make reading OCaml code similar to reading Python (in the "what type is this 'e' variable" sense)

- its most widely used build system is pretty terrible

That said, it has some solid features as well, despite the small community size. The packaging system is top-notch, it has good text editor integration via Merlin, and while not always numerous, the libraries are often high-quality, and it is an exceedingly pleasant language to develop with. It strikes a really good balance between solidity and ease-of-use.
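On the missing "finally" clause from the list above: the usual workaround is a small helper function. Here is a minimal sketch, where `with_finally` and `read_first_line` are made-up names and the helper makes no attempt to handle a cleanup function that itself raises.

```ocaml
(* Hypothetical helper: run [f ()], then always run [finally ()], whether
   [f] returned normally or raised. *)
let with_finally ~(finally : unit -> unit) (f : unit -> 'a) : 'a =
  match f () with
  | result -> finally (); result
  | exception e -> finally (); raise e

(* Example use: the channel is closed even if [input_line] raises
   [End_of_file] on an empty file. *)
let read_first_line (path : string) : string =
  let ic = open_in path in
  with_finally ~finally:(fun () -> close_in ic) (fun () -> input_line ic)

let () =
  let cleaned_up = ref false in
  let value =
    with_finally ~finally:(fun () -> cleaned_up := true) (fun () -> 6 * 7)
  in
  Printf.printf "value = %d, cleaned up = %b\n" value !cleaned_up
```

Standard library releases from OCaml 4.08 onward ship a similar helper as `Fun.protect ~finally`, so on a newer compiler the hand-written version is not needed.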

kcsrk · on July 16, 2017

There have been developments on some of these issues:

- jbuilder (https://github.com/janestreet/jbuilder) has markedly improved the build system story. Importantly, the community understands that this is an issue and is actively working towards making the situation better.

- Type declaration at the code level is optional. You could always annotate your code with types, and the type checker will ensure that the annotated types are compatible with the inferred ones. If you would like to know "what type is this 'e' variable", there is merlin (https://github.com/ocaml/merlin), which integrates with popular editors.

- Reg. typeclasses, there has been an enormous amount of work put into modular implicits (http://ocamllabs.io/doc/implicits.html). It is likely to be the next major feature in OCaml, along with multicore.
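To illustrate the point about optional annotations, here is a short sketch showing the same function with and without written types, plus a signature kept next to the code. All names (`scale`, `scale_annotated`, `first_of_pair`, `Geometry`) are made up for the example.

```ocaml
(* Fully inferred: the compiler works out float -> float -> float. *)
let scale factor value = factor *. value

(* The same function with the types written out; the type checker verifies
   that the annotations agree with what it infers. *)
let scale_annotated (factor : float) (value : float) : float =
  factor *. value

(* An annotation can also narrow an otherwise polymorphic function. *)
let first_of_pair (pair : int * string) : int = fst pair

(* A signature placed next to the implementation, for readers who do not
   have editor tooling at hand. *)
module Geometry : sig
  val area : float -> float -> float
end = struct
  let area width height = width *. height
end

let () =
  Printf.printf "%g %g %d %g\n"
    (scale 2.0 3.0)
    (scale_annotated 2.0 3.0)
    (first_of_pair (7, "seven"))
    (Geometry.area 2.0 5.0)
```

A wrong annotation, such as declaring `factor` as `int`, is rejected at compile time, so annotations written for readers cannot silently drift from the code.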

mercurial · on July 16, 2017

> jbuilder (https://github.com/janestreet/jbuilder) has markedly improved the build system story. Importantly, the community understands that this is an issue and is actively working towards making the situation better.

Well, that's good to know.

> Type declaration at the code level is optional. You could always annotate your code with types, and the type checker will ensure that the annotated types are compatible with the inferred ones. If you would like to know "what type is this 'e' variable", there is merlin (https://github.com/ocaml/merlin), which integrates with popular editors.

I am aware of that. I shouldn't have to use Merlin to read third-party code from github, and god forbid that you have to do code review on a "let's all rely on type inference" codebase in a web browser. I think Rust strikes the right balance here, by enforcing the use of type signatures at function boundaries (which also lets it offer world-class error messages).

> Reg. typeclasses, there has been enormous amount of work put into modular implicits (http://ocamllabs.io/doc/implicits.html). It is likely to be the next major features in OCaml along with multicore.

I know about implicits. I also know that multicore has been "just around the corner" for a few years now, this does not bode well for implicits (which have also been "in progress" for a long time). This also touches another issue with the OCaml maintainers: little in the way of information seems to filter down to the unwashed masses, and let us not speak of the word "roadmap"...

arthurbrown · on July 16, 2017

The community tends to shy away from classes, but using a method for to_string accomplishes your second point in a minimal way.

  let to_string stringer : string = stringer#to_string
  val to_string : < to_string : string; .. > -> string = <fun>

I think there's also plenty of people who feel just as strongly about Reason syntax in the other direction.
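For readers who have not used OCaml's object layer, here is a sketch of how the `to_string` function above accepts any object that has a `to_string` method, without a shared base class. `point` and `user` are made-up examples.

```ocaml
(* Same definition as in the comment above: works on any object exposing a
   [to_string : string] method. *)
let to_string stringer : string = stringer#to_string

(* An immediate object, no class declaration needed. *)
let point x y = object
  method to_string = Printf.sprintf "(%d, %d)" x y
end

(* An unrelated class that happens to have the same method. *)
class user (name : string) = object
  method name = name
  method to_string = "user:" ^ name
end

let () =
  print_endline (to_string (point 1 2));
  print_endline (to_string (new user "ada"))
```

The `..` in the inferred type `< to_string : string; .. >` is what allows extra methods such as `name` to be present.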

mercurial · on July 16, 2017

I'm not particularly attached to Reason syntax myself, but there are some bits of OCaml syntax that do annoy me, such as the right-to-left direction for parametrized types ('a list instead of a more usual list 'a).

As for classes, well... They're not really used outside of specific situations (eg, UI widgets), so it's not really a terribly convenient solution (especially since, for a really useful to_string implementation, you would want the member fields of your class to also implement to_string, so that would mean objects everywhere).

ernst_klim · on July 16, 2017

>or know of any large cleanly written codebases from which to study?

* unison (a great file sync tool for win/posix from Benjamin Pierce, the author of Software Foundations and TAPL)

https://github.com/bcpierce00/unison

* mirage (which also has a bunch of interesting subprojects, like irmin)

https://github.com/mirage

* coq

https://github.com/coq/coq

* flow

https://github.com/facebook/flow

* CompCert

https://github.com/AbsInt/CompCert

* frama-c

https://github.com/Frama-C

* 0install

https://github.com/0install/0install

tom_mellior · on July 16, 2017

> or why I should try Haskell instead

Haskell has some nice things going for it. Its syntax is cleaner than OCaml's, for example. Still, it makes some things (like having a mutable variable, or a hash table) much harder than they should be. It's worthwhile looking at both over some rainy weekends. Whichever you end up preferring, you'll have learned useful things about both languages.

(OCaml is better, though ;-))

> any large cleanly written codebases from which to study?

There is a big list of projects using OCaml at https://github.com/rizo/awesome-ocaml

Obviously, I can't vouch for cleanly-writtenness of any of them. Whichever you study would presumably depend on your area of expertise. There are packages there for anything from theorem proving to bioinformatics to web development. If you're into compilers, you could look at the OCaml compiler itself...

mookid11 · on July 16, 2017

> Haskell has some nice things going for it. Its syntax is cleaner than OCaml's, for example.

While I find the ocaml syntax very far from perfect, the whitespace sensitivity of Haskell makes it immediately worse. What a silly design.

mercurial · on July 16, 2017

IMHO, it's not so much that as the ASCII operator galore. $, <*>, the horrible lambda syntax... Sometimes it looks like an opinionated Perl.

ice109 · on July 16, 2017

I don't know if you've worked with a production Haskell code base, but it cannot be overstated what a poor decision operator overloading is. If they got rid of that it would immediately make Haskell an order of magnitude more practical.

the_why_of_y · on July 16, 2017

Haskell doesn't really have the problems that are commonly associated with operator overloading.

Haskell does have type classes, which do allow overloading, even of operators, but the type class uniquely determines the type of the operator - and there can be only one definition of any operator in scope.

Also, you can just define as many new operators as you like.

For these reasons, you can't have the madness of operator<< in C++, or whatever it is that Perl does with the same operator doing 5 completely different things depending on context.

tom_mellior · on July 16, 2017

> Haskell doesn't really have the problems that are commonly associated with operator overloading.

Nobody said it did. People said that it has its own brand of problems with operators looking something like >>+@*<>> peppering your code.

ice109 · on July 16, 2017

lol i wonder if you work on my team

ice109 · on July 16, 2017

>Also, you can just define as many new operators as you like.

this is exactly the problem

mercurial · on July 16, 2017

Haskell for hobby programming coupled with professional Perl experience gave me enough idea of what creative programmers could come up with in terms of ASCII art.

tome · on July 18, 2017

Haskell doesn't have overloading at all, either of operators or of non-operator identifiers, so I guess you mean something else.

Do you mean user-defined operators, or do you mean operators whose type has a typeclass context?

tome · on July 18, 2017

I find these kinds of comments really interesting, because for me Haskell's whitespace-sensitivity immediately makes it a much better language! (I feel the same about Python).

It's interesting because (presumably) reasonable people really disagree on these issues.

LukaD · on July 16, 2017

How does the whitespace sensitivity make it immediately worse?

mookid11 · on July 16, 2017

Unless you are experienced, you have no clue how a piece of code will be parsed. With OCaml, there is very little ambiguity, and indentation tools can take care of proper formatting; you only need to break lines.

And it can be pretty bad with haskell, which has complex semantics compared to, say, python.

With whitespace-sensitive syntax, I feel like I'm back in the dark age of manual indentation.

tome · on July 18, 2017

By no means. Haskell's semantics are extremely simple. Python's are complicated.

beeforpork · on July 16, 2017

Youmeanwhitespaceisbad?C64basicrules!

seanwilson · on July 16, 2017

> OCaml has always intrigued me as something to learn, but it seems I always read equally as many reasons that it's not good, not ready, or why I should try Haskell instead.

You should just spend a few hours and try it. Once you get used to inductive types, pattern matching, immutability and recursion over for loops there really isn't a lot of syntax to learn before you can be productive. Either way, it'll teach you to be a better coder in other languages.

0xcde4c3db · on July 16, 2017

As far as I know, Jane Street is the go-to example for commercial users. It's also unsurprisingly used within INRIA itself (for e.g. Coq).

fulafel · on July 16, 2017

Docker too. https://github.com/moby/datakit https://github.com/moby/vpnkit

nv-vn · on July 16, 2017

Facebook and Bloomberg are both industrial users as well.

danieldk · on July 16, 2017

And Citrix for XenServer and xapi:

https://github.com/xapi-project

BoiledCabbage · on July 16, 2017

From what I understand F# is a pretty close version of OCaml, but has the benefit of all the libraries of the full .NET ecosystem.

JacobiX · on July 16, 2017

F# is basically an OCaml clone. I wonder why it isn't called NCaml or something like that. F# lacks some functional programming features, such as the powerful macro system used for creating domain-specific languages and extending the OCaml language.

pjmlp · on July 17, 2017

It is related to previous OCaml.NET efforts.

F# computation expressions, coupled with .NET expression trees are quite useful though, even though they aren't macros as such.

35bge57dtjku · on July 16, 2017

If you compare the two it's obvious the F# author blatantly copied as much as he possibly could.

sriram_malhar · on July 16, 2017

> it's obvious the F# author blatantly copied

That's a really unfortunate choice of words. It almost makes him look like he stole something in the middle of the night and never gave anyone credit.

Don Syme worked on Project 7 which brought Generics to .NET, and then spoke with Leroy and others to bring OCaml to this environment.

  The next logical step was to implement an OCaml-style language directly,
  and the OCaml designers were very encouraging when we talked to them
  about bringing this class of languages into the .NET space. That led to the
  early versions of F#.

See this interview: http://through-the-interface.typepad.com/through_the_interfa...

35bge57dtjku · on July 18, 2017

That quote is an unfortunate choice of words! "an OCaml-style language" could be something as different as Haskell or SML, which are far more easily identified as different than OCaml. Instead he made something that looks 95% the same.

And come on, "bringing this class of languages into the .NET space" sounds far different than "porting a copy of a language to .NET that's 95% the same."

No, he didn't steal something in the middle of the night without giving credit, but that quote is still pretty disingenuous sounding. It's more like he copied OCaml and then only gave credit for using a similar style.

sriram_malhar · on July 18, 2017

Well, you are just piling on more unfortunate choices of words. You are accusing him of dishonesty ("then only gave credit"), and laziness; your "95% the same" comment implies that you think of a language only at the syntax level.

That "5%" of F# that you imply is different from OCaml involved redoing the entire .NET framework, getting generics into it.

For example, look at the early papers (from 2001) on "ILX: Extending the .NET Common IL for Functional Language Interoperability", they considered Haskell, OCaml and ML. The compiler was written in Haskell. Or "Transposing F to C#: Expressivity of parametric polymorphism in an object-oriented language".

See Don Syme's talk on F# (https://www.infoq.com/presentations/F-Sharp-History-Today-To...).

35bge57dtjku · on July 23, 2017

> Well, you are just piling on more unfortunate choices of words.

And you are continuing to bend over backwards to misconstrue what I've said. I don't know why you're being this obtuse.

> You are accusing him of dishonesty ("then only gave credit"), and laziness

Not quite... I'm sure he's been open about what he did. Your own quote said, "to implement an OCaml-style language" when what he did seemed like a lot more than just that. What I am saying is that people who say it's just in the style of OCaml aren't accurately describing it. And I never said it wasn't a lot of work, or that it indicated laziness.

> your "95% the same" comment implies that you think of a language only at the syntax level.

Fine, if you want to believe that it doesn't bother me. But you sure are continuing to put a lot of words in my mouth here.

> That "5%" of F# that you imply is different from OCaml involved redoing the entire .NET framework, getting generics into it.

I wasn't talking about that and I never said there wasn't a lot of work there. That really has no bearing on how similar F# looks to OCaml. This has to be some blatant logical fallacy, like you moving the goalposts to now include this part, too.

It's been a week and you're clearly not interested in any discussion. Just moveon.org.

mercurial · on July 16, 2017

F# is pretty much Microsoft's version of OCaml, but it diverges in some key features (for instance, it can do multicore but it does not have OCaml's functors).
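For readers who have not met functors, here is a minimal sketch of the OCaml feature being referred to. All names (`SHOWABLE`, `ListPrinter`, `IntShow`) are invented for this example; it shows a module parameterised by another module, which is the part F# has no direct equivalent for.

```ocaml
(* What the functor requires of its argument. *)
module type SHOWABLE = sig
  type t
  val to_string : t -> string
end

(* A functor: given any module matching SHOWABLE, build a module that
   prints lists of that module's values. *)
module ListPrinter (S : SHOWABLE) = struct
  let to_string (xs : S.t list) : string =
    "[" ^ String.concat "; " (List.map S.to_string xs) ^ "]"
end

(* One possible argument module. *)
module IntShow = struct
  type t = int
  let to_string = string_of_int
end

(* Applying the functor yields an ordinary module. *)
module IntListPrinter = ListPrinter (IntShow)

let () = print_endline (IntListPrinter.to_string [1; 2; 3])
```

The application is resolved at compile time and is fully explicit, which is also why it is more verbose than type classes or the proposed modular implicits mentioned elsewhere in the thread.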

davidgl · on July 16, 2017

F# has added some new features on top of OCaml like:

- Lightweight syntax
- Active Patterns to extend pattern matching
- Type Providers to make it easy to interact with json/sql in a type safe way

It's a joy to work in

https://stackoverflow.com/questions/179492/f-changes-to-ocam...

nv-vn · on July 16, 2017

F#'s features don't feel that useful after working in OCaml though. I've tried it for various projects, but OCaml is honestly just as good in all those situations. I wonder if there's some specific type of project where F#'s features will really shine.

mercurial · on July 16, 2017

I can imagine the .NET interop is a big deal, especially if you need to do anything on Windows. Also, parallelism can be occasionally useful.

nv-vn · on July 16, 2017

Yeah, I get that those are useful, but things like type providers or units of measure feel pretty useless to me and I'm not really sure if they're supposed to make me want to switch to F# or not. They certainly don't fill the place of OCaml's module system or object system or GADTs, PPXes, polymorphic variants, etc. To me OCaml's abstractions feel much more polished/as if they were all invented to fill some existing hole in the language and work together (rather than some random "cherry on top" feature that doesn't fix anything for me and is rarely useful).

pjmlp · on July 16, 2017

Well, a big decision point is what one uses as main development platform.

Sadly OCaml support on Windows is not great, to put it nicely.

mercurial · on July 16, 2017

On the other hand, it may be possible to use F# on Linux with .NET Core now.

mixedCase · on July 16, 2017

Tooling is definitely far from mature. It's treated as a second-class language in .NET, and on top of that Linux is treated as a second-class OS.

I just tried to start a project with F# and ended up abandoning it, and am looking into Rust right now. I considered OCaml but its lack of multicore support and the ecosystem situation made me drop it.

pjmlp · on July 16, 2017

You are kind of right.

My point was that even as second class citizen on .NET, F# has more tooling and available libraries than OCaml ever will on Windows.

For a long time OPAM did not support Windows, and right now cygwin or Linux subsystem still seem to be better ways than straight Win32 application support.

OCaml is quite nice on *NIX systems, on other kinds of OSes not so much.

afrisch · on July 17, 2017

Of course it feels less native than with F#, but we have created and used CSML (https://github.com/LexiFi/csml) specifically to allow our OCaml code base to interact with .NET.

rmathew · on July 16, 2017

You might find these Reddit threads interesting:

* OCamlers on Haskell - https://www.reddit.com/r/ocaml/comments/3ifwe9/what_are_ocam...

* Haskellers on OCaml - https://www.reddit.com/r/haskell/comments/3huexy/what_are_ha...

trishume · on July 16, 2017

You might want to check out https://reasonml.github.io/. It's a new syntax and set of tools for OCaml that are better in many ways but compatible with the existing ecosystem and tools.

openplatypus · on July 16, 2017

DON'T! Check the Facebook licensing issues (HN is full of them today). Stay away from all-Facebook things.

amelius · on July 16, 2017

I'm guessing a lot of other languages could benefit from this. So what I'd like to see is a runtime environment and garbage collector decoupled from the actual language. Something like the LLVM project, except focused on the runtime system (and different variants) rather than code generation.

kcsrk · on July 16, 2017

There is already one! See Malfunction [0,1]. There is an Idris and Agda backend for malfunction.

[0]: https://github.com/stedolan/malfunction

[1]: https://www.cl.cam.ac.uk/~sd601/papers/malfunction.pdf

atombender · on July 16, 2017

So what's the status of Multicore OCaml? Will it be merged into the mainline any time soon?