It's for spam detection. Everything that gets posted on Facebook is processed by this code. There is a talk on it from a year or two ago at a functional programming conference. I'll see if I can find it and edit this post.
Really great to read more about real world large scale Haskell use.
I'd be curious to find out whether the success FB has had with Haskell for spam filtering means they might consider Haskell for other parts of their stack too. Does anyone have any insight on this?
> GHC’s runtime is based around an M:N threading model which is designed to map a large number (M) of lightweight Haskell threads onto a small number (N) of heavyweight OS threads. [...] To cut to the chase, we ended up increasing N to be the same as M (or close to it), and this bought us an extra 10-20% throughput per machine.
Ah, yes. As a Go developer, I really wish Go moved to a 1:1 threading model.
1:1 in Go would mean moving servers from blocking-style code to callbacks/futures.
(I believe the story is slightly different in Haskell.)
Nice sequential blocking-style code is my favorite thing about Go. I realize it's not C, and it makes runtime work much harder, but the payoff in complex servers is completely worth it.
A typical server under load has more outstanding requests to answer than it can have OS threads. It is not uncommon in a Go server at Google to find a million or so goroutines.
If you want to have an OS thread that can also run C code, it needs a stack large enough to run C code, which typically means megabytes of reserved address space apiece. A million such stacks is not practical.
On an OS like Linux, which lets you create threads without allocating full stacks and has quite lightweight kernel-side accounting for threads, it would be possible to give each goroutine a thread. You would still need an N:M pool for cgo.
As an added problem, 1:1 is slower for certain kinds of common channel operations. Anywhere you have an unbuffered channel and block waiting for a value, a 1:1 model requires a context switch to pass the value, whereas M:N means the userland scheduler can switch goroutines on the OS thread.
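To make that handoff concrete, here is a minimal toy sketch (my own illustration, not runtime code) of the pattern in question: two goroutines exchanging values over an unbuffered channel. Pinned to one OS thread, every rendezvous is a userland switch; under a 1:1 model each one would be a kernel context switch instead.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // one OS thread, so every handoff is a userland switch
	ch := make(chan int)  // unbuffered: a send blocks until a receiver is ready
	const n = 1_000_000

	go func() {
		for i := 0; i < n; i++ {
			ch <- i // parks this goroutine; the scheduler resumes the receiver
		}
		close(ch)
	}()

	start := time.Now()
	for range ch { // each iteration is one sender/receiver rendezvous
	}
	fmt.Printf("%d handoffs in %v\n", n, time.Since(start))
}
```

Run as-is, the whole loop stays in userspace; the claim under discussion is that replacing those userland switches with kernel ones makes this kind of code dramatically slower.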
It is precisely these benchmarks that led Ian to implement M:N in gccgo. If there are combinations where it is 1:1, that is either a new decision he has made in the last year, or (more likely) OS/ARCH combinations that he hasn't moved to M:N yet.
I have seen a similar attempt at 1:1 lightweight tasks in C++ that ran into this. Without the ability to preemptively move tasks between OS threads, it ran into performance problems. Programs that needed the speed in that model had to switch to a futures-style of programming.
> A typical server under load has more outstanding requests to answer than it can have OS threads.
First of all, I object to the use of "can" here. There was a time when this was true, but, as I said, that time has passed. There is no problem running millions of kernel threads today.
That being said, there are advantages to not doing that. Right now Go uses a network polling mechanism like epoll to implement network I/O without consuming threads. This is totally orthogonal to the 1:1 vs. N:M discussion. Go can use 1:1 scheduling while still performing asynchronous I/O behind the scenes, just as it does today.
The nature of the mapping between g's and m's does not influence the other {g, g, ...} -> {g0{m}} mapping used to implement network I/O.
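To show why that mapping is independent of scheduling, here is a hypothetical, Linux-only sketch of a netpoller (my own illustration, not the real runtime code; error handling and fd re-arming after EPOLLONESHOT are elided): blocked callers sleep on a per-fd channel and one dedicated poller thread wakes them, and nothing in it cares whether the callers are goroutines under M:N or full OS threads under 1:1.

```go
// Hypothetical netpoller sketch: one thread multiplexes readiness for
// everyone, independent of how g's map onto m's.
package netpoll

import (
	"sync"
	"syscall"
)

var (
	mu      sync.Mutex
	waiters = map[int]chan struct{}{} // fd -> channel its blocked reader sleeps on
	epfd    int
)

func Init() (err error) {
	epfd, err = syscall.EpollCreate1(0)
	return err
}

// WaitRead blocks the caller (a goroutine under M:N, an OS thread under
// 1:1 -- the mechanism is the same) until fd is readable.
func WaitRead(fd int) error {
	ch := make(chan struct{})
	mu.Lock()
	waiters[fd] = ch
	mu.Unlock()
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN | syscall.EPOLLONESHOT, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return err
	}
	<-ch
	return nil
}

// Poll runs on one dedicated thread and wakes whoever is blocked.
func Poll() {
	events := make([]syscall.EpollEvent, 64)
	for {
		n, err := syscall.EpollWait(epfd, events, -1)
		if err != nil {
			continue // EINTR etc.
		}
		for i := 0; i < n; i++ {
			fd := int(events[i].Fd)
			mu.Lock()
			if ch, ok := waiters[fd]; ok {
				delete(waiters, fd)
				close(ch)
			}
			mu.Unlock()
		}
	}
}
```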
> It is not uncommon in a Go server at Google to find a million or so goroutines.
No problem with that, they can continue to do so.
> If you want to have an OS thread that can also run C code, it needs a large enough stack to run C code. A million such stacks is not practical.
C code needs its own stack, but this is again orthogonal to scheduling. Right now there is an N:1 mapping {g, g, ...} -> {g0} which makes it easy to switch to the g0 stack. This would have to change to an N:M mapping {g, g, ...} -> {g0, g0, ...}, and Go code would have to acquire C stacks, just as it acquires any other resource it needs to use.
This is not expensive. In fact, C calls are expensive now precisely because there is a deep interaction between calling C code and the scheduler (which costs thousands of cycles). All these cycles would go away. In a 1:1 model all you'd have to do is acquire a C stack, which is very fast and happens in userspace in the uncontended case (the common case), probably less than a hundred cycles.
> You would still need an N:M pool for cgo.
As explained above, you still need an N:M pool, but it no longer needs to be integrated with the scheduler, making it much simpler to implement (and more performant).
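To illustrate what "acquire a C stack" could look like, here is a hypothetical sketch (the names and sizes are mine; nothing like this exists in the runtime today): a free pool of preallocated stacks, where the uncontended case is a cheap userspace grab rather than a scheduler handshake.

```go
// Hypothetical sketch of a scheduler-independent C-stack pool for cgo calls.
package cstack

import "sync"

const cStackSize = 1 << 20 // 1 MiB: assume this is enough for typical C code

// pool hands out fixed-size buffers standing in for C stacks. In the
// uncontended (common) case, Get and Put stay entirely in userspace.
var pool = sync.Pool{
	New: func() any { return make([]byte, cStackSize) },
}

// Acquire grabs a stack for the duration of a C call.
func Acquire() []byte { return pool.Get().([]byte) }

// Release returns the stack once the C call finishes.
func Release(s []byte) { pool.Put(s) }
```

A C call would then be: acquire, switch SP onto the buffer, call, switch back, release; none of which touches the scheduler.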
> Anywhere you have an unbuffered channel and block waiting for a value, a 1:1 model requires a context switch to pass the value, whereas M:N means the userland scheduler can switch goroutines on the OS thread.
Only if you naively implement channels as mutex-based queues.
There is still a runtime underneath that can switch stacks on different threads. I want to move general scheduling decisions out of the Go runtime and into the kernel, but pointwise, stateless, swap-of-control type of things can still happen synchronously in the Go runtime.
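For contrast, this is what I mean by the naive implementation: a toy one-slot channel (my own illustration) where every operation takes a lock and a contended operation sleeps on a condition variable, i.e. a futex wait in the kernel. Real Go channels avoid this by handing the value directly to a parked goroutine.

```go
// Toy one-slot channel built the naive way: a mutex plus a condition
// variable. Every contended operation can fall through to a futex wait,
// which is exactly the cost being discussed.
package naive

import "sync"

type Chan[T any] struct {
	mu   sync.Mutex
	cond *sync.Cond
	val  T
	full bool
}

func New[T any]() *Chan[T] {
	c := &Chan[T]{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

func (c *Chan[T]) Send(v T) {
	c.mu.Lock()
	for c.full {
		c.cond.Wait() // may park the OS thread in the kernel
	}
	c.val, c.full = v, true
	c.cond.Broadcast() // wake a receiver
	c.mu.Unlock()
}

func (c *Chan[T]) Recv() T {
	c.mu.Lock()
	for !c.full {
		c.cond.Wait()
	}
	v := c.val
	c.full = false
	c.cond.Broadcast() // wake a sender
	c.mu.Unlock()
	return v
}
```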
I do not believe the networking issue is orthogonal. I believe you will find that any communication between two OS threads is significantly slower than switching one thread between an active and a parked goroutine. Futexes are slow and spinlocks burn CPU. (This is exactly the case the C++ model I mentioned ran into, and why gccgo got an N:M model.)
But as I said I'm happy to be convinced. The easiest way to demonstrate it would be to move gccgo linux/amd64 back to 1:1 without hurting the channel benchmarks. You could use that as an argument that fanning out an epoll loop among threads can be made fast.
> First of all, I object to the use of "can" here. There was a time when this was true, but, as I said, that time has passed. There is no problem running millions of kernel threads today.
I object to the use of "can": just because we can doesn't mean we should.
Last I checked, creating a thread (on Windows) took 16 ms. Creating 1 million threads would therefore take 16,000 seconds, more than 4 hours. That's a hell of a startup time! =D
(And don't bother saying Linux is faster. It's not relevant unless it's by 4 orders of magnitude ;) ).
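If anyone wants to sanity-check those numbers on their own OS, here's a rough way to do it from Go (my own quick hack, not a rigorous benchmark): runtime.LockOSThread wires each goroutine to its own OS thread, and keeping them all blocked forces the runtime to actually create that many threads.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	const n = 5000 // stay under the runtime's default 10000-thread limit
	var ready sync.WaitGroup
	release := make(chan struct{})
	ready.Add(n)

	start := time.Now()
	for i := 0; i < n; i++ {
		go func() {
			runtime.LockOSThread() // wire this goroutine to a dedicated OS thread
			ready.Done()
			<-release // keep the thread occupied so the runtime must make more
		}()
	}
	ready.Wait()
	elapsed := time.Since(start)
	fmt.Printf("%d threads in %v (~%v each)\n", n, elapsed, elapsed/n)
	close(release)
}
```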
I should add: if you believe these are surmountable problems, please take it out of the forum and write a proposal. Everyone who works on the Go runtime would love to see it radically simplified. We put up with the complexity only because we believe it necessary.
> I should add: if you believe these are surmountable problems, please take it out of the forum and write a proposal.
Yes, I've been wanting to write a proposal for over 5 years now, if only there was more time in the world...
Coincidentally, I also have a method for implementing Go stacks (growing and shrinking) that does not involve copying stacks and avoids the hot-split problem.
Late edit:
In fact, this method would work particularly well for "growing" the stack to call C code in the 1:1 model proposed above.
It would make calling a C function just another function call with some marker on it (like we have NOFRAME and NOSPLIT now). And when we switch to a more ABI-compatible calling convention (which we want to do anyway, irrespective of all this), there would not even be a need for a C translator written in assembly any more.
No, Haskell can abstract across the callbacks/blocking choice and let you write straight-line code that works with either. I mean, that makes sense as the two are fundamentally the same thing at super-low levels anyway. Haskell just supports enough abstraction to allow you to do either with the same interface to higher-level code.
Interesting, could you say more about your desire for 1:1 threading in Go? Do you feel that there are few users who need a huge number of goroutines, or that 1:1 is sufficiently lightweight for many of those users?
I routinely write programs that require a huge number of goroutines, as do many Go programs. Good thing modern kernels deal with large numbers of threads just fine. People misjudge how cheap kernel threads really are, because there was a time when they were expensive. That time was long ago. Today, threads are very lightweight. For example, the Solaris kernel schedules interrupts as threads; they are really that lightweight.
There is a reason POSIX threading is implemented in the kernel on every operating system, even if they all started as N:M or 1:M implementations.
Related to this discussion, this talk on user-scheduled OS threads is really interesting. It's a shame no proposed implementation or experimental patches have appeared: https://www.youtube.com/watch?v=KXuZi9aeGTw
I think the bigger question is why they're using a C++ Thrift server. Haskell is good at transforming data; they should be able to use an OS socket and do the message decoding/encoding in pure Haskell.
> the bigger question is why they're using a C++ Thrift server
Because it's already there and it already works just fine.
It's so frustrating to see this sort of comment on nearly every post about a language outside the mainstream. It's a religious argument and not a productive one.
> Because it's already there and it already works just fine.
Well, clearly it didn't work just fine if its threading model was interacting poorly with that of their Haskell code. If there's a mismatch between those two components, it will continue to bite them in the future, and it's legitimate to ask whether moving the boundary would be a better way to solve the problem.
That was the solution tried last time, and it didn't work out well. What's holding it back is nothing Haskell-specific; it would just be wasted effort to reimplement code that is already solid and broadly used at Facebook.
I think the "why not write it all in Haskell" argument is a bit spurious. Only the engineers on the team knew best the advantages and disadvantages of the various approaches. Presumably there were very good reasons to integrate it the way that they did, whether driven by architecture, performance, knowledge, library, smallest-effective-change, or other reasons. It's clearly worked out very well in the most basic sense (it does the job). And on top of that, we now have bug fixes and other improvements submitted to ghc. Seems like a win-win.
Surely a bigger question is how a site that started out in PHP transformed into one building PHP compilers, JavaScript frameworks, Haskell, C++, etc.
They seem to have succeeded in creating a culture of coders/hackers who were opinionated and empowered enough to use their tool of choice, and to persuade others of its advantages.
This doesn't really happen in large-cap companies (it does happen on empowered teams, but even then only somewhat; I'm thinking of finance). And it speaks to the advantages.
This was covered in a previous post:
"Haskell is sandwiched between two layers of C++ in Sigma. At the top, we use the C++ thrift server. In principle, Haskell can act as a thrift server, but the C++ thrift server is more mature and performant. It also supports more features. Furthermore, it can work seamlessly with the Haskell layers below because we can call into Haskell from C++. For these reasons, it made sense to use C++ for the server layer."
https://code.facebook.com/posts/745068642270222/fighting-spa...
Wow, I didn't know this. Does anyone know which programs they use Haskell for? The article doesn't say.