Hacker News
RethinkDB internals: making coroutines fast (rethinkdb.com)
79 points by coffeemug on Dec 23, 2010 | 20 comments



Great stuff, good catch on the syscall! I've been tinkering with C coroutines as well recently but since our critical code runs in kernel space they ended up being more hassle than they're worth. I'm still using them in our (user space) testing sandbox.

As you mention malloc()ing stacks: you might want to allocate them using mmap() and MAP_ANONYMOUS instead. You can map the adjacent memory pages with appropriate protection to prevent stack overflows and as a result, memory corruption. I'm not aware of any drawbacks (malloc itself typically uses mmap above a certain size) but it certainly beats hoping your stacks are big enough. Less of an issue on 64-bit environments of course.
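For what it's worth, a minimal sketch of that approach (the helper names are made up for illustration, not taken from RethinkDB or libcoroutine): reserve the stack plus one extra page with mmap(), then mprotect() the lowest page so that overflowing the stack faults immediately instead of silently corrupting adjacent memory.

    /* Illustrative sketch only: allocate a coroutine stack with mmap()
     * plus a guard page below it. Assumes 4 KiB pages and an x86-style
     * downward-growing stack. */
    #include <stdio.h>
    #include <sys/mman.h>

    #define STACK_SIZE (256 * 1024)   /* usable stack */
    #define PAGE_SIZE  4096

    static void *alloc_coro_stack(void)
    {
        /* Reserve the stack plus one extra page below it. */
        void *base = mmap(NULL, STACK_SIZE + PAGE_SIZE,
                          PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;

        /* Make the lowest page inaccessible: a stack overflow now
         * raises SIGSEGV instead of scribbling over other memory. */
        if (mprotect(base, PAGE_SIZE, PROT_NONE) != 0) {
            munmap(base, STACK_SIZE + PAGE_SIZE);
            return NULL;
        }
        return (char *)base + PAGE_SIZE;   /* usable region above the guard */
    }

    static void free_coro_stack(void *stack)
    {
        munmap((char *)stack - PAGE_SIZE, STACK_SIZE + PAGE_SIZE);
    }

    int main(void)
    {
        void *stack = alloc_coro_stack();
        if (!stack) { perror("mmap"); return 1; }
        printf("coroutine stack at %p, guard page below\n", stack);
        free_coro_stack(stack);
        return 0;
    }

On 64-bit you can afford to over-reserve generously, since untouched anonymous pages cost only address space until they're first written.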


Thanks for the suggestion! I'm actually looking into doing something like that right now.


So, coroutines are as fast as callbacks and easier to program with... so why aren't they available in every language?

I understand that it is difficult to change JavaScript, but when you create a whole new framework, say node.js, why not provide coroutines?

Does it need a language construct to be efficient? Maybe; then have a look at the Icon programming language, where coroutines rule.


Personally, I view node.js as a step backwards. It's great to provide the idea of a single thread holding multiple connections, with high-performance non-blocking I/O within each thread; the problem is that, in their case, there's also a single thread per UNIX process with no primitives for communication between processes (unlike, e.g., Erlang/OTP). If you want good performance, you want an event loop per logical core. Completely independent processes may be acceptable for simple, stateless web apps, but with most anything else you risk losing a great deal of efficiency without the right primitives. For example, if you have on-disk state, then unless the threads are able to share memory (either by running within the same process or by using UNIX shm) you risk a situation where threads compete for resources like the OS page cache (provided you're not doing your own in-process caching and direct I/O, but even in that case you've now lost the ability to share a resource between any two connections without copying).

Some, e.g. Asana, have added fibers to JavaScript (in their case to V8): http://asana.com/blog/?p=49

The previous post from RethinkDB is quite interesting in terms of the motivation for coroutines: to me, this superficially looks like SEDA. Here, instead of stages in a pipeline (a thread pool per stage, the first stage processing events from epoll/kqueue, a thread per core in each pool, each thread holding state machines, communication between threads via a queue), you are using coroutines for clearer code.


I don't think you want an event loop per core, that's back into the realm of concurrent madness, and I don't think that's how Erlang works (Erlang folk: correct me if I'm wrong on that).

It's not a bad thing that data has to be copied to be shared between Erlang-style processes. That is part of the reason why things "just work" when you move that process to another machine, or data centre. If the implementation can be smart and "cheat" by sharing data in the same process that's fine, but it is an implementation detail and not a property of the system.

The way Erlang works is that you have a single event loop, and threads are spawned to handle I/O and such (the N:M threading model: N green threads are mapped onto M OS threads). When a process blocks, its execution is suspended.

Node employs an N:M threading model as well, except "processes" in Node are just functions. One big difference is that if you make a blocking call in Node, the whole process is blocked. There's only an event loop; no scheduler or anything that OS-like is involved. The Erlang model is clearly superior IMO, but Node is far more accessible (for better and worse).


> I don't think you want an event loop per core, that's back into the realm of concurrent madness, and I don't think that's how Erlang works (Erlang folk: correct me if I'm wrong on that).

If you don't care about performance, sure (there's nothing wrong with not caring about performance, e.g. low-volume web applications, where server-side JavaScript can shine, IMO). However, I can give you a command line you can run which will plot a very nice graph of throughput vs. number of selector threads in a specific system I've been working on.

Yes, it will bring you back to "concurrent madness". What you _want_ is primitives that make it possible to deal with that madness, not to handwave it away. In Erlang they're the actors themselves (optimized for efficient delivery of messages within a single node and remotely), supervision of processes that fail, ETS, and mnesia; in Java they're Doug Lea's beautiful concurrent collections (java.util.concurrent). You're assuming multithreading means pthreads or synchronized/notify/wait (i.e., Java before java.util.concurrent).

Note that there are two models for this: one is Erlang's, as well as traditional UNIX IPC -- message passing; the other is shared-memory multithreading with concurrent and lock-free collections (Java), which also goes nicely with the idea of minimizing mutable state (Haskell, Clojure, Scala). One is good for applications that optimize for worst-case latency (Erlang shines there); the other for applications that optimize for average throughput (the JVM shines there).


I'm getting an "Error establishing a database connection" on that post. Kind of ironic.

I agree about your point regarding IPC, but that's something they can always add later. It's unfortunate that they've gone the callback route instead of innovating like Asana has done with coroutines.


> I agree about your point regarding IPC, but that's something they can always add later.

Add some form of IPC as an afterthought? Sure. Create powerful primitives for IPC as done in Erlang/OTP? No.


Agreed again.


Thought I'd chime in and mention em-synchrony: EventMachine (reactor loop) + Fibers (coroutines).


Languages that implement efficient coroutines (e.g. Scheme, Lua) usually manage the C stack themselves (like "Stackless" Python). First-class coroutines are very similar to first-class continuations, and many of the same fundamental issues manifest.

Lua has one of the best coroutine implementations I've seen. Scheme has continuations which are convertible to coroutines, while Lua has coroutines which are convertible to (one-shot) continuations, with community insight on their use.
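To make the "manage the C stack" point concrete, here's a hedged sketch using the plain POSIX ucontext API (not how any of the languages or libraries above actually do it internally): each coroutine gets its own C stack, and switching is just saving and restoring machine context.

    /* Illustrative only: a toy coroutine using POSIX ucontext. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, coro_ctx;

    static void coro_body(void)
    {
        printf("in coroutine\n");
        swapcontext(&coro_ctx, &main_ctx);   /* "yield" back to main */
        printf("resumed\n");
    }                                        /* returning ends the coroutine */

    int main(void)
    {
        char *stack = malloc(64 * 1024);     /* the coroutine's own C stack */

        getcontext(&coro_ctx);
        coro_ctx.uc_stack.ss_sp = stack;
        coro_ctx.uc_stack.ss_size = 64 * 1024;
        coro_ctx.uc_link = &main_ctx;        /* where to go when coro_body returns */
        makecontext(&coro_ctx, coro_body, 0);

        swapcontext(&main_ctx, &coro_ctx);   /* "resume" the coroutine */
        printf("back in main\n");
        swapcontext(&main_ctx, &coro_ctx);   /* resume again */

        free(stack);
        return 0;
    }

A language runtime with first-class coroutines or continuations is essentially doing this bookkeeping for you, just without a fixed-size malloc'd stack per coroutine.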


Silent explained it all very succinctly. If you get confused about coroutines versus continuations versus generators, check out everybody's favorite coroutine manual: http://www.inf.puc-rio.br/~roberto/docs/MCC15-04.pdf


Lua's primary multitasking primitive is coroutines, and they work quite well. Preemptive threading support can be bolted onto the language, but the coroutine support is well thought-out and elegant, and doesn't suffer from the limitations of Python's generators (e.g., the inability to yield from within nested function calls).


Monocle (http://saucelabs.github.com/monocle/) solves a lot of the issues with using Python generators as coroutines.


"Adding fibers to v8: efficiency + clarity in SSJS": http://asana.com/blog/?p=49


I prefer Grand Central Dispatch, which fixes the callback model by providing closures and anonymous functions as an extension to the C language.

http://en.wikipedia.org/wiki/Grand_Central_Dispatch#Examples
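A minimal example of that style (standard libdispatch API; on non-Apple platforms this assumes clang -fblocks plus libdispatch and the Blocks runtime): the "callback" is just an anonymous block submitted to a queue.

    /* Illustrative GCD sketch: run work on a background queue and wait. */
    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        dispatch_queue_t background =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_group_t group = dispatch_group_create();

        /* The "callback" is an anonymous block (closure) handed to a queue. */
        dispatch_group_async(group, background, ^{
            printf("work done off the main thread\n");
        });

        /* Block until the submitted work has finished. */
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        dispatch_release(group);
        return 0;
    }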


This looks really cool. I have been messing around with coroutines a lot lately as well, testing libtask (http://swtch.com/libtask/) as well as libcoroutine. Is the source for the changes to libcoroutine available? I didn't see a link from the article.


We'd like to. The difficulty is that in order to get the performance improvements we had to integrate three different components: a few patches to libcoroutine, a patch to glibc, and pooling code that's tightly integrated with our asynchronous IO layer. A compelling patch would span three different licenses, involve patches to two libraries and re-engineering of the integrated piece to make it usable by the outside world. We'd like to do it, but for now this would be too much of a distraction.


I've blogged about co-routines before, and this post inspired me to write a little bit that contrasts RethinkDB with my own special-purpose DB that my startup uses.

http://formlis.wordpress.com/2010/12/24/co-routines/


Sounds great, and I love the idea that RethinkDB is following.

I don't know a lot about them though, and an extra benchmark against a comparable SSD installation of MySQL or Postgres running the same select query that achieved 1.5 million QPS would be pretty interesting.



