Preforking echo server in Python, inspired by Unicorn-is-Unix

sant0sk1 · on Oct 7, 2009

RyanMcGreal must not have taken too much time reading the linked article, because this is NOT Unicorn ported to Python. It is a port of the very simple example server written in Ruby in Ryan Tomayko's article about Unicorn.

See: http://news.ycombinator.com/item?id=865306

At the end of Jacob's post he states that he'll be spending his evening reading Unicorn's source. Not porting it.

RyanMcGreal · on Oct 7, 2009

You're quite right. I've corrected the title to reflect this.

jrockway · on Oct 7, 2009

I'm mostly commenting on this article's linked article, but...

Prefork is great if you have about two or three concurrent connections, and if you aren't using a dynamic language like Perl, Python, or Ruby. Otherwise, lightweight threads will serve you better.

One problem that people have (had?) with persistent preforked mod_perl apps is that copy-on-write also copies on reads for certain Perl structures. Say you have "my $foo = 42" and then you fork. The Sv associated with $foo is shared. But when the child says "if($foo eq ...)", the Sv is no longer the same; it's changed from an SvIV to a SvPVIV. No more sharing. (Maybe Python and Ruby are more clever here, but they pay a speed price on non-forked applications for this.)

Eventually you end up with two completely different memory images for the same app instead of the 1 copy of the parent shared with the child.

As for the "about two or three concurrent connections" part, I think you will find that "select" exists for a reason. ("select" is just as much a part of UNIX as "fork", BTW.) If you have a bunch of mostly idle filehandles to read from, select (or rather, its modern replacements) will perform much better than a bunch of forked processes. Imagine you are writing a push messaging server; do you really want one process for each of your 1 million users? No; you want a lightweight thread for each.

(Try this sometime; create 1,000,000 lightweight threads. Then create 1,000,000 processes that do nothing. Then note which one requires you to reboot your computer by yanking the power cable.)

blasdel · on Oct 7, 2009

  > copy-on-write also copies on reads for certain Perl structures

This is perfectly okay! Your scalar variables are a pittance compared to your language's runtime and all the bloaty libraries you've imported. Besides, in most of these instances, your memory footprint would grow (until GC) in single-process usage just the same.

  > Maybe Python and Ruby are more clever here

Python is better about keeping things immutable. Matz's Ruby pisses all over itself, it's fundamentally incapable of sharing any objects copy-on-write: http://news.ycombinator.com/item?id=865364

  > do you really want one process for each of your 1 million users?
  > No; you want a lightweight thread for each.

The hell you don't! Both of those options are the same shit in different costumes. Just because lightweight threads make keeping a stateful execution context for each client tractable on real-world OSes doesn't make it a good idea.

jrockway · on Oct 7, 2009

This is perfectly okay! Your scalar variables are a pittance compared to your language's runtime and all the bloaty libraries you've imported. Besides, in most of these instances, your memory footprint would grow (until GC) in single-process usage just the same.

Theoretically, this might be true, but it doesn't seem to work this way in the real world.

Both of those options are the same shit in different costumes. Just because lightweight threads make keeping a stateful execution context for each client tractable on real-world OSes doesn't make it a good idea.

I'm not sure what improvement you are offering. Lightweight threads have the space overhead of one structure, and almost no time overhead. (epoll is O(n) over active handles; so in most cases, constant time!)

blasdel · on Oct 7, 2009

I think you massively underestimate the benefit of sharing libraries between processes -- It seems to be perfectly normal in the Rails world to have app processes start at several hundred MB, before a single request!

I'm not sure how you think threads work. Of course Lightweight Threads have time overhead -- you still have to schedule them!

jrockway · on Oct 7, 2009

I think you massively underestimate the benefit of sharing libraries

Nope, I understand that this is incredibly efficient. The difference between C libraries and Perl libraries, though, is that Perl library data structures modify themselves as they are used. (The optree is immutable, that's about all.) UNIX/C shared objects are immutable, so it's easy to share them efficiently.

It is common for people to complain of preforked mod_perl processes using all their memory -- not immediately after the fork, but after the app has been steady-state after a while. Things like memoized functions are obvious causes of this behavior, but even Perl's internal data structures change as they are used, and cause the sharing level to decrease over time.

preforking works OK, but it is not perfect. (If you want one preforked process per core, that is fine. If you want one per client, and the process is reused for many clients, that's where I have seen problems.)

Another thing I like about writing lightweight-thread-based apps (or simply async event-based; same thing basically) is convenience -- when I start up my development webserver, the same process can also handle mDNS requests. So I can just click a button in my browser to visit my app; no messing around with making sure that something else isn't binding port 3000, or whatever. Things like job queue runners, scheduled tasks (for cleaning up expired sessions), and so on can also run this way.

btilly · on Oct 7, 2009

While you've made a lot of good points, the time overhead is insignificant. By default modern versions of Linux use a O(log(N)) scheduler, so going from a few thousand to a few million threads of execution only doubles the scheduling time. Furthermore if you go to an older kernel you can get the O(1) scheduler, meaning that scheduling is constant time no matter how many threads you have. (But the scheduling had worse characteristics, which is why you're probably better off with the less scalable scheduler.)

blasdel · on Oct 7, 2009

Here's a nickel kid, go try out your ridiculous proposition. I'll wait while you pull the power plug and restart...

btilly · on Oct 7, 2009

You have apparently misunderstood something here. Just because you are launching something that can scale to a million threads doesn't mean that it needs to spawn a million threads every time it spins up.

But if it does spin up a million threads over time, the performance of your scheduler won't be your bottleneck. There will be lots of other big ones (memory, stack space, etc), but not scheduling overhead.

jrockway · on Oct 7, 2009

Try Coro; the overhead per coroutine is very low. I have some numbers in a HN comment from a few weeks ago... let me dig that up.

Edit: here we are: http://news.ycombinator.com/item?id=794535

(Thanks, SearchYC. I found this post with a signle query!)

btilly · on Oct 7, 2009

I'm replying to me because I don't have a reply button for jrockway's response.

When you use the word "threads", most people are going to assume you mean OS threads because that is what most of the world means. Hence this subdiscussion with bladsel about having a lot of OS threads.

jrockway · on Oct 7, 2009

I think he thinks you think the alternative to lightweight threads is a process-per-connection. (I think you think that too.)

BTW, you can reply to posts without the reply button by clicking "link" and typing in the resulting box.

Yeah.

btilly · on Oct 7, 2009

If you think that, then you think wrong. I am fully aware that there are plenty of alternatives to that model. I wouldn't choose to use them in Perl, but they exist.

Which is why what I said earlier in http://news.ycombinator.com/item?id=866558 (back when I thought that by "lightweight thread" you meant something like a thread in the JVM) boils down to, "Here is the standard way to do it in Perl. If you want to take this other type of approach using a similar class of scripting language, I would recommend one of these alternate implementations of Ruby or Python."

Thanks for the tip on getting around the lack of a reply button.

btilly · on Oct 7, 2009

Um, no. Your advice is completely backwards.

Perl does NOT have a lightweight threading model. Period. Literally every time you spawn a thread, Perl copies all of its data structures to avoid sharing stuff that is not threadsafe. If your Perl developers try to tell you otherwise, they are incompetent. The standard way to build a high performance website in Perl is to use prefork and then put a reverse proxy in front of them in httpd accelerator mode. See http://perl.apache.org/docs/1.0/guide/strategy.html for an old, but still accurate, description of httpd accelerator mode if you don't know what it is.

Please note that Windows does not have good support for forking. Therefore I have never heard of a competent team of Perl developers choosing to try to deploy a high volume website on Windows.

To the best of my knowledge both Ruby and Python do support lightweight threading, unlike Perl, but both have a global interpreter lock that synchronizes very frequently, and severely limits scalability. I don't know what strategies are common in those languages, but personally I suspect that the right strategy on Unix with those languages is, like Perl, to use pre-fork with a reverse proxy in httpd accelerator mode.

However both Python and Ruby have implementations on other platforms, such as the JVM. I believe (verify this before relying on it) that those implementations support lightweight threads without the drawbacks of the C implementation. Therefore if you want to use lightweight threading with that class of scripting language, use one of those implementations.

jrockway · on Oct 7, 2009

Perl's lightweight threading is provided by Coro.

btilly · on Oct 7, 2009

Coro is just a cooperative asynchronous framework on par with, say, POE. You do not get to use multiple CPUs. You do not get to use a database. You do not get to use a normal multi-threaded webserver.

I would be suspicious of anyone seriously trying to create a high volume Perl website using Coro.

jrockway · on Oct 7, 2009

You are right about multiple CPUs. For that, fork and load-balance. Instead of each forked process handling one connection, it handles a few thousand.

If your app is CPU intensive, I have to wonder why you'd use Perl for that. You get a 2x speedup by using multiple cores, but a 50x speedup by switching to Haskell or Common Lisp for the critical section. (You could also use C or C++ or Java, but that's just being crazy.)

As for databases, most real databases have non-blocking interfaces; this means database queries won't stall your threads. (Postgres and BDB are known to work well. MySQL requires hacks.)

And yes, you don't get to use a normal multi-threaded webserver. I am not sure how that works.

This all scales quite well.

btilly · on Oct 7, 2009

Yes, you can use some databases. But I don't think you can use the classic DBI interface.

More problematic, though, is that a single poorly coded function call on a seldom hit page can seriously impact responsiveness for a large fraction of your website. With real threads or processes you can protect against a function with a memory leak with resource limits. (You can set that up BSD::Resource with Perl on Unix systems.) If you try that with Coro you risk taking down a large fraction of your website each time a bad function runs.

If you have a small team and absolutely trust their work, this works. But as you scale up in complexity, mistakes will happen. You will have problems. And your choice of cooperative multitasking will look worse and worse.

Cooperative multitasking is not a new idea. In fact it is usually the first thing people try. Windows through 3.1 (continued under the hood in the 9x line). Apple did it through OS 9. Ruby did it through the 2.8 line. Yet in every case people run into the same problems over and over again and conclude that they are better off with preemptive multitasking. Even badly done preemptive multitasking such as Windows 95 or Ruby 2.9.

Again. I would be suspicious of anyone trying to create a high volume website these days in Perl relying heavily on cooperative multitasking. I'm not going to say that they won't succeed. But they are setting themselves up for problems down the road.

Furthermore it isn't as if there is a real problem that needs solving here. A single decent server properly set up can serve enough dynamic content from a single machine to put you in the top few thousand sites. Buy more machines and you can scale as far as you want.

jrockway · on Oct 7, 2009

We have preemptive scheduling in Coro. See:

http://github.com/nothingmuch/coro

Also available from CPAN is EV::Loop::Async, which allows events to be handled even when Perl is busy. (It uses a POSIX thread for this.)

(The key to success with Coro is using the right libraries. You write what looks like blocking code, but the libraries make it non-blocking.)

Anyway, the end result is that you use a lot less memory to handle a lot more clients. This may not be an issue if every request is CPU-bound, but you'd be suprised how often your process is blocking on IO, and how many resources a process-per-connection model consumes.

Don't let this become another lecture on functional programming: http://www.perlmonks.org/index.pl?node_id=34786

btilly · on Oct 7, 2009

That looks cool, but it looks like that patch is not in the CPAN version. I'm not sure how much I'd trust it. Particularly if you'd loaded some badly behaved XS code. Or run a disaster RE. For instance I ran into one last week which took down Perl 5.8. Losing one mod_perl process occasionally was only an annoyance. Losing a good fraction of my site capacity would be much worse.

EV::Loop::Async lets you handle events, but won't solve the problem of, "I loaded an external library, and it didn't return control for 10 seconds."

Neither addresses the problem of protecting yourself against badly behaved functions that have a fast memory leak.

BTW you're assuming wrong when you assume that I'd be surprised at how often my processes are blocked on IO or how much resources they take. I am painfully aware of both factors. However it is easy to plan for that. I've personally seen 2 servers pump out a million dynamic pages/hour with real traffic on a website with only obvious optimizations. I know for a fact that the application code had memory leaks, bugs, and the occasional segfault. I'm happy to buy 4x the RAM to go with an architecture that makes those non-issues to the overall function of the website.

jrockway · on Oct 7, 2009

Maintaining a hacked-up-piece-of-shit is a different problem from starting from scratch and Doing Things Right. In the situation that you're in, you probably made the right decision -- throw RAM at the problem so you never have to think about it again.

When writing an app from scratch, though, you have some control over the quality of the code, and can aim to serve more users with less hardware. System administration is hard, and the less systems to administer, the better.

btilly · on Oct 8, 2009

Assuming that your codebase will continue to be a work of elegance is challenging. Particularly if you're loading CPAN modules that are written and maintained by other people to a different standard. Of course if you reject those CPAN modules, then what's the point of writing Perl?

But, you say, we'll just limit ourselves to high quality CPAN modules? The real standard ones that everyone uses? Surely nothing will go wrong?

Fine. Last week I ran into a segfault coming from the Template Toolkit triggering a regular expression bug in Perl. (I am waiting on the bug report until I get official permission to submit the patch with the bug. I'm careful about copyright these days...) That's about as standard as you can get. Assume that an extremely popular pure Perl text manipulation module on top of Perl works as documented and enjoy the core dump.

The moral is that unless you are personally writing the whole software stack you're using, you never know what will trigger a bug somewhere. And no sane web company is going to rewrite their whole software stack. (For the record the most painful bugs in the application I described previously were at the C level, and none of that code was touched by anyone in that organization.) However there are architectures that let you mitigate classes of problems before they come up. What is that protection worth to you? Given how much traffic you can get per server, what lengths do you need to go to to optimize?

ktharavaad · on Oct 7, 2009

If you want to know more about forking processes vs threading. You should look at TCPServer.py in the standard libary. You can look at the ForkingMixin class and the ThreadingMixin class.

A good Python server source code to read is the CherryPY WSGI server. You can read it in the web.py git directory here.

http://github.com/webpy/webpy/blob/master/web/wsgiserver/__i...

pwmanagerdied · on Oct 7, 2009

Anyone who knows the standard libraries could do this in a few minutes, this post is clutter. :(