A popular approach for Python web serving is to launch a number of "workers" (eg via gunicorn, etc), that hang around waiting to serve requests.
Each one of these workers, in code I was recently running (here), idled using ~250MB of non-shared memory, with about 40 workers needed to handle some fairly basic load. :(
Rewrote the code in Go. No need for workers (just using goroutines), and the whole thing idles using about 20MB of memory, completely replacing all those Python workers. o_O
This doesn't seem to be all that unusual for Python either.
In a forking model that shouldn’t be the case; I’d guess all the workers are loading and initializing things post-fork that could have been done pre-fork?
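For gunicorn specifically, the pre-fork loading the comment above describes is what its `preload_app` setting does. A minimal sketch of such a config (the app module name and worker count are illustrative, not from the thread):

```python
# gunicorn.conf.py -- illustrative example
# preload_app makes the master import the application *before* forking,
# so workers inherit its memory via copy-on-write instead of each
# re-importing and re-initializing everything themselves.
preload_app = True
workers = 4
bind = "127.0.0.1:8000"
wsgi_app = "myapp:app"  # hypothetical module:callable
```

Whether this actually saves much memory on CPython is exactly what the reply below questions.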
That said, Python devs are some of the worst engineers I encounter, so it’s not surprising things are being implemented incorrectly.
Last I heard, forking wasn’t a very effective memory-sharing technique on CPython because of the way it does reference counting: if you load things in before you fork, when the children start doing work they update the refcounts on all those pre-loaded objects and scribble all over that memory, forcing most of the pages to be copied anyway.
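CPython did later grow a partial mitigation for this: `gc.freeze()` (added in 3.7) moves every object allocated so far into a "permanent generation" that the cyclic collector never scans, so at least GC passes in the children stop dirtying pre-fork pages. It doesn't stop the refcount updates themselves, though. A minimal sketch, assuming Python 3.7+ on a Unix system (the data here is just a stand-in for expensive pre-fork initialization):

```python
import gc
import os

# Stand-in for expensive pre-fork setup: importing the framework,
# loading config, warming caches, etc.
big_table = {i: str(i) for i in range(100_000)}

# Move everything allocated so far into the permanent generation,
# so a GC cycle in a forked child won't touch (and dirty) these pages.
gc.freeze()

pid = os.fork()
if pid == 0:
    # Child: merely *using* big_table still bumps refcounts and
    # dirties pages -- gc.freeze() only keeps the collector from
    # making copy-on-write behavior worse.
    _ = len(big_table)
    os._exit(0)
else:
    os.waitpid(pid, 0)
```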