I've spent a lot of time writing and debugging multiprocessing code, so a few thoughts, besides the general idea that this looks good and I'm excited to try it:
- automatic restarting of workers after N tasks is very nice; I have had to hack that into places before because of (unresolvable) memory leaks in application code
- is there a way to attach a debugger to one of the workers? That would be really useful, though I appreciate the automatic reporting of the failing args (I also hack that in all the time)
- often, the reason a whole set of jobs is not making any progress is because of thundering herd on reading files (god forbid over NFS). It would be lovely to detect that using lsof or something similar
- it would also be extremely convenient to have an option that handles a Python MemoryError and scales down the parallelism in that case; this is quite difficult but would help a lot since I often have to run a "test job" to see how much parallelism I can actually use
- I didn't see the library use threadpoolctl anywhere; would it be possible to make that part of the interface so we can limit thread parallelism from OpenMP/BLAS/MKL when multiprocessing (see the sketch after this comment)? Nested thread pools like that also often cause core thrashing
Sorry for all the asks, and feel free to push back to keep the interface clean. I will give the library a try regardless.
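On the threadpoolctl point above: even without library support, the usual workaround is to cap the BLAS/OpenMP pools in a worker initializer. A rough sketch with plain concurrent.futures; heavy_numpy_task and the worker count are made up for illustration:

# Cap OpenMP/BLAS/MKL threads inside each worker so that
# n_workers * n_blas_threads does not oversubscribe the cores.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from threadpoolctl import threadpool_limits

_limiter = None

def _limit_threads():
    global _limiter
    # Keep the limiter object alive for the lifetime of the worker process.
    _limiter = threadpool_limits(limits=1)

def heavy_numpy_task(seed):
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((500, 500))
    return float(np.linalg.norm(a @ a.T))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4, initializer=_limit_threads) as ex:
        print(list(ex.map(heavy_numpy_task, range(8))))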
Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.
Someone downvoted you; I upvoted because I think you have a good point, but it would be nice to back it up. I think I agree with you, but I have only used concurrent.futures with threads.
I'll give some more detail. concurrent.futures is designed to be a new, consistent API wrapper around the functionality in the multiprocessing and threading libraries. One example of an improvement is the API for the map function. In multiprocessing, map only accepts a single argument for the function you're calling, so you have to either do partial application or use starmap. In concurrent.futures, map will pass through any number of arguments.
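A minimal sketch of that difference; the add function and inputs are invented for illustration:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool

def add(x, y):
    return x + y

if __name__ == "__main__":
    xs, ys = [1, 2, 3], [10, 20, 30]

    # multiprocessing: map passes a single argument, so pair the inputs and use starmap
    with Pool() as pool:
        print(pool.starmap(add, zip(xs, ys)))     # [11, 22, 33]

    # concurrent.futures: map accepts multiple iterables directly
    with ProcessPoolExecutor() as executor:
        print(list(executor.map(add, xs, ys)))    # [11, 22, 33]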
The API was designed to be a standard that could be used by other libraries. Previously, if you started with threads and then realised you were GIL-limited, switching from the threading module to the multiprocessing module was a complete rewrite. With concurrent.futures, the only thing that needs to change is:
with ThreadPoolExecutor() as executor:
    executor.map(...)
to
with ProcessPoolExecutor() as executor:
    executor.map(...)
The API has been adopted by other third-party modules too, so you can do Dask distributed computing with:
with distributed.Client().get_executor() as executor:
    executor.map(...)
or MPI with
with MPIPoolExecutor() as executor:
    executor.map(...)
> Previously, if you started with threads and then realised you were GIL-limited, switching from the threading module to the multiprocessing module was a complete rewrite
Is this true?
I've been switching back and forth between multiprocessing.Pool and multiprocessing.dummy.Pool for a very long time. Super easy, barely an inconvenience.
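For anyone unfamiliar, that swap is essentially just the import; a sketch, with work as a placeholder task:

from multiprocessing import Pool                        # process-backed
from multiprocessing.dummy import Pool as ThreadPool    # thread-backed, same API

def work(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(work, range(8)))
    with ThreadPool(4) as pool:
        print(pool.map(work, range(8)))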
I can think of a lot of reasons to use multiprocessing. I do it quite often. You can't always architect things to fit inside of a `with` context manager. Sometimes you need fine-grained control over when a process starts and stops, how you handle various signals, etc.
I think there is a time and a place for everything. I use concurrent.futures in certain situations when I need to utilize threads/procs to do work in a very rudimentary and naive way. But in more sophisticated systems you want to control startup/shutdown of a process.
TBH, assuming your stack allows it, gevent is my preferred mechanism for concurrency in Python. Followed by asyncio.
For places where I really need to get my hands dirty I will lean on manually controlling processes/threads.
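By way of illustration, a rough sketch of that kind of manual control: an explicit start, an orderly shutdown via a sentinel, and the parent owning signal handling. The worker loop here is invented, not taken from any library in this thread:

import signal
from multiprocessing import Process, Queue

def worker(tasks):
    # Let the parent decide what SIGINT means; the worker exits on a sentinel.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while True:
        item = tasks.get()
        if item is None:              # sentinel: orderly shutdown
            break
        print(f"processed {item}")

if __name__ == "__main__":
    tasks = Queue()
    proc = Process(target=worker, args=(tasks,))
    proc.start()                      # start exactly when we want
    for i in range(5):
        tasks.put(i)
    tasks.put(None)                   # request shutdown once the work drains
    proc.join(timeout=5)
    if proc.is_alive():
        proc.terminate()              # escalate if the worker hangs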
I was initially using concurrent.futures for a lot of things and just assumed some of my code wasn't very parallelizable when I saw it wasn't utilizing all my cores, but it was a speedup nonetheless. Then, when I experimented with multiprocessing, it gave me considerable speedups with full core usage. I was more happy than frustrated, switched everything I could, and got a benefit everywhere.
I usually test both when I write code nowadays, and concurrent.futures is useful in maybe 10% of my cases.
The particular pain point of multiprocessing in Python for me has been the limitations of the serializer. To that end, multiprocess, the replacement by the dill team, has been useful as a drop-in replacement, but I'm still looking for better alternatives. This seems to support dill as an optional serializer, so I'll take a look!
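In case it helps others, a hedged sketch of what that drop-in swap looks like: multiprocess mirrors the multiprocessing API but serializes with dill, so closures and lambdas that the standard pickler rejects can be shipped to workers. make_scaler is a made-up example:

from multiprocess import Pool  # pip install multiprocess

def make_scaler(factor):
    # A closure like this is not picklable with the standard-library pickler.
    return lambda x: x * factor

if __name__ == "__main__":
    triple = make_scaler(3)
    with Pool(4) as pool:
        print(pool.map(triple, range(5)))  # [0, 3, 6, 9, 12]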
I agree. The main problems aren't syntax; they are architectural: catching and retrying individual failures in a pool.map (sketched below), anticipating OOM with heavy tasks, and understanding the process lifecycle and the underlying pickle/IPC machinery.
All these are much more reliably solved with horizontal scaling.
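For the failure-handling point, a rough sketch of per-task error capture with a single retry pass, using concurrent.futures; the task function and retry policy here are invented:

from concurrent.futures import ProcessPoolExecutor, as_completed

def flaky_task(n):
    # Stand-in for a task that sometimes fails.
    if n and n % 3 == 0:
        raise ValueError(f"bad input {n}")
    return n * n

def run_with_retry(items, retries=1):
    results, pending = {}, list(items)
    with ProcessPoolExecutor() as ex:
        for _ in range(retries + 1):
            futures = {ex.submit(flaky_task, n): n for n in pending}
            pending = []
            for fut in as_completed(futures):
                n = futures[fut]
                try:
                    results[n] = fut.result()
                except Exception:
                    pending.append(n)   # keep the failing args for the next pass
            if not pending:
                break
    return results, pending

if __name__ == "__main__":
    ok, still_failed = run_with_retry(range(10))
    print(ok, still_failed)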
[edit] by the way, a very useful minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar https://tqdm.github.io/docs/contrib.concurrent/
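A minimal example of that process_map sugar; slow_square is a stand-in task, not anything from the library being discussed:

import time

from tqdm.contrib.concurrent import process_map

def slow_square(n):
    time.sleep(0.1)
    return n * n

if __name__ == "__main__":
    # Maps over a process pool and renders a tqdm progress bar automatically.
    results = process_map(slow_square, range(100), max_workers=4, chunksize=4)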
Suppose we want to know the status of the current task: how many tasks are
completed, how long before the work is ready? It's as simple as setting the
progress_bar parameter to True:
with WorkerPool(n_jobs=5) as pool:
    results = pool.map(time_consuming_function, range(10), progress_bar=True)
And it will output a nicely formatted tqdm progress bar.
To be honest, I don't think the two are similar at all.
Parallelizing across machines involves networks, and, well, that's why we have Jepsen, Byzantine failures, eventual consistency, network splits, leader election, and discovery: in short, a stack of hard problems that is in and of itself usually much larger than what you're trying to solve with multiprocessing.
True, the networking causes trouble. I usually rely on a communication layer that addresses those troubles. A good message queue makes the two paradigms quite similar. Or something like Dask (https://www.dask.org/). Having your single-machine development environment able to reproduce nearly all the bugs that arise in production is a wonderful thing.
Depends on the workflow. For one-off jobs or client tooling, parallelism makes sense in order to give rapid user feedback.
For batch pipelines that work on many requests, having a serial workflow has a lot of the advantages you mention. Serial execution makes the load more predictable and makes scaling easier to rationalize.
Except I'm a bit concerned that it might have too many features. E.g. rendering of progress bars and such. This should really be in a separate package and not referenced from this package.
The multiprocessing module might not be great, but at least the maintainers have always been careful about feature creep.
There has been a lot of movement toward running multiple copies of the interpreter in the same process space over the last several releases. I'm sure it'll come at some point.
I see that all the benchmarks have ProcessPoolExecutor either equal to or outperforming multiprocessing, and I do not find this to be the case in about 90% of my cases.
Also a niche question: is this able to overcome the inability to pickle a function defined within another function in order to multiprocess it?
I'm still excited to try this as I hadn't heard of it, and good multiprocessing is hard to come by.
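For context on the pickling question above, this is the limitation: the standard pickler serializes functions by qualified name, so a function defined inside another function cannot be sent to a worker process (dill-based serializers are the usual workaround). A minimal sketch:

from multiprocessing import Pool

def run():
    def inner(x):          # defined inside another function
        return x + 1
    with Pool(2) as pool:
        # Raises "Can't pickle local object 'run.<locals>.inner'"
        return pool.map(inner, range(4))

if __name__ == "__main__":
    run()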
> In short, the main reasons why MPIRE is faster are:
- When fork is available, we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes
- Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker
- Automatic task chunking
Yeah, I am struggling to figure out what the secret sauce of this library is, and whether that sauce introduces footguns down the line.
The standard multiprocessing library already uses fork on Linux. I once ran the same multiprocessing code on Linux and Windows, and there was a significant improvement in performance when running on Linux.
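For reference, the copy-on-write behaviour described in the quoted list can be sketched library-agnostically with plain multiprocessing and the fork start method (POSIX only); BIG_LOOKUP and lookup are invented names:

import multiprocessing as mp

BIG_LOOKUP = {}  # populated in the parent before the workers are forked

def lookup(key):
    # Reads the parent's memory pages; nothing is re-pickled per task.
    return BIG_LOOKUP[key]

if __name__ == "__main__":
    BIG_LOOKUP.update({i: i * i for i in range(1_000_000)})
    ctx = mp.get_context("fork")
    with ctx.Pool(4) as pool:
        print(pool.map(lookup, [10, 500_000, 999_999]))

One caveat: CPython's reference counting writes to object headers, so the sharing is never perfect; pages get copied as soon as the children touch the objects.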