Mpire: A Python package for easier and faster multiprocessing (github.com/sybrenjansen)
162 points by lnyan on Aug 11, 2023 | 46 comments



I've spent a lot of time writing and debugging multiprocessing code, so a few thoughts, besides the general idea that this looks good and I'm excited to try it:

- automatic restarting of workers after N tasks is very nice, I have had to hack that into places before because of (unresolvable) memory leaks in application code

- is there a way to attach a debugger to one of the workers? That would be really useful, though I appreciate the automatic reporting of the failing args (I also hack that in all the time)

- often, the reason a whole set of jobs is not making any progress is because of thundering herd on reading files (god forbid over NFS). It would be lovely to detect that using lsof or something similar

- it would also be extremely convenient to have an option that handles a Python MemoryError and scales down the parallelism in that case; this is quite difficult but would help a lot since I often have to run a "test job" to see how much parallelism I can actually use

- I didn't see the library use threadpoolctl anywhere; would it be possible to make that part of the interface so we can limit thread parallelism from OpenMP/BLAS/MKL when multiprocessing? This also often causes core thrashing (there's a sketch of my current workaround at the end of this comment)

Sorry for all the asks, and feel free to push back to keep the interface clean. I will give the library a try regardless.
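For context, here's a rough sketch of the workaround I reach for today, assuming the stdlib ProcessPoolExecutor and threadpoolctl (the function names are just placeholders); having MPIRE expose something like this would be great:

    # Limit BLAS/OpenMP threads inside each worker via a pool initializer,
    # so N worker processes don't each spin up a full set of BLAS threads.
    from concurrent.futures import ProcessPoolExecutor
    from threadpoolctl import threadpool_limits

    def _limit_threads():
        # Keep the controller alive for the lifetime of the worker process.
        global _limiter
        _limiter = threadpool_limits(limits=1)

    def heavy_task(x):
        return sum(i * i for i in range(x))  # stand-in for BLAS-heavy work

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4, initializer=_limit_threads) as ex:
            print(list(ex.map(heavy_task, [10_000] * 8)))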


Why does everyone compare against `multiprocessing` when `concurrent.futures` (https://docs.python.org/3/library/concurrent.futures.html) has been part of the standard library for 11 years? It's a much improved API and there are _almost_ no reasons to use `multiprocessing` any more.


Someone downvoted you, I upvoted because I think you have a good point but it would be nice to back it up. I think I agree with you, but I have only used concurrent.futures with threads.


I'll give some more detail. concurrent.futures is designed to be a new consistent API wrapper around the functionality in the multiprocessing and threading libraries. One example of an improvement is the API for the map function. In multiprocessing, it only accepts a single argument for the function you're calling, so you have to either do partial application or use starmap. In concurrent.futures, the map function will pass through any number of arguments.
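To make that concrete, here's a small sketch of the difference (the add function is just a placeholder):

    from multiprocessing import Pool
    from concurrent.futures import ProcessPoolExecutor

    def add(a, b):
        return a + b

    if __name__ == "__main__":
        # multiprocessing: Pool.map passes a single argument, so multi-argument
        # functions need starmap with pre-zipped tuples (or functools.partial).
        with Pool(4) as pool:
            print(pool.starmap(add, [(1, 2), (3, 4), (5, 6)]))

        # concurrent.futures: map takes one iterable per argument, like builtin map.
        with ProcessPoolExecutor(4) as ex:
            print(list(ex.map(add, [1, 3, 5], [2, 4, 6])))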

The API was designed to be a standard that could be used by other libraries. Before, if you started with threads and then realised you were GIL-limited, switching from the threading module to the multiprocessing module was a complete change. With concurrent.futures, the only thing that needs to change is:

  with ThreadPoolExecutor() as executor:
     executor.map(...)
to

  with ProcessPoolExecutor() as executor:
     executor.map(...)
The API has been adopted by other third-party modules too, so you can do Dask distributed computing with:

  with distributed.Client().get_executor() as executor:
     executor.map(...)
or MPI with

  with MPIPoolExecutor() as executor:
     executor.map(...)
and nothing else need change.

This is why I chose to use it to teach my Parallel Python course (https://milliams.com/courses/parallel_python/).


> Before, if you started with threads and then realised you were GIL-limited, switching from the threading module to the multiprocessing module was a complete change

Is this true?

I've been switching back and forth between multiprocessing.Pool and multiprocessing.dummy.Pool for a very long time. Super easy, barely an inconvenience.
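For anyone who hasn't seen it: multiprocessing.dummy wraps the threading module behind the same Pool API, so the swap really is one import (minimal sketch):

    from multiprocessing import Pool            # process-backed
    # from multiprocessing.dummy import Pool    # thread-backed: same API, swap the import

    def work(x):
        return x * x

    if __name__ == "__main__":
        with Pool(4) as pool:
            print(pool.map(work, range(10)))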


I can think of a lot of reasons to use multiprocessing. I do it quite often. You can't always architect things to fit inside of a `with` context manager. Sometimes you need fine-grained control over when the process starts and stops, how you handle various signals, etc.


The with context manager doesn't seem mandatory. It seems mostly like a convenience to call executor.shutdown() implicitly when the block is done.


I think there is a time and a place for everything. I use concurrent.futures in certain situations when I need to utilize threads/procs to do work in a very rudimentary and naive way. But in more sophisticated systems you want to control startup/shutdown of a process.

TBH, assuming your stack allows it, gevent is my preferred mechanism for concurrency in Python. Followed by asyncio.

For places where I really need to get my hands dirty I will lean on manually controlling processes/threads.


Agreed, concurrent.futures is a great stdlib module.

Too bad it has a somewhat odd name, which doesn't help newbies guess what it really does.

But in almost all cases, it can replace multiprocessing.


I was initially using concurrent.futures for a lot of things and just assumed some of my code wasn't very multiprocessable when I saw it wasn't utilizing all my cores, but it was a speedup nonetheless. Then when I experimented with multiprocessing it gave me considerable speedups with full core usage. I was more happy than frustrated, switched everything I could, and got a benefit everywhere.

I usually test both when I write code nowadays, and concurrent.futures is useful in maybe 10% of my cases.


It's somewhat like metaclasses. As the joke goes: If you have to ask, then you don't need metaclasses.


The particular pain point of multiprocessing in Python for me has been the limitations of the serializer. To that end, multiprocess, the replacement by the dill team, has been useful as a drop-in replacement, but I'm still looking for better alternatives. This seems to support dill as an optional serializer so I'll take a look!
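For example, standard multiprocessing chokes on lambdas and locally defined functions, while the dill-based multiprocess handles them with the same Pool API (a small sketch):

    # Standard multiprocessing would fail to pickle this lambda;
    # multiprocess serializes it with dill and the code otherwise looks identical.
    import multiprocess as mp

    if __name__ == "__main__":
        with mp.Pool(4) as pool:
            print(pool.map(lambda x: x * x, range(10)))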


Multiprocessing is great as a first-pass parallelization, but I've found debugging it to be very hard, especially for junior employees.

For languages like Python, it seems much easier to follow when you can push everything into horizontally scaled single processes.


I agree. The main problems aren't syntax; they are architectural: catching and retrying individual failures in a pool.map, anticipating OOM with heavy tasks, understanding the process lifecycle and the underlying pickle/IPC.

All these are much more reliably solved with horizontal scaling.

[edit] by the way, a very useful minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar https://tqdm.github.io/docs/contrib.concurrent/
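For reference, it's roughly this (a minimal sketch):

    from tqdm.contrib.concurrent import process_map

    def work(x):
        return x * x

    if __name__ == "__main__":
        # Runs work() in a process pool and shows a tqdm progress bar as results arrive.
        results = process_map(work, range(1000), max_workers=4, chunksize=10)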


From the linked Mpire readme:

    Suppose we want to know the status of the current task: how many tasks are
    completed, how long before the work is ready? It's as simple as setting the 
    progress_bar parameter to True:

        with WorkerPool(n_jobs=5) as pool:
            results = pool.map(time_consuming_function, range(10),
                               progress_bar=True)

    And it will output a nicely formatted tqdm progress bar.


How is coordinating between different machines any different than coordinating between different processes?

A multiprocessing implementation is a good prototype for a distributed implementation.


To be honest, I don't think the two are similar at all.

Parallelizing across machines involves networks, and well, that's why we have Jepsen, and byzantine failures, and eventual consistency, and net splits, and leader election, and discovery - in short, a stack of hard problems that in and of itself is usually much larger than what you're trying to solve with multiprocessing.


True, the networking causes trouble. I usually rely on a communication layer that addresses those troubles. A good message queue makes the two paradigms quite similar. Or something like Dask (https://www.dask.org/). Having your single-machine development environment able to reproduce nearly all the bugs that arise in production is a wonderful thing.
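A hedged sketch of what I mean with Dask (the scheduler address below is made up): the exact same code runs against a local cluster during development and against a remote scheduler in production.

    from dask.distributed import Client

    def work(x):
        return x * x

    if __name__ == "__main__":
        # Client() spins up a LocalCluster of worker processes for development.
        # In production you'd point it at the real scheduler instead, e.g.
        # Client("tcp://scheduler.internal:8786") -- hypothetical address.
        with Client() as client:
            futures = client.map(work, range(100))
            print(sum(client.gather(futures)))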


Parsl has quite good debugging facilities built in, which include automatic logging and visualizations.

https://parsl.readthedocs.io/en/stable/faq.html

https://parsl.readthedocs.io/en/stable/userguide/monitoring....


Depends on the workflow. For one-off jobs or client tooling, parallelism makes sense for rapid user feedback.

For batch pipelines that work on many requests, having a serial workflow has a lot of the advantages you mention. Serial execution makes the load more predictable and makes scaling easier to reason about.


Or just use numpy's arrays, which have their own integrated multiprocessing.


Another good library for concurrency and parallel tasks is futureproof:

https://github.com/yeraydiazdiaz/futureproof

> concurrent.futures is amazing, but it's got some sharp edges that have bit me many times in the past.

> Futureproof is a thin wrapper around it addressing some of these problems and adding some usability features.


I often use lox for this sort of thing. It can use threads or processes, and has a very ergonomic API.

https://github.com/BrianPugh/lox


Thanks for sharing, this really looks promising for what I am looking for.


Some potential issues with Python multiprocessing: https://blog.mapotofu.org/blogs/python-multiprocessing/. COW is quite tricky. BTW, most of the related official Python docs don't mention the behaviour under ‘spawn’.


I've written a very tiny multiprocessing pipeline in Python. It's documented.

I've actually never made use of it but at the time I got a bit obsessed and wanted to write it. It does seem to work as expected.

It's highly hackable as it is only a single file and a couple of classes.

Maybe it's useful to someone; here's the link: https://github.com/lliendo/SimplePipeline


Very cool.

Except I'm a bit concerned that it might have too many features. E.g. rendering of progress bars and such. This should really be in a separate package and not referenced from this package.

The multiprocessing module might not be great, but at least the maintainers have always been careful about feature creep.


How is this different from ray.io?


Ray is parallelism across machines, this is only across cores.


Ray is cross core and cross machine.
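For anyone who hasn't used it, the core Ray API looks roughly like this (a minimal local-only sketch); the same code scales out when ray.init() is pointed at a cluster:

    import ray

    ray.init()  # local by default; ray.init(address="auto") would join an existing cluster

    @ray.remote
    def work(x):
        return x * x

    # .remote() schedules tasks across cores (or machines); ray.get collects the results.
    futures = [work.remote(i) for i in range(8)]
    print(ray.get(futures))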


Is ray faster across cores than stdlib multiprocessing is?


Probably not always, but in general, yes.


Why has Python never added something like Web Workers/isolates? That seems like the obvious thing to do, but they only have multiprocessing hacks.


It sort of has, but it's a work in progress.

https://lwn.net/Articles/820424/


There has been lots of movement over the last several releases towards running multiple copies of the interpreter in the same process space. I'm sure it'll come at some point.


I see that all the benchmarks have ProcessPoolExecutor either equal to or outperforming multiprocessing, and I do not find this to be the case in about 90% of my cases.

Also, a niche question: is this able to overcome the inability to pickle a function defined within another function, in order to multiprocess it?

I'm still excited to try this as I hadn't heard of it, and good multiprocessing is hard to come by.


Why is this faster than the stdlib? What does it do to achieve better performance?


It's in the readme of the GitHub project.

> In short, the main reasons why MPIRE is faster are:

    - When fork is available we can make use of copy-on-write shared objects, which reduces the need to copy objects that need to be shared over child processes

    - Workers can hold state over multiple tasks. Therefore you can choose to load a big file or send resources over only once per worker

    - Automatic task chunking
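For comparison, the stdlib way to get the "load once per worker" behaviour is a pool initializer that stashes the heavy resource in a module global; a sketch of that pattern (not MPIRE's API, and big_file.txt is a placeholder):

    from multiprocessing import Pool

    _big_data = None  # populated once per worker process by the initializer

    def _init_worker(path):
        global _big_data
        with open(path) as f:      # loaded once per worker, reused across tasks
            _big_data = f.read()

    def task(i):
        return len(_big_data) + i

    if __name__ == "__main__":
        with Pool(processes=4, initializer=_init_worker,
                  initargs=("big_file.txt",)) as pool:
            print(pool.map(task, range(8)))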


COW can come back and bite you by causing hard-to-predict runtimes.

Your code goes down a rarely used branch and suddenly a large object gets copied.


Isn’t this given “for free” by the fact that it’s fork, even in standard multiprocessing? What does the library do extra?


It doesn't do much extra I guess.

In standard multiprocessing, all arguments are pickled and pushed to a queue for processes in the pool to use.

To pass heavy arguments, the trick to using CoW was to place them in global variables before creating the pool.

My understanding from Mpire is that they do the same thing, but expose a `shared_objects` parameter to make it less hacky than global variables.

I guess their benchmarks compare against pickling arguments, not against using global variables/CoW, which is why they boast a performance increase.
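A sketch of that global-variable trick with plain multiprocessing (under fork, the workers see big_data through copy-on-write instead of it being pickled per task); as I understand the readme, MPIRE's shared_objects hands the same data to the task function as an argument instead:

    import numpy as np
    from multiprocessing import get_context

    big_data = None  # set in the parent before forking, inherited via copy-on-write

    def task(i):
        # No pickling of big_data: each forked worker reads the parent's memory pages.
        return float(big_data[i].sum())

    if __name__ == "__main__":
        big_data = np.random.rand(1000, 10_000)
        ctx = get_context("fork")   # the trick only applies to the fork start method
        with ctx.Pool(4) as pool:
            print(pool.map(task, range(8)))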


Yeah, I am struggling to figure out what the secret sauce of this library is, and whether that sauce introduces footguns down the line.

The multiprocessing stdlib uses fork on Linux already. I once ran the same multiprocessing code on Linux and Windows, and there was a significant improvement in performance when running on Linux.


They're deprecating fork in 1 or 2 versions; one of the main issues with it is that it copies locks across processes, which can cause deadlocks.


Would anyone be in a position to comment on how this compares to Dask?


Always dreamed of multiprocessing with tqdm, this is great


Ah darn, was hoping for some MPI Python interface.



