I agree. The main problems aren't syntax; they're architectural: catching and retrying individual failures in a pool.map, anticipating OOM with memory-heavy tasks, and understanding the process lifecycle and the underlying pickle/IPC machinery.
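
To make the first point concrete, here's a minimal sketch of per-item error handling with the stdlib (flaky_task is a made-up stand-in for real work):

    import multiprocessing
    import random

    def flaky_task(x):
        # Stand-in for real work that fails intermittently.
        if random.random() < 0.1:
            raise RuntimeError(f"transient failure on {x}")
        return x * x

    def safe_call(x):
        # Catch per-item failures so one bad input doesn't abort the whole map.
        try:
            return flaky_task(x)
        except Exception as exc:
            return exc

    if __name__ == "__main__":
        inputs = list(range(20))
        with multiprocessing.Pool(processes=4) as pool:
            results = pool.map(safe_call, inputs)
        # Collect the inputs that failed and retry just those.
        retry = [x for x, r in zip(inputs, results) if isinstance(r, Exception)]
        if retry:
            with multiprocessing.Pool(processes=4) as pool:
                retried = pool.map(safe_call, retry)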

All these are much more reliably solved with horizontal scaling.

[edit] By the way, a very useful bit of minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar: https://tqdm.github.io/docs/contrib.concurrent/
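
For reference, basic usage looks like this (square is just a placeholder task; process_map wraps concurrent.futures.ProcessPoolExecutor and tqdm):

    from tqdm.contrib.concurrent import process_map

    def square(x):
        return x * x

    if __name__ == "__main__":
        # Renders a progress bar while mapping across worker processes.
        results = process_map(square, range(10000), max_workers=4, chunksize=100)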


From the linked MPIRE readme:

    Suppose we want to know the status of the current task: how many tasks are
    completed, how long before the work is ready? It's as simple as setting the 
    progress_bar parameter to True:

        from mpire import WorkerPool

        with WorkerPool(n_jobs=5) as pool:
            results = pool.map(time_consuming_function,
                               range(10), progress_bar=True)

    And it will output a nicely formatted tqdm progress bar.


How is coordinating between different machines any different than coordinating between different processes?

A multiprocessing implementation is a good prototype for a distributed implementation.


To be honest, I don't think the two are similar at all.

Parallelizing across machines involves networks, and that's why we have Jepsen, Byzantine failures, eventual consistency, network partitions, leader election, and service discovery - in short, a stack of hard problems that is usually much larger in and of itself than whatever you were trying to solve with multiprocessing.


True, the networking causes trouble. I usually rely on a communication layer that addresses those troubles: a good message queue makes the two paradigms quite similar, as does something like Dask (https://www.dask.org/). Having a single-machine development environment that can reproduce nearly all the bugs that arise in production is a wonderful thing.
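
As a sketch of that last point with Dask (the production scheduler address below is hypothetical; only the Client construction changes between dev and prod):

    from dask.distributed import Client, LocalCluster

    def process(item):
        # Placeholder for the real per-item work.
        return item * 2

    if __name__ == "__main__":
        # Local development: an in-process cluster exercising the same
        # scheduler code paths as a real deployment.
        client = Client(LocalCluster(n_workers=4))
        # In production this would be e.g. Client("tcp://scheduler-host:8786").
        futures = client.map(process, range(100))
        results = client.gather(futures)
        client.close()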



