Developing a computational pipeline using the asyncio module in Python 3 (pythonsandbarracudas.com)
57 points by jaxondu on Dec 1, 2015 | hide | past | favorite | 12 comments



Using a thread pool with asyncio doesn't buy you any parallelism; you have to use a process pool, which is fine for a data processing pipeline. Asyncio does not let you bypass the GIL, but it does vastly reduce the cost of spinning up absurd numbers of sockets and the like.
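A minimal sketch of the process-pool version, assuming plain CPU-bound Python work (the `cpu_bound` function here is a made-up stand-in, not anything from the article):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # pure-Python work holds the GIL the whole time,
    # so a thread pool could not run two of these in parallel
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # each task runs in its own interpreter, with its own GIL
        results = list(pool.map(cpu_bound, [100_000, 200_000, 300_000]))
    print(results)
```

The `__main__` guard matters: with the "spawn" start method the module is re-imported in every worker, and an unguarded pool would fork-bomb itself.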


I saw this article, enjoyed it, and then said to myself "Now let me go read the inevitable well-meaning but misguided comments about the GIL." And sure enough, the very top comment is a well-meaning but misguided comment about the GIL.

The author of this article is talking about CPU-intensive tasks that involve C extensions. Such extensions typically release the GIL during their execution, which means that you can effectively parallelize such workloads in a thread pool. (He explicitly mentions this about NumPy/SciPy, but most other CPU-intensive extension libraries behave the same way.) Heavy users of NumPy use thread-level parallelism very frequently.


Yep. I thought I was missing something, since he was talking about CPU-bound tasks and threads. If you are using C extensions, then it is wasteful to spin up additional interpreters, and you can also get fun bugs like NumPy reusing the same random seed in child processes!


You can use a ProcessPoolExecutor and have a nice async/await interface to it.


It would be nice if someone who groks the async "with" statement could add proper opcode support for it in pycdc. Better to do it when you have spare time than when you've clobbered an important .py file and want to recover it from the .pyc file ASAP. (pycdc can be a life saver.)

https://github.com/zrax/pycdc/issues/70


Isn't this example just like map/reduce in Python?

I guess it is more like parallel execution than async, since I associate async with small sleeping costs and cheap task switching. How does the GIL play into this?


In asyncio and in most other asynchronous frameworks for Python, threads aren't usually involved.

Your Python interpreter is doing one thing at a time. However, a function can get to a statement that needs to wait for something outside of the interpreter -- usually I/O, but it could also be a timer.

What happens at that point isn't thread-switching. The function gets suspended and control goes elsewhere in the program, just like if you yielded from a generator. (It's the same mechanism.) The function can be woken up again by feeding it the data it was waiting for -- the asyncio main loop is responsible for this.
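The interleaving is easy to see in a toy example (this sketch assumes Python 3.7+ for `asyncio.run`; the `step` coroutine and `order` list are made up for illustration):

```python
import asyncio

order = []

async def step(name, delay):
    order.append(name + ":start")
    await asyncio.sleep(delay)   # suspend here; the loop runs other work
    order.append(name + ":done")

async def main():
    # both coroutines make progress in a single thread
    await asyncio.gather(step("a", 0.02), step("b", 0.01))

asyncio.run(main())
print(order)  # both coroutines start before either finishes
```

Both coroutines reach their `await` before either resumes, and "b" resumes first because its timer fires first: no threads, just the loop handing control back and forth.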

If you actually want your processor to do more things at the same time, though, asyncio's model of asynchronous computing isn't going to do it. Python programmers are afraid of threads (I mean, they have a lot of disadvantages and not much benefit in Python, due to the GIL), so they'll tend to use multiple processes for that, instead of threads.

EDIT: But at this point I realize you're asking about a much more specific thing in the article. This article is asking you to not be afraid of combining threads and asyncio. It's suggesting that a useful asynchronous thing you can do is to spawn a thread, perform a computation in it, and wait for the result.

At this point you do need to worry about the GIL. C extensions can release the GIL, so the article suggests you use one of those (NumPy).
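A sketch of that pattern, substituting stdlib `zlib` for NumPy so it runs anywhere (zlib's C code also releases the GIL while compressing large buffers; the `compress` wrapper and payload sizes are made up, and `asyncio.run`/`get_running_loop` assume Python 3.7+):

```python
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress(payload):
    # zlib, a C extension, releases the GIL while it works, much as
    # NumPy does during array math, so these threads can overlap on cores
    return zlib.compress(payload)

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        payloads = [bytes(200_000) for _ in range(4)]
        compressed = await asyncio.gather(
            *(loop.run_in_executor(pool, compress, p) for p in payloads))
    print([len(c) for c in compressed])

asyncio.run(main())
```

With pure-Python functions this would buy you nothing; the whole trick rests on the extension dropping the GIL for the duration of the call.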


Or you can use Numba to write GIL-free imperative code as well.


Looks like everybody is in love with async/await in Python :)


I'm not sure that's true. Some people weren't happy about the PEP. Few people are using it in any case, so most are either still unaware of it or haven't used it yet themselves.


I'm very excited to start using it. It will be the thing that finally pushes me to Python 3.

>Few people are using it in any case

Probably true, but it's only been out for 2 months now, and most people who needed this feature are either using an ugly workaround (guilty) or have moved on to other languages that already had this feature. This time, I'm glad Python joined the crowd.


Yeah. I've played around with asyncio and aiohttp in python 3.5 for a crawler and loved it. But then I ran into dependency hell down the line manipulating the data, so that was that (for now).



