BrainFuck inspired scheduler successfully replaced the Python GIL (python.org)
103 points by janitha on March 26, 2010 | 34 comments



To clarify a little, this patch does not eliminate the GIL; it just schedules the next thread to acquire the GIL using the BFS scheduling algorithm.
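(For the curious, the BFS idea is that each thread waiting for the GIL carries a virtual deadline derived from its priority, and the earliest deadline wins. A rough Python-flavored sketch of the concept, not the actual patch, which is C, and with made-up constants:)

    import time

    class Waiter(object):
        def __init__(self, nice):
            # hypothetical scaling: a lower "nice" means an earlier
            # deadline; the real patch computes this in C from the
            # thread's priority and a timeslice
            self.deadline = time.time() + 0.005 * (1 + nice)

    def pick_next(waiting):
        # of all threads waiting for the GIL, the one with the
        # earliest virtual deadline acquires it next
        return min(waiting, key=lambda t: t.deadline)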

Also, and perhaps more importantly, this has not been incorporated into any version of Python. It is just a patch on the bug tracker, and realistically I doubt it has much chance of being accepted.


If he's fixed the bugs and gotten it to be portable (at least to POSIX), then why shouldn't it get adopted? Just look at those benchmark results! 259ms per loop instead of the next best of 1.25s. If a real app gets improvements from this, it should be a no-brainer.
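(For context, the benchmark on the tracker looks like the standard CPU-bound-threads test; something along these lines, though the exact loop and counts may differ:)

    from threading import Thread

    def count(n):
        while n > 0:
            n -= 1

    # two CPU-bound threads contending for the GIL; on the stock
    # interpreter this can be slower than calling count() twice
    # sequentially, which is the pathology the patch targets
    t1 = Thread(target=count, args=(10000000,))
    t2 = Thread(target=count, args=(10000000,))
    t1.start(); t2.start()
    t1.join(); t2.join()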


I'd wait to see how this works with a real workload as opposed to his contrived example. It looks like it might be good, but it definitely needs to be better tested.


Glancing at the patch, it looks like it should work on both POSIX and Win32.


At least the patch as first proposed suffered from all-the-world-is-Linux (see the discussion of CLOCK_THREAD_CPUTIME_ID)...


You have to start with one platform. You then make it work on the others.

If I write something that initially only runs on IRIX and then make it work on other OSs, it would be unfair to accuse it of all-the-world-is-IRIX thinking. It just happens the guy had a Linux box on which to refine his idea to the point of working, unfortunately using facilities other OSs don't offer in the same way.


It was based off a piece of the Linux kernel... I'd say it isn't so much all-the-world-is-Linux as Linux-is-what-was-available.


Assuming it fixes your app, instead of using "python" to run it, you just start using his version. The joys of open source.


The problem then is that you now have to maintain your own fork of Python.

Again, the joys of open source ;-)


This is easier than it sounds.


Well, it would be easier than it sounds, but a bit of Googling reveals that while Python plans on moving to Mercurial, it's still in SVN.

Still, git-svn can solve that problem pretty well.
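Something like this, assuming the old svn.python.org trunk URL is still the right one:

    git svn clone http://svn.python.org/projects/python/trunk cpython
    cd cpython
    # apply the GIL patch on a local branch, then follow upstream with:
    git svn rebase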


If it's easy, chances are many people will participate and the fork will take on a life of its own. If not, you will have to merge back the patch from time to time.

It's not a terrible thing if the parts the patch changes don't get changed often. If they do, it could quickly turn into a nightmare.


I program in Python (Django) full-time. It's my job. I'm the only developer at my company who knows C.

I'm sorry, but mere mortals who just want to get stuff done are not going to maintain their own fork of an entire programming language. I don't know what alternative reality you've been spending your time in (academia? FSF employee?) but it's simply not going to happen unless someone else takes up the burden and makes regular public releases.

Notice that usage of CK patches by Linux users evaporated as soon as he went on software sabbatical.

Open source is not driven by crowd sourcing or collective effort, it's driven by a small number of heroes willing to devote time in order to benefit everyone else.


*Open source is not driven by crowd sourcing or collective effort, it's driven by a small number of heroes willing to devote time in order to benefit everyone else.*

I agree completely.

It is worth noting that this is true of most crowd-sourced projects, though. A small cadre of devotees puts in heroic effort, then a larger group contributes now and then, and everyone else contributes either nothing or some money.


Wow, I was excited and confused for a second; I thought the impossible had been done! It's fine that it hasn't been, though; I am happy with Twisted Spread for my multi-process RPC needs.


The title is wrong. The Brain Fuck Scheduler is not related to the brainfuck language.


*The scheduler is a simplified implementation of the recent kernel Brain Fk Scheduler by the Linux hacker Con Kolivas.*

Not as fun now, is it? Kolivas is a leader in scheduling; he can name his hacks after whatever joke language is out there and that won't make them any less stellar.


I had totally missed Con's return from self-imposed exile. Very exciting. I loved his previous scheduler work. Glad to see him back at it.


As far as I understand, it just changes the way thread scheduling works, but doesn't make Python "properly multithreaded". That means there's still only one active non-native-extension thread running at any time. Could someone confirm this?

Edit: I guess janzer confirmed this by posting at the same time.


No, Python doesn't work like that even with the traditional GIL. The problem is that even when you have multiple OS threads, they all end up competing for the same lock which kills throughput (but not as badly as the fully-serialized scenario you described). By using a scheduler the locking order can at least be controlled a bit more to improve throughput. [As far as I know; been a few years since I dug through CPython.]


What do you mean by "not as badly as the fully-serialized scenario"? I thought that Python threads are fully serialized, apart from extension code, which can spawn its own threads and release the GIL during operations that don't touch Python memory (mainly I/O). The interpreter still switches Python threads using the GIL, but Python code itself never runs in parallel.

Are we talking about the same thing, or is there some other non-serialized scenario?


I'm pretty sure that there's a good chunk of stuff you can do in Python without acquiring the GIL -- the problem is that in practice you end up doing a lot of I/O and stuff that requires at least momentarily acquiring the GIL, leading to contention. So if you stuck to the operations that didn't require locking any GIL-protected data, you could run at full throughput. It's at least not the case that the GIL is held all the time while running a Python thread -- the problem is instead that your threads end up having to acquire it often.


Actually, the GIL is needed to execute Python code (well, access Python objects). It is released by I/O- or computation-heavy C code, so e.g. SciPy or reading files allows some level of parallelism, but pure-Python code will be serial.


Pure Python code can do a lot. Large parts of the stdlib are extension modules, and can release the GIL (e.g. one of the tests on the bug tracker used time.sleep()). In practice, if your code is IO-bound, it's doing work in an extension module, and if it's CPU-bound, it should be doing its work in an extension module. So there's actually not a huge problem.
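(Easy to demonstrate; a minimal example, with timings that will vary by machine:)

    import time
    from threading import Thread

    def sleeper():
        time.sleep(1)   # the C implementation releases the GIL here

    start = time.time()
    threads = [Thread(target=sleeper) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # prints ~1s, not ~10s: the ten sleeps overlap because
    # time.sleep() drops the GIL while blocking
    print(time.time() - start)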


I stand corrected. And frightened. fork(), here I come!


fork() isn't that great for a lot of situations. If you are thinking of taking advantage of your operating system's copy-on-write paging by loading a large chunk of data to be used read-only, forking a bunch of processes, processing the data in each process, and finally 'reducing' the results of all of the forks into some sort of output, don't bother.

What happens is that when you read an object in one process, Python increments its reference count, thus touching the memory page, thus copying it, thus screwing you.

(however, compacting garbage collection turns out to have more or less the same problem)
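(Concretely, something like this balloons the child's resident memory even though it only reads the list, because every access bumps an object's refcount and dirties its page; the sizes are made up, watch RSS in top to see the effect:)

    import os

    # a big heap in the parent, shared copy-on-write after fork()
    data = [str(i) for i in range(5 * 1000 * 1000)]

    pid = os.fork()
    if pid == 0:
        # child: a "read-only" pass still writes to every page,
        # because each access touches the object's ob_refcnt field
        total = sum(len(s) for s in data)
        os._exit(0)
    else:
        os.wait()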


This is related to Python 3.2 only. In other words, this is not noteworthy.


Python is Python. I wouldn't be surprised to see it ported to Python 2.7 if it does work. At the moment I doubt it's production-ready; it needs lots of testing and validation before it goes live. I think Guido explicitly said that removing the GIL is the sort of thing he would like to see in Python 2.X.


Actually, though I'm using 2.X in everything I'm doing, I'd rather see it only appear in 3.X. There has to be something that drives people to port stuff to 3.X or it's not going to happen. Dramatic speed improvements such as what this potentially provides would be extremely helpful in that regard. The other hope right now is, of course, Unladen Swallow, which hasn't proved to be very significant yet, as far as I'm concerned.


3.X can be pretty awesome, but as long as projects want to maintain compatibility with 2.5 or earlier, it's going to be difficult to get some serious porting momentum going. Once 2.6+ becomes a practical development target, 3.x will be a much easier sell.

At least, that's my perspective after watching PHP 5 slowly catch on amongst PHPers, even though it had many more improvements (e.g. objects are no longer value types), and far fewer compatibility breaks.


People are lazy. I still hear people wanting Perl 5.8 compatibility for my modules, even though 5.10 is 2 years old and has 100% backwards and forwards compatibility with Perl 5.8. In other words, all your existing code will run unmodified, and any 5.10-specific features you use will cause 5.8 to die at compile time.

People confuse me.


I use a lot of Perl 5.8 at the LoC because it's the only dynamic language that comes installed by default on Solaris 10. That, and because it is the primary language of a proprietary product that we have to use. I'd love to use Perl 5.10, but then I'd have to install it on all of the machines on which my code is expected to run. If I had that kind of control, I'd skip Perl and go straight to Python or Ruby. (Actually, that was a lie, I'd use Common Lisp if I could.) As it is, I've standardized on Perl 5.8.


They can install your product, but not if you bundle Perl/Python/Ruby in that directory?


The product I am talking about is Signiant: http://www.signiant.com/. It's a file-based workflow application that is basically written in Perl in the same sense that Emacs is written in Emacs Lisp. The idea is for people to write workflows in the embedded Perl environment, which is the same across all of the machines on which Signiant is installed. I could use another interpreter, but that would require extra work and I wouldn't be able to use a lot of the Signiant-specific code.



