To clarify a little, this patch does not eliminate the GIL; it just schedules the next thread to acquire the GIL using the BFS scheduling algorithm.
Also, and perhaps more importantly, this has not been incorporated into any version of Python. It is just a patch on the bug tracker, and realistically I doubt it has much chance of being accepted.
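The actual patch lives inside CPython's C core, but the gist of deadline-based handoff can be sketched in a few lines of Python. Everything below (the DeadlineLock name, the timeslice value) is made up for illustration; it's a toy model of BFS's earliest-deadline idea, not the patch itself:

    import threading
    import time

    class DeadlineLock:
        """Toy lock that hands itself to the waiter with the earliest
        deadline, loosely mimicking BFS's earliest-virtual-deadline
        policy. Illustration only; not how the real patch works."""

        def __init__(self, timeslice=0.005):  # made-up timeslice
            self._cond = threading.Condition()
            self._holder = None
            self._deadlines = {}  # thread ident -> deadline

            self._timeslice = timeslice

        def acquire(self):
            me = threading.get_ident()
            with self._cond:
                # Each waiter is stamped with a deadline when it arrives.
                self._deadlines.setdefault(me, time.monotonic() + self._timeslice)
                # Wait until the lock is free AND we are the earliest waiter.
                while (self._holder is not None
                       or min(self._deadlines, key=self._deadlines.get) != me):
                    self._cond.wait()
                del self._deadlines[me]
                self._holder = me

        def release(self):
            with self._cond:
                self._holder = None
                self._cond.notify_all()

The point is only that who gets the lock next becomes a policy decision, instead of whatever the OS mutex happens to do.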
If he's fixed the bugs and gotten it to be portable (at least to POSIX), then why shouldn't it get adopted? Just look at those benchmark results! 259ms per loop instead of the next best of 1.25s. If a real app gets improvements from this, it should be a no-brainer.
I'd wait to see how this works with a real workload as opposed to his contrived example. It looks like it might be good, but it definitely needs to be better tested.
You have to start with one platform. You then make it work on the others.
If I write something that initially only runs on IRIX and then make it work on other OSs, it would be unfair to accuse it of an all-the-world-is-IRIX mindset. It just happens the guy had a Linux box on which to refine his idea to the point of working, unfortunately using facilities other OSs don't offer in the same way.
If it's easy, chances are many people will participate and the fork will take on a life of its own. If not, you will have to merge back the patch from time to time.
It's not a terrible thing if the parts the patch changes don't get changed often. If they do, it could quickly turn into a nightmare.
I program in Python (Django) full-time. It's my job. I'm the only developer at my company who knows C.
I'm sorry but mere mortals who just want to get stuff done are not going to maintain their own fork of an entire programming language. I don't know what alternative reality you've been spending your time in (academia? FSF employee?) but it's simply not going to occur unless someone else takes up the burden and makes regular public releases.
Notice that usage of CK patches by Linux users evaporated as soon as he went on software sabbatical.
Open source is not driven by crowd sourcing or collective effort, it's driven by a small number of heroes willing to devote time in order to benefit everyone else.
I agree completely.
It is worth noting that this is true of most crowd-sourced projects, though. A small cadre of devotees put in heroic effort, a larger group contributes now and then, and everyone else contributes either nothing or some money.
Wow, I was excited and confused for a second; I thought the impossible had been done! It's fine that it hasn't; I am happy with Twisted Spread for my multi-process RPC needs.
The scheduler is a simplified implementation of the recent kernel Brain Fk Scheduler by the Linux hacker Con Kolivas.
Not as fun now, is it? Kolivas is a leader in scheduling; he can describe his hacks in whatever joking language he likes, and that won't make them any less stellar.
As far as I understand, it just changes the way thread scheduling works, but doesn't make Python "properly multithreaded". That means it's still only one active non-native-extension thread running at any time. Could someone confirm it?
Edit: I guess janzer confirmed this by posting at the same time.
No, Python doesn't work like that even with the traditional GIL. The problem is that even when you have multiple OS threads, they all end up competing for the same lock which kills throughput (but not as badly as the fully-serialized scenario you described). By using a scheduler the locking order can at least be controlled a bit more to improve throughput. [As far as I know; been a few years since I dug through CPython.]
What do you mean by "not as badly as the fully-serialized scenario"? I thought that Python threads are fully serialized, apart from extension code, which can spawn its own threads and release the GIL during operations that don't touch Python memory (mainly I/O). The interpreter still switches Python threads using the GIL, but Python code itself never runs in parallel.
Are we talking about the same thing, or is there some other non-serialized scenario?
I'm pretty sure that there's a good chunk of stuff you can do in Python without acquiring the GIL -- the problem is that in practice you end up doing a lot of I/O and stuff that requires at least momentarily acquiring the GIL, leading to contention. So if you stuck to the operations that didn't require locking any GIL-protected data, you could run at full throughput. It's at least not the case that the GIL is held all the time while running a Python thread -- the problem is instead that your threads end up having to acquire it often.
Actually, the GIL is needed to execute Python code (well, access Python objects). It is released by I/O- or computation-heavy C code, so e.g. SciPy or reading files allows some level of parallelism, but pure-Python code will be serial.
Pure Python code can do a lot. Large parts of the stdlib are extension modules, and can release the GIL (e.g. one of the tests on the bug tracker used time.sleep()). In practice, if your code is IO-bound, it's doing work in an extension module, and if it's CPU-bound, it should be doing its work in an extension module. So there's actually not a huge problem.
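A quick way to see the difference (a minimal sketch; the thread count and workload sizes are arbitrary) is to time the same number of threads doing pure-Python CPU work, which serializes on the GIL, versus time.sleep(), which releases it:

    import threading
    import time

    def cpu_work(n):
        # Pure-Python loop: the GIL is held while bytecode runs, so
        # several of these threads make little more progress than one.
        while n:
            n -= 1

    def io_work(seconds):
        # time.sleep() is C code that releases the GIL while it blocks,
        # so sleeping threads genuinely overlap.
        time.sleep(seconds)

    def timed(target, args, nthreads=4):
        threads = [threading.Thread(target=target, args=args)
                   for _ in range(nthreads)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    # Four sleeping threads finish in about 1 second total; four
    # CPU-bound threads take roughly 4x one thread's time.
    print("io-bound:  %.2fs" % timed(io_work, (1,)))
    print("cpu-bound: %.2fs" % timed(cpu_work, (10000000,)))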
fork() isn't that great for a lot of situations. If you are thinking of taking advantage of your operating system's copy-on-write paging by loading a large chunk of data to be used read-only, forking a bunch of processes, processing the data in each process, and finally 'reducing' the results of all of the forks into some sort of output, don't bother.
What happens is that when you read an object in one process, Python increments its reference count, thus touching the memory page, thus copying it, thus screwing you.
(however, compacting garbage collection turns out to have more or less the same problem)
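Here's the shape of the pattern that goes wrong (a hedged sketch; the data size and worker count are arbitrary, and os.fork() is POSIX-only):

    import os

    # Arbitrary size, chosen only for illustration.
    data = list(range(1000000))  # large "read-only" structure in the parent

    pids = []
    for _ in range(4):
        pid = os.fork()
        if pid == 0:
            # The child only reads, but iterating bumps the refcount of
            # every object it touches, dirtying the pages they live on
            # and forcing the kernel to copy them into the child anyway.
            total = sum(data)
            os._exit(0)
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)

So the memory the parent hoped to share ends up duplicated in every worker.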
Python is Python. I wouldn't be surprised to see it ported to Python 2.7 if it does work. At the moment I doubt it's production ready - it will need lots of testing and validation before it goes live. I think Guido explicitly said that removing the GIL is the sort of thing he would like to see in Python 2.X.
Actually, though I'm using 2.X in everything I'm doing, I'd rather see it only appear in 3.X. There has to be something that drives people to port stuff to 3.X or it's not going to happen. Dramatic speed improvements such as what this potentially provides would be extremely helpful in that regard. The other hope right now is, of course, Unladen Swallow, which hasn't proved to be very significant yet, as far as I'm concerned.
3.X can be pretty awesome, but as long as projects want to maintain compatibility with 2.5 or earlier, it's going to be difficult to get some serious porting momentum going. Once 2.6+ becomes a practical development target, 3.x will be a much easier sell.
At least, that's my perspective after watching PHP 5 slowly catch on amongst PHPers, even though it had many more improvements (e.g. objects are no longer value types), and far fewer compatibility breaks.
People are lazy. I still hear people wanting Perl 5.8 compatibility for my modules, even though 5.10 is 2 years old and has 100% backwards and forwards compatibility with Perl 5.8. In other words, all your existing code will run unmodified, and any 5.10-specific features you use will cause 5.8 to die at compile time.
I use a lot of Perl 5.8 at the LoC because it's the only dynamic language that comes installed by default on Solaris 10. That, and because it is the primary language of a proprietary product that we have to use. I'd love to use Perl 5.10, but then I'd have to install it on all of the machines on which my code is expected to run. If I had that kind of control, I'd skip Perl and go straight to Python or Ruby. (Actually, that was a lie, I'd use Common Lisp if I could.) As it is, I've standardized on Perl 5.8.
The product I am talking about is Signiant: http://www.signiant.com/.
It's a file-based workflow application that is basically written in Perl in the same sense that emacs is written in emacs lisp. The idea is for people to write workflows in the embedded Perl environment, which is the same across all of the machines on which Signiant is installed. I could use another interpreter, but that would require extra work and I wouldn't be able to use a lot of the Signiant-specific code.