Guido van Rossum: People want CPAN (discussion on Python packages)

amix · on Nov 10, 2009

A few weeks ago I argued that language distribution platforms should be based on a distributed version control system (like Mercurial or Git) and have a user-friendly web interface (like BitBucket or GitHub). Reputation (like the one found on Stack Overflow) should also be a part of the platform. Anyhow, read more here if you are interested: http://amix.dk/blog/viewEntry/19475

steveklabnik · on Nov 10, 2009

Nice post. I think this is a really cool idea in general. A few of us on one of my projects have been thinking recently about a package manager based on git, where you could literally merge in feature branches or patch branches as you wanted...

n8agrin · on Nov 10, 2009

While a nice idea, it would be worth finding out why github abandoned this very concept in favor of gemcutter.org before diving in head first. I trust the github guys as being far more competent than most in all matters of distributed source control especially when it comes to package management.

pjhyett · on Nov 10, 2009

The idea is sound, but it was a distraction for us, so we were more than happy to offload rubygems to a dedicated service that could give it the attention it deserves.

inklesspen · on Nov 10, 2009

Presumably because gemcutter was doing a better job?

staunch · on Nov 10, 2009

Both the CLI CPAN module/tool itself and http://search.cpan.org/ are awesome to work with. I think every language should shamelessly copy them first, and try to improve upon them second.

It almost seems like Ruby/Python have been consciously reluctant to do so.

jrockway · on Nov 10, 2009

CPAN's success, as with Debian's, is not due to the technical infrastructure but rather the social infrastructure. There are plenty of ways to make packages not work at all, it's just that the Perl people don't do that.

(If you have ever read the CPAN source code, you would be surprised it works at all. Don't get me started on the various incompatible build systems, and what happens when your module's build system depends on a newer version of the build system.)

Haskell's Cabal is the technical model to steal. Don't let modules execute their own code unless they actually need to. 99.9999% of modules do fine with some sort of declarative interface, rather than actual code to do those things.
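To make the declarative idea concrete, here is a minimal Python sketch. It is not Cabal's actual format: the file layout, the field names, and the `read_package_info` helper are all made up for illustration. The point is only that the installer reads the metadata as data and never runs code shipped by the package.

```python
import configparser

# Hypothetical declarative package description (not Cabal's real syntax).
PACKAGE_INFO = """
[package]
name = foo-bar
version = 0.2
depends = baz >= 1.0, quux
"""

def read_package_info(text):
    """Parse package metadata as plain data; no package code is executed."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["package"]
    depends = section.get("depends", "")
    return {
        "name": section["name"],
        "version": section["version"],
        # Split the comma-separated dependency list into clean entries.
        "depends": [d.strip() for d in depends.split(",") if d.strip()],
    }

info = read_package_info(PACKAGE_INFO)
print(info["name"], info["version"], info["depends"])
```

Because the description is just data, an indexer or packager can inspect it without trusting the package.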

cdavid · on Nov 10, 2009

Yes, cabal is the thing to steal. Improving distutils is worthwhile, but only as a temporary stopgap: most fundamental issues cannot be solved by improving distutils. Distutils itself is only 10 000 LOC of bad code, without a good API, and you can reuse most setup.py files through a conversion step.

I have started playing with a cabal-like project.

http://github.com/cournape/toydist

It does not do much ATM, except for the conversion step: it can generate a static description of the package from an existing setup.py. It reuses distutils to build the package, but it does so from the declarative file, meaning that the whole thing is not tied to distutils anymore: distutils becomes an implementation detail.
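A rough Python sketch of what such a conversion step could look like. This is not necessarily how toydist does it; it assumes setup() is called by that bare name with literal keyword arguments, and the `static_description` helper and the sample setup.py are hypothetical. The file is parsed, never executed.

```python
import ast

# Example setup.py contents; in practice this would be read from disk.
SETUP_PY = '''
from distutils.core import setup
setup(
    name="example",
    version="0.1",
    packages=["example", "example.tests"],
)
'''

def static_description(source):
    """Collect the literal keyword arguments of the first setup(...) call."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "setup":
            description = {}
            for kw in node.keywords:
                if kw.arg is None:
                    continue  # **kwargs cannot be described statically
                try:
                    description[kw.arg] = ast.literal_eval(kw.value)
                except (ValueError, TypeError):
                    # Computed values cannot be captured statically; skip them.
                    pass
            return description
    return {}

description = static_description(SETUP_PY)
print(description)
```

Anything computed at run time in setup.py is simply skipped here, which is the part a real converter would have to handle some other way.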

plinkplonk · on Nov 10, 2009

"Haskell's Cabal is the technical model to steal."

I can't upvote this enough. CABAL is extremely well done.

chromatic · on Nov 10, 2009

The CPAN works despite clunky code and competing implementations because the functional decomposition is sufficiently effective. (EUMM is some of the worst code in intent, design, and implementation I've ever read and used anyway.)

jrockway · on Nov 10, 2009

The CPAN software infrastructure only works for things that it was intended to do, it is very hard to make it do arbitrary things. If you want to find out what file to download from the BackPAN to get Foo::Bar 0.2, that's too bad, you have to download the BackPAN and index it yourself. Of course, there is no way to index things without evalling code regexed out of every file. (And there is no way to predict if "make install" will actually install Foo::Bar.)

Something CPAN couldn't easily do a few years ago was install to arbitrary directories. If you set the right environment variables, EUMM would sort of do the right thing. If you set different environment variables, sometimes MB would do the right thing. Eventually EUMM and MB were patched so that it almost always worked, and then local::lib was written to paper over the differences.

And of course, nothing requires the use of EUMM or MB, so if a package doesn't use it, you can't install it to your home directory.

Anyway, the EUMM is what you get when you write code to fix problems that people complain about. MB is what you get when you write specs to "fix" problems people complain about. Maybe someday we will have a build system that has a sane design and actually works.

marltod · on Nov 10, 2009

having run into several vague errors from CPAN I agree

jrockway · on Nov 10, 2009

CPAN.pm has some great errors. If it thinks your clock is set wrong (!), it will print a warning message.

jcapote · on Nov 10, 2009

Maybe python, but rubygems kicks ass.

patio11 · on Nov 10, 2009

Google is the de-facto rubygem search engine. If you don't know the real name of the gem, you Google for functionality or common name ([ruby prawn gem]), find a blog post, copy/paste the line that starts with "gem install" (gem install prawn), and then go back to whatever you were doing.

r11t · on Nov 10, 2009

http://gemcutter.org and the search feature of the website might reduce the need for googling for rubygems.

relme · on Nov 10, 2009

Read the source: http://rubygems.rubyforge.org/svn/trunk/lib/

adw · on Nov 10, 2009

rubygems and easy_install/.egg packages are more or less the same thing.

tvon · on Nov 10, 2009

Eh, easy_install has been a sort of sticking point with me for a while. One of those things that makes me shake my fist at the sky and leaves me a bit baffled that I seem to be the only one pulling his hair out.

So far as I can tell there is no way for easy_install to do anything but install packages. I see an "upgrade" command but it seems to only work if I specify the package name, and I don't see any way to list what version of what is installed. I also don't see any way to remove packages.

In short, this means easy_install turns my system into a mystery grab bag of packages that I can't remove without fishing around in site-packages.

Unless of course I'm missing something, but if I'm missing something it certainly isn't obvious.
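For the listing part at least, Python itself can report what is installed. A minimal sketch, assuming a much later Python (3.8+) where the standard library has `importlib.metadata`, which did not exist when this thread was written; the `installed_packages` helper is a name made up here. It does nothing about removal.

```python
from importlib import metadata

def installed_packages():
    """Return sorted (name, version) pairs for every installed distribution."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with no readable metadata
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: pair[0].lower())

for name, version in installed_packages():
    print(name, version)
```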

marcusbooster · on Nov 10, 2009

I find package systems fantastic for things I don't want to care about, but get frustrated when they handle the things I do care about.

Or maybe that's just my own experience using Common Lisp on Ubuntu.

adw · on Nov 10, 2009

The scientists he's talking about are smart people, but aren't really into computers and (what's more) have no patience at all for the amount of pain it takes to compile the dependencies Python packages need.

It's not the Python side of things that's the problem. Numpy and Scipy (which nearly all Python scientific software depends on) are based on bindings to both C and Fortran, and getting those library ducks in a row - especially on Windows or MacOS X, on Linux your package manager does it for you - is a pain for someone who knows what they're doing and almost impossible for someone who doesn't.

Plus, well, most scientists outside physics use Windows. If you need a command-line tool you've already lost in that respect. What you're competing against is often Excel.

The battle over Numpy being in core Python has been fought and lost, but short of that level of integration, I don't see that there's going to be much effective to do about this.

cdavid · on Nov 10, 2009

The problem is more complicated than just C code being more difficult to build.

The whole distutils infrastructure is messy and badly designed. It takes care of everything from build up to installation and packaging, and all those parts are tightly coupled. It is also incredibly inflexible, and the way to extend it through subclassing leads to incompatible code (if package A subclasses a distutils class, and package B subclasses the same thing, how can you use A and B?). Almost every design decision of distutils is wrong, and badly implemented.
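The subclassing conflict can be shown with a toy Python model. The classes below are stand-ins invented for illustration, not the real distutils API; the only assumption carried over is that a command name maps to exactly one class.

```python
# Stand-in for a distutils-style command class (hypothetical, not the real API).
class BuildCommand:
    def run(self):
        return ["compile C sources"]

# Package A extends the build step by subclassing.
class BuildWithFortran(BuildCommand):
    def run(self):
        return super().run() + ["compile Fortran sources"]

# Package B extends the same step, unaware of package A.
class BuildWithCodegen(BuildCommand):
    def run(self):
        return ["generate code"] + super().run()

# The registry holds exactly one class per command name, so the second
# registration silently replaces the first.
cmdclass = {}
cmdclass["build"] = BuildWithFortran
cmdclass["build"] = BuildWithCodegen

steps = cmdclass["build"]().run()
print(steps)
```

Package A's Fortran step is gone, and neither package can fix that without knowing about the other.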

Numpy and scipy binaries are built for every release: actually, that's the platform we support the best in some sense since we can reliably build binaries, and that saddens me quite a bit.

adw · on Nov 10, 2009

You guys do an amazing and thankless job, and I'm sorry if I oversimplified that aspect of it. It's just difficult to see how to solve the problems you describe, let alone to explain those problems to a researcher you're trying to wean off Fortran. :)

illumen · on Nov 10, 2009

Numpy is one of the best packaged python modules there is. Saying it is hard to install is completely wrong.

It's even already packaged for many OSes. For example, it comes with OS X (and Ubuntu, Red Hat, etc.). There's a 3-click installer for Windows.

There are also distributions of Python that have numpy (and many other sci modules) by default.

adw · on Nov 10, 2009

... and even so, I've seen people struggle to install them - Scipy in particular - countless times. I'm not sure how easy is easy enough, but it's going to have to be basically impossible to screw up.

And I'm not sure that's possible short of bundling everything with Python, and of course that's a bad idea.

ableal · on Nov 10, 2009

Debian's 'synaptic'. Help and leverage it to other platforms, if needed. Less pain all around.

blasdel · on Nov 10, 2009

Please god no.

Apt itself is barely good enough to handle libraries written in C, much less a dynamic language with multiple potentially-incompatible runtimes, and Debian's policies are dead set against making anything remotely wholesome:

  * As an author, affected middlemen have a stranglehold on easy distribution
  * License wankery (fuck debian-legal)
  * Teenagers randomly patching upstream software without review
  * Shipping non-standard configurations, often with features randomly disabled
  * Rearranging everything to fit their naive 'filesystem hierarchy'
    (this completely fucks up a decent packager like Ruby Gems)
  * Breaking off features into separate packages whenever possible
  * Shipping ancient versions of software with a selection of patches picked
    specifically to introduce no features, just cherry-pick 'bug-fixes'
  * Shipping multiple versions of a runtime with mutually-exclusive depgraphs
  * FUCKING RELEASE FREEZES
    There's no goddamn reason for any non-system software to be frozen ever

Ubuntu is making a decent stab at unfucking all this (at least on their turf) with PPAs: https://help.launchpad.net/Packaging/PPA

DarkShikari · on Nov 10, 2009

These are all problems with Debian's use of synaptic; the program itself is a very good package manager.

blasdel · on Nov 10, 2009

Synaptic is a major improvement over the original apt tools -- while it's slower, it actually handles dependencies correctly most of the time, and doesn't throw anal-retentive errors.

Unfortunately, the underlying apt system has plenty of shitty behaviors that aren't even related to Debian's shitty packaging policies or the hostile original implementation:

  * Only one process can even read the db at a time!
  * It's extraordinarily fragile
    * Loves to crap out at the slightest network failure
    * Will corrupt its database on SIGINT
  * Hamhandedly muddles up installation and configuration
  * Does not handle optional dependencies well
  * Poorly handles only part of the 'alternatives' problem

A lot of its failings are rooted in the assumption that installation will be fast enough to be interactive, so why bother?

nailer · on Nov 10, 2009

Actually I believe a few are problems with the parent poster: "* Rearranging everything to fit their naive 'filesystem hierarchy' (this completely fucks up a decent packager like Ruby Gems)"

The FHS is a known standard which works with every language. What specifically about Ruby makes it unique amongst all other software?

blasdel · on Nov 11, 2009

Gems (like NeXT / OS X .app bundles) are self-contained, with the documentation and data resources alongside the code in a standard way. This makes it very easy to support having multiple versions of the same software installed simultaneously, with an optionally qualified import statement to disambiguate.
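A small sketch of the side-by-side idea, written in Python rather than Ruby and not taken from RubyGems' actual resolution logic. The name-version directory layout, the version numbers, and the `select` helper are assumptions for illustration.

```python
# Hypothetical install layout: one self-contained directory per name-version.
INSTALLED = ["prawn-0.5.1", "prawn-0.6.3", "rake-0.8.7"]

def parse_version(text):
    """Turn '0.6.3' into (0, 6, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def select(name, installed, exact=None):
    """Pick the directory for `name`: the exact version if given, else the newest."""
    candidates = []
    for entry in installed:
        pkg, _, version = entry.rpartition("-")
        if pkg == name:
            candidates.append((parse_version(version), entry))
    if exact is not None:
        candidates = [c for c in candidates if c[0] == parse_version(exact)]
    if not candidates:
        raise LookupError(f"no installed version of {name} matches")
    return max(candidates)[1]

print(select("prawn", INSTALLED))
print(select("prawn", INSTALLED, exact="0.5.1"))
```

Since each version lives in its own directory, the unqualified request gets the newest one and the qualified request pins an older one, with no files shared between them.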

The FHS inspires maintainers to large amounts of useless and regressive tedium in re-separating the peas and the carrots into global piles. It's not so bad with traditional C libraries, but the brokenness is immediately obvious when dealing with the libraries of a language that has anything resembling a module system.

What's specific to Ruby is that their community somehow managed to not fuck up their packaging medium.

nailer · on Nov 11, 2009

Yes, but native package managers already allow multiple versions to be installed simultaneously.

'What's specific to Ruby is that their community somehow managed to not fuck up their packaging medium.'

Overwriting global binaries in /usr/bin is pretty fucked to me, and I don't think I'm alone in that. Say I'm using puppet or OVirt or other Ruby based system apps - I wouldn't want Gems breaking them. If Python did this (being the basis for most Linux distros) or Perl did this on older Unix there would be hell to pay.

swolchok · on Nov 10, 2009

    There's no goddamn reason for any non-system software to be frozen ever

What about, I don't know, developers who release early and often without good test coverage? If you're putting your seal of approval on a bunch of software, you probably want to make sure that it works. This cannot be done instantaneously.

blasdel · on Nov 10, 2009

That doesn't mean you should be shipping the version of Firefox released when you last did a freeze, whether that's 6 months or 3 years ago. Even worse, with the way the freeze cycle is done in practice, they need to stop all updates even for bleeding edge users for several months (Ubuntu) or 6-18 months (Debian).

In a rolling release system you don't have to have a single imprimatur of package approval. At minimum everyone implements it with at least 'stable', 'new', and 'fucked' markers for packages, and you can go way further with multiple repos and overlay semantics.

swolchok · on Nov 10, 2009

Have you heard of Debian stable, testing, unstable, and experimental?

blasdel · on Nov 10, 2009

testing is the only one of those that's remotely usable, and it gets frozen as soon as they start thinking about releasing -- etch was frozen for five months, lenny for seven. Sid / unstable is generally frozen whenever testing is, is still really laggardly normally, and is constantly broken anyway. I've never gotten anything useful done with Experimental.

At least volatile has been around for a few years, so people aren't fucked on tzdata and such because of bullshit policy, though I think it's still off by default.

graywh · on Nov 10, 2009

  * Rearranging everything to fit their naive 'filesystem hierarchy'
    (this completely fucks up a decent packager like Ruby Gems)

Well who made ruby gems install to /usr/bin by default and potentially interfere/overwrite system-installed software in the first place? Sorry, but I think "Linux Standard Base" is a good thing.

blasdel · on Nov 11, 2009

To be fair, nearly all package management systems install to /usr/bin by default, and run roughshod over anything underneath them. At least gem can be configured to install in your home directory, and utilize dependencies that are already present. Apt is completely incapable of supporting anything like that.

The LSB is trivial spec-wank. A whole bunch of effort that doesn't address any of the actual real-world portability issues.

ableal · on Nov 10, 2009

    There's no goddamn reason for any non-system software to be frozen ever

You may want to re-think this, from a perspective other than the single-user system.

Consider an organization that uses its own software, which may not be very good but does the job productively. Imagine that one piece produces output that a browser must present. The tested and supposedly stable system is updated. With a major new version of the browser. Which now renders the output as a blank page.

Hilarity ensues. Or perhaps not.

blasdel · on Nov 11, 2009

No such organization would have their desktops update themselves straight from the internet -- you either have them connect to your own internal server for updates, or you do regular full system imaging -- both methodologies are straightforward and widely used for all major platforms, with explicit support from upstream.

jeremymcanally · on Nov 10, 2009

I sincerely doubt porting that to other platforms would be very easy, but perhaps you and I have different definitions of "pain." :)

Plus, you'd have to worry about minor-yet-annoying namespace issues (e.g., a Python package and a non-Python package sharing a name).

Am I the only one not offended by the idea of package managers for each programming language? They always work better when they're tailored to the language.

lucumo · on Nov 10, 2009

> Am I the only one not offended by the idea of package managers for each programming language?

Learning multiple tools to do a similar job is a bit of a nuisance. If you know one tool, you can learn its details over time. That's much harder with multiple tools. Does tool X remove configuration files when uninstalling? Can it even uninstall? Do I need to update some configuration files manually?

There's also the part where you sometimes need integration with packages from other package manager systems. System libraries (libcurl, etc.), header files if it gets compiled at install time, make, a compiler, etc.

But since the problem is getting solved by multiple tools already (cpan, pear, apt, yum, just plain ./configure+make, etc.), maybe we should work more on integrating those package managers and less on replacing the others. It seems unlikely to happen with so many incompatible personal preferences around...

viraptor · on Nov 10, 2009

There are cases when they don't work better. It happens when your non-python program requires something from python for scripting, or when a python module requires a 'classic' library. A global system is quite good in those cases.

I'm not sure what you mean by the namespace issues. Everything that's installed as a python package in debian is prefixed with "python-"...

ableal · on Nov 10, 2009

About the porting and the pain: I believe the hard parts are defining the metadata, and refining the actual program logic. In my experience over the last decade, 'synaptic' has been the most trouble-free system to use.

I think it would be less work to clone/port the needed logic bits to Windows/whatever, and share most of the metadata defined for Debian/Ubuntu/etc, instead of redoing (and debugging) everything from scratch.

vegai · on Nov 10, 2009

Or perhaps start with something simple, like Arch's 'pacman'...

blue1 · on Nov 10, 2009

Common Lisp needs something well structured like CPAN too. The situation with clbuild, mudballs (deceased?), asdf-install etc. is rather confusing IMHO.

wavesplash · on Nov 10, 2009

Perhaps I'm late to the game but why not take a good look at the Gem/Gemcutter Ruby packaging and distribution system?

The Ruby folks have done a few iterations of packaging systems and have pretty much nailed it.

Might be worth studying the whole 'gem' system (discovery, distributed publishing, versioning, dependencies, uninstall, etc).

http://www.gemcutter.com

aceofspades19 · on Nov 10, 2009

s/com/org