I always struggle with these discussions. I have been using Python a long time, and while there have definitely been a handful of gotchas over the years, none of them ever held me up long enough to think twice about. I am sure there is a better way, but what puzzles me is how often this impacts other people but not myself. Perhaps my usage is not advanced enough, but it always leaves me wondering.
Edit: Just as an additional thought, is one of the main issues distributing what are essentially Python executables to client machines that could be running different OSes? My mind jumps to probably not using Python at that point, unless there was a specific dependency/library that required it.
I've encountered frustration with Python packaging in two main areas:
1) Installing applications within Docker containers. While wheels have improved this situation, I was surprised, coming from other languages, that there was no straightforward way to build a package you can simply copy into a container and run, without having to install build tools and compile extensions in the final container image.
2) Distributing Python utilities to end users across various platforms in an easily installable manner without requiring them to follow lengthy instructions to set up all the dependencies has been another challenge.
We use Poetry and have largely "solved" #1 with a somewhat complicated Docker build. It works well now, so no one has to think about it much. That made deploying Python server-side fairly easy. However, #2 has been much more of a challenge, and I wonder if that is where other folks in this thread are feeling the most pain.
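For what it's worth, the core of builds like ours is usually just: pre-build wheels in a stage (or on a machine) that has the compilers, then install from those wheels alone in the slim final image. A rough sketch of that idea, assuming a plain requirements.txt (the paths and the two-step split are illustrative):

```
# Build stage (or build machine): compilers and headers are available here.
pip wheel --wheel-dir /wheels -r requirements.txt

# Final slim image: install only from the pre-built wheels, never hit PyPI
# and never need a compiler.
pip install --no-index --find-links /wheels -r requirements.txt
```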
- they require different instructions for each platform, one per distribution: it's not possible to give a one-liner that will work for all supported OS/architecture combinations,
- distributions are not all updated at the speed of upstream. Sometimes users do not want to wait several years to use a recently released feature.
Part of the frustration is the curse of choice: having to research which tool to use.
Before the new crowd of tools, like poetry, the problem was also having to figure out how to chain all the scattered tools together. I likely still have some bias from this experience.
I stopped doing Python before adopting any of these tools, but when I researched Poetry for a company to use, the fatal flaw was that it copied too much from Rust/Cargo. Cargo was born with its community and they helped shape each other; the Rust community has a strong adherence to semver. In Python, you have a mess of versioning schemes (e.g. CalVer) and low-quality version requirements. You need a way to override version requirements, but they refuse to add one.
Doing cross-platform development? The tools that do locking today via `requirements.txt` generate platform-specific lockfiles.
Yes, this - I've been using Python for decades and have been mostly fine (dropping down to ~zero packaging & deployment issues since I started putting all my services into Docker images). But AI stuff? Somehow that manages to be a massive pain in the ass every single time.
Here are the two main packaging issues I run into, specifically when using Poetry:
1) Lack of support for building extension modules (as mentioned by the article). There is a workaround using an undocumented feature [0], which I've tried, but ultimately decided it was not the right approach. I still use Poetry, but build the extension as a separate step in CI, rather than kludging it into Poetry.
2) Lack of support for offline installs [1], e.g. being able to download the dependencies, copy them to another machine, and perform the install from the downloaded dependencies (similar to using "pip --no-index --find-links=."). Again, you can work around this (by using "poetry export --with-credentials" and "pip download" for fetching the dependencies, then firing up pypiserver [2] to run a local PyPI server on the offline machine), but ideally this would all be a first class feature of Poetry, similar to how it is in pip.
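To make the workaround in 2) concrete, the sequence is roughly the following (directory names are just examples, and the pypiserver step is optional if pip can point straight at the downloaded files):

```
# On a machine with internet access: pin and download everything.
poetry export --with-credentials -o requirements.txt
pip download -r requirements.txt -d ./pkgs

# Copy ./pkgs to the offline machine, then install without touching PyPI
# (or serve ./pkgs with pypiserver and point pip's index-url at it):
pip install --no-index --find-links=./pkgs -r requirements.txt
```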
I don't have the capacity to create Pull Requests for addressing these issues with Poetry, and I'm very grateful for the maintainers and those who do contribute. Instead, on the linked issues I share my notes on the matter, in the hope that it may at least help others and potentially get us closer to a solution.
Regardless, I'm sticking with Poetry for now. Though to be fair, the only other Python packaging tools I've used extensively are Pipenv and pip/setuptools. It's time consuming to thoroughly try out these other packaging tools, and is generally lower priority than developing features/fixing bugs, so it's helpful to read about the author's experience with these other tools, such as PDM and Hatch.
>I always see people complaining about Python packaging, but rarely run into issues myself. Maybe it's an OS issue? I use Linux.
No, it's a "what you do with it" issue. It's not necessarily about the mere number of dependencies used - e.g. someone who just makes some conda env might be perfectly fine with it.
Things like relocating, provisioning, reproducibility, version updating, cross platform, etc, all have their issues, and it gets worse when you need to build your own packages.
The only time I've ever experienced a memorable amount of pain was when I was working on a project that used one of the newer options like Poetry. Using a virtualenv, a requirements.txt file, and pip I have not had serious issues in 15 years of using Python.
That is why I was wondering. I have produced a lot of software using Python, deployed it on different platforms, using containers, not using containers, bare metal, cloud, with complex dependencies, etc. I have not run into a lot of issues, and most of the ones I did hit were dependency-version related, where subdependencies required conflicting versions, but even then it was not super complicated. I realize more complex scenarios exist, but it surprises me that it's such a perennially hot topic with Python.
I'm less pessimistic about packaging than most. No language that I know of has ever attempted a standardization effort like this. I think it will pay off in the long term. But it's taken years to get here and it will take more years to wrap up the project.
> An attempt at a specification was rejected due to “lukewarm reception”, even though there exist at least four implementations which are achieving roughly the same goals, and other ecosystems also went through this before.
Python has been weird lately.
We got structural pattern matching, which is entirely new syntax that is essentially just syntactic sugar and only kind-of solves a problem that relatively few users ever had.
But then we reject __pypackages__ and null-coalescing/safe-navigation operators, which solve problems that everyone has and are unambiguous improvements and modernizations of the language. Even if the PEPs had problems and needed to be rewritten from scratch, there is now approximately zero chance that will ever happen.
I love Python but sometimes I feel like the decisions are arbitrary and do not reflect what is best for the language.
It drives me nuts because I'm a captive user. There's too much momentum for data science and machine learning to switch and be taken seriously in a non-solo workplace. Also I just like Python.
Which problem or use case would null-coalescing/safe-navigation operators solve that isn't already solved with a conditional expression or the walrus operator?
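For concreteness, this is the kind of nested-optional access PEP 505 was aimed at, next to what you can already write today with conditional expressions; the little dataclasses below are made up purely for illustration:

```
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    city: str


@dataclass
class Customer:
    address: Optional[Address]


@dataclass
class Order:
    customer: Optional[Customer]


# What PEP 505 proposed (NOT valid Python, shown only for comparison):
#     city = order?.customer?.address?.city ?? "unknown"

def city_of(order: Optional[Order]) -> str:
    # What you write today: chained conditional expressions / intermediate checks.
    customer = order.customer if order is not None else None
    address = customer.address if customer is not None else None
    return address.city if address is not None else "unknown"


print(city_of(None))                                # -> unknown
print(city_of(Order(Customer(Address("Berlin")))))  # -> Berlin
```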
All of those use a single package management tool, which is considered the only correct and acceptable tool to use. That is what everybody wished Python had done 20 years ago, but they didn't, and now we are engaged in this unique experiment, in order to avoid forcing everyone to migrate to one particular tool.
Instead, Python is developing a set of interoperable standards and APIs that any number of build and packaging tools can use. So projects can choose whatever build tool makes sense for them, and users can build any project via a uniform interface.
As an example, let's say that I am writing a web app which serves a machine learning model. I have three dependencies: a web framework, a database driver, and a machine learning framework. The web framework might be packaged using Flit, the database driver might be packaged using Setuptools, and the ML framework might be packaged using CMake or Meson. And for my own project I might choose to use Hatch or Poetry. When I use Pip to install my dependencies, the details of what tool they were packaged with are completely abstracted away. Even if I need to compile something because there is no binary package published for my system, Pip will automatically install the required build dependencies in an isolated environment, and it will magically know how to build the package, using nothing but the package's own declaration of its "build backend". And then when I publish my own work, users will have the same experience when they need to install it and its dependencies.
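Concretely, that "declaration of its build backend" is just a small table in each project's pyproject.toml; for the Flit-packaged web framework in my example it would look something like this (the version bound is illustrative):

```
[build-system]
requires = ["flit_core>=3.2"]          # build-time deps Pip installs in an isolated env
build-backend = "flit_core.buildapi"   # the PEP 517 hook Pip calls to build the wheel
```

The Setuptools, Hatch, or Poetry projects declare `setuptools.build_meta`, `hatchling.build`, or `poetry.core.masonry.api` in exactly the same way, and Pip doesn't care which is which.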
The system doesn't work 100% perfectly in all cases (mostly due to things that are out of scope, like managing shared libraries at the system level), but it actually works most of the time for most people in most situations. And there were no doubt issues transitioning from older systems. But as far as I know, no system like this has ever even been attempted, let alone rolled out to millions of users and working so seamlessly that most people didn't even realize it was happening.
So it's not perfect. And maybe the whole idea is backwards and would've been avoided if there had been a single coherent package management story in the first place. But given the scale of the challenge, I think we should all be a little less eager to wave the "Python steering council is stupid & bad" banner.
Well, I'll start from the original sin for JS/Python/Ruby/PHP/Perl:
0. Java was designed to be fast enough to be comparable with C/C++. Think 2x slower or less, not 10-100x slower. This leads to:
1. Most Java libraries are in Java, they're not native. And they're portable as-is across platforms. If they do have platform specific code, it's their job to make sure they bundle and load everything needed for the specific problem.
2. In 2004 Maven 1 was launched, then in 2005 Maven 2 came with a repository format update. Packages are zip files called jars (Java ARchives); there are also wars (Web ARchives), which are zips-of-zips meant to be unpacked by the application server before the first launch, and ears (Enterprise ARchives), which are also zips-of-zips, but I forget the exact details of using these.
3. The Maven repo format is the same, locally and remotely. When you work with Maven (or Gradle, or any Java build tool, since they're ALL compatible with Maven, otherwise nobody would use them), you get a local cache/mirror/proxy of the remote repos.
4. Because Java has CLASSPATH (sort of like PYTHONPATH but probably better, and it probably inspired PYTHONPATH because I'm sure the Java CLASSPATH predates it), packages are not copied over to the local folder when developing. Maven & co just assemble the correct CLASSPATH and everything is referred to directly from the local repo. You don't have venvs or node_modules because those are just silly hacks that aren't needed here.
5. If you need to package Java stuff, the standard approaches are:
- cross platform jars for libraries (these are usually published to the Maven repo and can be used natively by any Java package manager)
- zips for desktop apps; shell scripts or executable launchers inside the zips to launch the apps
- wars for web applications
- ears for huge, enterprise applications
It's obviously very deep and complex when you want to look at everything, but that's it.
Because Java is really close to "Write Once, Run Anywhere" in practice, for most platforms yeah, you just copy over the jar/zip.
The hardest part people complain about is installing the JRE, which is SUPER silly, since for a technical person it should be trivial to do.
For non-technical users, for at least the past 10 years, you've been able to just bundle the JRE with your app. These days (at least the past 5+ years), I'm fairly sure you can even AOT-compile the app.
Python, by comparison, is a horror show.
And it all comes from 0. -> Python is slow, it was meant to be used with C libraries, so it carries all the baggage from that ancient and creaky ecosystem. Packaging a Python (or Ruby, or...) app to deploy on Windows, Linux (multiple distros), MacOS is such a horror show that they invented an entire layer with Docker, to just put the whole thing into an almost literal shipping container and not bother with the craziness inside.
> 4. Because Java has CLASSPATH (sort of like PYTHONPATH but probably better, and it probably inspired PYTHONPATH because I'm sure the Java CLASSPATH predates it)
Just pointing out that Python predates Java by a few years. Not sure when the PATH concept was introduced, though.
On paper it predates it, but Java was "industrial development ready" from day 1. Python 1 was barely used and was more of an academic toy of sorts; it took Python 2 arriving on the scene for Python to even be mentioned in the same sentence as Java.
I don't know the exact chronology of CLASSPATH versus PYTHONPATH, but I can tell you that CLASSPATH usage is pervasive and has been from the start for Java (I was using Maven 2 in 2007, and even then CLASSPATH was established, being also used by Ant, Eclipse and other, older tools), while PYTHONPATH is definitely not used to the same effect in Python - it's an afterthought.
You can certainly just delete a venv: `rm -rf venv`
I'm not sure what kind of custom path you need, but you can put a venv anywhere you want that's practical.
Yes, when you create it, you need to leave it there, but they're trivial to create: `python3 -m venv [path to venv]`. Installing packages is a LOT faster than npm.
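A sketch of the whole lifecycle with an arbitrary path (the path and script name are just examples):

```
python3 -m venv /opt/myapp/venv            # create it wherever is practical
/opt/myapp/venv/bin/pip install requests   # install into it, no activation needed
/opt/myapp/venv/bin/python your_script.py  # run against that environment directly
rm -rf /opt/myapp/venv                     # and deleting it really is just this
```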
There are definitely issues with Python's setup as a global interpreter, but these aren't them.
I hate this question. It's basically "why are you stupid"? I'm a professional with 20 years of experience with ecosystems ranging from Java, .NET, Ruby, Python, etc., so if I write something, I probably have a valid use case.
The opposite question should be asked: why can't I do that?
And the answer is, frankly, either lazy/bad design on the Python ecosystem's part, or backwards compatibility with an existing but bad system. In a well-designed system, venvs would be copyable to another equivalent system by default, since there's little to lose and a lot to gain: for example, access to the simplest and most reliable deployment system ever invented, the <<copy-paste deployment system>>.
The fact that we have entire generations of developers that can't fathom why someone would want that basically says all it needs to say about the degree of over engineering that's now standard.
From a user's point of view, Python packaging has probably never been better, but it is still a huge mess for anyone having to maintain these projects, which might explain some of the dichotomy in the comments.
In my experience most of these packaging tools work fine for pure Python packages, it is when you try and bundle extensions in a cross-platform way that things get really messy. For better or worse I think third party package managers somewhat outside the Python ecosystem, i.e. conda + conda-forge, are the only tools that get this right.
Will they get this mess fixed up for 3.13 this Fall, or 3.14 (pi-thon?) in 2025?
I admire your optimism but don't share it. Crap packaging will be present in the Python experience for the lifetimes of multiple dogs. 2075 at the earliest.
I'm more worried about commercial entities gaining power - simply by employing so many people to work on something that the alternatives cannot compete.
People who want "one" anything - I can sympathise with the desire for simplicity (as long as it's a kind that suits you) but I hope it never happens.
This is a niche concern. There are no commercial package managers that I know of, in any mainstream ecosystem. Even Gradle, which probably comes closest, is perfectly usable without the commercial add-ons.
I'm not thinking of commercial anything - just changes financed by companies that want them.
Example: PyPy getting almost no funding, but Microsoft getting its JIT into CPython.
Big money will be deciding what happens. We're already all dependent on GitHub - and they've managed to use our code to train their AI. Like a vampire offering poor travellers a bed in its castle and then sucking their blood :-D
Arguably, Conda is a commercial product that exists to serve the needs of Anaconda first and others second. Fortunately Anaconda has been a very benevolent overlord.
I have dozens of packages created since 2009 and never had issues with distutils; setuptools also worked fine for me. Even C extensions weren't a problem.
Recently I've tried the setup.py-less approach. Oh boy, what a mess "modern" Python packaging is. Multiple backends. PEP 517. pip<23 could not install your package. The `build` tool includes explicitly excluded files.
At least you can include markdown as a README for the package.
Setuptools is supposed to work with PEP 517 just fine with no changes, other than adding the build-backend declaration in pyproject.toml. Old versions of Pip will find your setup.py and use it directly, and new versions will find it via pyproject.toml and invoke it via `build`. Maybe you hit a bug? Or were doing something unusual and complicated. I never had a problem with the transition in a Setuptools project.
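If it helps anyone, the minimal migration I'm describing is something like this, added alongside the existing setup.py (the version bound is illustrative):

```
# pyproject.toml, next to the existing setup.py; nothing else has to change.
[build-system]
requires = ["setuptools>=40.8", "wheel"]
build-backend = "setuptools.build_meta"
```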
Nowadays I am usually using micromamba inside Docker. I honestly never had any issues. I need docker anyway for deployment. And mamba/conda is almost needed for scientific stuff (working with some super niche weather stuff which is only on conda).
IMO the most important thing is to stay flexible; all of them have their tradeoffs, so pick one that works and move to a different one if it becomes painful.
Docker is very very useful though. Can you imagine the joy I felt when I spun up a 6 month old very complex project and it immediately worked.
I don't know why the PyPA seems to be stubbornly ignoring Poetry, most Python devs I know have switched to it and are very unlikely to switch again to something else. It does what you need it to do, it's stable and reliable these days, and the ergonomics are good.
I kinda wish PEP-621 had gone more down the pyproject route that poetry took, their version looks so much cleaner and more readable at a glance than the PEP-621 version.
Also Poetry integrates well with pip/venv and vice-versa, so you can transition to it step-by-step if your project consists of multiple packages across a multirepo or a monorepo. You don't have to switch everything all at once.
I tried using PDM at my new work place and gave up. I had weird issues and since I had used poetry before, I switched to it. Poetry just works out of the box these days.
For packaging you need more than that. Poetry and Hatch both wrap or replace a bunch of small modular components: Pip, Venv, Build, Twine, Setuptools, Pip Compile (for lockfiles), and a task runner like Tox, Nox, or Invoke. You can (and people do) use that "stack" instead of Poetry or Hatch, if you prefer a more Unix-style approach of modularity and composability among several single-purpose tools.
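If it helps, that modular stack looks roughly like this end to end (tool choices per the list above; file names are the conventional ones, not the only option):

```
python -m venv .venv && . .venv/bin/activate      # environment (venv)
pip-compile requirements.in -o requirements.txt   # lock (pip-tools)
pip-sync requirements.txt                         # install exactly the lock (pip-tools)
python -m build                                   # build sdist + wheel (build)
twine upload dist/*                               # publish (twine)
tox                                               # run the task matrix (tox)
```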
There is nothing to maintain. I bump Python in the Dockerfile if I need to for some reason; otherwise there is no reason to change anything, and it indeed just works. It takes like 2 minutes of work once a year.
I can rebuild an image from 5 years ago and deploy it on multiple OSes and it will probably still work.
It's actually easier on Mac and Windows, because it runs Docker inside QEMU and it installs the whole toolchain in a single app bundle. I'd pay good money for a Linux equivalent, setting it all up manually is a pain.