
I use uv pip to install dependencies for any LLM software I run. I am not sure if uv re-implements the pip logic or hands over resolution to pip. But it does not change the fact that I have multiple versions of torch + multiple installations of the same version of torch in the cache.

Compare this to the way something like maven/gradle handles this and you have to wonder WTF is going on here.




uv implements its own resolution logic independently of pip.

Maybe your various LLM libraries are pinning different versions of Torch?

Different Python versions each need their own separate Torch binaries as well.
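
As an illustration, here is a rough sketch of why that is (it assumes the third-party "packaging" library, which pip vendors, is importable): an installer only accepts wheels whose tags match the running interpreter, so a cp311 environment and a cp312 environment resolve to different Torch binaries.

    # Rough sketch, assuming the third-party "packaging" library is available.
    # Installers only accept wheels whose tags match the running interpreter,
    # so each Python version pulls its own binary build of Torch.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:3]:
        print(tag)  # e.g. cp312-cp312-manylinux_... on CPython 3.12 / Linux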

At least with uv you don't end up with separate duplicate copies of PyTorch in each of the virtual environments for each of your different projects!


> Different Python versions each need their own separate Torch binaries as well

Found this out the hard way. Something to do with ABI breakage, perhaps. I was looking at the way Python implements extensions the other day. Very weird.


>Something to do with ABI breakage, perhaps.

There is a "stable ABI" which is a subset of the full ABI, but there's no requirement to stick to it. The ABI effectively changes with every minor Python version, because they're constantly trying to improve the Python VM, which often involves re-working the internal representations of built-in types, etc. (Consider for example the improvements made to dictionaries in Python 3.6 - https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compa... .) Of course they try to make properly abstracted interfaces for those C structs, but this is a 34-year-old project: design decisions get re-thought all the time, there's a huge variety of tiny details that could change, and countless people have legacy code using deprecated interfaces.
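
For a concrete view of this, a small illustration using the standard sysconfig module; the example values in the comments assume CPython 3.12 on Linux.

    # The ABI tag baked into the running interpreter, which full-ABI extension
    # modules must match exactly; it changes with every minor release, so a
    # build made for 3.11 won't load on 3.12.
    import sysconfig

    print(sysconfig.get_config_var("SOABI"))       # e.g. "cpython-312-x86_64-linux-gnu"
    print(sysconfig.get_config_var("EXT_SUFFIX"))  # e.g. ".cpython-312-x86_64-linux-gnu.so"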

The bytecode also changes with every minor Python version (and several times during the development of each). The bytecode file format is versioned for this reason, and .pyc caches need to be regenerated. (And every now and then you'll hit a speed bump, like old code using `async` as an identifier which subsequently becomes a keyword. That hit TensorFlow once: https://stackoverflow.com/questions/51337939 .)
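
A small illustration of that versioning (the example value assumes CPython 3.12):

    # Every .pyc file starts with this magic number; it changes with each minor
    # release (and during development), which is why stale caches get
    # regenerated rather than reused across interpreter versions.
    import importlib.util

    print(importlib.util.MAGIC_NUMBER)  # e.g. b'\xcb\r\r\n' on CPython 3.12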


Very different way of doing things compared to the JVM which is what I have most experience with.

Was some kind of FFI using dlopen and sharing memory across the vm boundary ever considered in the past, instead of having to compile extensions alongside a particular version of python?

I remember seeing some ffi library, probably on pypi. But I don't think it is part of standard python.


You can in fact use `dlopen`, via the support provided in the `ctypes` standard library. `freetype-py` (https://github.com/rougier/freetype-py) is an example of a project that works this way.
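
For concreteness, a minimal sketch of that pattern; libm is used purely for illustration, and the library name/path varies by platform.

    # Load a shared library at runtime with ctypes and declare the call
    # signature yourself, rather than compiling an extension against CPython.
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))  # 1.0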

To my understanding, though, it's less performant. And you still need a stable ABI layer to call into. FFI can't save you if the C code decides in version N+1 that it expects the "memory shared across the vm boundary" to have a different layout.


> Something to do with ABI breakage, perhaps. I was looking at the way Python implements extensions the other day. Very weird.

Yes, it's essentially that: CPython doesn't guarantee exact ABI stability between versions unless the extension (and its enclosing package) intentionally build against the stable ABI[1].

The courteous thing to do in the Python packaging ecosystem is to build "abi3" wheels that are stable and therefore don't need to be duplicated as many times (either on the index or on the installing client). Torch doesn't build these wheels for whatever reason, so you end up with multiple slightly different but functionally identical builds for each version of Python you're using.
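
For anyone curious, a hedged sketch of what opting in looks like with setuptools; the package and file names here are made up, and the wheel itself also needs the abi3 tag (e.g. via bdist_wheel's py_limited_api option).

    # Py_LIMITED_API restricts the C code to stable-ABI calls, and
    # py_limited_api=True gives the built extension the version-independent
    # ".abi3" suffix instead of a per-interpreter one.
    from setuptools import Extension, setup

    setup(
        name="example",
        version="0.1",
        ext_modules=[
            Extension(
                "example._native",
                sources=["src/example.c"],
                define_macros=[("Py_LIMITED_API", "0x03090000")],  # stable ABI as of 3.9
                py_limited_api=True,
            )
        ],
    )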

TL;DR: This happens because of an interaction between two patterns that Python makes very easy: using multiple Python versions, and building/installing binary extensions. In a sense, it's a symptom of Python's success: other ecosystems don't have these problems because they have far fewer people running multiple configurations simultaneously.

[1]: https://docs.python.org/3/c-api/stable.html


My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.

I am planning to shift some of my stuff to pypy (so a "fast" python exists, kind of). But some dependencies can be problematic, I have heard.


Neither Java nor JS encourages the use of native extensions to the same degree that Python does. So some of it is a fundamental difference in approach: Python has gotten very far by offloading hot paths into native code instead of optimizing the interpreter itself.

(Recent positive developments in Python’s interpreted performance have subverted this informal tendency.)


Node also introduced a stable extension API that people could build native code against relatively early in its history, compared to Python. That, plus the general velocity of the V8 engine and its complex API, kept developers from reaching in like they did with Python, or from leaving tons of libraries in the ecosystem that are too critical to drop.


Yeah, I think it's mostly about complexity: CPython's APIs also change quite a bit, but they're pretty simple (in the "simple enough to hang yourself with" sense).


> Neither Java nor JS encourages the use of native extensions to the same degree that Python does.

You already had billions of lines of Java and JS code that HAD to be sped up. So they had no alternative. If python had gone down the same route, speeding it up without caveats would have been that much easier.


I don't think that's the reason. All three ecosystems had the same inflection point, and chose different solutions to it. Python's was especially "easy" since the C API was already widely used and there were no other particular constraints (WORA for Java, pervasive async for JS) that impeded it.


For scientific stuff and ML, it's because people already had libraries written in C/Fortran/C++ and so calling it directly just made sense.

In other languages that didn't happen and you don't have anywhere near as good scientific/ML packages as a result.


>> My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.

Improving Python performance has been a topic as far back as 2008, when I attended my first PyCon. A quick detour on Python 3 first, because there is some historical revisionism from people online who weren't around in the earlier days.

Back then, the big migration to Python 3 was still in front of the community. The timeline concerns that popped up once Python really picked up steam in the industry between 2012 and 2015 weren't as pressing yet. You can refer to Guido's talks from PyCon 2008 and 2009, if they are available somewhere, to get the vibe on the urgency. Python 3 was impactful because it changed the language and the platform while requiring a massive amount of migration effort.

Back to perf. Around 2008, there was a feeling that an alternative to CPython might be the future. Candidates included IronPython, Jython, and PyPy. Others like Unladen Swallow wanted to make major changes to CPython (https://peps.python.org/pep-3146/).

Removing the GIL was another direction people wanted to take because it seemed simpler in a way. This is a well researched area with David Beazley having many talks like this oldie (https://www.youtube.com/watch?v=ph374fJqFPE). The idea is much older (https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...).

All of these alternative implementations of Python from this time period have basically failed at the goal of replacing CPython. IronPython was a Python 2 implementation, and updating to Python 3 while trying to grow enough to challenge CPython was impossible. Eventually, Microsoft lost interest and that was that. Similar things happened for the others.

GIL removal was a constant topic from 2008 until recently. Compatibility of extensions was a major source of inertia, and Python's growing popularity meant even more C/C++/Rust code relying on the GIL. The option to disable it (https://peps.python.org/pep-0703/) only happened because the groundwork was eventually done properly to help the community move.

The JVM has very clearly defined interfaces and specs, similar to the CLR, which make optimization viable. JS doesn't have the same compatibility concerns.

That was just a rough overview, but many of the stories of Python woes miss a lot of this context. Many discussions about perf over the years have descended into a GIL discussion without any data to show that removing the GIL would change performance. People love to talk about it, but their code turns out to be IO-bound when you profile it.
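
To the last point, a quick sketch of that sanity check; the workload below is made up and purely illustrative.

    # Profile before blaming the GIL. In I/O-bound code like this, nearly all
    # the wall time shows up in socket/SSL calls, not in bytecode execution.
    import cProfile
    import urllib.request

    def fetch():
        with urllib.request.urlopen("https://example.com") as resp:
            return resp.read()

    cProfile.run("fetch()", sort="cumulative")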


A bit baffling, IMO, the focus on the GIL over actual Python performance, particularly when you had so many examples of language virtual machines improving performance in that era. So many lost opportunities.


They don't want to throw away the extensions and ecosystem. Let's say Jython, or some other modern implementation became the successor. All of the extensions need to be updated (frequently rewritten) to be compatible with and exploit the characteristics of that platform.

It was expected that extension maintainers would respond negatively to this. In many cases it presented a decision: do I port this to the new platform, or move away from Python completely? You have to remember, the impactful decisions leading us down this path were made closer to 2008 than to today, back when dropping Python, or making it the second option to help people migrate, would have been viable for a lot of these extensions. There was also a lot of potential for people to follow a fork of the traditional CPython interpreter.

There were no great options because there are many variables to consider. Perf is only one of them. Pushing ahead only on perf is hard when it's unclear if it'll actually impact people in the way they think it will when they can't characterize their actual perf problem beyond "GIL bad".


Python just didn't have much momentum until relatively recently, despite its age. There are efforts to speed it up going on now, backed by Microsoft.

As for PyPy, it's in a weird spot: the things it does fast are the ones you'd usually just offload to a module implemented in C.


As a long time Pythonista I was going to push back against your suggestion that Python didn't have much momentum until recently, but then I looked at the historic graph on https://www.tiobe.com/tiobe-index/ and yeah, Python's current huge rise in popularity didn't really get started until around 2018.

(TIOBE's methodology is a bit questionable though, as far as I can tell it's almost entirely based on how many search engine hits they get for "X programming". https://www.tiobe.com/tiobe-index/programminglanguages_defin...)


Yes, TIOBE is garbage. The biggest problem is that because they're coy about methodology, we don't even know what we're talking about. Rust's "Most Loved" Stack Overflow numbers were at least a specific thing: you can say, OK, that doesn't mean there's more Rust software or that Rust programmers get paid more; apparently the people programming in Rust really like Rust, more so than, say, Python programmers love Python. So that's good to know, but it's that and not anything else.


TIOBE is garbage. I remember Python making waves as far back as 2005, with Google using it and such.


From what I can tell, it wasn't as prominent back then as it has been recently, when it became a popular pick for random projects that weren't just gluing things together. The big companies that used it were perfectly happy specializing the interpreter to their use case instead of upstreaming general improvements.


Python had momentum until the 2->3 transition put a huge damper on it around 2012-2016.

Python got lucky with the machine learning community using Python. (Thank you TensorFlow and PyTorch, and the SciPy community for saving Python.)


> There are efforts to speed it up

Well, the extensions are going to complicate this a lot.

A fast JIT cannot reach into a native library and do its magic there the way HotSpot/V8 can with pure Java/JS code.


The reason people don't always use abi3 is that not everything that can be done with the full API is even possible with the limited one, and some things that are possible carry a significant perf hit.


I think that's a reason, but I don't think it's the main one: the main one is that native builds don't generally default to abi3, so people (1) publish larger matrices than they actually need to, and (2) end up depending on non-abi3 constructs when abi3 ones are available.

(I don't know if this is the reason in Torch's case or not, but I know from experience that it's the reason for many other popular Python packages.)
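
(As a rough way to check which kind of build you actually got: the extension's filename gives it away. The stdlib "_ssl" module is used here only as an example of a full-ABI build.)

    # abi3 builds end in ".abi3.so" on Linux/macOS, while full-ABI builds
    # embed the exact interpreter tag in the filename.
    import importlib.util

    spec = importlib.util.find_spec("_ssl")
    print(spec.origin)  # e.g. .../lib-dynload/_ssl.cpython-312-x86_64-linux-gnu.so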


Yes, you're right; I should have clarified my comment with, "people who know the difference to begin with", which is something one needs to learn first (and very few tutorials etc on Python native modules even mention the limited API).



