
I use uv pip to install dependencies for any LLM software I run. I am not sure if uv re-implements the pip logic or hands over resolution to pip. But it does not change the fact that I have multiple versions of torch + multiple installations of the same version of torch in the cache.

Compare this to the way something like maven/gradle handles this and you have to wonder WTF is going on here.




uv implements its own resolution logic independently of pip.

Maybe your various LLM libraries are pinning different versions of Torch?

Different Python versions each need their own separate Torch binaries as well.
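
As an illustration, here is a rough sketch of why that is (it assumes the third-party "packaging" library, which pip vendors, is importable): an installer only accepts wheels whose tags match the running interpreter, so a cp311 environment and a cp312 environment resolve to different Torch binaries.

    # Rough sketch, assuming the third-party "packaging" library is available.
    # Installers only accept wheels whose tags match the running interpreter,
    # so each Python version pulls its own binary build of Torch.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:3]:
        print(tag)  # e.g. cp312-cp312-manylinux_... on CPython 3.12 / Linux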

At least with uv you don't end up with separate duplicate copies of PyTorch in each of the virtual environments for each of your different projects!


> Different Python versions each need their own separate Torch binaries as well

Found this out the hard way. Something to do with ABI breakage, perhaps. I was looking at the way Python implements extensions the other day. Very weird.


>Something to do with ABI breakage, perhaps.

There is a "stable ABI" which is a subset of the full ABI, but there's no requirement to stick to it. The ABI effectively changes with every minor Python version, because they're constantly trying to improve the Python VM, which often involves re-working the internal representations of built-in types, etc. (Consider for example the improvements made to dictionaries in Python 3.6 - https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-compa... .) Of course they try to make properly abstracted interfaces for those C structs, but this is a 34-year-old project: design decisions get re-thought all the time, there's a huge variety of tiny details that could change, and countless people have legacy code using deprecated interfaces.
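
For a concrete view of this, a small illustration using the standard sysconfig module; the example values in the comments assume CPython 3.12 on Linux.

    # The ABI tag baked into the running interpreter, which full-ABI extension
    # modules must match exactly; it changes with every minor release, so a
    # build made for 3.11 won't load on 3.12.
    import sysconfig

    print(sysconfig.get_config_var("SOABI"))       # e.g. "cpython-312-x86_64-linux-gnu"
    print(sysconfig.get_config_var("EXT_SUFFIX"))  # e.g. ".cpython-312-x86_64-linux-gnu.so"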

The bytecode also changes with every minor Python version (and several times during the development of each). The bytecode file format is versioned for this reason, and .pyc caches need to be regenerated. (And every now and then you'll hit a speed bump, like old code using `async` as an identifier which subsequently becomes a keyword. That hit TensorFlow once: https://stackoverflow.com/questions/51337939 .)
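
A small illustration of that versioning (the example value assumes CPython 3.12):

    # Every .pyc file starts with this magic number; it changes with each minor
    # release (and during development), which is why stale caches get
    # regenerated rather than reused across interpreter versions.
    import importlib.util

    print(importlib.util.MAGIC_NUMBER)  # e.g. b'\xcb\r\r\n' on CPython 3.12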


Very different way of doing things compared to the JVM which is what I have most experience with.

Was some kind of FFI using dlopen and sharing memory across the vm boundary ever considered in the past, instead of having to compile extensions alongside a particular version of python?

I remember seeing some ffi library, probably on pypi. But I don't think it is part of standard python.


You can in fact use `dlopen`, via the support provided in the `ctypes` standard library. `freetype-py` (https://github.com/rougier/freetype-py) is an example of a project that works this way.
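
For concreteness, a minimal sketch of that pattern; libm is used purely for illustration, and the library name/path varies by platform.

    # Load a shared library at runtime with ctypes and declare the call
    # signature yourself, rather than compiling an extension against CPython.
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))  # 1.0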

To my understanding, though, it's less performant. And you still need a stable ABI layer to call into. FFI can't save you if the C code decides in version N+1 that it expects the "memory shared across the vm boundary" to have a different layout.


> Something to do with ABI breakage, perhaps. I was looking at the way Python implements extensions the other day. Very weird.

Yes, it's essentially that: CPython doesn't guarantee exact ABI stability between versions unless the extension (and its enclosing package) intentionally build against the stable ABI[1].

The courteous thing to do in the Python packaging ecosystem is to build "abi3" wheels that are stable and therefore don't need to be duplicated as many times (either on the index or on the installing client). Torch doesn't build these wheels for whatever reason, so you end up with multiple slightly different but functionally identical builds for each version of Python you're using.
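
For anyone curious, a hedged sketch of what opting in looks like with setuptools; the package and file names here are made up, and the wheel itself also needs the abi3 tag (e.g. via bdist_wheel's py_limited_api option).

    # Py_LIMITED_API restricts the C code to stable-ABI calls, and
    # py_limited_api=True gives the built extension the version-independent
    # ".abi3" suffix instead of a per-interpreter one.
    from setuptools import Extension, setup

    setup(
        name="example",
        version="0.1",
        ext_modules=[
            Extension(
                "example._native",
                sources=["src/example.c"],
                define_macros=[("Py_LIMITED_API", "0x03090000")],  # stable ABI as of 3.9
                py_limited_api=True,
            )
        ],
    )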

TL;DR: This happens because of an interaction between two patterns that Python makes very easy: using multiple Python versions, and building/installing binary extensions. In a sense, it's a symptom of Python's success: other ecosystems don't have these problems because they have far fewer people running multiple configurations simultaneously.

[1]: https://docs.python.org/3/c-api/stable.html


My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.

I am planning to shift some of my stuff to pypy (so a "fast" python exists, kind of). But some dependencies can be problematic, I have heard.


Neither Java nor JS encourages the use of native extensions to the same degree that Python does. So some of it is a fundamental difference in approach: Python has gotten very far by offloading hot paths into native code instead of optimizing the interpreter itself.

(Recent positive developments in Python’s interpreted performance have subverted this informal tendency.)


Node also introduced a stable extension API that people could build native code against relatively early in its history, compared to Python. That, plus the general velocity of the V8 engine and its complex API, kept developers from reaching in like they did with Python, or from leaving tons of libraries in the ecosystem that are too critical to drop.


Yeah, I think it's mostly about complexity: CPython's APIs also change quite a bit, but they're pretty simple (in the "simple enough to hang yourself with" sense).


> Neither Java nor JS encourages the use of native extensions to the same degree that Python does.

You already had billions of lines of Java and JS code that HAD to be sped up. So they had no alternative. If python had gone down the same route, speeding it up without caveats would have been that much easier.


I don't think that's the reason. All three ecosystems had the same inflection point, and chose different solutions to it. Python's was especially "easy" since the C API was already widely used and there were no other particular constraints (WORA for Java, pervasive async for JS) that impeded it.


For scientific stuff and ML, it's because people already had libraries written in C/Fortran/C++ and so calling it directly just made sense.

In other languages that didn't happen and you don't have anywhere near as good scientific/ML packages as a result.


>> My use of python is somewhat recent. But the two languages that I have used a lot of - Java and JS - have interpreters that were heavily optimized over time. I wonder why that never happened with python and, instead, everyone continues to write their critical code in C/Rust.

Improving Python performance has been a topic as far back as 2008, when I attended my first PyCon. A quick detour on Python 3 first, because there is some historical revisionism from people online who weren't around in the earlier days.

Back then, the big migration to Python 3 was still in front of the community. The timeline concerns that popped up once Python really picked up steam in the industry between 2012 and 2015 weren't as pressing yet. You can refer to Guido's talks from PyCon 2008 and 2009, if they are available somewhere, to get the vibe on the urgency. Python 3 was impactful because it changed the language and the platform while requiring a massive amount of migration effort.

Back to perf. Around 2008, there was a feeling that an alternative to CPython might be the future. Candidates included IronPython, Jython, and PyPy. Others like Unladen Swallow wanted to make major changes to CPython (https://peps.python.org/pep-3146/).

Removing the GIL was another direction people wanted to take because it seemed simpler in a way. This is a well researched area with David Beazley having many talks like this oldie (https://www.youtube.com/watch?v=ph374fJqFPE). The idea is much older (https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...).

All of these alternative implementations of Python from this time period have basically failed at the goal of replacing CPython. IronPython was a Python 2 implementation, and updating to Python 3 while trying to grow enough to challenge CPython was impossible. Eventually, Microsoft lost interest and that was that. Similar things happened for the others.

GIL removal was a constant topic from 2008 until recently. Compatibility of extensions was a major source of inertia, and Python's growing popularity meant even more C/C++/Rust code relying on the GIL. The option to disable it (https://peps.python.org/pep-0703/) only happened because the groundwork was eventually done properly to help the community move.

The JVM has very clearly defined interfaces and specs, similar to the CLR, which make optimization viable. JS doesn't have the same compatibility concerns.

That was just a rough overview, but many of the stories of Python woes miss a lot of this context. Many discussions about perf over the years have descended into a GIL discussion without any data to show that removing the GIL would change performance. People love to talk about it, but their code turns out to be IO-bound when you profile it.
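
To the last point, a quick sketch of that sanity check; the workload below is made up and purely illustrative.

    # Profile before blaming the GIL. In I/O-bound code like this, nearly all
    # the wall time shows up in socket/SSL calls, not in bytecode execution.
    import cProfile
    import urllib.request

    def fetch():
        with urllib.request.urlopen("https://example.com") as resp:
            return resp.read()

    cProfile.run("fetch()", sort="cumulative")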


A bit baffling, IMO, the focus on the GIL over actual Python performance, particularly when you had so many examples of language virtual machines improving performance in that era. So many lost opportunities.


They don't want to throw away the extensions and ecosystem. Let's say Jython, or some other modern implementation became the successor. All of the extensions need to be updated (frequently rewritten) to be compatible with and exploit the characteristics of that platform.

It was expected that extension maintainers would respond negatively to this. In many cases it presented a decision: do I port this to the new platform, or move away from Python completely? You have to remember, the impactful decisions leading us down this path were made closer to 2008 than to today, back when dropping Python, or making it the second option to help people migrate, would have been viable for a lot of these extensions. There was also a lot of potential for people to follow a fork of the traditional CPython interpreter.

There were no great options because there are many variables to consider. Perf is only one of them. Pushing ahead only on perf is hard when it's unclear if it'll actually impact people in the way they think it will when they can't characterize their actual perf problem beyond "GIL bad".


Python just didn't have much momentum until relatively recently, despite its age. There are efforts to speed it up going on now, backed by Microsoft.

As for PyPy, it's in a weird spot: the things it does fast are the ones you'd usually just offload to a module implemented in C.


As a long time Pythonista I was going to push back against your suggestion that Python didn't have much momentum until recently, but then I looked at the historic graph on https://www.tiobe.com/tiobe-index/ and yeah, Python's current huge rise in popularity didn't really get started until around 2018.

(TIOBE's methodology is a bit questionable though, as far as I can tell it's almost entirely based on how many search engine hits they get for "X programming". https://www.tiobe.com/tiobe-index/programminglanguages_defin...)


Yes, TIOBE is garbage. The biggest problem is that because they're coy about methodology, we don't even know what we're talking about. Rust's "Most Loved" Stack Overflow numbers were at least a specific thing: you can say, OK, that doesn't mean there's more Rust software or that Rust programmers get paid more; apparently the people programming in Rust really like Rust, more so than, say, Python programmers love Python. So that's good to know, but it's that and not anything else.


TIOBE is garbage. I remember Python making waves as far back as 2005, with Google using it and such.


From what I can tell, it wasn't as prominent back then as it has been recently, when it became a popular pick for random projects that weren't just gluing things together. The big companies that used it were perfectly happy specializing the interpreter to their use case instead of upstreaming general improvements.


Python had momentum until the 2->3 transition put a huge damper on it around 2012-2016.

Python got lucky with the machine learning community using Python. (Thank you TensorFlow and PyTorch, and the SciPy community for saving Python.)


> There are efforts to speed it up

Well, the extensions are going to complicate this a lot.

A fast JIT cannot reach into a native library and do its magic there the way HotSpot/V8 can with pure Java/JS code.


The reason people don't always use abi3 is that not everything that can be done with the full API is even possible with the limited one, and some things that are possible carry a significant perf hit.


I think that's a reason, but I don't think it's the main one: the main one is that native builds don't generally default to abi3, so people (1) publish larger matrices than they actually need to, and (2) end up depending on non-abi3 constructs when abi3 ones are available.

(I don't know if this is the reason in Torch's case or not, but I know from experience that it's the reason for many other popular Python packages.)
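
(As a rough way to check which kind of build you actually got: the extension's filename gives it away. The stdlib "_ssl" module is used here only as an example of a full-ABI build.)

    # abi3 builds end in ".abi3.so" on Linux/macOS, while full-ABI builds
    # embed the exact interpreter tag in the filename.
    import importlib.util

    spec = importlib.util.find_spec("_ssl")
    print(spec.origin)  # e.g. .../lib-dynload/_ssl.cpython-312-x86_64-linux-gnu.so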


Yes, you're right; I should have clarified my comment with, "people who know the difference to begin with", which is something one needs to learn first (and very few tutorials etc on Python native modules even mention the limited API).



