Guido van Rossum joins Microsoft (twitter.com/gvanrossum)
1337 points by 0xmohit on Nov 12, 2020 | hide | past | favorite | 799 comments


So instead of BDFL for Python, he's going to "make using Python better".

Congrats to him for finding something fun to do in retirement - dictators usually end up with a different outcome. ;)

I'm looking forward to seeing the future of Python - I think this move will be great for the whole community, and lets him push boundaries without being bogged down on the management side.


An official package manager with great dependency resolution would be fantastic. Or take over pipenv or poetry and sponsor it with Microsoft $$$.

The biggest hurdle to python right now is the stupid package managers. We need cargo for Python.


I think in general Python's biggest challenge is that it doesn't scale well. This is an agglomeration of issues around the same theme: bad packaging when there are a lot of cross-cutting dependencies, slow performance, no concurrency, typing as a second-class citizen, etc. All of that is barely noticeable when you're just getting started on a small experimental project, but incredibly painful in large production systems.

I strongly suspect that devs' satisfaction with Python is strongly correlated with the size of the codebase they're working on. Generally people using Python for one-off projects or self-contained tools tend to be pretty happy. People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

What I've observed a lot is that many startups or greenfield projects start with Python to get an MVP out the door as fast as possible. Then as the scope of the software expands they feel increasingly bogged down and trapped in the language.


I work at Instagram, which is a O(millions) LOC Python monorepo with massive throughput and a large engineering team. It's actually quite nice — but our code is heavily, heavily typed. It would be miserable without the type system. Some of the older parts of the codebase are more loosely typed (although they're shrinking reasonably quickly), and those sections are indeed a huge pain to ramp up on.

Part of the success of IG's large Python codebase is FB's investment into developer tooling; for example, FB wrote (and open-sourced, FWIW) our own PEP-484 compliant type checker, Pyre [1], because mypy was too slow for a codebase of our size.

1: https://github.com/facebook/pyre-check
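For anyone who hasn't seen PEP 484 annotations in practice, this is roughly what "heavily typed" Python looks like (a toy sketch with made-up names, not IG code):

```python
from typing import Dict, Optional

def get_follower_count(counts: Dict[str, int], user_id: str) -> Optional[int]:
    # A PEP 484 checker (mypy, Pyre) verifies that callers pass a str key,
    # and that they handle the possible None on the way out.
    return counts.get(user_id)

# A call like get_follower_count(counts, 42) gets flagged by the checker
# before the code ever runs; untyped code would only fail at runtime, if at all.
```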


That's my major complaint about Python...

For its age and popularity, the tooling is abysmal.

I have to rewrite too many things that I expected to just be there, given the age of the project.

And some things are only getting fixed now! Merging dicts with an operator (dict | dict) only started to work in 3.9!


dict1 | dict2 is syntactic sugar; you've been able to do `dict1.update(dict2)` since forever.


No, it's completely different: dict1 | dict2 creates a new object and leaves the inputs unchanged, while dict1.update(dict2) modifies dict1 in place.
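Concretely (the `|` operator is Python 3.9+, PEP 584):

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

merged = d1 | d2          # new dict; d1 and d2 untouched; rightmost key wins
assert merged == {"a": 1, "b": 3, "c": 4}
assert d1 == {"a": 1, "b": 2}

d1.update(d2)             # mutates d1 in place, returns None
assert d1 == {"a": 1, "b": 3, "c": 4}
```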


you've been able to do splat for forever! https://dpaste.org/yU7s
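For anyone who can't reach the paste, the "splat" merge presumably refers to PEP 448 unpacking, which has worked since 3.5:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# Unpack both dicts into a new literal: inputs unchanged, rightmost key wins.
merged = {**d1, **d2}
assert merged == {"a": 1, "b": 3, "c": 4}
```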


Yeah, you could use my hairline to tell when I’m working with Python... I still haven’t figured out a good typed framework for it yet.


I also struggle with dynamic (weaker) typing in Python, compared to static (stronger) typing in C++, Java, or C# -- or even VBA(!).

Frustrated like you, I wrote my own open source type checking library for Python. You might be interested to read about it here: https://github.com/kevinarpe/kevinarpe-rambutan3/blob/master...

I have used that library on multiple projects for my job. It makes the code run about 50% slower, on average, because all the type checking is done at run-time. I am OK with the slow down because I don't use Python when I need speed. My "developer speed" was greatly improved with stricter types.

Finally, this isn't the first time I've written a type checking library/framework. I did the same for Perl more than 10 years ago. Unfortunately, that code is proprietary, so not open source. :( The abstract ideas were very similar. Our team was so frustrated with the legacy Perl code that I wrote a type checking library, and we slowly applied it to the code base. About two years later, it was much less painful!
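For the curious, a minimal sketch of the run-time checking idea (this is the general shape of such a decorator, not the actual library's API):

```python
import functools
import inspect

def checked(func):
    """Check annotated arguments against their declared types at call time."""
    sig = inspect.signature(func)
    hints = func.__annotations__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only handle plain classes here; real libraries also cover
            # generics, containers, unions, etc.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}, "
                                f"got {type(value).__name__}")
        return func(*args, **kwargs)
    return wrapper

@checked
def greet(name: str, times: int) -> str:
    return ", ".join([f"hello {name}"] * times)
```

`greet("world", 2)` works; `greet("world", "2")` fails immediately with a clear TypeError instead of an obscure one deep inside `join`. This is where the run-time cost comes from: every call pays for the inspection.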


Try using Pyre! It's open-source. I use it daily at IG.


Have you guys published any whitepaper on this subject? These last few years working on moderately large Python codebases with dynamic typing have been less than idyllic.



"devop" here

> doesn't scale well.

Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use, you'll rapidly find all its pain points.

> bad packaging when there's a lot of cross-cutting dependencies

Much as I hate it, Docker solves this. Failing that, poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to node's. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.

> slow performance

Meh, again it depends on your use case. If you're really into performance, then drop down to C/C++ and pybind it. Fronting performance-critical code with Python is a fairly decent way to let non-specialists handle and interface with performance-critical code. It's far cheaper to staff it that way too: standard Python programmers are cheaper than performance experts.

If we are being realistic, most of the time 80% of a Python program is spent waiting on the network.

Granted, python is not overly fast, but then most of the time your bottleneck is the developer not the language.

> no concurrency

Yes, this is a pain. I would really like some non-GIL-based threading. However, it's not really been that much of a problem. multiprocessing Queues are useful here, if limited. Failing that, make more processes and use an RPC system.
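A minimal sketch of the "make more processes" route, sidestepping the GIL with the stdlib multiprocessing Pool (toy workload):

```python
from multiprocessing import Pool

def cpu_heavy(n: int) -> int:
    # Each worker is a separate process with its own interpreter and its
    # own GIL, so these actually run in parallel on multiple cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [10_000, 20_000, 30_000])
```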

> typing as second-class citizens

The annotation system is underdeveloped. Being reliant on dataclass libraries to enforce typing is a bit poop.

> People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

I work with a _massive_ monorepo. Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions. None of that is Python's issue, it's egotistical programmers not wanting to read other people's (undocumented) code, and not wanting to spend time making other people's code better.


>Nothing scales well. scaling requires lots of effort. It doesn't matter what language you use, you'll rapidly find all its pain points.

This is very important. A lot of people think that just using Go or Rust or whatever other new language fixes all of this. But with a big enough project, you'll find all the issues. It's just a matter of time.


Do not miss that one will find all of the language's pain points. I'd wager that a dynamically typed language such as Python has quite a few more pain points at scale than a more principled language such as OCaml.

I love Python's bignum arithmetic when I write small prototypes for public-key cryptography. I love Python's extensive standard library when I'm scraping a couple of web pages for easier local reading. But I would never willingly choose it for anything bigger than a few hundred lines. I'm simply not capable of dealing with large dynamically typed programs.

Now if people try Rust or OCaml with the mentality of an early startup's Lisp developer, they're going to get hurt right away ("fighting the language" and "pleasing the compiler" is neither pleasing nor productive), and they're going to get hurt in the long run (once you've worked around the language's annoying checks, you won't reap as much benefit).

If you'll allow the caricature, don't force Coq down Alan Kay's throat, and don't torture Edsger Dijkstra with TCL.


Though OCaml’s tooling pain points hurt at least as much as Python's, even though I adore the language.


This is somewhat true - scaling is hard no matter what - but some things scale much better than others. I have been miserable working with ruby on rails codebases that are much smaller than java codebases I have been content working on. This is despite personally enjoying the ruby language far more than the java language.


> Much as I hate it, docker solves this. Failing that poetry or if you must venv. (if you're being "clever" statically compile everything and ship the whole environment, including the interpreter) its packaging is a joy compared to node. Even better, enforce standard environments, which stops all of this. One version of everything. you want to change it? best upgrade it for everyone else.

No, Docker doesn't solve the fact that some packages just won't play nicely together. NPM actually does this better than the Python ecosystem too, since it will still work with different versions of the same dependency. You get larger bundle sizes, but that's better than the alternative of it just flat-out not working.


Scalability is not just runtime, it's also developer time scalability. The larger the project, the more you have to split it up and write interface documentation between libraries - which adds complexity.

As for processing scalability -- Python is OK, but it's considerably hampered by the BDFL's own opinions. The result is a few third-party libraries that implement parallelism in their own way. That functionality should be integral to the standard library already. The worst part is the lack of a standard API for data sharing between processes.

> packaging

Python's packaging issues only start with package management. Setuptools is an unholy mess of a system that has literally given me headaches over its lack of "expected features". I hate it with every single cell in my body.

And then there are systems and libraries, where you literally cannot use docker (Hello PySpark!).

>read other people's (un documented) code

I lolled! Seriously... We get Python fanboys moaning about how indentation makes everything more readable and what a pleasure it is to write code in Python. Give me a break!


its programmer being "clever"

When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years.


"When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years."

...I feel attacked.


I'm sorry, but as a fellow Python "devop" too, this really reads like empty apologism.

>Nothing scales well. scaling requires lots of effort.

Sure, just like all PLs have their flaws, and most software has security vulnerabilities. But it's a question of degree and the tendency of the language. Different languages work better in different domains, and fail in others, and what Python is specifically bad at is scaling.

If only for the lack of (strong/static) typing and the relatively underpowered control flow mechanisms (e.g. Python often using exceptions in their stead)... While surely all languages have pain points that show up at scale, Python still has a notable lot of significant ones precisely in this area.

>docker, poetry, venv...

Yes, and this is exactly the point. There's at least three different complex solutions, none of which can really be considered a "go-to" choice. What is Rust doing differently? Hell, what are Linux distros doing differently?

>If you're really into performance then dump out to C/C++ and pybind it.

If you want performance, don't use Python - was the parent's point.

>If we are being realistic, most of the time 80% of python programs are spend waiting on network.

This really, really doesn't apply to all of programming (or even those domains Python is used in). Besides, what argument is that? If it were true for your workload, then it would be so for all other languages too, meaning discussion or caring about performance is practically meaningless.

>Granted, python is not overly fast, but then most of the time your bottleneck is the developer not the language.

Once again, this applies to all languages equally, yet, for example, Python web frameworks regularly score near the bottom of all benchmarks. I doubt it is because of the lack of bright programmers working in Python, or the lack of efforts to make the frameworks faster.

>Python isn't the problem, its programmer being "clever" or making needless abstractions of abstractions.

Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation.

You can always trace any given error to a single individual making an honest mistake, that's really not a useful way to think about this. It's about a programming language (or an environment) leading the programmer into wrong directions, and the lack of safety measures for misguided "egotistical programmers" to do damage. You can blame the programmers all you want, but at the end of the day, the one commonality is the language.

Now Python is still one of my favorite languages, and I think that for a lot of domains, it really is the right choice, and I can't imagine doing my work without it. But performance and large, complex systems, is not one of those domains, and I honestly feel like all you've said in Python's favor is that other languages are like that too, and that it's the fault of the programmers anyway.


I have thought about what you've written. I broadly agree. I didn't mean for my post to be a "python is great really"; it was more to illustrate that all programming languages have drawbacks.

There is a point that I think I failed to get across:

> Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation

I don't think I was arguing that point. Of course all languages have their USP. The point I wanted to get across is that large Python projects are not inherently hard to manage. That kind of scaling is really not that much of an issue. I've worked on large repos in C, C++, Python, Perl, Node and, as a punishment, PHP. The only language that had an issue with a large codebase was Node, because it was impossible to build and manage security. The "solution" to that was to have thousands of repos hiding in GitHub.

The biggest impediment to growth was people refusing to read code, followed swiftly by pointless abstractions. This led to silly situations where there were 7-12(!) wrappers for S3 functions; none of them had documentation and only one had test coverage.


Very much agree. I oversee a relatively small Python codebase, but getting good-quality, safe code out of the developers in a controlled way is really hard -- there are so many ways in which the language just doesn't have enough power to serve the needs of more complex apps. We have massive amounts of linting, type hinting, and code reviews spotting obvious errors that would simply be invalid code in other languages.

It's like getting on a roller coaster without a seat belt or a guard rail. It's fun at first, and you will make it around the first few bends OK ... then get ready ...

Of course, with enormous discipline, skill and effort you can overcome all this. But it just leaves the question - really, is this the best tool for the job in the end? Especially when you are paying for it with horrifically bad performance and other limitations.


Have you ever seen an O(million)-line enterprise codebase that didn't suck?


This is surely anecdotal and very subjective, but I have (in Java and in C++; IIRC the exact versions were Java 7 and C++03), and the level of pain was lower than with a Python code base that was about an order of magnitude smaller. In the case of C++, the pain was mostly associated with the ancient build system we used; the code itself was relatively manageable. There was almost zero template code, and maybe that helped (although on other occasions I've worked with smaller C++03 codebases that relied heavily on templates and I didn't find them that bad).

Not all codebases are equal and maybe I was lucky, but in my experience, using dynamic languages (or, to be exact, any language where the compiler doesn't nag you when there is a potential problem) doesn't scale well.


I've worked with an O(100k) line code base in Python that was pure torture. Honestly, I was so desperate for static-typing by the end that I would have preferred if it was all written in C++.

Large codebases are really hard to reason about without types. I'm glad we now have projects like Pyre that are trying to bring typing to Python.


I've worked with Python for more than 15 years, usually with code bases of 50-100k lines per app. The only time I have had real issues with types was a codebase I inherited where the previous developers were using None, (), "", [], and {} all to mean roughly the same thing, sometimes checking for "", sometimes (), etc. I couldn't handle it, so I put asserts everywhere and slowly found out where those things were coming from and sanitized it to be consistent.
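The assert-and-sanitize approach described above, roughly (a made-up choke-point function, not the actual code):

```python
def normalize_tags(value):
    # Legacy callers passed None, (), "", [], or {} to all mean "no tags".
    # Funnel everything through one choke point; assert on anything else
    # so the offending call site surfaces quickly.
    if value in (None, "", (), [], {}):
        return []
    assert isinstance(value, (list, tuple)), f"unexpected tags value: {value!r}"
    return list(value)
```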


There are some confounding issues here that often get conflated, though.

Large Python code bases _could_ be written with good modularization, clean separation of concerns, and composability. Or they could be written as spaghetti.

Using types _could_ help keep a code base from becoming spaghetti, but it's not the only way. I think the understandability and maintainability of a code base has more to do with the person writing it than with the availability of a type system, tbh.


No, they can't, at least not with the same amount of effort. Of course you can make anything good by throwing enough time and money at it, but that's not the point.

The issue is that to have a nice and well architected code base, you have to constantly refactor and improve - sometimes you need to re-arrange and refactor huge parts of the code. Without types _and_ tests, this is just not gonna happen. It will be unproductive and scary, so that people will start to stop touching existing code and work their way around it.

> I think the understandability and maintainability of a code base has more to do with the person writing it than the availability of a type system tbh.

That is the same thing. Because someone who wants great maintainability will also want a great type system (amongst other things).


A good carpenter never complains about his tools. He works around their limitations or uses something else.

The quality of the product is down to the skill of the worker either way.


We can assume buffer overflows are less common in Java than in C and I doubt that Java programmers are better craftsmen.

The same with types: they make some kinds of errors much less likely, though there is no silver bullet in general. E.g., I much prefer a general-purpose language such as Python for expressing complex requirements in tests over any type system (even if your type system is Turing-complete and you can express any requirement in it, that doesn't mean it is a good idea).


How often is a carpenter told to use this particular rusty saw or their work won't be compatible with everyone else's?

Everything interlocks in such intricate ways that you can't meaningfully choose your own tools, and working around problems only goes so far. And you can't repair your own tools.


There's also failures of the community to provide good guidance.


> desperate for static-typing

Can you explain why? I honestly don't know, because my experience with C++ was during school ~20 years ago, and since then professionally I've used mostly python in relatively small codebases where it's all my own code (mostly for data processing/analysis). Thanks!

(Although I did have to write some C code to glue together data in a very old legacy system that didn't support C++, much less python. It took a lot more effort to do something simple, but it was also strangely a really rewarding experience. Kind of similar to feeling when I work with assembly on hobby projects)


The main problem with duck-typing like python has is the lack of consistency between different objects that code has to work on. Different callers may pass objects with different sets of methods into a function and expect it to work. You run into the case where the object that was passed in is one with subtly-mismatched behavior from what your method expects, but you don't know who created it - it was probably stored as a member variable by something 10 callstack levels and 5 classes distant from what you're currently working on.

Static typing prevents that by telling you early where the mismatch is happening - some method calls into another with a variable of the wrong type, and that's where the bug is. It also allows tooling to look up the types of variables and quickly get information about their properties.
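A toy illustration of that failure mode (all names made up):

```python
from typing import List

class Order:
    def __init__(self, amounts: List[int]) -> None:
        self.amounts = amounts

def total(order: Order) -> int:
    return sum(order.amounts)

# Untyped code would happily accept total({"amounts": [1, 2]}) at the call
# site and blow up later with AttributeError somewhere inside total();
# with the annotation, mypy/Pyre reports the wrong argument type exactly
# where the mistaken call is made.
```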


Got it, that makes sense. It also makes sense why I've not much been bothered by it in python since my relative small code bases don't have that many layers of abstraction laid on top of each other. I'm generally not working with more than 2,000-3,000 lines, and I can just about keep the basic structure in my head. (Unless it's been a while since I've had to revisit it... then I often hate my past self for getting "clever" in some way)


For these small code bases, static typing is still great (if you are used to it already) but the adverse effects of not having it usually show much stronger with a team (and not a single person). And yeah, if you keep the structure in your head, then you are good anyways.


> it was probably stored as a member variable by something 10 callstack levels and 5 classes distant from what you're currently working on

Just because you can define methods on an object dynamically in Python doesn't mean that you should. Monkeypatching is culturally discouraged in Python. It is most often seen in tests; otherwise, it is rare.

Nobody forbids using ABCs to define your custom interfaces or using type hints for readability/IDE support/linting (my order of preference).
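For example, a custom interface via the stdlib abc module (hypothetical names, just to show the shape):

```python
import abc

class Storage(abc.ABC):
    """Custom interface: implementations must provide get/put."""

    @abc.abstractmethod
    def get(self, key: str): ...

    @abc.abstractmethod
    def put(self, key: str, value) -> None: ...

class MemoryStorage(Storage):
    def __init__(self):
        self._data = {}

    def get(self, key: str):
        return self._data.get(key)

    def put(self, key: str, value) -> None:
        self._data[key] = value

# Instantiating Storage() directly, or a subclass that forgot a method,
# raises TypeError immediately instead of failing deep in a call stack.
```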


Funny, I'm working on a project about the same size, and the overly aggressive type and value restrictions are the main problem that I struggle with daily.


I am literally working on two projects that are roughly 100kLOC each.

The Scala Spark project I can navigate, understand, test, and consider to be of average complexity... with some failures unique to Scala.

The Python Spark project is barely readable.

The people who built the Python Spark codebase are "experienced Python devs", while the Scala codebase was built by people who were using Scala for the first time.

(Take this anecdote as evidence of the poor tooling and guidance present in the Python community... and of the BDFL's own failures.)


I've worked on several separate projects of that size in C++ and Go. None of them managed to become as big a mess as Python codebases of one or two dozen thousand lines seem to. OTOH, all the typing developments in Python should have helped? I don't have that much experience with them in an enterprise setting.


I have - and it's not that bad. The key is you have to have someone coordinating and driving a shared vision for the codebase and patterns. But it's hard to find people with that sort of passion and drive to follow-through as it's a multi-year endeavor with politics all over.

Otherwise it's a thousand implementations of the same 100-line piece of code interspersed everywhere.


It seems like quality code management gets passed over by (bad) management because it looks like it doesn't directly move the project forward.

Which is strange because those same managers may be full adherents to micro tasking projects in a project management system whose purpose is basically to do for the project what code management does for the code itself.

In my workplace, we've recently had leadership that appreciates these things, and the difference is night & day. Simple requests from "stakeholders" (I hate that term) are often filled in days, or same day, instead of weeks. I think it helps tremendously that the primary manager is also a coder herself, and still codes ~25% of her job.


That's the problem with some languages -- they lack a visionary who drives the overall understanding of how things should be structured.

I believe it was Guido who basically said: if you don't like how Python does it, then implement it in C. And that's how you end up with great C-based libraries bound to Python... and Python often being used as a messy orchestration language.


LOL!!

Or even worse - could you imagine how many lines that would be in C++ ?

Yowza!


n!


And how many of those problems are an artefact of moving fast and getting things done?

I've seen the exact same scenario with other languages. The problem is that in a startup environment you are likely adding and retiring "features" at a speed that layers on so much complexity that you can no longer reason about which business rules are actually valid any more.


I think that's part of it. There is a convention over configuration issue as well. A language like Go forces some patterns like package management and formatting unless you actively try to subvert it.

It wouldn't surprise me if many of these issues are self-selecting in the language communities as well.


I work on Python every day on a reasonably large code base and have none of the issues you’re talking about. I’m 10x more productive than on similar C or Java projects.

Dependency management is about as easy as it is going to get. We have problems with our dependencies breaking stuff, but who doesn’t?

People talk as if packaging is a solved problem. It isn’t in any language. And then they complain that Python packaging changes too much. That’s because folks are iterating on a hard problem.


Do you handle deployment of this Python application? For me, that's where the pain points arise. I love writing Python, but deploying it does not spark joy at all, at all.


Here's some of the ways to deploy Python code:

- `curl -L https://app.example.com/install | sh` that downloads installer and runs for instance: apt/yum install <your-package>

- in CI environment on a VM: `git checkout` & `pipenv install --deploy`

- `pipx install glances` on a home computer

- just `pip install` e.g., in a docker container [possibly in a virtualenv]. For pure Python packages, it can work even in Pythonista for iOS (iphone/ipad)

- just copy a python module/archive (PyInstaller and the like)

- give a link to a web app (deployed somewhere via e.g., git push)

- for education: there are python in the browser options e.g., brython, repl.it, trinket.io, pythontutor.com

- just write a snippet in my favourite editor for literate devops tasks/research (jupyter-emacs + tramp + Org Babel) or give a link to a Jupyter notebook

- a useful work can be done even in a REPL (e.g., Python as a powerful calculator)


The fact that there are 9 different ways, each with its own problems, is exactly the problem here.


Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic? Why do you think the deployment space is any different: do you use kubernetes for everything?

I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.


> Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic?

No. But if I talked about how I used 9 different word-processing programs, you'd see that as a problem, or at least an indictment of those programs. Deployment isn't that complicated.

> I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.

I use Maven/Scala and as far as I can see it covers all of them other than "give a link to a web app" which isn't actually deploying at all (and I'd still have used maven to deploy the webapp wherever I was deploying it).

I don't think there's any legitimate case for curl|sh, and I don't think there's any real reason for separate pip/pipenv/pipx (did you make that one up? Have I fallen for an elaborate troll?) - rather pipenv exists to work around only being able to install one version of a library at a time. Nothing's gained by having "just copy a module/archive" be different from what the tool does. Running in browser, notebook, or REPL can and should still use the same dependency management tooling as anything else.

If I want to deploy my code, I use maven. You can use curl (since maven repositories use standard HTTP(S)) or copy files around by hand, if you have a use case where you need to, but I can't think what that would be. If you want to bundle up your app as a single file, you can configure things to do that when publishing, but the dependency resolution, repository infrastructure, and deployment still look the same. Even if you want to build a platform-level executable, it's the same story, all the tooling just works the same. If I want a REPL or worksheet, I can start one from maven (and use the same dependency management etc. as always), or my IDE (where it's still hooked up to my maven configuration). If I want to use a Zeppelin notebook then there's maven integration there too.

Ever wonder why you don't hear endlessly about different ways of doing dependency management in non-Python ecosystems? Because we have tools that actually work, and get on with actually writing programs. It baffles me that Python keeps making new tools and keeps repeating the same mistakes over and over: non-reproducible dependency resolution, excessively tight integration between the language and the build tools, and tools and infrastructure that can't be reused locally.


My core problem is with C/C++ dependencies. Can you describe how you handle these when you deploy Python?


  - system packages (deb/rpm/etc)
  - binary wheels (manylinux)
  - building from source
plus some caching if appropriate


God, I wish that would work for me.

To take your examples in order:

1) system packages: almost always out of date for my needs

2) Binary wheels: I actually haven't investigated this much, maybe it will work (and if it does, I'll buy you a drink if we ever meet in person).

3) Building from source: this kinda proves my point about Python having poor dependency management tools if this is a serious response. In general, this would be much further down the rabbit hole than I want to go.


I use Anaconda exclusively and deployments (with virtual environments) have been fairly ok.

That said, I do run into trouble when I have a dependency that requires compilation on Windows (i.e. like the popular turbodbc) because say, a wheel isn't available for a particular Python version. Any time a compilation is needed, it's a headache. Windows machines don't come with compilers, so one has to download and install a multigigabyte Visual Studio Build Essentials package just to compile. Sometimes the compilation fails for various reasons.

Requiring gcc compilation is a headache for installing dependencies inside Docker containers too -- you have to install gcc in order to install the Python dependencies and then remove it afterwards.

I think requiring local compilation (instead of just delivering the binary) is a UNIX-mindset that is holding back many packaging solutions. I think a lot of pain would be alleviated if we could somehow mandate centralized wheel creation for all Python versions, otherwise the package manager marks a package as broken or unavailable and defaults to the last available wheel.

Also if only we applied some standards like R's CRAN repo does -- ie. if it doesn't pass error checks or doesn't build on certain architectures (institute a centralized CI/CD build pipeline in the package repo), it doesn't get published -- the Python packaging experience would be much improved.


Yeah, if PyPI was as annoying as CRAN with respect to new versions, then a lot of this pain would go away.

For those who don't realise, when there's a new version of R, anything that doesn't build without errors/warnings is removed from the archive.

This is really annoying if you want something to keep running, but it prevents the kind of dependency rot common to Python (recently I found a dependency that was four years out of date).


Curious to know what issues you have with deploying Python codebases. Out of all of the minor and major gripes I have with Python, deployment is not one of them.


To me, python deployments are painless, as long as you can stick to pure dependencies and possibly wheels.

Once a pip install needs to start compiling C, things do go way south very quickly. At that point you can install the union of all common C development tools, kernel headers and prepare for hours of header hunting.

I've done that too much to like python anymore.


Yup, yup. I deploy statistical models with Python, and these always have C dependencies.

Additionally, they are part of a larger application, which is mostly managed by pip, which means that I need both pip and conda which is where things get really, really hairy.

I actually blame Google and FB here, as neither of them use standard python dependency management tools, and many of their frameworks bring in the world, thus increasing the risk of breakage.


Adding data files via setuptools....

And putting them into a common shared directory.

Try doing that without writing convoluted code in your setup.py.


It is funky, but importlib package resources helps.
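For example, a minimal sketch (the `mypackage`/`config.json` names are hypothetical; `importlib.resources.read_text` has been in the stdlib since Python 3.7):

```python
import importlib.resources

def load_config_text():
    # Reads a data file shipped inside the (hypothetical) package
    # "mypackage" without hard-coding a filesystem path -- this works
    # even when the package is installed as a zip archive.
    return importlib.resources.read_text("mypackage", "config.json")
```

You still have to declare the file in setup.py (via `package_data`) so it gets bundled, but the reading side no longer has to guess where setuptools put it.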


Production deployment is Docker all the time.

Deployment for development is just pyenv and virtualenv.


No concurrency? asyncio is great for I/O bound network stuff!
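For the I/O-bound case it really is pleasant -- a toy sketch using only the stdlib, with `asyncio.sleep` standing in for real network calls:

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for a network request; while one task awaits,
    # the event loop runs the others.
    await asyncio.sleep(delay)
    return name

async def main():
    # The three "requests" overlap, so this takes ~0.1s, not ~0.3s.
    return await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )

print(asyncio.run(main()))  # ['a', 'b', 'c']
```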


"No parallelism" is probably what was meant.


> "No parallelism" is probably what was meant.

Which is still wrong, of course, but "no in-process (or in-single-runtime-instance) parallelism" would be correct, as would "forking inconvenient parallelism".


Posix fork() doesn't really count, if that's what you mean...


Why doesn't Python's multiprocessing module (which uses fork by default on Unix) count? It literally exists for parallelism.


It's understood that you can have "parallelism" by running two copies of your program using basic system facilities like fork(), or even by buying several computers and running one instance of your program on each of them. That's not what is meant by a language "supporting parallelism". If it was, then every language ever designed supports parallelism and so the term is meaningless.

To claim that a language "supports parallelism", it has to do something more to facilitate parallel programming. I would say that parallel threads of computation with shared memory and system resources is the bare minimum. You can go the extra mile and support transactional memory or other "nice" abstractions which make parallel programming easier.

Saying that Python supports parallelism because it has a fork() wrapper is like saying that POSIX shell is a strongly typed language because it has strings and string is a type.


It doesn't use fork() on macOS anymore, because some of Apple's own APIs get broken by its use.

Pretty much any app that uses both fork and threads, has to jump through many hoops to make the two work together well. And this applies to all the libraries that it uses, directly or indirectly - if any library spawns a thread and does some locking in it, you get all kinds of hard-to-debug deadlocks if you try to fork.

So unless you have very good perf reasons to need fork, I would strongly recommend multiprocessing.set_start_method("spawn") on all platforms. No obscure bugs, and it'll also behave the same everywhere, so things will be more portable. Code using multiprocessing that's written to rely on fork semantics can be very difficult to port later.
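Concretely, that looks like this (a minimal sketch; `square` is just a stand-in worker):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts each worker as a fresh interpreter instead of
    # fork()ing, avoiding fork+thread deadlocks and matching the
    # default behavior on Windows and (since 3.8) macOS.
    mp.set_start_method("spawn")
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Note the `if __name__ == "__main__"` guard: with "spawn", children re-import the main module, so unguarded process creation would recurse.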


You wouldn't fork() for performance, but for security reasons.


It's not wrong. If running two processes counts as parallelism, then everything does parallelism, and it becomes pointless to talk about it.


Then one should talk about how convenient the related abstractions are. I like the concurrent.futures library.


Concurrency is not the same as parallelism. Python has good concurrency support, I agree. Python (CPython) does not support parallelism, however, due to its Global Interpreter Lock (GIL), which actively prevents any parallelism in Python code.

This was probably a conscious design decision on the part of the CPython implementers, and perhaps a good one. But we should not claim that Python is something which (actively and by design) it's not.


I use `concurrent.futures.ProcessPoolExecutor` fairly often. I handle inter-process communication usually through a database or message queue, expecting that someday I'll want to go clustered instead of just single-machine multiprocessing. I've been burned by implementing multithreading and then needing to overhaul to clustered enough times to stop doing it.
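That pattern, minus the database/queue part, is a few lines (a sketch; `cpu_heavy` stands in for real work):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Placeholder for real work; each call runs in a separate
    # process, so the GIL doesn't serialize them.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(cpu_heavy, [10_000, 20_000, 30_000])))
```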


At O(million) lines, the problem of wrangling it has more to do with how well it's architected and written than with it being Python. Python is at least easy to read. Its major deficiency is the lack of annotation of parameters, and that's something that could now be fixed... but it isn't going to be fixed in that much historical code.

If you are trying to get performance out of it (which doesn't really hinge on whether it's a million lines of code), then Python might be the wrong choice. But you can always write it in Rust or C and give Python an API to the functionality.

I agree that packaging is a mess. Fixing that mess with modularization in Java took a long time, and most other languages have that problem, too.


I disagree. Python is not inherently easy to read.

Explicitness and naming standards screw up the clarity of any code... Not to mention the complexity when you get into OOP.


Also, the lack of switch-case statements doesn't help. (The workaround is either if statements or a dict of functions to be called.)
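The dict-of-functions workaround looks roughly like this (a minimal sketch; the command names are made up):

```python
def handle_start():
    return "starting"

def handle_stop():
    return "stopping"

# Dispatch table standing in for a switch/case statement.
HANDLERS = {
    "start": handle_start,
    "stop": handle_stop,
}

def dispatch(command):
    # .get() with a default plays the role of the "default:" branch.
    return HANDLERS.get(command, lambda: "unknown command")()

print(dispatch("start"))   # starting
print(dispatch("reboot"))  # unknown command
```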


>People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

This seems to be the case with most languages, especially if good code control isn't practiced, and unfortunately that's not uncommon.


Is concurrency really an issue? Yes, you don't get truly parallel threads, but you can launch multiple processes. Do you really need shared memory for your concurrency needs? (I think it is much easier to introduce subtle bugs with shared-memory concurrency, i.e. threads.)


Could I ask you what language you use instead then?


This describes it perfectly.


We use poetry for apps in production. At this point I think that's the winning solution and as it continues to grow and improve I think it will overtake all the others in this respect.


People keep saying that about every new solution. But then another one comes along that's even better-er, and the previous one peters out.

The biggest need for a package manager and its ecosystem is continuity: the stance that new features and paradigms will be gradually shifted toward — without package-ecosystem incompatibilities, without CLI commands just disappearing (but instead, with long deprecation timelines), etc.

In other words, an officially-blessed package manager is one where, when something better-er comes along, it gets absorbed by the existing thing, instead of replacing it.

That is what the Python ecosystem is missing.


I don’t think it’s that another one comes along that’s better so much as that the new “better” ends up missing some important corner case. Pipenv advertised itself as solving all problems, but once someone tries it in practice, they realize that it introduces a new problem: every little interaction takes literally 30 minutes for any non-toy project. I’ve heard mixed things about poetry, but I wouldn’t be surprised in the least if it failed to behave as advertised, just because this has been my experience with every package manager I’ve tried to use. And it’s embarrassing when every other language has a package manager that just works.

EDIT: It was probably misleading to characterize Pipenv as advertising itself as solving all problems; it’s probably more correct to say that its significant weaknesses weren’t advertised and thus one has to invest considerably before discovering them for oneself.


Just a heads up to anyone who hasn’t looked recently: pipenv has been very actively worked on since earlier this year and has had four updates that fix a lot of issues. Earlier this year I would have said Poetry is better hands down, but after the updates and after using poetry and seeing some of its quirks, it’s a much closer matchup.


If it wasn't so opinionated it might have been more successful.

Just one example: you want your virtualenvs to be created in ~/.virtualenvs so that pipenv is a drop-in replacement for virtualenvwrapper+pip? Tough luck for you, Kenneth Reitz doesn't think that's how it should be done.

At least 3 or 4 times some issue I've wanted resolved I found in the issue tracker with the last message "we'll have to check with kennethreitz42 whether we're allowed to change that" and then silence for a year.

It could still catch up with poetry, but from what I've seen there's a fundamental mindset difference in how change requests are approached between pipenv and poetry.


Last I checked (3-4 months ago), Pipenv only cares about the situation where you are deploying code on machines (or containers) you have complete control over. If you're writing code for deployment on machines you don't control, via for example pip install, then pipenv isn't helpful, while poetry supports this out of the box.


Interesting. I haven't used pipenv on any very large projects, but I'm surprised to hear about the slowness. With the (admittedly small) projects I've tried it, I found that it does more or less just work.


As I understand it, the problem is that Pipenv needs to resolve the dependency tree to do just about anything; however, the dependency tree is dynamic—to determine a package’s dependencies, you have to download the package and run its setup.py. To get the whole tree, you have to recursively download and run each package. The cost is therefore proportional to the size of the dependency tree, so it’s very plausible that it works fine for the smallest projects.
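A sketch of why the metadata is dynamic: a setup.py can compute its dependency list at runtime, so a resolver can't know the tree without downloading and executing each package (the package names below are just illustrative):

```python
# What a hypothetical setup.py computes before calling
# setuptools.setup(install_requires=deps): the list depends on the
# platform and interpreter version, so it can't be read statically.
import sys

deps = ["requests"]
if sys.platform == "win32":
    deps.append("pywin32")             # only needed on Windows
if sys.version_info < (3, 8):
    deps.append("importlib-metadata")  # backport for older interpreters

print(deps)
```

Wheels avoid this by shipping static metadata, but sdists with a dynamic setup.py still force the download-and-execute dance.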


> The biggest need for a package manager and its ecosystem is continuity: the stance that new features and paradigms will be gradually shifted toward — without package-ecosystem incompatibilities, without CLI commands just disappearing (but instead, with long deprecation timelines), etc.

I disagree. I used to think that that's the problem, but having seen a few more cycles of it, the problem isn't that kind of commitment - after all, the whole python ecosystem enthusiastically jumps into the new thing, and Python people are used to relatively short deprecation cycles. The problems are the actual problems; every Python package manager is just embarrassingly awfully bad as soon as you try to use it for 5 minutes, presumably because they're developed by Python people who've never used a decent package manager and so think that no-one could ever need deterministic dependency resolution, once you've pinned a transitive dependency there surely wouldn't be any reason to ever want to unpin it, having the package manager coupled to the language version is absolutely fine, no-one could ever want a standard way to run tests ...


What the Python ecosystem actually needs is a single opinionated perspective on versioning that is followed by everyone, such as NPM's semantic versioning. In the absence of that I don't see how dependency resolution and thus packaging is ever going to improve in Python.


Guido is very opinionated...

He just doesn't care about package management.


Yes, Poetry should be the blessed package manager.


I only want Poetry to become the be-all-end-all of package managers if it turns out that Python really is never going to fix the core problems that have engendered so many of the hacks upon which Poetry (and its competitors) is precariously balanced. Pyenv and venv, for example.


If you were doing a green field redesign, how would you want Python to fix the core problems?


> Yes, Poetry should be the blessed package manager.

Last time I tried to use poetry (and this is why it was the last time I tried to use poetry), it ignored global pip settings and had no documented mechanism for its own settings (I believe poetry uses its own implementation of or captive install of pip) which made it completely unusable in a corporate environment with annoying SSL interception issues to work around where pip + venv worked.

Poetry is a much smoother experience when it works, though.


It will install a virtual env if you don't have one active and it will use the active one if you do. What global pip settings, for example?


Although I generally like it, the two major issues I have with poetry are the abysmal dependency resolution times and the handling of binary wheels.


> People keep saying that about every new solution. But then another one comes along that's even better-er, and the previous one peters out.

I think this happens frequently these days. People try to cover all use cases and end up biting off more than they can chew. It won't work that way. A good set of MINIMALS is easy to maintain, sustain and extend.


Much of Python's growth has been driven by data science. Here, the conda package manager is pretty ubiquitous. Conda packages system and other non-Python dependencies (such as the CUDA SDK), removing the need for data scientists to resolve these non-trivial dependencies themselves. This is likely unneeded/unwanted for production web app deployments.

Given the varied use cases for Python, the goal of a single package manager may be misguided.


My understanding is that the people who developed Conda would love to have stuck with pip, and originally wanted to see about upgrading pip to support their use cases. And it was GvR himself who told them that that wasn't going to happen.

That was a long time ago, though, when scientific computing was a small niche for Python. It might have been reasonable to say it's not worthwhile to take on all that extra work just to support the needs of a small minority of users. Fast forward the better part of a decade, and it turns out that scientific computing did not stay a small niche. I think that one could make a strong argument that, in retrospect, that brush-off did not end up ultimately serving the best interests of the Python community. It made the community more fragmentary, in a way that divided, and therefore hindered, efforts at addressing what has proven to be one of Python's biggest pain points.


Conda predates pip by perhaps a decade.


Really? The first release of pip was in 2011 [1] and the earliest release of Conda I can find is 1.1.0 in Nov. 2012 [2], and the first public commit (into an empty repo) was a month earlier [3].

[1] https://en.wikipedia.org/wiki/Pip_(package_manager)

[2] https://github.com/conda/conda/tags?after=1.3.0

[3] https://github.com/conda/conda/commit/c9aea053d8619e1754b24b...


May be anaconda that I'm thinking of.


Anaconda was released in 2012, as well. Conda is a tool that is part of anaconda.


This one: https://en.wikipedia.org/wiki/Anaconda_(installer)

I forgive myself, it's pretty confusing :D.


Ah. Name collisions suck.


Especially when they conceptually do the same thing.


And that's another issue with Python ecosystem


Why do you think Python is more susceptible than other platforms?

It is true that PyPI was designed before the author/project naming scheme popularized by GitHub. Other than that I don't see a greater problem with name collisions in Python.


Susceptible - yes, all platforms could fall to this.

That's why strong leadership in a community, or subcommunities, works well. Python lacked this leadership, which leads to millions of half-arsed projects that compete... without moving the whole platform forward. It feels like NIH syndrome has permeated Python. Hopefully that's going to change.


I also use poetry for everything. I have 0 problems; things work on my Mac, my intern's PC, AWS instances. I don't even see what problem people are having. Before that I was using pipenv, and before that just good old requirements.txt -- there were a few occasional issues, but really not many even then. At this point, I suspect it is more about regurgitating a complaint than a real issue. But, I could be lucky and completely wrong...


- until a few months ago no way to sync an environment with a lockfile (remove packages that shouldn't be there)

- no way to check if the lock file is up to date with the toml file

- no way to install packages from source if the version number is calculated (this will likely never be fixed as it's a design decision to use static package metadata instead of setup.py, but it is an incompatibility with pip)

- no way to handle multiple environments: you get dependencies and dev-dependencies and that's it. You can fake it with extras, but it's a hack

- if you upgrade to a new python minor version you also have to upgrade to the latest poetry version or things just fail (Something to do with the correct selection of vendored dependencies. May have since been fixed -- new python versions don't come out all that often for me to run into it. And in fairness the latest pip is typically bundled with each python so it avoids that issue)

I still use poetry because it's more standard than hand-rolled pip freeze wrapper scripts, and there's definitely progress (the ability to sync packages was a hard requirement for me and is now fixed), but it's not quite there yet.
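For reference, the extras hack for faking a third environment looks roughly like this in pyproject.toml (hypothetical package names; poetry requires the extra's packages to be declared as optional dependencies):

```toml
[tool.poetry]
name = "example-app"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.8"
requests = "^2.24"
sphinx = { version = "^3.0", optional = true }

[tool.poetry.dev-dependencies]
pytest = "^6.0"

# Faking a third "docs" environment, since there are only
# dependencies and dev-dependencies.
[tool.poetry.extras]
docs = ["sphinx"]
```

Installed with `poetry install -E docs`.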


Interesting. I usually rebuild my env from packages, so I don't notice 1, 2, or 3. I guess 2 should be fixable by poetry by including more from the toml in the lock file. Point 4 also didn't bother me, as I generally just have the main and dev deps; this seems an easier thing for poetry to fix, though. I have actually encountered 5 when fiddling around with pyenv.


If you don't need c or c++ dependencies it's ok. If you do, it's very very painful. To be fair, most of the DS libraries can be handled by conda, but if you need both conda and pip, then you're going to have a bad time. (Source: this is my life right now).


Oh man, this is my life right now, too. In my case, we're using tensorflow or tensorflow-gpu, depending on the host system and, unfortunately, only Conda offers tensorflow-gpu with built-in CUDA. Add to this that the tensorflow packages themselves are notoriously bad at specifying dependencies and that different versions of tensorflow(-gpu) are available on conda-forge, depending on your OS.


Tensorflow is the worst (along with ReAgent from FB).

I think it's because they have their own internal build systems, but they never play well with pip/conda et al.

One of my recent breakages was installing the recsim package, which pulled in tensorflow and broke my entire app. There's actually a recsim-no-tf package on PyPI, presumably because this happens to loads of people.


I see, I miss a lot of issues as don't use any GPU stuff, mainly flask + scipy and friends.... probably this is what saves me.


It's not even the GPU versions, even the CPU stuff causes issues.

The core problem is that pip will happily overwrite your existing dependencies when you attempt to install a new package.


If you don't know what the problems with pipenv or requirements.txt were, you're really not qualified to judge whether poetry has solved them or not.


You are reading it wrong; it did solve my issues with reqs and pipenv. Also, you certainly aren’t qualified to judge my qualifications.


Yes, Poetry is great! I avoided Python for a long time due to its bad package management/environment handling situation, but Poetry solved all my problems there.


I wouldn't advise using anything other than the tools blessed by the PSF for mission-critical stuff. Using Poetry for local development is fine but don't build a huge infrastructure around it and don't use in production.

I migrated the CI/CD of my company to Poetry some time ago, it worked fine for some time until we needed a feature that Poetry didn't support. I submitted a PR adding the feature to Poetry but their sole developer was apparently taking some time off and the project remained without any development for several months.

I migrated the CI/CD to use my own Poetry fork but it was very cumbersome, Poetry has a very weird build system so forking it is not simple.

At this point, I realized that I was just wasting time. There is nothing that Poetry does that the other (old and stable) tools don't do. Poetry was the result of me falling for the shiny toy syndrome.


So I hear Poetry is the way to go these days for python.

But a plurality of the people I encounter in the Clojure community came there because leiningen (Clojure's package manager that uses Maven under the covers) "just works" and they got tired of having a tough time reproducing builds consistently on other platforms / OSs with Python; not to mention the performance gains of the JVM.


Python's package management is light years behind the much-hated Maven.


But if you fix Python package managers that will remove 50% of the audience for Docker. Think of the children! ;)


poetry feels like the closest equivalent to cargo that I've used. pipenv is better than the previous status quo but is still oddly unstable, with random new issues I encounter with every release. poetry "just works" for me, has better dependency resolution, and IMO has a nicer interface and terminal output to boot.


Could you elaborate on what issues you've had with pipenv? I've only had very good experiences with it, so I'm surprised how many people here seem to prefer poetry.



Regarding not being able to work outside the project root[0]: This is actually one of the things that I love about pipenv! Anaconda, for instance, has environments that are not tied to a directory and are referred to by name (rather than by a directory path) and I've found this to be an absolute nightmare and extremely cumbersome! Not everyone has 10 projects that can share the same environment. (Besides, I would argue they never should.) I, for instance, have 10 projects that all require a slightly different environment and it's much easier to type a generic `pipenv shell` on the command line no matter what project I'm in, rather than trying to remember the Conda environment's name time and again. (Besides, it can be easily automated using .bashrc.)

[0]: https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-l...


> sponsor it through Microsoft $$$

You don't want that. When companies "sponsor" things they try to take them over, unless it's a pure donation, which is rare. The community then drops out because a company is in control. Later the project is abandoned by the company. It's a slow death spiral.

I would love if Guido could create a new PEP for extending modules with generic namespaces, ala Perl/CPAN modules.

There aren't 15 different libraries for doing the same thing in Perl, there's 1. You never replace it, you extend it by making a new module in a hierarchical namespace. The same core library's code might not change in years while new extensions can keep popping up. So even if you think Requests sucks, you can make Requests::UserAgent which inherits Requests code and extends it / gives a better interface. And these can be written & packaged by completely different authors.

Then maybe PyPI wouldn't have 5,000 nearly identical yet mostly unusable modules, or modules with nonsense names.


I only know a bit about Python - in what sense is pip not a package manager?


It's a package manager in the vein of "old-school" package managers that came from Linux distros and whatnot. It maintains a global dependency chain across your entire machine. This can be good for security fixes in that you only have 1 copy of a package & everyone references it. This is not good for development because it doesn't provide a sandboxed environment for you to do development in (ala cargo as others have mentioned). It also causes issues if you try to install 2 packages but they rely on incompatible versions of a popular package, meaning you have to choose which package you want installed.

Some of this has been mitigated with virtualenv, but what's still missing is having a project express its packages & have that automatically reflected in the environment.

Finally, Cargo to my knowledge actually lets multiple dependencies exist (even within the same project!!!) so that you can have a dependency like:

                             dep1 ---- dep3 <= v1.6
                            /
      < my awesome project > ----- dep3 >= 3.0
                            \
                             dep2 ---- dep3 >= 2.0

That's not possible if you don't have the right language hooks because module resolution needs to be aware of the version of the library (i.e. when you go `import numpy`, it actually needs to be aware of the package it's being imported from to resolve that correctly).

Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.


For some reason I've run into this type of dependency chain issue many times in JS, but have never run into it in Python despite using both languages pretty heavily. Maybe because the JS ethos is to change things so quickly, if you're not making a major breaking change to your package's API every year or two then you're getting left behind (only kind of joking). Also probably because the standard library is so small in JS (or at least it used to be, and many projects want to be compatible with at least some older browsers) so the average number of dependencies that a typical library has is probably much higher than in Python.

I really don't get the fuss about the global dependency management though; maybe I would change my mind if Python shipped a great implementation of it. But I feel like the problem is already solvable in multiple ways with containers, VMs, or virtualenvs, and I don't think yet another abstraction to separate environments would add much value to my day-to-day workflow building Python apps.


This is my experience too. I’ve never actually encountered a global dependency conflict with Python pip though in theory it’s possible. But I have encountered the same version conflict problem in dependency chains that exists with Node npm.

And yet, I hear so many more complaints about Python pip and I really don’t understand the disconnect. Perhaps dislike of pip is actually triggered by usability issues? And then people look for other reasons to explain their dislike?


It happens with the data science/scientific programming stack a lot, at least twice in the last month I've pulled in a small dependency that changed my numpy version which broke everything.


Thanks. I suppose it could be related to the rate of change in the ecosystem. Python’s data science / scientific programming stack definitely changes faster than Python’s web stack which is currently my primary use case.

As mentioned elsewhere in this thread, resolving this dependency issue would require a change in the Python language itself.


Maybe you just don't rely on packages that have different update cycles.

If you're only working with the standard library, boto3, twisted and redis - you're unlikely to have issues. You get into big issues, when you get to more obscure libraries... or libraries that are C bindings.


> It maintains a global dependency chain across your entire machine.

There are per-Python-binary, per-user, and per-virtualenv installations (per project, or per whatever you like) that make conflicts less likely.

Sometime packages "vendor" their dependencies e.g., there is `pip._vendor.requests` (thus you may have different `requests` versions in the same environment).

There were setuptools' multi-version installs https://packaging.python.org/guides/multi-version-installs/ (I don't remember using it explicitly ever -- no need)


> Finally, Cargo to my knowledge actually lets multiple dependencies exist (even within the same project!!!)

That's not pip's fault, that's Python's fault. Python's module system has no concept of versioning, so there can only ever be one copy of a module that has a given name.

And this is an interpreter detail that is exposed through the language itself, so it can't be fixed without causing severe pain.
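You can see the name-only resolution directly: the interpreter caches modules in `sys.modules` keyed by bare name, with no slot for a version:

```python
import sys
import json          # first import: loads and caches the module
import json as j2    # second import: returns the cached object

# Both names point at the same cached module; the cache key is just
# the string "json", so two versions can never coexist in one process.
assert j2 is json
assert sys.modules["json"] is json
```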


> Finally, Cargo to my knowledge actually lets multiple dependencies exist

That's a bug, not a feature. It enables sloppy development and the disasters like on NPM


As I said above:

> Now whether or not it's a good idea to support this kind of dependency stuff can be controversial. In practice though clearly it does cause problems the larger your codebase gets as you're more likely to have some nested dependency chain that's time-consuming to upgrade so you'd rather move faster than make sure you're only running 1 version of the dependency.

Consider the view that some times it can be sloppy & other times it's not & it's impossible to distinguish between the two in an automated fashion.


How so? The version ends up being part of the type, so a type from two different versions of a given package are not compatible, which solves most if not all of the issues.

If a package is just using a particular library internally, I don’t see why the package manager should prevent using it with another library that depends on a different version.


> I don’t see why the package manager should prevent using it with another library that depends on a different version.

I do. The main reason for Linux distributions to exists is to provide a development and running environment where:

- API/ABIs do not change for the whole lifetime of the distribution. No new features, no new bugs, no new vulnerabilities, so that your production code can run reliably for 5+ years.

- Vulnerabilities are fixed with minimally invasive patches.

- Vulnerabilities are fixed in reasonable times even if the upstream development stopped. Patches are well tested against the set of packages in the distribution.

You simply cannot have these 3 features together if a distribution ships 10 different versions of each library.

It's already a ton of work to maintain packages in stable distributions.


I’m confused by your comment. This is about a programming language package manager, not an OS package manager. Or was that just an example?


It does a bad job of dealing with versioning conflicts and multiple projects, so Python developers resort to hacks like virtual environments to get work done. Compared to Cargo or even Go modules, it's not a great solution. It's also missing lots of features that are standard in other package managers.


It may depend on your experience with each language. I have much more experience with Python than Go (and almost none with Rust), and therefore I have a much better time with Python packaging tools (I don't remember a single issue for which I didn't find a satisfactory solution -- as much as is possible in the packaging world, with its myriad conflicting use-cases).

For example, my experience with `cargo` (that I mostly use to install command-line utilities such as rg, fd, dust): it is great when it works as written in the instructions but sometimes it doesn't (running `cargo` may involve a lot of compiling -- in contrast to `pip` which can use wheels transparently and avoid compiling even for modules with C extensions -- I guess there might be a way to do something similar with `cargo` though not by default).


The need for a virtualenv has nothing to do with pip. Python only has “global” dependencies due to the way its import system works.


Ruby works the same way, but bundler dependency manager solves it anyway to give you per-project dependencies not just system-wide dependencies. (I believe other well-liked dependency managers like cargo are largely based on bundler's semantics).

Perhaps ruby was more "hackable" by bundler. (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).


> Ruby works the same way

Kind of, if you ignore Rubygems, which is also part of stdlib at a lower level than bundler (and also, originally wasn't.)

> but bundler dependency manager solves it anyway to give you per-project dependencies not just system-wide dependencies.

It can do that because rubygems manages multiple installed versions of packages and allows per-project ("per call to require", potentially, IIRC) specification of which one to pull from the globally-installed versions (this was originally done by monkey patching require when rubygems was an add-on.) This lets bundler easily live on top of it providing per-project dependencies somewhat more smoothly than Rubygems does without requiring anything like a venv.

> Perhaps ruby was more "hackable" by bundler.

Ruby is ludicrously hackable, yes.

> (Bundler has now become part of ruby stdlib, but didn't start out that way, it definitely started hacking around the way the more fundamental stdlib 'rubygems' worked).

Rubygems also wasn't part of stdlib originally, and started out relying on hacking around the way Kernel#require works.


> It can do that because rubygems manages multiple installed versions of packages

Oh wow, the default python dependency manager only lets you have one version of each package installed system-wide?

Yeah, that is a limitation. As opposed to rubygems (the first dependency manager although as you say not originally built-in to ruby) which has system-wide install, but always let you have more than one version installed.

Without fixing that one way or another, there's no sensible way, true. virtualenv is certainly one way to fix it. I wonder if there would have been a more rubygems way to fix it.


Honestly I found the multiple versions approach more complex, confusing and more hacky. A virtualenv is just “node_modules” that also contains a Python executable.

It’s a directory - you delete and create them at will, fast, and don’t worry or care about the system Python. Having some crazy setup that patches “require” to handle concurrently installed package versions seems insane, especially if you cannot actually use them concurrently in the same Ruby process. So, segmenting them by project (aka virtualenv) seems like the best solution.


so they’ve just automated the virtual environment creation then. There’s nothing about pip that is global or not global. It unzips files from pypi into a directory. Python, not pip, doesn’t really support a “node_modules” style setup. We use virtual environments (venv/) which is somewhat similar.


Sort of. The equivalent of "virtualenv" would be more "rvm gemsets", which is what people did before bundler. Bundler is doing something different.

Bundler is not a "node_modules" style setup. It does not require dependencies to be in a local path (although they can be, the default is they live in a system-wide location, and this does not limit functionality). It also does not support more than one version of a dependency in the same execution environment (as node_modules does) -- that really would be impossible in ruby too.

It's possible something about python's design would make the bundler approach impossible, I don't know. But it's not "dependencies are installed globally" alone, as that's true of ruby too.

We would probably all benefit in understanding better how these things are handled in other environments. And I include myself here. I think ruby's bundler really set a new standard for best practices here, and many subsequent managers (like cargo) were heavily influenced by it, although many don't realize it. But meanwhile many don't even realize what they are missing or what's possible.

Like the basic idea of having a specification of top-level dependencies (including allowable ranges) separate from a "lockfile" of exact versions in use of ALL dependencies... is just so hugely useful I never want to do without it, and I think is compatible with just about any architecture, and yet somehow JS is still only slowly catching on to it.


Not quite true. Python import mechanics are quite hackable and controllable programmatically and externally via controlling the relevant PATH env variables. It's just that no one bothers with it and instead seems to rely on the set of global folder-path lookup mechanics that are standard.

But nothing stops you from masking global libraries with local library versions (similar to node_modules). Why hasn't anyone done this, you may ask? I don't know the answer to that.
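One can sketch the node_modules-style masking with nothing but the standard import machinery -- `python_modules` here is a made-up directory name for illustration, not an existing convention:

```python
import sys
from pathlib import Path

# Hypothetical "python_modules" directory next to the project root,
# analogous to node_modules. Prepending it to sys.path makes local
# copies of packages shadow globally installed ones of the same name.
local_deps = Path.cwd() / "python_modules"
sys.path.insert(0, str(local_deps))

# Every import after this point checks python_modules/ first and only
# then falls back to the global site-packages. The path doesn't need
# to exist for this to be harmless.
```

This is more or less all a per-project setup needs at import time; the harder part is getting installers to populate such a directory.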


There are a few reasons you can't have multiple versions of the same module loaded at the same time. Consider a simple enum class defined in a package: two installed versions now define two different enum objects which may not compare equally. A function from version A of the package might return A.Enum.X and pass it to code using version B, which then compares A.Enum.X to B.Enum.X -- and the comparison fails. Super confusing.

So yes, Python’s import system is dynamic enough to do crazy things, but I don’t see how we can ever retrofit that into Python.

Regarding dependencies installed into a project-local directory (node_modules): that’s a virtual environment. Just more flexible.
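The version-mixing problem described above is easy to demonstrate: two identically defined enum classes (standing in here for the same class loaded from two side-by-side package versions) never compare equal:

```python
from enum import Enum

# Stand-ins for the "same" enum shipped by two versions of a package;
# side-by-side installs would load them as two distinct classes.
class ColorV1(Enum):
    RED = 1

class ColorV2(Enum):
    RED = 1

# Identical definitions, yet the members are distinct objects, and
# members of different Enum classes never compare equal.
print(ColorV1.RED == ColorV2.RED)  # False
print(ColorV1.RED is ColorV2.RED)  # False
```

The same identity trap applies to exceptions, isinstance checks, and module-level singletons, which is why a single resolved version per environment is the safe default.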


What's wrong with virtual environments? The required tools come bundled with Python nowadays and are super easy to use.


Agreed 100%. In fact, of the languages I work with regularly (Python, Java, C#, JavaScript, Go), Python has the simplest dependency management solution via virtualenv + pypi + pip. Not sure why every Python thread turns into a conversation about the pain of Python dep management... it seems overblown.


The biggest missing piece is that you have to go out of your way to get sane dependency management. Of the languages you mention, I have only used JS and F# (same dependency management as C#), and in both it's the official tools that enable local dependency management with a single command ("npm install X" or "dotnet add package X").

If you're using python, you don't know to check out pyenv/etc until you have a huge mess on your computer due to pip's behavior.


Hm.... Let me see. The number of operations I have to perform in Maven vs Python for independent packages:

Maven Scala project - create skeleton, add libraries to POM, write app, run app

pip + venv Python project - create venv, activate venv, create requirements file, write app, run pip to install dependencies (possibly installing GCC and extra libraries), run app

(Oh... and god forbid that you forget to deactivate venv)

You're lying when you say that library management is easier in Python. It's just factually untrue.


You don't have to activate the environment. I never do; it's a strictly optional convenience (if you think it's convenient).

Instead, simply run the interpreter installed in the environment when you run your app, e.g. "./my_env/bin/python my_app.py", and things will just work. No activation required, no special mode, nothing to forget.

The part about requirements.txt and installing packages could also be simplified if you did it the other way around: install first and create the requirements file from that:

  $ python3 -m venv my_env
  $ my_env/bin/pip install some-dependency
  $ my_env/bin/pip freeze >requirements.txt
  $ my_env/bin/python3 my_app.py
There you go. Setup, install and run in four steps and zero modes.


That's literally one more step than Maven.

That's before you get to package your app...


Most of the steps you're listing take only a few seconds. They're talking about the actual management, not how long it takes you to type . venv/bin/activate


A few seconds here and a few seconds there - it's death by a thousand papercuts.

There's no community consensus - that keeps Python from advancing to where it needs to be.

I said it once and I'll say it again - Python lacks mature tooling.


You can't have death by a thousand papercuts when there are exactly three delay-papercuts adjacent to a step that takes a significant amount of time.

There are contexts where little delays matter, and you didn't pick one of those.


Maven is far more complicated and burdensome when compared to virtualenv and pip... pom.xml, what a righteous mess of XML and overly specified nonsense.


This is just to run your app; packaging it is a whole different headache in Python.

It's literally a `mvn jar` and that's it!


> so Python developers resort to hacks like virtual environments to get work done

It's really hard with many deps, which is why cabal (for instance) moved away from the global model.


> in what sense is pip not a package manager?

It is a package manager but it lacks features that many other package managers have in Ruby, Node, Elixir, and other languages.

For example there's no concept of a separate lock file with pip.

Sure you can pip freeze your dependencies out to a file but this includes dependencies of dependencies, not just your app's top level dependencies.

The frozen file is good to replicate versions across builds but it's really bad for human readability.

Ideally we should have a file made for humans to define their top level dependencies (with version pinning support) and a lock file that has every dependency with exact pinned versions.
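For context, pip's frozen output is just the full installed set, transitive deps and all -- roughly reproducible from the standard library alone (a sketch; needs Python 3.8+ for importlib.metadata):

```python
# Roughly what "pip freeze" does: every installed distribution with its
# exact version -- transitive deps included, top-level ones not marked.
from importlib import metadata  # stdlib since Python 3.8

frozen = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
)
print("\n".join(frozen))
```

Nothing in that listing distinguishes the dependencies you asked for from the ones that came along for the ride, which is exactly the readability complaint above.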


FWIW I had a lot of success using https://github.com/jazzband/pip-tools to have dependencies automatically managed in a virtualenv.

* Basically I would have a single bash script that every `.py` entrypoint links to.

* Beside that symlink is a `requirements.in` file that just lists the top-level dependencies I know about.

* There's a `requirements.txt` file generated via pip-tools that lists all the dependencies with explicit version numbers.

* The bash script then makes sure there's a virtual environment in that folder & the installed package list matches exactly the `requirements.txt` file (i.e. any extra packages are uninstalled, any missing/mismatched version packages are installed correctly).

This was great because during development if you want to add a new dependency or change the installed version (i.e. pip-compile -U to update the dependency set), it didn't matter what the build server had & could test any diff independently & inexpensively. When developers pulled a new revision, they didn't have to muck about with the virtualenv - they could just launch the script without thinking about python dependencies. Finally, unrelated pieces of code would have their own dependency chains so there wasn't even a global project-wide set of dependencies (e.g. if 1 tool depends on component A, the other tools don't need to).

I viewed the lack of `setup.py` as a good thing - deploying new versions of tools was a git push away rather than relying on chef or having users install new versions manually.

This was the smoothest setup I've ever used for running python from source without adopting something like Bazel/BUCK (which add a lot of complexity for ingesting new dependencies as you can't leverage pip & they don't support running the python scripts in-place).


> Sure you can pip freeze your dependencies out to a file but this includes dependencies of dependencies, not just your app's top level dependencies.

Isn't that a good thing?

> no concept of a separate lock file with pip.

setup.py/.cfg vs requirements.txt, no?


> Isn't that a good thing?

Yes, a very good thing.

> setup.py/.cfg vs requirements.txt, no?

A lot of web applications aren't proper packages in the sense that you pip install them.

They end up being applications you run inside of a Python interpreter that happen to have dependencies and you kick things off by running a web app server like gunicorn or uwsgi.

For a Python analogy vs what other languages do, you would end up having a requirements.txt file with your top level dependencies and when you run a pip install, it would auto-generate a separate requirements.lock file with all deps pinned to their exact versions. Then you'd commit both files to version control, but you would only ever modify your requirements.txt by hand. If a lock file is present that gets used during a pip install, otherwise it would use your requirements.txt file.

The above work flow is how Ruby, Elixir and Node's package managers operate out of the box. It seems to work pretty well in practice for ensuring your top level deps are readable and your builds are deterministic.

Currently there's no sane way to replicate that behavior using pip. That's partly why other Python package managers have come into existence over the years.


I don't understand the distinction you're making. Are you pip-installing or not? If not, why not?

My method for deploying a web application is to have a Dockerfile which pip-installs the Python package, but I could see someone using a Makefile to pip-install from requirements.txt instead. In fact, I use `make` to run the commands in my Dockerfile.


> Are you pip-installing or not? If not, why not?

I am running a pip install -r requirements.txt when I do install new dependencies. I happen to be using Docker too, but I don't think that matters much in the end.


Docker does matter, because the Docker image should take the place of requirements.txt (your "locked" dependencies) in your deployment process. I suggest you pip-install the package, rather than the package's requirements.txt file.


> Docker does matter, because the Docker image should take the place of requirements.txt (your "locked" dependencies) in your deployment process.

In practice it doesn't tho.

Let's say I'm working on a project without a lock file and commit a change that updates my dependencies. I get distracted by anything and don't push the code for a few hours.

I come back and push the code. CI picks it up and runs a docker-compose build and pushes the image to a container registry, then my production server pulls that image.

With this work flow there's no guarantee that I'm going to get the same dependencies of dependencies in dev vs prod, even with using Docker. During those few hours before I pushed, a dep of a dep could have been updated so now CI is different than dev. Tests will hopefully ensure the app doesn't break because of that, but ultimately it boils down to not being able to depend on version guarantees with Docker alone.

There's also the issue of having multiple developers. Without a lock file, dev A and B could end up with having different local dependency versions when they build their own copy of the image.

I've seen these types of issues happen all the time with Flask development. For example Flask doesn't restrict Werkzeug versions, so you wake up one day and rebuild your image locally because you changed an unrelated dependency and suddenly your app breaks because you had Werkzeug 0.9.x but 1.x was released and you forgot to define and pin Werkzeug in your requirements.txt because you assumed Flask would have. The same can be said with SQLAlchemy because it's easy to forget to define and pin that because you brought in and pinned Flask-SQLAlchemy but Flask-SQLAlchemy doesn't restrict SQLAlchemy versions.

Long story short, a lock file is super important with or without Docker.


Use the same method to verify in dev as in staging (Docker image). If you don't know it works in staging, then you didn't know in dev either.


yes, but don't underestimate the power of convention.

if you make pip run 'pip freeze > requirements.txt.lock' after every 'pip install whatever', you almost solve that particular problem if setup.py is configured to parse that (it isn't by default and there's no easy way to do that!)
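As a sketch of what "configured to parse that" might look like, a setup.py could read such a frozen lock file with a small helper -- the lock-file name and the setuptools wiring below are hypothetical, following the convention from the comment above:

```python
import os
import tempfile
from pathlib import Path

def read_pinned(lock_path):
    """Return pinned requirement strings from a pip-freeze-style lock file,
    skipping blank lines and comments."""
    lines = Path(lock_path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]

# Demo with a throwaway lock file (the file name is the hypothetical
# convention from the comment above, not anything pip itself produces):
lock = os.path.join(tempfile.mkdtemp(), "requirements.txt.lock")
Path(lock).write_text("# frozen by pip\nflask==1.1.2\n\nwerkzeug==0.16.1\n")
print(read_pinned(lock))  # ['flask==1.1.2', 'werkzeug==0.16.1']

# A setup.py could then feed this straight into setuptools:
#
#   from setuptools import setup, find_packages
#   setup(
#       name="example-app",        # hypothetical project name
#       packages=find_packages(),
#       install_requires=read_pinned("requirements.txt.lock"),
#   )
```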


That's the whole point of distinguishing between logical dependencies and reproducibility dependencies. I use setup.cfg to describe the logical dependencies, I supply a requirements.txt (or environment.yml, or a Dockerfile) to provide the tools necessary to create a deployable build.


Isn't that effectively the result of a typical `setup.py -> pip compile -> requirements.txt` flow?

The setup.py file contains a human readable designation of requirements and then `pip compile` generates a requirements.txt with all deps' (and deps of deps') versions specified.


Honestly - my non-polite, personal impression is that all these complaints are born of a very specific development environment: one with lots of dependencies similar to npm's, a desire for "docker style" local development, and devs who think dependency management is hard and that you need complicated "semantic" versioning and version operators... but really it's just because they're working in a complex ecosystem of microservices.

But for the other 99% of projects: most of their dependencies won't break compatibility, you'll never uncover a hard version dependency that the package manager can't solve, you'll never need to "freeze" your dependency versions, and you can pretty much just rely on a semi-persistent environment with all your necessary packages installed and semi-regularly updated. Essentially smooth sailing.


GP comment was objecting to "stupid" package managers. Maybe they think pip is stupid, because it's unquestionably a package manager.


in the functioning sense.


care to elaborate?


I'd love to see Poetry take off. I'm watching it pretty closely.


Same, we switched from pipenv ~6 months ago, had not had to worry about package/env related stuff since then, "just works".

"evangelized" a sibling team also to consider switching, they were sceptical but just recently they mentioned they also like it more.


Same here. Poetry has been a joy, after many bouts of frustration with pipenv.



Python’s biggest hurdle is and always will be speed. Pip and wheels aren’t great but I would rather stop using golang. That’s a bigger win for me to stop maintaining proficiency and tooling for a language that I honestly consider inferior for the way I think about problems.


Strongly disagree. Python's adoption is primarily driven by how easy it is to get started (barring some nasties in datetime, typing, and of course the package manager).

Python is plenty fast for most automation tasks.


You are talking past the point I am making. No doubt Python has excellent adoption because the language is incredibly idiomatic and easy to understand -- thus why it is my favorite language. Python is great in all of the ways you just stated but it could be better and I am saying that speed for CPU bound tasks is the way it could be better. I am so tired of binning Python in favor of golang the moment latency becomes important, I would like to use it all the time.


Python is fast enough for many tasks, including number crunching, at which it excels (as long as you are using the right libraries).

Projects like Cython let you tweak, without too much effort, the parts of a program that need an extra boost.

Last but not least, there have been discussions in the Python community in recent weeks about ways to considerably speed up the default (CPython) implementation.

Hopefully, all of this will bear fruit in the next 2-5 years. Guido will probably help from his new position at Microsoft.


Only because libraries have to use C bindings to make it fast enough. I don't think Python performance is good enough when I have to stop writing in the language that we are benchmarking to get good results.


> discussions in the Python community in the last weeks of ways to speed up CPython

This piqued my interest. I found this, by Mark Shannon: https://mail.python.org/archives/list/python-dev@python.org/...

Is that what you mean?


Python's biggest hurdle is its lack of good tooling.

Try building a Unix app with data in standard Unix locations (bin, share, lib, etc.) and you'll find that you have to write custom code.

And Google searches don't help :(


I've never used anything I liked better than Maven Central + Gradle. Why can't everyone just copy the Java ecosystem when it comes to package management?

(Although the maven package ecosystem seems ideal to me, Gradle is just "good enough" - it's mostly the standards around versioning and tooling dealing with versioning and everything being there that makes it good to me).


Poetry is very good: https://python-poetry.org/

It should be heavily promoted as the official option and included with the distribution.


Microsoft already bought NPM so there's precedent.


Yeah, but do we REALLY want M$ to control github, npm, node, rust, python, and ...


M"$"? 1999 called, they want their edginess back...


> M"$"? 1999 called, they want their edginess back...

You realize that we're potentially just one leadership change away from returning to the 'bad old days', right?


You're missing the big picture. In 1999, Microsoft's revenue was at least 10x that of the closest competitor.

It controlled the client (desktop) and was working on controlling the server, too.

Google, Facebook, Netflix, etc didn't exist. Amazon was much smaller and in a different niche. Apple was trying to not keel over in 2 months.

We're not a leadership change away from anything. The world has changed.


> We're not a leadership change away from anything. The world has changed.

The world has changed, but I didn't mean "the bad old days when Microsoft had a stranglehold over the industry" but just "the bad old days of an evil Microsoft". Their market power isn't relevant to whether they are a bad actor, only to how large their impact is as a bad actor.


We still have Idris and ATS.


Considering that half of all prominent functional PL researchers are hired by Microsoft Research, I wouldn't be too confident about that.


Cargo for Python… that's pretty much Poetry.

I was already in love with Python, but Poetry has definitely made me a much happier Python developer.


pip is completely fine


It is, unless you want to ship.


Just ship a container to the server.


Hmmm... Let me check if all of my users run a server on their computers.


Sure... Ship that docker to Spark master node and... do what exactly?


pip in virtualenv. It is a solved problem.


Not even pip developers believe "pip in virtualenv" makes python dependencies and package management a "solved problem"...


90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

The rest is too vocal.


> 90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

I doubt that even 90% of those professional python developers who believe that python dependencies is a solved problem believe that it is solved with pip and virtualenv; the conda faction has to be bigger than 10%. Plus there's the people that think pip/venv aren't enough, but that tools on top of them plug the gaps (poetry).

But I think that the share of professional developers who see it as a solved problem at all is less than 90%. Obviously, we've all got some way of working with/around the issues, that doesn't mean that we don't feel that they exist.


If I can give my anecdote, the last 3 companies I worked at that were heavily using Python and the hundreds of developers in them were all relying on pip and virtualenv. And it worked just fine no matter what the HN crowd would have you believe.

conda had minor usage at the last one, for building a handful of special projects mixing C++ and Python code (highly specific code in the finance industry); after the build, the artifacts could go into the python repository (internal pypi) and be usable with pip. Everything came down to pip at the end of the day. As a matter of fact, the guys who used and pushed for conda were also the ones pushing the hardest for pip, because pip is the answer to everything.


Well - you can add one more to your anecdote - my current job is 13 active developers, 200+ megabytes of 1500+ .py files developed over 9 years by 30+ developers. It's all virtualenv/wrappers + pip.

Our data scientists like Conda - but our developers don't touch it.


Well, if the consumers are all devs, then sure.


My pet conspiracy theory is that Docker was created in part because Python code is un-shipable otherwise.


>90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

99% of professional python developers think that you've pulled this statistic out of your ass!


> >90% of professional python developers believe that python dependencies is a solved problem (with pip and virtualenv).

> 99% of professional python developers think that you've pulled this statistic out of your ass!

72.6% of all statistics are made up, anyway.


Spot on!


I'm talking about Susie Q and Joe Sixer. The amount of fiddling with package systems, build, and container systems is an anti-pattern. One guy on a small team sets up and controls the stuff. Individual contributors shouldn't be messing with the env or putting new packages in :)


I often wonder what problem is being solved by Poetry, that pip with virtualenv(+wrappers) doesn't solve perfectly well. requirements.txt ensures you have the right version of everything in production, and virtualenv lets you explore various versions of libraries without risking any destabilization of your development environment.

I struggle to see what spending time looking at Poetry will yield in terms of any actual benefit, though I would love to be informed/educated otherwise.


virtualenv always feels like a hack to me. Too many times I’ve forgotten to activate the virtualenv halfway through a project and now I’m troubleshooting all the problems I just caused with some packages installed in the virtualenv and some globally and oh half of them aren’t in my packages.txt so now I can’t remember which package I needed for this...

I don’t expect my dev tools to be idiot proof, but they should at least try to be “I’ve been hacking for 18 hours straight and I just need to commit this last line and I can finally go to bed” proof.


One thing that helps is:

  # Don't let people touch anything in the root environment
  export PIP_REQUIRE_VIRTUALENV=true

That prevents you from ever doing a pip install in your root environment.

I've got about 20 different pip environments, and virtualenvwrappers (workon xxxx) makes it pretty seamless for me to hop back and forth between sessions. I'm also pretty dedicated to doing all my work in tmux windows - so my state in a half dozen virtualenvs is changed by changing my window (which I've done a workon)

I guess what I'm really interested in is: "The last three years of my life I've used virtualenv/wrappers + pip, and haven't run into any problems. What can Poetry do for me, and why should I change my work habits?" Genuinely interested in using new and better tools.


Same here. 20 years on python and I've never done anything more than clone a repo, venv and pip install -r requirements. And then cd && python -m to run the source code, maybe set PYTHONPATH in some cases.

I don't think I've even written a setup.py .

Obviously there's a whole world of development and deployment where these things are relevant, but there's also a massive world where nobody even understands what they are missing.


My current gig has a 9 year python repo, 3526 .py files over 228 megabytes, 13 (current) developers. It's all managed with virtualenv/pip install. So - even larger projects seem to get by okay - would love to read something on Poetry that just says, "Here is why it's a lot better than virtualenv/pip"


Agreed. I don't get the fuss about pip/setuptools/virtualenv. I have shipped and deployed Python code countless times, never encountered an issue a `setup.py` couldn't solve.


> An official package manager with great dependency resolution would be fantastic.

Need something like cabal. And a package index.


Conda works well.

Have never used cargo - what can cargo do that conda cannot?


Conda works slightly better than pip, which is a pretty low bar. Python package management is probably the absolute worst thing about that language.


Still, it worked perfectly for me for Python, and a few more things. So I ask - what problems does conda have? And which of those does cargo not have?

Alternatively, what does cargo do better than conda if they are not feature-for-feature comparable ?


This is probably a hard problem, but the dependency graph resolution in conda is a thing of nightmares. It is so appallingly slow even for relatively menial tasks that updating environments becomes an exercise in frustration.

I'm unsure what are the core issues, but in my experience cargo was always pretty quick to use and if it fails, it fails fast.

Conda, on the other hand, is slow for the simple cases, and if the graph becomes complex it will just churn for 15 minutes before throwing its hands in the air and giving up with some cryptic error.

I suspect it comes back to the fact that packaging and dependency management was thought about upfront for Rust and the whole ecosystem was built well from the get go?


I've been meaning to write a blog post on this topic. That is, on the various speeds of various package managers. I don't know a ton about Conda, but when I was looking into Poetry, one of the core issues is: https://python-poetry.org/docs/faq/#why-is-the-dependency-re...

> This is due to the fact that not all libraries on PyPI have properly declared their metadata and, as such, they are not available via the PyPI JSON API. At this point, Poetry has no choice but downloading the packages and inspect them to get the necessary information. This is an expensive operation, both in bandwidth and time, which is why it seems this is a long process.

Cargo doesn't need to do anything with packages directly to do its job; everything it needs is in the index. This makes it pretty fast.


>> This is due to the fact that not all libraries on PyPI have properly declared their metadata and, as such, they are not available via the PyPI JSON API. At this point, Poetry has no choice but downloading the packages and inspect them to get the necessary information. This is an expensive operation, both in bandwidth and time, which is why it seems this is a long process.

This sounds like something that could be done server side, either by PyPI or another entity and expose it through a new API endpoint, instead of doing it on every Python developer's machine.


The core issue is that setup.py can be non-deterministic, so them doing it server side may not give the right results.

https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-k...


conda taking 15+ mins to resolve dependencies is by far its biggest weakness.


If it doesn't fail to "solve the environment" after 30 minutes that's already a win.


How do you ensure a team of 30 developers are all using the same version of python and all of the dependencies of your project with conda? How do you distinguish prod dependencies from dev dependencies? How do you update and manage transitive dependencies?

There are more, but those are the big three.


Conda manages Python itself as one of the dependencies - so that’s not actually a problem.

I used conda to manage 2.6, 2.7 and 3.3 side by side, and that was fine. I never locked the patch version (e.g. 2.7.3 vs 2.7.5), though that is definitely possible.

It apparently requires a specific workflow - explicitly editing the requirements.txt file rather than freezing it - which is no harder, and which I was doing from day 1, but is apparently uncommon.

(And it worked well across a few tens of machines, some running windows and some running Linux, with a mix of os versions and distributions. So I know it worked well already in 2013, and I’m sure it works better now).

Speed is not good, but I never had it take more than a minute for anything. Some people here are reporting 15 minutes resolution times - that is a real problem.


You didn’t answer the issues I raised.


Actually I did.

You make sure everyone uses the same Python version by setting it as a dependency. I mentioned I only set the dependencies on minor version (e.g. 2.7) rather than patch version (e.g. 2.7.3), but the latter is supported - for the python version as well as for any other package.

You make sure the exact versions you want are in use by editing and curating the requirements.txt rather than freezing it. It really is that simple, but somehow that's not a common workflow.

prod vs. dev is the only one I didn't address because I don't have experience with that - but I know people who manage "requirements_prod.txt" vs "requirements_dev.txt" and it seems to work for them.
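For what it's worth, the "curate, don't freeze" workflow described above is easy to enforce mechanically. Here's a hypothetical little checker (not part of pip or conda) that flags any requirements line not pinned to an exact version:

```python
import re

def unpinned(requirements_text):
    """Return lines from a requirements.txt that are not pinned to an
    exact version with '=='. Comments and blank lines are skipped."""
    bad = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        # accept only "name==version" (no ranges, no bare names)
        if not re.match(r"^[A-Za-z0-9._-]+\s*==\s*[\w.]+$", line):
            bad.append(line)
    return bad

reqs = "requests==2.25.0\nnumpy>=1.19\n# tools\nflask\n"
print(unpinned(reqs))  # ['numpy>=1.19', 'flask']
```

Running something like this in CI keeps the hand-curated file honest without needing a separate lock file.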


How do you manage the specific version of transitive dependencies? For example, I install library A and it requires library B, but your code doesn’t use B directly?

A better question, what’s your workflow for installing a dependency and 6 months later updating that dependency?


Just put them in your requirements.txt even if you don’t depend on them directly.

That’s basically the whole idea of conda’s solver, I think. If your list works - fine. If not, it will find a set of package installs that makes it work.

I guess that’s also why my resolution times are way faster than what some people describe.

I treat requirements.txt closer to a manually curated lock file than a wishlist (which most req.txt files in the wild are).


This is a giant pain in the ass compared to every other modern language. Either you’re manually managing precise versions for all direct and transitive dependencies or you aren’t running exactly the same versions of libraries across team members and environments which would be horrifying.


There is a command (not freeze) that dumps the exact versions deployed, similar to a lock file. I never used it, because it doesn’t work well across operating systems (e.g. do it on Windows and try to use it on Linux). This is less of a Conda problem and more of a Python problem - same package name/version has different dependencies on different architectures.

But manually curating the dependencies has been painless and works fine for me and my team for almost a decade now.


I prefer that my tools do the busy time consuming manual work when possible. Every other language I use has better tooling than python for this.


And they were the biggest headaches on my team, bar none, of them all. An absolute nightmare to coordinate--and it seemed like even minor things that should be unrelated, e.g., a locally installed system (not python) package slightly off-version, could eat a developer's time trying to resolve.


conda can manage python versions itself as if it was another package.

https://docs.conda.io/projects/conda/en/latest/user-guide/ta...

i've never used this particular feature, but then i've been using python since 1.5 and i'm just used to it being a bit behind the times. stockholm syndrome, you might say, especially after trying out rust and seeing what a work of art cargo is.


In addition to the other replies, one I've encountered is the case where conda doesn't provide a build for a package (thus, one must use pip or something else to manage that dependency) causing weird issues with respect to maintenance and updates.

The two worst are in the other comments: ensuring sync'd dependencies across multiple environments and developers, and the horrendous resolution times leading to a useless error message when failures occur.


Just work, instead of not working? https://github.com/conda/conda/issues/9059


Wow, color me impressed /s.

You found a bug report for a problem specifically on fish on MacOS, which can be resolved by "deactivate / reactivate" according to the discussion.

Were you trying to say something?


>> which can be resolved by "deactivate / reactivate"

Nope.


Feature selection, dependency overriding, workspaces, to name a few.


conda has "environments" which are isolated from each other and can each have specific versions of python and dependencies installed in them (and each has their own "pip" local installation as well, in case something wasn't specifically packaged for conda).

What are those "workspaces" you refer to?

What is "feature selection"?


If only there was a way to find out about such things... (wistfully looks into the distance)


Package management and performance. Microsoft has done an impressive job of making software development better lately, and if any community needs it, it’s Python. Python has a lot of potential, but it’s sorely hampered by poor package management and performance, which are ultimately both symptoms of its commitment to exposing the entire CPython interpreter as a stable public interface. If the Python leadership were willing to commit to a narrower interface that met the goals of the c-extension community and also allowed Python to improve with respect to performance and package management (and I think this is very possible, hpy gets us close afaict), then we could have the best of both worlds, and I’m optimistic that Microsoft could provide this leadership.


What poor performance?

At this point, this criticism has become a "thing" that no one really expands on or enumerates, as if it's a given. It's not a given, and at the very least it's nuanced and complicated.


I’m not sure where you’re getting your information, but Python itself is quite slow (100-1000X slower than Go or Java) and it’s only “fast” when it’s dispatching to C or Rust or FORTRAN, and even then (to your point about nuance) these optimizations are only feasible sometimes; namely, when the cost of serializing is lower than the efficiencies gained by the optimizations available in the target language. This is all pretty widely discussed, including here on this very forum.

No doubt that Python is sometimes fast enough (e.g., vanilla CRUD apps that do all of the heavy lifting in Postgres), but sometimes it’s not and you’re left with really crummy optimization options. And since we rarely can know with certainty at the outset of a project whether or not the bottlenecks will be amenable to the optimizations afforded by Python, it’s a dangerous game. I would even go so far as to say that other languages have become quite good at many of the things that Python is good at (namely pace of development) while being much better at the things that it’s not good at (performance, package management, etc), so I actually wouldn’t recommend starting new projects in Python except for certain niches, like scientific computing (and who knows if Python will even retain its dominance there).
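To make the pure-Python vs. "dispatching to C" distinction concrete, here's a small illustrative timing. The ratio varies a lot by machine and workload; a toy loop like this typically shows a single-digit gap, while the 100-1000X figures cited come from heavier pure-Python code paths:

```python
import timeit

N = 100_000

def py_sum():
    # the loop and the additions all execute as interpreted bytecode
    total = 0
    for i in range(N):
        total += i
    return total

def c_sum():
    # same work, but the loop runs inside the interpreter's C-implemented builtin
    return sum(range(N))

assert py_sum() == c_sum() == N * (N - 1) // 2

t_py = timeit.timeit(py_sum, number=50)
t_c = timeit.timeit(c_sum, number=50)
print(f"pure Python: {t_py:.3f}s  builtin sum: {t_c:.3f}s  ratio: {t_py / t_c:.1f}x")
```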


I'm curious about sources for your claim of Python being 100-1000x slower than Go or Java. While the lower bound of the cited range is realistic (but only for certain problems), the majority of the range is highly unlikely. According to my sources, the difference is not that dramatic: depending on a specific type of a problem, it ranges from 1.5x to 100x. For applications using relevant Web frameworks, the composite difference is just 4-5x.

As for scientific computing domain, I would start a new project in Julia rather than Python.


It’s a ballpark estimate from various real world benchmarks I’ve done over the years (I’ve spent a lot of my career optimizing Python, including rewriting things on occasion). In the case of web framework benchmarks, I’m guessing that the fastest Python web frameworks are almost pure-C and the benchmark is probably running very little actual Python code (while the Go benchmarks are pure-Go). This doesn’t extrapolate well to real world applications, where the web server is never the bottleneck and your bottleneck tends to be app code that isn’t easily rewritten in C. Mostly the 100-1000X figure is intended to compare pure-Python code with pure-Go or pure-Java.


I guess, your ballpark estimate is too rough (and/or Python 3 has improved significantly since then). The sources / benchmarks that I used to back up my claims [1-3] imply use of pure Python, not "almost pure-C". That includes Web frameworks (FastAPI that I compared to Java / Go frameworks is pure Python, which, in turn, is based on Starlette, a pure-Python implementation of an ASGI framework).

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[2] https://github.com/frol/completely-unscientific-benchmarks

[3] https://www.techempower.com/benchmarks/#section=data-r19&hw=...


Nobody would pick Python because it performs well. In many (and as performance of computers in general improves, more) cases, it performs well enough though. For one (admittedly absurd) example, where it just doesn't: a network device driver in user space: https://github.com/ixy-languages/ixy-languages


I like the way npm does it. It has a cache of all downloaded versions and it tries to flatten the dependencies when possible. If there are incompatible versions they remain deep in the dependency hierarchy.
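A toy sketch of that hoisting idea (an illustration of the scheme, not npm's actual resolver): the first version of each package encountered is hoisted to the top level, and a package whose required version conflicts with the hoisted one stays nested under its parent:

```python
def flatten(tree):
    """npm-style hoisting sketch. `tree` maps name -> (version, subdeps).
    Returns the hoisted top-level set and the conflicts left nested."""
    top = {}        # hoisted: name -> version
    conflicts = []  # nested: (parent, name, version)

    def walk(parent, deps):
        for name, (version, subdeps) in deps.items():
            if name not in top:
                top[name] = version       # hoist the first version seen
            elif top[name] != version:
                conflicts.append((parent, name, version))  # keep nested
            walk(name, subdeps)

    walk("<root>", tree)
    return top, conflicts


# "a" wants c 1.0, "b" wants c 2.0: only one can live at the top.
tree = {
    "a": ("1.0", {"c": ("1.0", {})}),
    "b": ("1.0", {"c": ("2.0", {})}),
}
top, nested = flatten(tree)
print(top)     # {'a': '1.0', 'c': '1.0', 'b': '1.0'}
print(nested)  # [('b', 'c', '2.0')]
```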


Npm has so many woes though. Love the idea but the execution leaves much to be desired.


Can you go deeper on this? Honestly interested. I use NPM all the time and have not run into too many issues.


Like what?


I had a job interview today and said this exact last sentence ahah


Maybe they should just adopt the Nix package manager, and put some of Microsoft’s resources into making it work on Windows.


Just use poetry (my preference) or pipenv. This has been a solved problem for a long time.


Better not move to NuGeT.eXe


Definitely this, but I doubt such a luminary of computer science will be interested in such work.

The XKCD comic on Python package managers/Python environments is not an exaggeration. I've always wanted to get more into Python but every time I attempt to, it's this hurdle that dissuades me.

Edit: Also, I guess Poetry is another thing that came along since my last attempt.


> The XKCD comic on Python package managers/Python environments is not an exaggeration.

Thanks, I haven't seen it before. It's really quite close to the state on my home machine:

https://xkcd.com/1987/

It's not that I couldn't eventually resolve the current state, it's that at the moment a few programs I regularly run work, in spite of the different dependencies they have, and I know that starting "cleaning up" will cost too much time.


> > The XKCD comic on Python package managers/Python environments is not an exaggeration.

> Thanks, I haven't seen it before. It's really quite close to the state on my home machine:

> https://xkcd.com/1987/

Hmm. I've had machines with similar states, though never with both Anaconda and Homebrew.

Instead, I often had (in addition to the rest of the system-level snarl) isolated per-app Python interpreters that were specified & constructed with Buildout for development testing & deployment.

Though I don't quite get the arrow from Homebrew 2.7 to Python.org 2.6. Is that actually a thing or hyperbole?


> Is that actually a thing or hyperbole?

It's fiction, of course. Art often has to exaggerate to make a point. I exaggerated too, just to remain in the artistic rather than the technical spirit of the comics. But there are some important bits of truth there, like in every good joke.


Of course. I just wasn't sure which bin that bit belonged to.


Is python the worst for package management?

I thought C and C++ might have similar issues since there's no unified package management there either.


What’s wrong with virtualenv + requirements.txt + pip?


What's wrong with poetry right now?


defaulting to installing and finding packages locally would be a big win for me.


I jumped down the black hole of pip + geo packages this week. Your comments are absolutely spot on.


Solve "How to deploy my sweet Python script on grandma's computer" problem.

That is some reasonable way to deploy to someone who is not a programmer nor tech person.

Not sure how much has changed since this was written: http://effbot.org/pyfaq/can-python-be-compiled-to-machine-co...

PyInstaller.. well.. it works okay with std.. sometimes it does not. It is the Electron solution.. add pandas to a project and you get a 600MB install.

As developers we can manage to deal with package managers.


In the education space (where Python is pretty big), the biggest problem we have is that Python code isn't 'shareable' so it's hard for kids to show off their creations to Grandma. Sure, there are some products which work to work around that (like repl.it), but the core problem remains - a student can't easily get their Python program to run on an arbitrary computer. It's why I'm very tempted to move to JS for teaching, despite all of the dragons lurking there.


I agree, it’s one of the reasons I stopped teaching my kids Python and switched to JS. It is so easy to share and so portable, and runs everywhere!


Not just to grandma. I have trouble deploying my scripts to my colleagues.


I had this with a colleague the other day:

1. Install foo

2. No not that foo!

3. Reinstall python

4. Goto 1


I did that to myself so many times...


Such a shame things like that used to be simple - Py2Exe was fine and Windows didn't freak the hell out at the sight of an unsigned binary.


I'm hoping for a Jupyter/Excel hybrid, but that's mostly wishful thinking.


That's arguably the goal of [Pluto.jl](https://github.com/fonsp/Pluto.jl). It's a (Julia-specific) reactive notebook that automatically updates all affected cells and has no hidden state.


That looks so good that I am thinking about learning Julia.


Well then I would like to throw this your way. https://mitmath.github.io/18S191/Fall20/


Thx I will check that.


Ooh that's really good, thank you for sharing!


Please no. Excel is great when it stays within its lane and use case and doesn't try to be everything. Jupyter is okay-ish in some places but in general is way overused. Mixing them together would be a move in the wrong direction and a bit of a mess.


If you mean to retire VB in favor of Python for writing macros, I'm all for it. Not much Jupyter though


There's been a few projects in this space (PySpread is the first that jumps to mind), but also, not too long ago (last year maybe?), MS was investigating making Python a first-class inhabitant of Excel, so it might already be in the pipeline.



Oh no.


At the top of my wish list there is a deployment tool for web applications like capistrano / mina for Ruby. I ended up writing my own for the project of a customer. It's been running for a few years now.

Same customer, another project, I'm experimenting with a deployment system based on git format-patch. It copies the patches on the server (we have only one server) and applies them with patch. Then restart the web app.

It's fun to learn the internals by rewriting the tooling, but good tooling to start with would be better.


I'm not sure that Python will profit from this. He has had 50% positions before and most of the new things he did were precisely in the period in which he did not work for a corporation.

If anything, this further corporate influence on Python development is not something to be applauded. I bet there will be lots of more churn in the next five years, big announcements and no results.


If I remember correctly, Guido resigned as BDFL in 2018.


This is good news. It's sad that I find a lot of interesting possibilities on Azure end up being "windows only." Hopefully having more support for Python will fix some of that.


>Congrats to him for finding something fun to do in retirement - dictators usually end up with a different outcome. ;)

I guess he is going for a Diocletian cabbage farmer style retirement.


Does that mean I will need to switch from mypy to pyright?


I hope they use Microsoft's money to make Python fast (or at least fast enough)


Why dictator? lmao I'm not familiar with the guy on anything else than he's PHP's creator


In the off chance this isn't a joke: Python, not PHP. And he wasn't just its creator, he was the head of the project for its whole life until his recent retirement. Typical term in the OSS community for one guy who is in charge of a massive OSS project is Benevolent Dictator For Life, or BDFL. And Guido was definitely one of those.


I wonder if they asked him algorithm problems in his interview /s

This is cool though. I'll be curious to see what he works on.


Wow, the replies to this actually saying Guido van Rossum should do an algorithm/DS leetcode interview.

It's so ridiculous to see these replies. So Microsoft should've sent Guido van Rossum a note saying, hey study algorithm/DS for at least a month and do 100 leetcode before you come talk to us or it's a waste of time, thanks, bye.

Discussions of the industry tech interview process are now poisoned by these factors:

* There is an entire industry built around tech interview prep now (books, websites, practice/mock interviews). Many would defend this practice because their paycheck directly depends on it.

* Many see this as a hazing ritual that protects their high compensation and often their egos as well. These people are often young and will eventually see how harmful these interviews are when they get older and need to switch jobs. But by then there will be a new generation of young engineers defending the practice.


It's like people forget the purposes of interviewing. "Can this candidate do the job we're asking them to do?". If the candidate has a proven, public, track record doing the things you're going to ask them to do, then you can skip the jumping through hoops part.

We use shitty algorithmic questions as a proxy for answering the above question (in the vast majority of cases).


That's assuming you want to hire the first person that can do the job.

The companies that popularized this were and still are EXTREMELY lucrative, and get a TON of competition.

They can afford to be choosey. So they can try to find not only the person who can do the job but also do it BEST.

And that's really hard to tell from just looking at their work experience. You want to see how their brain works under pressure, dealing with an ambiguous new problem they've never seen before.

A lot of people who rag on modern tech interviewing don't get this. Unfortunately, neither do a lot of people DOING these modern tech interviews. "He said the algorithm was O(n log n) but the answer is actually O(k log n). Not inclined"

And so, we spiral.


> They can afford to be choosey. So they can try to find not only the person who can do the job but also do it BEST.

Selecting people based on how well they can code some linked-list traversal, in a web editor, while someone is timing them is not a good metric for overall "best". In fact you might reject candidates that think more about larger design problems, and overall your code might suffer.


>in a web editor

This is the part I dislike the most. They usually disable copy-pasting too because they know it's that easy to google so you can't just code in your IDE and paste.


Isn’t it usually both? My company’s on-site has multiple coding sessions and multiple design sessions.


I've done my fair share of algorithmic hoop-jumping and have worked at a few of these companies over the last 10 years. I'm not convinced that the people performing these interviews are able to tell how a candidate's brain works at all in 45 minutes.

At some point it's all just signaling and wild guesses.


Previously:

Google bureaucracy expected from Ken Thompson (!) to pass a C language exam (!!).

https://www.theregister.co.uk/2010/04/21/ken_thompson_take_o...

"So Mr Thompson, you say you have some programming skills"


This is equally interesting (or equally bad):

"Google: 90% of our engineers use the software you wrote (Homebrew), but you can’t invert a binary tree on a whiteboard so fuck off" [0]

More details two years later here [1].

[0]: https://twitter.com/mxcl/status/608682016205344768

[1]: https://www.quora.com/Whats-the-logic-behind-Google-rejectin...


No it's not, the creator of Homebrew has nowhere near the clout of Van Rossum or Thompson. I don't think it's a big enough or technically complex enough project that its author should expect to skip the standard interview procedure, whereas the creators of Python and C absolutely should.


to be fair; it sounds like he didn't get the job due to lack of interpersonal skills, not for lack of programming skills.


I believe this is actually about checking in code, not about getting a job.

From what I understand it is like a "driving license" for each language - if you haven't passed the driving test you can still drive but you need to have an instructor keep an eye on you. If you haven't passed the coding test you need someone to review your code before submitting.


It's mostly about getting used to the language usage style used in Google (the style guides are open, so you can see it).

The philosophy is that all Google code base should look like it was written by 1 person. It's great actually in practice: code in a big company should be hard to write and easy to read, as it's read by many people.


> It's mostly about getting used to the language usage style used in Google (the style guides are open, so you can see it).

> The philosophy is that all Google code base should look like it was written by 1 person.

But all google products look like they are written by only one person so the aim is achieved. The worst thing however is that other person started doing the same.


So Ken Thompson created go and gofmt so he wouldn't have to deal with that crap any more.


It is certainly not a driver's license; this makes as much sense as asking Hamilton or Vettel to do a driving test before giving them a job as a driver.

That said, I would not be surprised if this happened at Google, as they truly believe their process is vastly superior, just as they believe themselves to be.

Pretty sure they would ask Linus to do a C programming test as well, and score him badly because he can't remember the exact details of some algorithm he hasn't used since he wrote the first lines of Linux.


> this makes as much sense as asking Hamilton or Vettel to do a driving test before giving them a job as a driver

https://en.wikipedia.org/wiki/FIA_Super_Licence


As I said, it makes no sense at all. F1 is not Google: "Provided a driver has previously held a super licence they do not have to meet these requirements, a driver who has held a valid super licence for any of the previous three seasons is eligible for a new licence."


> I believe this is actually about checking in code, not about getting a job.

I'd say it's about doing the work for which he came there, from TFA:

"Google hired Thompson to create a new language, Go. But Google also requires all of its recruits to pass a language test. According to Thompson, he hasn't quite got round to it yet - and so can't submit code."

And no, I have no understanding for that utter stupidity, or any attempt to accept it.


As a Googler, that statement is just wrong. They were looking for a fun story and somehow got just the right quote to make one.

Code committed to the internal repo requires a review from someone certified familiar with that language's internal style guide.

He was hired to work on Go, not C, and just didn't get around to writing enough C to bother getting that certification. He can write C whenever he wants to, just like literally every other engineer, he would just get a style guide review at review time.


Indeed... I wonder why anyone would think ken would magically be familiar with Google’s internal C++ (not C!) style. I’m sure he would admit that he isn’t, and doesn’t deserve to be granted the C++ Readability certification just because he was involved in C’s development.


But it was not about some "certification." He states he was not allowed to check in the C code he wrote, lacking a Google-required exam. To quote that once again:

"Google hired Thompson to create a new language, Go. But Google also requires all of its recruits to pass a language test. According to Thompson, he hasn't quite got round to it yet - and so can't submit code."

It's not and it wasn't about any C++ "certifications." He was involved in the creation of a new language. For the new language there could not have been any existing certification. And as far as I know, the initial implementation language of Go was not C++ but C, in whose creation Thompson was also directly involved. And as far as I know, parts of that C code already existed before Google (Go had some dependencies on code from https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs and https://en.wikipedia.org/wiki/Inferno_(operating_system)).

So it was indeed as absurd as it could have been. He was a part of the team that directly invented both the language and the style guide of the language, but some bureaucratic exam was expected from him to submit the code to the source control system.


As I said above, the literal interpretation is flat out wrong. There is no mechanism that exists now or existed in the past to make the statement literally true as written.

I don't know how they managed to get that statement, but it's misleading. Many misconceptions begin with an element of truth. I'm guessing the element of truth here was the "certification" process. (Maybe it was something else, like a mixup having him as an advisor accidentally caused him to not be hired into the engineering group and thus there was a wait before he got engineering credentials to view/edit code. But that kind of thing is just silly paperwork and clearly fixed promptly.)

I just checked. He had literally written some C++ code before that snippet was published.

It was not the absurd statement it's been made out to be. It sounds unbelievable for a reason: It is unbelievable because it didn't happen.


But writing this you 1) just confirm that there are mechanisms by which he could have been not allowed to check in the code; 2) show how invested in the claim you are by comparing the date Thompson's statement was published with some date from the source control attributed to him, which again proves nothing -- his statement could have been true before it was published. I'm still puzzled by the passion behind the "didn't happen" claim, which remains unsupported even while you show that the mechanisms for Thompson's statement to be true unsurprisingly exist. I'm also surprised by your insistence on "C++" sources and "certification" there, because neither is part of his statement.


We are insistent because we have first hand knowledge of how Google’s code review and readability processes work.


I didn't get that impression, considering the uniqueness of Thompson's engagement: the major point from the start was that he was not an intern who came to add a few lines to existing C++ code. He was there with a bunch of old C (not C++) code which was already written according to different rules, and was to develop a new language for which he was to write the rules (with other such developers). Additionally, he was involved not only in developing that C code, but in the initial development of the C language itself! Under all these circumstances, he apparently couldn't check in the code, for not passing some "test."

Your failure to even recognize all that (instead mentioning "C++" and "certification," which is not the same), and to address the fact that these circumstances completely differ from your own much narrower experience with established practices when checking in C++ code for which the bureaucratic rules already existed, simply doesn't convince me that you can be believed regarding the organization's capability to not react bureaucratically. Which is what Thompson apparently said, and which I still find, as I explained, very believable.

That is how I searched in the responses for any claim which would directly support your opinion, and still haven't found it.

I have no problem admitting I was wrong, but I'd really like to see some direct proof. Like somebody investigating what actually happened there and getting some first hand account. Not the talk about "C++" and existing rules for that (for which there is, up to now, confirmation of strong enforcement, not tolerance).


Ken is a friend and colleague. I worked with for years on the Go team, from the early days.

But ken was at Google a long time before the Go project started. (The Go project was not developed in Google's monorepo and was not subject to Google's readability process, as is the case for many Google projects.) What ken was referring to was indeed the C++ Readability process (which doesn't stop anyone committing code, just requires a review from a person with the 'readability' certification before committing, as mentioned upthread). I know this from talking to him about it, specifically.

I can't offer you any more proof other than the fact that I was a witness. If you don't believe that, then I'm sorry? Not sure what else to tell you.

FWIW your conduct in this thread is extremely obnoxious. I'm sure you'll find a way to debate that, too, but please consider that I have no vested interest in telling you this other than hoping you might better learn how your conduct is perceived by others.


> I know this from talking to him about it, specifically.

Perfect! So what's the exact story, then, finally?

1) Was the quote from him completely invented?

2) Was he allowed to check in or not? Did he need the test to check in at that moment to which the quote refers? (I don't care about the moment when it was published).

3) And even more specifically: were there ever the times where he was in Google but when he not able to check in the code lacking some Google-prescribed "test"?

Just "yes", "no" or "I don't know" there, please. Also (just for my curiosity):

4) How long that period lasted?

Then you can elaborate where I was wrong by searching thorough what I've written -- I've specifically quoted the article and talked about his C background and involvement in a new language, never claimed anything about his C++ code and a "certification." I have an impression that your perception of what I've written and that, what I actually have written don't match.


> 1) just confirm that there are mechanisms by which he could have been not allowed to check in the code

If that's what you read out of the post, when I literally said the opposite, I can't reason with you.


You’re quoting a journalist paraphrasing ken. I was actually there. It was not as you say.


Wasn't Go...written in C?


>Thompson: I'm not allowed to check in code, no... I just haven't done it. I've so far found no need to.

He could probably get around this easily if he tried

Still a funny quip though


This is not true though. All code at Google goes through review. If you aren't certified in a language, your reviewer must be (or you need to get another reviewer with it). Unless Thompson's code is being reviewed by someone that has been at Google for just a few months, this is a non-issue.


Wow, the replies to this actually saying Guido van Rossum should do an algorithm/DS leetcode interview.

It'd actually be really, really cool if more than few people of Guido/Linus/etc's calibre did go in for some of the whiteboarding interviews (under an anonymous resume of course).

As both of them explicitly acknowledge - they don't (or no longer) consider themselves as top notch "coders" as such. Linus has said something like "I'm not even a programmer anymore" recently. And Guido said something like "I'm not the best Python coder by far - on a scale of 1-10 I'd say I'm at most a 6". Not because they're lousy coders of course. Just that (in recent years) they've had far, far, far bigger fish to fry.

So they go in, and "fail" the contrived Sudoku / Knapsack / "build me an Instagram clone, please" problem... then get strung along and gaslit for a while, before receiving an email 6 weeks later telling them that they're "not a fit".

That would be lots, and lots of fun to see.


The solution to this is simple: any engineer applying for a job should simply say "no", and this practice will end. The real problem is that engineers in general do not know how to say "no", or do not know how to negotiate.

The role of a college education is to weed out those who are not capable, so if you already have a degree from a reasonably reputable (not necessarily Ivy League) college or university, these types of interviews are completely unnecessary. Also, in the US the cost of firing someone is extremely low compared to Europe, so these interviews make no sense at all: in Europe you would need to pay someone months of severance pay and whatnot, while in the US you can simply tell them they are fired and that's it.

The goal of these interviews is really to find something that can be used to negotiate your salary down, so don't be a sucker and don't fall for it.


Have you been an interviewer?


No. Why?


> There is an entire industry built around tech interview prep now (books, websites, practice/mock interviews). Many would defend this practice because their paycheck directly depends on it.

Or you know... an algorithm and data structures class that's part of a serious Engineering/CS curriculum.


That's enough for passing familiarity with what the algorithms are / pointers to reference material when appropriate. Not nearly enough to perform on command from memory in 20-40 minutes.


I mean,

If someone knows he's interviewing at a place where there will be a coding interview, he would be crazy not to take a look at his algorithm textbook.

And maybe if the course only gave the candidate a passing familiarity, it wasn't thorough enough?


At the end of Princeton's Algorithms I-II on Coursera you've done fewer than 20 implementations, and those were open-book assignments with week-long deadlines.

A reasonable interview prep cycle would be closer to 200 practice problems, under time pressure.


>he would be crazy not to take a look at his algorithm textbook.

Why? what's wrong with going "purely" into interview?


an algorithm and data structures class that's part of a serious Engineering/CS curriculum

Good luck remembering the kind of details a typical interview asks for 10 years into the business.


"So, tell me about how you would develop a threaded application"

(Runs for cover)


Not taking anything away from GVR, but I bet he does not pass leetcode medium/hards in time allotted without specifically practicing these for a while. I could be wrong.


I did a dozen or so interviews at MS. No one ever asked me if an interviewee "passed/failed". The only boolean was "hire/no hire", which took into account many more dimensions. But I know people that interviewed over 1,000 people, so my knowledge is more anecdotal.

I'd imagine that no matter the problem, he would breeze through an interview and leave the interviewer with some enlightenment on programming. It wasn't necessarily about whether the code was "correct". I remember once interviewing a ~20-year-old for the summer program who, when I gave him a problem, started talking about some linear algebra and geometry that I didn't even know were relevant but that quickly made sense to me. And he was so humble about it, looking to me as if he wasn't sure it was the right approach to take (it was hard to keep a poker face and not give away that he was a "definite hire" after about 5 minutes). His 5-liner to solve it had one bug, but he clearly "passed" with flying colors. Unfortunately for us he ended up joining a startup in SV (and is now doing big things there).


Notable programmers, those you know by name, are given positions that you get by having a conversation with a VP. Source: worked there for a long time.


With a known quantity like Guido, you just try to convince them how fun it would be work at Microsoft.


Possibly a six hour panel interview on Javascript trivia, instead.


> This is cool though. I'll be curious to see what he works on.

Well, after seeing what happened to Leslie Lamport, i don't have high expectations.


Microsoft hiring is not centralized, and not all teams do leetcode interviews. Many non-famous developers have been hired there without leetcode as well.


I am pretty sure they did, unless he got hired at some VP/Distinguished Engineer level.

And to be clear: If you are unable to solve these common algorithmic questions that companies like Microsoft ask, then that's not the right place for you to work. This is a tangent, but there are literally thousands of companies that won't require you to solve these problems. The thing is, at Microsoft & co. you don't just do this in an interview. You do it at your job too. We do foundational work in many teams and we need to solve algorithmic problems practically every week. If you are unable to code yourself out of a DP problem or scared of NP completeness and approximation algorithms, then maybe find a different job instead of complaining about the interview process?


I wholeheartedly disagree.

1. This is Guido van Rossum. If I were him and asked to solve puzzles, I'd tell the hiring company to fuck off.

2. These quizzes aren't so bad, but the pressure and stakes make them incredibly stressful. There's no standard, and oftentimes the interviewer is the one that sucks.

> We do foundational work in many teams and we need to solve algorithmic problems practically every week. If you are unable to code yourself out of a DP problem or scared of NP completeness and approximation algorithms, then maybe find a different job instead of complaining about the interview process?

I'm pretty sure your opinion here is not that of your employer.


Counterpoint:

When I was leaving Google the first time, I asked my skip lead (who was employee #48 there, ended up running all of Search, and was previously a core HotSpot engineer at Sun) why he chose to work at a small startup when, coming off of HotSpot in 1999, he could work anywhere. He replied "Aside from them being one of very few companies with an engineer-centric culture, they were the only company that required I interview. Everybody else was willing to hire me on the spot."

For some personality types - and particularly the ones likely to do world-class work - being challenged is a positive sign. It means that the employer does their due diligence, and they will mostly be working with other people who react positively to a challenge.


> It means that the employer does their due diligence, and they will mostly be working with other people who react positively to a challenge.

To me, due diligence would be more like using software that someone has created. If it feels snappy then they're good enough at algorithms for the kind of software that they create, if it doesn't then maybe it's worth looking into whether or not there's a good reason for that.

Like if you apply for a job at the NYT, I doubt they make you do a timed writing test with people staring at you and asking you questions in the middle. They probably just read some of the previous work you've done.


Are you kidding? They'd make Ansel Adams do a timed test on how to use Elements to do colour correction.


Sure, but due diligence doesn't mean asking Guido van Rossum - who's got bona fides coming out the wazzoo - to solve a whiteboarding problem.


How he solves the problem doesn’t matter. You don’t care in the interview if he had the answer memorized or if he fumbles through it.

I do not want to work with anyone who finds that getting their hands dirty is beneath them. It is very, very rarely going to be a good use of their time to do those problems. It will often be a good use of their time to teach those problems. A senior engineer, even one who’s unlikely to work with junior engineers on a regular basis, will need to explain their thinking. They need to show humility and compassion. Those are practiced attributes. This precise situation is the best practice you can get - New and Unknown person, some amount of challenge and complexity involved.

Thinking that whiteboarding problems are a bad use of time is a very strong signal for a senior person who is out of touch.


At an old job, my boss was moving desks and he came across an extra copy of CLR "Introduction to Algorithms" and he asked if anyone wanted it. As he was my direct supervisor, he said he'd give it to me only if I promised never to open it, and only to use it as a monitor stand.


I can see that. When I interview, I often compare the difficulty of the questions that different companies ask. I have noticed that I feel a little more respect for the companies that ask the more challenging technical questions (not puzzles) vs. the ones that ask the super basic ones. It does make me think that the ones asking the simpler questions are likely getting lower-quality candidates, and those are who I would be joining.


Counter-counter-point. An engineering interview has a non-negligible amount of randomness. Maybe you get a grumpy interviewer or a noob interviewer, or your brain freezes over.

When you are considering hiring a nobody, this is acceptable. You will interview multiple people, and they will interview at multiple places, so the randomness isn't that important.

But if you want to hire one specific guy as a strategic hire, suddenly the randomness may no longer be acceptable.


Well, he wanted to meet the team and have an idea who/what he would be working with.

It's really scary to go to company that is willing to hire you without ever talking to you.


When I left uni, I got around 15 job offers. I went with the one with the lowest pay, because that's where I had to go through the most difficult interview process.

(Unfortunately this happened in a small Eastern-European country, so the company was an investment bank, not Google.)


I think the best experiment would have been to have him apply blind where the interviewers did not know he was Guido -- and see how he fared on the technical interviews.


> I think the best experiment would have been to have him apply blind where the interviewers did not know he was Guido -- and see how he fared on the technical interviews.

That would be amusing, but he'd have to be disguised, as he's rather recognizable, having done a lot of 'State of the Python' talks and the like.

He doesn't look as much like Rick Moranis as he used to, though, so that helps: https://gvanrossum.github.io/images/Guido@200dpi.jpg


> This is Guido van Rossum. If I were him and asked to solve puzzles, I'd tell the hiring company to fuck off.

Not sure what you're trying to get at:

MacOS homebrew creator is an effin nobody compared to Guido, therefore he should "know his place", "get in line" and invert a binary tree on the whiteboard and act like an obedient tech interview candidate that he really is?

OR

MacOS homebrew creator should've told Google to fuck off?


Imagine a university asking a physics Nobel laureate to solve QM problems from an undergrad textbook in order to get hired as a professor. It would be the height of lunacy and incredibly insulting.


Guido is not a physics Nobel laureate. Going with the physics analogy, Python is more like an overgrown master's-level project, not Nobel-prize-level by far. He did a good job of growing the Python community, and this is a great achievement! It requires certain personal traits not everyone has. But at the technical level, he made many beginner's mistakes when designing Python, which he tried to fix later, not always successfully.


An overgrown masters level project, eh? You could probably say the same thing about the founding of the United States!

"The US constitution is like an overgrown enlightenment dissertation. The founding fathers did a good job at growing the United States, and this is a great achievement! It requires certain personal traits not everyone has. But at a technical level, they made many beginner's mistakes when drafting the constitution, which the country tried to fix later, but not always successfully."

:-P


It may come as a surprise to you, but for an outside observer who hasn't been indoctrinated at school by the religion of American exceptionalism, the US might not be a very good example. Think of the American military-industrial complex that apparently defines the country's foreign policy.


My analogy was meant to cut both ways, and serves as much as a criticism of the US constitution as it does a compliment!

You'd certainly be wrong to assume that Americans are all in favor of our foreign policy, to say the least. Aside from that, some of the downsides to the US constitution that inspired my comment include the way the founding fathers totally failed to anticipate that the nation would be completely polarized by a two party system, and that this polarization would happen along geographic lines.

Admittedly, this compromise whereby rural states with lower populations enjoy disproportionate political representation is baked into the constitution being agreed to in the first place (the 3/5ths compromise being relevant here as well). As we've moved to more direct democracy, with things like the electoral college being bound to the (local) popular vote and the direct election of senators, the original intentions of having a federation of mostly autonomous states becomes more and more anachronistic, while still fueling an increasingly polarized electorate that pits high-tax revenue and high population centers like SF and NY against low-tax revenue and low population centers that make up most of the country.

The United States is a large geographic region with a heterogeneous economy. I don't know if a parliamentary form of government would have served this kind of country well, but certainly most friends from abroad who have spoken to me on the subject have implied that proportional representation is far more sensible than FPTP voting, and that parliamentary forms of government avoid the gridlock and polarization that our de facto two-party system engenders.


Maybe.


>Python is more like an overgrown masters level project, not a Nobel prize level by far.

Please find me another "Masters Project" that has 8 Million+ Users and drives the backend for thousands of tech companies worldwide. Good Luck.


oh friend, say no more, how about left-pad? https://www.npmjs.com/package/left-pad


Funny guy.

If you think a single function is equivalent to an entire Language you should get your head checked.


The number of users is not a metric that makes something Nobel-prize-worthy. The exaggeration of the comparison with left-pad is to make this point clearer to you; if you cannot see it yourself, that's sad. Also, going with "funny guy" and "check your head" is not a good argument, btw, if you did not know that.


I love Homebrew – but it really isn't that much of an achievement compared to damn Python, and I also have no idea who its creator is.


Homebrew is 11 years old. I'm willing to bet there are as many (likely fewer) people who knew Guido in 2002, when Python was 11 years old, or even in 2005, when Google hired Guido.

And I'm willing to bet when Google hired Guido in 2005, they didn't put him through a coding challenge humiliation clown show day.


> MacOS homebrew creator should've told Google to fuck off?

Yes, that's exactly what he should have done.


This comment really just drives home the nail, with a backhoe bucket, of how awful the state of interviewing (and especially the mindset of some interviewers) in this industry is.

I really just want to thank you for putting this useless mentality on display.


Next time there's a tech interview discussion and someone defends it, linking this thread will be very useful.

Almost 30 years of BDFL of Python, sorry don't care, go do 200 leetcode before talking to us. And if you don't spit out the answer a few seconds faster than that fresh graduate, clearly you're a lesser engineer and should be rejected.


I know it's not Microsoft, but D. E. Shaw asked Larry Summers math puzzles when he interviewed there. At the time, he was the president of Harvard University.

I think it's my favorite example of how crazy some interview processes can get, lol.


I’m not sure what’s crazier to me: asking someone to demonstrate live proficiency in an abstract skill that is only tangentially related to the actual day-to-day activities of a role, or just assuming that because someone has some high credential they would be good in a given role.


DE Shaw is a financial company filled with maths guys, doing statistical analysis and modeling all day. A math puzzle is the most normal question you could be asked there.

The real question is why Larry Summers is going to a quant interview? Did he apply for a quant role?


They hired him as a managing director.


For many roles at a hedge fund, being able to do mathematics quickly and intuitively is a valuable skill. Not sure why an economist applying for an MD job would need to be tested on that, though.


He should have called their bluff and failed the quiz on purpose. What are they going to do?


"Hello Guido thanks for coming in, we would like you to open up visual studio code and create a sudoku solver that can solve this partially filled out board"
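(For the curious: the hypothetical prompt above is less scary than it sounds. A minimal backtracking sketch in Python, representing a board as a 9x9 list of lists with 0 for empty cells, fits on a whiteboard; this is an illustration, not production code.)

```python
# Minimal backtracking sudoku solver; board is a 9x9 list of lists,
# with 0 marking empty cells. Mutates the board in place.

def candidates(board, r, c):
    """Digits not yet used in row r, column c, or the 3x3 box."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [n for n in range(1, 10) if n not in used]

def solve(board):
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for n in candidates(board, r, c):
                    board[r][c] = n
                    if solve(board):
                        return True
                    board[r][c] = 0  # undo and try the next digit
                return False  # no candidate fits: backtrack
    return True  # no empty cell left: solved
```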


Time to use prolog to get your revenge on the interviewers.



More appropriately, in Prolog (by Markus Triska) https://youtube.com/watch?v=5KUdEZTu06o


Thanks for this link. I took a course on logic programming in school and it made a big impression. I found Prolog to be pretty mindblowing at the time and I'm happy to see it still is.


Very nice, and I can understand this one.


You must be injecting crack directly into your frontal lobe if you think that the creator of the world's third most important programming language being asked an algorithms question (unlikely) and failing it (entirely possible) means that he's unqualified to do "foundational work."

I had a Nobel Prize winner as a physics professor in college who got three successive different wrong answers when attempting a freshman physics problem in office hours. That doesn't mean that physics isn't the right place for him to work.


> third most important programming language

not going to debate that, but I'm curious, what do you rank as #1 and #2?


The tiobe index lists Java and C in those spots.


> The tiobe index lists Java and C in those spots.

That list is ranking languages by current popularity, though, rather than 'importance'.

I'd define 'importance' as more along the lines of "amount of havoc created if all software written using that language were broken/deleted at once".


Wouldn't languages like COBOL rank pretty highly because a lot of legacy banking systems are written in that?


Yes, but frankly, C would still cause more havoc because even if the mainframes somehow continued to run, they would still be effectively inaccessible over any network.


Python is number 2 now, according to https://www.techrepublic.com/article/python-overtakes-java-t... :)

We should not give too much importance to this, but the fact is that Python is now ranked consistently #1, #2 or #3 in sufficiently many rankings to consider it seriously.


That's fair, I guess.


> unless he got hired at some VP/Distinguished Engineer level.

Google or Wikipedia can tell you who Guido van Rossum is.


I believe the "unless" was about the position he was hired for, not about who he is.


Hi, I'm a programmer for Microsoft. I didn't have to answer silly algorithm questions for them to hire me. I'm nowhere close to VP/Distinguished Engineer level

(that said, my path to being hired did involve writing a sudoku solver, but that wasn't in an interview for a position at Microsoft)


Perhaps you could share your story to show that not all is as dire as HN would make it seem.


Every story is unique. I was responding to a comment that was making sweeping claims

My story likely isn't replicable, but everyone has to find their way

I interviewed for Citus a couple weeks before they announced being acquired. I found out about the acquisition on Hacker News before having received an offer. They were able to get me in without going through the hiring process again

Initially I'd be moving to San Francisco but I wasn't eligible for any visas as I don't have a post secondary education. Staying in Canada's worked out

Some examples of things working out here: I was asked to implement some parsing, & well that's not so hard when you've written a Lua parser a year beforehand: https://github.com/serprex/luwa/blob/master/rt/astgen.lua (an astute reader will notice I don't handle precedence here, which is pretty important for arithmetic parsing. That's because I opted to implement shunting yard during codegen phase)

What value does this story give a reader? I only think it'd serve to continue an argument about how not everyone is so lucky, which isn't the original argument of "unless you're VP/Distinguished Engineer, not even Guido van Rossum gets to skip whiteboarding"
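(For readers unfamiliar with the shunting-yard trick mentioned above: it converts infix expressions to postfix so that operator precedence is resolved without a recursive grammar. A hypothetical Python sketch, not the poster's Lua code, handling only left-associative binary operators:)

```python
# Shunting-yard sketch: converts a tokenized infix expression to
# postfix (RPN), using an operator stack to resolve precedence.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    out, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # pop operators of equal or higher precedence first
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            out.append(tok)  # operand
    out.extend(reversed(ops))
    return out
```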


His Twitter actually says "Distinguished Engineer at Microsoft"


>We do foundational work in many teams and we need to solve algorithmic problems practically every week.

Keep telling yourself that as you fix mindless bugs in some Advertising platform lol.


> The thing is, at Microsoft & co. you don't just do this in an interview. You do it at your job too.

Oh so that's why the Azure Portal UI is such garbage. Their frontend developers are just busy solving knapsack problems...


Yes, they hired him at Distinguished Engineer level.


I know we're talking about FAANG interviews, but come on, do you really think people like Guido need to do any of that? They probably don't do a "regular" interview; they go to a restaurant with some important people and that seals the deal.


I think this was in response to this:

https://twitter.com/mxcl/status/608682016205344768


Everybody needs to invert B-trees before breakfast


From his twitter profile:

""" Python's BDFL-emeritus, Distinguished Engineer at Microsoft, Computer History Fellow. Opinions are my own. He/him. """


> I am pretty sure they did

Lol! I'd bet my year salary they didn't.


Everybody is asked something close enough. At higher levels, the focus is often not on coding but on enough depth of design that a person with such a profile might end up educating the interviewer while still satisfying their requirements.

That being said, the ability to solve coding problems efficiently is fair game: not necessarily spitting out an A* graph algorithm in their sleep, but a decent, close-to-real-life coding challenge.


Well, you might want to check guido's twitter before making that bet.


What exactly am I supposed to check in his Twitter?


Whoops, mismatched your reply to the wrong branch of the comment tree


He was hired as a Distinguished Engineer.


> NP completeness

Programmer here. It's true. I deal with NP completeness every day.


You really don't do it at your job.


I’m taking a cheap shot here, but the people maintaining Skype at MS certainly don’t do it. And there are such examples at other big companies.


Wow, so much salt. It was a joke.


You should have left "/s" at the end of your comment, it's really hard to distinguish between a joke and stupidity on the Internet.


They did add a /s to the end. The user you were replying to wrote the comment that the stupid comment was replying to!


It is there. At the end of the line.


To me this undermines any humour the comment may have had to begin with.


In that case, it was an unnecessary joke and not helpful to the conversation.


Sorry, I'll refrain from humor from now on?


Please don't, keep being fun!

It works here on HN too, many of my comments are upvoted jokes and I liked yours. It's sad it attracted animosity though.

I like browsing HN mostly for the interesting discussions, but I enthusiastically take the occasional jokes that come with them.

(I might have been upvoted by people taking my jokes seriously now that I think about it. I don't know if this is a terrifying or a funny thought!)


Seems reasonable.


Go to reddit if you want to make average IQ humor.


>I am pretty sure they did, unless he got hired at some VP/Distinguished Engineer level

No way they even considered doing it... That's for people out of college...


That's certainly invalid. People with 20 or more years of experience are asked questions based on the role they are interviewing for.

If you are applying for a principal or higher engineering role and your job involves coding, you are asked coding questions. Maybe not focused on a complex bookish algorithm, but rather something closer to a real-life distributed programming / synchronization problem, for example.


Why the fuck would CS graduates be afraid of DP problems, NP-completeness or approximation algos??? That’s literally the table stakes of our profession.

The problem is rewarding rote memorization in a whiteboard interview, at the expense of actual understanding, and ability to research the problem.
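(For reference, the canonical "DP problem" alluded to above, 0/1 knapsack, is short once you understand the idea rather than memorizing it. A minimal Python sketch:)

```python
# 0/1 knapsack via dynamic programming.
# dp[w] = best total value achievable within capacity w.

def knapsack(items, capacity):
    """items: list of (weight, value) pairs; returns max value."""
    dp = [0] * (capacity + 1)
    for weight, value in items:
        # iterate capacities downward so each item is used at most once
        for w in range(capacity, weight - 1, -1):
            dp[w] = max(dp[w], dp[w - weight] + value)
    return dp[capacity]
```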


> Why the fuck would CS graduates be afraid of DP problems, NP-completeness or approximation algos??? That’s literally the table stakes of our profession.

If the profession being discussed is "academic work in Computer Science", sure.

If it is "software development", those things absolutely are not really "table stakes".

If it is "software engineering", then I don't think there is broad consensus on what that profession even is, much less what table stakes in it are.


Are you seriously claiming Guido van Rossum was hired for software development/engineering..? I don’t understand the context switching here.


> Are you seriously claiming Guido van Rossum was hired for software development/engineering..?

No, nor do I think he was he hired for the other thing I discussed, academic computer science. The post I was responding to made a general comment about "CS graduates" and "our profession"; I was responding to that. Whether that post itself was material to, or merely tangential to, the discussion of GvR's hiring at Microsoft is an argument that, while perhaps interesting to some, was not the focus or concern of my response.


My post was a response to a specific comment, please don’t take it out of context.


Because these problems can have some subtleties that are hard to get right in a high-pressure environment. Some interviewers will completely write you off for small mistakes.


That sounds interesting! Do you have any examples?

I would have thought that if you actually needed people to perform under pressure, you would design your test explicitly around that, instead of using “comfort with the whiteboard” as a proxy...


That might be a fair statement for some roles, but what about the second part where you code it up and write the code on a whiteboard w/o running/debugging and have to get it right in 45min?


I’m not sure I understand..?

Do you mean it should be used as an actual speed test?


> that’s literally the table stakes of our profession

Table stakes for what exactly?


Computer science graduates...? Like I said in the comment.


For sudo scientists.


There’s plenty of issues around algorithmic questions, and it’s even possible he would fail them, but there’s still important value in asking them.

Senior engineers in such companies need to work and cooperate with hundreds of people. If they find solving such problems beneath them, expect to be treated as a higher class of person, or are not willing to get their hands dirty, they will not be as valuable as employees. (Occasionally the lone genius is great, but the lone genius is a worse employee than the genius who can cooperate well.)

My company would keep doing such interviews, but the expectations change of course. A new grad needs to do these well. More senior people can do worse, and make up for it using other skills showcased in other interviews, or based on their record.

But if you find a senior person that finds solving algorithmic questions is beneath them, you might want to be careful about them and how they will interact with your company’s culture.


I never remember the nitty-gritty details of the more complex algorithms and data structures, I only remember they exist and their important properties. If I then have to implement one, I consult google or books. Googling often has the neat side effect that you'll sometimes find refinements/improvements or other (new) potentially better algorithms to solve a particular problem. Or somebody might have done the heavy lifting already and packaged it up under a compatible license (if it isn't something simple like leftPad).

I mean, it's OK to ask for basic algorithms in interviews, to see if the interviewee has at least some understanding of basic stuff and/or can think on their feet and/or can reason about problems appropriately...

But I would still welcome it if instead of giving an interviewee a boring, memorizable problem like "invert a binary tree", the interviewer would find a more general problem and would see how the employee would tackle it, and if the interviewee refers to binary trees as part of the solution without writing out a full implementation from memory that's OK (as long as binary trees would be a valid approach to solve the problem, of course).
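(For what it's worth, the meme problem itself is tiny once you remember trees are recursive; a hypothetical Python sketch, just to show the scale of the ask:)

```python
# "Invert a binary tree": mirror the left and right subtrees recursively.

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def invert(node):
    if node is not None:
        # swap the (already inverted) subtrees
        node.left, node.right = invert(node.right), invert(node.left)
    return node
```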


> the interviewer would find a more general problem and would see how the employee would tackle it

those are the best interviews imho. specially when both the candidate and the interviewer engage into a productive discussion about the solution the candidate gave.

at the same time, those take time and a lot of companies want a "fast" hiring pipeline, which leads to "invert a binary tree on this whiteboard!".


Just for reference, my company does not ask anyone to implement a binary tree or any well known algorithms in coding interviews.

I see a bunch of comments that look to me as strawman, or really bad interview experience.

A good coding interview should:

- not be known by the interviewee even if they spent a few weeks on leetcode

- avoid requiring anything that isn’t covered in an average CS101 class

- draw its challenges from the need to process a new problem, gradually uncovering its edge cases and how to handle them gracefully

I don’t do leetcode, but I went now and tried the first hard question I found. It was “find the median of two sorted arrays”. I find this question to be a good example of needing very little background knowledge. (You’d need to know how to deal with arrays, and what a median is; not everyone will know these, but it’s a pretty low bar.)

This question already has plenty of room for mistakes, edge cases and problem solving. I’m going to take a wild guess from knowing talented veteran engineers I’ve seen that they can surely handle it in the interview structures I’ve experienced.
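(A sketch of the straightforward O(m+n) merge-based answer to that question; the "hard" rating comes from the follow-up asking for O(log(m+n)) via binary search, which this deliberately does not attempt:)

```python
# Median of two sorted arrays, via a linear merge of the inputs.

import heapq

def median_sorted(a, b):
    merged = list(heapq.merge(a, b))  # merges two sorted iterables
    if not merged:
        raise ValueError("both arrays are empty")
    n = len(merged)
    mid = n // 2
    if n % 2:
        return float(merged[mid])
    return (merged[mid - 1] + merged[mid]) / 2
```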


I don't think the problem is senior folks believing answering questions about bubble sort beneath them (I'm sure some do but I doubt it's the norm), I think the problem is that 99.99% of developers at all levels don't need to know how bubble sort works to do their jobs well.


Guido is not a senior engineer.. he's a "special projects" person. Fewer than 50 such persons would exist in the microsoft stable. Why would you have someone who functions at such a high level bother with low level implementation? There's thousands of shit munching nerds who can implement flawlessly but who need the overarching guidance to deliver product.


Because most really good designers also did low level implementation? Thompson, Ritchie, Leroy, Torvalds. Heck, while I don't know for sure, probably Dave Cutler did, too.

I don't know any product that's pleasant to use that was developed in the way you describe. Python certainly wasn't.


Many threads, SIMD, cache hierarchies, and calculations often being "free" due to general memory pressure mean the classic version of an algorithm is pretty much guaranteed to be bad at its job, while a seemingly worse algorithm might actually perform better. You likely won't learn any of that from your algorithms book.

If you tried to implement those algorithms on my team without a very good reason, there would definitely be some serious conversations that followed.

In any case, vomiting algorithms on command doesn't seem to correlate strongly to programming ability. Almost all programming problems have a lot more to do with handling much higher-level complexity issues. The knowledge and art of knowing how to poke some undocumented third-party API to implement our system is much more practical. Knowing how to think at a high level is much more practical. Leave algorithm questions for research positions in that area.


One thing I have always hated about HN is that they grey out downvoted comments. That has to be the absolute worst UI decision possible. At least on other sites, when you manually expand a downvoted comment, it's actually readable. Here the contrast is purposely so low, I'll never know what you wrote.


It’s not a question of status, but rather of the utility of memorizing algorithms as an indicator of success. Notably, it probably isn’t one - not least because Google exists, and anyway the majority of one’s time isn’t spent finding the optimal algorithm.


Miguel de Icaza: "The developer division at Microsoft now employs the language designers and contributors to Python, Java, JavaScript, TypeScript, F#, C#, C++.

We just need some PHP, Rust and Swift magic to complete the picture."


Simon Peyton Jones also works for Microsoft, so maybe add Haskell to the list.


In that list, TypeScript and C# were invented by Anders Hejlsberg, who works for Microsoft. Python by Guido, who also now works for Microsoft. F# was invented at Microsoft.

But as far as Java, JavaScript and C++ go, I am not sure who the luminaries are that work at Microsoft. James Gosling and Stroustrup don't work for Microsoft.


Herb Sutter, the C++ ISO committee chair, works for Microsoft - so while not the language's inventor, he's someone pretty important in the community.


There are many folks from the TypeScript team who sit on the TC39 committee, which is responsible for moving the state of JavaScript (ECMAScript) forward.

Now that Microsoft Edge is powered by Chromium, there are many folks who also contribute to, and have a say in, the development of Chromium and Node.js.

This is just a rumour, but Microsoft aggressively hired Chromium devs from Google.

Basically Microsoft wants to build the best tooling for the most popular languages. They have GitHub + VSCode. Who knows, they may also acquire Stack Overflow. It's in their realm.


> MS wants to build the best tooling for the most popular languages

If that is true, why don’t they buy out JetBrains?


That might raise some antitrust concerns. Microsoft is already a strong contender in the development tool space and there's not that much competition left.

Or maybe JetBrains is just not available for them. There can be something going on between JetBrains and Google considering Google decided to partner with them on the Android development tools.

Or maybe Microsoft feels they are doing well enough with their current tools. You can't really fit JetBrains nicely into the "big picture". Microsoft needs VSCode because they want an IDE running inside the browser. They also need the big old Visual Studio. JetBrains tools could not replace anything; they would be a third pillar.


lmao, I assure you the Department of Justice has bigger things to worry about than the IDE market


Because VSCode is already doing better than JetBrains. Its adoption as an IDE has been a rocket ship.


But I thought they wanted the best, not the most popular.


They don't claim having inventors of the language, but language designers and contributors.

For example, Brett Cannon is also at MSFT; he is a core developer and member of the steering council for Python, so he also qualifies, along with Guido, for this claim in Python.


And their contributions belong to Microsoft, whose corporate leadership then has tremendous power to influence the licensing and pricing of future languages.


My impression of Microsoft's plan for the foreseeable future, on the developer side, is to capture value from open source projects with cloud hosting, GitHub enterprise billing and the like.

They want a golden path from their developer-facing tools (TypeScript, VS Code, C#) to their monthly services (GitHub, Azure, Teams). They're going to make all of that work together as well as they can so you won't want to look elsewhere.

Frankly, it's ingenious. If it works, the hard-to-monetize value of open-source projects like TypeScript and VS Code will be captured in your monthly Azure bill.


While we're at it, a reminder that the most popular extension for VSCode is Python support:

https://marketplace.visualstudio.com/search?target=VSCode&ca...

(full disclosure: I'm one of the people working on that)


Would it be accurate to say that after embracing open source with VS code they're extending it with the proprietary pylance?


I'm not sure what you mean by "extending open source". The classic EEE pattern always referred to specific outside standards or specs being "embraced and extended", like J++ did with J2SE. Pylance doesn't extend anything, nor is it a dependency that can lock anyone in.


I don’t know. I work for a Fortune 100 company and the corporate IT department is purposefully choosing tools not under Microsoft's control: AWS, Bitbucket, Slack, IntelliJ, etc. And developers are encouraged to use MacBook Pros instead of anything running Windows.


I do as well and it's the opposite. Their sales people seem to have their hooks into the decision makers.


If Azure's "performance crippled" MySQL is anything to go by, they'll probably try to start extinguishing the open source components at some point - either by pushing people to use non-open-source equivalents or by turning their OSS projects proprietary.

I wouldn't be surprised if they've already roadmapped to end open source development on vscode in 2025-2030. The extensions are already proprietary.


And Delphi, VB, erm J#.

Who is next? Ryan Dahl? Rob Pike? Larry Wall? Rich Hickey? DHH?


Anders Hejlsberg was the chief architect of Delphi and he has been with Microsoft. He also managed to create C# and Typescript.


Larry Wall is at Craigslist


Seems appropriate

Craigslist is still web 1.0 no?


There was also that dude who created GW-BASIC.


He doesn't work for them anymore.


No picture is complete without Urban Müller in it.


So they can own package management on Amiga OS, too?


Brainfuck. Mr. Müller is the father of Brainfuck.


Among certain circles, he was much more famous as the creator of Aminet.


Mozilla's recent layoffs could be the prelude for that on Rust. And isn't the guy behind Swift and LLVM - I forgot his name - hopping around anyway? :) And Zeev and Andi from PHP are also on the loose, last time I read something. So :)


Chris Lattner?


Great idea to apply the principles of competition inside Microsoft:

After a few years of competition, they can just shut down the teams working with the least effective languages.

Using this methodology, Microsoft will end up using the best tools for the job. And they can accept people from any subculture for a few years while the experiment is going on!

This is the "developers developers develop" we have all been waiting for!


Wait, who matches each?


Well, the TypeScript and .NET organizations live within the Developer Division, and now we have Guido so there's Python. We also have fairly substantial dev tooling teams for C++ and Java so I imagine some of the contributors are there. We also have Node.js contributors as well as Electron (due to VS Code). Source: I work in DevDiv as well.


Microsoft has had a Python tooling team in DevDiv for almost 10 years now. Originally, we were primarily working on Python support in Visual Studio proper. We also took the first crack at Node.js support in VS - also before VSCode was a thing.

Aside from that, Microsoft also owns the VSCode extension for Go, so there's that team, as well.


Any ideas on Perl?


Nope, I don't know of anyone working on Perl stuff in DevDiv but it's a big division.


And the Pony guy


Can we skip PHP?


No, what Microsoft should do is a WinRT language projection for PHP. Can you imagine the epicness?


Considering that the PHP runtime in 2020 is quite speedy (see Swoole), a WinRT binding (good lib space) is not the worst combination. Just the syntax :)


No worse in 2020 than JavaScript..


Facebook has that covered


And Ruby!


As far as I’m concerned, they can continue to stay away from putting their typical Microsoft touches on Ruby.. NT ENTERPRISE, FOR BUSINESS.


Watching the Microsoft of the 90s adapt, and seeing Satya get to do what he wants, is pretty amazing!


Win10 is still user hostile. A recent update on a computer I support for an older family member installed an Edge that cannot be uninstalled, hijacked the PDF association, and, with the first PDF opened through Edge, used credentials stored in Office to log in to Microsoft.com “for convenience” without asking.

The monopolistic DNA still runs strong in Microsoft. They are behaving nicely only where they have no advantage.

I don’t buy the “new Microsoft”, not one bit.


Completely agree. Just started a new job and was issued a new laptop with the latest MS stuff. Everything wants to be cloud based. The browser regularly has rendering glitches when scrolling a pdf. Little ads pop up for no apparent reason. Installed apps are just temporary unless you pay. It feels like a desktop designed by Facebook.

Having said that, aside from the pdf glitch I'd say it works very smoothly, exactly the way they intended.


I thought it was rather smooth too the first time I used it. Then I rebooted and the start menu was all messed up. I searched (from a web browser on a different system; not much documentation when you use it offline, but at least no ads) and the reason was... I changed the time. Somehow that reverted some of my changes to the start menu. Maybe a year or a bit more later, after I had arranged a few hundred program icons, the start menu just stopped working entirely and none of the suggested fixes I could find would help (except the one to completely reset the start menu and undo all the customization). But at least that led me to discover the super useful start right-click menu (that still worked). Plus some strange audio issues with some programs that got much better in 2004 but are still occasionally present. Plus the text entry box to search the app list to uninstall takes several seconds to even let you enter text (if you start earlier, it seems random whether it appears or not). And I'm fairly sure there were a few more such issues as well, so you are lucky if it is smooth for you.

But yeah, it does have a nice streamlined feel to it that I appreciate (way better than the default Ubuntu interface IMO), except for the obviously hostile aspects. And the streamlined feel really heightens the sense of how intentional many of those hostile aspects are.


I see it differently. The awful W10 stuff to me is a byproduct of adopting the FAANG PM crap - everything's an A/B test, everything's about driving engagement (even if they drive it with a nailgun into your skull :). It is a "new Microsoft", but the new Microsoft is at parity with the existing Google and Facebook.


Can you elaborate on this FAANG PM stuff? I am not super familiar with it but it seems interesting.


The acronym is for Facebook Amazon Apple Netflix Google ... project management, maybe?


Product Managers and the common philosophies this title seems to have adopted at those sorts of places.


I was more wondering what about these philosophies are bad.


IMHO when you're strongly incentivized to optimize for engagement stats attributable to you, you're pushing change for the sake of change with the darkest of UI patterns to ensure your changes are selected and "favored".


FAANG-like organisations like to use statistics to drive decisions. As others have mentioned, things like "engagement", A/B testing, percent of users using a feature, etc...

This often leads to absurd decision making or false incentives. Things like the GUI shifting around making people click the wrong button by accident will increase engagement metrics and look like a positive thing in the reports.

Take Netflix as an example. If you set your UI language to English and live in Australia, Netflix allows you to choose only five languages:

- English for the hearing impaired

- Italian

- Simplified Chinese

- Traditional Chinese

- Vietnamese

Note that that list is for a show that has alternate audio languages like French and Italian, but you can't select all of those languages for subtitles. But here's the thing: Netflix has the subtitles in about 100 languages for every show. They just refuse to let you select them.

My parents are immigrants and like to listen to shows in English but have subtitles in their native language as a fallback. Netflix says "no". My partner is an immigrant and she's not Italian, Chinese, or Vietnamese. Netflix says her language just doesn't matter enough to make the list. I sometimes like subtitles, but I hate the hearing-impaired subtitles. I don't matter either.

What's happened is that some overpaid statistician at Netflix figured out the top 5 most common languages for each region, decided that 99% coverage is "good enough", patted himself on the back, and the 1% can get fucked.

That statistician is probably paid 3-5x as much as me, and his job doesn't even need to exist. Just show every subtitle language available! Easy! No need to pay some "smarter than you" person to decide what you can and can't have.

Notes:

Apple TV shows every subtitle language, so there can't be any arguments saying that this is impossible or difficult or whatever.

If you call Netflix support, they will gaslight you and say that this is a copyright issue. It isn't, Netflix hides subtitles even for shows they produce themselves.

It's possible to do some combinations in any region. E.g.: setting the UI language to something other than English will make other subtitle languages show up. Some combinations are impossible though. So if there's some random couple like an Italian guy with a Filipino girlfriend, they can't do Italian language audio and Tagalog subtitles ever. Netflix has decided that their relationship is statistically unlikely and excluded them as outliers.


I find it frustrating because it looks like there are two sides of Microsoft. One helming Windows 10 and another doing the impressive Visual Studio Code, PowerShell Core, .NET Core work etc. where they go like "Hey maybe we should build a Rust projection for WinRT, OK here you go!"

Windows 10 should have been so much more, and so much easier for Microsoft to rapidly evolve in a more beautiful fashion. But I guess there's just so much enforced backwards compatibility within it that their hands are almost tied behind their backs. Everything is run in the same system on equal footing, from Windows XP era stuff to whatever the latest "app" trends are. Unsurprisingly it becomes a mess.

Windows 10X looked to be the right way forward, where they finally decide to just run everything Win32 in lightweight containers to shed all that dead weight for a more maintainable system. But it's sure taking a while to see anything, and I hear the ARM edition, which will be essential, isn't going all too well. We're looking at completely different, really troublesome performance for x86 on ARM in 10X, compared to macOS Big Sur and Rosetta 2 on the Apple M1, which is already about to ship.


> They are behaving nicely only where they have no advantage.

This. Just look at gaming. Xbox is losing, so they champion pro-consumer cross-platform multiplayer. Minecraft is popular, so they kill GNU/Linux support with Bedrock Edition.


Bedrock Edition is primarily targeted at the consoles (where running the Java-based Java Edition was difficult). Java Edition is still worked on (and runs on Linux) and they are trying to achieve parity between the two versions.


Bedrock Edition was based on the C++ rewrite for Pocket Edition and has better performance and crossplay.


> Win10 is still user hostile.

Maybe MS could catch the wave of "old-school" versions that are all the rage (e.g. WoW, Runescape), and release a Win95 flavour of Windows that runs on current hardware.

I miss the "old-school" GUI conventions and simplicity of - gulp - 25 years ago.


Honestly, I cut them some slack on the MS account integration (you log in to MS on one app and you're logged in everywhere) because Google and Apple both established it as good practice on their mobile OSes. In this case, MS was the last one to the party to make Windows integrate with their cloud accounts like that.

My biggest problems with Win10 are far more mundane - the incomplete/buggy touch interface, the explorer bugs that are old enough to drink, the new signed-executable security model that is very OSS-hostile (but again, imported from mobile OSes). The fact that half the configuration screens are in the new Settings system and half are in the old Control Panel, where you'll even bounce back and forth to tweak a single component.


I use Firefox because I'm not cutting that slack for Google/Chrome either (and I've never seen it happen in Safari, though I'll take your word for it that Apple does that sometimes too).

Not cutting them any slack.


There are different divisions in Microsoft and they do different things. The division that gave us F# (which I am eternally grateful for) is not the same one that gave us Windows (which I am never going to forgive). These things can coexist.


You could say that about a democratic country. But Microsoft, like any corporation, is a dictatorship.

F# is public relations and research - so you get to enjoy the fruits. But the day it gains any strategic importance, it will be used and abused for that purpose.


Yeah, one side is doing great work - the Win10 story still sucks a bit / lot.


The "new Microsoft" is useful for the time being. Once they are positioned to, they'll drop the hammer.


With them sunsetting IE what's the alternative? Surely there should be a browser that cannot be uninstalled right? I agree with the rest of your issues, but if Edge is ever to completely replace IE it should remain installed no?


First, they haven't sunset IE yet. If the same update uninstalled IE, you might have had a point.

Second, putting the icon everywhere in your face is monopolistic behaviour. In Europe, for a while, Windows came without a browser installed, and let you pick one to download upon install. This is the non-monopolist way to make sure the user has a usable browser.

Also, that user already had a self-updating Firefox, self-updating Chrome, and manually-updating Safari on the same machine.


> Surely there should be a browser that cannot be uninstalled right?

No. You should be able to uninstall everything that isn't needed for the computer to boot.


I mean who's supposed to be the "good guy" then, Apple?


Apple is the “least bad” guy at the moment. They are not “good”. But all the big guys are bad to some extent. (And since you brought them up - Apple’s cloud push is bad, though not remotely as bad as Microsoft’s cloud push)

I am not saying Microsoft is exceptionally bad in any way. I am just saying the “new, good Microsoft” is a mirage. They are as bad as they had ever been, except they have been beaten into submission in some sectors, so they behave in those.


Apple is the worst, with their initiative to make their devices irreparable, monopolistic practices on iOS, and extracting value out of their target audience ruthlessly.


Apple does not have a monopoly - Android outsells them significantly in just about every market. Microsoft still has a PC monopoly. That makes a huge difference.

And yes, Apple making their devices impossible to repair is evil, but their value extraction is opt-in (by buying into their ecosystem) because they are not a monopoly.


What we have is an unregulated duopoly and Google and Apple are filling their boots.


The App Store is where they hold a monopoly.


I was referring to the legal concept of monopoly, of which they have none, and Microsoft does (and was in fact convicted of abusing it, independently in the US and in the EU).

(The kind of monopoly Apple has on the App Store, much like the Sony monopoly on the PlayStation, and Microsoft's on the Xbox, is not AFAIK an antitrust target, because none of them commands a significant part of a market without alternatives; whereas Windows on a PC was an antitrust target, because it does have that hold on a market, AND Microsoft abused it to enter other markets.)


And Apple is the best (of those companies) when it comes to privacy and not spying on your every single digital move.


It is an open secret that Apple is doing all these privacy things for two reasons: PR, and setting themselves up as the only choice for ads for iOS users (which will soon include the Mac). How anyone can just look past this (and that they jumped into the NSA's arms and shared everything) baffles the mind. Apple is in no way less evil than Facebook or Google.


They're much less evil than Facebook in my book.

w.r.t google, it seems like Apple is the lesser evil - as far as I (and anyone I know) can tell, if you turn off location tracking, it actually stays off.

But there is definitely no "good guy" among the big players. Right now, Pine64 and Puri.sm are the only not-yet-proven-evil guys around, and they are both minuscule.


The evidence for MS being bad in the post I replied to is them forcing Edge on you. Well, can I uninstall Safari, or install any other browser on iOS that isn't a reskin of Safari? Not so far as I'm aware. The integration is much tighter than what MS offers.


Why does there have to be a "good guy"? Can't there just be a bunch of different bad guys?


If Microsoft is supposed to be exceptionally evil, then surely we can point to other actors in a similar position who aren't as bad.


> Why does there have to be a "good guy"? Can't there just be a bunch of different bad guys?

Narrative tropes demand that there be a protagonist, so one of the bad guys must be cast as an anti-hero, at the very least.

Huh. I just realized that to many of his supporters, Trump is an anti-hero. No wonder pointing out his flaws doesn't work.


The FSF.


IMO decentralization and competition are the only way to get the bad actors to occasionally do good.


I think they're optimizing for a different use-case/perspective:

> uninstallable Edge

Alternative: The person has a browser that is auto-updated and doesn't need to fiddle around

> hijacked the PDF association

Alternative: The person has an in-built PDF viewer without scouring the web for "PDF viewer" and downloading malware or a sketchy app

> used credentials stored in Office to log in to Microsoft.com “for convenience” without asking

Alternative: "Wow it's cool how all my Microsoft things are working well together"

I think there's room for grace about the experience of 99% of users who don't want infinite customization but something that works well with helpful defaults.


No, that person already had a PDF viewer installed, more capable than the one built in to edge.

There is no excuse for replacing that in a forced update without notification.

Seriously, you can explain anything this way: “all files deleted” alternative: “Microsoft just helping you free up space” or “the files weren’t backed up so they weren’t important”.

Microsoft is squarely in the “monopolistic bad guy” corner here.

Edit: Said person also had a self-updating Chrome, self-updating Firefox, and manually updating Safari on the same machine, for the record.


Changing existing defaults is def a no-no. I thought it was a fresh install of Windows you were complaining about.


Yeah, I love the new Microsoft, especially because of the forced telemetry and updates. Not to mention making it impossible to create a local user account on Windows 10, unless you disconnect from the internet.

UWP is wonderful too.


The paradox of Microsoft's new image is that software development & tools have never been better than they are today, while the desktop experience is one of the worst in Windows history - the opposite of the old days. Seems that you can't have both.


Yea I love this brave new world where frontend development feels like ASP.NET. Love me some TypeScript, shipping features quickly was getting old. Say it with me now, noImplicitAny! We're Engineers now, thanks Microsoft.


Last time I tried (4 months ago) you still could create a local account while connected to the internet.

They're making it more obscure, though


Just did so successfully two weeks ago, Win10 Pro on T480s, Version 2004 or 20H2, can't remember. The latest of what the Media Creation Tool gave me, anyway. Others said Win10 Home does not work anymore.

Ironically, I installed Win10 coming from Manjaro, which even on a T480s (probably one of the laptops to have for Linux compatibility) broke after every other update, so like twice a day. On the other hand, Debian/Ubuntu has been too far behind on software versions for me. Both had terrible compatibility with my Thunderbolt 3 dock. Hopefully in the future, we will meet again!

Until then, it is ridiculous how much more Win10 Just Works.


I'm surprised you didn't try one of the more mainstream stable-but-current distros, like Fedora or openSUSE.


Perhaps they didn’t try due to the difference in package management?

Fedora and openSUSE are RPM-based distros, while Ubuntu uses Debian packages.


For Win 10 Home, the only way is to create a temporary Microsoft-connected account to install, then create a local account after installing, then delete the first account.


> Not to mention making it impossible to create a local user account on Windows 10, unless you disconnect from the internet.

This is false. I've done this a couple of times recently on Windows 10 and it's no problem at all - it just requires clicking a link on the create user account page. In fact, I always prefer to create a local account first and then tie my MS account to it afterwards.


> impossible to create a local user account on Windows 10, unless you disconnect from the internet

This isn't at all true. Accounts > Family & other users > Add someone else to this PC. Took me 20 seconds to find this setting.


I've created local users several times. It's not super clear I agree, but definitely doable in the last month or so.


It's not impossible to create a local account without disconnecting from the Internet.


Since Windows 10 1903 it's impossible, at least in the Home version.

https://www.howtogeek.com/442609/confirmed-windows-10-setup-...


Microsoft should really improve their unattended process. This is where Linux really outshines Windows. I can configure a Linux installer with all my favorite software integrated, user accounts, configurations, etc—but with Windows I essentially have to install Windows, install my apps (leaving tons of randomly scattered files and registry entries all over the system), and then sysprep the WIM.

The unattended setup on Windows is very confusing, there are like 3 different ways and 6 steps to configure, and the tools are mostly designed with GUI in mind. Could you imagine being able to configure, install drivers and apps using a single YAML file for Windows?


I went through creating and deploying a Windows image last year. It’s a really painful and convoluted process and the tools don’t help much.


I think most of the more serious ones require an Active Directory setup, which kind of reflects their traditional focus on enterprise users.


Can confirm; recently configured a Surface tablet for family, and no workarounds were possible.


Ironically they are still the same old Microsoft in most ways.


I was thinking the same. I spent many of the formative years of my career (late-80's, most of the 90's) HATING microsoft but to see them now and all the relatively "open" stuff that they not only embrace but can't really extinguish is refreshing. I think some of that attitude put Satya in place, but he's run with the ball under the same ethos.


They only open that which they have no advantage in. Operating system is as closed (and as hostile) as ever, so is office and sqlserver. Xbox requires an always on connection.

It is their right, of course. But I don’t buy the “new Microsoft” ethos.


Adopt, adapt, annihilate.


I would be curious to hear about other people who "retired" then went back to work.

I feel like if I reached the FIRE point, I would continue to work, but entirely on my own terms, not as an employee of somebody else. Even at my favorite jobs, where I got to spend the majority of my time doing interesting things, having to deal with red tape, institutional slowness, upper management, etc. was not fun, and I can't imagine it getting any better when you know you are now working only because it's your hobby.


We're in the same boat, but given that this is HN and I don't know if you're actually a billionaire or something, let's assume we both aren't. I'm not, for sure.

This is a rational position for folks like us who a) work to live, and b) probably haven't built anything to the level of the second most popular language in the world.

Now say you're Guido. I don't know him, but I feel like we're on a first name basis. Guido is clearly driven to invent, and has developed one of the major building blocks of modern tech. He's probably set for life financially, but Guido has the urge to innovate. Along comes Microsoft, who basically is paying him to do whatever he wants, however he wants. If you're Guido, you take that.

Now if you're me (and it sounds like probably you too), we don't have those kinds of offers, and I personally don't have that specific drive either. Ideal retirement for me is more time in my yard and also probably spending the cold months starting app projects that I'll never finish. The way life should be!


>I feel like if I reached the FIRE point, I would continue to work, but entirely on my own terms, not as an employee of somebody else.

If you want to do something that impacts or depends on other people, then it will never be solely on your own terms. Plenty of people burn out of OSS because they think otherwise, and then the angry users eat away at them. Likewise, being FIRE doesn't mean you have unlimited resources, so if you want more resources you will need to get them on someone else's terms.

I've known a number of senior people at these companies that basically get to do whatever they want, pitch projects and then have teams whose job is to work out the details. Their name provides so much political clout that the org bends around them rather than the other way around.

Personally I work at startups as a hands on manager so generally haven't had much red tape, institutional slowness, upper management, etc. hassles to deal with. Would continue doing so if I reached FIRE although might cut down my hours.


At his level, there's very little red tape, institutional slowness and management to deal with.


I have exactly that deal at MS (and I know many others who do), and I'm a couple levels away from DE.

I'm later in my career, don't need the job for $$ (they know it), lead a great team, work on a meaningful project that I pitched in an area where I'm an expert, great support from management. The only thing remotely close to "red tape and institutional slowness" is about making sure what we build meets the promises (privacy and compliance, for example) that we've made to customers.


As long as he doesn’t do anything that impacts the actual business. DEs working on Microsoft products absolutely have red tape and institutional slowness.


Not only that but he probably has a small legion of full time developers working on his projects now that no normal FIRE savings would be able to afford.


I think it really depends. I could see continuing working for someone longer in an IC role if no one is breathing down my neck to do things I don't really want to do and, especially, if I could negotiate flexibility in terms of time off, etc. I'm actually pretty close on both those things today and, post-COVID, I could see making adjustments more in that general direction.

ADDED: As Denvercoder9 writes, some companies/groups can be pretty good about being hands-off a senior person who they want and also know doesn't need the job. Others, of course, are not. Certainly many Sun folks did not sit well with Oracle.


> I feel like if I reached the FIRE point, I would continue to work, but entirely on my own terms, not as an employee of somebody else.

I'm right there with you - this is my number one goal for FIRE: basically, to not have to put up with anyone else's idiosyncrasies but my own. The only way I could see that changing is if I see a group of people worth working with, or a project worthwhile enough to work on with other people, or if I feel an obligation to be a mentor to others.


Careful with FIRE though. Don't fully sacrifice living with the goal of doing it later. I realize I'm not old by any stretch, but I'm at 40 next year. I've definitely made some sub-optimal decisions with money, but the experiences were mostly worth it. My retirement goal is to be able to do it by the time my kid is out of college - 14 years from now, unless she's smart and does the 5th year I never did. Anyhow, I digress, but I'm glad we traveled, ate, bought a house, sold a house, moved around a bit, etc. because even as income continues upward (along with some wealth if that's what you want to call our tech equity comp), life still gets harder in non-financial areas.

I'm convinced the worst thing we do to our kids is shove them through school and stick them behind a desk the day they're done. I didn't break into tech until my mid 30s, but for the folks who came out of college doing this, you have a legitimate shot to be set before you turn 40. That said, if I could rewind, I'd happily push my own retirement goals back a few more years in exchange for a financially secure year off in my 20's or early 30's.


Counterpoint: I sacrificed a lot of my 30s to become debt-free. 40 now and pretty close to FIRE (well I don't know about the retiring part, but the FI part is almost there).

It was totally worth it.

In the end, it's a personal decision. I do agree that if you find the majority of weeks to be acutely miserable, then you should honestly ask yourself if you're giving up too much present for an uncertain future.


I did as well, and we've remained debt free for a while. We're now in that cash poor phase since I seem to love pre-IPO companies :) Was it worth it? To some degree, but at the same time there are some big things on the list that we will probably have to defer for a while now that we have a school-aged child, a pet, etc. All I can say is, do some of those things, and don't fixate too badly on sacrificing everything to retire early. Family, friends, relationships matter, and so on.


Oh, trust me, I've definitely drunk deeply of the pleasures of life and plan to continue down that road, just on a smaller scale. Seven course meal with wine pairings at Chez TJ in Mountain View when I did my Google on-site interview was definitely worth it.

I'm also in my early 40's. I've been working nearly 20 years, I'm just looking at another 10 years in this job, and I don't think I can take it. The being stuck behind a desk for a majority of my waking hours during my 20's and 30's definitely resonates, and despite my passion for software, I (relatively) recently re-learned just how good physical exertion feels.

So yeah, I cut back on a number of things to reduce my monthly burn rate, but I'm not suffering by any means. If anything, I'm happier now with less, and seriously considering if I need to continue working, or maybe I can coastFIRE. The possibility has not escaped me that maybe it's just not the right job for me, and perhaps I'd be better off consulting or even trying micropreneurship, but on a more relaxed schedule.

I do wish that when I was laid off in 2001 (my early 20's) I had hiked the PCT instead of trying to start a consulting business. Burned through my savings, and I probably would have just as much money at the end of the PCT through-hike, and definitely been happier.


What would you do in your financially secure year off in your 20's or 30's?


I'd travel. I mean, we travel now, but not way off the beaten path. And we're mostly working around the school calendar (I'm ok being a little looser with that than my wife is). I'd also spend some time learning something that doesn't translate to money - get better at golf, work on my skating, etc. I'd also do enough nothing to be ready to go back to work by the end of it!


I bought a sailboat and lived on it while traveling up and down the east coast of the USA, and Bahamas. Lots of learning and character development.


Not the parent, but I would have primarily traveled. Might have hiked the AT or something like that.

That said, although I didn't take extended time off, I did always max out my vacation time and took month-long trips to places like Nepal.


Triple Crown of Hiking.


I think he is motivated by more than money and retirement.


Fun historical story about Microsoft and Python. I worked there for a bit back in the mid-90s when they acquired a startup whose merchant server software was written in Python 1.12 or something. After the acquisition, it became the engine for running ecommerce on MSN, and eventually became part of the SiteServer suite. The first port to run the software at Microsoft involved wrapping the Python code with a shim that exposed it as COM interfaces. I think it was eventually rewritten in C++ with ATL or something, but that was after I was no longer there.

I still remember the first O'Reilly pink Python book waiting on my desk back then when I joined.


Guido, IMHO the primary reason Python is great is because you have taste. Please insist on keeping that at MS...


Do people really consider Python to have taste? It basically ignored Scheme, ML, and Erlang dialects from its creation onwards.


> Do people really consider Python to have taste?

Yes. Obviously people have different tastes, but many many people find Python to be tastefully designed.

> It basically ignored Scheme, ML, and Erlang dialects from its creation onwards.

Those are... all radically different languages from each other.

This is like arguing that a peanut butter and jelly sandwich would be better if it was more like sushi, chocolate, and pasta.


> Those are... all radically different languages from each other.

Not really, but that wasn't my point. My point was that at the time of Python's creation, all those languages existed and had taste. The fact that they were ignored in Python's development is an example of it not having taste.

The best example of this is Python's scoping rules. When something like Scheme or ML exists and you end up with Python's scoping, it's hard for me to understand where the taste is.


I suspect Python's popularity was greatly helped by ignoring those languages. Your definition of taste isn't everyone's.


I didn’t say it was. I am curious though what others consider to be tasteful about Python.

It objectively has ignored features from those languages, such as immutability, concurrency, sane scoping, functional ways of thinking, pattern matching, better REPL, etc. which have all been making their way into most modern languages. So what’s tasteful about Python? The benefits it has that I see are that it’s easy to download and run for simple scripts and has clean-ish syntax, but that’s about it. For anything larger, it quickly gets in the way.

Popularity is a confusing beast. What has made Python popular, in my opinion, is that it was adopted by scientists, but in my experience that group of people has little taste in software and little desire to figure out what else is out there.


> The benefits it has that I see are that it’s easy to download and run for simple scripts and has clean-ish syntax, but that’s about it.

You say that like that doesn't count for a lot.

The ease of getting Python to do various small bits of automation is the main reason I use it.


It counts for a lot, but it’s not like that’s all people use Python for. There are also other languages with similarly little friction to get going.

And those things aren’t enough to justify building large software out of.


Python's taste is one of ergonomics: expressiveness plus ease of reading, even for a beginner. Python evolved from a project that was testing the UX of programming-language syntax.

My favorite example here is that in a function definition, `def foo(): \n pass`, the `:` isn't needed; a machine can parse it with only the newline. The `:` is a requirement because it made the language easier for humans to read in user tests.
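A quick sketch of that colon point, using the standard `ast` module: the parser rejects a `def` header without the colon even though the newline and indentation already delimit the body.

```python
import ast

# With the colon, this is a valid function definition:
ast.parse("def foo():\n    pass")

# Without it, the same text is a SyntaxError, even though the
# newline + indentation already mark the body unambiguously:
try:
    ast.parse("def foo()\n    pass")
except SyntaxError:
    print("rejected without the colon")
```

Whether the colon is strictly redundant for the machine is the commenter's claim; the sketch only shows that today's grammar requires it.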


I just looked up some scheme code. "Tasteful" ? lol. Python's syntax is perfection.


Great, another person in the camp that hates parentheses at first sight. :) A language, and thus the taste of the language, is not just its syntax. It's the whole package. It just so happens that Scheme and Racket have a minimal syntax to enable more regular semantics and advanced macros.

I would recommend at least checking out Racket and giving it a chance if you've never even heard of Scheme. If you like Python's syntax, then you might be interested in the Rhombus project that the Racket team is embarking on or Pyret, both of which have Python-esque syntax. I'd also recommend F#, which has just as clean, if not cleaner syntax than Python.

It's hard to call Python's syntax perfection when it is missing piping and pattern matching. Python can't even do multiline method chaining without an extraneous "\" character or parentheses. Python also doesn't have a multiline comment syntax.
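To illustrate the chaining point with a small sketch (the string and names are illustrative): both continuation styles below are valid, while a bare newline in the middle of the chain is a syntax error.

```python
text = "  Hello, World  "

# Backslash continuation (the "extraneous \" the comment mentions):
a = text \
    .strip() \
    .lower() \
    .replace(",", "")

# Wrapping parentheses, generally the preferred style:
b = (
    text
    .strip()
    .lower()
    .replace(",", "")
)

assert a == b == "hello world"
```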


Even though I was proficient with Scheme, the parentheses were my least favorite part. I got tired of going crosseyed lining things up.

Doesn’t mean I dislike functional languages either. I just specifically found the parentheses more annoying than clarifying.


I can understand a dislike of parentheses in the sense of it being a personal preference based on actual experience. I will admit that I was wary of them at first myself, but once I got into Lisps and particularly Schemes, I found myself liking the spartan nature of them. In a world full of complexity, it is nice to just get on with life, so to speak, when it comes to syntax, and I enjoy the regularity. F# is my favorite language, but I do hold Scheme dialects close.


FWIW, Python was originally written as a shell language for a distributed OS called Amoeba.

https://en.wikipedia.org/wiki/Amoeba_distributed_operating_s...

I think it would have seemed out-of-scope to include too many sophisticated features. Also GvR was young, he might not have had [enough] exposure to all those languages, eh?


ML is a strongly typed language, so it's hard to compare it directly.

Scheme is a better comparison, but its lexical scoping requires all local variables to be explicitly declared. You can argue that it's a feature, but that's rather subjective - and if somebody wants the simplicity of assignment-is-local-declaration, how exactly would you do it differently than Python?


I think it's all about the syntax and less about the semantics that still makes it a top 5 language.

Makes you think about how important the syntax for a language is.


The fact that Python code is readable like pseudocode is most definitely why I like it best, over esoteric languages like Perl or Erlang with their non-intuitive use of various characters.

To a seasoned programmer it may look obvious, but Python is still by far the best language for newcomers, and generally people tend to stick much longer to the language they were first taught.


I am "seasoned", and I find Python not obvious at all. It's a confusing language with little gotchas everywhere. I find writing code in something like Racket or F# to be much easier and more consistent than Python.


Python 2 was pretty tasteful, simple, usable, the exception that proved the rule. I think Python has evolved since.


the user said the person who created Python had good taste, not that the language they created has it (somehow)


> the user said the person who created Python had good taste, not that the language they created has it (somehow)

I think that saying Guido has taste implies that Python is tasteful.


Same for Clojure.


It's sad, dealing directly with extreme over-engineered enterprise Java libraries to compensate for Clojure's lack of ecosystem erodes all the attractiveness of the language for doing real-world or hobby applications.

OTOH, there is not much benefit today in Clojure over other languages for database-backed apps; most of what is considered good Clojure code is more about the methodology than the language per se. The real benefit IMO is when you pair it with Datomic, avoiding the impedance mismatch in data structures, but Datomic is another can of worms.


MS is soon going to show him who the boss is ...


MS is soon going to show him who the boss is ...

They let Anders Hejlsberg have free rein over his languages, encouraged it in fact.


Well, time will tell ...


Well, time will tell ...

He joined them in 1996, and has done J++, C# and now TypeScript. They seem quite content for him to just keep making new languages which they throw all their marketing muscle behind.


Someone on the internet: MS management will own him, like they do others, he hath been ingested into their belly

Me, some rando MS employee who works on languages: yeah, I can actually do pretty much anything that I feel is right, at half the level of Guido (if even that) and a miniscule amount of clout


Chiming in as another rando MS employee who works on languages, while I think "ms will own him" is thoroughly overly dramatic, I made this account to express honest surprise that you feel like you have leeway with little clout.

At least in my group, our ability to do what we want directly corresponded to how connected/influential our sponsor was, and absent that, I have almost no ability to do what I feel is right, even if I have data backing it up, if the powers that be are in opposition.

I don't want to paint this as some universal ms-doing-terrible-things statement, just that I don't personally feel like I have much freedom to assert philosophy or ideology, and I've seen well respected individuals in engineering even up to and above principal/partner die on comparatively minor hills before which makes me think it's not just me.

And while I'm being dangerously candid, this to me seems like a prestige hire, much like Carmack at Facebook, and I chuckle at the ado being given in both directions in this thread. I will be pleasantly surprised if this motivates more Python support, but expect neither good nor bad.


It will? Then make a concrete prediction - what should we expect to see to confirm your claim? What will “Microsoft showing GvR who’s the boss” look like?


Eh, I don't think so. If they ever do, he can just move on. I'm sure there's no shortage of big (or small) employers who would love to have him.


Yes, but who can pay him as much as MS can?


He worked at Google for seven years just after their IPO, and then Pre-IPO at Dropbox. It's unlikely that he's there for the paycheck.


Any of the other FAANGs or unicorns. I doubt that the paycheck is his primary motivation, but I could be wrong.


Google uses a lot of Python and I'm sure they'd be happy to have him, for instance


He already worked at Google from 2005 to 2012, although they might be interested in having him back.


With Anders Hejlsberg also at Microsoft, maybe they do something like type-safe Python similar to TypeScript?

Anyway, congrats to both MS and Guido! This will be very interesting for Python.


> something like type-safe Python similar to TypeScript

That already exists, and van Rossum worked on it: http://mypy-lang.org/


Also Pyre [0] by Facebook and Pytype [1] by Google. More importantly, the core part (type annotations) allowing these libraries to exist has been added to core python for years now and keeps on evolving, so it is an ongoing effort.

EDIT: As mentioned below, also added Pyright [2] by Microsoft

[0] https://github.com/facebook/pyre-check

[1] https://github.com/google/pytype

[2] https://github.com/microsoft/pyright


You forgot the one made by Microsoft, Pyright (and PyLance).


Having tried a good number of these various type checkers, it seems like Pyright is the only one that properly supports all of the accepted PEP changes to typing.


Isn't that just F#?


In the sense that indentation has syntactical meaning, yes. Otherwise, no no no.


I don't understand. You can take Python code and basically write the equivalent in F#, ignoring many of F#'s additional features. It would look similar (probably even be cleaner) and would be type-safe.


F# is not even close to Python; they are very distinct.


Of course, one is a sanely designed language. :)

Joking aside, my question centered around that if one wants a type-safe Python, why wouldn't you just want something similar to F#?


This is the same as asking for typed JS and getting Rust recommended just because the syntax is similar.

They are totally different: the core programming paradigm changes completely, and most of the selling points of JS are not in Rust (meta-programming, garbage collection, community).


You can add typing to an existing codebase, saving you vast amounts of work compared to rewriting in a different language.


Can that really be done for Python and the vast amount of libraries? I don't get the sense that one could simply slap on a static type system on Python and get libraries typed for free.

That is unless we're talking about a gradual typing system for Python, where new Python code could be typed and integrated into untyped codebases, similar to Racket and Typed Racket.


The existing type system (see e.g. http://mypy-lang.org/) is gradual. You can mix typed and untyped code, at the cost of some safety.

You can also write type stubs for libraries that don't provide type information.
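A small sketch of what that gradual mixing looks like in practice (function names are illustrative): a checker such as mypy or pyright verifies the annotated parts and treats unannotated values as `Any`.

```python
# Fully annotated: the checker verifies calls against this signature.
def greet(name: str, times: int = 1) -> str:
    return ", ".join([f"hello {name}"] * times)

# Unannotated legacy code: the checker treats `value` as Any, so
# this still type-checks even though nothing guarantees it's a str.
def legacy_caller(value):
    return greet(value, 2)

print(legacy_caller("bob"))  # hello bob, hello bob
```

Stub files (`.pyi`) extend the same idea to third-party libraries: the annotations live alongside, rather than inside, the untyped code.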


Agree: F# as a better Python

https://youtu.be/_QnbV6CAWXc


Thanks. :) I believe you're the only one who does here.


You always think that if one of these people joins your company it won't matter to you. You won't see them in the halls and they probably won't do "normal" work.

My wife used to work at Dropbox when Guido worked there. Apparently he was quite active on the company-wide mailing lists, even talking about mundane things like the food or parking.

So MSFT employees you may get a chance to chat with Guido!


Top of my wish list would be a Python <-> VS Code experience at the TypeScript level of quality.

Interesting that mypy is a Dropbox project (GvR’s previous gig), whereas Microsoft has a competing Python type checker, pyright [0]. Anyone have experience with it?

[0] https://github.com/microsoft/pyright


It is hard to tell how Pyright and Mypy compare head to head. I've been using Pyright for quite a long time and have seen the huge leaps forward it has made. The experience feels a lot like how I remember PyCharm from a few years ago. I would really miss it if it were gone.


Why not use Pycharm?


This is an unpopular opinion around here, and I use pycharm almost every day, but I think the future is vscode.

Reason is MS doesn’t need to make money on the tool, because it has a halo of services it can sell in and offer perfect integrations for.

In addition, MS has GitHub and thus a privileged ability to learn from new public code as it is pushed.

AI-based code completion offered by kite and tabnine is totally being slept on right now, and MS has the resources to offer a very refined and constantly improving take on this massive dev services opportunity.

Jetbrains has said it has a long term project focused on this but as of now there is nothing.

I am not suggesting abandoning Pycharm any time soon but it is going to take a bigger company than jetbrains to keep pycharm in the race against vscode.


If the future is made up of tools so complicated that only giant corporations are able to provide them, I hope we'll also see a strong counter-movement of simple tools for independent developers.


I think the future is made up of powerful tools so simple that only great resources can build them. Simple tools that hide the magic are harder to build.

The fix is in because vscode itself will be forkable at any time. It will just be that _services_ from microsoft and other companies will be designed for the tool.

For example, this AI code completion behavior. I can say right now that if tabnine had a version working in Pycharm that did not break autoimports, I would already be paying a monthly subscription for it. Jetbrains should be offering something similar ASAP.

Another example would be PaaS for frameworks using Azure. Start a project, choose a domain name, instant staging and production deployments.

Anyone could build these kinds of plugins to sell in services. It will really be about who does it best. Though, I think Microsoft is specifically positioning itself for a long term play here and I think much of it centers around vscode.


I was thinking simple in the sense of less hidden magic.

Deployment can easily be done with a script from the terminal, which doesn't need special IDE integration. "Starting a project" doesn't need any tools at all in simple cases - one could try to keep cases simple.

Python usually has less need for auto-imports than Java does because APIs are simpler, names shorter and modules more coarse-grained.

You might not need an IDE and an AI with simple tools. Of course, some problems require or benefit from complicated magic tools, but perhaps not all problems do.


I see. Some folks may want simple, no magic stuff. My last product I had a one line terminal deploy for staging and one for production.

But setting that up securely was, in and of itself, a lot of work.

On my new project I deploy by pushing to staging or master. These deploys are blocked if the commit under inspection does not pass tests.

That all happens because of docker containers and GitHub actions and secrets. There is an enormous amount of magic behind this stuff but I would not go back to provisioning and maintaining my own boxes.

I do think there is a lot of benefit to gain from having services built directly into the IDE.

For small projects, a text editor will do. Though linting and code formatting helps beginners and pros alike.


> MS has GitHub and thus privledged ability to learn from new public code as it is pushed.

It only takes a moderate amount of resources to collect this for yourself.


I meant this to be like the “firehose” access to Twitter: to get and incorporate new patterns as they develop.

But also to crawl and process the entire dataset including issues constantly.

It’s conceivable vscode could highlight and warn about known issues in packages as new code is developed.


Well, one good reason is that unlike VS Code, Pycharm is not open source.



Thanks for sharing. This is really cool. And I'm slightly surprised that Pycharm uses the exact same codebase as IntelliJ, just with a different run configuration [0]. It makes perfect sense of course, but I was pleasantly surprised.

[0] `To run PyCharm Community Edition, please use the provided run configuration "PyCharm Community Edition"`


How often have you ever needed the source code for your text editor?


Microsoft is betting on pylance for VSCode instead of pyright, though, and pylance is not open-source [0].

I wonder if Guido will be ok with that. For me pylance looks like the biggest evidence that the embrace-extend-extinguish mindset is still alive at Microsoft, unfortunately.

[0] https://github.com/microsoft/pylance-release


> Microsoft is betting on pylance for VSCode instead of pyright

Not really "instead of", since pyright is the typing engine for pylance.


By far the biggest challenge for the Python ecosystem at the moment is how difficult it is for new users to get started, and how difficult it is for non-programmers to install tools written in Python.

I'd be thrilled if Guido and Microsoft could help make this better!


Seriously? Python is the easiest language to get started with imo, with absolute minimum requirements to get a script going.

If non-programmers don't even want to run a few commands in a terminal, like pip, why do they even want to start with Python? The whole mentality is wrong.


I coach newcomers through setup once every few weeks and it is a nightmare.

My aim is to get them to a state where they have a recent Python 3 on their path and they can create virtual environments for their projects which work in a predictable fashion.

Inevitably they have already tried several different ways of installing Python and their system looks like this XKCD: https://xkcd.com/1987/


I think maybe the bar is low for new to development experience across all fronts.


Take a look at Thonny. It's specifically designed for new programmers learning to code, and it bundles Python, so it's as close to one-click as it gets.

https://thonny.org/

As for installing apps written in Python - that's really down to the developer. A desktop app would normally just bundle Python for its own local use, similar to what Java apps do, and users are none the wiser - to them, it's just another installer. Some examples of such software on Windows include Calibre and CHIRP.


"A desktop app would normally just bundle Python for its own local use"

I've been working with Python for nearly twenty years and I find the idea of bundling Python with my application daunting. I want it to be easy.


In this day and age, things like Google Colab, REPL.it, etc. makes it a breeze to start programming. You're basically two clicks away from coding, without any dependency hassles or other updates.

But I agree, it's a slog to set up everything on a computer, if you wish to install Python from scratch, along with some IDE.


Huh. A throwaway account here on HN said that he was working at Microsoft 8 days ago, and I assumed they were wrong and thinking of Dropbox: https://news.ycombinator.com/item?id=24983394

Sorry, throwaway account! Apparently I was the one who was wrong!


GUID-o. Hehe. I'll show myself out.


Here comes P#, I'm seeing myself out as well.


> Here comes P#, I'm seeing myself out as well.

I tried to convince Guido to do a P# April Fool's (I think in 2005 or 2006), but he disliked the idea so much I didn't do it myself either.


P# was already a project at microsoft. https://www.microsoft.com/en-us/research/project/safe-asynch...

It has transformed since into: https://microsoft.github.io/coyote/


You joke, but IronPython was a thing.


Hey, that's what I tweeted to him!


Python for Excel data manipulation would be great.


First-class integration in Excel would be a game changer. Holy shit. Though adding a multiline editor with indentation in Excel would be really nice as well. We've got to stop pretending Excel functions don't constitute a programming language.


Bring in versioning/snapshotting with some git integration while you're at it...

The finance folks I work with are fucking wizards. I'm convinced many of them would have made great engineers given what they can accomplish building applications in what might be the world's worst IDE.


Well... most quants are recruited straight out of engineering schools, so you are probably right. About the IDE: honestly, once you think about it, it actually achieves many things at once. I am not sure what they could change to make it better.


Some of the most impressive, most complex Excel apps I've seen were 1000% not built from quants out of engineering schools. That's the beauty of Excel - but _managing_ these apps is the nightmare of my specific profession :)

I'd say two things would make it better:

- Have a "code" version of the cell editor that has formatting/indentation + basic linting. [Sometimes 75% of the challenge is reading the code in linear fashion and figuring out what it does]

- Integrate git-like versioning and some sharing features with Office365. Basically, make it so that you can roll forward or roll back on both the code and the contents, and make it so that you are _not_ emailing files around to share them.


Interestingly enough, Libre Office (and OpenOffice before that) allow you to use Python internally for Macros through PyUNO, which is a pain to use given the lack of documentation, but amazingly powerful.

It was one of my first usages of Python, and used that to create quite complex Spreadsheets


It's been at the top of their public feedback request page for ages: https://excel.uservoice.com/forums/304921-excel-for-windows-...

They did officially respond in 2018 and put out a survey for thoughts, don't think they've followed up since but they definitely know demand is there.


Office suite automation with Python would be great across the board. The external Python libraries that do this are pretty clunky, and VB scripting is a hot mess.



Office now allows you to script it using TypeScript, so you don't have to use VB any more.


I'm really curious what you are talking about?

I could only find "Office Scripts", which doesn't work in the normal desktop apps (only the web version).

Edit: I just found out about "Script Lab". Is that what you are talking about? https://appsource.microsoft.com/en-us/product/office/WA10438...


Oh that’s cool! Haven’t had to do anything with it in a few years, but excited to check out how this works.


Agreed. VBA makes my fingers and eyes ooze gangrenous pus.


Being retired is more boring than working for a giant bureaucratic corporation?

He must have no imagination. When I had a day job at a big company, I would start working on open source in the afternoon as soon as I got home from work then I would stay up late working on it. The next day I would be tired and depressed that I would have to go to the office to work on some useless product and could not wait to go back home to continue the real work.


Imagine if your day job at a big company (with the attendant salary and benefits) were to write open source?

Speaking of Python in particular, there are people at Microsoft who get paid for, among other things, working on Python itself.


Retirement has always sounded boring anyways. Congrats to Guido, I wonder what he will be working on specifically. Python could be improved on Windows, even simple cases don't work quite as expected always. I would expect Python is used decently widely at MS anyways. Maybe Python being first class on .NET? Or some Python specific things for Azure.


There already is IronPython, but it lacks community and stalled after the 2.7 release.

All CPython alternatives stall due to the widely used C API (required for NumPy, TensorFlow, etc). This API is very CPython-specific.

There is some work on both sides that might overcome the issue, though: CPython is trying to decouple the API from implementation details, while .NET is working on a better C interop story (.NET 5 just introduced the ability to export managed functions as C functions).


Congratulations to Microsoft on successfully hiring a developer with 30 years of Python experience. Living the dream right there.


He should make another language, surely that's a good retirement project? But he must call it "monte".


Well, Anders Hejlsberg did exactly that, from Pascal to C# (we gracefully ignore Delphi and J++ in between). And then after C# he thought about types in JavaScript. So I do not know Guido, but with language designers you never know.


I’d really like to see more emphasis on types. Working in C# lately made me realize once again how shit it is writing non-trivial applications without strong type checking.


Am I the only person here that finds this somewhat terrifying?


The internet has a lot of people - you're probably not the only one scared.

Care to explain what's worrying you though?


EEE, github, windows 10, there's a lot of reasons to suspect that when Microsoft becomes directly involved in something it starts to become less useful to the user.


They hired Guido not the PSF.

See: https://www.python.org/psf/sponsorship/sponsors/


Hopefully Microsoft will make Python a first class language internally. There's still far too much pressure to use C# for everything.


Anyone remember Boo? [1] It was a Python-like language that ran on the CLR/DLR VM of .NET. It was used early on in Unity, back when Unity supported C#, JavaScript and Boo, and it was the language in which Unity's coroutines were first implemented, using Python-style generators under the hood.

There was also IronPython added when the DLR came out and is still updated to this day, latest works in .NET Core. [2]

It would be fun to see more Python focus in Microsoft tools and on Azure, or even another Boo/IronPython-like CLR/DLR virtual machine language, or more focus on IronPython itself as well as on CPython. Having Guido at Microsoft is awesome.

[1] https://en.wikipedia.org/wiki/Boo_(programming_language)

[2] https://en.wikipedia.org/wiki/IronPython


The problem with IronPython, and really most other alternative implementations of Python, is their inability to reuse native code packages, because CPython extension API is so low-level and tied to the VM's implementation details (and that cannot be easily changed now for backwards compatibility reasons). Such packages are fairly common, and if you take them all out, it significantly reduces the value of Python as an ecosystem.


If you're using IronPython then you're probably just trying to embed a familiar scripting language into something, not trying to adopt the Python ecosystem.

Besides, while you lose out on Python's native ecosystem you gain access to all of .net's.


I'm not disputing that IronPython can be useful (although one also has to consider https://github.com/pythonnet/pythonnet, which wraps CPython into an IronPython-like projection). But if it can't share the library ecosystem, it might as well be a different language for all practical purposes. And it turned out that there aren't enough people who were interested in a Python/.NET combo for its own sake, so it remained niche. Any new implementation of Python on top of .NET VM would likely get similar reception for all the same reasons.


Why do I get a bad feeling about Microsoft’s open source effort? And it just keeps getting stronger over the years.


If it is any indication, I have seen a spike in activity in IronPython3 repo [0], which previously looked abandoned.

[0] https://github.com/IronLanguages/ironpython3


Yay, that's exciting. IronPython is necessary for scripting into my .Net/C#-based Revit but it's been stuck on IP2 forever.


Anyone using Pylance? It's a new Python language server implementation from Microsoft, and it's great. With 3.8's type hinting, you basically get a fully typed language, similar to what TypeScript did for JavaScript.
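To give a sense of what that looks like in practice, here's a small sketch (the function and data are invented for illustration): with ordinary type hints, a checker like Pylance can flag a possibly-None access before you run anything.

```python
from typing import Optional

def find_user(name: str) -> Optional[dict]:
    """Return a user record, or None if the name is unknown."""
    users = {"guido": {"role": "Distinguished Engineer"}}
    return users.get(name)

user = find_user("guido")
# A checker such as Pylance flags `user["role"]` without the guard,
# because `user` is Optional[dict] and may be None; narrowing fixes it:
if user is not None:
    print(user["role"])
```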


I’m excited to see what kind of teams they get going that focus on Python tools!


Start with rewriting the Azure Python SDK.


Honest response, if somewhat off-topic, since I can't pass this up: How could it be improved/what's been most painful for you?

Full disclosure, I'm one of the <many> maintainers working on a subset of the Azure Python SDK. Currently, there actually _is_ a large-scale rewrite in progress to bring the various SDKs up to a consistent level of quality and python standards, since it's no secret the original batch of SDKs grew rather organically. As such, this is a VERY APT time to hear this sort of comment. (And yes, I'm taking it totally deadpan even if it wasn't necessarily meant that way :P)

Do feel encouraged to file issues on the azure-sdk-for-python github as well as things come to mind; there are more formal triage processes there than "I happened to read this over lunch" :)


Hi, I didn't expect such an offer, and the original comment was made in a rantier tone than it should have been, but let's try.

Disclaimer: some (or maybe all?) of the things I'm mentioning here might be caused by the API itself, rather than the SDK. The order is in which it comes to mind, rather than importance.

1. Some kind of strongly typed errors. I've turned the SDK upside down, and the only error that seems to pop up 90% of the time is CloudError with CloudError data. So our SaaS application (which utilizes Azure IaaS heavily) has a bunch of regexes that parse the messages contained in CloudError and throw something from our domain. Wrapping SDK errors by the implementing clients is a smart thing to do, but not this way :). Some messages change from time to time, so we try to make our regexes more tolerant, but you get the picture. In general the error handling is very strange, a lot of low-level errors sometimes flowing into high-level calls.

2. Documentation with more examples. Sometimes the easiest thing to do is look at the API or CLI documentation and try to replicate it with the SDK; parameters are named differently, etc.

3. For some reason the Storage API/SDK keeps changing and it's like it's from another world than compute and network, which look pretty similar. The storage portion is... wild.

4. Improved working with data disks. Updating the VM with an array of disks has shown bad performance (and Azure support helped us and said that it can't be improved because it's a cross-RM event) but also poor reliability. A separate set of methods to attach/detach disks to VMs would be great.

5. Because we heavily rely on asynchronous operations, what we usually do is make a raw HTTP request for an operation (like VM deployment), serialize it and store it, then after some time deserialize it and check if the operation is done. We do the same with hundreds of requests at the same time, being able to process a lot of cloud operations with not much resources on our side :) . It works, but I found the raw request and serialization is a bit clunky. We're also stuck with a couple of older SDK libraries because we've noticed the raw responses changed in the newer ones, so now we need efforts to find all of the differences and consolidate the serialization. For example, the headers have had multiple urls to check the deployment status, sometimes it's `async-url`, sometimes `location-url` etc. This probably isn't something the SDK is to blame, and this is a niche case, but serializable async responses would be nice :D .

I have to commend the effort to break up the original monolithic SDK into independent libraries. Versioning and dependency management became easier than with the one monolithic "4.0.0" schema.

Maybe some of the things are out of scope and our use-case is disproportionately affected by some stuff, it's just my two cents. I'm less satisfied with Azure as a service, with its unreliability, sub-par support, idiosyncrasies and poor "hypervisor" performance, than the SDK, but it's the SDK that gets all the blame because it's the one we're interfacing with.
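The regex-on-CloudError pattern described in point 1 might look roughly like this sketch; the exception names and patterns here are invented for illustration, not taken from any real codebase:

```python
import re

# Domain exceptions we map raw SDK errors onto (hypothetical names).
class QuotaExceededError(Exception): pass
class DiskAttachError(Exception): pass

# Patterns kept deliberately tolerant, since the upstream
# messages change from time to time.
_ERROR_PATTERNS = [
    (re.compile(r"quota.*exceeded", re.I), QuotaExceededError),
    (re.compile(r"(attach|detach).*disk", re.I), DiskAttachError),
]

def translate_cloud_error(message: str) -> Exception:
    """Map a raw CloudError message onto a domain exception."""
    for pattern, exc_type in _ERROR_PATTERNS:
        if pattern.search(message):
            return exc_type(message)
    return RuntimeError(message)  # unclassified errors fall through
```

Strongly typed exceptions from the SDK itself would make this whole translation layer unnecessary, which is exactly the point being made.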


Hah, feel no regret about the potentially ranty tone, frankly, I often prefer to get feedback from people who are willing/able to rant, since they're both willing to be candid AND clearly have some interesting thoughts built up.

I'm honestly appreciative that you'd give this long a response, I've unironically been sharing this with peers internally for what an interesting spread of feedback it gives. I have to give a similar disclaimer of "no promises" BUT...

1. 100% agreed with you. We're currently working through this discussion for ServiceBus, actually, it's VERY apt you mention this now, as we've been working with the architects and other-language SDK owners to find proper semantic unification between Python's inclination for more fine-grained strongly typed errors, vs. e.g. dotnet's inclination to have unified exception types and "code"/"reason" fields. Can only speak for the data plane SDKs here (vs. -mgmt, that's a different team) but our architects have within the last 2 days expressed their alignment with the pattern you're requesting. (We explicitly as a guideline want users to not have to rely on any ad-hoc string parsing to distinguish exceptions.) We've also been attempting (to lead into #2) to produce both better samples EXPLICITLY to show common failure modes and "best practice defensive coding", but also conceptual guidelines (e.g. here [0]).

2. Preaching to the choir. Samples are a First Class Entity for our Track2 APIs (complete with validation, smoke tests, etc), used not only as inlined samples in docstrings and refdocs, but as long-form examples of E2E scenarios as you suggest. (and for more esoteric subject matter that may not be covered in primary long-form docs)

3. I'm not 100% sure I'm thinking of the right component when you say Storage, do you mean e.g. `azure-storage-blob`? #4 implies you may mean more of the block storage/VM interconnect logic, and unfortunately I can't speak with deep familiarity surrounding that, other than to note that the ARM template parameters for configuring this hasn't always been "great" (e.g. I think I recall there being a need to format a disk as a separate step? not sure if this is the sort of thing you're referring to, half thinking out loud, but that's more distant from my area) Would be curious to know specifically which transition was painful though/what kept changing, since it is a "core awareness" going forward that migration pain is a Major Friction Point for users ("well duh" says every dev ever) so I'm just curious to see if your specific callout is something we'd have been aware of/can impact.

4. See the above re core surface area of "working with disk"; and to your later point some of this may be more service-side than SDK, but your mention about attach/detach disks is salient and something I can perhaps throw a shout at someone about.

5. To make sure I've understood, you're serializing like, metadata/a record of the long-running operation and doing it yourself? The track 2 APIs use something called a Long Running Operation poller internally to facilitate this sort of operation; now this may not solve your scenario depending on how your control flow gets passed off or not/your need for serialization, so this may not actually change anything tractably for you, but mentioning on the side in case. Curious re: what you mean by "is a bit clunky" though more precisely as well. And in terms of raw response consistency, yeah, that's not surprising, that's not one of the things I think we pin in backcompat (or even feasibly could, to your point, BUT we may be able to take more into account if there's any way to offer a continuous experience if this changes and folks were relying on it. Regardless it's something for me to keep in mind when doing design discussions with the service teams)

Glad the library fragmentation is well received. I'll candidly admit I was worried about that (was a user, not a maintainer when it happened, and did the normal grumpy-engineer thing of "they moved my cheese") but it does seem to have worked out nicely and given some good modularity.

Finally; Two cents, this was ten cents! I feel like I owe you big time for having given such a well-thought-out response. Thank you; sincerely, you didn't have to do this and have done us a big favor.

Absolutely get where you're coming from re: SDK being the public face, and that's what leads me to be eager to try and improve what we can. (And selfishly gotta make sure I can feel proud of the code that has my name attached to it :P ) I cannot tell you how much Reliability resonates with me. If nothing else, I can assure you this dev is in your corner; hardening, reliability, stability are the mantras within my purview. (signed, an-artist-formerly-known-as-SDET.)

[0] https://github.com/Azure/azure-sdk-for-python/tree/master/sd...


Hi, nice conversation going here :) . Let me just reiterate on a few points.

3. Correct, I'm referring to azure-storage-blob and azure-mgmt-storage. I'm not concerned so much with the unmanaged disks, which utilize the storage accounts (thank god, we're over that :D ), more with working with regular blobs. From the initialization of the BlobClient to operations that are sometimes weird in their input parameters, it's not up to par with, let's say, azure-mgmt-compute.

4. This point is not related to point 3. I'm completely referring to managed data disks, which are first class* entity (like NIC, for example, not a Page Blob any more). Attaching them to an existing VM is a nightmare, and because of some features we do that often. Also they lack basic things like per-disk disaster recovery (but this is way out of the scope of the SDK).

5. Correct, we're doing it ourselves. The LROPoller is also there in other parts of the SDK, and we use it too (if it's the same poller), but I think we were not able to utilize it here. By clunky, I meant the response itself sometimes has headers with redundant values; we're not sure which one to use, so we just go by experience (if there are two similar ones), etc. I haven't looked at the docs recently, but the "raw" requests are not documented that well.

* - I did say that managed disks are a first class entity, but we found out the hard way that the managed disks and managed snapshots are actually just in some storage accounts on the backplane, and when something goes South, storage-level errors leak out to our logic :) . Also throttling, thresholds and quotas apply as if it were just another storage account, but even worse, we don't control how the disks are distributed between those backplane storage accounts, so we get funky issues from time to time. We solve them by being extra careful and conservative with some operations, but heh, talking about leaky abstractions. This is not an SDK thing, just wanted to point that out for someone reading.


Yeah, agreed that the raw requests were/are somewhat clunky/not well documented (they were added at one point as an escape hatch for when things didn't work as expected/the service did unexpected things for some requests). I would love to hear about what weird input parameters you are seeing in the azure-storage-blob package, however.

And +100 on not having to manage quotas/storage accounts/figure out how many accounts I need to distribute disks across in order to avoid throttling. There are many things I'd rather do :).


Man, there's so much stuff I want to do but don't have time for. I can't imagine retirement being so dull that I would go back to corporate work.


I really dislike Windows but Microsoft is doing everything it can lately to please developers. It's kind of working, I have to say.


Good news for you: .NET is really, truly free of Windows now (minus a UI answer).


I know. I'm more happy with the Linux subsystem, a proper terminal, etc. That kind of move.


Why do I expect a type safe Python? I'm sure the product I'm imagining will also finally give a Python inspired/compatible language a decent module system, but I'm really not a big fan of the direction Microsoft is carving out for software development at large. I'll admit this post is presumptuous.


I wonder about such successful people who are dedicated to a single project - don't they have FOMO?


Like Mark Zuckerberg?

And Bob Dylan?

Van Rossum is famous for the Python language and language-related stuff for Python, but he did stuff in his career before inventing it and has done stuff since (for instance, a lot of his work at Google was on App Engine).


FOMO on what? Python makes more of an impact on the world than most people would even dream of for their projects. The products that use the language cover the entire spectrum of digital technology...


Yes, it is a very popular glue/scripting language (to oversimplify a bit :)), but that's not the point; I mean that Guido himself is not involved in all of that spectrum directly (AFAIK).


I'd say it's really exciting to stay focused. Plus Python has been the hot thing for quite a while, so he is not missing out on anything IMO.


Classic Google lineage: "From 2005 to December 2012, he worked at Google, where he spent half of his time developing the Python language." Now he'll spend the rest of his life at Microsoft, where he'll give away the Google investment.


And will never make as much money as Google or Microsoft out of his own work...


If this includes improving dependency and runtime version management, I'm pumped. It's been a couple of years since I've used python, but those two things seemed like such warts compared to the rest of the experience.


That's all well and good, but it drives me crazy Microsoft refused and refuses to invest in F#, which could have very well been the Python of today if they had paid any attention to it.


As another F# enthusiast, I share the sentiment, but practically speaking I don't think F# would have been in any position to be the Python of today. It might have had a greater chance if .Net Core existed 15 years ago, but being tied to Windows was a big show-stopper regardless.


I agree that my comment takes into account too much hindsight, but certainly this is part of the pain. If Microsoft had transitioned .NET to be cross-platform much sooner or from the beginning, then they would have been better off. I was recently surprised to learn that Mono basically existed from the beginning of .NET, as I had no idea it was so old, so it's clear that people wanted something like .NET Core from the beginning.

It's confusing to me that Microsoft embarked on their Iron <x> project, or whatever it was called, to stress test the CLR, got F# out of the project, and then decided to sit on it. They have incorporated F# features into C#, but it is confusing to me that they saw the rising popularity of Python and decided to let it happen unchecked because C# will never be F# or a language that appeals to Python programmers.


It worked on Mono


> F#, which could have very well been the Python of today

There's a phrase you don't see every day.


F# is a community with an independent mindset. That is not a good investment target. Also, there is a factor of 100:1 between C# and F# developers; relative to that, the investment is maybe even in favor of F#.


> which could have very well been the Python of today if they had paid any attention to it.

Wow, really? I'm a big fan of Clojure and F#, but IDK, I think there is a bias against functional languages since, IMO, they are harder to get started with.


You can write F# just as you would with Python. This would be just using OOP or just a collection of functions operating on simple struct- or dictionary-like types, using for loops, using list comprehensions, using if-else, etc. It will look very similar and be just as easy or easier to understand.


F# is statically typed, as I understand it. Python succeeded because it isn't. A language's type system design mutates more slowly than the domain-specific needs of whatever is currently popular. That slowness translates into difficulty grabbing emerging domains, and the existing users oppose untyped structures as hacks, causing friction. For example, most deep learning libraries, even in typed languages, are only semi-typed: matrix sizes aren't checked in the type system but at run time.
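To illustrate the matrix-size point in Python terms (a made-up sketch, not any particular library's API): the annotation below says nothing about dimensions, so a shape mismatch passes any static checker and only surfaces at run time.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix by a vector; shapes are only checked at run time."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("shape mismatch")
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Type-checks fine and works: a 2x2 matrix times a length-2 vector.
print(matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # [3.0, 7.0]

# Also type-checks fine, but fails when executed: wrong vector length.
# matvec([[1.0, 2.0]], [1.0, 1.0, 1.0])  # raises ValueError
```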


Ironically I had just stumbled across his LinkedIn profile for some reason last night and it was "retired, recruiters please get the memo" or something of that sort.


I don't know about anyone else, but I'm still hopeful for Python integration with Excel. That would be a real gamechanger, depending on how tight the integration is.



I'm quietly hoping they'll ask him to work on a typescript-style of project that adds static typing as a layer on top of Python :)


+1 for easy compiled binary in single file for all platforms. Bonus: Ensure support for CEF / PyWebView / WebGL on all platforms with secure IPC style bridge to the Python logic. Ultimate Combo: Smooth animated visual UI with secure connection to fun intuitive Python data services layer, all in one convenient compiled binary. ( Double bonus: Cython style transpilation step to protect source from tampering )


Is this a sign of the path for Python to finally become part of a typical web and consumer toolchain?


> Is this a sign of the path for Python to finally become part of a typical web and consumer toolchain?

Maybe? A likely first step would probably be something like an Electron fork/distro with Python built in, along the lines of the abandoned PythonWebkit: https://www.gnu.org/software/pythonwebkit/


Maybe Python will get nice tooling.


I wonder how working at Microsoft will turn out different than working at Google did?


They really try to own everything Python related. I fear in 10 years the only viable way of using Python is on a Windows PC with Visual Studio or some other Microsoft owned IDE. You have to be logged in with your Microsoft account and everything you type will be processed and sent to a Microsoft data center.


Yeah, that's a valid fear since that's how it works today with C#, TypeScript, etc.


Is he being hired as a Distinguished Fellow on par with Miguel de Icaza or Anders?


Distinguished Engineer.


Do you think Microsoft can convince him to drop the whitespace-as-syntax?


Why? That's my most favorite part of Python. It makes the code so much more readable.


In the age of autoformatters it makes moving code around more painful.

In C++ I can write some janky looking code and autoformat it into respectability. I can stop caring about whitespace entirely (while still producing readable results).

In Python I need to write correctly indented code, because the autoformatter can't figure out whether a lack of indentation is meaningful or not.
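A small illustration (the function names are invented): both versions below are syntactically valid Python, so no formatter can tell which one the author meant.

```python
def sum_with_running_totals(items):
    total = 0
    for x in items:
        total += x
        print(total)   # indented under the loop: runs every iteration
    return total

def sum_then_print(items):
    total = 0
    for x in items:
        total += x
    print(total)       # dedented: runs once, after the loop
    return total
```

A C++ formatter can recover the intended structure from the braces no matter how mangled the indentation is; here the indentation is the only record of intent.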


That seems like a silly argument. That would be like saying I don't like languages that require semicolons because then I have to remember to put in semicolons.

The whitespace is part of the syntax. Just like in some languages putting a semicolon at the end is part of the syntax.

And for the most part, your IDE will take care of whitespace anyway.


Ruby manages to be much more readable without whitespace-as-syntax.

The faults with Ruby are its less useful standard library and lower performance.


IMO Ruby is far, far less readable. Way more magic, endless ways to do the same things with slight differences (blocks, procs and lambdas) etc.

Plus everyone mostly indents the code the same as Python anyway... so. Meh.


clicked the link, got:

"Something went wrong, but don’t fret — it’s not your fault."

Took me a moment to realize that statement wasn't in reference to the statement "Guido van Rossum joins Microsoft".


Hopefully he is going to make a really good port of Python to .Net.


Does this elevate the future possibilities of Pyright in any way?


Update IronPython!!!


VBA out PYTHON in


MS has been very timidly introducing JS as a replacement.


hey hey hey...


I hope that when Microsoft announced Windows 10 is the last one, it meant that the next will be Linux distribution able to run Windows apps better than Wine.


Typescript for python?


Good for you Guido!


python will end up like FORTRAN or COBOL


From now on known as Darth Rossum.


Any guess on what a major FOSS dev grosses? It has to be at least $700k. $1.5m/yr? $3m?

About $200k/yr and up. It's a good idea to find a lawyer or similar talent agent to work out the compensation package, because they have the skills and can get away with things a new hire cannot. I've done this before; it saved time, and I ended up with a better package.


The ones that let other people do the work or the people who do the work themselves? A lot of prominent folks in the Python space are the former.


levels.fyi gives $650k in SF, but that looks quite low to me. I'd say $1M/year at least. Any better guess?


I think you're right. The issue is, due to the power law distribution of talent, the sample size of such roles (and personalities, i.e., how many Python BDFL's are there?) is incredibly small, so it makes precise assessments difficult and the variance increases wildly since there aren't very good comparables.

I think he could command nearly $10M, give half of it away to charities, grants, scholarships, social ventures, and underserved areas to promote STEM college prep, and still have more money than one ostensibly average, middle-class person would know how to invest (real estate on the near periphery of burgeoning urban sprawl areas). Microsoft is a cash cow and is willing to invest in a solid, deep bench to stay competitive, especially in an apparent transition to either EEE or adopt OSS.


Someone else commented that he'll be a Distinguished Engineer (DE). levels.fyi shows level 69, the level under DE, with some reports of $1-2M. He's almost assuredly receiving $1-3M in compensation.


Sources familiar with their thinking say Torvalds and Stallman will join Microsoft early next year.

On a more serious note, I hope Van Rossum and Hejlsberg spend lots of quality time together and thoroughly pick each other's brains. This could result in great things in the programming language space.


You don't happen to have anywhere I could read about Stallman potentially joining Microsoft? I find it hard to believe he would.


It was a joke. :-) Hence "sources familiar with their thinking" and "on a more serious note".


Nice, happy to hear that and great for Python. MS has been great for Python.


Nice, big fan.


"Embrace, extend and extinguish" ™

I might need to branch out into other languages.


And the next language to become unusable without a microsoft-controlled ecosystem is...



Embrace, Extend, Extinguish .


Nah that was the old MS, new MS just replaces own with rent.


What were the previous examples of this?


C#, F#, Typescript, JavaScript via npm


C# and F# are great examples of how MS is willing to allow people out of their ecosystem to use them with all their efforts with .NET Core and having it run places other than Windows.

npm is trivial to replace with yarn and a custom registry.


There were C# and TypeScript outside of MS first?


That is more boring than retirement.



