Freezing Python’s Dependency Hell (instacart.com)
200 points by ammaristotle on July 25, 2018 | 150 comments



What's wrong with pipenv? I am genuinely curious.

On local :

    mkdir my_project_directory
    cd my_project_directory
    export PIPENV_VENV_IN_PROJECT=1 (To make the virtual environment folder deterministic (.venv/); otherwise you get a hash-based directory (my_project_directory-some-hash-value), which might not be suitable for automated deployments in applications like Docker. I don't know why this is not the default.)
    pipenv --python 3.6 (or any particular version number)
    pipenv install numpy scipy pandas matplotlib requests
    pipenv graph (Gives me a dependency graph)
    git add .
    git commit -a -S -m "init" 
    git push
On remote :

    git clone url/my_project_directory
    cd my_project_directory
    export PIPENV_VENV_IN_PROJECT=1
    pipenv install
    pipenv shell
    pipenv graph

Is this workflow not enough? I have recently started using pipenv after a lot of struggle. The only issue I have is that PyCharm doesn't support native pipenv initialisation; I always end up creating the environment manually and then importing the project. PyCharm does detect the environment, though.


A lot of people seem to run into bugs and some hit a brick wall when they report them:

https://www.reddit.com/r/Python/comments/8elkqe/pipenv_a_gui...

Personally I think poetry doesn't get enough visibility. It's not as hyped as pipenv but it feels a bit nicer:

https://poetry.eustace.io/


It really serves a different audience/purpose.

Poetry replaces setup.py which is mostly used for building libraries. You still need to create your own virtualenv.

Pipenv replaces requirements.txt and handles the virtualenv for you. It can’t be used for packaging libs, but its primary purpose is to make developing apps easier.

Lore seems to be much closer to pipenv than poetry.


pyproject.toml and pyproject.lock are analogous to Pipfile and Pipfile.lock

Both attempt to replace requirements.txt, both have dependency resolvers and both are workflow tools.


Yet poetry does not manage the virtualenv for you which is what these other tools do. I think poetry would find a lot more love outside of packaging where people are now using pipenv if it managed the virtualenv too.

The integration pipenv has with pyenv is also very nice.

Many people want fewer tools, which is the primary reason pipenv took off IMO. Creating and activating virtualenvs? mkvirtualenv? Minor python version changes and the venv is toast? Different ways of structuring requirements files? It’s a mess for juniors especially.

People that are packaging libs probably aren’t having the same difficulty with virtualenvs that junior devs are when starting at a company, developing on new code bases and learning new processes and tools. But packaging and releasing python libs is challenging, so tooling to help with that is awesome.

A single tool that can do packaging, dependency management, and venv management would be embraced. If poetry doesn’t add venv management, then pipenv should add packaging management.


>Yet poetry does not manage the virtualenv for you which is what these other tools do.

That's not correct:

https://poetry.eustace.io/docs/basic-usage/#poetry-and-virtu...

>A single tool that can do packaging, dependency management, and venv management would be embraced.

It does all of those.


My apologies! I checked the wrong place in the docs before commenting, you are right.

Can you direct it to use a python other than the one linked to poetry? I don’t have CLI access at the moment.


Poetry looks nice - that dev has authored some other really nice looking libs.

After skimming the docs and tinkering with poetry a bit, I'm not sure what my workflow with it would be for containerized python apps, though - where you generally don't want virtual environments at all. Pipenv handles that case pretty well.

I might reach for it though if I were developing open-source libraries that would be distributed on pypi


You don't need to but there's also no harm in creating a virtualenv in a container.


After having used both, I'm not yet sure which is better of `pipenv` or `pip install --require-hashes` + `python -m venv`. For example, `pipenv sync` doesn't uninstall packages which were previously in the same Pipfile{,.lock}, making the sharing of Pipfile{,.lock} via version control kinda pointless. `PIPENV_VENV_IN_PROJECT` not being the default is also annoying for development.


`pipenv clean` uninstalls all packages not specified in the lock file
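
A minimal sketch of combining the two so the environment ends up matching Pipfile.lock exactly:

    pipenv sync     # install exactly what Pipfile.lock pins
    pipenv clean    # then uninstall anything not in the lock file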


Pycharm just released 2018.2 with support for pipenv :) https://www.jetbrains.com/pycharm/whatsnew/#v2018-2-python


So I went ahead and tried it. Without `export PIPENV_VENV_IN_PROJECT=1` set, creating an environment still gives me one named with the hash value. This is not good. Looks like I am going to have to keep creating pipenv environments manually and then import the projects into PyCharm.


There doesn't seem to be a really fast way to check whether everything is up to date or not:

  $ pipenv install numpy scipy pandas matplotlib requests
  ....
  ....installs everything
  ....
  $ time pipenv sync
  Installing dependencies from Pipfile.lock (3f6ae1)…
        15/15 — 00:00:05
  All dependencies are now up-to-date!

  real	0m7.219s
  user	0m15.645s
  sys	0m1.406s
Why does it take so long just to check a bunch of hashes? Is there a better command?


Last time I tried, it also required that the target Python version be installed somewhere on the path. If pipenv used venv instead of virtualenv, used something like pyenv to retrieve/install Python versions, and was distributed as a full executable (rather than requiring a bootstrapped Python), I would actually use it.


Install pyenv and add a .python-version file to your project. pipenv will use the pyenv-installed python, and prompt to install it via pyenv if it's missing.
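
A rough sketch of that setup (the version number here is just an example):

    pyenv install 3.6.6
    echo "3.6.6" > .python-version
    pipenv install    # pipenv picks up the pyenv-managed interpreter for the virtualenv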


pipenv will do automatic python installation if you have pyenv installed [1]. Pyenv isn't bundled in the base install, but I've been using them both and have been happy switching between environments with different Python versions.

[1]: https://docs.pipenv.org/advanced/#automatic-python-installat...


`pipenv install --python ~/.pyenv/versions/3.6.5/bin/python` works for me, and that directory is not on my path. Did you try the more explicit `--python` flag?


It uses virtualenv, rather than venv.

After discovering PYTHONUSERBASE, I no longer need any of the plethora of wrappers around venv/virtualenv.


>After discovering PYTHONUSERBASE, I no longer need any of the plethora of wrappers around venv/virtualenv.

Is there any walkthrough available? One deficiency pipenv has is that it can be slow at times, particularly when you just want to quickly install a dependency and run. Would love to know the alternative.


https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUS...

Set the environment variable to a different directory for each project, and `python3 -m pip install whatever` will go into that directory.


How do you use PYTHONUSERBASE?


    $ export PYTHONUSERBASE=/path/to/project-specific-python-user-base
    $ python3 -m pip install --user whatever
Everything now goes into that directory. Different projects on the same machine have different directories and so can't affect one another.


Up till not long ago pipenv was not ready - for example it could not install packages like gevent. That bug is fixed now I believe.


I think you are supposed to use `pipenv sync` on the remote to get the pinned versions from the lock file


That's honestly my biggest gripe with pipenv: which command is best for a CI run?

My current magic is:

    pipenv sync $(pipenv --venv > /dev/null || echo '--python 3.6') --dev
The reason is that (magically) adding --python 3.6 will always create a new virtual environment, and I'd rather not do that if the cache is up to date, but running sync by itself won't.

And I think I also want to run `install --deploy`, to check if my Pipfile / lock are in sync or broken.

None of them are huge gripes, more the frustration that it almost works out of the box, but it always seems no one writing these tools ever uses them in prod...


I use `pipenv install --deploy --system`. Doesn't create a virtualenv and verifies the lock file.

Of course, I definitely understand where you're coming from. It took quite a while for me to figure that out, because --deploy is not well documented.
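
Roughly what that looks like as build steps inside the image (a sketch, not the exact setup -- base image and layout are up to you):

    pip install pipenv
    pipenv install --deploy --system   # installs into the system site-packages; aborts if Pipfile.lock is out of date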


I'm not sure I want to use --system in CI, just to avoid differences with development machines if possible. That does simplify baking a Docker image, though.


pipenv has a huge issue that they refuse to fix: no init command. That means you can only run pipenv commands from the root directory of your project. If you accidentally run pipenv install X in a subdirectory, guess what? You just created a new Pipfile and virtualenv!

npm actually got this right, init helps, and it makes sense to traverse up directories to find a package.json.


I agree searching up would be helpful, though, honestly, build tools and other engineers assume you're doing build actions in the project root, and as some of them fail mysteriously if you're not, I often write scripts to fail fast if they're run elsewhere.

Regarding "init", are you complaining that many commands will create a new virtualenv when really only one ought to? Automagically creating the virtualenv definitely seemed cool and modern to me... for about 3 minutes.


They may have changed that behavior recently. I was trying out pipenv last week and running `pipenv run script.py` in a subdirectory printed a message along the lines of "Courtesy notice: no pipfile found in this directory. Using pipfile in [project_root]. Behavior can be customized by specifying a pipfile with [some_flag]". I'm fairly sure, but not positive, that I also was able to install modules from subdirectories into the project venv like you want.

On mobile; I may be misremembering some details. Would encourage you to check new version behavior if you're interested.


Wow, thank you so much! It's crazy how much this project is changing[1], especially as more and more users/companies/projects are starting to depend on it.

[1] https://github.com/pypa/pipenv/blob/77110ed5da89823fa5954e47...


As far as I can tell, the main difference is that this also uses pyenv to manage python versions separate from system python packages. There was an article a couple of weeks ago about combining pyenv + pipenv, and this doesn't really seem to add anything over that combination except an opinionated wrapper script.


Pipenv is certainly better than npm. Although, that may be a result of the js ecosystem but still...


Pipenv is probably the best tool for keeping the dependency workflow clean.


Also, for now, pipenv handles custom PyPI indexes with authentication badly.


I feel that all of these language-specific solutions still only solve half the problem. Your code depends on a lot more than _just_ the python libraries, and often this is exactly what makes projects break on different systems.

Let me make another suggestion: nixpkgs [0]. It helps define exactly that fixed set of dependencies - not just by published version number, but by the actual source code _and_ all of its dependencies.

[0] - https://nixos.org/nixpkgs/


This. A lot of GPU deep learning libs depend on CUDA, cuDNN, which is not solved by pipenv / virtualenv, BUT is actually handled by conda.


Here we go again. The source of the problems with toy package managers (and I include all language package managers here) is not just the package managers themselves; it's the "version soup" philosophy they present to the user. Not daring to risk displeasing the user, they will take orders akin to "I'd like version 1.2.3 of package a, version 31.4.1q of package b, version 0.271 of package c, version 141 of package d...", barely giving a thought to inter-version dependencies of the result.

Unfortunately, software does not work this way. You cannot just ask for an arbitrary combination of versions and rely on it to work. Conflicts and diamond dependencies lurk everywhere.

Sensible package systems (see specifically Nix & nixpkgs) have realized this and follow a "distribution" model where they periodically settle upon a collection of versions of packages which generally are known to work pretty well together (nixpkgs in particular tries to ensure packages' test suites pass in any environment they're going to be installed in). A responsible package distribution will also take it upon themselves to maintain these versions with (often backported) security fixes so that it's no worry sticking with a selection of versions for ~6 months.

However, I can't say I'm particularly surprised that these systems tend to lose out in popularity to the seductively "easy" systems that try to promise the user the moon.


Some background: A few months back I was curious about the nix style of packaging, so I set up a python project using nix via nixpkgs' pythonPackages. This worked pretty well, but I kept wondering to myself whether it was superior to explicitly declaring each version of a package via npm, cargo, bundler, etc.

The way to "freeze" dependencies seemed to involve using a specific git sha of nixpkgs.

From the point of view of a nix newbie, it seems that by relying on nixpkgs to remain relatively stable, you are at the mercy of your package maintainers who might introduce a backwards incompatible change resulting in a build breaking.

One of the alternatives to this was to essentially copy the nix package descriptions from nixpkgs to a projects repo to ensure that packages are explicitly declared. At this point, it felt as though I was maintaining a .lock file by hand.

Do you think using nixpkgs without declaring its specific version (i.e., just using pythonPackages.numpy) is the best way to use nix for dependency management?


There's been quite a bit of discussion about Anaconda and conda in this thread already. Anaconda also takes this distribution approach, and it's targeted specifically at python.


Yet it will never be able to solve the system-library dependency problem in the way that Nix does.


It solves this already and has done for many years (but this depends on what you mean exactly by "the way that Nix does").


Can you elaborate? Conda packages have full metadata descriptions for system-level dependencies. I agree with you that conflicts and diamond dependencies lurk everywhere, which is precisely the reason Conda employs a SAT solver.


Using a local virtual environment and then building a Docker image removes most of the headaches. I also bundle a Makefile with simple targets. See this as an example: https://github.com/zedr/cffi_test/blob/master/Makefile New projects are created from a template using Cookiecutter.

It isn't really so bad in 2018, but I do have a lot of scars from the old days, most of them caused by zc.buildout.

The secret is using, as the article mentions, a custom virtual env for each instance of the project. I never found the need for stateful tooling like Virtualenvwrapper.


You can also set a PYTHONUSERBASE environment variable (and `pip install --user`) to scope the installed packages to the project's directory. This is effectively the same as a virtualenv, but doesn't have the requirement on bash or "activation", and it's less magical than virtualenv because these choices are explicit on each command. The tradeoff is that it can be tedious to be explicit, remembering to use `--user` and specify the PYTHONUSERBASE. If you're scripting everything via make, though, then that's not such a burden.
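
A minimal sketch of that per-project setup (the directory and module names are placeholders):

    export PYTHONUSERBASE=$PWD/.pyuser
    python3 -m pip install --user -r requirements.txt
    # console scripts land in $PYTHONUSERBASE/bin if you need them on PATH
    python3 -m myapp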


There is no need to activate a virtualenv to use it. Just call $VIRTUALENV/bin/python directly. Activating is just a convenience for doing interactive work.


Thanks! I didn't know about that. I'll try it out.


"Pipfile looks promising for managing package dependencies, but is under active development. We may adopt this as an alternative if/when it reaches maturity, but for the time being we use requirements.txt."

If I were given the choice between the community-supported, in-development Pipfile/pipenv or the 3rd-party-supported yet-another-package-manager lore to get those best practices, my money would be on Pipfile/pipenv. I've been using it for many projects now, and besides some minor annoyances (e.g. the maintainer's love for colored output that is not form-follows-function) it has been a great tool.


Never had a problem with dependencies in Python. Just keep it simple.

When starting a new project:

  virtualenv venv -p *path-to-python-version-you-want*
  ./venv/bin/pip install *name-of-package*
When running that project:

  ./venv/bin/python *name-of-python-file*
Many people don't realize that venv/bin/ contains all the relevant binaries with the right library paths out of the box.


Pipenv is a good replacement for the above workflow. It manages your dependencies and virtualenvs.


Thanks for the tip, but honestly, I don't need another tool.


I was of the same mentality, until yesterday when I watched this PyCon video, uploaded 13 May 2018.

https://www.youtube.com/watch?v=GBQAKldqgZs

To cut a long story short, if you're happy with virtualenv and pip then that's great, but the idea of pipenv is to replace virtualenv and pip, which means you'll actually have one tool fewer. :)


Genuine question: does pipenv do anything that [mini]conda doesn't?


Everything you need to know about pipenv is in the linked talk. Sorry, I don't know anything about [mini]conda.


Does conda create a lockfile?


Not yet. But that's planned. See https://github.com/conda/conda/issues/7248.


Is there a reason you don't "activate" your virtualenv?

That (with the addition of using mkvirtualenv and friends) is the workflow I use for both dev and prod, and I'm really happy with it!


I hate the whole idea of activating virtualenvs. It's a tool that makes it really easy to end up running a command in the wrong environment and see weird behavior instead of a clear error message.

I've seen variations on this scenario happen at least 3 times, for instance:

1) Somebody creates script that activates and runs django and commits it.

2) Junior runs script but the virtualenv doesn't get created for some reason.

3) The "warning virtualenv doesn't exist" message appears briefly and gets missed.

4) The junior gets "import error: cannot import django" or something.

5) They then start installing django in their system environment and... it sort of works. Except then they get another import error. And a whole bunch of python packages installed in their system environment. Yech.

Moreover, I'm really not sure what was so wrong with just running ./venv/bin/python in the first place and never having to worry about what environment you're in.


> 1) Somebody creates script that activates and runs django and commits it.

You can call the python bin inside the virtualenv and it will run as if the virtualenv was active:

  venv/bin/python -m foo.bar
Obviously it doesn't work if devs used different names for their virtualenvs. Work has a convention to always use the same name so this works pretty well.


That's 7 characters more you'll need to write all the time! Also you'll need to remember to prepend them to all scripts, i.e. pip, fab, etc. That seems to me to be more error-prone for juniors than telling them to always use a virtualenv (i.e. have (envname) in their prompt)!!


It's less error prone. Never had ^^ that scenario since and I've not run into additional problems either.

Having an extra 7 characters in a ./run.sh script doesn't really bother me. I'm not a perl developer.


I don't like "magic". I don't need anything to hijack PS1 and muck around with my shell.


I heartily agree with this. I really dislike the tools that provide their functionality by mucking around with the shell environment in (essentially random after accounting for platform variations) ways ...

Tools like nvm, rvm, ros... If I can use a solution for managing a development context that doesn't involve mucking around with the shell environment, I much prefer it. Configuration via the sourcing of shell scripts is a very fragile interface, doesn't work with a good (i.e. non-bash) shell, and almost always eventually leads to bugs when some workflow triggers processes in a manner that fails to inherit the shell environment...


Fair enough.


I'm not sure why the scientists don't use VMs and simply save the virtual disk files? That would at the very least allow them to verify the settings at a later date. Fresh install reproducibility doesn't seem necessary to verify experimental findings as long as the original vm is available to boot up.


My guesses are that:

1. Integrating the development environment on their host PC (for example connecting RStudio in R's case, or connecting their web browser back to a server running in the VM in the case of Jupyter) is another set of skills to master.

2. Many data analyses are memory hungry unless you want to resort to coding practices that optimize for memory consumption. The overhead of running a VM is a bummer for some scientists.

3. Many scientists are not using Linux top-to-bottom, and therefore don't have a great way of virtualizing a platform that they are familiar with (e.g. Windows, macOS)

Can people think of others? I'm sure I'm missing some.

(EDIT: To be clear, I think VMs are a great path, but I do think there are some practical reasons why some scientists don't use them)


Often scientists are using hardware to acquire new data. The acquisition hardware might be on a PC that came installed from the manufacturer where you are told not to change anything.

Touching that PC, in anyway would be considered harmful to everybody using that specific piece of equipment.

Therefore, from the beginning of your acquisition, you are basically using a machine you don't control.


I think these and other issues can be solved with technical training.


Sure, they’re all mitigatable, but that technical training is competing with a lot of other considerations within the limited brainwidth of a scientist.

From the scientist’s perspective, a lot of this can start to feel like yak shaving. The opportunity costs are real.


Eh, maybe. Virtualbox is point and click at this point, and taken on in conjunction with their institutional IT departments as hopefully they do with all desktop point and click software, totally doable with 5-10 hours of training and some typed desk procedures. Learning new tools and workflow seems to be part of the job. As I type that I also thought of a different response from the perspective of a leader and software engineer, I did not type that response.


VMs are an easy copout to a problem that shouldn't be a problem in the first place.


That's not true. Python has C libraries, some of which might need to be built from source, and there's good reason not to allow root access on a lot of systems (or the ability to install headers/dev packages, gcc, etc). System package management is hard, and coordinating it with (ubiquitous, not specific to Python) language package managers magnifies that. Unless you had some other solution in mind that I've missed...


We had the ability to run packages from a custom "root" prefix for ages in UNIX. If only package management tools all worked together, with the same central store, and respecting this...

There's never a need to mess with root access -- not even to install headers/dev packages. Dev tools and compilers can also be made to look to custom locations. I mean, there's never a need aside from the self-imposed limitations our OSes and tooling place upon us.

Nothing inherently complex: just tons of accidental complexity.

Those things are only hard because we never did any coordinated effort to fix them.


conda basically proves that you can install almost everything one needs in the user's home directory. They have been working more and more on being completely independent from things like system compilers as well.


Conda is a great way to get gcc 7.2 on CentOS 6. Anaconda builds all of its packages targeting CentOS 6 for broad compatibility, but with the latest compilers to ensure we have the latest security features compiled in.


Goodness, you say that like it's a good thing. Yes, it is easy to download compiled binaries from a 3rd party.


I don't understand the nature of your discourse. You agreed that maintaining software distros is not easy, some recommended conda and you seem dismissive again?


This is not true, at least not without some sort of context around the work being performed and the requirements of the workflow.


1. Build Docker image out of requirements.txt

2. Develop application

3. Repeat 1-2 until ready to deploy

4. Run Docker image in production with same dependencies as development

5. ??

6. Profit!

As long as you don't rebuild in between steps 3-4, you'll have the same set of dependencies down to the exact patch level.
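
A rough sketch of steps 1 and 4 (image and registry names are placeholders):

    docker build -t myapp:release .                    # Dockerfile does `pip install -r requirements.txt`
    docker run --rm myapp:release python -m pytest     # develop/test against that exact image
    docker tag myapp:release registry.example.com/myapp:release
    docker push registry.example.com/myapp:release     # run the same image, and dependencies, in production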


This has the added benefit of letting you encode the system dependencies (OS packages) for library build time and for run time.

Docker images are also a great way to distribute Python CLI tools, certainly far better than installing via pip which either pollutes global state or is confined to a certain project's virtualenv.


Bingo. If you’re not vendoring the binaries of your dependencies as part of a release then you’re doing it wrong.

It doesn’t have to be docker, containers just makes it easy to have immutable snapshots. Anything that packages it all up (including a simple tarball) is enough.


Doesn't help developers not get different versions of packages. Lockfiles are necessary regardless of Docker.


This is important (though I'm oddly yet to run into this issue with pip; I've only had conflicts with npm and composer before). Freezing dependency sources in Docker images and using (pip install --require-hashes -r requirements.txt) for development seems to cover everything.
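
For reference, a sketch of how those hashes can be generated with pip-tools (assuming a hand-maintained requirements.in):

    pip-compile --generate-hashes requirements.in     # writes a pinned requirements.txt with hashes
    pip install --require-hashes -r requirements.txt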


yeah, i was going to say the same: i never had that issue in ~7y of python work.

nowadays, requirements + docker solves 99% of everything i do.

maybe it's because i'm not using numpy and the likes?


genuine question - is nobody using anaconda/conda in production? I have found the binary install experience in conda far more pleasant than in anything else.

Going forward, the trend is going to be pipenv+manylinux (https://github.com/pypa/manylinux), but conda is super pleasant today


I use miniconda in production, and it's awesome. It's on par with (or even better than) npm except perhaps on the number of packages in the repository, supports pip, does everything I need and then some.

I'm baffled myself at the anaconda-blindness in the general crowd, which is evident every single time this comes up for discussion.


What happens when there isn't a conda recipe for some package or inexplicably some dependency? Do I go back to pip? sudo pip ;) ? Use virtualenv?? Nothing is ever solved.......


> What happens when there isn't a conda recipe for some package or inexplicably some dependency?

You go contribute it on conda-forge? The conda team is also actively working on improving some of these problems specifically for python users. When you create a new conda environment with python in it, we put pip in it for you too. In a way, we're implicitly encouraging you to use pip along with conda, and yet it's not a seamless experience. So https://github.com/conda/conda/issues/7053 is a current effort. Eventually, we're working toward conda being able to install wheels directly (at least pure-python wheels at a minimum), so that packages available on PyPI that conda-forge or Anaconda haven't built out yet can still be installed with conda.

> Do I go back to pip? sudo pip ;) ?

If you're putting `sudo` in front of `pip`, you're probably doing it wrong ;-)


Indeed, using pip from within a conda environment is trivial.

But the preferred solution is to make a conda package for yourself, and it's really quite simple. You can host it from anaconda.org, or from a local directory for crying out loud.


Why are you moving away from conda going forward?


Not OP, but we would like to move away from it as well.

- Breaking behavior between minor versions (https://github.com/conda/conda/issues/7290)

- Environments not actually being isolated (https://github.com/conda/conda/issues/448)

- Can't create environments in long paths (https://github.com/conda/constructor/issues/156)

Those are just a few I can remember. We unfortunately have not found a strong replacement.


> Breaking behavior between minor versions

See https://github.com/conda/conda/issues/7248 for where conda intends to head in the future on the environment.yml issue.

> Environments not actually being isolated

That's actually a really sticky issue, and one that's more about the python interpreter itself rather than anything conda is doing. More recent discussion at https://github.com/conda/conda/issues/7173. Yes, we can change the default behavior of the python interpreter. Either way though, we'll be making a group of people mad.

> Can't create environments in long paths

Of course you can. `conda create` works well with long paths on unix systems (Windows is more difficult, but we're working on that too). What you're bumping into in that issue is that the constructor installer builder isn't (right now) compatible with longer paths. The solution really is to get conda bootstrapped onto your system, and then just use that one conda. You don't need full miniconda installations scattered all over the place. One easy way to do it is bootstrap conda into `/opt/conda` and then symlink `/opt/conda/bin/conda` to `/usr/local/bin/conda`. Now `conda create` whatever and wherever you want.
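
A sketch of that bootstrap (installer filename and paths are only examples):

    bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda
    ln -s /opt/conda/bin/conda /usr/local/bin/conda
    conda create -p /any/long/path/you/like/env python=3.6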

> We unfortunately have not found a strong replacement.

Conda definitely isn't perfect, and it's far from a "done" project. One thing we do have at this point is years of battle-hardening with something like six million active users blanketing all sorts of operating environments. With conda-forge being as strong as it is today, I'm not sure anything else like it really exists. Nix and pkgsrc are probably the closest alternatives.


Quick question - has there been an attempt to make conda an official PEP and make conda-forge part of the Python Foundation (instead of a private company)?

I'm trying to figure out why is all this new manylinux PEP stuff "inspired by conda" and is not actually conda.

Is this a situation like grsecurity vs Torvalds? How does the situation change now that BDFL is gone ?


> Is this a situation like grsecurity vs Torvalds?

Ha! Conda and Anaconda, Inc. are _not_ like grsecurity. Starting with the fact the Conda is BSD-licensed.

Recounting some of the history will probably provide the context you're looking for.

It's my understanding that the birth of Anaconda (the distribution) and Conda goes back to this statement by Guido regarding package building and installing at a PyData summit in 2012:

> It really sounds like you guys' needs are so unusual compared to the larger python community that you're just better off building your own.

https://www.youtube.com/watch?v=QjXJLVINsSA&t=59m10s

This predated PEP 427 (Wheel Binary Package Format 1.0).

Of course both conda and pip/wheels have evolved immensely since 2012, and their evolution has been guided by different constraints. Elsewhere in this thread I spoke about how pip/wheels/PyPA packaging has as its primary target the site-packages directory, and then branches out from there when necessary. Conda's primary target is the prefix root. PyPA's mandate is to build, install, and manage python packages, which makes something like 'pip install python3' out of scope. Things like 'pip install postgresql' and 'pip install r' are _for sure_ out of scope. Conda has the luxury of being able to install and manage all packages, not just python packages.

Regarding the creep of wheels toward conda packages, and especially the static embedding within wheels of compiled "system" libraries, I question whether it's actually to the benefit of python _users_. No doubt the wheel "binary" format is a huge improvement over eggs for pure-python packages. But manylinux wheels are often built going beyond just compiled python extensions. Rather than describing these "system" dependencies in metadata, and then having pip ensure these system dependencies are present before installing the package, these wheels now implicitly cross the python-only line that PyPA in other cases holds. There are real consequences for users, with the result being that it pushes failure mechanisms back rather than forward.

So, because of the differences in scope, my guess is that the python community would think it inappropriate to make Conda an official PEP. Conda will probably someday install wheels (at least pure-python ones); maybe someday pip can reach out to conda or apt-get or yum to ask for non-python dependencies to be installed.

> and make conda-forge part of the Python Foundation (instead of a private company)

Conda-forge is actually a community-driven organization completely independent of Anaconda, Inc.


Binary wheels are a nightmare in terms of knowing what you have installed. We disabled them in our builds when we discovered a few packages had vendored old, terribly-insecure static versions of libxml2 (and who knows what else) which silently replaced using the system version (which received OS security updates). We only found out when we hit one of the behavioural quirks of that version and spent ages investigating it with gdb.

If you're using manylinux wheels you're probably using static libs you have no idea you're using. That way lies madness.

I haven't used conda, so maybe this doesn't apply there somehow.


We are a rolling Software Distribution and providing up to date software is one of our main goals.

Another is building that software with good security flags, see: https://www.anaconda.com/blog/developer-blog/improved-securi...

We also keep track of CVEs in our software and actively look for patches (e.g. pycrypto is dead now but Debian maintains patches to fix reported CVEs) or write our own (though usually to fix build-system bugs rather than security issues).

But yes, static linking, and leaving software building to non-experts using whatever tools they like (without studying anything to do with low-level binary security or how to achieve it), statically linking insecure (or soon-to-be-old) libraries, is far from ideal.

Anaconda Distribution strongly prefers dynamic linking and shared package dependencies so we can update to address critical security issues without needing to rebuild significant portions of our stack.


> my guess is that the python community would think it inappropriate to make Conda an official PEP.

will you guys try ? I really, really, really hope you do.


Thanks for the reply. I wrote that post earlier this morning after some other frustration, so I'm sorry for the negative tone.

Despite some of our issues with it, Conda has worked well for us both as a development team and in production for quite some time.

> six million active users

That's awesome to hear, and one of the reasons we haven't seriously invested in a replacement.


we are NOT moving away. In fact, for all its faults, conda has been the most pleasant experience. We publish our own internal packages as conda packages.

I was commenting that manylinux becoming an official PEP - https://www.python.org/dev/peps/pep-0513/ - could eventually end up supplanting conda.

I wish they had adopted conda itself. Because manylinux was clearly inspired by conda.

"Instead, we define a standard subset of the kernel+core userspace ABI that, in practice, is compatible enough that packages conforming to this standard will work on many linux systems, including essentially all of the desktop and server distributions in common use. We know this because there are companies who have been distributing such widely-portable pre-compiled Python extension modules for Linux -- e.g. Enthought with Canopy [4] and Continuum Analytics with Anaconda [5].

Building on the compatibility lessons learned from these companies, we thus define a baseline manylinux1 platform tag for use by binary Python wheels, and introduce the implementation of preliminary tools to aid in the construction of these manylinux1 wheels."


Agreed, it's the most practical solution for most people. It's also a shame that it shows just how unreliable python packaging is.


Interesting reading, I share some of the points in the post, however, one more dependency manager?

Mostly I've used plain `python -m venv venv` and it always worked well. A downside - you need to add a few bash scripts to automate typical workflow for your teammates.

Pipenv sounds great but there are some pitfalls as well. I've been going through this post recently and got a bit upset about Pipenv: https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-l...

Another point is that it does not work well with PyCharm and does not allow putting all dependencies into the project folder as I used to do with venv. (I just like to keep everything in one folder so it's easy to clean up.)

Are there any better practices to make life easier?


Actually, I recommend bash scripts for automating team workflows as a best practice.

You create a wrapper script around your application that calls a dev environment set-up script, that [if it wasn't done yet] sets up the environment from scratch for that project or application, and loads it before running your application. This does a couple things.

First, it removes the need to train anyone on using your best practices. The process is already enshrined in a version-controlled executable that anyone can run. You don't even need to 'install lore' or 'install pipenv' - you just run your app. If you need to add documentation, you add comments to the script.

Second, there's no need for anyone to set up an environment - the script does it for you. Either set up your scripts to go through all the hoops to set up a local environment with all dependencies, or track all your development in a Docker image or Dockerfile. The environment's state is tracked by committing both the process scripts and a file with pinned versions of dependencies (as well as the unpinned versions of the requirements so you can occasionally get just the latest dependencies).

Third, the pre-rolled dev environment and executable makes your CI-CD processes seamless. You don't need to "set up" a CI-CD environment to run your app. Just check out the code and run the application script. This also ensures your dev environment setup scripts are always working, because if they aren't, your CI-CD builds fail. Since you version controlled the process, your builds are now more reproducible.

All this can be language-agnostic and platform-agnostic. You can use a tool like Pipenv to save some steps, but you do not need to. A bash script that calls virtualenv and pip, and a file with frozen requires, does 99% of what most people need. You can also use pyenv to track and use the same python version.
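
A bare-bones sketch of such a wrapper script (file and module names are hypothetical):

    #!/usr/bin/env bash
    # run.sh -- set up the project environment if needed, then run the app
    set -e
    [ -d venv ] || python3 -m venv venv
    ./venv/bin/pip install -q -r requirements.txt
    exec ./venv/bin/python -m myapp "$@"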


Completely agree on every bullet point.

Every time I saw simple bash scripts and/or a Makefile used, it did not seem to be the idiomatic way of doing things in Python, but after using it for a while it turned out to be one of the best development experiences.


> Another point is that it does not work well with PyCharm and does not allow to put all dependencies into the project folder as I used to do with venv.

This is annoying for AWS lambdas too, because you have to bundle the dependencies and zip it. It's pretty trivial to go Pipfile -> requirements.txt -> pip install -t if you use a Makefile, but it's definitely an omission. I asked about it on their github though and it is a known issue, hopefully it'll be there soon.


JetBrains have heard the prayers :D Here is an announcement of pipenv support: https://blog.jetbrains.com/pycharm/2018/06/pycharm-2018-2-ea...

> because you have to bundle the dependencies and zip it

btw, I've used serverless to deploy lambdas in python and it worked super cool. Highly recommended.


Oo nice I didn't know serverless worked with Python! Thanks for the heads up :)


I bitch a lot about npm, but then I remember that time when python's package distribution drove me to learn a new language. I can't help but notice that TFA and all the comments here are only talking about one end of this: managing your dev environment. Is there a similar work explaining how to distribute python packages in a straightforward manner? Is that article compatible with this one?


The author's justifications for using this home-grown tool over miniconda are weak at best, if not plain incorrect.

Conda really is the tool he wants; he just seems not to understand that.


How does Conda replace a virtual environment? (honest question)


Python's virtualenvs target isolation of the site-packages directory. Conda environments are one step up in abstraction, isolating the "prefix" (in python world just the output of `sys.prefix`). The target for conda and conda environments is the management of everything within that prefix, including python itself. The target for pip, pipenv, virtualenv, and other python-only package management tools is everything within `lib/pythonX.Y/site-packages`.

The distinction is important especially for people using python's data science libraries, since those libraries are often just python wrappers around compiled code and link to shared "system" libraries. Conda manages and isolates those libraries; pip and virtualenv do not.

The distinction also has security implications, for example when openssl is statically embedded in wheels. When this happens, there isn't any real visibility into the openssl versions being used. Because conda has the flexibility of the step up in abstraction as I described before, conda can manage a single instance of openssl for the whole environment, and then the python packages needing openssl need not statically embed it.
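
For example (the environment name and package list are illustrative):

    conda create -n analysis python=3.6 numpy openssl   # conda manages python and the shared libs themselves
    conda activate analysis                             # `source activate analysis` on older conda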


> if not plain incorrect

The justification was that the Anaconda installer is too heavy. The kitchen sink Anaconda installer is not designed for the author's use case. Miniconda is the provided way to bootstrap conda onto a system.


Indeed, Anaconda is too heavy. But he is aware of Miniconda and even mentions it in his last bullet point. He then dismisses it in short order, with a vague complaint about mysterious "best practices".

If he really believes that his tool is somehow better, fine. But since Miniconda is the de facto standard tool among data scientists for this use-case, the burden is on him to spend more words on exactly why it doesn't work for him.


Version pinning is technical debt and a fool's errand. New versions will always come out and your new development is confined to what once worked. You need to keep testing with current versions to see what will break when you upgrade and fix it as soon as possible so as to minimize the odds of a big breaking change.

It may keep your environment stable for some time, but that stability is an illusion because the whole world moves on. You may be able to still keep your Python 2.2 applications running on Centos 3 forever, but you shouldn't want to do it.


New versions will always come out, but it's not my job to test all of them. I'd rather consciously decide when I can afford to pay off the debt.



Came here to say this. Directly freezing requirements.txt rather than using a requirements.in file is a mistake imho.


One thing that comes to mind: when I was starting to use Python, I was eager to mock Java people and their absurd approach (write everything in Java, specify a full classpath for every dependency, etc). I pointed out how easy and quick it was to program in Python rather than in Java.

I did not appreciate that a linear and well-defined (by the language) approach to dependencies, and a clear API boundary between the system libraries (java, javax) and the user libraries, actually gives A LOT of value, even though it's more cumbersome to use.


Why would you do this? Redirect chain:

  https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241
  https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftech.instacart.com%2Ffreezing-pythons-dependency-hell-in-2018-f1076d625241
  https://tech.instacart.com/freezing-pythons-dependency-hell-in-2018-f1076d625241?gi=85c0588ca374


It looks like tech.instacart.com is hosted on Medium. The redirect is part of the auth flow. If you have a Medium account, you would have logged in to medium.com, not tech.instacart.com. If you don't have a Medium account, Medium still will want to add first-party tracking information to your interaction with tech.instacart.com and all other Medium properties. So this client-side redirect flow enables them to capture that association.

This is presumably what the `gi=85c0588ca374` query parameter is in the follow-on redirect. I would guess that `gi` stands for "global identity" or something.


I ran into a migraine last week: cleaning up requirements.txt

How do you determine which requirements are no longer needed when you remove one from your code? In node, your package.json lists only packages YOU installed. So removing them cleans up their dependencies. But in Python, adding one package with pip install might add a dozen entries, none indicating they're dependencies of other packages.


At most projects we're using pip-tools, which generates a fully pinned requirements.txt from a manually maintained (and clean) requirements.in that only contains the specific packages you need, without their dependencies.
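
The basic loop, roughly (assuming pip-tools is installed):

    pip-compile requirements.in    # resolves and writes a fully pinned requirements.txt
    pip-sync requirements.txt      # installs/uninstalls so the environment matches it exactly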


Thanks. I'll investigate this method. It sounds like you hand write dependencies and their versions into the requirements.in file?



We treat requirements.txt as volatile.

We use a separate file to list the direct dependencies, 'ddeps.txt' and 'ddeps-dev.txt' for development deps.

Once we update one of these files a clean venv is created, the dependencies installed and the freeze output saved as requirements.txt. Then the dev dependencies are installed and the output of that freeze is saved to requirements-dev.txt.

This preserves the dependencies where we made the conscious choice to require them and also allows us to explicitly vet any new dependencies and versions.


I’m not sure about other people, but that is how I use requirements.txt. You don’t have to dump the entire output of pip freeze in there. You can just list the dependencies you want.


Or you can list direct dependencies in another file and regenerate requirements.txt with `pip freeze` whenever you change the other file. Especially easy with Make.


It really bothers me that they're skipping these two as separate steps. Track "what I asked for", use "what I ended up with" for deployment. Otherwise you're just saying "use pip freeze" regardless of wrapping magic around it.

If you're already down that road, pipdeptree is your friend. It will resolve your frozen packages to at least tell you which are top-level and which are dependencies-of-dependencies. There are still exceptions if you're using a dependency both directly and via another module, but having a requirements.in from the pipdeptree parents will have you covered.

Get that list, set them all to module>=version in development, pip install -r requirements.in, then pip freeze > requirements.txt to get hard version locks for deployment.
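
A rough version of that (pipdeptree's exact output format may vary by version):

    pipdeptree --warn silence | grep -vE '^\s' | sed 's/==/>=/' > requirements.in   # top-level packages, loosened
    pip install -r requirements.in
    pip freeze > requirements.txt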

As others have stated, pip-tools handles this separation for you.


Naive question: Why does this url 302 redirect to medium.com and then medium.com forwards back to the same original url?

Is there some commercial advantage?

Why not just post the medium url

https://medium.com/p/f1076d625241

This 302 redirects to tech.instacart.com


Anybody played with the brand new XAR from Facebook?

https://code.fb.com/data-infrastructure/xars-a-more-efficien...


Yeah, very quick to get going actually.

This is an excellent post to get started http://sevag.xyz/post/xar/


Thanks for the link. That looks interesting; I'll have to give that a try. When I started reading the link my first thought was Pex from Twitter. I don't know how comparable XAR is to Pex but it's worth a look to compare the two.


Both PEXs and XARs package a python script and its dependencies in a single hermetic file.

PEX is a self-extracting zip file which has to be fully extracted before being run. The extracted files could potentially be modified.

XAR is a self-mounting compressed SquashFS filesystem image. SquashFS will decompress pages lazily and cache the result in the page cache, so the startup time is much faster. Since SquashFS is read-only, the files can't be modified.


Since we're sharing XKCD cartoons, here's one that comes to mind: https://xkcd.com/927/

So not to disappoint, here's another contestant: Poetry [0]

That said, in my experience it works best if you don't force any particular workflow on your developers, but maintain a solid and repeatable process for testing and deployment. People have different mental models of their development environments -- I personally use virtualfish (or virtualenvwrapper if I'm on Bash), while a colleague works with `python -m venv`; and we have played with pipenv, pyenv, anaconda and poetry in various cases.

As long as your requirements are clearly defined -- requirements.txt works perfectly well for applications, and setup.py for libraries [1] -- any method should be good enough to build a development environment. On the other hand, your integration, testing and deployment process should be universal, and fully automated if possible, and of course independent of any developer's environment.

[0] https://github.com/sdispater/poetry

[1] https://caremad.io/posts/2013/07/setup-vs-requirement/


Use a fresh virtualenv for each project

As a form of version pinning, this locks in old versions and creates technical debt. A few years downstream, you're locked into library modules no longer supported and years behind in bug fixes.


The joy of not having to deal with broken production builds when dependencies change under your feet is well worth the "technical debt" in my opinion. Reproducible builds are valuable in their own right.


We recently went through this process at our company and chose pipenv as the dependency-management tool. As mentioned in the article, pipenv is under active development, but it takes care of many things we previously had custom scripts for, such as requirements hashes, a built-in graph of dependencies, automatic retries of failed dependency installs, automatic re-ordering of dependency installations, etc. It also has a few quirks - we had to pick a version that had most commands working, and pipenv install is painfully slow and didn't seem to have a caching strategy for already-built virtualenvs.


Doesn't using requirements.txt fail to account for (I forget the official name) transitive dependencies? Your dependencies in requirements.txt might have a dependency whose version number changes over time.

This seems like something pip freeze could handle but doesn't.


I started using pipenv and it seems everything just works fine, except that I can't really install wxPython with pipenv, but I can live with that.


ruby practices based around bundler aren't perfect, but they did solve _this_ level of problem ~7 years ago.

It remains a mystery to me why python seems to have won the popularity battle against ruby. They are very similar languages, but in all ways they differ ruby seems superior to me.


My theory is that it's because Travis Oliphant wrote numpy for python rather than ruby.


And Python is taught in the intro to programming course in just about every college in the world.

Dumb simple languages make better teaching tools, but unlike Lisp and Smalltalk, Python was also good enough for widespread professional use.

So almost everyone is exposed to Python, many people never bothered to learn anything better. Inertia is a hell of a force.


> And Python is taught in the intro to programming course in just about every college in the world.

Why do you think that ended up being python instead of ruby? Something about the language or its uses, or just a coincidence of history?

I have no idea myself.

I think ruby and python are about equal level of both "simpleness" (neither is very simple, actually; although it depends on what you mean by 'simple') and "good enough for widespread professional use" (both are, and especially both were ~8 years ago). Or do you disagree and think they differ there?


I love Ruby, but I would still advocate for Python as a teaching language.

Ruby's grammar is objectively more complex than Python's. People generally get stuck on syntax issues when they begin learning programming. Python's significantly simpler grammar, simpler and fewer basic building blocks, and historically "only one way to do things" philosophy makes it easier to pick up.

Ruby's conventional control flow also doesn't translate well to lower level languages; Enumerable pretty much replaces all sorts of loops, functions/methods tend to be chained instead of wrapped (math style), implicit returns, and perlisms, all make Ruby more confusing for a first timer language.


When MIT stopped teaching Scheme (replaced with Python) a couple of years back, yet another essential concept in computing left the academy. This is exactly the kind of thing Kay means when he talks about computing pop-culture.

Anyone who's ever read The Little Schemer knows what I mean.


Yes, let's add another incompatible tool to the list. /s

Here's to Python 4 actually fixing this mess.


Dependency hell in Python? The only annoying part would be missing some library needed to build certain packages, like lxml, etc.

That's all.

We Python developers are fortunate to have amazing tools such as pip, virtualenv, etc.


So... current tools miss some functionality. Let's invent a new one. Reminds me of another xkcd: https://xkcd.com/927/


Less blogs, more Dockerfiles. That's the solution.


This needs to be posted again: https://xkcd.com/927/



