How to create a Python package in 2022 (mathspp.com)
424 points by kieto on July 28, 2022 | 145 comments



This is really nicely written; kudos to the author for compiling a great deal of information in a readable format.

If I can be forgiven one nitpick: Poetry does not use a PEP 518-style[1] build configuration by default, which means that its use of `pyproject.toml` is slightly out of pace with the rest of the Python packaging ecosystem. That isn't to say that it isn't excellent, because it is! But the standards have come a long way, and you can now use `pyproject.toml` with any build backend as long as you use the standard metadata.

By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.
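
For anyone who hasn't seen it, a minimal sketch of what that looks like (the backend choice and all names here are illustrative placeholders, not that project's actual config):

    [build-system]
    requires = ["flit_core>=3.2,<4"]
    build-backend = "flit_core.buildapi"

    [project]
    name = "example-package"          # placeholder
    version = "0.1.0"
    description = "Standard (PEP 621) metadata example"
    requires-python = ">=3.8"
    dependencies = ["requests>=2.28"]

    [project.optional-dependencies]
    dev = ["pytest", "black"]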

[1]: https://peps.python.org/pep-0518/

[2]: https://github.com/trailofbits/pip-audit/blob/main/pyproject...


I believe that Poetry does conform to PEP 518 (i.e. it specifies `[build-system]` with `requires`/`build-backend`), but not to the `dependencies` part of PEP 621 [1]. There are plans for this in the future though [2]. Though I would defer to your expertise if I'm mistaken.

[1] https://peps.python.org/pep-0621/

[2] https://github.com/python-poetry/roadmap/issues/3
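
To make the difference concrete, Poetry currently reads its own table while PEP 621 puts the same information under `[project]`; roughly (pins arbitrary):

    # Poetry's own (pre-PEP 621) table
    [tool.poetry.dependencies]
    python = "^3.8"
    requests = "^2.28"

    # PEP 621 equivalent
    [project]
    requires-python = ">=3.8"
    dependencies = ["requests>=2.28,<3"]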


Yes, this was a mistake on my part! I meant PEP 621.


> By way of example, here's a project that's completely PEP 517 and PEP 518 compatible without needing a setup.py or setup.cfg[2]. Everything goes through pyproject.toml.

Using pyproject.toml with pip / flit still has many rough edges such as pip being unable to install deps locally for development or not generating lock files. Poetry is way more mature IMO.


Maybe I’m misunderstanding what you mean, but installing dependencies locally for development (meaning development extras) and generating lock files (via pip freeze) both work for me.
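
For example (assuming the project declares a `dev` extra):

    pip install -e ".[dev]"          # editable install plus development extras
    pip freeze > requirements.txt    # pin everything currently installed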


Your nitpick is forgiven! Thanks a lot for this information, I was not aware of this...

However, I took a look at PEP 518 and failed to understand what was wrong with Poetry's default configuration. Can you help me out?


It's actually PEP 621 (https://peps.python.org/pep-0621/) that the OP meant to refer to.

IIRC, there's work ongoing in Poetry to allow it to support PEP 621.


Yes, I meant PEP 621. Thanks for the correction!


If I can be forgiven another nitpick regarding readability:

The font size is extremely small to the point of being unreadable on a mobile phone.


I think the font size is perfect, but I take offense to the font used for headings


I've been looking for a replacement for that font for a long time. Do you have any suggestions?


You could just use the same font as your body copy - "Atkinson Hyperlegible" with `line-height: 0.8` and `font-weight: 700` - and remove the `font-size: .8rem;` from your `body` rule.

If you want to keep a similar style: https://fonts.google.com/?category=Handwriting

Also: https://www.pagecloud.com/blog/best-google-fonts-pairings


What a great article.

We start with learning that we absolutely need this Poetry thing because… it's what everyone else uses. It's refreshing to see an author who can skip the usual badly argued justifications and just plainly admit that he does not know shit and is just following the rest of the herd. Then we continue by "solving" dependencies the usual way: ignoring them and just freezing whatever happens to be present.

Then there is the inevitable firing up of virtualenv, because that's just what you have to do when dealing with messed up dependencies.

The next one is new to me. Apparently, one does not just set up git hooks nowadays but uses a separate tool with declarative config. Because if you ever happen upon something not covered by the Tool, that would mean you are no longer part of the herd.

Then we push our stuff straight to pypi, because of course our stuff can't possibly have any dependencies outside the python herd ecosystem. It's not like we knew our dependencies anyway.

Then comes the fun part, pulling in tox, because when you have a special tool to handle dependencies, what you really need is another tool with a different environment and dependency model.

The code quality section I will just skip over; seeing what passes for code quality these days makes me too sad. What follows is the setup of several proprietary projects that modern opensource seemingly can't exist without. What is more interesting is "tidying up" by moving code from the git root to a subdir. Now, this is of course a perfectly sensible thing to do, but I wonder why is it called 'src'? Maybe some herd member saw a compiled language somewhere and picked it up without understanding the difference between a compiled binary and source code?

Now don't take this as if I have a problem with the article content in itself. No, as a primer to modern python packaging it's great. It's not the author's fault that his work is so comprehensive it lays bare all the idiosyncrasies, herd mentality, cargo-culting and general laziness of the python ecosystem these days. Or is it?


> Poetry thing because…

pip freeze doesn't pin transitive dependencies and so you have to pick something and Poetry is fine and actively developed.

> virtualenv, because that's just what you have to do when dealing with messed up dependencies

No that's what you do when you have multiple dependency trees for different projects on your system. Somehow people got the message that global variables were bad but still think that "random bullshit strewn on my specific system" is a great way to make software that works on other people's machines.
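
Concretely, all it takes per project is something like:

    python -m venv .venv              # one environment per project
    source .venv/bin/activate         # on Windows: .venv\Scripts\activate
    pip install -r requirements.txt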

> Because if you ever happen upon something not covered by the Tool

You write your own hook because it's entirely plugin based.
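
e.g. a `repo: local` entry in .pre-commit-config.yaml can run any script you want (the id and script path here are made up):

    repos:
      - repo: local
        hooks:
          - id: my-custom-check
            name: my custom check
            entry: ./scripts/check.sh   # hypothetical script
            language: system
            files: \.py$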

> tox, because when you have special tool to handle dependencies, what you just need

A tool that doesn't pollute your development environment with testing packages and doesn't run your tests in your development environment, hygiene that before this tool basically nobody bothered to do because it was tedious.
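
A minimal tox.ini for that kind of isolation is tiny (Python versions and deps here are just an example):

    [tox]
    envlist = py39, py310

    [testenv]
    deps = pytest
    commands = pytest {posargs}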


> pip freeze doesn't pin transitive dependencies

AFAIK it lists all installed packages and hence pins all dependencies.


Yes, unfortunately all dependencies are frozen at the same level, so it becomes really hard to distinguish between what your actual dependencies, and sub-dependencies, and sub-sub-dependencies are.


Use pip install -c constraints.txt.
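
In other words, keep only your direct dependencies in requirements.txt and use the frozen output purely as constraints, roughly:

    pip freeze > constraints.txt                        # full pinned tree
    pip install -r requirements.txt -c constraints.txt  # direct deps, pinned transitively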


Jeez, I skimmed the article, and saw what I assume to be a comprehensive but basic primer on modern packaging, like you say. But I also inferred that the author is probably a newer programmer, with only a few years of experience. He's learning about tools and best practices in an accessible language and having fun sharing knowledge through his blog.

The sentiment behind your comments is shared, but I don't see the need to sarcastically rant about it and rail all the suggestions OP made.

If anything, I'm surprised someone with more experience didn't see the post for what it is, and attacking someone's post like this just shows immaturity when you could have easily taken those opinions and formed a constructive argument or given good advice.


I had the same reaction. Not much explaining, no justification for the tooling decisions, pushing more undocumented code to pypi because why not, "I saw this other package do this", etc.

I guess it's great if you're just looking for a shortcut to push something up to pypi, but my guess is someone new to it won't really understand what's going on other than some vague sense that they're following "best practices".

And then I imagine that same person will go on to write another article like this, and on and on we go!


Then go write your own opinionated article on your blog. For myself, I was already thinking about trying to develop a package and this gives me a nice starting point for doing so even if I don't use all the same tools to do it.


The irony here is that python packaging has sucked forever, and this is just another example of it. "Do more with less" has never entered the average python developer's mind.

you'd think herd mentality might help it but it only creates more packaging solutions.

Nowadays, I've stopped using python outside of tiny scripts and I will never touch it for a large project.


The problem with Python is it's too successful. It has too many useful libraries. I don't really like it very much but it's really hard to ignore it.

There is exactly one thing I miss from Python packaging tools: developer mode. I can factor out parts of an application into a library and develop both at the same time by installing the library in editable mode and pointing to the library's local directory. This is something I've always wanted but never had in any other language I know. Only god knows how much time I spent trying to do exactly this with git submodules.
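
Concretely, the workflow I mean (path is hypothetical):

    pip install -e ../mylibrary   # editable install pointing at the local checkout
    # edits under ../mylibrary are picked up by the app without reinstalling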


Julia has a dev mode as well

https://pkgdocs.julialang.org/v1/managing-packages/#developi...

Its packaging tool is top notch. Maybe because it’s developed as a core part of the language.


Good to know. Never tried Julia myself but I see a lot of Julia posts here. I'm also aware it has a really interesting foreign interface with Python, never saw anything like that before.

> Maybe because it’s developed as a core part of the language.

Always thought it was strange how libraries and packaging never seem to be considered part of the language. My favorite example is Scheme: a beautifully minimal language but with no library support and the result was endless fragmentation due to unportable implementations.


I'm not sure I understand this. Do you mean having an app instance using library code from local git instead of pip package? I do that all the time with a Makefile and some symlinks.


Yeah. Pip already has that feature built in. I've hacked up partial solutions for some other languages but they're not as seamless. For C I tried git submodules but it's not really designed for that.


I wish people would just forget about pre-commit, this thing is especially useless in a setting where a CI/CD pipeline exists. It's not that hard to write a simple Makefile or shellscript to run linters on push.

Pre-commit is one of the most annoying tools that have come into existence in recent years, and everyone seems to be cargo-culting it. It doesn't play well with editors, since in order to find the actual binary path you'd have to open up a sqlite database to fish out the virtualenv pre-commit created. Pre-commit also increases the maintenance burden, since its configuration is completely separate from your usual requirements-dev.txt/pyproject.toml/setup.cfg etc. If you have dev dependencies in one of these files because making your editor find the pre-commit-created binaries is hard, now you have to keep both versions in sync.

I really don't see the point of any pre-commit hooks unless you are the one guy that doesn't use a modern CI/CD platform.
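
To be concrete, the kind of script I mean - the same file runs locally and in the CI job (tool list is just an example):

    #!/bin/sh
    # lint.sh - run the same checks locally and on CI
    set -e
    black --check src tests
    flake8 src tests
    pytest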


Pre-commit is optional, you can just not install the hooks into .git/ ... Although I'd indeed prefer running just before push, or just having it display warnings

One thing that's really annoying these days is CI/CD that can't be replicated locally, which creates quite annoying delays in development. Jenkins seems particularly problematic in this regard: the steps get encoded in some cryptic pet Jenkins server, and then you have to wait minutes until an agent picks it up and reaches the step you actually care about. Other tools are a little quicker, but still...

So, I think at the very least pre-commit hooks help with this "over-reliance" on the CI/CD server. It's so much better DX when you can run parts of the pipeline instantaneously.


> One thing that's really annoying these days are CI/CD that can't be replicated locally

pre-commit just runs your linter, formatter and tests. Surely you can fully replicate this step locally. Just run make lint both locally and on CI.

Anything else that's hard to replicate has to do with the distributed systems that you are probably working on because these systems are all probably proprietary stuff that live on the cloud.

> Although I'd indeed prefer just before push, or just make display warnings

Wholeheartedly agree. This is a compromise I can live with.


My point is that using these tools at least makes people put these scripts within the repo instead of just in the CI server.


There's no "instead of" to speak of. Either you run the same linting step both locally and on CI, or just on CI, you can't skip linting on CI. Every properly set up CI pipeline has some combination of Makefile/shell script/Dockerfile that runs exactly the same way locally and on CI, which the script checked into the repo itself. If your CI scripts don't exist in your repo, you are doing it wrong.


Our CI pipelines invoke pre-commit. That way it's trivial to run the exact same tools locally as would be run in CI.

Running the tools locally is basically about tightening the development loop. Many of the commonly used tools (e.g. black, isort etc.) actually make the changes to the files so you'll never even commit failing versions. Do you really want to push changes to some remote CI system only to be told it's failed some boring QA check? There's nothing at all stopping you from doing that. Pre-commit is completely optional for each developer. I would just recommend it for sanity reasons.


> Do you really want to push changes to some remote CI system only to be told it's failed some boring QA check?

Yes, that's what CI/CD is for.

> Pre-commit is completely optional for each developer.

No it is not, it's installed at your git pre-commit hook, and it gets run every fucking time I commit, even if I've set up my editor correctly to auto-format and lint everything as I develop, meaning 99.9999% of the time the code I commit will pass all these linter checks. I can use git commit --no-verify to bypass pre-commit, but then again, what's the point of using pre-commit in the first place if you need to bypass it? There is absolutely no point in linting twice locally and thrice in total just to hit the first stage of your deployment.


  rm ./.git/hooks/pre-commit
Or just do not install the hook in the first place. Choice is still the developer's unless your team is doing something else here that limits your ability to delete/rename files locally.


That does not alleviate the difficulty of having your editor use the correct versions and settings for the linters and formatters. This approach also does not address the issue of having 2 separate sets of configuration when you are trying to address problem number 1 above.


> Yes, that's what CI/CD is for.

Not really. CI/CD is a methodology where a team immediately integrates new changes into a trunk/release. The CI/CD pipeline is there to give the team the confidence to merge your stuff. When I see people constantly pushing breaking code into a CI pipeline I see an incredible amount of wasted time and shared computing resource. Especially if it's some trivial formatting check that you should have already done locally. You do what works for you, but tools like pre-commit were invented to save time and effort and they work well.

> I've set up my editor correctly to auto-format and lint everything as I develop, meaning 99.9999% of the time the code I commit will pass all these linter checks.

Then what is the problem? Have your team installed hooks that take a long time to run? Even on larger codebases pre-commit adds a negligible amount of time to each commit, unless perhaps you've touched every file in the codebase or something. Honestly your gripe with pre-commit seems mostly irrational.


> The CI/CD pipeline is there to give the team the confidence to merge your stuff.

That's why linting is a part of every CI stage. Linters check your code for bugs.

> Have your team installed hooks that take a long time to run?

Yes, they are called tests.


Wait, you're not supposed to run tests with pre-commit. I can see why that would be frustrating. If that's the case, your team is doing it wrong.


Perhaps, but it's not just one team, and if you are doing code quality checks on CI anyway, I don't see the point of pre-commit (for most people).


How is it mandatory? You can just not install the hooks


1. Because it defeats the purpose of pre-commit.

2. pre-commit install installs the hook by default.

3. If you happen to fail linting a few times a year, there's always an anal coworker telling your boss in a 1-on-1 you are not following some BS ways of working.


> 1. Because it defeats the purpose of pre-commit.

No, it doesn't. Pre-commits are just a DX bonus so you won't have to wait for the CI/CD server.

> 2. pre-commit install installs the hook by default.

Am I missing something? If you run "$ pre-commit install" of course it will install the hooks. Do you mean that poetry install or setup.py is doing the magic of installing the hooks itself? If so, that's indeed a bad practice.

> 3. If you happen to fail linting a few times a year, there's always an anal coworker telling your boss in a 1-on-1 you are not following some BS ways of working.

I don't get why anyone would care, unless you're polluting the CI/CD history with tons of tiny commits all day long just to run linting.


Filling the CI/CD pipeline with tiny commits to fix linting, when it should be done before commit or push, is why pre-commit exists. It's not BS; clogging the job queue shits on everyone else.

Of course I don't know the GP's specific context, so they may have other details, but generally it's


Except this problem rarely comes up. Everyone has their favorite way to set up their editors to lint and format automatically using the project's config.

Over my career, I see that every company has some version of:

1. New guy joins a team, wants to push code ASAP

2. He sets up his favorite editor and punts on configuring linters and formatters

3. Code pushed, CI fails at linting

4. Some well-meaning coworker or new guy suggests some variations of husky/pre-commit/fancy git hook scripts

5. Team agrees and put that into the repo

6. 6 months to 1 year later the entire team realizes its benefit does not outweigh the cost, and unanimously agrees to rip it out unceremoniously.


Well I have different experiences where no one would ever go back to living without pre-commit, black, autoflake, etc. It moves the fixes to before they get to CI. There's literally no downside.

Maybe you have experience with some weird tool? Or did running black and autoflake mess around with emacs so it needs to reload all the buffers?


I initially had some problems, then I wrote an emacs package[1] to fix all of them. And it was in the course of writing this package that I realized just how bad pre-commit is, from its UX and design to its entire premise.

[1]: https://github.com/wyuenho/emacs-python-exec-find


Ok then I think the issue is not so much pre-commit but that python tooling in general should not be written in python. It's a minority opinion but the fact that there is a tooling python and an application python can mess with things as you're experiencing.


Yeah, I agree. It's just that they can run pre-commit manually if the hooks bother them that much


Tell me you use Emacs without telling me you use Emacs.


Curious, how does using Emacs affect the workflow that much, to the point of pre-commit being so annoying?


Not sure why, though? I've used emacs for 15 years and pre-commit for probably 5 years at this point.


Are you using git inside Emacs or from the command line? It seems GP is doing everything in Emacs and application python resolution is tripping up with tooling python.

If you have it all sorted out from Emacs maybe check the repo they posted which tries to do all this.


Guilty


Does it matter if it’s run twice? Usually it’s so fast you hardly even notice it.

If your pre-commit takes more than 3 seconds to run it’s set up incorrectly IMO, and should belong to a more manually (and CI of course) invoked test suite instead.


Pre-commit running tools in its own virtual environment is a feature, not a bug, in my book -- it means that the dependencies for my linter tools aren't mixed in with the dependencies for the code I'm writing.

And, keeping things separate from setup.cfg or pyproject.toml is optional: the tools still look for configuration in their usual places, so it's still possible to have your black options in pyproject.toml and just a bare-bones entry to call black in your .pre-commit file if you prefer.
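
e.g. the real options can live in pyproject.toml while the hook entry stays minimal (the rev is a placeholder):

    # pyproject.toml
    [tool.black]
    line-length = 100

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/psf/black
        rev: 22.6.0   # placeholder, pin whatever version you use
        hooks:
          - id: black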


Except your usual configuration doesn't necessarily work for your pre-commit hooks. A prime example is mypy.


Sure it can. The pre-commit tool is just a framework for running multiple hooks at commit time against just the files which are modified in that commit. You can configure those hooks however you want.

You also don't have to run the hooks at pre-commit time. Just don't hook pre-commit into your checkout. The pre-commit tool can also be configured to run its checks at different stages than, well, the pre-commit stage:

https://pre-commit.com/#top_level-default_stages
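
For example, a single top-level line in .pre-commit-config.yaml moves everything to push time (the stage is spelled `push` on the versions I've used; newer releases may call it `pre-push`):

    default_stages: [push]   # run hooks on git push instead of on every commit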


Do you have any tutorials for setting up CI/CD? My impression was that's all stuff that runs in the cloud but if it's something I can use on my own personal projects I'd play with it. Frankly a lot of these things become unintelligible. I've used pre-commit for things like black and autopep8 and that's all pretty understandable to me. The CI/CD things I've read all seem like everyone already understands some giant Rube Goldberg contraption that they're strapping on things for some reason that probably matters to giant dev teams.


Good example of CI/CD that's not a rube goldberg machine per se:

https://github.blog/2022-02-02-build-ci-cd-pipeline-github-a...


Thank you. So does CI/CD require becoming dependent on a cloud service such as github?


Not necessarily, it just depends on how invested you are in the CI/CD pipeline for any given project, your preferences regarding self-hosting vs. cloud, and the amount of time you have to dedicate to the subject.

Strictly speaking, any tool or set of tools that allow you to trigger building & deploying/publishing artifacts in response to source control commits can be used to build a CI/CD pipeline. One could write bash scripts linked to a cron job that pulls a remote repository every n minutes and then performs some scripted actions to integrate changes between branches before building & publishing the artifact to a local SFTP server.

If you prefer a more mature solution with better documentation however, there is a (non-exhaustive) list of CI/CD tools on this awesome-devops list:

https://github.com/wmariuss/awesome-devops#continuous-integr...

---

edit: I saw gitlab on the list and realized it is probably the closest self-hosted equivalent to the github option I mentioned previously

https://about.gitlab.com/features/continuous-integration/


It's convenient to run the lint step faster/sooner than at CI/CD time. Depending on your setup, having the separate linter deps handled by pre-commit can be more of a convenience than a hassle, both locally and in your CI/CD pipeline (re the Makefile/script you mention).

Having done it both ways several times I lean pre-commit for now


You've already run them at edit time.


Pre-commit can also be configured to use the project's dependencies, which is what I do with my repos when there is overlap. You don't have to use its built-in integrations. Indeed, I find its integrations most useful for tools that aren't specific to my project: things like the white-space checks, yaml checks, etc.

You can also set things up the other way around, having the CI/CD system install and run pre-commit's checks as a build step. Pre-commit provides a nice framework, I find, for running these checks.

The advantage of pre-commit is that it catches mistakes before they are committed. Most CI/CD systems are set up to only validate the tip of the branch (i.e. the last commit in the PR), not all the commits along the way. Yes, you can configure a CI/CD system to test each commit, but it's usually swimming upstream to do so. And yes, you can squash all of a PR's commits into a single commit, but there are good reasons NOT to do that. So assuming you have multiple commits in a change, it's nice to know they have likely all been validated in a project using pre-commit.

I'll make an appeal to authority here:

I've been a professional developer for decades. I've worked with a variety of VCS's and build systems. I've written plenty of Makefiles. Nonetheless, I still find pre-commit useful. I use it even in combination with a Makefile sometimes. You'd be horrified, I guess, to know some of my Makefiles have a rule which runs `pre-commit run --all`.

As an example, earlier this week I set up a new repo which builds and packages an AWS Lambda written in javascript, and deploys it using the AWS "sam" CLI via a CI/CD system. So the deployed code is Javascript. The repo contains shell scripts to assist with deployment. And there are multiple yaml files: one to configure the CI/CD system, plus the CloudFormation template files.

Here's what I configured pre-commit to do:

1. Check for whitespace nits.

2. Check for syntax errors in the yaml files.

3. Validate the CloudFormation templates.

https://aws.amazon.com/blogs/infrastructure-and-automation/u...

4. Run shellcheck against the shell scripts.

5. Run eslint and prettifier against the javascript.

6. Run "npm test" as a local step.

Normally I'd leave (6) out because in most projects it's too time-consuming, but in this project the tests run quickly enough that I just made it a pre-commit check. The "npm test" step runs jest, which is installed as a dev dependency in the project's package.json. The other tools are all installed by pre-commit itself.
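
A trimmed-down sketch of that config (revs and repo URLs from memory, so treat them as placeholders):

    repos:
      - repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v4.3.0   # placeholder
        hooks:
          - id: trailing-whitespace
          - id: end-of-file-fixer
          - id: check-yaml
      - repo: https://github.com/shellcheck-py/shellcheck-py
        rev: v0.8.0.4   # placeholder
        hooks:
          - id: shellcheck
      - repo: local
        hooks:
          - id: npm-test
            name: npm test
            entry: npm test
            language: system
            pass_filenames: false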

> I really don't see the point of any pre-commit hooks unless you are the one guy that doesn't use a modern CI/CD platform.

I've tried to make an argument for why to use pre-commit above. Nevertheless, you don't have to install pre-commit's hooks in your checkout. They are ideally there to save you the time of having to correct mistakes after the fact. It sounds like there's an impedance mismatch between your personal workflow and how you've seen pre-commit set up. Perhaps by resolving that mismatch by adjusting the pre-commit configuration, you can enjoy pre-commit's benefits without experiencing the issues you've run into.


As another dev with decades of experience, I've got to ask, why would you "validate" every commit in a PR lol. The reason every CI pipeline is set up to only validate the tip is because your combined outgoing changes about to be merged are all that matters. Nobody cares if you have an experimental branch full of commits failing lint and tests as long as the PR doesn't.

However, in my experience, I've seen plenty of pipelines setup to run lint on every push on a PR branch, which is effectively only checking outgoing changes before merge, it's just in this case it's merging to your feature branch. My point still stands - as long as linting is done on CI, and you've set up your editor to lint as you edit, you don't need pre-commit.

I'm not entirely sure what point you are trying to make. It sounds like all you've done is moved all of the tools you'd call anyway from a Makefile to a YAML file.


Here's an example from one of my repos that mostly uses the repo's own installed dependencies so that I don't need to manage those in more than one place:

https://pastebin.com/raw/0SP3NvBS


PyPI is adding support for GitHub OIDC for publishing packages soon, so there will be no need to generate API keys - you can just grant your GitHub Actions permission to publish to PyPI.

https://github.com/pypi/warehouse/issues/10619
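
Until that lands, the usual setup is an API token stored as a repo secret plus a publish step roughly like this (workflow fragment; the secret name is just a convention):

    - name: Publish to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        password: ${{ secrets.PYPI_API_TOKEN }}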


Hey, that’s my issue :-)

Thank you for linking it! Yes, this will be a huge convenience and security win for the large number of packages that use GitHub to release new versions.


Surely you mean OAuth2? I really hope you mean OAuth2.


Oh, this would be neat! Looking forward to it!


Lot of good info and saved away!

However, it drinks the code coverage Kool-Aid that started like 30 years ago when code coverage tools emerged.

Management types said "high test code coverage == high quality"; let's bean-count that!!

A great way to achieve high code coverage is to have less than robust code that does not check for crazy error cases that are really hard to reproduce in test cases.

Code coverage is a tool to help engineers write good tests. One takes the time to look at the results and improve the tests. It is a poor investment to be obsessed with code coverage on paths where the cost to test them greatly exceeds the value.

10% coverage and 100% are both alarm bells. Don't assume naive, easy to produce metrics are the same as quality code.

Otherwise, an excellent article.


Python is the language with one of the highest 100%-coverage-to-effort ratios. The included unittest.mock framework makes it quite easy to trigger obscure errors and ensure they are handled properly.

Combined with thoughtful use of `# pragma: no cover`, a 98% code coverage nowadays is an immediate warning that something was rushed. With this and type checking, I find RuntimeErrors much easier to avoid these days.

And typing, not even a mention?! :) But otherwise a great article, thank you!
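
For instance (re the unittest.mock point), a self-contained toy test that covers an error branch which would be annoying to trigger for real:

    from unittest import mock
    import urllib.request

    def fetch(url):
        # toy function: return None when the network call fails
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except OSError:
            return None

    def test_fetch_handles_network_error():
        # mock.patch makes the "obscure" error trivial to trigger
        with mock.patch("urllib.request.urlopen", side_effect=OSError("boom")):
            assert fetch("https://example.com") is None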


The problem with using mocks extensively for testing is that you then end up mostly testing the mocks. You'll know you've hit that point when you have 100% coverage, but things still break routinely due to interaction (sometimes very indirect) between components.


From the point of view of a programmer setting up a new project, type checking tools are yet another optional linting operation that can be integrated in pre-commit hooks or other automation.


100% coverage as a side effect of careful testing isn't a red flag.

Coverage is a decent (among other things) measure unless it becomes a target. Once it becomes a target you get shitty rushed tests that act mostly as cement surrounding current behavior - bugs and all.


I guess you don't have much experience with python and how easy it is to get 100% coverage in it.


He's not saying it's difficult to get 100% coverage. He's saying that 100% is a bit suss because you probably wouldn't achieve 100% coverage unless that is your goal (even if Python does make it easy) and that's the wrong goal.


It's a real goal for dynamic languages. You have no other option to be sure your code is not broken. Another option is to use 100%-non-any type hints. That's way harder.

I can see it being a non-goal when you have access to deployed code and Sentry. But as a library author, or the author of customer apps, there is no other way around it.


Thanks for the words of caution! For such a small package, I think 100% code coverage isn't necessarily a bad thing yet :P But you raise valid points!


"How do you create a Python package? How do you set up automated testing and code coverage? How do you publish the package? That's what this article teaches you." — delivered as promised!


Even more: "created a Python package; set up Poetry for dependency management; set up a GitHub repository to host the code; defined some pre-commit hooks to make sure we only make commits that meet certain criteria; added a (fairly permissive) license to the project; configured Poetry to allow uploading to PyPI and a test version of PyPI; tested uploading the package to a test version of PyPI; added Scriv to help us with changelog management and generation; tagged and published a release of our project; wrote a bunch of tests; automated testing and linting with tox; checked code coverage and got it to 100%; set up CI/CD with GitHub actions to run linting and testing; integrated with Codecov to get coverage reports in our pull requests; created a GitHub action to publish our package to PyPI automatically; and added some nifty badges to the README file."


Original author here: don't make promises you won't keep, right? :P


We've been fine-ish with classic setup.py/setup.cfg + gha for publishing to pypi. But as we do OSS data science (gpu graph ai + viz), where conda is typical nowadays...

... Have to admit: We recently ended up contracting conda packaging out because it was nowhere near clear enough to make sense for our core team to untangle. Would love to see a similar tutorial on a github flow for packaging & publishing to conda. Still not convinced we're doing it right for subtleties like optional dependencies: the equivalent of `pip install graphistry` vs `pip install graphistry[ai]` vs `graphistry[umap-learn]`, etc.
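
For the pip side at least, the extras are just declared in setup.cfg; a rough sketch (names and contents illustrative, not our real ones):

    [options.extras_require]
    ai =
        umap-learn
        torch
    umap-learn =
        umap-learn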


We do a lot of deep learning and image processing and pip works much better for us. PyTorch makes wheels that contain all required DLLs on all systems. Maybe conda isn’t needed anymore.


I think of conda as jupyter notebooks. It's ok to experiment with and get something running quickly, but it's a bad idea to use for production code.

Personally I never use either of these tools.


Conda lets you install a specific cuda version directly in a virtual environment with one click though. It's really useful when you have to switch between multiple PyTorch versions and convenient in general imo.


The PyTorch wheel comes with its own CUDA DLLs, they can’t be shared with other libraries but they won’t be your problem to install that way.


I found the conda-forge community quite helpful here [1]. They make feedstock repositories based on templates that cover a lot of automation. Their bots pick up updated packages in pypi and automatically file merge requests, run tests and even merge updates if tests pass successfully. Basically you only need to maintain your recipe here and there when your dependencies change.

1. https://conda-forge.org/docs/user/introduction.html


Yep, that was one of the better docs!

The contractor still took a couple weeks to figure it out and get it up. I assumed it'd be one evening for the initial bulk as we already had setup.cfg etc, but after searching conda tutorials.. not surprised. Our ~final meta file is pretty simple, so no idea why the docs are so indirect.


Oh, I see... I personally barely use conda and I have no idea how that is done. I don't think I'll write any blog article like that any time soon :( Maybe you could do it!


Hey, original author here. Thanks a lot for sharing this!

Also, can't believe everyone let me get away with not writing about documentation! I'll see to it that it gets done and added to the article.


This is really nice. The only thing I'm missing here is a simple way to bump versions. Any ideas on how to do that?

For Node, it's quite simple and even built into npm. Also, the version lives only in the package.json file. For Python you probably have your version somewhere in __init__.py, and I always end up writing ugly bash scripts that modify multiple places with sed.


I never really understood this. Sure, changing the version in one place would be better and I would do that if it was possible with the current Python tools. But I have seen insane setups to achieve this, parsing __init__.py with regex from setup.py, using third-party dependencies that do only that, sometimes scripts with hundreds of lines included in the distribution to support it.

Is changing the number in 2 places really that big of a deal?

You should have a release checklist anyway, with steps like sending an announcement email or tweet etc. How much time does this really save, at the cost of so much complexity?


Your program needs to be able to output its version (e.g. with a --version CLI option), and with setup.py I could at least parse __init__.py. The pyproject.toml file doesn't do that, so suddenly I have to maintain two version numbers.

I maintain half a dozen small Python packages. I don't do emails, tweets, etc. I just want to create releases easily when there's a bug fix. It not only saves time to automate this step (I can use the same script to release each package), it also means you can't forget things. Before I had a script, I always forgot to push the tag, or run the changelog, etc.
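
One thing that helps me keep the version in a single place is reading the installed metadata at runtime instead of hard-coding a second copy (package name is a placeholder):

    # mypackage/__init__.py
    from importlib.metadata import PackageNotFoundError, version

    try:
        __version__ = version("mypackage")   # read from installed metadata
    except PackageNotFoundError:             # e.g. running from a source checkout
        __version__ = "0.0.0.dev0"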


Try commitizen.


I know commitizen and it doesn't do that. It enforces particular forms of commit messages, but it does not bump versions in Python projects.


A very nice post. Consider adding a prominent RSS/atom feed for your blog. My lack of finding one means I won't easily catch any future posts.


The title should be: How to create a "Python DISTRIBUTION package".

The term "python package" means something entirely different (or at the very least is ambiguous in a pypi/distribution context).

To add to the confusion, creating a totally normal, runnable python package in a manner that makes it completely self-contained such that it can be "distributed" in a standalone manner, while still being a totally normal boring python package, is also totally possible (if not preferred, in my view).

(shameless plug: https://github.com/tpapastylianou/self-contained-runnable-py... )


I guess it's a great exercise to set up a repository yourself in this way, but once you have experience with the technologies involved, it's much easier to just use a cookiecutter template [1] to set up your package. Another aspect to consider is that there are often different tools to achieve the same goal, thus, it makes sense to experiment until you've found your perfect package setup.

[1] https://github.com/search?q=python+package+cookiecutter


Excellent points! I have seen several cookiecutter templates, but like you said, those aren't very useful when you are at the very start and everything looks weird and new.


Sigh I don't see why I need to use a 3rd party tool for what should be a very straightforward process in Python out of the box. In fact, I think these days it actually is straightforward, of course once you work out what you need to do...

Python is a mess.


Just fyi, both npm and yarn are “third party” tools. Cargo is closer to the Rust core, and you can find tools in a similar position in Python as well (e.g. setuptools, hatch). Packaging tools being managed by a different group than people working on the “core language” is actually the norm, since those are very different topics and only very few brilliant people care about both of them at the same time. Python people tend to not hide some of these implementation details (probably self-selected by their choice to use Python in the first place), and it’s OK to not like it, but hopefully this helps clarify where your hate comes from so you don’t get burnt by wrong expectations elsewhere down the road.


Instead of being insecure about criticisms to the Python ecosystem and calling my disappointment "hatred" I'd rather we focused on solutions.

Just because this mess happens in some other languages doesn't mean it's the right thing. Having a very fragmented community is not a good thing for a beginner. Also, npm is far more of a de facto choice than poetry, which is still better than the state python finds itself in.


It doesn't even happen in the example of Node. Npm is even bundled together with NodeJS itself, and yarn is fully compatible with package.json.


This is almost exactly how I set up python projects; it’s reassuring to see it set out in one place.

I started using tox-poetry-installer[1] to make tox pick up pinned versions from the lock file and reuse the private package index credentials from poetry.

[1] https://github.com/enpaul/tox-poetry-installer


Oh, this looks like an interesting tool! Thanks for the link.


This excellent article references Textualize, which - as I have just found from their website - has a really great approach to job interviews: https://www.textualize.io/jobs

[I have no ties to this company and have never applied there.]


It's impressive that this involves 3 human-readable configuration languages and 2 markup languages


You can use pyproject.toml for the tox configuration too:

    [tool.tox]
    legacy_tox_ini = """<tox.ini content here>"""


Yeah, but that looks a bit odd and you lose the syntax highlighting... I considered that but ended up going with a tox.ini file.


You can take this a step further and completely automate the release of your package. That means the tagging, publishing and the GitHub release notes.

I don't have a blog post but you can see the process on my personal project https://github.com/DontShaveTheYak/cf2tf

Check out the merged PRs and the GitHub actions.

I even do alpha releases to test pypi.


To that end, I've had good success with `release-please`: https://github.com/googleapis/release-please . It's available as a GitHub Action and works out of the box very easily. It does tagging, publishing, editing the CHANGELOG, creating a release and more. Whatever you want it to, really, using a bool flag in the CI pipeline that triggers after a release-please release was made, aka merged into main.


Poetry uses non-standard dependency specification formats. PDM is like Poetry but faster/more standards compliant.

https://pdm.fming.dev/


The good thing about finally converging on some sort of standard is that tools become more interoperable:

Another good tool (which was endorsed by the PyPA) is Hatch - https://hatch.pypa.io/latest/environment/

I currently use PDM because it supports conda virtual environments for isolation, but am keeping an eye on Hatch.


Didn't they also endorse PipEnv and that turned into a flop?


Fantastic article. A clickable TOC at the top would be a great addition.


Excellent and very informative post, thank you very much!


Thanks for the nice words. Was there anything that was unclear or that you think was missing?


Fantastic article. Thank you so much for putting this together. Any thoughts of putting this in a Git repo somewhere so that it can be a living reference contributed to by others? Otherwise, I worry that in 6 months this will start to get confusing, and in 18 months it will be dangerously out of date (though I haven’t used poetry yet, so maybe they aim for more stability).


Hm, it's a very strange argument regarding the src directory. The article they refer to says the main issue with the package in the root is that it's possible to miss submodules in the final released py-package. But src doesn't prevent this. You can also run your tests. But without a proper packages option or find_packages in setup.py, one gets a broken package.


This is a fantastic resource.

Are there any similar resources for setting up internal packages that you don't intend to publish publicly?

I can think of a number of situations where I would have benefited from it, but the process of configuring a package to publish, hosting it and then pulling it when necessary is a mystery to me.


You can mostly just put wheels and source dists on any HTTP server with directory listing enabled and have it work (with --extra-index-url). Use a subdirectory per package.

https://packaging.python.org/en/latest/guides/hosting-your-o...
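
Consumers then just point pip at it (URL is made up):

    pip install --extra-index-url https://pypi.internal.example.com/simple/ mypackage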


My only gripe with poetry is that when I tried it out last year there was no equivalent for `pip install -e`, which can be used to install other python package dependencies that you are simultaneously developing. I found that feature useful enough to stick with setuptools and setup.py instead.


Poetry has path dependencies and the develop flag to enable the same editable functionality

https://python-poetry.org/docs/dependency-specification/#pat...
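
i.e. in pyproject.toml something like this (path hypothetical):

    [tool.poetry.dependencies]
    mylib = { path = "../mylib", develop = true }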


Thanks, I think that the thing this still lacks compared to `pip install -e` is that the develop dependency is then only visible inside other packages configured with poetry. So it is not useful for example if you want to locally install a package you make to be accessed through cli or any other python script not using poetry.


I'm not sure what you are talking about. It will be installed in the virtual environment just the same. There is no mechanism for it to "only be visible inside other packages configured with poetry".


What I mean is `pip install -e` installs it outside of the virtual environment also.


No, it doesn't.


There is a standardized API for package tools to expose and invoke editable-install functionality [1] which pip install follows, but Poetry didn't implement this API before (I see it does now though as of this past February [2]).

In other words, this a more thorough explanation of my point from [3]:

> Yes, if you ran poetry install you could get editable mode, but that requires every single end user of your package to install poetry and explicitly invoke a poetry install just for your package. And if someone else's package uses a different package builder, now they need to install and invoke that one. And on and on, and you end up with a six-hundred-line install script because of all the one-off "must install this developer's favorite package manager to use their package" stuff.

[1] https://peps.python.org/pep-0660/

[2] https://github.com/python-poetry/poetry/issues/34#issuecomme...

[3] https://www.reddit.com/r/Python/comments/t3p3ub/comment/hyum...


I recently set up a new Python package and ran across the github instructions for automatically uploading the package to pypi. A month later I ran into a bug and needed to push out an update, man was that nice to have automated, and easy to set up.


Very well written, thanks!!


Thanks a lot! I'm the original author and I appreciate your nice words.


Cool - there's a lot of overlap with "Hypermodern Python"[1], so if anyone enjoyed this and wants to read more, they might also enjoy that.

[1]: https://cjolowicz.github.io/posts/hypermodern-python-01-setu...


I really dislike the Python convention of plonking source code effectively in the root directory. It means literally any random thing in there can get picked up if you put your package in the PYTHONPATH. Is there a reason the Python world did not standardise on putting source code in a "src" directory like every other language?


Python puts source code in a directory with the package's name. That's why the source code is really under `repo_root/package/`, not under `repo_root/` as you seem to imply. It's just that it's not called `src/`, but `package/` in Python (but that's configurable!).

So if you have a project called "splitter", your source code really lives under `splitter/splitter/`. I would agree that seems a bit redundant and `splitter/src` looks better, but the source code is not in the project root.
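
Side by side, the two layouts being discussed for the hypothetical "splitter" project:

    # flat layout
    splitter/
        pyproject.toml
        splitter/
            __init__.py

    # src layout
    splitter/
        pyproject.toml
        src/
            splitter/
                __init__.py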


I don't have a single src directory in any of the Elixir, JavaScript, Ruby projects I'm currently working on. Python too, obviously.

I got src directories in the Elixir dependencies written in Erlang, in about 12% of the node_modules used by a React project and in the few C extensions I'm using for Ruby.

May I conclude that src is uncommon at least in scripting languages? (Elixir is compiled.) Maybe the reason is that there is only source code and there is no need for a separate directory for a build / dist.


Yeah, I adopted the “src” dir layout after reading this post some time back:

https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-str...

It describes some of the outdated motivations for the other layouts commonly seen with python, as well as the many benefits of the “src” layout.


The post doesn't make sense. It clearly states that you can only be sure the package is working by actually installing it into a clean virtualenv and testing it there. `src` or any other layout doesn't matter.


The difference is if you're running tests from your root project directory, the package is importable regardless of whether or not it is installed, as python picks up packages in the current working directory by name. src/ prevents this.


But it doesn't prevent package configuration errors and still lets you bdist a broken package. What's the point then?


Related question: how to correctly package a graphical application, preferably for Linux?


Look at debs/rpms, or flatpak etc for a cross-distro equivalent.



Excellent blog!


Thanks a lot! I'm the original author and I appreciate your nice words.




