>Avoid using Poetry for new projects. Poetry predates many standards for Python tooling. This means that it uses non-standard implementations of key features, such as the dependency resolver and configuration formats in pyproject.toml files.
Was going to comment the same thing. Would love to hear the author expand further on why not use Poetry. I've found it to be pretty solid and continue to use Poetry + Pyenv for all my projects, but open to hearing the case for PDM or Hatch.
I've never worked on a team that uses Poetry, but at my current company another team uses it, and I haven't found it as slick as I would have imagined, primarily because you need to create a venv and install Poetry into it before you even get started; at that point, why not just pip install the rest anyway? For standalone applications it just seems like an unnecessary extra step. It doesn't even mandate a build lifecycle like Maven, so what are you getting?
But that's not what soured me on Poetry. What soured me was that recently I needed to create a release of one of their libraries with a Git commit in the local version identifier and... Poetry doesn't do that. There's an issue that was open on GitHub for years before they finally agreed to implement it, and since February the change has been merged to master, but despite several point releases since then, it has not landed in any of them. When will we be able to get a local version part? Who knows!
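For anyone unfamiliar, the "local version identifier" in question is the PEP 440 plus-suffix. A quick illustration with the packaging library (the version string itself is made up):

    # Illustration only: what a PEP 440 local version identifier looks like.
    from packaging.version import Version

    v = Version("1.4.2+g1a2b3c4")  # hypothetical release tagged with a git commit
    print(v.public)  # -> 1.4.2
    print(v.local)   # -> g1a2b3c4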
This experience has really made me skeptical of Poetry being the One True Packaging Tool that fixes everything. As usual, it just fixes the things the devs want fixed, and everything else is still janky or half-implemented. From my perspective, if you're gonna deal with jank anyway, you might as well just deal with the standard jank that comes as part of Python itself.
Actually, Poetry comes with an optional self-installer, though I prefer to manage it with pipx. And it's recommended not to install it into your project's env, as there's the potential for conflicting dependency versions.
One thing I saw which was weird as hell: Poetry has serious issues with changing requirements on Windows (specifically removing them) and has a tendency to throw all sorts of fun errors (its method of removing files is not filesystem-friendly).
As far as I can tell the issue is still open and has been for a while.
After struggling with the complexity of pyenv and the slowness of Poetry, I'm really happy with Rye.
It manages both Python versions (which it downloads instead of compiling) and package versions. It's written in Rust, so it's faster, and it can replace pipx as well for installing global tools. (Some people will recommend uv, which Rye is slowly merging with, but uv is still missing some Rye features; at some point in the future you might want to switch.)
Aside from the obvious issue that it makes no distinction between direct and transitive dependencies, it completely breaks cross-platform support if any of your dependencies have platform-specific sub-dependencies (which is not uncommon in Python).
pip freeze was an overly simplified hyperbolic jest...
Fair point. So when this packaging mess happens, can one not strike a balance: define the dependencies that you directly need, and let the resolver handle the rest?
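That's the usual compromise: declare only the direct dependencies, loosely pinned, and let the installer's resolver work out the transitive graph for each platform. A minimal pyproject.toml sketch (the project name and pins are hypothetical):

    [project]
    name = "myapp"          # hypothetical project
    version = "0.1.0"
    dependencies = [
        "requests>=2.28",   # direct dependencies only, loosely pinned;
        "pandas>=2.0",      # the resolver picks compatible transitive versions
    ]

When you need reproducible installs on top of that, a lock file (or pip-compile-style pinning) captures the resolved versions without polluting the declared dependencies.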
Agree, I'm a poetry person too. I feel like TFA might be getting ahead of itself on this point. Just because something was ratified as a "standard" doesn't immediately make everything prior irrelevant.
Agreed. Most of these are fairly uncontroversial. A few will probably be new to some, particularly because users of Python are heterogeneous and people find themselves with years of experience on projects that never upgraded their language version, e.g. from 3.6, and don't know what they're missing.
A final few, even with the explanations given, feel like preference, such as the Poetry opinion. I suppose if I wrote a similar article, it would end up with a similar mix of obvious defaults and a few of my preferences.
It could also be because of different use cases? For example, writing a library for packaging, vs a deployed application, vs a personal project's workbench. Or perhaps if you are collaborating with people who use Windows without WSL. I have heard tell that Poetry can sometimes trip over its own shoelaces on Windows. I have never experienced it myself, and don't personally care to support any projects involving Windows, but if I did I might have different preferences.
Yeah, to me Poetry was a game changer in dependency tooling. Not that better tooling can't come along, but it has recently worked great on multiple production-grade projects.
No complaints, and it has safety features that keep you from shooting yourself in the foot.
I agree. The main problems aren't syntax, they are architectural: catching and retrying individual failures in a pool.map, anticipating OOM with heavy tasks, understanding the process lifecycle and the underlying pickle/IPC machinery.
All these are much more reliably solved with horizontal scaling.
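For the single-machine case, here is a hedged sketch of the per-task catch-and-retry plumbing that first point implies (the worker and retry policy are placeholders; items are assumed hashable and the worker picklable):

    # Sketch: per-task error handling and retry around a process pool,
    # since a bare pool.map aborts the whole batch on the first exception.
    from concurrent.futures import ProcessPoolExecutor

    def run_with_retries(worker, items, max_retries=2):
        results, failed = {}, {}
        remaining = {item: 0 for item in items}  # item -> attempts so far
        with ProcessPoolExecutor() as pool:
            while remaining:
                futures = {pool.submit(worker, item): item for item in remaining}
                next_round = {}
                for future, item in futures.items():
                    try:
                        results[item] = future.result()
                    except Exception as exc:
                        # Ordinary task failure. Note: a worker killed by the
                        # OOM killer instead raises BrokenProcessPool and
                        # poisons the whole pool, which is exactly the kind of
                        # lifecycle issue mentioned above.
                        attempts = remaining[item] + 1
                        if attempts <= max_retries:
                            next_round[item] = attempts
                        else:
                            failed[item] = exc
                remaining = next_round
        return results, failed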
[edit] by the way, a very useful minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar https://tqdm.github.io/docs/contrib.concurrent/
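A minimal sketch of that process_map usage (the work function is made up):

    # tqdm.contrib.concurrent.process_map: a multiprocessing map with a progress bar.
    from tqdm.contrib.concurrent import process_map

    def slow_square(x):  # placeholder for a real CPU-bound task
        return x * x

    if __name__ == "__main__":
        # max_workers and chunksize are passed through to the underlying pool
        results = process_map(slow_square, range(1_000), max_workers=4, chunksize=16)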
Suppose we want to know the status of the current task: how many tasks are completed, how long before the work is ready? It's as simple as setting the progress_bar parameter to True:
    from mpire import WorkerPool  # assuming this snippet uses mpire's WorkerPool

    with WorkerPool(n_jobs=5) as pool:
        results = pool.map(time_consuming_function, range(10), progress_bar=True)
And it will output a nicely formatted tqdm progress bar.
To be honest, I don't think the two are similar at all.
Parallelizing across machines involves networks, and, well, that's why we have Jepsen, and Byzantine failures, and eventual consistency, and netsplits, and leader election, and discovery; in short, a stack of hard problems that in and of itself is usually much larger than what you're trying to solve with multiprocessing.
True, the networking causes trouble. I usually rely on a communication layer that addresses those troubles. A good message queue makes the two paradigms quite similar. Or something like Dask (https://www.dask.org/). Having your single-machine development environment able to reproduce nearly all the bugs that arise in production is a wonderful thing.
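To make that concrete, a hedged sketch of the Dask pattern (the scheduler address is a placeholder): the same map-style code runs on a local cluster during development and on a real cluster in production.

    from dask.distributed import Client

    def work(x):
        return x * x

    if __name__ == "__main__":
        # Client() starts a local cluster; point it at a scheduler address,
        # e.g. "tcp://scheduler:8786" (placeholder), to run distributed.
        client = Client()
        futures = client.map(work, range(100))
        results = client.gather(futures)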
Depends on the workflow. For one-off jobs or client tooling, parallelism makes sense for rapid user feedback.
For batch pipelines that work on many requests, having a serial workflow has a lot of the advantages you mention. Serial execution makes the load more predictable and makes scaling easier to reason about.
>We were on AWS Managed Airflow, but to stay on it and have a solid platform, I would have been writing Github Actions for CI/CD, standing up ECR and IAM roles with Terraform, setting up EKS to run Kubernetes jobs, managing infra monitoring with Datadog, etc., etc.
This sounds like an issue not with Airflow but with integration.
This is the case for most column-oriented data warehouses (including BigQuery, though Snowflake does allow one primary key). It's just the nature of the technology.
My favorite flaw of averages isn't even mentioned in this article: the aggregation of averages across covariates. The more covariates (higher dimensions) your problem has, the less likely anyone in the population is to exist "at the average".
This was explored in a famous study of Air Force pilots: measuring across 10 different dimensions, it found that 0 pilots were "average" on all 10.
(PS: this is my favourite pet theory for why UX is such a trainwreck these days: UIs are designed for an "average user" who doesn't exist, driven by "telemetry averages".)
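A quick back-of-the-envelope check of the pilots result (assuming independent standard-normal traits and an arbitrary band of +/-0.3 sigma as "average"):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.standard_normal((1_000_000, 10))  # 1M people, 10 traits

    for d in (1, 3, 10):
        # fraction within +/-0.3 sigma of the mean on ALL of the first d traits
        near_avg = (np.abs(population[:, :d]) < 0.3).all(axis=1).mean()
        print(f"{d} dimensions: {near_avg:.4%} 'average'")

With one trait, roughly a quarter of the population is "average"; with ten independent traits, even a million-person sample typically contains no one inside the band, which matches the zero-pilots finding.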
User interfaces should be designed for the users you will have in the long run. In industry and commerce these will be expert users.
I spent a large chunk of my life writing software to design transformers. The UIs broke all of the naive "rules" about UI design and were crammed full of information, buttons, boxes, entry fields, pull-down lists, etc.
For the users they were designed for they were very productive. For a casual or first time user they were impossible to use. But we had no casual users, only experts who were in a hurry and would not tolerate having to wade through multiple screens to perform some small what-if exercise. It was like an airliner cockpit, everything as close to hand as possible and only the rarely used items on other pages.
A frequent request was to enlarge the window so that more could be fitted in at once, it was much rarer to be asked to move something off the main window.
A $40B manager once told me that if I could deliver 6% annual return with 1% volatility, he would give me all his money. Yeah, they want as little volatility as possible.
GCP is behind. To be fair, they're behind mostly in niche services, but behind nonetheless. Nothing prevents them from building similar services, but coming from AWS, I routinely feel frustrated by the lack of finesse and comparable offerings. IMO, GCP's greatest benefits over AWS are:
* BigQuery
* Tighter k8s integration with GKE
* Single message service (PubSub)
Unless BigQuery is your first-class citizen, I would avoid GCP.
If you sign a service contract with Google (which you will if you are spending any amount of serious money), then you will have a precisely defined guarantee that they will continue operating GCloud.
I doubt anyone's service contract stipulates that Google will continue to invest heavily to keep feature parity with AWS and Azure for the next 20 years.
What? This is the first I've heard of this.