>Avoid using Poetry for new projects. Poetry predates many standards for Python tooling. This means that it uses non-standard implementations of key features, such as the dependency resolver and configuration formats in pyproject.toml files.
Was going to comment the same thing. Would love to hear the author expand further on why not use Poetry. I've found it to be pretty solid and continue to use Poetry + Pyenv for all my projects, but open to hearing the case for PDM or Hatch.
I've never worked on a team that uses Poetry, but at my current company another team uses it, and I haven't found it as slick as I would have imagined, primarily because you need to create a venv and install Poetry into it before you even get started; at that point, why not just pip install the rest anyway? For standalone applications it just seems like an unnecessary extra step. It doesn't even mandate a build lifecycle like Maven, so what are you getting?
But that's not what soured me on Poetry. What soured me was that recently I needed to create a release of one of their libraries with a Git commit in the local version identifier and... Poetry doesn't do that. There's an issue that was open on GitHub for years before they finally agreed to implement it, and since February the change has been merged to master, but despite several point releases since then, it has not landed in any of them. When will we be able to get a local version part? Who knows!
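For anyone unfamiliar, the "local version identifier" in question is the PEP 440 plus-suffix. A quick illustration with the packaging library (the version string itself is made up):

    # Illustration only: what a PEP 440 local version identifier looks like.
    from packaging.version import Version

    v = Version("1.4.2+g1a2b3c4")  # hypothetical release tagged with a git commit
    print(v.public)  # -> 1.4.2
    print(v.local)   # -> g1a2b3c4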
This experience has really made me skeptical of Poetry being the One True Packaging Tool that fixes everything. As usual, it just fixes the things the devs want fixed, and everything else is still janky or half-implemented. From my perspective, if you're gonna deal with jank anyway, you might as well just deal with the standard jank that comes as part of Python itself.
Actually, Poetry comes with an optional self-installer, though I prefer to manage it with pipx. And it's recommended not to install it into your project's env, as there's the potential for conflicting dependency versions.
One thing I saw which was weird as hell: Poetry has serious issues with changing requirements on Windows (specifically removing them) and has a tendency to throw all sorts of fun errors (its method of removing files is not filesystem-friendly).
As far as I can tell the issue is still open and has been for a while.
After struggling with the complexity of pyenv and the slowness of Poetry, I'm really happy with Rye.
It manages both Python versions (which it downloads instead of compiling) and package versions. It's written in Rust, so it's faster, and it can replace pipx as well for installing global tools. (Some people will recommend uv, which Rye is slowly merging with, but uv is still missing some Rye features; at some point in the future you might want to switch.)
Aside from the obvious issue that it makes no distinction between direct and transitive dependencies, it completely breaks cross-platform support if any of your dependencies have platform-specific sub-dependencies (which is not uncommon in Python).
pip freeze was an overly simplified hyperbolic jest...
Fair point. So when this packaging mess happens, can one not strike a balance: define the dependencies that you directly need, and let the resolver handle the rest?
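That's the usual compromise: declare only the direct dependencies, loosely pinned, and let the installer's resolver work out the transitive graph for each platform. A minimal pyproject.toml sketch (the project name and pins are hypothetical):

    [project]
    name = "myapp"          # hypothetical project
    version = "0.1.0"
    dependencies = [
        "requests>=2.28",   # direct dependencies only, loosely pinned;
        "pandas>=2.0",      # the resolver picks compatible transitive versions
    ]

When you need reproducible installs on top of that, a lock file (or pip-compile-style pinning) captures the resolved versions without polluting the declared dependencies.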
Agree, I'm a poetry person too. I feel like TFA might be getting ahead of itself on this point. Just because something was ratified as a "standard" doesn't immediately make everything prior irrelevant.
Agreed. Most of these are fairly uncontroversial. A few will probably be new to some, particularly because users of Python are heterogeneous and people find themselves with years of experience on projects that never upgraded their language version, e.g. from 3.6, and don't know what they're missing.
A final few, even with the explanations given, feel like preference, such as the Poetry opinion. I suppose if I wrote a similar article, it would end up with a similar mix of obvious defaults and a few of my preferences.
It could also be because of different use cases? For example, writing a library for packaging, vs a deployed application, vs a personal project's workbench. Or perhaps if you are collaborating with people who use Windows without WSL. I have heard tell that Poetry can sometimes trip over its own shoelaces on Windows. I have never experienced it myself, and don't personally care to support any projects involving Windows, but if I did I might have different preferences.
Yeah, to me Poetry was a game changer in dependency tooling. Not that better tooling can't come along, but it has recently worked great on multiple production-grade projects.
No complaints, and it has safety features that keep you from shooting yourself in the foot.
I agree. The main problems aren't syntax, they are architectural: catching and retrying individual failures in a pool.map, anticipating OOM with heavy tasks, understanding the process lifecycle and the underlying pickle/IPC machinery.
All these are much more reliably solved with horizontal scaling.
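For the single-machine case, here is a hedged sketch of the per-task catch-and-retry plumbing that first point implies (the worker and retry policy are placeholders; items are assumed hashable and the worker picklable):

    # Sketch: per-task error handling and retry around a process pool,
    # since a bare pool.map aborts the whole batch on the first exception.
    from concurrent.futures import ProcessPoolExecutor

    def run_with_retries(worker, items, max_retries=2):
        results, failed = {}, {}
        remaining = {item: 0 for item in items}  # item -> attempts so far
        with ProcessPoolExecutor() as pool:
            while remaining:
                futures = {pool.submit(worker, item): item for item in remaining}
                next_round = {}
                for future, item in futures.items():
                    try:
                        results[item] = future.result()
                    except Exception as exc:
                        # Ordinary task failure. Note: a worker killed by the
                        # OOM killer instead raises BrokenProcessPool and
                        # poisons the whole pool, which is exactly the kind of
                        # lifecycle issue mentioned above.
                        attempts = remaining[item] + 1
                        if attempts <= max_retries:
                            next_round[item] = attempts
                        else:
                            failed[item] = exc
                remaining = next_round
        return results, failed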
[edit] by the way, a very useful minimal sugar on top of multiprocessing for one-off tasks is tqdm's process_map, which automatically shows a progress bar https://tqdm.github.io/docs/contrib.concurrent/
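A minimal sketch of that process_map usage (the work function is made up):

    # tqdm.contrib.concurrent.process_map: a multiprocessing map with a progress bar.
    from tqdm.contrib.concurrent import process_map

    def slow_square(x):  # placeholder for a real CPU-bound task
        return x * x

    if __name__ == "__main__":
        # max_workers and chunksize are passed through to the underlying pool
        results = process_map(slow_square, range(1_000), max_workers=4, chunksize=16)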
Suppose we want to know the status of the current task: how many tasks are completed, how long before the work is ready? It's as simple as setting the progress_bar parameter to True:
    from mpire import WorkerPool  # assuming this snippet uses mpire's WorkerPool

    with WorkerPool(n_jobs=5) as pool:
        results = pool.map(time_consuming_function, range(10), progress_bar=True)
And it will output a nicely formatted tqdm progress bar.
To be honest, I don't think the two are similar at all.
Parallelizing across machines involves networks, and, well, that's why we have Jepsen, and Byzantine failures, and eventual consistency, and netsplits, and leader election, and discovery; in short, a stack of hard problems that in and of itself is usually much larger than what you're trying to solve with multiprocessing.
True, the networking causes trouble. I usually rely on a communication layer that addresses those troubles. A good message queue makes the two paradigms quite similar. Or something like Dask (https://www.dask.org/). Having your single-machine development environment able to reproduce nearly all the bugs that arise in production is a wonderful thing.
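To make that concrete, a hedged sketch of the Dask pattern (the scheduler address is a placeholder): the same map-style code runs on a local cluster during development and on a real cluster in production.

    from dask.distributed import Client

    def work(x):
        return x * x

    if __name__ == "__main__":
        # Client() starts a local cluster; point it at a scheduler address,
        # e.g. "tcp://scheduler:8786" (placeholder), to run distributed.
        client = Client()
        futures = client.map(work, range(100))
        results = client.gather(futures)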
Depends on the workflow. For one-off jobs or client tooling, parallelism makes sense for rapid user feedback.
For batch pipelines that work on many requests, having a serial workflow has a lot of the advantages you mention. Serial execution makes the load more predictable and makes scaling easier to reason about.
>We were on AWS Managed Airflow, but to stay on it and have a solid platform, I would have been writing Github Actions for CI/CD, standing up ECR and IAM roles with Terraform, setting up EKS to run Kubernetes jobs, managing infra monitoring with Datadog, etc., etc.
This sounds like an issue not with Airflow but with integration.
This is the case for most column-oriented data warehouses (including BigQuery, though Snowflake does allow one primary key). It's just the nature of the technology.
My favorite flaw of averages isn't even mentioned in this article: the aggregation of averages across covariates. The more covariates (higher dimensions) your problem has, the less likely anyone in the population is to exist "at the average".
This was explored in a famous study of Air Force pilots: measuring across 10 different dimensions, it found that 0 pilots were "average" on all 10.
(PS: this is my favourite pet theory for why UX is such a trainwreck these days: UIs are designed for an "average user" who doesn't exist, driven by "telemetry averages".)
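A quick back-of-the-envelope check of the pilots result (assuming independent standard-normal traits and an arbitrary band of +/-0.3 sigma as "average"):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.standard_normal((1_000_000, 10))  # 1M people, 10 traits

    for d in (1, 3, 10):
        # fraction within +/-0.3 sigma of the mean on ALL of the first d traits
        near_avg = (np.abs(population[:, :d]) < 0.3).all(axis=1).mean()
        print(f"{d} dimensions: {near_avg:.4%} 'average'")

With one trait, roughly a quarter of the population is "average"; with ten independent traits, even a million-person sample typically contains no one inside the band, which matches the zero-pilots finding.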
User interfaces should be designed for the users you will have in the long run. In industry and commerce these will be expert users.
I spent a large chunk of my life writing software to design transformers. The UIs broke all of the naive "rules" about UI design and were crammed full of information, buttons, boxes, entry fields, pull-down lists, etc.
For the users they were designed for they were very productive. For a casual or first time user they were impossible to use. But we had no casual users, only experts who were in a hurry and would not tolerate having to wade through multiple screens to perform some small what-if exercise. It was like an airliner cockpit, everything as close to hand as possible and only the rarely used items on other pages.
A frequent request was to enlarge the window so that more could be fitted in at once, it was much rarer to be asked to move something off the main window.
A $40B manager once told me that if I could deliver 6% annual return with 1% volatility, he would give me all his money. Yeah, they want as little volatility as possible.
GCP is behind. To be fair, they're behind mostly in niche services, but behind nonetheless. Nothing prevents them from building similar services, but coming from AWS, I routinely feel frustrated by the lack of finesse and comparable offerings. IMO, GCP's greatest benefits over AWS are:
* BigQuery
* Tighter k8s integration with GKE
* Single message service (PubSub)
Unless BigQuery is your first-class citizen, I would avoid GCP.
If you sign a service contract with Google (which you will if you are spending any amount of serious money), then you will have a precisely defined guarantee that they will continue operating GCloud.
I doubt anyone's service contract stipulates that Google will continue to invest heavily to keep feature parity with AWS and Azure for the next 20 years.
What? This is the first I've heard of this.