> It's really, really, not scale! Talking to 1 million different people an hour ...

> It's really, really, not scale!

Talking to 1 million different people an hour is _business_ scale. Regardless of the tech required to do that, it not working would likely be a significant business impact.

> Then you're prob using the wrong lang.

There's so much more here than performance. There's developer productivity, there's tooling availability, all sorts. They happen to be using Go which is probably the best trade-offs for this particular system, but if you were doing complex machine learning you'd probably want to use Python due to all the excellent tooling available, and that means having a "slow" language for parts that aren't optimised for you.

If the difference is 1 engineer, ~1k lines of code, 3 machines, vs 3 engineers, ~10k lines of code and 1 machine, the former is likely to be the right trade off for most companies. It's cheaper to build, and since number of bugs typically correlates to lines of code, it will likely be much more reliable.

As for the ACID semantics, you're right you wouldn't need them all here, redelivery is probably fine within some bounds, and the upstream APIs might even have idempotency tokens to prevent this, but the downtime of losing a machine for a few hours, the time to regenerate all those messages that were lost on the downed machine, etc, could equate to quite a bit of downtime and poor UX for users.

You're not wrong from a performance perspective, but taking into account reliability, business impact, user experience, and developer productivity/costs, I think the solution in the blog post is a better set of trade-offs than you're suggesting.