Exactly Once = At least once + Idempotence

crazygringo · on March 1, 2023

Which the author admits three quarters of the way through:

> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.

Honestly I don't get why this is "faking it" though. It seems like the author's definition of "exactly once" is so purist as to essentially be a strawman. This is "exactly once" in practice.

Like are there other people claiming that this purist version of exactly-once does exist?

nimih · on March 1, 2023

> Like are there other people claiming that this purist version of exactly-once does exist?

In my experience, the purist version of "exactly-once" exists as a vague, wishy-washy mental model in the brains of developers who have never thought hard about this stuff[0]. Like, once you sketch out why idempotency is important and how to do it, folks seem to pick up on it pretty quickly, but not everyone has trained their intuition to where they automatically notice these sorts of failure modes.

[0] I don't mean this as a slight against those developers--the issues that arise from distributed systems are both myriad and subtle, and if you've spent your time learning how to make beautiful web pages or cool video games or efficient embedded systems, it seems reasonable to not know anything about the accursed problems of hypothetical Byzantine Generals. Or maybe you're fresh out of a bootcamp or an undergraduate program and haven't yet been trained to expect computers to always and constantly fail in every possible way.

cowl · on March 1, 2023

Because both of these "solutions" are not part of the delivery mechanism but part of your problem space. So the delivery system is not guaranteeing even a fake exactly-once delivery; it's your usage that makes it a fake exactly-once. What's more, both of these solutions are very hard in practice. Idempotency can be applied only in special circumstances, when you can design it that way. A "Prepare an order" message, for example, can't be idempotent: it has side effects, and it will prepare a new order every time you receive the message. So you go the deduplication route by considering the OrderID, but if you have several workers that process these messages, how do you handle deduplication? If the first worker never acked the processing, do you deliver it to a new worker in the queue? How does the new worker know if someone else is processing the same OrderID? A central database? You are only kicking the can down the road...
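To make the multi-worker question concrete, here is a minimal sketch of the "central database" route, assuming a shared store with a uniqueness constraint and a time-based lease. The table name `claims`, the helpers `try_claim` and `mark_done`, and the 30-second lease are all hypothetical, and SQLite stands in for whatever shared store the workers would really use.

```python
import sqlite3

LEASE_SECONDS = 30  # hypothetical: how long a worker may hold an unfinished order


def open_store():
    """Create the shared claim table (in-memory here, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE claims ("
        " order_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " claimed_at REAL NOT NULL)"
    )
    db.commit()
    return db


def try_claim(db, order_id, now):
    """Return True if the calling worker may process order_id."""
    with db:  # one transaction: commit on success, roll back on error
        # First delivery: the primary key lets exactly one worker insert the row.
        cur = db.execute(
            "INSERT OR IGNORE INTO claims VALUES (?, 'processing', ?)",
            (order_id, now),
        )
        if cur.rowcount == 1:
            return True
        # Redelivery: take over only if the previous claim was never finished
        # and its lease has expired (the earlier worker presumably crashed).
        cur = db.execute(
            "UPDATE claims SET claimed_at = ?"
            " WHERE order_id = ? AND status = 'processing' AND claimed_at < ?",
            (now, order_id, now - LEASE_SECONDS),
        )
        return cur.rowcount == 1


def mark_done(db, order_id):
    """Record that processing finished, so later duplicates are ignored."""
    with db:
        db.execute(
            "UPDATE claims SET status = 'done' WHERE order_id = ?", (order_id,)
        )
```

This does not make the objection go away. A worker that is merely slow, rather than dead, can still overlap with the one that takes over its expired lease, so the order preparation itself has to tolerate that case. The sketch only shows where the problem moves to.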

majormajor · on March 1, 2023

It can be very hard to get idempotency right.

It can get way harder when your initial design made incorrect assumptions about the delivery semantics you were using, so you didn't know you'd need it.

Edit for example:

Someone could have a low-latency problem that seems like it could be a fit for a streaming application. They could look at docs and see "ooh, with Flink I can do exactly-once writes to Kafka" in one place, and choose to use that. But if they don't dig deeply into what that means, they may miss the latency impacts of having to checkpoint every time to commit a set of writes to Kafka. And by the time they figure this out, managing both "low latency" and "exactly once" in the code they wrote might be a really hairy problem.

hn_go_brrrrr · on March 1, 2023

The distinction is how you design. You don't need idempotence with a mythical "exactly once" system. Conversely, when you're debugging a system built on top of "at least once", you need to keep that property in mind in case the bug you're tracking down is lost idempotence.

kevincox · on March 1, 2023

Because idempotence can be very hard to achieve. You usually can't just write the message ID to a DB and ignore messages with a matching ID because if you crash while processing then you need to start over again. But you can't just write it at the end because then all of your processing steps need to be idempotent (so why are you bothering to write the ID?).

I've seen very few systems that have general idempotency baked in. Often it ends up being specific to the application. In some cases you can have simple solutions like upon crashing reload all of the state from an authoritative source. In some cases your messages result in simple idempotent operations such as "insert message with a unique ID" or "mark a message with a unique ID as read" but even then these are becoming quite related to business logic.

Basically idempotency is a powerful tool to create a solution but it is no silver bullet. That is why it is important to understand the underlying problem.
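A minimal sketch of the one case where the "write the message ID" trick does work cleanly: when the effect of the message lives in the same datastore as the ID, so both can be committed in a single transaction. The schema (`processed`, `balances`) and the function `handle_deposit` are hypothetical, and the `crash` flag only simulates a failure before commit.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO balances VALUES ('alice', 0)")
    db.commit()
    return db


def handle_deposit(db, msg_id, account, amount, crash=False):
    """Apply a deposit once per msg_id. Returns True if applied, False if duplicate."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            # The ID and the effect are written together: either both land or neither.
            db.execute("INSERT INTO processed VALUES (?)", (msg_id,))
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
            if crash:
                raise RuntimeError("simulated crash before commit")
    except sqlite3.IntegrityError:
        return False  # msg_id was already recorded: a redelivery
    return True


def balance(db, account):
    row = db.execute(
        "SELECT amount FROM balances WHERE account = ?", (account,)
    ).fetchone()
    return row[0]
```

A crash mid-processing rolls back the ID along with the partial work, so the redelivered message is processed from scratch. As soon as a processing step touches something outside that transaction (an email, another service), the dilemma described above comes back.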

pksebben · on March 2, 2023

reading your comment, it dawned on me; there is a way to theoretically ensure exactly-once delivery.

1. buy plane ticket
2. bring box to recipient
3. plug in Ethernet & send message

keep an eye out for our IPO

yencabulator · on March 2, 2023

That's at-most-once.

jerf · on March 1, 2023

I think we need to keep the concepts separate because otherwise people get confused. You can not receive a message exactly once. Yes, it's not that hard, if you know this is an issue, to build a system where receiving the same message more than once won't cause a bad thing to happen. There's a few principled ways to do this, and some less principled ways that will still mostly work.

But that's not because you built a system that successfully delivers messages exactly once... you built a system that successfully processes messages exactly once, even if delivery occurs multiple times. The delivery still occurred multiple times. Even if your processing layer handled it, that may have other consequences worth understanding. Wrapping that up in a library may present a nice API for some programmer, but it doesn't solve the Byzantine Generals problem.

Whenever someone insists they can build Exactly Once with [mumble mumble mumble great tech here] I guarantee you there's a non-empty set of human readers coming away with the idea they can successfully create systems based on exactly-once delivery. After all, I built some code based on exactly-once delivery last night and it's working fine on my home ethernet even after I push billions of messages through it.

We're really better off pushing "There is no such thing as Exactly Once, and the way you deal with it is [idempotence/id tracking/whatever]", not "Yes there is such a thing as Exactly Once delivery (see fine print about how I'm redefining this term)". The former produces more accurate models in human brains about what is going on and is more likely to be understood as a set of engineering tradeoffs. The latter seems to produce a lot of confusion and people not understanding that their "Exactly Once" solution isn't a magic total solution to the problem, but is in fact a particular point on the engineering tradeoff spectrum. In particular, the "exactly once" solutions can be the wrong choice for certain problems, like multiplayer game state updates, where it may be a lot more viable to think 1-or-0 and some timestamping and the ability to miss messages entirely and recover, rather than building an "exactly once" system.
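A minimal sketch of the "1-or-0 plus timestamping" point on that spectrum, assuming each update is a full snapshot tagged with an increasing sequence number. `PlayerState` and `apply_update` are hypothetical names, not taken from any particular engine.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    seq: int = -1   # sequence number of the newest snapshot applied so far
    x: float = 0.0
    y: float = 0.0


def apply_update(state, seq, x, y):
    """Apply a position snapshot unless it is stale or a duplicate."""
    if seq <= state.seq:
        return False  # older than, or the same as, what we already have: drop it
    state.seq, state.x, state.y = seq, x, y
    return True
```

Because every message carries the whole position, a lost update is repaired by the next one that arrives, and duplicates or reordered packets are discarded. Nothing is retried and nothing is tracked per message, which is the tradeoff being described.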

naasking · on March 1, 2023

> But that's not because you built a system that successfully delivers messages exactly once... you build a system that successfully processes messages exactly once, even if delivery occurs multiple times.

I think the difference might be partly semantic. If processing at the messaging level is idempotent + at least once, then message delivery to the application level is exactly once. People mostly only care about the application level not the lower levels where they might just build on a library or system that handles that logic for them.

jerf · on March 1, 2023

I'd say it's entirely semantic. I'm very much arguing for where to draw the definition lines in the terms we use. It won't change the code one bit (give or take a few different names on things). I definitely think understanding carefully the issues involved in delivery, and understanding the various solutions to that problem, is the way to go, not to blur the questions of delivery and handling into one atomic element. They're not atomic.

Alternatively we could come up with names for all the other combinations of delivery mechanism and handling mechanism, but since you can easily see we hit an NxM-type problem on that, this may well help elucidate why I think it's a bad idea to try to combine the two into one term. It visibly impairs people's ability to think about this topic clearly.

naasking · on March 1, 2023

Well, my argument for erasing that line is that you generally don't care about TCP packets or SSL handshakes and such, so why is this one property relevant if it can be punted to a lower layer just like those others?

I'll grant that it matters if you're trying to debug some problem and trying to find at what layer it failed, but it's basically the same process you use to debug all of those other layers too, so I'm not sure why this layer deserves special consideration.

doctor_eval · on March 1, 2023

AFAIK the point of exactly once delivery, in the context of message passing, is to abstract delivery concerns away from the application layer and into the messaging layer, so that the application can depend on the exactly-once semantics without having to write logic for it.

The problem with this is similar to the problems with two-phase commit in distributed databases: there are unavoidable failure cases. Most of the time it works just fine, but if you write your application to depend on this impossible feature, and it fails - which, given enough time, will certainly happen - then cleaning up the mess can be much more effort (and have much wider business implications) than simply dealing with the undesirable behaviour of reality in the first place.

Or to put it another way: exactly once semantics can never be reliably extracted away from the application, so if you need it, it needs to be part of your application.

tunesmith · on March 1, 2023

This is called "Effectively Once".

tunesmith · on March 2, 2023

(first heard it coined by Viktor Klang)

FooBarWidget · on March 1, 2023

Theoretically true, and easy to say. But the hard part is actually implementing this in the context of business problems. What if you need to call external services that you don't control, and they don't provide idempotence? Like sending emails. Or worse: you send a message to a warehouse to deliver an item, and they deliver duplicates...

lll-o-lll · on March 1, 2023

Yeah the duplicate email thing is a classic problem, but I’m not sure it’s one of “idempotence”. This can happen in any (intended to be) transactional operation that creates a side effect.

Hit an error, roll back, side effect can’t be rolled back. Retry - side effect happens again.

Wouldn’t the general approach be to have unique message identifiers and queue side effects? Maybe I’m missing lots of subtleties.

Mavvie · on March 1, 2023

Email is absolutely something that requires idempotence to avoid sending duplicates. Even if your code is perfect and you don't send emails until after you commit your transaction, the actual http request to the email provider could fail in a way where the email is sent but your system can't tell.

Idempotency (either via a token in the request, or another API to check the result of the previous request) is required to prevent duplicates. And this requires the third party service to support idempotency; there's nothing you can do on your side to enable it if their service doesn't support it.
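A minimal sketch of the token-in-the-request approach, assuming the provider deduplicates on a client-supplied key. `FakeEmailProvider` is an in-memory stand-in invented for illustration, not any real provider's API, and `drop_next_response` simulates the "sent, but the response never arrived" failure.

```python
import uuid


class FakeEmailProvider:
    """Hypothetical provider that deduplicates on a client-supplied key."""

    def __init__(self):
        self.sent = []                   # emails actually delivered
        self._receipts = {}              # idempotency key -> receipt id
        self.drop_next_response = False  # simulate a lost HTTP response

    def send(self, idempotency_key, to, body):
        if idempotency_key not in self._receipts:
            # First time this key is seen: really send the email.
            self.sent.append((to, body))
            self._receipts[idempotency_key] = f"receipt-{len(self.sent)}"
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("email accepted, but the response was lost")
        return self._receipts[idempotency_key]


def send_with_retry(provider, to, body, attempts=3):
    """Retry on timeout, reusing one key so the provider can spot duplicates."""
    key = str(uuid.uuid4())  # generated once, never per attempt
    for _ in range(attempts):
        try:
            return provider.send(key, to, body)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after repeated timeouts")
```

In a real sender the key would presumably be stored alongside the outgoing message, so that a process restarted after a crash retries with the same key rather than minting a new one.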

HiJon89 · on March 3, 2023

What if your system is the one actually sending the emails (i.e., you are the 3rd party in this scenario)?

purpleblue · on March 1, 2023

It's not "equal".

If you guarantee "exactly once", you design your systems differently than "at least once with idempotence". A system designed for exactly once will be less complicated than a system designed for at least once + idempotence, which is why it is ideal but impossible.

stonemetal12 · on March 1, 2023

Until the bill comes anyway. Having to provision extra bandwidth for useless dups, extra processing power for useless updates, etc.

paxys · on March 1, 2023

So, the opposite of exactly once

fizwhiz · on March 1, 2023

With idempotence, you shift the problem from "deliver X exactly once" to "make it seem like X was delivered exactly once". In most systems, exactly-once is really "effectively exactly once".
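A toy simulation of that shift, assuming a sender that redelivers whenever an acknowledgement is lost and a consumer that remembers message IDs. All names here are hypothetical, and the in-memory `seen` set would not survive a consumer crash.

```python
import random


def deliver_at_least_once(messages, rng, ack_loss=0.5):
    """Yield each (msg_id, amount) one or more times, redelivering on a lost ack."""
    for msg in messages:
        while True:
            yield msg
            if rng.random() >= ack_loss:
                break  # the ack made it back, so move on to the next message


class DedupConsumer:
    """Applies each message ID once, however many times it is delivered."""

    def __init__(self):
        self.seen = set()
        self.total = 0
        self.deliveries = 0

    def on_message(self, msg_id, amount):
        self.deliveries += 1
        if msg_id in self.seen:
            return  # duplicate delivery: no effect
        self.seen.add(msg_id)
        self.total += amount
```

Feeding 100 messages of amount 1 through this leaves `total` at 100 even though `deliveries` is typically much higher. The wire still carried duplicates; only the observable effect is "exactly once".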

paxys · on March 1, 2023

That's my point. You are simply converting the problem to a new form, not actually solving it.

Hey here's a solution to the halting problem – always assume yes, and then figure out the edge cases. How do you do that? Well that's on you, I did my job.

In a distributed system that needs exactly-once delivery, implementing perfect idempotence is equally impossible.

burnished · on March 1, 2023

Converting a problem to a new form that you know how to better solve, or at least hope is more tractable, is a time honored mathematical and CS tradition

nawgz · on March 1, 2023

Idempotency - famously complex. No one has ever successfully implemented it, great point.

paxys · on March 1, 2023

If you don't think idempotency can be complex then you haven't really worked on serious distributed computing problems.

nawgz · on March 1, 2023

If you don't think your analogy is a miss then you haven't really read any serious literature.

naasking · on March 1, 2023

It can be exactly once at the application level just not exactly once at the more fine-grained message level. The fact that it's not exactly once at that lower level doesn't really matter, the semantics at the application level is what we care about.

kybernetikos · on March 1, 2023

Exactly. In practice there are probably a bunch of other things happening over the wire we also don't care about, handshakes and keepalives and negotiation and splitting and encryption and latency measuring and backpressure... It doesn't matter, in a variety of systems, at the application layer it is fine for the user to assume they will see a delivery to their code exactly once and that's what the user cares about. A delivery didn't mean some internal bytes travelled across the wire, it means your clients received a call.

That's why if you search for exactly once delivery you'll see a bunch of products claiming to have it (e.g. Kafka).

sokoloff · on March 1, 2023

Not exactly. If you have a business problem where you’re thinking “But I really, really need the effect of exactly-once; what can I do?”, GP’s post has the answer.

fsckboy · on March 1, 2023

OP's idea should be

idempotence + at least once

idempotence isn't necessarily commutative.

echelon · on March 1, 2023

No, if your datastore is online (the only way you're functioning anyway), store an idempotency key, vector clock, etc. with transactional semantics.

In active / active setups, there are other strategies such as partitioning and consensus.

dilyevsky · on March 2, 2023

Good thing most things in the real world are idempotent then!

hackerdad · on March 1, 2023

This!