You cannot have exactly-once delivery (2015) (bravenewgeek.com)
189 points by babelfish on March 1, 2023 | 217 comments



My company (among other things) routes print-on-demand orders to various print companies. Some of their APIs have mechanisms to ensure idempotency, some don't. The last time I pressed the issue, I was asked - and I quote - "Can't you just send the order only once?"

The thing is, having a print company that gets the printing part right is more important than having one that gets the API right. I use them anyway, and accept the risk that there will very occasionally be duplicate orders. At least in my business, it's just tshirts.

A few months ago I bought a fairly expensive cordless vacuum from hoover.com. I was charged once, but two of them arrived. I suspect I know why.


I have these sorts of facepalmable discussions from time to time. The last time was with a company that provides an invoicing service through an API. They are (and I am not shitting you) unable to ensure unique invoice numbers and told me to "just don't try to open and issue more than one invoice at any time".


> The last time I pressed the issue, I was asked - and I quote - "Can't you just send the order only once?"

What did they suggest you should do if you sent the order once and it didn't arrive?


Next time you fly, ask the person processing the check-in how frequently the following happens: somebody arrives with a ticket reservation confirmed and money charged to their credit card, but no actual e-ticket can be found in Sabre that will allow them to fly.

It's something I have been tracking over the years; rare, but it happens. Again, some non-transactional, non-idempotent integration setups out there...


I suspect they also don't have an API to check the orders you sent?


That would make it too easy! Most just offer get-by-id (their ID, of course). Some don't even offer that; they just post a webhook on shipment.


> There are essentially three types of delivery semantics: at-most-once, at-least-once, and exactly-once.

Oh, there's a fourth kind: "none-of-the-above", i.e., neither at-most-once nor at-least-once. The message gets delivered between [0, ∞] times. Maybe it gets delivered … maybe not. Your message is like a UDP packet.

A surprising number of systems exhibit this behavior, sadly.


> A surprising number of systems exhibit this behavior, sadly.

I noticed [0, ∞] delivery semantics in a widely-used, internal/homegrown message delivery system at a big tech company once. The bug was easy to spot in the source code (which I was looking at for unrelated reasons), but the catch-22 is that engineers with the skills to notice these sorts of subtle but significant infra bugs are the same engineers who would've advised against building (or continuing to use) your own message delivery system in the first place when there are perfectly serviceable open source and SaaS options.


I think whether or not to build your own thing isn’t an obvious choice for a big company. You might have lots of other infrastructure to integrate with and adapting an existing solution might not work as well as making something from scratch that eg integrates with your storage layer or how you manage permissions. The choice may be between having a team dedicated to managing some third party thing (because in the real world these systems have bugs or cases that don’t perform well or need more hardware or rebalancing or whatever) and having a team dedicated to developing and managing an internal solution. The latter case can mean you get to take advantage of existing infrastructure, have less work integrating with this thing, and have in-house experts who can investigate and fix bugs.

I don’t think it’s as simple as always preferring to bring in external things.


I guess you mean "[0, ∞)". You cannot have infinite delivery in computer systems.


Oh IDK.

So, if you look at it juuuust right, ELK can be considered as delivering messages … log messages. Normally our ELK system exhibited the behavior I described above: our logging component would make best efforts to ensure that log messages did get delivered. But there was no ID on the log message when it was submitted: double-submission would result in the message getting duplicated. The message was only removed from the queue of messages that needed to be submitted if the ACK was successfully received. The local queue was only so big: if the application continued to log but couldn't submit to ELK, well, it would just discard the messages², so messages might not get delivered, period, even when the client was fine (e.g., during network outages).¹

That was all fine and good (the consequences of a log message getting delivered twice or not delivered under normal circumstances are "whatever").

One day, the team managing ELK misconfigured it, causing all log submission to start failing. This … didn't get detected? (I mean, as a consumer of the service, what are you going to do, log the error?) Worse, some logs were first routed through the local syslog daemon. It decided to log that it couldn't send the message to a local log file, and then retry without backoff, and never gave up. (And, of course, there's no log rotation on that local log file.) So we noticed the problem when the disk filled up … and then realized basically every VM in the fleet was doing this. So it felt like ∞] that day.

But yes, mathematically, that bound should be ∞).

¹there is some decent discussion in the thread below my OP about whether this technically counts as "0 deliveries".

²the logic being that crashing due to inability to log is not worth it. It noted the failure on its stderr … but you had to know to look.


If you don't explicitly re-transmit the UDP packet, it would be at-most-once delivery, right? The IP routers in the path will not re-send the UDP packet on their own!


"The IP routers in the path will not re-send the UDP packet on their own!"

First rule of networking: Every bad thing that can happen, will.

If nothing else, by sheer bugginess you will certainly have something, at some point, retransmitting UDP packets for no good reason.


If you’re admitting into the discussion literally any conceivable buggy behavior from public routers you have no control over, then isn’t it pretty clear you can’t have any guarantees about anything whatsoever?


Hello and welcome to the internet. Enjoy your stay!

I think that is the point of these conversations. You shouldn't have application expectations that cannot be met in the real world. Even in your own data center you have far less control than you'd think: your mirrored switches can do something dumb like sending a stream of packets twice out of their respective interfaces.


In some contexts, sure, you could say literally nothing is "guaranteed." But in this context, we're clearly talking about how distributed systems work in practice, and which behaviors you can reasonably achieve and what tradeoffs you must make. The article clearly states:

> There are essentially three types of delivery semantics: at-most-once, at-least-once, and exactly-once. Of the three, the first two are feasible and widely used.

But in the context you're talking about, even those first two are not "feasible." And in the context you're talking about, there's basically no point in talking about anything you can do to try to achieve certain behavior since there is technically no way to physically guarantee that. What you are saying is definitely not "the point of these conversations," because you're just saying that literally nothing can ever be absolutely guaranteed. Technically true, but not particularly helpful when you're designing real systems to solve real problems, and your systems will mostly operate on infrastructure that mostly does work as intended.


I work in financial data — lots of big ol’ UDP multicast streams getting replicated by big routers in private data centers on private lines.

In that context, everything bad can happen, but it rarely does. For example: packets most definitely get dropped, but not every packet does in practice. Packets can get double-sent, but again, not every packet in practice. Packets can get corrupted randomly, but again, not every packet in practice.

The hard mathematical guarantee: nothing at all! You will get a packet from 0 to N times, where N might be arbitrarily large.

The in-practice behavior: mostly kinda works, but be careful! Have a strategy for reordering, a strategy for detecting and handling dupes, and a strategy for when you need to just give up and start from scratch.

The internet is like that but if some router is screwing it up, you can’t call the owner and complain.


You're not trying hard enough. I spent an afternoon on the phone with an ISP in Boston letting them know their BGP was hosed and consequently black-holing my traffic. I had the traceroutes and whois data to prove it.

Nowadays, though, it is getting much harder to do so. I know there's a secret IRC of black belt NetOps out there, but I haven't managed to figure out how to route myself there yet.

(Rumor has it there's a router out there with a reliably flaky network card that'll mangle the packets juuuuuust right. Personally, I think it's all hooey and someone has a really neat IPTables file complete with port knocking)


There's a bit of daylight between "feasible" and "guaranteed".

It turns out it really isn't that much daylight, and it's really easy to overestimate the size of that window. As you scale up, that window gets smaller and smaller, too, which makes it even more exciting.

Nevertheless, it is indeed where real systems are built.

I also disagree that realizing that nothing can be guaranteed is not important to building systems. The more you scale up, the more important it is. It's hugely important. It's one of the major things that separates people who can build real network systems at scale and those who can't. Far from the only such thing, but certainly one of the important ones.


In a broadcast scenario it only takes a temporary instability in the routing tables to cause packets to be duplicated. It's less of an issue nowadays, but you don't control the Internet. It's not a bug because it's not guaranteed that a message won't be delivered twice.


take it up with the authors of RFC 791. duplicate packets are allowed. the presence of one does not indicate a bug.


> you can’t have any guarantees about anything whatsoever?

On public networks, yes.

Of course in many cases you have to limit which edge cases you deal with unless you have infinite development time, but unexpectedly repeated UDP packets are definitely something that happens often enough to account for it if your protocol could be adversely affected by it.


I'd like to send the request 0 times and still end up with the expected result, thank you very much


"I'm not a mind reader" --router in rack 3


AI will solve this problem and create countless others.


Buffer bugs can and will happen.

Packets can and will be queued in multiple outgoing interfaces.

Dumb shit happens in network kit.

Also, the new SDN stuff sends packets multiple times over different paths on purpose. It's supposed to discard everything except the one that got there first, but...


In TCP in particular there can be a lot of duplication due to missed packets or packets that arrive just a little too late.

On noisy wifi you're transferring data, and the destination finally gets enough packets to send an ACK for the sliding window, only the source never gets the ACK, so it sends the packets it thinks you want but already have. Some of those get through, and the destination realizes it needs to send the ACK again because clearly you didn't get it the first time. Finally you resync and start getting new data, until the next cup of coffee goes into the microwave and it all repeats again.

Since many UDP protocols end up re-implementing half of TCP, you're going to have some of the same failure modes.


An easy-to-imagine scenario is that a router with multiple egress paths emits the UDP packet on multiple interfaces. The packets take different routes but both arrive at the destination.


With multiple egress paths, different packets from the same stream can be sent along different routes and arrive at the destination in a different order, but each packet should be delivered at most once. Unless there is a bug in the router which duplicates packets - possible in theory, but in 20+ years of practice I haven't seen one. This is not to say routers are bug-free, just that this exact problem is not what I would expect.


Some bugs are just misconfigurations that didn't explode loudly enough to get immediate attention.


Switch A and B are connected to each other with two links, we'll call those 1 & 2.

They're supposed to run STP or bonding or something to make 1 & 2 a logical single link, but that's misconfigured, misbehaving, or just plain old buggy.

By sheer bad luck, switch B has currently overflowed its MAC lookup table, and is falling back to broadcast for your destination MAC.

You send a packet to switch A. Switch A looks up the destination MAC, and forwards the packet to link 1. Switch B receives the packet, has no idea where that MAC is, and forwards it to links 2-n. Switch A receives the packet, looks up the destination MAC, and forwards the packet to link 1. Rinse, repeat. Observe packets sent by switch B on ports 3-n.


It's very much not likely to happen given the implementations, but the IP layer does not guarantee at-most-once delivery.


Not with multi-casting.


Ah, the celery semantics.


I'd like to get [0, ∞] printed on a t-shirt.


So that would be 'at least once'. Since if it is 0, it is not a delivery.


That's silly. If we followed your logic, there is no such thing as "at most once" delivery. It's actually "exactly-once" delivery, since if it is 0 it is not a delivery.


Exactly Once = At least once + Idempotence


Which the author admits three quarters of the way through:

> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.

Honestly I don't get why this is "faking it" though. It seems like the author's definition of "exactly once" is so purist as to essentially be a strawman. This is "exactly once" in practice.

Like are there other people claiming that this purist version of exactly-once does exist?


> Like are there other people claiming that this purist version of exactly-once does exist?

In my experience, the purist version of "exactly-once" exists as a vague, wishy-washy mental model in the brains of developers who have never thought hard about this stuff[0]. Like, once you sketch out why idempotency is important and how to do it, folks seem to pick up on it pretty quickly, but not everyone has trained their intuition to where they automatically notice these sorts of failure modes.

[0] I don't mean this as a slight against those developers--the issues that arise from distributed systems are both myriad and subtle, and if you've spent your time learning how to make beautiful web pages or cool video games or efficient embedded systems, it seems reasonable to not know anything about the accursed problems of hypothetical Byzantine Generals. Or maybe you're fresh out of a bootcamp or an undergraduate program and haven't yet been trained to expect computers to always and constantly fail in every possible way.


Because both of these "solutions" are not part of the delivery mechanism but part of your problem space. So the delivery system is not guaranteeing even a fake exactly-once delivery; it's your usage that makes it fake exactly-once. What's more, both of these solutions are very hard in practice. Idempotency can be applied only in special circumstances, when you can design it that way. A "prepare an order" message, for example, can't be idempotent: it has side effects, and it will prepare a new order every time you receive the message. So you go the deduplication route by considering the OrderID. But if you have several workers that process these messages, how do you handle deduplication? If the first worker has never ACK-ed the processing, do you deliver it to a new worker in the queue? How does the new worker know if someone else is processing the same OrderID? A central database? You are only kicking the can down the road...


It can be very hard to get idempotency right.

It can get way harder when your initial design made incorrect assumptions about the delivery semantics you were using, so you didn't know you'd need it.

Edit for example:

Someone could have a low-latency problem that seems like it could be a fit for a streaming application. They could look at docs and see "ooh, with Flink I can do exactly-once writes to Kafka" in one place, and choose to use that. But if they don't dig deeply into what that means, they may miss the latency impacts of having to checkpoint every time to commit a set of writes to Kafka. And by the time they figure this out, managing both "low latency" and "exactly once" in the code they wrote might be a really hairy problem.


The distinction is how you design. You don't need idempotence with a mythical "exactly once" system. Conversely, when you're debugging a system built on top of "at least once", you need to keep that property in mind in case the bug you're tracking down is lost idempotence.


Because idempotence can be very hard to achieve. You usually can't just write the message ID to a DB and ignore messages with a matching ID because if you crash while processing then you need to start over again. But you can't just write it at the end because then all of your processing steps need to be idempotent (so why are you bothering to write the ID?).

I've seen very few systems that have general idempotency baked in. Often it ends up being specific to the application. In some cases you can have simple solutions like upon crashing reload all of the state from an authoritative source. In some cases your messages result in simple idempotent operations such as "insert message with a unique ID" or "mark a message with a unique ID as read" but even then these are becoming quite related to business logic.

Basically idempotency is a powerful tool to create a solution but it is no silver bullet. That is why it is important to understand the underlying problem.


reading your comment, it dawned on me: there is a way to theoretically ensure exactly-once delivery.

1. buy plane ticket

2. bring box to recipient

3. plug in Ethernet & send message

keep an eye out for our IPO


That's at-most-once.


I think we need to keep the concepts separate because otherwise people get confused. You cannot guarantee receiving a message exactly once. Yes, it's not that hard, if you know this is an issue, to build a system where receiving the same message more than once won't cause a bad thing to happen. There are a few principled ways to do this, and some less principled ways that will still mostly work.

But that's not because you built a system that successfully delivers messages exactly once... you build a system that successfully processes messages exactly once, even if delivery occurs multiple times. The delivery still occurred multiple times. Even if your processing layer handled it, that may have other consequences worth understanding. Wrapping that up in a library may present a nice API for some programmer, but it doesn't solve the Byzantine General problem.

Whenever someone insists they can build Exactly Once with [mumble mumble mumble great tech here] I guarantee you there's a non-empty set of human readers coming away with the idea they can successfully create systems based on exactly-once delivery. After all, I built some code based on exactly-once delivery last night and it's working fine on my home ethernet even after I push billions of messages through it.

We're really better off pushing "There is no such thing as Exactly Once, and the way you deal with it is [idempotence/id tracking/whatever]", not "Yes there is such a thing as Exactly Once delivery (see fine print about how I'm redefining this term)". The former produces more accurate models in human brains about what is going on and is more likely to be understood as a set of engineering tradeoffs. The latter seems to produce a lot of confusion and people not understanding that their "Exactly Once" solution isn't a magic total solution to the problem, but is in fact a particular point on the engineering tradeoff spectrum. In particular, the "exactly once" solutions can be the wrong choice for certain problems, like multiplayer game state updates, where it may be a lot more viable to think 1-or-0 and some timestamping and the ability to miss messages entirely and recover, rather than building an "exactly once" system.


> But that's not because you built a system that successfully delivers messages exactly once... you build a system that successfully processes messages exactly once, even if delivery occurs multiple times.

I think the difference might be partly semantic. If processing at the messaging level is idempotent + at least once, then message delivery to the application level is exactly once. People mostly only care about the application level not the lower levels where they might just build on a library or system that handles that logic for them.


I'd say it's entirely semantic. I'm very much arguing for where to draw the definition lines in the terms we use. It won't change the code one bit (give or take a few different names on things). I definitely think understanding carefully the issues involved in delivery, and understanding the various solutions to that problem, is the way to go, not to blur the questions of delivery and handling into one atomic element. They're not atomic.

Alternatively we could come up with names for all the other combinations of delivery mechanism and handling mechanism, but since you can easily see we hit an NxM-type problem on that, this may well help elucidate why I think it's a bad idea to try to combine the two into one term. It visibly impairs people's ability to think about this topic clearly.


Well, my argument for erasing that line is that you generally don't care about TCP packets or SSL handshakes and such, so why is this one property relevant if it can be punted to a lower layer just like those others?

I'll grant that it matters if you're trying to debug some problem and trying to find at what layer it failed, but it's basically the same process you use to debug all of those other layers too, so I'm not sure why this layer deserves special consideration.


AFAIK the point of exactly once delivery, in the context of message passing, is to abstract delivery concerns away from the application layer and into the messaging layer, so that the application can depend on the exactly-once semantics without having to write logic for it.

The problem with this is similar to the problems with two-phase commit in distributed databases: there are unavoidable failure cases. Most of the time it works just fine, but if you write your application to depend on this impossible feature, and it fails - which, given enough time, will certainly happen - then cleaning up the mess can be much more effort (and have much wider business implications) than simply dealing with the undesirable behaviour of reality in the first place.

Or to put it another way: exactly-once semantics can never be reliably abstracted away from the application, so if you need it, it needs to be part of your application.


This is called "Effectively Once".


(first heard it coined by Viktor Klang)


Theoretically true, and easy to say. But the hard part is actually implementing this in the context of business problems. What if you need to call external services that you don't control, and they don't provide idempotence? Like sending emails. Or worse: you send a message to a warehouse to deliver an item, and they deliver duplicates...


Yeah, the duplicate email thing is a classic problem, but I’m not sure it’s one of “idempotence”. This can happen in any (intended to be) transactional operation that creates a side effect.

Hit an error, roll back, the side effect can’t be rolled back. Retry - the side effect happens again.

Wouldn’t the general approach be to have unique message identifiers and queue side effects? Maybe I’m missing lots of subtleties.


Email is absolutely something that requires idempotence to avoid sending duplicates. Even if your code is perfect and you don't send emails until after you commit your transaction, the actual HTTP request to the email provider could fail in a way where the email is sent but your system can't tell.

Idempotency (either via a token in the request, or another API to check the result of the previous request) is required to prevent duplicates. And this requires the third party service to support idempotency; there's nothing you can do on your side to enable it if their service doesn't support it.
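To make that concrete, here's a minimal Python sketch of the client side. The endpoint and header name here are made up; real providers each have their own convention:

    import uuid
    import requests

    def send_email_once(to, subject, body, max_attempts=5):
        # One key per logical send, reused across every retry, so the
        # provider (hypothetical endpoint/header) can dedupe on its side.
        key = str(uuid.uuid4())
        for _ in range(max_attempts):
            try:
                resp = requests.post(
                    "https://mailer.example.com/v1/send",
                    headers={"Idempotency-Key": key},
                    json={"to": to, "subject": subject, "body": body},
                    timeout=10,
                )
                if resp.status_code < 500:
                    return resp  # sent, or a client error worth surfacing
            except requests.RequestException:
                pass  # timeout/reset: can't tell if it sent; retry, SAME key
        raise RuntimeError("gave up; provider may or may not have sent it")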


What if your system is the one actually sending the emails (i.e., you are the 3rd party in this scenario)?


It's not "equal".

If you guarantee "exactly once", you design your systems differently than with "at least once with idempotence". A system designed for exactly once will be less complicated than a system designed for at least once + idempotence, which is why it is ideal but impossible.


Until the bill comes anyway. Having to provision extra bandwidth for useless dups, extra processing power for useless updates, etc.


So, the opposite of exactly once


With idempotence, you shift the problem from "deliver X exactly once" to "make it seem like X was delivered exactly once". In most systems, exactly-once is really "effectively exactly once".


That's my point. You are simply converting the problem to a new form, not actually solving it.

Hey here's a solution to the halting problem – always assume yes, and then figure out the edge cases. How do you do that? Well that's on you, I did my job.

In a distributed system that needs exactly-once delivery, implementing perfect idempotence is equally impossible.


Converting a problem to a new form that you know how to better solve, or at least hope is more tractable, is a time-honored mathematical and CS tradition.


Idempotency - famously complex. No one has ever successfully implemented it, great point.


If you don't think idempotency can be complex then you haven't really worked on serious distributed computing problems.


If you don't think your analogy is a miss then you haven't really read any serious literature.


It can be exactly once at the application level just not exactly once at the more fine-grained message level. The fact that it's not exactly once at that lower level doesn't really matter, the semantics at the application level is what we care about.


Exactly. In practice there are probably a bunch of other things happening over the wire we also don't care about, handshakes and keepalives and negotiation and splitting and encryption and latency measuring and backpressure... It doesn't matter, in a variety of systems, at the application layer it is fine for the user to assume they will see a delivery to their code exactly once and that's what the user cares about. A delivery doesn't mean some internal bytes travelled across the wire; it means your client code received a call.

That's why if you search for exactly-once delivery you'll see a bunch of products claiming to have it (e.g. Kafka).


Not exactly. If you have a business problem where you’re thinking “But I really, really need the effect of exactly-once; what can I do?”, GP’s post has the answer.


OP's idea should be

idempotence + at least once

idempotence isn't necessarily commutative.


No, if your datastore is online (the only way you're functioning anyway), store an idempotency key, vector clock, etc. with transactional semantics.

In active / active setups, there are other strategies such as partitioning and consensus.


Good thing most things in the real world are idempotent then!


This!


> We must choose between the lesser of two evils, which is at-least-once delivery in most cases. This can be used to simulate exactly-once semantics by ensuring idempotency or otherwise eliminating side effects from operations.

There is a third option besides idempotency and eliminating side-effects: give each message a unique ID, use that to keep a record of which messages have been processed, and don't process the same message twice.
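A minimal Python sketch of that idea, with an in-memory set standing in for what would have to be durable, shared storage in a real system:

    processed = set()  # in practice: a durable store, updated in the same
                       # transaction as the side effect if at all possible

    def act(message):
        print("acting on", message["id"])  # stand-in for the real side effect

    def handle(message):
        if message["id"] in processed:
            return                       # duplicate delivery: drop it
        act(message)                     # 1. do the work
        processed.add(message["id"])     # 2. record it (note the crash
                                         #    window between 1 and 2)

    handle({"id": "abc"})
    handle({"id": "abc"})  # second delivery is a no-op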


That is quite precisely idempotency.

But I wouldn't get too worked up over it. The article is basically saying: You can't have exactly-once delivery (unless you take the necessary steps to ensure you have exactly once delivery).


> That is quite precisely idempotency.

Well, it's one way of implementing a kind of idempotency. But idempotency in general is more complicated than just deduping messages. For example, "Toggle the power state" is not idempotent because the result state depends on the initial state. You might think that "turn the power on" is idempotent, and by itself it is, but in conjunction with "turn the power off" it is not because the order in which they are processed matters. A truly idempotent message would be something like, "Ensure that the number of on-off cycles at time T1 is N, at time T2 is N+1" etc.

Idempotency is in general more powerful and more complicated than deduping.
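Roughly, in code (using the hypothetical power example):

    state = {"power": False}

    def toggle_power():
        # NOT idempotent: applying it twice differs from applying it once.
        state["power"] = not state["power"]

    def set_power(on):
        # Idempotent in isolation: repeats are harmless. But a reordered
        # set_power(True) / set_power(False) pair still changes the outcome.
        state["power"] = on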


Sure, but you said that assigning unique ID's to messages and not executing the same ID twice was a "third option", besides establishing idempotency. I'm saying it's not a "third option", but instead it literally is "establishing idempotency".


Idempotency of message reception is different than idempotency of message execution.


I guess we'll just have to agree to disagree about that.

Here is a quote from the original article that supports my position:

"Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner. ... The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication."


How does this work? This means the consumers have to somehow know the IDs of every message that's been sent and if it's been processed.

Either you have a shared database and all consumers are local, in which case why are you passing messages at all, or you have a distributed system somehow. If you have a distributed system, you've got this problem, and going recursive probably won't help much.

Or I've missed something.


It depends what you mean by “processed”. If there’s some particular action you want to take in response to the message, then you put the database next to wherever that action takes place.

If it’s a notification displaying on a phone, the phone holds a DB that tracks what notifications have been shown.

You can’t reliably show a notification on exactly one of a user’s devices. But that’s no biggie; display it exactly once per device, remove it everywhere when acknowledged anywhere.


That sounds like at-least-once delivery with recipient systems tracking known received messages. Which is to say faking exactly-once with the power of idempotency, rather than exactly-once.

This is academic with enough abstraction, but when you're designing a system and implementing the work of processing a message it can be a pretty important distinction. Especially if you're past the point where the list of seen items becomes a chokepoint for synchronization between consumers.

> You can’t reliably show a notification on exactly one of a user’s devices. But that’s no biggie; display it exactly once per device, remove it everywhere when acknowledged anywhere.

Potentially messy. Now you have two distributed system messages - the initial notification and the ack.


I don't understand why "fake" exactly-once delivery is fake. If I make you generate a random UUID in your system, then store the UUID in a centralised place in my system and ignore duplicate messages with that UUID, and you keep retrying until you get an ACK, why is that not exactly-once delivery?

Is it because the UUID is "only" random with some collision chance? (But then how come a sequential ID from your system wouldn't count?) Is it because my system needs to trust your system? (But why is trust a factor here?) Is it because my system needs a centralised database? (But our two systems are still distributed when you consider them together, right?) Is this a semantic argument over the meaning of "delivery", where I'm not allowed to impose requirements or check a database because the message has already been "delivered"? (But then why are we quibbling over semantics?) Is it because the message broker becomes stateful? (But why is that a constraint?)

I think from looking at the article that this is about delivery within a finite number of retries, but that seems like the kind of problem where in the real world we just ring each other when a message has been retried 50 times over the course of two days.


It's fake because the deduplication isn't a property of the messaging system, it's a property of the system consuming the messages. "Your system" is providing a workaround (deduplication) for at-least-once delivery, not the message system.

I think it's also worth noting that your suggested workaround has actually recreated the problem it's intending to solve. If you mark the UUID as "received" as soon as you get a message, how do you deal with duplicates if the processing fails? If you mark the UUID as "received" when you're done processing, how do you deal with the possibility that you'll receive the message multiple times? This can get hairy very quickly.


We come back to the original argument, which is that at-least-once delivery of idempotent operations is how you represent things that should happen exactly once in a system, and in a saner world this could reasonably be called exactly-once delivery.

Every distributed systems engineer knows what "exactly-once delivery" is asking for, and in plain English it's valid to conflate the two, but for some reason the field has decided to treat the phrase as an annoying semantic pit trap for the unwary. Want to add an ID to your transactions to make them idempotent? Well, even though your transactions are now recorded exactly once, that wasn't technically delivery! Gotcha!


The issue is precision in terminology and resulting understanding. At-least-once delivery with idempotency imposes certain rules on the processing of messages. As you say, every distributed systems engineer knows this.

When you call something exactly-once, people who are perhaps not distributed systems engineers make the reasonable assumption that this means exactly what it says. They will engineer around this reasonable assumption based on a clear technical description and get something hilariously broken in non-obvious ways. This will have happened because jargon ("exactly-once delivery") has been confused for a technical description of a delivery system's properties.

Not everyone in this series of comments is a distributed systems engineer. Never mind everyone using a messaging system.


If you zoom out enough then every system looks like a black box and the only thing that matters is inputs and outputs. Messaging systems are no different in that respect from literally everything else. If you're building a messaging system, there's a world of difference in terms of how other people integrate with your software if you say "exactly-once" vs "at-least-once."


The solution to that last problem is a transaction log.


Thinking about it, we’re probably violently agreeing with just some differences in terminology.

I’m arguing that exactly-once delivery is possible if the message receipt happens in a single place, but maybe others don’t see that as a distributed system at all.

> Potentially messy. Now you have two distributed system messages - the initial notification and the ack.

Not a problem, both are idempotent! :)


> I’m arguing that exactly-once delivery is possible if the message receipt happens in a single place, but maybe others don’t see that as a distributed system at all.

I think that only holds if the sender is also in the same place. Otherwise there's the very real chance of a message getting lost, turning your system from exactly-once into at-most-once. At this point your sender, consumer, and messaging systems are one system, so it's probably reasonable to question if that's a distributed system.


> This means the consumers have to somehow know the IDs of every message that's been sent

No, they only need the IDs of messages they have received.


When do you record the message as processed? If you do it before the processing is complete, you have at-most-once delivery (because processing could fail after you've recorded it as processed, and it won't be retried). If you do it after the processing is complete, you have at-least-once delivery (because marking the message as processed could fail, and it will be retried).
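Sketched out, the two orderings look like this ('store' standing in for durable storage, 'act' for the side effect):

    def act(msg):
        print("side effect for", msg["id"])  # the actual work

    def handle_record_first(msg, store):
        store.add(msg["id"])  # crash here, before act(): marked done but
        act(msg)              # never acted on -> at-most-once

    def handle_record_last(msg, store):
        act(msg)              # crash here, before store.add(): acted on,
        store.add(msg["id"])  # then redelivered and repeated -> at-least-once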


The article does a bad job of communicating this, but that's what it calls deduplication.


If you want to reason about a world that has random software and hardware failures, then you cannot really have any kind of pure results. A backhoe could cut your network cable at exactly the wrong point, or a malfunctioning network switch could decide to insert the right extra few bytes in exactly the wrong place, changing the meaning of your message without altering any of the checksums. As the scale of your application increases, the chance of this sort of chaos increases as well. The question then becomes how to reason in the face of chaos, what sorts of error rates are acceptable, and how to build systems that can recover from supposedly impossible states. If your bank's software makes an error, they have established processes to determine that and correct the balance of accounts.


This line of "what if an asteroid hits the primary & DR data centers in the same microsecond" thinking is why we settled on running our product on 1 VM with SQLite in-proc.

After taking our customers through this same kind of apocalyptic rabbit hole conversation, they tend to agree with this architecture decision.

The cost of anticipating the .00001% that might never come is completely drowned out by the massive, daily 99%-certain headache that is managing a convoluted, multi-cloud cluster.

Many times the business owners will get the message and finally reveal that they have always had access to a completely ridiculous workaround involving literal paper & pen that is just as feasible in 2023 as it was in the 18th century.


In my experience the customers, and even the POs, are the easy ones to convince. “We get 99% of the uptime for 30% of the price? Great!”

It’s the resume-driven mid dev in the next office you’ve got to watch out for.


or the dev who spent a few nights and weekends rescuing the system after one of those 1% failures the customer, as it turns out, has no patience for at all


Disaster recovery is just one of many things that is much simpler in non-distributed systems.

You seem to be confusing a system that produces bad results 1% of the time with a system that's down 1% of the time. If you can only write the first kind of non-distributed system, you're in for a bad trip if you try to write a distributed equivalent.


Last month I had a deployment go wrong on one box, and the part of deployments outside of my control is all or nothing. No partial credit for 96% success. Some random Consul call consumed the port we listen on, a shutdown timeout expired and the process was killed, and so that socket was left in CLOSE_WAIT (like seriously, Hashicorp, SO_LINGER has been around for at least 30 years).

This led to an existential crisis because given the number of ports we open and the number of machines we run and the number of processes per machine, there must be over a 0.1% chance of any deployment blowing up this way. We do hundreds a year in prod and probably hundreds a month in preprod. We've been winning the lottery this whole time.

Throw enough events around and a one in a million corner case will happen every week, every day, twice a day, three times in a row. That gets old really really quickly.


> SO_LINGER. Lingers on close if data is present. If this option is enabled and there is unsent data present when close() is called, the calling application program is blocked during the close() call, until the data is transmitted or the connection has timed out.

I had to look this one up for a refresher, but 100% violently agree - Such behavior certainly warrants a bug submission.


The obnoxious thing about CLOSE_WAIT is that it's supposed to time out after 2 minutes or 10 minutes, but I gave up and kicked the box out of the cluster after a half hour of trying to ask it nicely. Which is probably what everyone else does.


Maybe I didn't get the point. Of course we can't have exactly-once delivery directly in the layer of an unreliable network - but it seems pretty easy to construct it on a higher layer if your network stack supports at-least-once delivery: Just assign each message a unique ID during sending, then track those IDs on the receiver side and discard duplicates. And you need those IDs anyway so you can generate ACKs.

Isn't this basically what every "reliable" transport (TCP, HTTP3, message queues...) does?


Wherever you construct it you must necessarily have a machine whose failure mode is that "exactly once" degrades into either "at most once" or "at least once".

What determines which failure mode you get is whether the machine will failover to a machine that retries uncertain messages (giving you "at least once"), or it doesn't (giving you "at most once").

But, you say, why can't we have it failover to a machine that asks recipients what they have got and goes from there? Well we can, but the recipients don't know what messages are in the network still on their way to them.

But, you say, why not have the recipients disregard those inbound messages once they know about the replacement machine? Well you can do that, but now the *recipients* become machines whose job is to ensure the deduplication. And now *they* become the machine with a bad failure mode.

But, you say, does this not reduce the odds of failures? Why yes, it does. Which is why people do things like this. And there has to come a point where we accept SOME failure rate.

The alternative, well, read The Saddest Moment at https://scholar.harvard.edu/files/mickens/files/thesaddestmo... to see where madness leads.


> And now they become the machine with a bad failure mode.

What is the failure mode the recipients have here?


If the recipient fails after telling the deduping machine what it has seen, the recipient's failover will be in an unknown different state. And now the failover machine has to try to figure out what its state should be, what is going on, and so on. You can add ways to address each possibility here, and you'll get ever more obscure chains that again can result in failure. At the cost of ever increasing complexity, you'll make failures ever more complicated.

Frequently developers who implement these manage to do two bad things. First they introduced a lot of complex code which rarely triggers except in disaster, and so whose bugs tend to survive. And second, they manage to convince themselves that they have accomplished the impossible, and make reliability promises that other developers unwisely believe. Exactly how unreliable most systems were and how much the documentation couldn't be trusted was underappreciated until https://jepsen.io/ came along and started proving how bad most distributed software was.

Now it may seem bizarrely unlikely that you'll ever see this kind of situation. But failures often start from network congestion due to a packet storm. And the failures lead to chatty Byzantine fault tolerance protocols adding significant traffic. This causes cascading failures. And so a small, simple outage can escalate into a series of outages as ever more confused servers continue overwhelming the network with their futile attempts to discover what is supposed to be true. So complex combinations of failures occur together more often than most of us would expect.


Certainly the recipient can fail, this seems obvious; if a user's phone dies then your app is not going to work. Perhaps I have the wrong model/framing here, but I was thinking from the perspective of being resilient to any failure outside of the recipient device.


Ah, but distributed systems tend to be hooked together. So the recipient device may itself feed into something else. And in the case of a message queue, generally does.


> then track those IDs on the receiver side

Now you need a database. Do you also need exactly-once delivery to the database? Now the service is no longer stateless too, which means scalability is a problem. Maybe you decide to make it just an in-process cache for de-duping, but that needs expiring and now the semantics are exactly-once within a given time period, and not across service restarts.

We can definitely solve this with higher level constructs, but they're not free, and they can introduce the same issues themselves.

> Isn't this basically what every "reliable" transport (TCP, HTTP3, message queues...) does?

TCP does this, to solve retries at the TCP layer. HTTP3 does this to solve issues at the HTTP3 layer. Message queues might solve this for the message queue, depends. But none of these solve the product level, or the user experience level, or other higher levels where these issues still crop up. They're issues you have to solve at every layer in some way.


Or I just use a monotonically increasing ID and track the highest ID I've received in order. I might have to buffer a certain number of packets/messages/whatever to deal with out-of-order arrivals, but the entire state fits into RAM. (Edit: Actually I don't even have to. It's probably a good thing to do for efficiency, but in principle I can just drop out-of-order messages and wait until they are redelivered, hopefully in the correct order)

But yes, even if I needed some stuff to achieve it, that doesn't make it impossible as the OP claimed.
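A toy Python version of that scheme, where next_seq is the only state that has to survive restarts:

    class InOrderReceiver:
        def __init__(self):
            self.next_seq = 0  # the highest in-order ID seen, plus one

        def receive(self, seq, payload):
            if seq != self.next_seq:
                return False   # duplicate (seq < next) or gap (seq > next):
                               # drop it and rely on the sender redelivering
            print("delivered:", payload)  # hand to the application once
            self.next_seq += 1
            return True        # the caller can turn this into an ACK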

> But none of these solve the product level, or the user experience level, or other higher levels where these issues still crop up.

I don't understand this point. Do you have some examples?


If everything smells like shit, check your shoes.

How do you know what the last message you received is? You can crash in the middle of receiving a message. You can crash after you've written the ID to disk but before you've processed it, or you can crash after you've processed it but before you've written the ID to storage.

If you're a distribution box (which is quite, quite common in these message queue systems), you can get the message, and send it to a box that just powered off. You saw it, you recorded it, and now you're not sure if you forwarded it successfully or not.

Fun thing about power outages, not every box turns off at exactly the same nanosecond. PSUs are full of capacitors and inductors. Sometimes that's just enough to float through a brownout, too (and a bunch of machines booting can also cause a brownout)


> Or I just use a monotonically increasing ID and track the highest ID I've received in order.

This assumes you can generate monotonically increasing numbers. If you have many clients, now they all need to share a data source and may be performance bound by generating those numbers.

> Actually I don't even have to. It's probably a good thing to do for efficiency, but in principle I can just drop out-of-order messages and wait until they are redelivered, hopefully in the correct order

True (modulo first problem), but efficiency may be necessary here. With many clients, you may end up in a state where only a small fraction of messages get through successfully, and most traffic is unsuccessful, which is bad. This also makes performance commitments hard to maintain as it's perhaps just luck when a client manages to get a message through. Clients also now need more buffering, more state, etc.

> > But none of these solve the product level, or the user experience level, or other higher levels where these issues still crop up.

> I don't understand this point. Do you have some examples?

Let's assume a simple client->server instant messaging app. As a user, if I send a message, I expect that to arrive exactly once. It's going over TCP which is "reliable", but it doesn't stop the HTTP request from failing and needing to be retried. It's using HTTP3, but that doesn't stop the server generating a 503 and needing to retry the POST request (or whatever). Maybe the server puts the message in a message queue, but that connection fails after sending a transaction commit, did it get committed?

Idempotency tokens or an equivalent mechanism do solve this, but there isn't one magic trick to solving it in some base layer technology like TCP, this needs to be solved again and again whenever you have distributed systems.

Also, this isn't just networking. Two processes on a single machine communicating via IPC may be effectively a distributed system! I've got some experience doing this on Android, and it's still hard.


> This assumes you can generate monotonically increasing numbers. If you have many clients, now they all need to share a data source and may be performance bound by generating those numbers.

You can assign an ID to your nodes and let them generate increasing numbers on their own. Node ID decides on a tie, and if one node sees a larger counter value appear, it adjusts its own counter so that it doesn't stay behind:

- https://en.wikipedia.org/wiki/Lamport_timestamp

- https://en.wikipedia.org/wiki/Vector_clock
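A bare-bones Lamport clock in Python, per the description above (ties broken by node ID):

    class LamportClock:
        def __init__(self, node_id):
            self.node_id = node_id
            self.counter = 0

        def tick(self):
            # Call before sending: (counter, node_id) pairs are unique and
            # totally ordered, with node_id deciding ties.
            self.counter += 1
            return (self.counter, self.node_id)

        def observe(self, remote_counter):
            # Call on receive: never let our counter fall behind one we saw.
            self.counter = max(self.counter, remote_counter)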


This is a logical solution but not a full practical solution.

With this approach you'd still need to communicate the current clock number back to clients as otherwise one will get ahead and have all its traffic accepted, and others will fall behind and be unable to get traffic accepted. Even if an error causes a client to bump forwards to retry, by the time it has done that the number it is about to retry with may have been used.

Additionally, the aim is still to get exactly-once delivery, so clients need to be able to differentiate between an error caused by them reusing an ID that was rejected to enforce exactly-once delivery, and an error caused by another client getting that ID.

Basically, this issue is easy to solve with low traffic and reliable persistent storage everywhere, but hard to solve with high traffic, or the acceptance that all persistent storage brings additional reliability challenges.


You don't "just" do anything when networking is involved. You can check out a course on TCP/IP if you want to see some reasons that is not enough. What you're describing is essentially a part of the TCP and isn't sufficient for a basic implementation of that.


> Do you also need exactly-once delivery to the database?

Yes, with uniqueness constraint.

> the service is no longer stateless too, which means scalability is a problem

Do you have a specific problem in mind?


Yeah often the best way to tackle exactly-once delivery to a database is a uniqueness constraint, but that isn't free – there's the index cost, additional write cost, and the error needs to be handled when it's thrown back to the client on a collision (something many applications don't handle well).

Stateful services are far harder to scale than stateless ones. Typically a stateless service can be scaled out horizontally with relative ease, but when there's state storage involved this becomes harder. You can scale vertically, but only so far. You can scale horizontally, but then typically need to introduce some sort of sharding/consistent hashing/etc to ensure that the service has a consistent view of the world without needing to connect to every database instance.
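For example, with SQLite (any database with a unique constraint works the same way): the insert itself is the dedup check, and the constraint violation is exactly the error path the application has to handle.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY, payload TEXT)")

    def handle(msg_id, payload):
        try:
            with conn:  # one transaction: the insert doubles as the dedup check
                conn.execute(
                    "INSERT INTO processed (msg_id, payload) VALUES (?, ?)",
                    (msg_id, payload),
                )
        except sqlite3.IntegrityError:
            return "duplicate"  # collision: the error path apps must handle
        return "processed"

    print(handle("m1", "hello"))  # processed
    print(handle("m1", "hello"))  # duplicate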


Not sure where the expectation of things being free comes from. If your starting point is stateless then you can consider the tradeoff of introducing state vs. processing the same request multiple times.


It's probably more accurate to say that toy systems can maintain the illusion of exactly-once for a while, but it doesn't scale. You can't keep a record of every message ever seen in a message based system. Message passing systems exist to handle rates of traffic that cannot be achieved by rolling your own event queue as a thin wrapper around other tools like databases. It's not just the storage, it's the throughput.

The first time I encountered RabbitMQ it could only handle 60 messages per second with delivery guarantees. We already had our own bespoke system that used a database to handle a couple multiples of that. So we ended up limping along with what we had.


So when does the receiver record the IDs? When it receives the message but before processing or after it's processed the message? If the former, then what if it goes down during processing? Then the other receivers will keep rejecting the message even though it's never processed. So now it's less than once. If the receiver records it after it's done processing, then it could go down after processing but before recording it in the DB. So now you have more than once.

Also, isn't the assumption here that you will have a reliable connection to a shared DB?

You can have engineered solutions that are pragmatically close to exactly-once delivery, but it's not "pure" -- there are still scenarios, however unlikely, in which they will fail.


Even if you record both when it arrives and when it's completed, a power failure midway will still leave everything in between in a partial state, unless everything has XA transactional behaviour, and as we know, that scales quite poorly.


XA scales well enough for many workloads. I believe that too many developers discard solutions like XA because it "scales quite poorly" without doing serious analysis of how much scalability they are likely to need and whether XA scales well enough to support it. On the flip side I believe that too many developers underestimate the complexity of managing state and failure in distributed systems.


XA transactions have the exact same problems, it doesn't solve it either.


The context is exactly once in a distributed system. When you construct that higher layer you will make your system highly coordinated, thus no longer a distributed system.


I mean, the property of distributed systems is that they crop up everywhere you have an unreliable transport and generally only bring downsides. If you could un-distribute your system that easily, I bet that would make a lot of people very happy.

The OP defines "distributed systems" like this:

> Web browser and server? Distributed. Server and database? Distributed. Server and message queue? Distributed.

By that definition, as soon as I have a server, a client and an unreliable connection between them, I have a distributed system. In that context, nothing stops me from counting IDs.


When you receive a message in your scheme, you have to do the following:

1. Some action with a side-effect (ex: update an entry on disk, send a message out to some third party, etc.). This might be a bank transaction, or a note saying "you gotta ship package X to person Y".

2. Some action to note that you've received ID X (ex: write to disk, send a message out to your DB, etc.)

How do you set up your server to deterministically do neither in the event of a crash, and both in the event that your code runs to completion?


If the action is sending a message, I’d say the answer is to just send it again, and defer the “exactly-once” problem to the point where the message is actually used. (It seems like we’re in a context where sending messages is always unreliable, so that send could fail, so there needs to be some protocol for retrying it anyway.)

So the action is some local action. If it’s purely digital, it’s fiddly but surely not impossible to ensure that the action and the record of the action either both take place or both don’t. It’s a database with a transaction log.

If the action is some irrevocable physical thing - remotely controlling a printer, say - you need to make a best effort to handle errors gracefully, sure.

I’d concede that it’s impossible to ensure that a document is printed out exactly once, say - maybe there’s a paper jam, and it’s debatable whether the jammed paper counts as a valid printout. But that’s not very surprising and I don’t think it tells you much. It’s mostly a problem for printer manufacturers rather than distributed system architects.


These issues are in the context of distributed systems where you want to be able to recover from losing a receiver (e.g., we want to be able to reassign partitions for a Kafka topic when a consumer goes down). If you don't mind your system grinding to a halt whenever you lose one of your receivers (that's perfectly fine in some circumstances!), then your proposed solution works great.

Edit: also, I should be fair and acknowledge that you're effectively describing idempotency (I'm guessing you already knew that ;P ), which the article's author eventually points out is a way to recover "exactly-once" semantics. The point, maybe, is that someone needs to explicitly do this somewhere; you can't really rely on your protocol to do it for you.


Yes, but the receiver can be faulty. If it acknowledges the message and then crashes before handling it, you've got at-most-once, and if it handles the message and then crashes before acknowledging it, you've got at-least-once. You can avoid this if the receiver handles and acknowledges in a single transaction, but I only know of one platform that implements this and everyone hates it (hence the throwaway).


Even with a transaction, if the processing involves external side effects, e.g. sending an email, the rollback won't matter and you still get at-least-once.


There's no rollback; it's an atomic transaction. The certainty that messages are always handled completely or fail completely is one of the big design constraints that made the whole thing so hinky.


What if the receiver process fails? How do you know which messages it processed successfully? You can shuffle the problem around indefinitely but it doesn’t go away.

If your processing doesn’t have any external side effects (making external API calls, sending emails, charging credit cards, etc.) then one option is to put your message queue in a relational DB alongside all your other data. Then you can pull a message, process it, write the results to the DB, and mark the message as processed all inside one big transaction. But not many use-cases can fit these constraints, and it also has low throughput.


> FLP and the Two Generals Problem are not design complexities, they are impossibility results.

These are impossibility results given various assumptions and requirements that may not hold in practice, or may be too restrictive. For one thing, I suppose we're pretty happy with a probabilistic solution as long as we can get the probability of failure below an acceptable threshold.


Ok so maybe the letter gets sent more than once but the message gets sent exactly once, because the messages are numbered and you only process each number once.

If you get a letter with the same number you already read you don't even open it.
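
A minimal sketch of that numbering scheme, assuming a single ordered stream and an in-memory counter (a real system would persist it):

    next_expected = 0   # highest number fully processed so far, plus one

    def on_letter(seq, body):
        global next_expected
        if seq < next_expected:
            return                  # already read this number: don't even open it
        if seq > next_expected:
            raise RuntimeError("gap: an earlier letter is still missing")
        print("processing", body)   # stand-in for the real work
        next_expected += 1          # a crash between the line above and this one
                                    # reopens the more-than-once window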

In Kafka this is handled the same way: events are numbered, and you request the latest from the last one you processed.

In our event streaming system it's also done like this; it may surprise you that Kafka is just one implementation of event streaming, not synonymous with it.

With Kafka, the consumer offset, i.e. up to which number a consumer has already processed, used to be handled by ZooKeeper but has been migrated into Kafka itself.

There is no "exactly one consumer gets the message" done by ZooKeeper; all consumers get all the messages from the topics they subscribed to. If you want exactly one consumer to get each message, you should create different topics.

So not true.


I think there is a useful distinction here though -- you're by definition doing processing there on a "non-exactly-once delivery" system in order to get your "exactly-once" result, and by definition anything that requires exactly-once message delivery must do this processing itself: it cannot be abstracted away into a separate system.

So, in essence, you can never have "I will get this exactly once", and at best you can only ever have "I will have a plan for what to do if I get this more than once".


Looks like a purely philosophical distinction.

The question here would be who "you" are. Are "you" the low-level system processing raw messages, or are "you" a system on top of that?

The high-level system can rightfully claim that it receives messages "exactly once"—from the low-level system.


> So, in essence, you can never have "I will get this exactly once", and at best you can only ever have "I will have a plan for what to do if I get this more than once".

I don't understand this at all. TCP has exactly-once message delivery that the application layer is completely unaware of...


TCP isn't guaranteed delivery if the receiver crashes; that's what queues and similar persistent systems are introduced to solve. But at-least-once is what you normally get there too, outside of specialized two-phase (write, then commit later) idempotent systems, where writing a second time is OK if the first write was never committed because your producer died and restarted from an earlier position, AFAIK.


In theory you could simply delay the TCP ACK until you had committed to your ACID database, and then TCP would in fact be exactly once.

I guess hardly anyone does this though.


This is something I don't understand about the common idiom.

You need some processing for anything on the network: either you accept it or you don't. Yet network protocols are described by the behavior they export to their consumers, not by their internals. Well, with the single exception of exactly-once delivery.


> it cannot abstracted away into a separate system.

It absolutely can. That's the entire point of TCP.


TCP can't do exactly once in every scenario without fail. It does a pretty good job, but failure in the edge cases is the whole point of the difficulty we're looking at.

If the network loses the last packet in a TCP stream, then goes down for an indefinite period, the two ends of the connection have irreconcilable differences of understanding about whether the entire transmission was received. As far as the recipient is concerned, they have an entire message that's fully ack'd and so they should process it. As far as the sender is concerned, they have no ack for the last chunk of data and must redeliver it. This is the crux of the Two Generals problem.

TCP can solve problems that happen in its own domain, and give reliable in-order delivery over an unreliable network, up to a point. It's not able to provide exactly-once semantics in all scenarios, though. Because that's logically impossible.


At the same time, don't trust any network middleware not to break your expectations on TCP behavior... also don't trust that middleware will be visible and you'll know what's going wrong.


The article is saying a message queue service can't guarantee exactly once delivery and your comment is saying it's possible for a message queue consumer to handle duplicate delivery. Those are different things.


> Ok so maybe the letter gets sent more than once but the message gets sent exactly once, because the messages are numbered and you only process each number once.

Isn't that just shifting the goal a bit? Now the trick is "only process each number once" which seems to have its own transactional issues if you can crash between "taking action based on the message" and "recording that this number has been seen"? If you need non-idempotent actions wouldn't this still be a potential issue?


Why not mention the architecture that comes closest to exactly-once delivery? If you store a Kafka offset along with your application state in a transactional datastore, then for all intents and purposes you have exactly-once delivery semantics. This is something I really like about Kafka’s design.
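
A rough sketch of the pattern, assuming the confluent_kafka client and sqlite as the transactional store (topic, partition and schema are invented):

    import sqlite3
    from confluent_kafka import Consumer, TopicPartition

    db = sqlite3.connect("state.db")
    db.execute("CREATE TABLE IF NOT EXISTS offsets (tp TEXT PRIMARY KEY, next_offset INTEGER)")
    db.execute("CREATE TABLE IF NOT EXISTS counts (key TEXT PRIMARY KEY, n INTEGER)")

    row = db.execute("SELECT next_offset FROM offsets WHERE tp = 'orders:0'").fetchone()
    consumer = Consumer({"bootstrap.servers": "localhost:9092",
                         "group.id": "counts",
                         "enable.auto.commit": False})  # we manage offsets ourselves
    consumer.assign([TopicPartition("orders", 0, row[0] if row else 0)])

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        with db:  # the state update and the offset advance commit atomically
            db.execute("INSERT INTO counts VALUES (?, 1) "
                       "ON CONFLICT(key) DO UPDATE SET n = n + 1",
                       ((msg.key() or b"").decode(),))
            db.execute("INSERT OR REPLACE INTO offsets VALUES ('orders:0', ?)",
                       (msg.offset() + 1,))

On restart the consumer re-assigns from its own stored offset, so a crash between the poll and the commit just replays the message into the same transaction.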


This is again assuming that you have no side effects. Imagine that you want to email users based on a list in Kafka. You read the offset in a transaction and update it. But do you send the email inside the transaction or after closing it? You are back to picking between at-least-once and at-most-once.


Just put the offset in the email to users and make them keep track of it??


Moving the problem to another system does not solve the problem. Someone necessarily has to deal with it.


Exactly - the user could die after opening the email but before reading the number!


You got some good jokes as answers already, but what you do if you care about correctness is to store a “send email job” in your transactional data store.
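
That is the "transactional outbox" shape; a minimal sketch with sqlite (schema invented). Note the send itself is still at-least-once, which is the point made downthread:

    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS email_outbox "
               "(job_id TEXT PRIMARY KEY, recipient TEXT, body TEXT, sent INTEGER DEFAULT 0)")

    def handle_event(job_id, recipient, body):
        with db:  # the email job is enqueued in the same transaction as
                  # whatever other state this event changes
            db.execute("INSERT OR IGNORE INTO email_outbox (job_id, recipient, body) "
                       "VALUES (?, ?, ?)", (job_id, recipient, body))

    def drain_outbox(send_email):  # send_email is your actual mail call
        jobs = db.execute("SELECT job_id, recipient, body FROM email_outbox "
                          "WHERE sent = 0").fetchall()
        for job_id, recipient, body in jobs:
            send_email(recipient, body)  # crash between here and the UPDATE
            with db:                     # and this email goes out twice
                db.execute("UPDATE email_outbox SET sent = 1 WHERE job_id = ?",
                           (job_id,))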


What transactional data store capable of sending email do you recommend?


Just about any transactional datastore will do. Just google for “job queue in [insert name of datastore here]”.


You've missed the point: how do you tie sending email to the transactional semantics? Your email server and your SQL server (or whatever) make just as much of a distributed system as Kafka and the email server without some other store in the middle. If you want to really solve it, your email server itself needs to participate in the consensus process in some way.


I suspect it's not mentioned because in the real world there are ways to work around this limitation. And because those workarounds actually work, there's nothing interesting to say. The post is much ado about nothing.


Yes, it’s always possible to build a big pile of workarounds. I agree.


Discussed at the time:

You Cannot Have Exactly-Once Delivery - https://news.ycombinator.com/item?id=9266725 - March 2015 (55 comments)


So evidently the article did fail exactly-once delivery.


Fail delivery of what ?


Delivery to HN.


I decided a long time ago in 3+Mail that we could occasionally have messages delivered twice, or not at all, but there was no easy way to be sure neither ever happened. So you bias it to "twice."



I'd tell you an exactly once delivery joke but you may not get it.


I’d tell you an idempotency joke but I think you might have heard it before…


I'm sure it won't make a difference


You can go ahead. It won't modify the state-of-having-heard-it.


"Now listen very carefully, I shall say this only once."

I suspect that quote will meet Resistance.


I was completely naive to distributed systems until I was field promoted to owning one after tons of attrition with no backfilling roles.

It was a system built by people who also didn't have distributed systems experience. It was not enjoyable at all, and at-least-once delivery was a consistent headache that required infrequent but time-consuming remediation.


"was field promoted"....could you explain what you mean? To myself, it's always been a military term. As far as 386 pc's etc go....?


It literally just means internally promoted


There is a certain nuance of “in war time” (as opposed to school time).

It's the case here, but isn't automatic.


"You Cannot Have Exactly-Once Delivery" - 8 years ago | 55 comments

https://news.ycombinator.com/item?id=9266725


I have been toying with the idea lately of using a transactional database (like SQL) to manage some of the very important queues.

The idea is to retrieve an item from the queue inside a transaction, locking the row using "SELECT FOR UPDATE" with "SKIP LOCKED", such that the row gets locked on read and several workers can read from the table at the same time. Within the same transaction, other work is done, and everything gets committed to the database as a single atomic operation.

CockroachDB (a consensus/raft distributed database) recently added support for SKIP LOCKED, but I still have yet to work on this idea.
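
A sketch of that shape using psycopg2 against Postgres (the jobs table is invented for illustration); CockroachDB should accept similar SQL:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection details assumed

    def work_one():
        with conn:  # commit on success; rollback on exception leaves the job pending
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT id, payload FROM jobs
                    WHERE status = 'pending'
                    ORDER BY id
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED""")
                row = cur.fetchone()
                if row is None:
                    return False            # nothing pending and unlocked
                job_id, payload = row
                # ... do the work and write its results via cur here ...
                cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,))
                return True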


I have worked on a system that took exactly this approach for ~17 years. The database was Oracle; at the time we started, 'SKIP LOCKED' was not even a documented feature of the Oracle DBMS. It is now. The approach worked quite well for us and is happily working today at several large banks. Also, Oracle sells what I think they call AQ (Advanced Queuing), which provides a messaging API but uses the DBMS for storage. No idea how it performs relative to dedicated persistent messaging solutions, but I would guess it's probably good enough for many workloads.


It can definitely work. It is a pretty good way to avoid the distributed systems problem by avoiding having a distributed system.

Of course there are other concerns with using a database as a queue (mostly at high throughput) but for most cases it will work well.


Using a database as a queue. What could possibly go wrong?

I guess everyone has to make this mistake once in their career.

Funny enough, when I searched for “database as a queue”, my own comment from four years ago came up as the fourth result.

https://news.ycombinator.com/item?id=18774559


> Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.

> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.

Doesn't have to be a "true" exactly-once. Just practically so. It's like saying "humans can't actually fly... they are faking it by using machines".


This article reads like someone who has a very superficial understanding of the theory he/she claims proves his/her point. For starters, the two generals problem does not prevent one party knowing a message they sent previously was delivered exactly once. It just prevents both parties knowing about some common knowledge in the presence of message loss. Not that I am claiming to be any better!


Funny how the author later worked on NATS [1] which supports exactly-once semantics [2].

--

[1] https://nats.io/

[2] https://docs.nats.io/using-nats/developer/develop_jetstream/...


Of course you can have exactly-once delivery. I mean... we know how to construct software that will transfer money from one account to another, exactly once. It really isn't rocket science but it does require a little bit of understanding of various tradeoffs that you are making.

It is a bit like saying that we can't have straight lines. Of course, if you zoom in far enough to see individual atoms, every physical surface will look jagged. But in practical terms we can have straight lines and surfaces to a good enough approximation. It means specifying what "straight" means and figuring out how to measure it and how to produce "straight" according to specification and measurements.

Engineering is about knowing and making tradeoffs. Every device we have ever created has to contend with limitations of physical existence. Engineering is about accomplishing goals in presence of those limitations.

A person who says "you can't deliver a message exactly once" clearly lives in an idealised, theoretical world. I would urge you to leave your ivory tower for a second and see how engineers in the real world accomplish what you say is not possible.

I get that this knowledge is useful -- but don't publish it as gospel. "You cannot have exactly-once delivery" is true, but not the same kind of truth as "you can't travel faster than light". No engineering can get you to travel faster than light. But engineering can get you as close to exactly-once delivery as you want to the point where the original statement stops being meaningful for real life problems.


I don't understand where the anger comes from, the article makes it clear it's talking about distributed systems theory.

Like someone else said, you can use at least once delivery and handle duplicate messages, but that's not quite the same as a distributed system guaranteeing that a message will be delivered exactly once.


> I mean... we know how to construct software that will transfer money from one account to another, exactly once.

Assuming you are talking about transferring between institutions, there is actually no single piece of software with this responsibility. The business processes are effectively what provide these guarantees (typically by way of another 3rd party).

In order to accomplish this, added latency (settlement time) is necessarily introduced into the process.


.... they said, misunderstanding the article entirely.

This isn't an opinion. This is a fact of distributed systems. An axiom, if you will.


There is no distributed consensus. However, we can reach consensus with an _arbitrarily high_ probability by adding additional machinery -- up to the point where failure is about as likely as the earth simply falling into the sun one day.

So while this is a hugely important result, it doesn't stop us from building useful systems.


Have you actually read the article? The title is just a summary and the author fully acknowledges that their argument is essentially based around edge cases, not that this in any way diminishes it for me.

It’s just an interesting piece of theorising.

I think your comment (particularly your 4th paragraph about ivory towers etc) comes across as overly harsh and a little aggressive.


You can’t do any transfer by any particular deadline if the network is down.

Assuming the network will recover in time may be a reasonable assumption sometimes, though.


If "the network might be down arbitrarily long" is a possibility, you cannot have at least once delivery, either.


technically you can, as at least once delivery is satisfied by not delivering :)


You can get exactly-once in a system if you design for consistency (in the CAP sense) and use a consensus protocol. Those systems don't offer availability (in the CAP sense) by definition, and I guess when people say exactly-once is impossible they're speaking about systems that offer availability.


No, you can't get exactly once guarantees, in any system. This is an axiom. There will always be a failure mode somewhere that puts you in either at least once or at most once mode.

This has nothing to do with CAP.


Think you may want to read about what an axiom is.

An axiom, by the very definition, cannot be proven.

Since the mathematical impossibility of exactly once delivery can be proven (there exists a proof), it means it is not an axiom.

Now, a mathematical proof is different from physical reality. In mathematics 10^(10^(10^10)) is different from infinity, while in reality it is not. In mathematics you can halve the distance between you and a point every second and you will never reach it, while in the physical world you will reach it in less than a minute. In mathematics you can win any lottery, however low your chance is, by just trying an infinite number of times. In the real world you can't.

An anecdote (funny regardless of whether it's true) says that CIA cryptographers once created an unbreakable encryption scheme using a one-time pad. The one-time pad is mathematically proven to be impossible to break.

USSR cryptographers promptly learned to decipher all cryptograms.

Apparently, when combining the plaintext and the one-time pad, the machine would generate different voltages depending on the bit used in the pad: rather than outputting 0V for false and 1V for true, the output became 0.9V or 1.1V for true and 0V or 0.1V for false, depending on which pad bit was used.

This shows how idealised world is different from engineering reality and forgetting about it can lead to large errors in judgment.

The same kind of errors in judgment as people saying "you cannot have exactly-once delivery".

Programs work in real world and not in idealised mathematical space. In real world, exactly once delivery is a solved problem for any practical purpose. Which is evidenced by all those systems that actually do, in fact, provide exactly once delivery.


What is this guy smoking? Send a UDP packet from one node and receive it on another. Exactly-once delivery.

If what he really means is "guaranteed to be delivered exactly once", then, yeah, no, of course not, because if your network goes down, you can't send anything.

Over a network.

You can, of course, just have one big fat machine and have two nodes in the machine and send the packet from the one node to another in the one big giant machine. There's no "network" to go down, unless somebody slices into the machine with a red-hot katana.

You can also have a fully-connected topology where every machine is hard-wired into every other machine, and every machine will route for every other machine. There's no real "network" to go down; as long as there are 2 hosts that aren't dead, a message from one will get to another.

What I find amusing is that well before you get into truly hard distributed system problems, for a sufficiently complex implementation/application, your apps will be riddled with so many bugs and so many operational problems that the thing is going to fall over way before a well-built network goes down.


I'd tell you a joke about UDP, but you might not get it


I have a wifi repeater/range extender. I call it repeater, because often when pinging with it, I get duplicated packets. I know, I know, it's ICMP, not UDP, but yeah, don't tell me the joke. I may get tired of hearing it again and again.


I mean, isn't it just the same as with a real-world package/message? It just depends on how much effort you are willing to put into your desired delivery model?! … But nothing is guaranteed, ever.


Am I correct in assuming that all of this is true for non-distributed systems as well, just less likely?


the workaround is de-duplication & idempotency


This article spends a long time discussing idempotency but only mentions deduplication in passing.

Can’t you de-dupe by attaching a UUID to messages?


To a point. You can do that in your own system, but at some point down the line you interact with something that cannot or does not want to de-duplicate based on UUID, such as a different system, or a person. Then you have to choose between at-least-once or at-most-once.


This article is incorrect.

> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.

The author is wildly overestimating the complexity of the solution by ignoring the existence of a very simple but critical concept: UUIDs

If the client generates and assigns a UUID to each message it sends, the receiver can easily check if a specific message was already received before by comparing it against previously received UUIDs and can discard duplicates.

The title of the article is misleading because exactly-once delivery is in fact possible by deduplication. The deduplication can occur automatically (without having to know any context about the message beyond its UUID); it can occur BEFORE the delivery of the message to the consumer... So the title and entire premise of the article are incorrect.

What the author should really be saying is that more than one copy of the same message may occasionally pass through the network due to failures... But they are misleading in suggesting that it's not possible to deliver exactly one copy or that deduplication is a herculean task when in fact it's trivial.
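
For concreteness, a minimal sketch of that UUID check, with sqlite standing in for the receiver's store (note that if the handler touches the outside world, the crash windows discussed elsewhere in the thread reappear):

    import sqlite3, uuid

    db = sqlite3.connect("inbox.db")
    db.execute("CREATE TABLE IF NOT EXISTS delivered (msg_uuid TEXT PRIMARY KEY)")

    def deliver(msg_uuid, payload, handler):
        try:
            with db:  # recording the UUID and handling the message commit together
                db.execute("INSERT INTO delivered VALUES (?)", (msg_uuid,))
                handler(payload)
        except sqlite3.IntegrityError:
            pass  # UUID already seen: discard the duplicate

    # sender side: generate the UUID once, then reuse it on every retry
    msg_id = str(uuid.uuid4())
    deliver(msg_id, b"hello", lambda p: print("got", p))
    deliver(msg_id, b"hello", lambda p: print("got", p))  # dropped as a duplicate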


> the receiver can easily check if a specific message was already received

It seems like you're conflating exactly-once delivery with exactly-once processing. Your proposed solution is only necessary _because_ we can't have exactly-once delivery as TFA states.


I don't agree with that contrived definition of delivery. As a developer, what I care about is delivery of the message to my function/handler. E.g. so that I can store it in my local database/store on the receiver side exactly once and ultimately show it to the user exactly once. Definitely possible.


>> we remove the need for idempotency through deduplication.

> If the client generates and assigns a UUID to each message it sends, the receiver can easily check if a specific message was already received before by comparing it against previously received UUIDs and can discard duplicates.

You quoted the part of the article that suggests exactly your solution: deduplication.


Yes, but the author suggests that it's difficult to perform or requires constructing messages in a complicated way, which is not the case. Generating and attaching a UUID to a message is very simple, and so is checking whether or not a UUID has already been encountered on the receiver side.

The author's definition of 'message delivery' is incorrect, and so is the title, which is clickbait.


Perhaps adding a UUID is too much overhead? If you have a very high throughput system with messages of less than 128 bits, then the addition of a UUID to every message would effectively cut throughput in half.

I do not have such a use case in mind, but the description above does not sound terribly unrealistic to me.


Also, in order to track UUIDs that have been received, you would either need an infinite amount of memory, or have the system bounded in terms of how far out of order a message can be delivered and still be processed correctly.


You can use a database. Practically, most systems will keep a record of each message on disk in any case (at least they will have some logs) so it's not that outrageous. Safe to assume that the receiver can discard non-authenticated messages to avoid spam.


How outrageous it is depends entirely on the volume of messages vs. the available memory (RAM or disk).


You’re completely missing the point. Many multi master databases exist. Network partitions happen. Bugs in multi master databases during network partitions have happened and can happen. This is like distributed systems 101.

So given those facts, your design of using UUIDs is absolutely not foolproof if the database where you stored your UUID is not in a consistent state.

Here’s some more useful info for your perusal (https://ucare.cs.uchicago.edu/pdf/socc14-cbs.pdf)

“Data consistency means that all nodes or replicas agree on the same value of a data (or eventually agree in the context of eventual consistency). In reality, there are several cases (5%) where data consistency is violated and users get stale data or the system’s behavior becomes erratic. The root causes mainly come from logic bugs in operational protocols (43%), data races (29%) and failure handling problems (10%). Below we expand these problems.”


Your database does not have to be networked; it can be an on-disk KV store like RocksDB, which is what things like the Flink state backend use.


If all of your agents are seeing the UUID for the first time, they cannot deduplicate amongst themselves without quorum or shared state which has race conditions itself. It’s non trivial.


There's no need to. Different agents will have different copies of the same message, that's a given. If the sender ensures that a UUID is attached to a single specific message and never re-used (though they may retry to send the exact same message multiple times), then there is no risk that multiple receivers would have different versions of a message and therefore consensus is guaranteed without even having to talk to each other.

If you assume that the sender cannot be trusted not to reuse a single UUID for multiple messages (e.g. to trick the receiver), you can still work around that limitation by computing and comparing message hashes (e.g. sha256) on the receiver side... You don't even need UUIDs. Every time you receive a message, you can hash it and check if you've received a message with this hash before (storing the hash of each message as it is received). You can use the hash in place of a UUID, though in this case you probably need to add some index to each message to ensure that each hash is unique over time (since a message with the exact same payload would be counted as the same message, even if broadcast a long time apart).
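
A sketch of that hashing variant (the index disambiguates identical payloads sent at different times; the in-memory set stands in for a durable store):

    import hashlib

    seen_hashes = set()

    def receive(index, payload):
        key = hashlib.sha256(index.to_bytes(8, "big") + payload).hexdigest()
        if key in seen_hashes:
            return       # duplicate: discard before delivery to the consumer
        seen_hashes.add(key)
        print("processing", payload)   # stand-in for the real handler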


exactly once delivery is not satisfied if the same message is delivered to the recipient multiple times


It depends on your definition of 'recipient' and 'delivery'. If you assume that the receiver is a specific database instance (or shard) or a specific data store (on the receiver side), then it's totally possible to have exactly once delivery in the sense that each message sent would end up being inserted into the database/data store exactly once without duplicates.


but you can't make those assumptions, in general


Yeah, non trivial is correct.

A common Kafka approach is to partition by key, so that a given UUID will only be placed on one partition, and we're guaranteed that any further messages with that key will also be placed on that partition, so handled by the same consumer.

Then create a change-log topic that's co-partitioned by key with the input topic. And then funk around with partition assignment strategies so if consumer X is assigned partition 10 of the input topic, then it's also assigned partition 10 of the change-log topic.

Then add RocksDB as a fast state store on a persistent volume claim, as restoring state fresh from your change-log topic turns out to take about 7 minutes.

And then realise you've just reimplemented bits of Kafka Streams poorly.


But not impossible.


Depending on your ability to share state between these agents it could very well be impossible. It depends on other factors.



