"With Storm, I distilled the realtime computation problem domain into a small set of abstractions: streams, spouts, bolts, and topologies. I devised a new algorithm for guaranteeing data processing that eliminated the need for intermediate message brokers, the part of our system that caused the most complexity and suffering."
As someone who has gone through the Storm source in very fine detail, let me tell you how he did this. He hashed each tuple that needed to be processed, then XOR'd it into a variable that started at zero.
When the piece that needed to be processed was complete, it would get XOR'd back into the variable. Once the variable hit 0, he knew everything was done! Pretty neat if you ask me.
Not quite. Every edge in the dependency tree is assigned a random 64-bit id, and when a tuple is acked it sends the XOR of all the incoming and outgoing edge ids to the acker.
Random 64-bit ids are used in the process. So the probability of accidentally completing a tuple is very, very small (1 / 2^64 for every ack).
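Here's a minimal sketch of that bookkeeping in Python (my own illustration; the names and the per-tuple "acker entry" class are made up, not Storm's actual classes):

    import os

    def random_edge_id():
        # each edge in the tuple tree gets its own random 64-bit id
        return int.from_bytes(os.urandom(8), 'big')

    class AckerEntry:
        # tracks one spout tuple; ack_val starts at 0, and every edge id is
        # XOR'd in once when the edge is created and once when it is acked,
        # so ack_val returns to 0 exactly when every edge has been acked
        def __init__(self):
            self.ack_val = 0

        def update(self, *edge_ids):
            for edge_id in edge_ids:
                self.ack_val ^= edge_id

        def complete(self):
            return self.ack_val == 0

    # spout emits one tuple along edge e1
    entry = AckerEntry()
    e1 = random_edge_id()
    entry.update(e1)                       # edge created
    # a bolt acks e1 and, in the same ack, reports two new edges e2 and e3
    e2, e3 = random_edge_id(), random_edge_id()
    entry.update(e1, e2, e3)
    assert not entry.complete()            # e2 and e3 still outstanding
    entry.update(e2)
    entry.update(e3)
    assert entry.complete()                # whole tree processed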
Counters don't work because of the asynchronous nature of Storm. For example, consider a topology that looks like this:
A -> B -> C
      \-> D
Let's say A emits 2 tuples (+2 differential), B processes those and emits 2 to C and 3 to D (acks 2, emits 5: +3 differential), C processes its 2 tuples (-2 differential), and D processes its 3 tuples (-3 differential).
Everything's asynchronous, so the acker could receive the acks in this order: A, C, B, D
The counter would then look like this: 2, 0, 3, 0
So it would think that the tuple was complete before it actually was, which means the counter algorithm doesn't work.
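To make the ordering problem concrete, here's a small sketch (again mine, not Storm code) that replays that A, C, B, D ack order with a plain counter next to the XOR of random ids:

    import os

    rand64 = lambda: int.from_bytes(os.urandom(8), 'big')

    # edges: A -> B uses a1, a2; B -> C uses c1, c2; B -> D uses d1, d2, d3
    a1, a2 = rand64(), rand64()
    c1, c2 = rand64(), rand64()
    d1, d2, d3 = rand64(), rand64(), rand64()

    # acks arrive in the order A, C, B, D
    counter_deltas = [+2, -2, +3, -3]
    xor_deltas = [
        a1 ^ a2,                                # A: creates a1, a2
        c1 ^ c2,                                # C: acks c1, c2
        a1 ^ a2 ^ c1 ^ c2 ^ d1 ^ d2 ^ d3,       # B: acks a1, a2; creates c1, c2, d1, d2, d3
        d1 ^ d2 ^ d3,                           # D: acks d1, d2, d3
    ]

    count, ack_val = 0, 0
    for delta, xor_delta in zip(counter_deltas, xor_deltas):
        count += delta
        ack_val ^= xor_delta
        print(count, hex(ack_val))
    # the counter hits 0 after the second ack (a false "complete");
    # ack_val only reaches 0 after the final ack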
"So the probability of accidentally completing a tuple is very, very small (1 / 2^64 for every ack)."
Which means, due to the birthday paradox, that you should expect your first collision around 2^32 tuples. Process a few million tuples a day (that's only about 25 per second) and you should expect your first error within your first year.
The birthday paradox doesn't apply. All that matters is the value at the time the ack is applied, so it's always 1/2^64 (because the XOR of any number of random numbers is still random).
No, the birthday paradox very much applies. The chance of success for your first ack is (2^64-1)/2^64. The chance of success for your first and second ack is the chance of success for your first ack times the chance of success for your second ack, ((2^64-1)/2^64)^2. And so on for your third ack, your fourth ack, and so on.
By the time you reach your 2^32 ack, your chance of having all successes is ((2^64-1)/(2^64))^2^32, which is < 0.5.
EDIT: My test program seems to indicate that I'm wrong, but I can't see the flaw either in it or in my reasoning. Can you explain why the birthday paradox doesn't apply here?
EDIT 2: In the birthday paradox, it would be ((2^64-1)/2^64) * ((2^64-2)/2^64) * ((2^64-3)/2^64), etc.
The birthday paradox concerns the chance of any 2 random elements in a set being the same. There's no such set of comparisons here, so there's no birthday paradox. http://en.wikipedia.org/wiki/Birthday_problem
So the chance of success after 2^32 acks is >0.999999999, not <0.5.
It takes about 2^60 acks before there's a significant chance of a mistake. That's a lot of acks, so it will take an insanely long time for a mistake to be made.
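A quick back-of-the-envelope check of that (my own arithmetic, not from the thread): the chance that n acks all avoid an accidental zero is (1 - 1/2^64)^n, which is roughly exp(-n / 2^64):

    import math

    def p_false_completion(n_acks):
        # probability that at least one of n_acks intermediate values is
        # accidentally 0, assuming each value is uniform over 2^64 possibilities
        return 1.0 - math.exp(-n_acks / 2.0**64)

    for e in (32, 40, 50, 60, 64):
        print(f"2^{e} acks -> P(mistake) ~ {p_false_completion(2**e):.2g}")
    # 2^32 acks -> ~2.3e-10   (nowhere near the 0.5 the birthday estimate gave)
    # 2^60 acks -> ~0.061     (a mistake only becomes plausible around this scale)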
The birthday paradox kicks in when you have a set of objects, and a collision between any pair is interesting. Here, only a collision with 0 is interesting.
Suppose that the sequence of values you have after each ack is A, B, C, D, E (five acks total). So long as A, B, C, and D are all nonzero, we're OK. With the birthday paradox, we'd be looking at A=B, A=C, A=D, A=E, B=C, B=D, etc. -- many more combinations.
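You can see the difference empirically with a small simulation (a sketch of mine using deliberately tiny 16-bit ids so that anything happens at all; Storm's are 64-bit):

    import random

    BITS = 16                 # tiny on purpose; Storm uses 64-bit ids
    ACKS_PER_TREE = 100
    TRIALS = 20000

    premature_zero = 0        # what actually matters: an intermediate value of 0
    pairwise_collision = 0    # what the birthday paradox counts: any two values equal

    for _ in range(TRIALS):
        val, seen = 0, set()
        hit_zero = hit_pair = False
        for _ in range(ACKS_PER_TREE):
            val ^= random.getrandbits(BITS)
            hit_zero = hit_zero or val == 0
            hit_pair = hit_pair or val in seen
            seen.add(val)
        premature_zero += hit_zero
        pairwise_collision += hit_pair

    print(premature_zero / TRIALS)       # ~ ACKS_PER_TREE / 2^BITS, about 0.0015
    print(pairwise_collision / TRIALS)   # ~ the birthday rate, tens of times larger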