Reply to Aphyr attack on Redis Sentinel (antirez.com)
171 points by mattyb on May 20, 2013 | 40 comments



It's very refreshing to see here that "attack" is not used in the way that one might expect from just the headline, meaning "a possibly unwarranted criticism that I didn't like or found unfair, or that I am taking personally".


I am endlessly impressed with how antirez responds to any critique of Redis that I've ever seen. He's always taken it as a positive and looked for the truth in the critique, rather than searching for something to be wrong so he can discredit it.

My opinion of him and the Redis project increases further every time.


Really? I hadn't seen the original posts before clicking on this one and I assumed this was some kind of security breach... I hadn't heard of Aphyr before, but just assumed it was some kind of netsec (white or black hat) group. I actually skimmed the OP's first paragraphs several times because I didn't understand what was going on.

That said, I agree that DB reliability should be approached with the same rigor as net security... but I was kind of under the impression that it already was, in that DBs are pretty serious business. Also, "attack" has the connotation of, well, an "attack": here, some of the failures happen during regular business operations, which is a different problem from a system being under "attack".

But at least the OP took the criticism graciously. When I read what the case actually was, I then worried that the OP was having a bunker mentality.


Tangentially related:

In the PostgreSQL evaluation[0], Aphyr noticed that, if the packet acknowledging a commit is dropped, the client is left not knowing whether the transaction actually went through.

Does PostgreSQL keep a record of past transactions and their success or failure? If so, is it possible to query it?

[0] http://aphyr.com/posts/282-call-me-maybe-postgres


Yes, you can recover from lost acknowledgements by asking for the transaction ID from postgres before committing--or by making up your own flake ID and writing it to a table. Given a queue with at-least-once delivery (which includes, say, durable storage on the client), you can check for the presence of that ID at a later time and re-apply the transaction to recover from network errors safely.

The transaction ID does wrap around, so there's a time limit depending on your transaction throughput. You can also ask for certain transactional properties on rows, though this won't allow you to recover in all (most?) cases.
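Not from the thread, but roughly what the second approach (a client-generated "flake"/receipt ID written in the same transaction) can look like; this is a minimal sketch using psycopg2, and the table names, column names, and connection string are invented for illustration:

```python
# Sketch of recovering from a lost commit acknowledgement by writing a
# client-chosen receipt/flake ID in the same transaction as the real write.
# Table names, column names, and the connection string are made up.
import uuid

import psycopg2


def apply_with_receipt(conn, receipt_id, amount):
    """Perform the real write and record the client-chosen receipt atomically."""
    with conn.cursor() as cur:
        cur.execute("INSERT INTO receipts (id) VALUES (%s)", (receipt_id,))
        cur.execute(
            "UPDATE accounts SET balance = balance + %s WHERE name = 'alice'",
            (amount,),
        )
    conn.commit()


def was_applied(conn, receipt_id):
    """After a dropped ack, ask the server whether the receipt (and the write) landed."""
    with conn.cursor() as cur:
        cur.execute("SELECT 1 FROM receipts WHERE id = %s", (receipt_id,))
        return cur.fetchone() is not None


# Generate (and durably store) the receipt ID before attempting the commit, so it
# survives a crash; on a network error, reconnect, check, and retry only if needed.
receipt = str(uuid.uuid4())
conn = psycopg2.connect("dbname=shop")
try:
    apply_with_receipt(conn, receipt, 10)
except psycopg2.OperationalError:
    conn = psycopg2.connect("dbname=shop")  # the old connection is likely unusable
    if not was_applied(conn, receipt):
        apply_with_receipt(conn, receipt, 10)  # safe: the receipt makes the retry idempotent
```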


Database constraints usually catch these problems in the event of re-submission, especially if the client can assign primary keys (e.g., a UUIDv4) a priori, though the same tends to hold in simpler cases too.

All in all, I am not sure anyone should find this surprising: anyone who has ever had the network stall when clicking the 'confirm' button at a web-based store is familiar with the uncertainty about whether the order was submitted or not (typically resolved by browsing the order history, waiting for an email, or never finding out).

I would guess modern e-commerce vendors would send you a UUID or moral equivalent to de-dup cart resubmissions these days...but if not, it'd be interesting to know why not.
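In the same spirit, a rough sketch of the client-assigned-primary-key dedup described above; the schema and names are assumptions for illustration, not anything a real vendor does:

```python
# Sketch of the "client assigns the primary key a priori" idea: re-submitting the
# same order just trips the primary-key constraint. Schema and names are invented.
import uuid

import psycopg2

conn = psycopg2.connect("dbname=shop")
order_id = str(uuid.uuid4())  # chosen client-side before the first submission


def submit_order(conn, order_id, cart_json):
    """Insert the order; a duplicate submission violates the PK and becomes a no-op."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, cart) VALUES (%s, %s)",
                (order_id, cart_json),
            )
        conn.commit()
    except psycopg2.IntegrityError:
        conn.rollback()  # already there: the earlier submission went through


# A timed-out 'confirm' click can be retried blindly; the constraint de-dups it.
submit_order(conn, order_id, '{"items": [42]}')
submit_order(conn, order_id, '{"items": [42]}')  # duplicate, safely ignored
```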


Correct; if your writes are idempotent, retrying is safe. I cover this in the post as well. My above comment shows that it's possible to recover consistency even for writes which are not idempotent--though depending on the semantics of your retries, there may be some locking required.


Hey! I didn't expect you to chime in right here.

Thanks for the explanation.


Yet more tangentially related: it is an instance of the Two Generals (coordinated attack) problem, which is unsolvable in general: no finite protocol guarantees consistent state in the presence of packet loss.


Yep, FLP applies here--but if a network works long enough to complete a round eventually, e3PC or similar can succeed. Pretty much all real-world networks do that. :)


Redis is one of those things I both love and love to hate.

I've had good results using Redis as a lock server, but I live in (perhaps misplaced) fear of a client hanging or crashing and leaving a lock stranded. Not that this is really Redis's problem.


Hello, you can easily build a lock that auto-releases itself after a timeout using the new (2.6.13) extended SET command (see http://redis.io/commands/set) or simply a Lua script.
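Roughly, that pattern looks like this from a client (a minimal sketch using redis-py; the key name, token handling, and 30-second timeout are illustrative assumptions, not from the post):

```python
# Minimal sketch of an auto-releasing Redis lock built on the extended SET
# (SET key value NX PX ttl) plus a token-checking Lua release. The key name,
# token handling, and 30-second TTL are illustrative.
import uuid

import redis

r = redis.Redis()

# Delete the key only if it still holds our token, so we never release a lock
# that already expired and was re-acquired by another client.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""
release_script = r.register_script(RELEASE_SCRIPT)


def acquire(lock_name, ttl_ms=30000):
    """Try to take the lock; Redis expires it after ttl_ms if we never release it."""
    token = str(uuid.uuid4())
    if r.set(lock_name, token, nx=True, px=ttl_ms):
        return token
    return None


def release(lock_name, token):
    release_script(keys=[lock_name], args=[token])


token = acquire("lock:example-job")
if token is not None:
    try:
        pass  # critical section guarded by the lock
    finally:
        release("lock:example-job", token)
```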


Since the jobs we're locking can take somewhat inconsistent amounts of time, we're actually using an implementation where a task acquires a lock with a time limit and can keep extending it as long as it still holds it, so locks do potentially auto-release.

Even given this, bad lock timing (not that likely) or a crash (more likely) could let inconsistency in.
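Presumably something along these lines: a sketch of the extend-only-if-still-held step, using the same token-checking style as above (the script and names are mine, not the poster's):

```python
# Sketch of "extend the lock only while we still hold it": the same token check
# as a safe release, but with PEXPIRE instead of DEL. Names and TTL are illustrative.
import redis

r = redis.Redis()

EXTEND_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('pexpire', KEYS[1], ARGV[2])
else
    return 0
end
"""
extend_script = r.register_script(EXTEND_SCRIPT)


def extend_lock(lock_name, token, ttl_ms=30000):
    """Push the lock's expiry out by ttl_ms, but only if it still holds our token."""
    return bool(extend_script(keys=[lock_name], args=[token, ttl_ms]))

# A running task calls this periodically as a heartbeat; if it crashes or stalls,
# the TTL runs out and the lock auto-releases, which is the window described above.
```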

Shrugs

Like I said, my problem is not really Redis's. If I can't trust everything that uses a lock not to crash 99.99% of the time I should really be looking at our jobs and not at Redis.

Even then, though, it's probably more a matter of me not trusting things than it is said things not actually being trustworthy.


We're about to open source a similar deal (redis-based "soft guarantee" mutexes) -- ours is written in Python and mostly used as a way to coordinate (very frequent) parallel task execution a la CountDownLatch, so 100% reliable exclusion in the face of failure isn't critical.

I'd be interested to hear about your implementation if you can share (email is HN username at gmail.com)


I sent you a rather quick email.


I recommend dreadlock:

https://github.com/jamwt/dreadlock

It will release the lock when the client dies (disclaimer: I wrote it).

Or you can go whole hog and use zookeeper + ephemeral nodes. More robust but quite a bit more complex.


A response to this article on Redis: http://aphyr.com/posts/283-call-me-maybe-redis


His continued use of "CP" confused me for a while, so TIL about CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem


If you have the time, this video by Basho's CTO will give you a much better understanding of the tradeoffs that are involved in distributed system design: http://www.infoq.com/presentations/Concurrency-Scale-Distrib...

A great alternative to thinking about things in terms of CAP that Justin brings up is harvest-yield, where yield is the probability of completing a request and harvest is the fraction of your data that the response actually represents. Here's the paper: http://lab.mscs.mu.edu/Dist2012/lectures/HarvestYield.pdf
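Paraphrasing the paper's two quantities as ratios (my wording, not a quote from it):

```latex
% Harvest and yield as defined in the Fox & Brewer paper linked above (paraphrased).
\[
  \text{yield}   = \frac{\text{requests completed}}{\text{requests made}},
  \qquad
  \text{harvest} = \frac{\text{data reflected in the response}}{\text{complete data}}
\]
```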



hmm, i'm not sure how it could be better worded, but since antirez already links to this, i had thought you were posting a response to antirez's comments


I'm frustrated that when the HN editors deduped the original story, they apparently deleted ALL the instances, leaving only this one. I wanted to read the discussion on the subject of Aphyr's research, not Antirez' response.

It looks bad, HN. We all know that VMWare is litigious (try looking up benchmarks sometime). But to (presumably) cave so quickly and effortlessly suggests... well, I'm not sure.

The other possibility is that Aphyr yanked them himself, probably under duress (or else there'd just be an 'update' at the bottom of the research's page.) Aphyr, is this what happened? I figure you probably can't talk freely if so, but say something.


Hello,

1) I no longer work for VMware, but for Pivotal. Redis is open source and the copyright belongs to the original people who wrote the code: me, Pieter Noordhuis, and other contributors.

2) I posted the link to the original article in the very first lines of my reply. In fact, thanks to my reply, the Aphyr research got more exposure for Redis than for any of the other data stores mentioned. I publicly said thank you to Aphyr on Twitter, and posted his blog post.

So I really don't understand your theories here.


Sorry, to clarify -- I was suggesting that it was possible that VMWare (a sponsor of Redis, correct?) leaned on someone. I didn't mean to besmirch you or redis, antirez, and I enjoyed your response.

It wouldn't be the first time a reputable news site was forced to bury a story by a litigious company. Sponsoring FOSS does not make any organization beyond doubt. Especially if they, say, have a history of suing anyone who benchmarks them.


As Antirez says, VMware were formerly a sponsor of Redis, and he now works for Pivotal (as do I), who are the current sponsor of the project. Either way, I'm highly skeptical that anyone at either company did such a thing.


I have a background in ethics and law so I've seen too much to make apologies for being suspicious. :) In fact, this sort of suspicion is a good reason NOT to establish a track record of litigating away freedom of speech (as VMWare notoriously threatens to do if someone publishes their benchmarks). But again: nothing to do with Redis, if (as you say) VMWare is no longer a sponsor.


https://www.hnsearch.com/search#request/all&q=aphyr.com

HN stories on my original posts are still there, as far as I can tell. They just never hit frontpage.


Aphyr, this is very lame: it's not common to see work like what you did, and none of your stories hit the HN front page? I don't know what to think, but I hope that at least my post will help show your awesome work to more people.


I think Aphyr's series was a little too meaty for the general HN audience (of today).

Talking about things like the FLP impossibility result, the CAP theorem, and specifying protocols with TLA+ may be a bit over the heads of many HN readers; clearly, people would rather read stories about the latest funding round, acquisition, or frontend UI framework than a substantive article on distributed systems.


It's not fair to imply that these things are over the heads of HN readers. There are plenty of smart people who just might not care about distributed systems enough to read through. Does my lack of reading medical journals speak to my ability to read and comprehend them?


Yes, they did -- I saw them do so. There were several, in fact. And then they were gone. You've been robbed?


It is extremely unlikely that any pressure was put on the HN admins by VMWare or anyone else to get stories scrubbed. It's almost as unlikely that VMWare gives a shit about stories about Redis.


It's not unlikely they got a bunch of (unjust) flags.


Based on pressure from VMWare? No, that's extraordinarily unlikely.


RethinkDB people, how does your database compare?



Same limitations as any asynchronously replicated system; if both nodes diverge during a partition, you'll probably have to drop one's writes.

http://aphyr.com/posts/287-asynchronous-replication-with-fai...


Right. By operating at the block level it's a little more portable than most of the solutions discussed, though. Worth people's consideration, IMHO.


I'm inclined to think just the opposite. It's often possible to recover divergent data structures logically. Good luck doing that on an arbitrary block store.


My impression is that most DRBD setups mark the backing volume to record which node last held 'master' (i.e., write capacity), which avoids this issue. However, to achieve this reliably it needs out-of-band STONITH (shoot-the-other-node-in-the-head), e.g., via IPMI.



