
I agree with the other commenter. Eventual consistency has always been roughly a synonym for "tactical lack of consistency." The reason this works is that inconsistency is, in many business domains, not as big a deal as we make it out to be. Most businesses are used to data lagging behind, documents being filed incorrectly, and decisions being changed while half of the documents still refer to the old decision, to mention just a few possibilities. As long as everything is dated and there are corroborating versions of all facts, experts can untangle this in the few cases where it really matters. Most of the time, it doesn't matter that much.

Eventual consistency extends this same tolerance of inconsistency to computer systems, on the basis that maintaining actual consistency would be too expensive/complex/slow, which is frequently the case.

This can, of course, in principle lead to ever-degrading consistency, and since you can't assume everything is consistent, you also can't really verify consistency other than heuristically, as another commenter suggested.

Eventual consistency is a design driven by practical needs. It is never a path to reach complete data purity.

And this applies to streaming and batch tasks alike.
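To make "tactical lack of consistency" concrete, here is a minimal Python sketch of one common eventually consistent scheme, a last-writer-wins (LWW) register, assuming every write carries a comparable timestamp (the "everything is dated" condition above). The `Replica` class and its methods are hypothetical, for illustration only. Two replicas disagree until they exchange state, then converge; note that convergence says nothing about whether the surviving value is the right one.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical key-value replica that merges writes last-writer-wins."""
    name: str
    # key -> (timestamp, writer name, value); timestamps are assumed comparable
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self._apply(key, (ts, self.name, value))

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        current = self.data.get(key)
        # Keep the entry with the larger (timestamp, writer) pair; the writer
        # name breaks ties so every replica picks the same winner.
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def merge_from(self, other):
        # Anti-entropy step: pull in every entry the other replica knows about.
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.write("decision", "approve", ts=1)
b.write("decision", "reject", ts=2)
print(a.read("decision"), b.read("decision"))  # divergent until they sync
a.merge_from(b)
b.merge_from(a)
print(a.read("decision"), b.read("decision"))  # converged after exchanging state
```

The tie-break on the writer name is what makes merges order-independent; without it, two writes with the same timestamp could leave the replicas disagreeing forever.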




> the basis that maintaining actual consistency would be too expensive/complex/slow, which is frequently the case.

Maintaining actual consistency is seldom more complex; the opposite is true. Eventual consistency can lead to mind-boggling complexity, because it becomes very hard to reason about your guarantees, even the "eventual correctness" guarantee. In practice that guarantee is more often than not a hand-wavy "yeah, it's probably correct in many cases, and if you find something wrong, we'll treat it as a bug and fix it. Or at least claim to fix it, because, you know, it might be hard to reproduce." Good enough for use cases like advertising, I guess.

Too expensive/slow is the typical reason for eventual consistency, but the whole point of materialize.io is to challenge this "too expensive/slow" assumption.


> but the whole point of materialize.io is to challenge this "too expensive/slow" assumption.

How exactly is it challenging it? Spanner is too expensive.


By moving up the stack a bit (managing computation rather than storage), we can provide consistency using techniques other than just the guarantees provided by the storage itself. This isn't a new observation (Dryad/DryadLINQ made it with respect to MapReduce/Hadoop, among other examples, I'm sure), but that is where the trade-off lies.

If you instead implement low-latency systems where each step along a dataflow involves a round-trip through replicated, highly available storage (Spanner, if you like, or even just Kafka), then you might very reasonably conclude that eventual consistency is the right call. This is roughly the situation that microservice implementors currently find themselves in. Personally, I don't think it is a great situation to be in.

The value proposition with something like Materialize (and there are other options) is that you can get consistency and performance if you can express your computation as something more structured than imperative code that writes to and reads from storage. In our case, the "something" is SQL.
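A toy Python sketch of the general idea, not Materialize's implementation: the computation is expressed as a query (here a hypothetical `SELECT region, COUNT(*) FROM orders GROUP BY region`), and its result is maintained by folding in (row, diff) updates grouped by logical timestamp, rather than having each step read from and write to shared storage. Because readers only observe the view between timestamps, a change such as moving an order from one region to another never appears half-applied. The table, rows, and function name are illustrative assumptions.

```python
from collections import Counter


def apply_updates(view, updates):
    """Fold (row, diff) updates into a GROUP BY region COUNT(*) view.

    Rows are hypothetical (order_id, region) tuples; diff is +1 for an
    insert and -1 for a delete, as in a differential update stream.
    """
    for (_order_id, region), diff in updates:
        view[region] += diff
        if view[region] == 0:
            del view[region]  # drop empty groups, as the SQL result would
    return view


view = Counter()
# Each batch carries all changes for one logical timestamp; the view is only
# read between batches, so a reader never sees half of a change.
batches = {
    1: [((101, "eu"), +1), ((102, "us"), +1)],
    2: [((101, "eu"), -1), ((101, "us"), +1)],  # order 101 moves eu -> us
}
for ts in sorted(batches):
    apply_updates(view, batches[ts])
    print(ts, dict(view))
```

If a reader could observe the view after the -1 but before the +1 at timestamp 2, it would count one order instead of two; applying updates a whole timestamp at a time is what rules that out.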

Hope that helps!


Hey, great work with materialize.io! I've always wanted to play with it more, but life has always gotten in the way so far :(

One question I have for you is whether it would be appropriate for processing where you need to iterate (think, e.g., connected components in a graph, where you repeatedly broadcast the component ID to neighboring nodes). Can this somehow be done with Materialize's version of SQL? You can of course do looping with timely, but how do you do that with SQL?


In SQL you would most likely be directed to use `WITH RECURSIVE`, which is something we plan to support, but don't yet.

It can be a bit gross to use `WITH RECURSIVE`, because there are often constraints on the types of queries you can express (e.g. that the recursive body must conclude with a UNION/UNION ALL with some base case). Differential dataflow doesn't have that requirement, but we'll have to sort out whether to lift it for Materialize or impose the traditional constraints. There is a Chesterton's fence moment to have first.
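For reference, here is what the traditional shape looks like in an engine that does support it. This sketch uses SQLite through Python's standard sqlite3 module (not Materialize, which per the above doesn't support `WITH RECURSIVE` yet) and a hypothetical `edges` table. It computes connected components as reachability: the recursive body ends in a UNION with a base case, and UNION's deduplication is what makes the recursion terminate. Each node's component is the smallest node ID it can reach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")  # hypothetical schema
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (4, 5), (6, 6)])

query = """
WITH RECURSIVE
  -- treat the graph as undirected
  links(a, b) AS (
    SELECT src, dst FROM edges UNION SELECT dst, src FROM edges
  ),
  -- reach(node, other): 'other' is reachable from 'node'
  reach(node, other) AS (
    SELECT a, a FROM links            -- base case: every node reaches itself
    UNION                             -- UNION (not ALL) dedups, so this terminates
    SELECT reach.node, links.b
    FROM reach JOIN links ON links.a = reach.other
  )
SELECT node, MIN(other) AS component
FROM reach GROUP BY node ORDER BY node
"""
components = conn.execute(query).fetchall()
print(components)  # list of (node, component_id) pairs
```

The label-propagation formulation from the question (repeatedly taking the minimum component ID among neighbors) would need an aggregate inside the recursive step, which SQLite's recursive CTEs reject; that is the kind of constraint mentioned above. The reachability version sidesteps it at the cost of materializing every reachable pair, which can be quadratic in component size.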

Whether it ends up being "appropriate" or not will be a great thing to determine. I anticipate eating a lot of crow when it turns out to be a lot slower than bespoke graph processors. :)

edit: Thanks, btw!


The enterprise world is rapidly approaching a data quality crisis: companies have all these data warehouses, but the final analytic artifacts end up being garbage and unusable for data science. You will be hearing a lot more about this in the 2020s.


A lot of this isn't related to data processing tools at all, but is a sort of downstream effect of today's predominant "bugs are cheap" mentality.

The fewer guarantees of correctness you have on your daily/weekly/whatever releases, the messier your downstream data is gonna be. Monday's data is partially missing due to a bug in the client; Tuesday's data is weird/nonrepresentative because a server bug caused 5% of sessions to get disconnected; Wednesday's data is good; Thursday's data is good, but it was a release day and the feature changed, so it means different stuff...


I'd argue that is a completely orthogonal problem. Businesses have extracted useful metrics out of their "eventually consistent" operations ever since operational research was invented.

That companies have collected more data than they can afford to process is a separate issue, I think.


I don't think that has as much to do with eventual consistency as with the old school system design of "the UI is a database editor, here are your plaintext fields" that still permeates a lot of businesses today.


If the term "asynchronous consistency" were adopted, I wonder if people would grok it more easily.



