FaunaDB 2.5.4

evanweaver · on March 5, 2019

So excited to see this; this is the culmination of three months of very hard work by both teams.

FaunaDB 2.5 passed the core linearizability tests for multi-partition transactions immediately. To my knowledge no other distributed system has done this. Zookeeper was the strongest candidate on initial testing in the early days, but it does not offer multiple partitions at all, as discussed in the FaunaDB report. And Jepsen itself was much less comprehensive at the time.

All other issues affecting correctness were fixed in the course of the analysis, and FaunaDB 2.6 is now available with the improvements.

We're happy to answer questions along with @aphyr. Our blog post is here: https://fauna.com/blog/faunadbs-official-jepsen-results

AtlasBarfed · on March 5, 2019

I greatly appreciate any distributed system that subjects itself to jepsen. It shows real commitment to honesty and a genuine desire to improve a db tech.

One thing that seems to fall by the wayside frequently is a followup. Often the problems found are declared solved after a couple point releases in a blog. Elasticsearch did this to my disappointment, and so did Cassandra.

Reading your stuff, it appears you scale with complete replicas of a database on individual nodes. Is that still true?

evanweaver · on March 5, 2019

> One thing that seems to fall by the wayside frequently is a followup.

Yeah, I agree. We worked very hard to fix all major issues during the evaluation period so that Kyle could test the fixes himself, and we are a planning a formal followup on the remaining items and some planned improvements as well, once we are ready.

freels · on March 5, 2019

No, in FaunaDB, a replica is a group of nodes, within which each node only stores a part of the dataset. One thing that's different from other systems is we make replicas a first-class operational unit to think about.

The cluster topology which was tested was a 3 replica cluster with 3 nodes each. Each node contained 1/3 of the dataset.

judofyr · on March 5, 2019

Congratulations; this looks like some seriously nice results! I haven't used FaunaDB, but it's nice seeing your name around. I still remember your personal blog (blue/grey color; with flowers, right?) fondly from way back, and I believe also "Fauna" existed back then, but it was more an umbrella project for your open source work?

Anyway, FaunaDB looks like a well-engineered product in a highly competitive space and I hope you will succeed :)

lobo_tuerto · on March 5, 2019

What programming language did you use to develop FaunaDB?

northstar702 · on March 5, 2019

Scala/Java

rubyn00bie · on March 5, 2019

How does Fauna compare to FoundationDB?

ryanworl · on March 5, 2019

FoundationDB assumes your clients are located close by in latency terms. It is not a geographically distributed database. You need to contact a machine to get a timestamp before you can read. You can build a layer with non-interactive transactions that are sent over the network from anywhere in the world, which may provide acceptable latency for some applications, but the data is only located in one region.

One could build a Calvin-style system on top of FDB to provide durability, cluster membership, and fault tolerance for data at a single site.

jatsign · on March 5, 2019

I hadn't heard of Fauna before. What's the use case?

Looks like it's not open source, and the pricing isn't very clear if I want to host it locally. The "Download" page requires you to provide your contact info first.

Why should I go through all those hoops?

jchrisa · on March 5, 2019

TLDR: FaunaDB is a distributed database that offers multi-partition ACID transactions and is optimized for cloud / multi-cloud deployment. This is a big deal because there aren't many other options with this level of data integrity at worldwide scale.

Use cases include financial services, retail, user identity, game world state, etc. Basically anything you'd put in an operational database.

In addition to the download (free for one node), you can get started with FaunaDB Serverless Cloud for free in moments. It's a fully managed version that is used in production by some major sites and many smaller apps.

jatsign · on March 5, 2019

Sorry, I'm not trying to be snarky, but that sounds like "it's good for everything" and doesn't help me determine when I should look at Fauna and when I shouldn't.

Perhaps alternately - when is Fauna NOT a good choice?

aphyr · on March 5, 2019

A few other points to consider!

FaunaDB is a unique database: its architecture offers linear scalability for transaction throughput (limited, of course, by contention on common records and predicates). That sets it aside from databases which use a single coordinator for all, or just cross-shard, transactions, like Datomic and VoltDB, respectively.

It's also intended for geographic replication, which is a scenario many databases don't try to handle with the same degree of transactional safety--lots of folks support, say, snapshot isolation or linearizability inside one DC, but between DCs all bets are off.

FaunaDB also does not depend on clocks for safety, which sets it aside from other georeplicated databases that assume clocks are well-synchronized. Theoretically, relying on clocks can make you faster, so those kinds of systems might outperform Fauna--at the potential cost of safety issues when clocks misbehave. In practice, there are lots of factors that affect performance, and it's easy to introduce extra round trips into a protocol which could theoretically be faster. There's a lot of room for optimization! I can't speak very well to performance numbers, because Jepsen isn't designed as a performance benchmark, and its workloads are intentionally pathological, with lots of contention.

One of the things you lose with FaunaDB's architecture is interactive transactions--as in VoltDB, you submit transactions all at once. That means FaunaDB can make some optimizations that interactive session transactions can't do! But it also means you have to fit your transactional logic into FaunaDB's query language. The query language is (IMO) really expressive, but if you needed to do, e.g. a complex string-munging operation inside a transaction to decide what to do, it might not be expressible in a single FaunaDB transaction; you might have to say, read data, make the decision in your application, then execute a second CaS transaction to update state.

FaunaDB's not totally available--it uses unanimous agreement over majority quorums, which means some kinds of crashes or partitions can pause progress. If you're looking for a super low latency CRDT-style database where every node can commit even if totally isolated, it's not the right fit. It is a good fit if you need snapshot isolation to strict serializability, which are strong consistency models!

justinsaccount · on March 5, 2019

> If you're looking for a super low latency CRDT-style database where every node can commit even if totally isolated, it's not the right fit.

What is actually a good for this lately?

aphyr · on March 5, 2019

Gosh, I wish I had a good answer. Riak (and Riak-DT) were the best I knew of in this regard, but Basho folded and I haven't kept track of the level of community/corporate support available. Not sure where Riak stands these days. There are a bunch of prototype databases experimenting with CRDTs and other eventually-consistent mechanisms, but I'm not sure which ones have garnered significant traction in the industry yet.

_russelldb · on March 6, 2019

It's extremely rewarding to read what you've written there, thank you!

It's true. The community is keeping Riak alive, and pushing it forward. The CRDT stuff has not been worked on for a long time, despite there being numerous tabled improvements in that area SINCE 2014. It was neglected by the "new" management that crashed Basho into the ground as being "complicated computer science nonsense". Since then I've been unable to do the CRDT work for Riak I wanted, due to lack of time, and no one willing to pay for it (despite numerous bait-and-switch offers for completing the bigsets work.)

The CRDT work in riak needs: 1. the smaller/more efficient map merging 2. some new types from Sam merging (range reg etc) 3. the bigsets work completing and integrating (this enables delta-CRDTs) 4. big maps (built on the above work)

When it has all that it would be at the place I'd envisioned for it, before everything went super bad at Basho. I work at Joyent now, so can't dedicate any time to taking the CRDT work forward, though I really wish I could. I still have a deep interest in the EC datatypes world.

IMO, at the time it was released (2014) Riak had ground breaking support for convergent datatypes, since then it has only lost ground. Am I bitter? Yes, a little.

WRT other systems with CRDT support, https://www.antidotedb.eu/ came out of the syncfree research project. Chistopher Meiklejohn's work is industrial grade, state of the art, actively developed, and highly recommended. https://github.com/lasp-lang As others have mentioned, REDIS also uses CRDTs for MDC replication, and has CRDT co-inventor Carlos Baquero on the tech board.

NickM · on March 6, 2019

Riak still seems to have a decently active community; all of the formerly commercial-only features have been open sourced, and there have been multiple community releases since the demise of Basho, with another one coming out soon. I've seen a fair amount of activity on the riak-users mailing list lately too, and there is also paid support available via a couple different companies last I checked.

ddorian43 · on March 6, 2019

Redis labs has crdt support on top of redis

dsergeyev · on March 5, 2019

Hey I am curious what does it use in place clocks for safety and quorum? Is there a paper

aphyr · on March 5, 2019

Yes, there IS a paper! Check out Calvin: Fast Distributed Transactions for Partitioned Database Systems. FaunaDB uses Calvin with some changes to allow more types of transactions, and to improve fault tolerance. :)

http://cs.yale.edu/homes/thomson/publications/calvin-sigmod1...

evanweaver · on March 5, 2019

See our blog post as well: https://fauna.com/blog/consistency-without-clocks-faunadb-tr...

evanweaver · on March 5, 2019

Well, it is a general-purpose database with a document-relational programming model. It's a good choice when correctness, high availability, and flexible data modeling matter, like the core business objects of a service or app.

Currently it supports its own native DSL similar to LINQ or an ORM, and if you want compatibility with existing query languages from other databases, we will be rolling those out over time.

It's not a good choice for analytics or time series data which is redundant or aggregated, and doesn't need the high availability or performance overhead of transactional isolation.

0xbadcafebee · on March 5, 2019

I'm not trying to be snarky either, but your question was a bit like "What's the use case for pants?" The answer is, a lot of things. We would need to know what you're doing to tell you what it's good for, what kind to choose, and why.

zzzcpan · on March 5, 2019

In general consensus based systems are not good when latency and performance actually matter and in real world WAN setups, where connectivity issues are pretty much constant. These are fundamental limitations they can't fix.

freels · on March 5, 2019

Unless you have specifics, I’m not sure this comment stands up to reality. Modern networks are pretty reliable in my experience. The state of the art in consensus reduces learning to one round trip. Calvin further eliminates all but one global RT in distributed transaction commits.

Additionally, FaunaDB gives you the tools to work within a global environment with _safe tradeoffs_. For example, reads at SI/Serializable can be fast and uncoordinated. You choose per-query.

zzzcpan · on March 5, 2019

Well, networks are not that reliable. But you don't have to believe me, there is enough of public information about real world operations. Take Aurora paper for example, where they can't even do 2/3 quorum, but do 4/6 quorum instead and that's between datacenters they completely control with connectivity they more or less control.

aphyr · on March 5, 2019

You're not wrong: networks in general aren't super reliable, and partitions are a real problem! Peter Bailis and I wrote a paper on this in ACM Queue. I've spent much of my professional life exploring the consequences of network failure on distributed systems.

That said, I don't think it's reasonable to infer that because failures occur more often than we'd like, systems based on consensus are always inappropriate for latency and performance sensitive systems. While there are minimum bounds on latency in consensus systems, there are also plenty of human-scale problems for which that bound isn't a dealbreaker. Moreover, some types of computation (e.g. updates in sequentially consistent or snapshot isolated systems) simply can't be performed without paying the same costs as consensus. Consensus can be an appropriate, and in some cases the only possible solution, to some problems.

zzzcpan · on March 5, 2019

I didn't infer they are always inappropriate, just that they are not good for latency and performance sensitive systems, i.e. they shouldn't be considered unless absolutely necessary.

northstar702 · on March 5, 2019

Would you give some more insight into your opinion? What latency/performance are you targeting -- have you tried using a system like this to arrive at your conclusion?

freels · on March 5, 2019

Just because failure can happen doesn't mean it's frequent. For the record AWS does this to protect against correlated failure. At their scale, they assume they have some percentage of local nodes down at any given time, so have designed the system to tolerate both network partitions and local failure at the same time.

zzzcpan · on March 5, 2019

They did it for stable performance, because failures are in fact frequent. Things are much worse once you go out into the public internet, where different hosting providers communicate over all kinds of networks with all kinds of issues.

evanweaver · on March 5, 2019

FaunaDB can tolerate the loss or partition of a minority of replicas (datacenters, AZs, or racks) in a cluster, so if you want to tolerate more concurrent failures, just add more replicas.

For example, a 7 replica FaunaDB cluster can lose 3 replicas and maintain availability for reads and writes; better than Aurora in your example.

twic · on March 5, 2019

Would it be fair to say that if i am considering Gemfire/Geode, i should also consider Fauna?

evanweaver · on March 5, 2019

Yes, for sure. They are both distributed object stores and share inspiration from the Linda/tuplespace lineage, as well as relational and document databases in FaunaDB's case.

FaunaDB supports function execution in the database, if the functions are written in FQL. It has a much more powerful index system and consistency model than Gemstone/Gemfire/Geode and is designed for synchronous global replication.

However, unlike Geode it is not an in-memory system, so it is not appropriate to use as a cache.

metatype · on March 11, 2019

As noted in the writeup, the FaunaDB transaction model uses a unique query expression language. Geode applies a more familiar begin + <operations> + commit / abort approach.

Geode skews towards eventual consistency when connecting geographically dispersed clusters. This means you still get super fast local transactions with batched updates to remote sites.

wmsiler · on March 5, 2019

From the post, FaunaDB initially had several issues, which they've generally resolved. Jepsen is open source, so I'm curious why a database company wouldn't run Jepsen internally, work out as many problems as they can, and then engage aphyr in order to get the official thumbs up. Given how important data integrity is, I would assume that any database company would be running Jepsen (or something equivalent) regularly in-house. If they are doing that, then how is it that aphyr finds so many previously unknown issues? And if they aren't running Jepsen in-house, why not?

evanweaver · on March 5, 2019

This is a very good question, and to a substantial degree, this is what we did. We have internal QA systems that overlap Jepsen that catch most issues. We also ran our own Jepsen tests on the core product properties last year, fixed some issues and identified others, and reported the results on our blog.

However, correctness testing is fundamentally adversarial, like security penetration testing. Building a database is not easy, and testing a database is not easy either. It is a separate skill set, as anomalies that lingered for decades in other databases reveal. The engagement with the Jepsen team is explicitly designed to explore the entire product surface area for faults, not to apply Jepsen as it currently stands. Thus, a lot of custom work ensued on both sides to make sure that the database was both properly testable, and properly tested. The result of that work is what you see in the report.

The typical Jepsen report implicates not just implementation bugs, but the entire architecture of the system itself. Jepsen usually identifies anomalies that cannot be prevented even with a perfect implementation, which didn't happen here.

Some vendors restrict their engagement with the Jepsen team to only what they have tested themselves already, although those tests are not always valid. This was not our mindset—we wanted to improve our database by taking advantage of Kyle’s expertise, not present a superficially perfect report that failed to actually exercise the potential faults of the system.

aphyr · on March 5, 2019

To follow up on this a little bit--many of my clients do their own Jepsen testing, or have analogous tests using their own testing toolkit. When they engage me, the early part of my work is reviewing their existing tests, looking for problems, and then expanding and elaborating on those tests to find new issues in the DB.

Companies are finding bugs using Jepsen internally, which is great! But when they hire me, I'm usually able to find new behaviors. Some of that is exploring the concurrency and scheduling state space, some of it is reviewing code and looking for places where tests would fail to identify anomalies, some of it is designing new workloads or failure modes, and some is reading the histories and graphs, and using intuition to guide my search. I've been at this for five years now (gosh) and built up a good deal of expertise, and coming at a system with an outsider's perspective, and a different "box of tools", helps me explore the system in a different way.

I do work with my clients to determine what they'd like to focus on, and how much time I can give, but by and large, my clients let me guide the process, and I think the Jepsen analyses I've published are reasonably independent. If there's something I think would be useful to test, and we don't have time or the client isn't interested in exploring it, I note it in the future work section of the writeup.

It's not like clients are saying "please stick ONLY to these tests, we want a positive result." One of the things I love about my job is how much the vendors I work with care about fixing bugs and doing right by their users, and I love that I get to help them with that process. :)

wmsiler · on March 5, 2019

Thanks for the response. That all makes sense. I assume the FaunaDB devs would have already tested and fixed all the scenarios they could come up with, so it's reasonable you'd want an outside party to come up with even more scenarios to examine.

asien · on March 5, 2019

Tried Fauna once with their « Cloud » versions.

I was absolutely shocked by the poor performance of the service.

In my case I prototype some simple CRUD queries with NodeJS ,within the same datacenter region.

Insert took well over a second to complete and reading a simple document with one field took also half a second.

I was also unable to make « join » between document because how complex their query language is and their support basically encouraged me not to use « join » but to use « aggregate » like mongo ... Why offer this feature if I can’t use it ?

Has it changed since then ? It seems very clear for me that Fauna is entirely focused on Enterprises customers ( after all this is where the money is ) the cloud version seem to be just a gimmick.

evanweaver · on March 5, 2019

Hmm, when did you try it? FaunaDB and the cloud service have changed a lot in the last few years, and performance is always improving.

Typical write latencies in Cloud are in the 100ms range because the data is globally replicated. Typical read latencies are the 1ms-10ms range, because global coordination is not required, discounting the additional latency from the client to the closest datacenter.

If you experienced something worse than that recently, maybe there is some other issue going on.

asien · on March 5, 2019

I will give it a second shot then.

But just to confirm , is doing « join » something that is still not recommended ? Aggregation is tedious and lead to queries that are difficult to read and maintain.

freels · on March 5, 2019

It's definitely recommended, depending on what you're trying to get out of the result. Happy to help out with query patterns on our community slack https://publicslack.com/slacks/fauna-community/invites/new

georgewfraser · on March 5, 2019

Fauna’s writeup heavily emphasizes the fact that it doesn’t rely on atomic clocks. My understanding is that both AWS and GCP use atomic clock based timekeepers since 2017, so it’s not like this is some exotic technology.

The primary advantage described in the Calvin papers is that it’s the only distributed transaction protocol that can handle high contention workloads. But Fauna never seems to bring this up. Does that mean that Fauna’s current implementation isn’t fast under contention?

freels · on March 5, 2019

It does handle contention well, we haven't emphasized that point well enough yet. Writes never contend on conflicting reads, and serialized reads never contend at all.

Accurate clocks are not enough... to really get the benefits that Spanner alone enjoys, you have to have a TrueTime equivalent service available, and it has to be rock solid. As well once your system is sensitive to clock skews in the milliseconds, you start having to care about things like the leap-second policies of your clock sources. All in all, the resiliency tradeoffs are a significant downside to relying on clock synchronization, which is why we did not pursue a transaction protocol dependent on it.

georgewfraser · on March 5, 2019

My understanding is that AWS TimeSync and the current GCP time daemon implementation are the equivalent of TrueTime, is that incorrect?

I guess what I’m saying is it seems like Fauna is using atomic clocks as FUD against Spanner and CockroachDB, when they aren’t really a problem. Based on my reading of the Calvin paper, the main advantage of Calvin style systems is higher throughout under contention. But for some reason the Fauna marketing team has chosen not to emphasize that, which makes me suspicious that maybe Fauna hasn’t yet realized that advantage in its implementation.

evanweaver · on March 5, 2019

If only this were the case! Unfortunately it is not practical to reliably maintain single-digit millisecond clock tolerances in the public cloud as an end user via NTP. The entire software/hardware/network stack has to be tightly controlled, not just the clock source itself. And atomic clocks across multiple cloud providers are not in sync with each other, either.

Thus, databases that rely on clock synchronization recommend configuring tolerance windows of 500ms and above, and cannot reliably detect if those windows have been violated. Additionally, this window affects latency for serializable local reads all the time, even if the clocks actually fine, because there is no way for the system to know.

georgewfraser · on March 5, 2019

AWS has GPS/atomic clocks in each datacenter that provide an accurate reference time. Recent linux distros use chrony instead of ntpd to synchronize with the reference time, which should introduce only microseconds of error between the reference time and the system clock.

Am I missing something? I am not an expert, I'm just not seeing where the 100s of ms of error is going to enter this system.

(edit: thanks for the great explanation aesipp!)

aseipp · on March 5, 2019

I spent some time looking at TimeSync and my primary takeaway was simply that while it was nice, there's no actual hard numbers on how accurate it really is. I suspect it is very accurate but proving this (to yourself) is going to be challenging if you want to rely on global clocks to avoid consensus without details or insight. You are essentially making a huge bet on performance considerations by trading consensus for clocks, at the expense of a far, far higher bar for correctness.

It seems very likely based on when it was rolled out that it underpins AWS tech like DynamoDB Global Tables -- so it almost certainly powers critical infrastructure. But there's no SLA or reports on what the tolerances you can expect without doing a lot of work on your own. It's more of a nice bonus rather than a "product" they offer you, in that sense, so being wary maybe isn't unwarranted.

IIRC from the original Spanner/TT paper, they had a general error window of ~10ms from the TT daemons, and I would be extremely surprised if Google hasn't pushed that even lower, now, so your job is much more cut out for you than 100s of ms of error. And yes the clocks are in the same DC at a very precise window, but bugs happen through-out the stack, your hypervisor bugs out, systems get misconfigured, whatever, your process will fuzz out, especially as you begin to tighten things. You don't have the QA/testing of Spanner or DynamoDB, basically.

None of this is insurmountable, I think, though. It's just not easy any way you cut it. Even a few people doing the work to test and experiment with this would be very valuable. (It would be even better if AWS would make it a real product with real SLAs/numbers to back it up.) It's just a lot of work no matter what.

The fact that it is limited to AWS (for now) is a bit of a shame. I do hope other cloud providers start thinking about providing precise clocks in their datacenters, as well as accompanying software to go with it.

> Recent linux distros use chrony instead of ntpd to synchronize with the reference time, which should introduce only microseconds of error between the reference time and the system clock.

To be fair not everyone uses chrony; a lot of systems still use just ntpd or timesyncd (I spent a lot of time working on fixing time-sync related issues in our Linux distro lately across all our supported daemons, so I can at least say Chrony is a very wonderful choice, accurate, and so very easy to use! I actually found out about it when looking up TimeSync)

erik_seaberg · on March 6, 2019

If you need strongly consistent data, you must write to a global table in only one region, and then clocks don't matter because replication does not create conflicts.

If you can survive lost writes, clock skew just makes a zone win more or less often. Even if the clocks were in perfect sync, you still wouldn't observe causality across regions (changes to different items can replicate out of order).

etaioinshrdlu · on March 5, 2019

This actually reminds me a lot of how Ethereum transactions are represented as code as well.

Anyone else see a parallel there?

Seems like a good idea, overall. One annoying thing that affects pretty much every database with transactions is that the effort of retrying failed transactions is pushed onto the user, by necessity.

But if your transactions are airtight chunks of code... then the DB can retry them for you and provide a simpler interface to your app code.

cakoose · on March 6, 2019

> One annoying thing that affects pretty much every database with transactions is that the effort of retrying failed transactions is pushed onto the user, by necessity. > > But if your transactions are airtight chunks of code... then the DB can retry them for you and provide a simpler interface to your app code.

Building an FaunaDB query is more difficult than just writing session-based DB code.

But if you are willing to build FaunaDB queries, it should be strictly easier to write session-based DB code as "airtight" chunks that are easy to retry.

buremba · on March 5, 2019

Looks great but why did you decide to develop your own query language instead of just using SQL? Even no-sql transactional database solutions started to adopt SQL lately and learning a new language is not really easy for the application developers.

aphyr · on March 5, 2019

Ooh, I'll bite! For one, they're different data models. FaunaDB is a document store, so records are fundamentally trees, whereas SQL is oriented around processing tuples. FaunaDB records have queryable metadata; SQL rows (I think?) don't. You can extend SQL (look at JSON support in Postgres) to deal with arrays and maps as core datatypes, but a special-purpose language can be better suited to the purpose.

Second, FaunaDB's transactional model precludes interactive transactions, whereas SQL transactions are designed for interactive use. Imagine if every transaction was a stored procedure--that's the query structure you'd be looking at. It's certainly possible to do, but stored procedures are sort of an imperative language grafted on to the relational algebra of SQL, and support isn't as standardized as SQL's core.

Third, FaunaDB is a temporal store--you can ask for the state of any query at any point in time, and even mix temporal scopes in the same query expression. SQL doesn't have a first-class temporal model.

In general, using SQL offers advantages, including user familiarity, code reuse, and easier migration from other SQL stores. None of the things FaunaDB does are impossible to express in SQL, and they have been tackled by various DBs' extensions to SQL, but the familiarity+reuse advantages aren't as applicable once you start thinking about the distinct properties of FaunaDB's data model.

freels · on March 5, 2019

We are always trying to reduce the learning curve of FQL, but we've gotten a lot of good feedback from those who do take the plunge. That being said, I'd love to provide SQL support eventually, but there were a few reasons why we don't support it as the primary query language for FaunaDB:

Our core use-case is OLTP, and we wanted to address the shortcomings of SQL in this context, such as query performance being highly unpredictable in the presence or not of indexes, whims of the optimizer, etc.

SQL is just not great as an application-level interface: Tables are not a natural fit for many data models (the classic impedance mismatch problem), programmatic composition is difficult. We want to obviate the necessity of an ORM library on top Fauna.

SQL the language is not great for writing complex transactions in, and session transactions require a lot of back and forth between the client and database. We wanted to make it easy to write queries which can encode as much business logic as possible, hence FQL's semantics are a lot closer to a regular programming language.

twic · on March 5, 2019

FaunaDB uses Calvin, a transaction protocol developed by Daniel Abadi. Their blog post explains it nicely, after a bit of a slow start:

https://fauna.com/blog/consistency-without-clocks-faunadb-tr...

But in summary:

1. A 'transaction' is a self-contained blob of code which reads input, does deterministic logic, and writes output (so not like a traditional RDBMS transaction, where the application opens a transaction and then interleaves its own logic between reads and writes)

2. When a transaction arrives, the receiving node runs it, and captures the inputs it read, and the outputs it wrote

3. The transaction, with its captured inputs and outputs, is written to a global stream of transactions - this is the only point of synchronisation between the nodes

4. Each node reads the global stream, and writes each transaction into its persistent state; to do that, it repeats all the reads that the transaction did, and checks that they match the captured input - if so, the outputs are committed, and it not, the transaction is aborted, and retried

The key idea is that because the process is deterministic, the nodes can write transactions to disk independently without drifting out of sync.

It's pretty neat. And it's exactly what Abadi wrote about a couple of months ago:

http://dbmsmusings.blogspot.com/2019/01/its-time-to-move-on-...

This is also what VoltDB does (which Abadi worked on, along with Michael Stonebraker):

As an operational store, the VoltDB “operations” in question are actually full ACID transactions, with multiple rounds of reads, writes and conditional logic. If the system is going to run transactions to completion, one after another, disk latency isn’t the only stall that must be eliminated; it is also necessary to eliminate waiting on the user mid-transaction.

That means external transaction control is out – no stopping a transaction in the middle to make a network round-trip to the client for the next action. The team made a decision to move logic server-side and use stored procedures.

https://www.voltdb.com/product/data-architecture/no-wait-des...

It's also similar to, although categorically more sophisticated than, the idea of object prevalence, which is now so old and forgotten that i can't find any really good references, but:

Clients communicate with the prevalent system by executing transactions, which are implemented by a set of transaction classes. These are examples of the Command design pattern [Gamma 1995]. Transactions are written to a journal when they are executed. If the prevalent system crashes, its state can be recovered by reading the journal and executing the transactions again. [...] Replaying the journal must always give the same result, so transactions must be deterministic. Although clients can have a high degree of concurrency, the prevalent system is single-threaded, and transactions execute to completion.

https://web.archive.org/web/20170610140344/http://hillside.n...

drej · on March 5, 2019

I recommend first reading bits of the Jepsen report, because the company blog paints quite a different picture.

> We’re excited to report that FaunaDB has passed: > Additionally, it offers the highest possible level of correctness: > In consultation with Kyle, we’ve fixed many known issues and bugs

vs.

> However, queries involving indices, temporal queries, or event streams failed to live up to claimed guarantees. We found 19 issues in FaunaDB[.]

greggh · on March 5, 2019

Your comment doesnt seem fair at all. The blog post is from after they worked hard to fix everything. The comment from Jepsen that you are quoting is from before any of that fix work was done.

You are comparing two very different situations.

drej · on March 5, 2019

I know they worked hard, I appreciate that. But I got a wholly different feeling about the state of things from those two sources.

Also, does everyone run the very latest version of all their software? What use to me is that my vendor has fixed everything in the newest release that I am not using?

Oh and yes, I'm only quoting bits of each (fully knowing you all have links to both and can read it in full), but that's to illustrate the omission from the PR piece. I know that aphyr concludes that work has been done, "By 2.6.0-rc10, Fauna had addressed almost all issues we identified; some minor work around availability and schema changes is still in progress.", but that doesn't change the fact that the blog post doesn't address their past shortcomings.

jwr · on March 6, 2019

Approach like yours causes companies not to submit to Jepsen testing and try to hide shortcomings. They did submit, they found problems (it is rare for the Jepsen suite not to find any), and they fixed them. This is a fantastic result and definitely one to be proud of.

Additionally, I am not sure if you fully appreciate the complexity of Jepsen and distributed databases in general.

As for me, I've actually been waiting for this result to recommend the use of FaunaDB in a commercial setting.

drej · on March 6, 2019

Oh I do appreciate it - I've read Martin Kleppmann's book cover to cover and then watched all of the speeches Kyle Kingsbury has given in the past three years or so. I love this area, my absolute favourite is deterministic testing of FoundationDB [0].

It's because I appreciate this work that I felt the blog post didn't do it justice. And I know Jepsen hardly ever passes (ZooKeeper, I believe, did). And I don't take FaunaDB's hard work for granted.

[0] https://www.youtube.com/watch?v=4fFDFbi3toc

samcday · on March 5, 2019

Why should they be required to "address their past shortcomings"?

Last I checked, when I visit the Windows website it doesn't say "Really stable now but we've had 1,123,432 critical security bugs in the past!". Same for any other product or open source project.

Acknowledging current limitations is an absolute must, and posting thoughtful articles that delve into a past issue and how it was addressed are a bonus. Otherwise, there's no need for self-flagellation ;)

drej · on March 5, 2019

Um, that simile does not make any sense - because we're not talking about their website, but their blog about Jepsen tests, which showed some shortcomings (and it is not derogatory, it's extremely rare that somebody passes Jepsen with flying colours).

freels · on March 5, 2019

The latest released version, 2.6.0, has most of the fixes as outlined in the post & report. Minor/ergonomic fixes are forthcoming.

If you’re running on cloud, we do the upgrade work for you.

zellyn · on March 5, 2019

First Jepsen report, huh? That was pretty complimentary, as these things go!

jwr · on March 5, 2019

I know this is slightly off-topic, but I'd be very interested in Jepsen testing FoundationDB. They claim to have developed the database test-first (starting with simulations), and it would be great to be able to compare the claims to reality using an external (by now becoming an industry standard!) testing tool.

preetamjinka · on March 5, 2019

This was done a few years ago. See https://web.archive.org/web/20150325003526/http://blog.found....

anentropic · on March 6, 2019

That was pretty great. Does anyone have links to tests of FaunaDB write performance?

evanweaver · on March 6, 2019

We are working on new benchmarks.

gigatexal · on March 5, 2019

If they added a standard SQL layer I’d be onboard. Interesting project though.

devDan · on March 6, 2019

They are adding a GraphQL layer in. It's in prototype mode right now, I believe, but we should see it soon.