> The canonical example of a transaction is transferring money between bank accounts: to transfer $100 from me to you, the bank runs a transaction which withdraws $100 from my account and deposits $100 in your account. By running both operations in one transaction …
I know it’s canon at this point, but please stop using this example, people. No bank ever worked this way, and often outsiders who hear about this example get the wrong idea.
But to record a transfer with a list of transactions, you need at least two transactions, and you need to write both transactions atomically in the ledger. Otherwise you end up with the same problem.
You only need a single transaction. The balance of each account is derived from the list of transactions that happened, with checkpoints (fancy caching) taken once in a while.
Or, are you old enough to have learned to reconcile a checkbook? Where you keep track of all the checks you write and what your balance should be, and then mark them as confirmed when they show up on your (monthly, because paper via USPS) bank statements?
This is a checksum-like trick for manual record keeping, and isn't specifically relevant to the grandparent. It isn't required in the digital world. The grandparent talks about having to write 2 separate transactions in 2 accounts in such a way that either both are successfully written, or none are written. This means that there must be transactionality between 2 services, regardless of whether the balance is cached or not.
The point is that there is no such thing as "2 separate transactions in 2 accounts" in banking. There is one global ledger that contains transaction between all accounts - if you move money from account A to account B, you append one entry to the ledger that says that some sum of money moved from A to B. Then if you need to calculate an account balance, you look at all transactions in the ledger, filter by the account ID, and calculate the total. (You can then cache this result, for performance benefits, but this changes nothing about the way it works.)
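Roughly, as a toy sketch (Python, everything illustrative; real ledgers obviously carry far more metadata):

```python
from collections import namedtuple

# One append-only global ledger; each entry moves money between two accounts.
Entry = namedtuple("Entry", ["src", "dst", "amount"])
ledger = []

def transfer(src, dst, amount):
    # A transfer is a single append; there is nothing to keep atomic
    # across two separate per-account records.
    ledger.append(Entry(src, dst, amount))

def balance(account):
    # Derive the balance by scanning the ledger. A real system would
    # checkpoint (cache) this once in a while, as mentioned above.
    credits = sum(e.amount for e in ledger if e.dst == account)
    debits = sum(e.amount for e in ledger if e.src == account)
    return credits - debits

transfer("A", "B", 100)
print(balance("A"), balance("B"))  # -100 100
```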
This is really more like a CRDT than like transaction control.
It's also where Bitcoin took its inspiration from, the blockchain is just one big ledger with some hashing on top.
This is a good explanation for a Bitcoin-type ledger, but as far as I understand, banks don't share a global ledger. In the US they run settlements through the Fed, which is the closest thing to a global ledger we've got, but internally they keep their own separate ledgers. The Fed sees settlements between banks, but allocation within a bank is done via internal transactions. If you wanted to give banks a transfer rail that lets them communicate directly, incurring liability, and later settle via the Fed, you would need to write a transaction in each bank's ledger.
The truth is, this does happen, but you want it to happen less. Banks accept a certain amount of risk due to logistical limitations, but always seek ways to reduce it.
Tradeoff between availability and consistency. Not erroring when accepting a transaction probably makes the bank good money, whereas showing correct numbers all the time, or handling transactions larger than the entire account balance, probably only ever helps poor people like me who anxiously check incoming transactions at the ATM, and we don't make banks lots of money. So a queue of transactions to record doesn't sound like a problem, but idk, I don't work at a bank.
Accounts go over their overdraft limits all the time. Arranged overdrafts are simply the limit the bank promises to honour without punitive fees, not an absolute limit.
Honestly, I have seen a purely savings/debit account go negative by a few dollars enough times that what you describe is probably not a show-stopper :) Only if the discrepancy is large enough does the transaction probably get flagged and a human step in.
And the events include things like "refused because the destination account has ceased to exist", "refused because the currency is invalid", "refused because the amount has too many decimal digits", "refused because we are boycotting you", and so on. Anything can happen.
I will keep using this example. It's a good example for illustrating why transactions are necessary and what they are. And it's not relevant whether or not banks actually work this way.
Thank you for submitting this and thank you to the authors.
I am curious how this kind of thing can scale. What this article inspires in me is the idea of applying multiversion semantics to datastores that aren't multiversioned.
Sounds like if you have a transactionally sound store of "versions" and a way to query every datastore with a version field, you could implement point-in-time consistency, because the current version would only change when all datastores are updated. You still have to wait for all datastores to be updated, so that part is similar to two-phase commit.
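A toy sketch of what I mean (Python; the "store of versions" is just a counter here, and all names are made up):

```python
# Hypothetical sketch: two non-transactional datastores whose rows are
# tagged with a version, plus one transactionally sound version counter.
committed_version = 0          # the authoritative "store of versions"
store_a, store_b = [], []      # each row is (version, payload)

def write_everywhere(payload):
    global committed_version
    new_version = committed_version + 1
    # Phase 1 (like 2PC's prepare): write to every datastore, tagged with
    # the not-yet-visible new version.
    store_a.append((new_version, payload))
    store_b.append((new_version, payload))
    # Phase 2 (like 2PC's commit): advance the counter. Readers flip to the
    # new snapshot atomically, because they filter on committed_version.
    committed_version = new_version

def read(store):
    # Point-in-time consistent read: ignore rows beyond the committed version.
    return [p for v, p in store if v <= committed_version]
```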
My understanding is that two-phase commit does not scale and is slow because of all the round trips, but we still want linearizability and strong consistency in systems.
I tried to implement an asynchronously replicating data protocol, but it is eventually consistent, not linearizable (it fails the Jepsen linearizability test).
I'm a beginner in this area, but I have a toy multiversion concurrency control implementation that uses Java threads (see MVCC.java and TransactionC.java); the key code is in commit() and shouldRestart().
While we haven't tested this, I suspect the main scaling bottleneck in practice is the primary database, as every Epoxy transaction goes through it. If you wanted to run Epoxy at large scale, you'd need a primary that supports very high transaction throughput.
More seriously, I suspect that plain SQLite by itself, running in-process in the main service, writing to a ':memory:' database for ultimate speed or perhaps to a local file with WAL mode for slightly more durability, could achieve pretty nice efficiency and work as the main transaction manager database.
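Something like this, for the two configurations (standard sqlite3, nothing Epoxy-specific; the schema is made up):

```python
import sqlite3

# Ultimate speed, zero durability: the database lives in process memory.
mem = sqlite3.connect(":memory:")

# Slightly more durable: a local file with write-ahead logging.
disk = sqlite3.connect("txn_manager.db")
disk.execute("PRAGMA journal_mode=WAL")
disk.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL; trades a bit of durability for speed

for conn in (mem, disk):
    conn.execute("CREATE TABLE IF NOT EXISTS txns (id INTEGER PRIMARY KEY, state TEXT)")
    conn.commit()
```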
I'm a bit confused why OP says that because two-phase commits are not natively supported by various databases, it's not a viable strategy. It's possible to implement 'two-phase commits' manually through a simple algorithm. You just need an extra field to keep track of whether the record is 'pending' or 'settled', plus some additional fields for creation date and settlement date. The main process copies over all the records in the 'pending' state, then you have another parallel process/task which goes over all the pending records in order and changes them to 'settled' if some condition (e.g. some related records in different tables/collections also exist) is met. Records need to be settled in a particular order to protect against failure of the settlement process itself, but it's not particularly difficult to implement. Also, you need good indexing for fast/efficient cross-table/collection lookups (you especially need indexing on date fields in order to quickly look up and settle records in the same order as they were inserted, starting from the last settled record).
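A sketch of the shape of it (Python with SQLite standing in for whichever datastore you're using; all names illustrative):

```python
import sqlite3, time, uuid

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE records (
    id         TEXT PRIMARY KEY,
    state      TEXT NOT NULL,   -- 'pending' or 'settled'
    created_at REAL NOT NULL,
    settled_at REAL
)""")
# Index on (state, created_at) so the settlement pass can quickly walk
# pending records in insertion order.
db.execute("CREATE INDEX idx_state_created ON records (state, created_at)")

def insert_pending():
    db.execute("INSERT INTO records VALUES (?, 'pending', ?, NULL)",
               (str(uuid.uuid4()), time.time()))

def settle_pass(condition_met):
    # Walk pending records in order; stop at the first one whose settlement
    # condition isn't met yet, so settlement order is preserved.
    pending = db.execute(
        "SELECT id FROM records WHERE state = 'pending' ORDER BY created_at"
    ).fetchall()
    for (rid,) in pending:
        if not condition_met(rid):
            break
        db.execute("UPDATE records SET state = 'settled', settled_at = ? WHERE id = ?",
                   (time.time(), rid))
    db.commit()
```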
I did something like this a while ago; some operations in my system inserted multiple records into different collections (in a database which did not support atomic transactions) and I needed to guarantee that when a record existed in a particular collection in the 'settled' state, a matching record existed in a different collection. It was very reliable. I believe I didn't even end up needing to launch a separate process for the settlement logic since I was using Node.js; I just used an async task scheduled to run on an interval. The front end would just ignore records until they were in the settled state (but you could also show them as pending).
Another thing which really helps is using UUIDs as IDs, as it guarantees idempotence, so you don't have to worry about double-insertion in case of process failure/restart. You can just re-process some records that you may have already processed with no side effects; you just need to check that the error is an insertion conflict and continue.
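For example (sketch; sqlite3's IntegrityError standing in for whatever key-conflict error your database raises):

```python
import sqlite3, uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, body TEXT)")

def idempotent_insert(record_id, body):
    try:
        db.execute("INSERT INTO records VALUES (?, ?)", (record_id, body))
    except sqlite3.IntegrityError:
        pass  # already inserted on an earlier attempt; safe to continue

rid = str(uuid.uuid4())          # generated client-side, reused on retry
idempotent_insert(rid, "hello")
idempotent_insert(rid, "hello")  # replay after a crash/restart: no side effects
```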
If a transaction A adds 10 records, and they're currently being settled, with 5 of them being pending and 5 of them being settled, how do you ensure that a concurrent transaction B either sees all of them as settled, or none of them?
That's why the settlement order matters. If the records are settled in a specific order, you only need to check the last one (the last leg of the settlement); if that one is settled then you know all the others are settled.
For example, if you settle based on creation timestamps then you know that if a record is settled, then all records with a smaller timestamp within the same collection/table are settled too. You can also implement a similar guarantee if dealing with multiple collections/tables.
For example, if you create a 'Company' record along with a 'User' record which points to its 'companyId', and you want them to be settled atomically as a pair, then your settlement logic could always mark the company as settled first (waiting for a success response from that DB query) before marking the user as settled. In this case, you only need to check that the user is settled to know that the associated company is also settled, since your settlement logic guarantees that the user record can't be settled before its associated company.
In this case, you can think of settlement as having two legs; the first leg is the company record, the second leg is the user record. If the second leg of the settlement fails, the user will stay in a pending state which may cause your settlement script to reprocess (or re-check) the first leg (the related company record) since, if the last leg of the settlement failed, it will treat it as if the entire transaction failed and that's fine.
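In code, the two legs might look something like this (sketch; assumes 'companies' and 'users' tables that each have id and state columns):

```python
def is_settled(db, table, row_id):
    row = db.execute(f"SELECT state FROM {table} WHERE id = ?", (row_id,)).fetchone()
    return row is not None and row[0] == "settled"

def settle_pair(db, company_id, user_id):
    # Leg 1: the company record. Wait for this write to succeed before
    # touching the user record.
    db.execute("UPDATE companies SET state = 'settled' WHERE id = ?", (company_id,))
    db.commit()
    if not is_settled(db, "companies", company_id):
        return False  # leg 1 failed; the pair stays pending and is retried later

    # Leg 2: only now settle the user. If we crash right here, the user stays
    # pending; the next run re-checks leg 1 (a no-op) and retries this leg,
    # so "user is settled" always implies "its company is settled".
    db.execute("UPDATE users SET state = 'settled' WHERE id = ?", (user_id,))
    db.commit()
    return True
```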
How did you handle cleaning up aborted / rolled back transactions after binaries restarted? Or did you just let the pending records hang around forever?
When a node rejoins after a restart, have all other nodes drop any pending transactions initiated by the node in question. I implemented something similar for a distributed filesystem and it worked pretty well.
I get what you're saying. This was not an issue. If one or more of the records in the 'transaction' failed to be inserted, then none of the other records inserted as part of this transaction would be able to be converted into the settled state. As part of my settlement script, there was logic which looked at creation timestamps and would change the state to 'failed' if they had been pending for too long (which was only a few minutes). My script settled records created by different accounts in parallel (though for each account, sequentially based on timestamp) so as not to allow a user to intentionally make incomplete transactions in order to hold up settlement and prevent the script from settling other users' records. A user could only delay settlement of their own records.
Because I used UUID as IDs and they were created on the front end, the user in my app could potentially re-submit the form (click submit again after seeing the 'Unexpected connection error please try again...' message); this represented a second chance to complete the transaction. Those records which were already successfully inserted into the db the last time would be ignored the second time (due to ID conflict) and those which had failed the first time and not been inserted into the db would then be inserted as pending; then the settlement script could complete its job and settle on the next interval.
Does Epoxy use postgres' own transaction ids across datastores? If not, does it implement its own "transaction id horizon" like postgres has so that you can assume all transactions before a certain counter are committed?
Interesting to see Matei's name on the paper. Maybe Epoxy ends up at Databricks to enable updates across, e.g., Delta Lake & a traditional DWH simultaneously, for, let's say, migrations. Who knows :)
An interesting idea, but if I’m understanding the problem being solved, it might be better served by durable execution (two examples being Azure’s Durable Functions and Temporal.io).
In practice transactions between arbitrary data stores would result in potentially boundless and unpredictable latency, no?
Also, is Postgres strongly consistent and linearizable? One alternative would be using a database with stronger consistency guarantees (Spanner is, but it's not open source; FoundationDB is, but has limitations on transactions unless you implement MVCC yourself, which to be fair you are).
Durable execution is good for atomicity, but this approach also gives you isolation. If you're doing updates on both Postgres and Mongo and you want to guarantee both updates happen, Temporal can do that. But if you want isolation, for example the guarantee that any operation that can see the Postgres update can also see the MongoDB update, then you need distributed transactions like in Epoxy.
Consistency is a property of distributed databases. Stock Postgres is not distributed, and thus gets strong consistency for free.
There is still a concept of (transaction) isolation levels, and the ANSI SQL standard defines a transaction mode READ UNCOMMITTED that could give you inconsistent results, but Postgres ignores that and treats it as READ COMMITTED.
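You can see this from any client, e.g. (sketch with psycopg2; the connection string is a placeholder):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder DSN
cur = conn.cursor()
# Postgres accepts the standard syntax without complaint...
cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
cur.execute("SHOW transaction_isolation")
print(cur.fetchone())
# ...but per the docs the session behaves exactly like READ COMMITTED:
# you can never observe another transaction's uncommitted writes.
conn.rollback()
```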
In terms of ACID:
> Consistency ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction. Referential integrity guarantees the primary key–foreign key relationship.
So no, it's not free.
Use a TP monitor? Such things have been around for a very long time. I remember writing code to interface with one in the 90s, and a bit later. But things like CICS and Tuxedo were there long before that.
In most cases microservices are an anti-pattern, because they are trying to solve problems that can be solved without incurring their complexity and sacrifices.
However, I remember that Sagas originally were proposed to deal with multiple 3rd party APIs. The example was "book flight, hotel, rental car". If you're dealing with necessarily separate systems, then there's probably a place for sagas.
> As soon as we start doing separation of domains according to data governance, we may quickly realize that in the vast majority of cases, microservices look a lot like traditional monoliths and the domain they should handle is big. Or realize that traditional monoliths are, in fact, microservices. This happens because most organizations have only a very limited number of truly independent domains. Most often — one.
I didn't understand anything written in that post. Saga was mentioned 5 times: once in the title, once in a tag, and 3 times in the 1st paragraph. I would have benefited from an example, as everything was too abstract for me to follow.
You have four separate businesses (at least): a bank, an airline, a hotel, and a booking agency.
There's no way that some random booking agency is getting an XA transaction (or other distributed transaction protocol) implemented between the disparate systems it is orchestrating.
It has to maintain conversational state with each of those systems in the domain/model of the exposed APIs of each in order to coordinate a resolution. Even if you could do that with a distributed transaction coordinated between those disparate entities, the possible failure states of each are such that you likely wouldn't want to. For example: go ahead and book the flight and pay for it if you can, but don't roll back the whole thing if the hotel ends up needing to decline for some reason.
These business models exist today as real world systems, and they aren't implemented using distributed transactions.
This is basically what a Saga is... a process that uses custom protocols to resolve the state of multiple entities to a consistent valid state.
Sure, avoid Sagas when possible (use a DB transaction if you can merge microservices), but when you aren't setting system boundaries (aka your world isn't a single system supporting XA everywhere), the Saga is a completely valid and necessary mechanism.
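To make the shape concrete, a toy compensation-based saga (all functions are hypothetical stand-ins for the separate businesses' APIs):

```python
class BookingFailed(Exception):
    pass

# Hypothetical stand-ins for three separate businesses' APIs.
def book_flight():      return "FL-123"
def cancel_flight(ref): print("cancelling flight", ref)
def book_hotel():       raise BookingFailed("hotel declined")
def cancel_hotel(ref):  print("cancelling hotel", ref)
def book_car():         return "CAR-9"
def cancel_car(ref):    print("cancelling car", ref)

def book_trip():
    completed = []  # (compensation, booking_ref) for each successful step
    steps = [(book_flight, cancel_flight),
             (book_hotel, cancel_hotel),
             (book_car, cancel_car)]
    try:
        for book, compensate in steps:
            ref = book()  # conversational call into one business's API
            completed.append((compensate, ref))
    except BookingFailed:
        # There is no rollback across these systems; instead run each
        # completed step's compensating action. Real business rules decide
        # what actually gets undone (maybe you keep the flight anyway).
        for compensate, ref in reversed(completed):
            compensate(ref)
        raise

try:
    book_trip()  # books the flight, hotel declines, flight is compensated
except BookingFailed as e:
    print("trip failed:", e)
```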
The author already acknowledged that there are other ways to do distributed transactions
> The traditional solution is to use two-phase commit through a protocol like X/Open XA. However, while XA is supported by most big relational databases like Postgres and MySQL, it's not supported by popular newer data stores like MongoDB, Cassandra, or Elasticsearch
Cassandra has distributed transactions based on Paxos, and I think a Raft version is in the works.
I admittedly skimmed the article, but the notion of a central coordinator for the transaction isn't really a scalable solution. In the stated use case, heterogeneous stores, it's kind of what you're stuck with to some degree.
But... why not use a purely distributed central store like Cassandra or ZooKeeper (I think ZooKeeper is masterless?) rather than Postgres?
IMO this doesn't have a chance of standing up to a network partition like Aphyr would throw at it, but then again I didn't graduate from Stanford.
How heterogeneous do you mean? But basically, "yes".
IBM had CICS transactions back in the day for some definition of heterogeneity. Tuxedo did this in the 80s across broader platforms. In the 90s (when I heard of it) there was an open standard created for it: https://en.wikipedia.org/wiki/X/Open_XA so that Unix-y type systems had some agreement on how to do it without buying mainframe/tuxedo-y type components. And around or a bit after then, Microsoft created its own "Distributed Transaction Coordinator". This is not my area of expertise but I've heard about it throughout the last 30 years.
I'd understood distributed transactions to be a vendor-specific thing. I'd never have realized I could have IMS/DB, DB2, and MSSQL all in the same txn.
Candidly this is not my area of expertise, but I always thought it was multi-vendor. Lacking first-hand knowledge, a quick web search does imply that at least the Open/XA standard was supposed to be multi-vendor:
> The Open Group has defined an industry-standard model for transactional work that allows changes made against unrelated resources to be part of a single global transaction. An example of this is changes to databases that are provided by two separate vendors. This model is called the X/Open Distributed Transaction Processing model.
Is Microsoft's MSDTC more single-vendor? Even that seems to support multi-vendor databases through its support of Open/XA:
> When the DTC acts as an XA-compliant transaction manager, Oracle, IBM DB/2, Sybase, Informix, and other XA-compliant resource managers can participate in transactions that the DTC controls.
Diving into the history of which distributed transaction manager was the first to support multi-vendor database transactions is left as an exercise for the reader...
I suspect industry knowledge/awareness of this stuff waned as web-generated transactions boomed and cloud platform vendors like AWS sold "eventual consistency" to developers and through them back to management and end users (like myself!) who think "Oh, that's weird, why didn't X update? Let me just refresh my browser... oh, there it is".
I've met a lot of people (typically open source enthusiasts) who say that distributed transactions are a dumpster fire and other people (Microsoft/enterprise partisans) who think they are great.
My go-to for a critique of distributed transactions is "Life beyond Distributed Transactions: an Apostate’s Opinion"[1], which says in the abstract that "performance costs and fragility make them impractical". That was 2007, though.
Again with the straw man example of a bank transfer.
Another article you didn’t need to read, because it talks about a problem you likely don’t have, because the author can’t be bothered to come up with actual, real-world examples.
People who have a use for this technology don't need a real world example, they already know where they would apply it.
It's true that most people don't need it at all, and I cringe at the thought of a second wave of micro-service muppets stitching it all together with something like this, but there are actually real-world examples. I'm not going to offer you any though, due to your belligerence in the other comments.
A bank transfer is a good example because it is easy to understand, regardless of your background.
Fine. I've got a document database for user data, AWS Cognito for user IDs and authentication, Stripe for payments, and Postgres for a subset of user profile data that interacts with application data.
Common actions like Create, Delete, and some Updates touch all of these services. Automatic transactions between these would be fantastic. Especially on the billing side, some billing flows involve multiple trips between Stripe and our DB; batching the whole process inside a guaranteed reversible transaction would be lovely.
Of course we're not going to get that with Stripe and Cognito as they expose bespoke APIs, but the idea still holds.
Except it doesn’t, really. You don’t usually want your user sitting around waiting for all those services to talk to each other before they get a notification that the process is complete or not.
Further, those complex interactions between those various services will be much harder to debug and tweak when trying to provide support to end users. Can’t pay your bill because Cognito is down? Can’t grant access because Stripe is down? The data warehouse is doing an index build so you can’t do any transactions? Guess we should turn off the website between 2AM and 4AM ET to allow for data warehouse rebuilds.
>> You don’t usually want your user sitting around waiting for all those services to talk to each other before they get a notification that the process is complete or not.
I have had this exact use case in a problem I was solving where the user was waiting at the other end. Please do not presume that only the problems you have solved are worth solving or exist in the real world.
> if your AuthN and AuthZ is down, what else would you do?!!
Minor miscommunication… from a write standpoint. Obviously if you can’t auth, you are in trouble. But if the billing needs to happen so you can restore a permission, then you need to hit a card (for example) and then somehow update your authz service.
In most cases, it’s not strict availability that’s likely your problem (eg. Full down, no responses) it’s going to be business logic or config issues impacting a small subset of TX.
And, honest, if I’m so wrong about this, where is the XA distributed transaction coordinator for Cognito?
I’d be shocked if Amazon ever implemented a single-point-of-failure distributed transaction coordinator and exposed that from their services.
As someone who hasn't implemented transactions in microservices before, I have only seen the bank transfer example, and it seems adequate and easy to understand; I wouldn't know what limitations it has.
It doesn’t exist. This isn’t how banks transfer funds between accounts.
It’s fake. As a result, you can’t actually evaluate the engineering trade offs of the proposed solution.
Here’s the thing: do you even need transactions (in the sense of ACID transactions that are resolved heuristically, as described in this article) between your microservices? Likely not.
And what are the implications of having to wait around for PGSQL and Mongo to negotiate a transaction with a third data store? Probably blocking and latency and all the associated problems, which will lead to … not using transactions across service boundaries.
The trick is to work out what you would do if you had to implement the entire system on paper in a 50s style office with triplicate carbon paper. Then do that, but with computers.
That is not a trick, it is a false analogy. Two humans within line of sight can be assured that they have exchanged messages successfully. Two computers cannot, due to the infinite regress of acknowledgements (the Two Generals problem).
No, it’s not a false analogy. Go look at an old office building and consider whether people really are “within line of sight” - assuredly they are not.
The unreliable messaging links (internal mail) mean that resilience against missed messages and guards against not making progress must be built into the business process instead of an infrastructure layer.