There are at least a couple more major improvements I want to make. One is fixing all of the small errors, and the other is adding C-syntax. They've been stalled out due to Reasons (tm), but I'm hoping to dive back in soon.
Every time I see some TLA+ related article, I read it and then spend an hour convincing myself that I don't have time to learn it right now, even though I really want to. That convincing usually comes in the form of showing that the gains are not worth the effort for the software field that I am in. Can someone change my mind on this?
It's almost certainly not worth the effort, given what you said about your work domain.
From what I've seen, TLA is finding a niche for modeling complex concurrent/distributed algorithms.
I'd recommend design by contract as a lighter-weight form of verification. And if you don't like the TLA+ syntax, you could try SPIN, a similar verification tool whose modeling language (Promela) looks a bit like C. The reference for SPIN is the book written by the tool's author.
In many fields the gains are not worth the effort if you think of validating entire specs or real codebases.
Yet TLA+ can be beneficial to validate multiple-step interactions between components e.g. a quorum protocol on a network or locking and messaging between threads. Anything that involves more than one agent and a state machine essentially.
General, non-critical software systems. My understanding about TLA+ was that it really shines in critical software systems where the result of the system crashing would be catastrophic. The effort involved in modelling the non-critical systems I work on never seems to outweigh the effort of "write a test, fix the bug, re-deploy"
It's good for temporal errors, like ordering or concurrency. Those are often the hardest bugs to track down with inspection or testing. Model-checkers such as TLA+ can find them fast.
Formal specifications themselves have a few benefits. The first is that they force you to make explicit everything that's usually a hidden assumption. From there, tools can check the consistency of your system, and you can analyze the specs or even generate code from them. Spotting hidden or ambiguous assumptions, plus detecting interface errors, were the main benefits reported in industrial projects going back decades.
However, you can get a taste of the benefits by combining Design-by-Contract with contract/property/spec-based testing and runtime checks. Design-by-Contract is a lot like using asserts, so it should be easy to learn and has more obvious value. I'll also note that tools such as Frama-C and SPARK Ada, used with code proving, have added (or are adding) the ability to turn all the specs/contracts into runtime checks. So that's already available in open-source or commercial form. And anything you can't spec, just write tests for. :)
This sort of stuff is maybe not useful for most simple systems and issues, the ones where "write test, fix bug, re-deploy" feels right.
But when you have tighter coupling and a lot more business rules, this stuff becomes useful.
For example, imagine that you are writing a billing system. Beyond just the code, there's a design that expects the passage of time and an ordering of operations (a user signs up to the system, after a bit they will get invoiced, the amount invoiced will be right, perhaps there's pro-rating).
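To make that concrete, here's a minimal TLA+ sketch of those two expectations (a toy single-user model with invented names, just to show the shape of it):

    ---- MODULE billing ----
    VARIABLES signedUp, invoiced
    vars == <<signedUp, invoiced>>

    Init == signedUp = FALSE /\ invoiced = FALSE

    SignUp  == signedUp' = TRUE /\ UNCHANGED invoiced
    Invoice == signedUp /\ invoiced' = TRUE /\ UNCHANGED signedUp

    Next == SignUp \/ Invoice
    Spec == Init /\ [][Next]_vars /\ WF_vars(Invoice)

    \* Safety: a user is never invoiced before signing up.
    NoEarlyInvoice == invoiced => signedUp

    \* Liveness: every signed-up user is eventually invoiced
    \* (this needs the fairness condition in Spec to hold).
    EventuallyInvoiced == signedUp ~> invoiced
    ====

You'd check NoEarlyInvoice as an invariant and EventuallyInvoiced as a temporal property. A real billing model would of course have many users, amounts, and pro-rating, which is where the interleavings start to bite.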
In models where there are a lot of moving parts and where you _don't_ really have locality of code, this stuff becomes more useful.
If you've ever had a bug that exposes an issue not just with the code, but with the entire system and the set of assumptions it's built on, it's possible that something like TLA+ could help you a lot with that system.
I'm obviously biased in favor of TLA+, and think it's more useful than people think. But it could also be that another FM might be more useful to you, or that they're not useful at all in your case! Feel free to send me a ping if you want to talk more on this; my email's in the OP link.
I love the concept of formal verification, but translating my algorithms into yet another DSL and hoping I get that right seems like a waste of time. I'd rather simulate the algorithm with my brain.
Good luck with this. I've heard about practical TLA+ usage: there was a hang in Python's asyncio library related to concurrency that required 24 steps to reproduce. It was found and fixed with TLA+'s help.
> But if we run the model checker, we find that’s not true: Alice could trade an item to herself and, before that process finishes running, resolve a parallel trade giving that same item to Bob. Then the first trade resolves and Alice gets the item back.
How does the model checker know that "parallel threads" are even a possibility here? It seems like it's checking what amounts to a leaky abstraction from the actual implementation of this algorithm.
In my mind a strictly logical validator would look at the proposed algorithm and say it checks out (since in math-land transactions are atomic). Is this just a feature of the model checker? Can it be turned off/tuned to the expected real-world implementations of the algorithm?
TLA+ was originally meant mainly to check for concurrency bugs, so the possibility of parallel threads is kind of in its core DNA. The name "process" might also give a hint that parallel threads are indeed possible.
I don't know how, but I expect you can indeed model actual transactions.
Depends on what you mean by "actual transaction". Every TLA+ behavior is describable as a series of steps between states, where _usually_ (not always, but _usually_) exactly one action is true in each step. In that case you'd model the transaction by determining the changes that the overall transaction makes and then declaring them all to be part of a single action.
PlusCal uses labels (e.g. "Give:") to designate atomic actions - when you translate it, each labelled section is turned into an atomic action [1]. You can start off with a very simple spec with totally atomic transactions, and gradually make it more granular by introducing more labels, as shown in Wayne's Strange Loop presentation [2]
A general rule-of-thumb in TLA+ is "default is worst-case". If you don't explicitly declare your actions are fair, they'll be unfair. If you don't explicitly encode atomicity into your spec, it's not atomic.
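For instance, a rough PlusCal sketch of the trade above (names invented, and deliberately split into two labels per process) looks something like:

    ---- MODULE trade ----
    (* --algorithm trade
    variables owner = "alice";

    process SelfTrade = "t1"
    variables ok1 = FALSE;
    begin
      Check1: ok1 := (owner = "alice");
      Give1:  if ok1 then
                owner := "alice";
              end if;
    end process;

    process BobTrade = "t2"
    variables ok2 = FALSE;
    begin
      Check2: ok2 := (owner = "alice");
      Give2:  if ok2 then
                owner := "bob";
              end if;
    end process;
    end algorithm; *)
    ====

Because the check and the give are separate labels, TLC can run Check1, Check2, Give2, Give1 in that order, and Alice ends up with the item she just gave to Bob. Collapse each process into a single label and the whole trade becomes atomic, so the bug disappears from the model.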
Much better to accidentally get a false positive than a false negative.
We used this resource to put together a model for our distributed sessions management algorithm; here's a GitHub repo with the Go implementation and the PlusCal/TLA+ model [1]. We also wrote a more in-depth article about it here [2] (with an interactive version in JS). It took a few attempts, but eventually it clicked!
Very broadly: model checkers (which most TLA+ users use) verify systems by exhaustively enumerating the possible states a system can take on, and showing that none of those states violate the system requirements.
In comparison, proof assistants take a deductive approach, requiring you to show - step by step - how the requirements are satisfied by the definition of the system. Dependently typed languages move some of that burden to the type system.
See my other comment on the matter, but this presents a choice where there is none. No tool that extracts or verifies actual code (and people have actually done both with TLA+) can operate on programs larger than a few thousand lines. So it's not a choice between fully verifying a 200KLOC program or just its 2000-line high-level spec. It's between verifying/extracting a 2000 line program or a 2000 line spec that abstracts a 200KLOC program. In other words, for programs of a certain size, verifying a high-level spec is the only real option for "deep" verification.
I've seen research tools to extract "real programs" from TLA+. But on the whole not having synthesis as your default case has two major advantages:
1. You can choose the level of modeling, so you can quickly spec out the high level and find bugs in that before moving on to the low level (if you want)
2. It doesn't tie you to an implementation language. So you can use it even if you're implementing in COBOL, JavaScript, or Brainfuck.
Of course extraction has its own benefits. Coq is a pretty cool tool :)
TLA+ uses model checking over what can be understood as a custom modal logic. The latter means that TLA basically adds a bunch of modalities (which could also be understood as (co-)monads) that allow for modeling state ("temporal" logic), nondeterminism (which also shows up in the modeling of abstraction/refinement relations), and perhaps some other features, on top of the basic logic you would find pre-defined in most theorem provers.
The use of model checking means that TLA+ is basically a glorified fuzzer: it will attempt to find a refutation of the model you define (such a model, in turn, is usually an abstraction of the system you actually care about; not something that gets verified "end-to-end" like in Agda or Coq). Any such refutation will be a genuine proof that the model fails in some way, but a failure to find one does not amount to a proof of correctness!
TLA, the logic at the heart of TLA+, is a linear temporal logic, which is, indeed, an example of a modal logic, but also the most ubiquitous and successful type of logic in program verification. I think that in computer science in general and formal methods in particular, temporal logic is more well known than all other modal logics combined (and probably more than even the name or concept of modal logic).
> which could also be understood as (co-)monads
This is only relevant if you're trying to represent temporal logic in some functional language, which TLA+ isn't.
> The use of model checking
TLA+ is a specification (and proof) language that's independent of how it's verified (or how the proofs are checked). There is a model checker for TLA+, as well as a proof assistant.
> a glorified fuzzer
Model checkers and fuzzers are not at all alike. A fuzzer runs a program many times, while a model checker doesn't run it even once, but rather checks it. To see how different checking is from running, consider that a model checker like TLC (a model checker for TLA+) can completely check, in a matter of seconds, an infinite (even uncountable) number of executions, each with an infinite number of steps.
> a failure to find one does not amount to a proof of correctness!
A failure to find a counterexample most definitely amounts to a proof of correctness. It is not a deductive proof in the logic, but a proof nonetheless (one that employs the model theory of the logic as opposed to its proof theory). That's the meaning of the term "model checker": it automatically checks whether some program is a model for some formula (i.e. a structure that satisfies the formula).
> not something that gets verified "end-to-end" like in Agda or Coq
This is misleading. It is possible to do "end-to-end" in TLA+, pretty much as it is in Coq, which is to say that it's possible only on very small programs, usually in research settings. For example, see here[1].
But the main difference between TLA+ and Coq/Agda is that it's not a research tool but a tool intended for use in industry on large, real-world systems. Now, all formal methods -- whether Coq or TLA+ -- have severe scalability limitations. Very roughly, one could verify about 2000 lines of specification -- that specification can be actual code (i.e. a ~2000 line program) or a high-level spec that abstracts 200K or even 2M lines of code. Because TLA+ is intended for use in industry, it is designed to be used on high-level specs more than code, because most programs that require verification are not small enough. But if you have a 200K-2M LOC program, it's not as if end-to-end verification is a choice. Remember that the biggest programs ever to be fully verified end-to-end were about 1/5 the size of jQuery, and took world experts years of effort -- not a feasible option for most software.
> ...Only if you're trying to represent temporal logic in some functional language
Well, since the user was specifically asking for a comparison of TLA+ with systems like Coq, Agda and Idris, I should think that the question "how would I represent the logic TLA uses, given a language like Coq/Agda/Idris i.e. a functional (dependently-typed) language?" is quite relevant! (Perhaps even more relevant than the issue of what happens to be better known among academics in the formal-methods community.)
> A failure to find a counterexample most definitely amounts to a proof of correctness. It is not a deductive proof in the logic, but a proof nonetheless
This is of course true if the model checking is exhaustive. Given that TLA+ is most often tuned "for use in industry on large, real-world systems" however, most uses of model checking are probably not going to exhaustively check the relevant state space.
(Similar to the underlying reason why the vast majority of users in industry would probably focus on the model-checking component in TLA+, not the fact that it also happens to include a proof assistant.)
> ...Now, all formal methods -- whether Coq or TLA+ -- have severe scalability limitations. Very roughly, one could verify about 2000 lines of specification...
(This is of course a very real challenge, but type checkers are far more scalable than that, and there are reasons to expect that this may generalize to other "local", "shallow" properties like simple checking of pre- and post- conditions ("design by contract"), or the sort of resource-based and region-based checking that is now being used for systems like Rust. This is where end-to-end methods might have real potential.)
> This is of course true if the model checking is exhaustive.
If it isn't exhaustive, it isn't model checking (i.e. it doesn't check that your program/spec is a model of a formula that describes some correctness property).
> most uses of model checking are probably not going to exhaustively check the relevant state space.
Ah, so what happens there is that you check a finite instance derived from your specification. I.e., instead of checking a specification for n ∈ Nat, you check a derived specification where n ∈ 0..4. But it's a proof of the correctness for the specification that is actually checked. Obviously, a model checker can't prove something it can't check (same goes for deductive proofs).
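As a tiny illustration (names invented), the spec itself can be written over an unbounded constant:

    ---- MODULE counter ----
    EXTENDS Naturals
    CONSTANT MaxVal        \* in principle this could be any natural number
    VARIABLE n

    Init == n = 0
    Next == n' = (n + 1) % (MaxVal + 1)   \* wrap around

    InBounds == n \in 0..MaxVal
    ====

while the TLC model (or .cfg file) pins it to a finite instance the checker can exhaust:

    CONSTANT MaxVal = 4
    INIT Init
    NEXT Next
    INVARIANT InBounds

What TLC proves is the correctness of that MaxVal = 4 instance, exactly as described above.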
> Perhaps even more relevant than the issue of what happens to be better known among academics in the formal-methods community.
Definitely temporal logic, then :) FP is rare in formal methods, but is well known in programming language research. There is a small intersection of the two fields (proof assistants, dependent types), but I doubt it even makes up 10% of formal methods research (most of which is around model checking and static analysis).
> but type checkers are far more scalable than that, and there are reasons to expect that this may generalize to other "local", "shallow" properties like simple checking of pre- and post- conditions ("design by contract"), or the sort of resource-based and region-based checking that is now being used for systems like Rust. This is where end-to-end methods might have real potential.
We know what kind of properties you can scale, and the method of checking them -- whether by type checking or any other form of abstract interpretation -- is not relevant (none can do a better job than the others). Those are the inductive (or, if you like programming language theory, compositional) properties. Unfortunately, most deep properties are not inductive; this means that proving them using scalable inductive techniques (type preservation is an inductive property, as are the inferences in any deductive system) will always require a long proof that exists somewhere in a huge search space.
So type checkers, model checkers and static analysis all scale for the same kinds of properties, and all fail to scale for the same ones, too.
Also, there are no end-to-end methods, as far as I know. There are just projects small enough, and people devoted enough to do incredibly tedious work.
> I think that in computer science in general and formal methods in particular, temporal logic is more well known than all other modal logics combined (and probably more than even the name or concept of modal logic).
Man it'd be super fun to write an overview essay on all the kinds of modal logic used in CS that _aren't_ temporal logic.
There are many similarities and many differences, but the differences mostly stem from their goals being very different. Coq, Agda and Idris are mostly used for research. TLA+ is designed for heavy-duty, everyday use in industry by software and hardware developers working on large, complex real-world systems.
CompCert is written in Coq, and was developed as a research project at academic institutions [1]. It's a very small program (about 1/5 the size of jQuery) that has taken world-experts years to develop.
Yes, you can definitely tell the inventor of TLA was also the first developer of LaTeX. As such, it is a language best read after typesetting.
For a good example, see Appendix B in Diego Ongaro's dissertation [1], the formal Raft specification. After a little of the Learn TLA+ book, I was able to muddle through the spec enough to implement it.
TLA+ uses an almost-standard mathematical notation, so that people with some familiarity with mathematics can read it in ~10 minutes of training. You can see examples of what it looks like in my post here (https://pron.github.io/posts/tlaplus_part2)
This website chooses to show the ASCII version that you actually type to get the "real" pretty-printed syntax.
I guess this is a meta question: do typical US undergraduate university Computer Science curricula not have Formal Methods as a class?
I'm just wondering because of the number of fairly basic questions here. I went to a middling midwestern state school 20ish years ago and that was part of the curriculum. Perhaps we have lots of non-CS-graduates here?
This is not to denigrate the discussion. I'm just surprised that this is news.
It seems anecdotally true that the mathematical foundations of computer science are being downplayed in universities in favour of giving students a more practical education in software engineering, which is an entire contentious discussion in and of itself. But I think a big factor behind your observation could be that CS is no longer the main entry point into a career in software development. In the organisation I work in, many of the younger hires do not come from CS backgrounds, and even many of the older hires come from other scientific disciplines.
If you want to get into software development, just go to any so-called "boot-camp" class for a couple of months.
Then you know how to write some web pages with the new shiny JS framework; congrats, you are a "qualified" software guy. Now go write that sh*tty code ............
Math? What math?
CS background? Algorithms? If you are not going to challenge FAANG, why bother?
If you just want to code a few pages and write some SQL queries, there is no need for that. And 80% of enterprises only need CHEAP engineers.
Sorry for the rant, just another morning with lots of idiots bothering me.
I agree with you.
Unfortunately there's numerous perverse incentives within the business sector that are promoting this.
I'm glad I won't be alive in 2070 to fly intercontinental in an aeroplane with avionics written in Node.js...
Speaking as someone who's graduating this year, I wasn't exposed to it at all in my undergrad. It might be covered in one or two upper-level courses, but I hadn't heard of it until taking a graduate course where we talked about verification.
Here at the University of Queensland formal methods are in optional classes for computing degrees. I'm in the last six months of a part-time postgrad coursework degree, during which I did both a formal specification and a concurrency validation course. For my thesis I did a formal model using a proof assistant to describe event sourcing.
Neither formal methods class was particularly full, but neither was OS architecture, advanced algorithms, compiler design... machine learning and AI were both packed to the rafters though.
Kids entering university are picking based on current trends, and formal methods are not trending.
There is huge variation between institutions and between individuals in terms of the electives they take.
My alma mater had everyone take an intro programming sequence, discrete math, and algorithms. Beyond that it was up to you. It could be all proofs, all systems programming, or a mix of the two. (I went with systems programming.)
It uncovered an oversight in an algorithm we put together for managing distributed sessions. An invariant showed that our algorithm would accept and consider valid tokens that were ahead of the 'master' counter, which theoretically shouldn't have been a possible input in practice; now that we know, we've added an additional guard. Original comment: https://news.ycombinator.com/item?id=19662123
I spent a 6-month parental leave learning it and experimenting with it. I enjoyed writing specifications so much that I thought programming was a waste of time. You should watch the video lectures; the first 2 or 3 will get you going. A first simple exercise might be to specify the least common multiple, following Lamport's GCD example.
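For instance (a sketch of one way to start, with definitions invented here rather than taken from the course), you might first write down what LCM means mathematically:

    ---- MODULE lcm ----
    EXTENDS Naturals

    \* The least common multiple, defined declaratively:
    \* the smallest k in 1..m*n that both m and n divide.
    LCM(m, n) ==
      CHOOSE k \in 1..(m * n) :
        /\ k % m = 0
        /\ k % n = 0
        /\ \A j \in 1..(k - 1) : ~(j % m = 0 /\ j % n = 0)

    ASSUME LCM(4, 6) = 12   \* TLC evaluates ASSUMEs, so this is a quick sanity check
    ====

The exercise is then to write a little algorithm that computes it step by step and have TLC check that its result always equals LCM(m, n), mirroring how Euclid's algorithm is checked against the mathematical definition of GCD in the lectures.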
Specifications aren't tied to any specific programming language - you're outlining how the algorithm should work, not the implementation.
(You can write Java code and invoke it during model-checking if you really need to, but I haven't come across any situation where it's actually necessary.)
Since the author hangs out here on HN, I'd like to ask: how long does it take to develop a working proficiency to model standard distributed systems (ex: a souped up key/value store)? It seems like he's written 2 books:
Learn TLA+
Practical TLA+
Would there be any other resources that you'd recommend to supplement after those 2 books?
Not the author, but it would take 2-3 weeks if you learn by yourself (to get real stuff done, not to be an expert), or 3 full days of hands-on training.
I would recommend Lamport's video course[1] and TLA+ hyperbook[2] (Lamport says he's stopped working on it, but it's fairly complete). Note that both Learn TLA+ and Practical TLA+ don't teach much of TLA+, but rather a language called PlusCal that compiles to TLA+. Some programmers may find it easier to start with because, unlike TLA+, PlusCal resembles a programming language (and like all programming languages, it is much more complex than TLA+, but more familiar). Hillel's book also contains some good real-life examples.
You can also find examples and various other links posted on the TLA+ subreddit[3].
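To give a feel for the difference (a toy fragment, not from either book): incrementing a counter reads like ordinary code in PlusCal, while in TLA+ you write the relation between the current and next state yourself.

    \* PlusCal (written inside a comment in a TLA+ module):
    Inc: x := x + 1;

    \* Roughly the same step as a hand-written TLA+ action:
    Inc == x' = x + 1

The real PlusCal translation also threads a program-counter variable (pc) through every action, which is part of why the generated TLA+ looks busier than what you'd write by hand.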
"Note that both Learn TLA+ and Practical TLA+ don't teach much of TLA+, but rather a language called PlusCal that compiles to TLA+. Some programmers may find it easier to start with because, unlike TLA+, PlusCal resembles a programming language"
Most evidence indicates that using familiar concepts aids learning. That makes me hypothesize that the best recommendation is the more mathematical material for folks strong in math, and the programmer-centric material for those strong in programming. That also implies that, for each of these formal topics, we need one version for each audience to draw more people in.
Seems like Hillel's material is more approachable for people with a programming background, and pron's recommended material better for people who are more math-y. I'm likely going to start with Hillel's books to develop a working understanding and then move on to Lamport's stuff once I'm ready for a rigorous deep dive.
In my workshops people usually start being able to apply it to their business problems by about day 3. It's actually pretty simple once you get the hang of it!
After those two, I'd probably recommend the original _Specifying Systems_. It can be a tough read, but it's a comprehensive dive into how TLA+ works mathematically and everything that you can do with it. You can get it for free here: https://lamport.azurewebsites.net/tla/book.html
I'd also recommend practicing by applying it to problems you've already solved in your place of work. That way you can compare it against a real system you know inside and out. I talk more about that here: https://www.hillelwayne.com/post/using-formal-methods/
When I did spend a little time trying to learn TLA+, the biggest annoyance was the two different forms in which TLA+ can be written: a mathematical symbolic form and a program-like text form.
Now, this topic is already rather abstract and difficult to comprehend, and having two equivalent notations introduced right from the beginning was, I thought, too much cognitive load given the difficulty of the subject.
I hope this material from the article is simpler for learners.
Is there a validator to verify if an implementation of an algorithm respects the TLA+ specification? Can somebody please tell me (some of) the main differences between TLA+ and JML?
> Is there a validator to verify if an implementation of an algorithm respects the TLA+ specification?
TLA+ has some tools to do this, if you write your implementation in some TLA+-specific language like PlusCal. But this would be the sort of "end-to-end" verification that TLA+ proponents would regard as only being feasible for relatively simple software.
Relatedly, JML basically adds support for "design by contract" specifications to the Java programming language - expressing preconditions, postconditions and invariants. These can be quite useful, but the focus with TLA+ - in this thread and elsewhere - seems to be rather on "deeper" properties that encompass entire systems in a more global and cross-cutting way, and are thus inherently harder to manage with simpler tools.
They share many philosophical similarities, but their design and goals are very different. Coq and Idris are mostly used for research. TLA+ is designed for heavy-duty, everyday use in industry by software and hardware developers working on large, complex real-world systems.
They're complementary, IMO: unit tests show that your implementation works as expected, while a formal specification allows you to check that your approach is right to begin with.
That being said, TLA+ specifications have uncovered really hairy concurrency bugs which would be a nightmare to unit test - for instance, Amazon found bugs in DynamoDB which required 35 steps to reproduce [1].
Unit testing (and any kind of testing, for that matter) shows that your code works for a set of inputs. Formal verification methods show that it works for all inputs. It's the holy grail, but it comes with a really high price tag to actually do it (so unless you're writing something critical or have a lot of resources, you normally just get your software to the point where it's good enough through unit/etc. testing).
Think of a process by which you start with a vague sketch of a system and end up with a concrete implementation in, say, C++. The former is the highest level and least detailed; the latter is as detailed as the specification of your system will get. A prose design document of your model might sit somewhere on this spectrum, very close to the vague sketch. A TLA+ spec sits somewhere in the middle: it's a refinement of the design doc with more details specified concretely, but not all of them (that's the actual implementation). This mid-level specification is usually enough to make highly non-trivial assertions that invariants hold (safety properties) and that liveness properties are satisfied. But keep in mind that the specification of your system is distinct from the model checking of that spec. The latter is like a test, albeit of your whole system, not just some "unit" within it.
Not really. Formal verification methods are mathematically rigorous. They can mathematically prove or disprove correctness of programs and catch bugs that would be extremely difficult for a human to find https://en.wikipedia.org/wiki/Formal_verification