I like this reply, but my instinct is to ask, would you not have found out the s...

hwayne · on June 11, 2020

I wrote this case study on using TLA+ for a business project: https://medium.com/espark-engineering-blog/formal-methods-in.... This was a system we'd already spent a lot of time working on and consistently saw a direct correlation between making changes and new production fires. TLA+ caught issues with the planned changes before we actually implemented them.

tonyarkles · on June 12, 2020

And I would like to thank you for the material you've written. You, personally, played a huge role in my choice to look into and eventually standardize on TLA+ as my "distributed systems modelling" tool as well as getting me going on it. You rock, thank you!

tonyarkles · on June 12, 2020

> "would you not have found out the same thing by doing the implementation?"

Yes, although there's a couple of important points to that:

- A lot of it wouldn't have been discovered until late into the implementation. The project itself looked straightforward, and my estimates would have been brutally off if I hadn't done the modelling exercise ahead of time.

- Some of it would likely not have been discovered until after the app shipped. Rare edge cases. To give you context, it's an Android app that talks to an Iridium (satellite) modem over BLE. I basically modelled an abstract Android Activity lifecycle with extra events for user actions, a BLE stack (both sides of the connection), the Iridium connection, and the server on the other side of it. Each of these state machines were relatively straightforward to model. The insight came from letting TLA+ find paths through the five parallel state machines that got me into a weird state. (For an easy example, we make a request that successfully makes it out the Iridium modem to the server, the server replies, the response is delivered to the modem, and there's a corrupt BLE packet that terminates the connection.)

- You're right, architecture and modelling are different things. My flow is basically: 1) high-level requirements, 2) model of the system based on the HLRs, 3) low-level requirements derived from the HLRs and the insight gained from modelling, 4) architecture derived from 1,2,3.

- The amount of thought I'm going to put into a project ahead of time depends drastically on the project. The Iridium/BLE project? On the safety spectrum of 1 (mobile game) to 10 (flight critical software for a commercial jet), that project was about a 6-7. The software didn't have the ability to kill someone, but if it failed in the wrong way, they might believe someone was coming to rescue them when no one was coming.