This is so, so cool. Basically the holy grail for a distributed systems engineer. Like the author, I've avidly consumed every Jepsen report, but the effort of actually implementing Jepsen tests for my systems always seemed too high.
Very excited to see this technology democratized and made available to more companies!
This is quickly becoming my favorite technical blog. Congrats Richie and Ryan. I didn't fully understand Antithesis the first time I ran into it; now it makes sense.
Question from another field that does a lot of simulation - why is deterministic simulation testing, rather than something stochastic, asserted to be the gold standard?
Concurrent/distributed system bugs can be really finicky because they may depend on subtle timing conditions to manifest. So you might see a bug once, then try to re-run the test using the "same" inputs, and the bug doesn't appear a second time. This might be because e.g. threads aren't scheduled the same way as before, so some 1-microsecond-wide window of vulnerability for a race condition was missed. If you can't reliably reproduce the bug, it's much harder to study and fix.
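To make that concrete, here's a toy sketch of my own (nothing to do with Antithesis's internals): even a two-thread counter can behave this way. Whether the lost update manifests depends entirely on where the scheduler happens to preempt, so running the same program twice can give different answers.

    import threading

    counter = 0

    def increment():
        global counter
        for _ in range(100_000):
            # read-modify-write is not atomic: another thread can be
            # scheduled between the read and the write, losing an update,
            # but only if preemption lands in that tiny window
            counter += 1

    t1 = threading.Thread(target=increment)
    t2 = threading.Thread(target=increment)
    t1.start(); t2.start()
    t1.join(); t2.join()

    # often prints 200000, sometimes less, depending on the interpreter
    # and scheduler -- and rerunning may never show the loss again
    print(counter)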
Determinism lets you perfectly reproduce the bug as many times as you want. Perfectly as in: exactly the same thread+process scheduling, exact same memory and disk access times, exact same network packet transit times and orderings... exact same everything. Then once you've gotten back to the bug, you can rewind time to do things like explore counterfactual scenarios by varying the random seed from that moment on.
We do have randomness of course, otherwise it wouldn't be a very good fuzzer. But we save all the seeds, so it's a controlled, reproducible randomness.
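A minimal sketch of that idea (my own illustration in Python, not Antithesis's actual API): funnel every random choice through a recorded seed, and any failure becomes trivially replayable.

    import random

    def run_test(seed):
        rng = random.Random(seed)    # every random choice flows from the seed
        # ... drive the system under test via rng.choice / rng.random ...
        return rng.random() < 0.001  # stand-in for "bug found"

    # fuzz with many fresh seeds, recording each one we try
    for seed in range(10_000):
        if run_test(seed):
            print("bug found with seed", seed)
            # replay is now trivial: the same seed takes the
            # exact same branches every time
            assert run_test(seed)
            break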
From yet another field where deterministic simulation is often a goal (robotics), the ideal is a simulation test system that is deterministic for a given initialization (e.g. a random seed) so that for an initialization that causes some error to occur, you can reliably reproduce and resolve the error. Of course, you then need to run that system with a range of initializations to have confidence that you didn't just get lucky with the initialization.
In practice, this can be quite hard to do in the presence of uncontrolled non-determinism (e.g. thread/process/GPU scheduling)* and it is often more pragmatic to invest the time in better stochastic testing and logging than in deterministic reproduction.
* Yes, these can be made closer to deterministic. But doing so often comes with reduced performance, such that the system you are testing would no longer match the system being deployed, defeating much of the purpose of the test in the first place.
> Antithesis has created the holy grail for testing distributed systems: a bespoke hypervisor that deterministically simulates an entire set of Docker containers and injects faults, created by the same people who made FoundationDB.
I remember the Antithesis founder was having a hard time explaining what exactly they did.
One of the cool tricks we can use is that since the testing is all fully deterministic, once we find an interesting point in a test run - even if it is “deep” into the run time-wise - our system can start many new branches of test runs from that moment, or from moments just prior. So it is much more efficient than having to re-do the work to get to that rare interesting moment for each new branch.
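Roughly the idea, as a toy sketch (my own model in Python; Antithesis does this at the hypervisor level with VM snapshots, not in-process state copies): pay for the long prefix once, snapshot, then fan out branches with fresh seeds from that moment.

    import copy
    import random

    def step(state, rng):
        # one deterministic step; all nondeterminism is funneled through rng
        state["x"] += rng.choice([-1, 1])

    def run_until_interesting(seed, max_steps=1_000_000):
        state, rng = {"x": 0}, random.Random(seed)
        for _ in range(max_steps):
            step(state, rng)
            if abs(state["x"]) > 50:   # some rare, "interesting" condition
                return state           # reached deep into the run
        return None

    # reach the rare moment exactly once...
    snapshot = run_until_interesting(seed=42)
    assert snapshot is not None

    # ...then branch many runs from that moment, each with fresh
    # randomness from there on, never re-running the long prefix
    for branch_seed in range(100):
        state, rng = copy.deepcopy(snapshot), random.Random(branch_seed)
        for _ in range(1_000):
            step(state, rng)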
This article and previous Antithesis ones mention testing distributed systems and, as someone who works at a company specialized in exactly this, I am excited. However, I wonder if Antithesis could help with the nondeterministic failures I encounter in the unit and integration tests of my Jasmine and TestCafe suites. Most of the time, these are quite hard to reproduce - if at all possible - and a significant portion of failures is caused by genuine application bugs. I wish there was a tool that helped with these.
I think I just followed the official recommendations I found (which are probably stale now). I'll update it to r5, but it doesn't really matter. The price difference between the two is like 5%, but hardware only ends up representing a tiny fraction of Kafka's cost at scale (the real cost comes from EBS and inter-zone networking).
I could make the hardware free for Kafka in the comparison, and WarpStream would still come out significantly more cost effective. Cloud networking is really expensive.
https://news.ycombinator.com/item?id=39356920