There's indeed a connection between record/replay and deterministic execution, but there's a difference worth mentioning, too. Both can tell you about the past, but only deterministic execution can tell you about alternate histories. And that's very valuable both for bug search (fuzzing works better) and for debugging (see for example the graphs where we show when a bug became likely to occur, seconds before it actually occurred).
(Also, you won't be able to usefully record a hypervisor with jockey or rr, because those operate in userspace and the actual execution of guest code does not. You could probably record software cpu execution with qemu, but it would be slow)
I have been down this road a little bit, applying the ideas from jockey to write and ship a deterministic HFT system, so I have some understanding of the difficulties here.
We needed that for fault tolerance, so we could have a hot synced standby. We did have to record all inputs (and outputs for sanity checking) though.
We did also get a good taste of the debugging superpowers you mention in your blog article. We could pull down a trace from a days trading and replay on our own machines, and skip back and forth in time and find the root cause of anything.
It sounds like what you have done is something similar, but with your own (AMD64) virtual machine implementation, making it fully deterministic and replayable, and providing useful and custom hardware impls (networking, clock, etc).
That sounds like a lot of hard but also fun work.
I am missing something though, in that you are not using it just for lockstep sync or deterministic replays, but you are using it for fuzzing. That is, you are altering the replay somehow to find crashes or assertion failures.
Ah, I think perhaps you are running a large number of sims with a different seed (for injecting faults or whatnot) for your VM, and then just recording that seed when something fails.
I assume deterministic execution also lets you do failing test case reduction.
I've found this sort of high volume random testing w. test case reduction is just a game changer for compiler testing, where there's much the same effect at quickly flushing out newly introduced bugs.
https://www.cs.purdue.edu/homes/xyzhang/spring07/Papers/HPL-...