
Thanks for the emacs pointer. Interesting. It screams of continuations.

I know how to snapshot a process, for instance with criu. But what I want is just a checkpoint to go back to later, not a full serialization: rather than saving and restoring a complete process, I'd take advantage of fork()'s CoW to save as little as possible in a stopped process, and then be able to come back to it. The rest of fork()'s semantics are a problem, though, with threads, sockets, and signals that are not passed down. An example of that approach is perf-fuzz [0], where they add a new syscall to make fuzzing faster.

[0] https://github.com/sslab-gatech/perf-fuzz
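
To make the fork()-as-checkpoint idea concrete, here is a minimal sketch in Python, assuming a POSIX system and a single-threaded process. The names `try_speculatively`, `state`, and `step` are made up for illustration, and this is not how perf-fuzz does it. The parent blocks right after fork() and so keeps the pre-step memory image, while the child mutates its copy-on-write pages. The pipe and pickle are only there so the example can hand a result back; in the scheme described above the child would presumably just carry on as the live process.

```python
import os
import pickle


def try_speculatively(state, step):
    """Run step(state) in a forked child; the waiting parent is the checkpoint.

    Returns (True, new_state) if the step succeeded, or (False, state) with
    the parent's untouched state if the child reported a failure.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: shares pages with the parent copy-on-write and mutates freely.
        os.close(read_fd)
        try:
            step(state)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(state, out)
            os._exit(0)
        except BaseException:
            os._exit(1)  # tell the parent to roll back

    # Parent: blocked here, still holding the pre-step memory image.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()  # drain the pipe before waiting on the child
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return True, pickle.loads(payload)
    return False, state  # rollback: nothing the child did is visible here
```

A failing step leaves the caller's state exactly as it was, because only the child's private copies of the pages were modified. The sketch does nothing about the harder parts mentioned above: other threads do not exist in the child, and sockets and signal state still need separate handling.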




And now I realize I was very wrong about what unexec does/did. Wow.


Does `rr` fit the bill?


From memory, rr has a substantial recording overhead and is made specifically for debugging, right? But yes, it is very useful for analysing a past state and understanding a chain of events.

I should clarify my use case: I would use such a feature (going back to a previous state) for a speculative execution tool. I'd execute the happy path all the time, assuming no error occurred, but if I found out later that something went wrong somewhere, I'd want to go back and restart from there knowing what went wrong, and so on, with as little perf loss as possible. Not sure my explanation makes sense.

I know about dmtcp, criu, and VM snapshots, but they all come with a big overhead (I don't want to pay too much for the checkpoint).

The closest I found was @gamozolabs' amazing work on snapshot fuzzing (pushing the limits of what's possible on x86_64 hardware, including using Intel PML, which is similar to userfaultfd but hw-accelerated...).


rr was initially designed to reproduce flaky tests, I think. They then realized that they could adapt it for reverse debugging.

The recording overhead is quite acceptable, about 50%.

rr also has a 'chaos mode' which changes the thread scheduling and makes the 'unhappy' path much easier to find.


OK, thanks for the feedback on recording overhead. I'll have to try it for the checkpoint/restore use case.


You might be interested in pernosco, written by the same people. It's rather spectacular.

https://pernos.co

The recording overhead is the same (it leverages rr) but you can explore your bug to your heart's content.



