Unfork: The Inverse of Fork(2)

touisteur · on May 29, 2021

That's very interesting. Userfaultfd opened up a lot of features in Linux.

When I saw 'unfork' I thought 'someone found out how to replace the current process with a child process that was an older snapshot'. Any idea on how to do this? Like 'OK I did something wrong, let me back up a bit, kill me, replace me with my child x and reparent every other one of my child processes to child x and gooo'.

remram · on May 29, 2021

Easier to do it the other way: when you fork, keep the parent as the snapshot and keep going in the child. That way you can back up by terminating and resuming in the parent. I think this is what timetravelpdb does for Python: https://github.com/TomOnTime/timetravelpdb
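A minimal sketch of that pattern in Python, assuming Linux and a single-threaded program. The names checkpoint, rollback and ROLLBACK are made up for illustration; this is not how timetravelpdb is implemented.

```python
import os
import sys

ROLLBACK = 42  # hypothetical exit status meaning "resume from the snapshot"


def checkpoint():
    """Return "first" on the first pass and "retry" after a rollback."""
    sys.stdout.flush()  # keep buffered output from being written twice
    pid = os.fork()
    if pid == 0:
        return "first"  # child: carries on with the real work
    # Parent: this frozen copy is the snapshot. Sleep until the child ends.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == ROLLBACK:
        return "retry"  # resume here with memory as it was at the fork
    # The child ran to completion (or crashed): pass its status along.
    os._exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1)


def rollback():
    """Discard the current process and wake the most recent snapshot."""
    sys.stdout.flush()
    os._exit(ROLLBACK)
```

Calling checkpoint() again in the child stacks another sleeping parent on top, so every live snapshot costs one stopped process, and rollback() only goes back one level at a time. Only memory is rewound: files, sockets and anything already sent to the outside world stay as they are.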

touisteur · on May 29, 2021

Yeah, thanks, it was my first idea, but then 1) I'd have to re-attach everything to the child process and 2) I'd have to go one step further for every snapshot, right? If everything goes well and I want to continue, next snapshot I'll have to fork the child process, and so on, no? The idea is to keep a somewhat short history (some minutes?) of snapshots.

Thanks for the timetravelpdb link!

thomasballinger · on May 29, 2021

Here's another implementation of what's described above, it forks on each invocation of readline to provide undo for interactive interpreters. https://github.com/thomasballinger/rlundo

touisteur · on May 29, 2021

OK I love this, and not just because of the calls to system(nc) :-)

Your code made me think, maybe I'm tying myself into knots... But my thing is making hundreds of thousands of checkpoints, so I'd have to have as many forks as savepoints. No way to coalesce parents, maybe reparenting could work there...

IanGabes · on May 29, 2021

You might be interested in CRIU: https://github.com/checkpoint-restore/criu

touisteur · on June 6, 2021

I'm hoping somewhere deep in Google or Cloudflare someone is implementing all this in chained eBPF scripts...

cpufry · on June 3, 2021

this is neat. thanks.

CJefferson · on May 29, 2021

I would really like that function. While fork is often a bad idea, it can be an easy way of adding parallelization and backtracking to old large serial C code bases.

snarfy · on May 29, 2021

iirc emacs implements a generic 'unexec' function which serializes the current process state so it can be reloaded later. Seems like you could use it to make snapshots.

touisteur · on May 29, 2021

Thanks for the emacs pointer. Interesting. It screams of continuations.

I know how to do snapshots of a process, what with criu. But I just want a checkpoint to go back to later, not a full serialization: I don't want to save/restore a complete process, but to take advantage of fork()'s CoW to save the least possible in a stopped process, then be able to come back. The rest of fork()'s semantics are a problem, with threads, sockets, signals that are not passed down. An example of that approach is perf-fuzz [0], where they add a new syscall to make fuzzing faster.

[0] https://github.com/sslab-gatech/perf-fuzz

touisteur · on May 29, 2021

And now I realize I was very wrong about what unexec does/did. Wow.

the8472 · on May 29, 2021

Does `rr` fit the bill?

touisteur · on May 29, 2021

From memory rr has a substantial recording overhead, and is specially made for debugging, right? But yes, it is very useful to analyse a past state and to understand a chain of events.

I should clarify my use case: I would use such a feature (go back to previous state) for a speculative execution tool. I'd execute the happy path all the time assuming no error occurred, but if I found out later that something went wrong somewhere, I'd want to go back and start from there knowing what went wrong, and so on. With as little perf loss as possible. Not sure my explanation makes sense.
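A rough sketch of that use case with plain fork(), assuming Linux and a single-threaded program. The helper speculate and its happy_path/recover parameters are hypothetical names. As a simplification, the happy path's result is shipped back over a pipe instead of letting the child carry on as the main process, and the point is only that the snapshot learns what went wrong before it retries.

```python
import os
import pickle


def speculate(happy_path, recover):
    """Run happy_path() in a forked child; on failure call recover(reason)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the speculative timeline
        try:
            os.close(read_fd)
            try:
                payload = ("ok", happy_path())
            except Exception as exc:
                payload = ("error", repr(exc))
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(payload, out)
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent: the untouched pre-speculation state.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()  # read until EOF before reaping the child
    os.waitpid(pid, 0)
    if not data:
        return recover("child died without reporting")
    kind, value = pickle.loads(data)
    return value if kind == "ok" else recover(value)
```

Whatever happy_path() changes in memory is thrown away with the child, so recover() starts from the state at the fork plus the failure reason. The cost per checkpoint is one fork() and the pages the child ends up touching; side effects outside the process are not undone.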

I know about dmtcp, criu, vm snapshots, but they all come with big overhead (I don't want to pay too much for the checkpoint).

The closest I found was @gamozolabs' amazing work on snapshot fuzzing (pushing the limit of what's possible on x86_64 hardware, including using Intel PML - similar to userfaultfd but hw-accelerated...).

Agingcoder · on May 29, 2021

Rr was initially designed to reproduce flaky tests I think. They then realized that they could modify it for reverse debugging.

The recording overhead is quite acceptable, about 50%.

Rr also has a 'chaos mode' which changes the thread scheduling, and which greatly facilitates finding the 'unhappy' path.

touisteur · on May 29, 2021

OK thanks for the feedback on recording overhead, I'll have to try for the checkpoint/restore use case.

Agingcoder · on May 30, 2021

You might be interested in pernosco, written by the same people. It's rather spectacular.

https://pernos.co

The recording overhead is the same (it leverages rr) but you can explore your bug to your heart's content.

mattgreenrocks · on May 29, 2021

This is the first I've heard of userfaultfd! It is rather mind-blowing. Thank you for the comment.

touisteur · on May 29, 2021

There are so many features in the Linux kernel it sometimes blows my mind. eventfd, signalfd, timerfd, memfd, pidfd. The whole fricking tc/qdisc featureset (OMG). netlink. io_uring. criu. SO_REUSEPORT. Teaming. Namespaces. veths. vsocks. Dpdk/netmap/af_packet. XDP ! Seccomp.

I mean look at that https://developers.redhat.com/blog/2018/10/22/introduction-t...

Amazing.

agumonkey · on May 29, 2021

ProlOS

touisteur · on May 29, 2021

OK I really tried but my Googlefu seems lacking. Any pointer, pretty please?

agumonkey · on May 29, 2021

Oh it was mostly a pun blending prolog like backtracking at the os-process level.

touisteur · on May 29, 2021

I figured. I went down the rabbit hole, prolo, and prolog, and hoped it wasn't a pun :-) Yes, I think of backtracking as a general OS mechanism, since the kernel already knows which pages differ between both processes and could 'just' plug back any socket, file, shm, pipe 'as is' when 'restoring'. All the state is known to the OS!

Maybe something with ebpf in twelve years...

agumonkey · on May 29, 2021

and live exploratory debugging when your fs fails

touisteur · on May 29, 2021

Well if I could synchronise that with lvm snapshots and go back... Once you start going down the backtracking rabbit hole, there are lots of things one can imagine.

agumonkey · on May 29, 2021

time to make a pull-request

touisteur · on May 29, 2021

Wish I had the chops. Time to find money to have someone do it you mean.

agumonkey · on May 29, 2021

who sets up the kickstarter ? :p

formerly_proven · on May 29, 2021

> Dynamic binary analysis and instrumentation of applications with built-in integrity checks. As far as I know process_vm_readv isn't even detectable [...]

So... cheat development?

Coincidentally, the "manifesto" behind the bot invasion in Team Fortress 2: https://c-v.sh/unsownriddles

At first I thought that has to be straight trolling ("Educate yourself about GNU/Linux"!), but I'm not so sure it is: https://github.com/nullworks/cathook

Either way, deeply weird to put Linux (ehm, "GNU/Linux") into the headline of your cheat that's literally only designed to make a game unplayable (https://github.com/nullworks/cathook/issues/1480).

swinglock · on May 29, 2021

It's deep trolling, as it doesn't even do Linux any favors. Alternative slogan: "Port your game to GNU/Linux! Gain less than 1% players and no revenue while increasing your maintenance cost in ways you couldn't imagine. Try to reverse the decision without too much bad PR or lose all your established customers, then watch it all die. Educate yourself about GNU/Linux!"

notriddle · on May 29, 2021

Valve probably ported their stuff to Linux as a hedge against the Windows and Mac App Stores. It barely even matters if people use it or not; they just need a way to convince Microsoft and Apple that breaking Steam isn't in their best interests.

mhh__ · on May 30, 2021

Could also be a niche revenue stream in future if they start building it now: a few old games (and modern ones) run better on Linux via Valve's work than they do on Windows, which means Valve can still sell them.

_d7dt · on May 29, 2021

Just from doing a little bit of research, the problem there seems to be that the netcode in that game is pretty old and has a lot of unfixed bugs that trolls are taking advantage of, not really anything to do with Linux. If they don't want to pay people to fix it, maybe they can open source the server code so someone else can fix the bugs?

somebodythere · on May 29, 2021

Or, DRM removal, or, malware analysis, or...

cestith · on May 29, 2021

Or forward porting your old software for which you've somehow lost the source. Or helping reverse some abandoned commercial code to write a compatible replacement. Or helping verify you're getting from the compiler what you expected from your source. Probably half a dozen other things neither of us have thought up.

This is the sort of software smart people can use for fresh and novel things the designer never even intended.

saagarjha · on May 29, 2021

Discussion from when it came out: https://news.ycombinator.com/item?id=21394678

sys_64738 · on May 29, 2021

I thought this might be something like thread.join() but I don't get it otherwise.

touisteur · on May 29, 2021

I think it's like getting all the updates after you forked a process. You (child) get notified of all the cow/vm changes (in your parent) and they get merged into your process space? Or is it the opposite? :-)