Unfork: The Inverse of Fork(2) (github.com/whitequark)
150 points by tta on May 29, 2021 | 38 comments



That's very interesting. Userfaultfd opened a lot of features in Linux.

When I saw 'unfork' I thought 'someone found out how to replace the current process with a child process that was an older snapshot'. Any idea how to do this? Like 'OK I did something wrong, let me back up a bit, kill me, replace me with my child x and reparent every other one of my child processes to child x and gooo'.


Easier to do it the other way: when you fork, keep the parent as the snapshot and keep going in the child. That way you can back up by terminating and resuming in the parent. I think this is what timetravelpdb does for Python: https://github.com/TomOnTime/timetravelpdb
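A rough sketch of that idea in Python, in case it helps: the calling process is the snapshot and sleeps in waitpid() while a forked child carries on. The names `run_from_snapshot`, `flaky_step` and the `ROLLBACK` exit status are made up for illustration; this is not how timetravelpdb is written, just the general shape under the assumption that "rolling back" means the child exits with a special status.

```python
import os

ROLLBACK = 42  # hypothetical exit status meaning "discard me, resume the snapshot"


def run_from_snapshot(step, max_attempts=5):
    """Run step(state, attempt) in forked children until one does not roll back.

    The calling process is the snapshot: step() never touches its memory.
    Returns (attempts_used, exit_code_of_last_child).
    """
    state = {"counter": 0}                  # memory captured by the snapshot
    for attempt in range(1, max_attempts + 1):
        pid = os.fork()
        if pid == 0:                        # child: keep going from the snapshot
            try:
                ok = step(state, attempt)
            except BaseException:
                os._exit(1)                 # unexpected failure, not a rollback
            os._exit(0 if ok else ROLLBACK)  # terminating is the "undo"
        _, status = os.waitpid(pid, 0)      # parent: asleep while the child runs
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
        assert state["counter"] == 0        # the child's writes never reach us
        if code != ROLLBACK:
            return attempt, code
    return max_attempts, ROLLBACK


def flaky_step(state, attempt):
    state["counter"] += 100                 # mutates the child's private copy only
    return attempt >= 3                     # pretend the first two tries go wrong
```

Calling `run_from_snapshot(flaky_step)` retries from the same untouched state until the third attempt succeeds. In a real tool the successful child would simply keep running instead of exiting, and anything outside process memory (files, sockets) is not rolled back.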


Yeah, thanks, that was my first idea, but then 1) I'd have to re-attach everything to the child process and 2) I'd have to go one step further for every snapshot, right? If everything goes well and I want to continue, at the next snapshot I'll have to fork the child process, and so on, no? The idea is to keep a somewhat short history (a few minutes?) of snapshots.

Thanks for the timetravelpdb link!


Here's another implementation of what's described above: it forks on each invocation of readline to provide undo for interactive interpreters. https://github.com/thomasballinger/rlundo


OK I love this, and not just because of the calls to system(nc) :-)

Your code made me think, maybe I'm tying myself into knots... But my thing is making hundreds of thousands of checkpoints, so I'd have to have as many forks as savepoints. No way to coalesce parents, maybe reparenting could work there...


You might be interested in CRIU: https://github.com/checkpoint-restore/criu


I'm hoping somewhere deep in Google or Cloudflare someone is implementing all this in chained eBPF scripts...


this is neat. thanks.


I would really like that function. While fork is often a bad idea, it can be an easy way of adding parallelization and backtracking to old large serial C code bases.


iirc emacs implements a generic 'unexec' function which serializes the current process state so it can be reloaded later. Seems like you could use it to make snapshots.


Thanks for the emacs pointer. Interesting. It screams of continuations.

I know how to do snapshots of a process, what with criu. But I just want a checkpoint to go back to later, not a full serialization. I don't want to save/restore a complete process, but rather take advantage of fork()'s CoW to save the least possible in a stopped process, then be able to come back to it. The rest of fork()'s semantics are a problem, with threads, sockets and signals that are not passed down. An example of that approach is perf-fuzz [0], where they add a new syscall to make fuzzing faster.

[0] https://github.com/sslab-gatech/perf-fuzz
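To make the thread part concrete, here is a small sketch (the function name `threads_seen_after_fork` is made up). It starts a worker thread, forks, and asks the child how many threads it sees. The count comes from CPython's own `threading` bookkeeping, which I'm assuming mirrors the POSIX rule that only the thread calling fork() exists in the child. Recent Python versions may print a DeprecationWarning about forking a multi-threaded process, which is rather the point.

```python
import os
import threading


def threads_seen_after_fork():
    """Return (threads alive in the parent, threads alive in the forked child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()                          # parent now has main + worker
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                            # child: only the forking thread survives
        os.close(read_fd)
        os.write(write_fd, bytes([threading.active_count()]))
        os._exit(0)

    os.close(write_fd)
    child_count = os.read(read_fd, 1)[0]    # one byte is enough for a small count
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()                              # let the worker thread finish
    worker.join()
    return parent_count, child_count
```

In a plain script the parent reports two threads and the child one, so any work the worker thread was doing is simply gone from the "snapshot" side of a fork-based checkpoint.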


And now I realize I was very wrong about what unexec does/did. Wow.


Does `rr` fit the bill?


From memory rr has a substantial recording overhead, and is specially made for debugging, right? But yes, it is very useful to analyse a past state and to understand a chain of events.

I should clarify my use case: I would use such a feature (go back to previous state) for a speculative execution tool. I'd execute the happy path all the time assuming no error occurred, but if I found out later that something went wrong somewhere, I'd want to go back and start from there knowing what went wrong, and so on. With as little perf loss as possible. Not sure my explanation makes sense.

I know about dmtcp, criu, vm snapshots, but they all come with big overhead (I don't want to pay too much for the checkpoint).

The closest I found was @gamozolabs' amazing work on snapshot fuzzing (pushing the limit of what's possible on x86_64 hardware, including using Intel PML - similar to userfaultfd but hw-accelerated...).


Rr was initially designed to reproduce flaky tests I think. They then realized that they could modify it for reverse debugging.

The recording overhead is quite acceptable, about 50%.

Rr also has a 'chaos mode' which changes the thread scheduling, and which greatly facilitates finding the 'unhappy' path.


OK thanks for the feedback on recording overhead, I'll have to try for the checkpoint/restore use case.


You might be interested in Pernosco, written by the same people. It's rather spectacular.

https://pernos.co

The recording overhead is the same (it leverages rr) but you can explore your bug to your heart's content.


This is the first I've heard of userfaultfd! It is rather mind-blowing. Thank you for the comment.


There are so many features in the Linux kernel it sometimes blows my mind. eventfd, signalfd, timerfd, memfd, pidfd. The whole fricking tc/qdisc featureset (OMG). netlink. io_uring. criu. SO_REUSEPORT. Teaming. Namespaces. veths. vsocks. Dpdk/netmap/af_packet. XDP ! Seccomp.

I mean look at that https://developers.redhat.com/blog/2018/10/22/introduction-t...

Amazing.


ProlOS


OK I really tried but my Googlefu seems lacking. Any pointer, pretty please?


Oh, it was mostly a pun: Prolog-like backtracking, but at the OS-process level.


I figured. I went down the rabbit hole, prolo, and prolog, and hoped it wasn't a pun :-) Yes, I'm thinking of backtracking as a general OS mechanism, since the kernel already knows which pages differ between both processes, and could 'just' plug back any socket, file, shm, pipe 'as is' when 'restoring'. All the state is known to the OS!

Maybe something with ebpf in twelve years...


and live exploratory debugging when your fs fails


Well, if I could synchronise that with lvm snapshots and go back... Once you start going down the backtracking rabbit hole, there are lots of things one can imagine.


time to make a pull-request


Wish I had the chops. Time to find money to have someone do it you mean.


who sets up the kickstarter ? :p


> Dynamic binary analysis and instrumentation of applications with built-in integrity checks. As far as I know process_vm_readv isn't even detectable [...]

So... cheat development?

Coincidentally, the "manifesto" behind the bot invasion in Team Fortress 2: https://c-v.sh/unsownriddles

At first I thought that has to be straight trolling ("Educate yourself about GNU/Linux"!), but I'm not so sure it is: https://github.com/nullworks/cathook

Either way, deeply weird to put Linux (ehm, "GNU/Linux") into the headline of your cheat that's literally only designed to make a game unplayable (https://github.com/nullworks/cathook/issues/1480).


It's deep trolling, as it doesn't even do Linux any favors. Alternative slogan: "Port your game to GNU/Linux! Gain less than 1% players and no revenue while increasing your maintenance cost in ways you couldn't imagine. Try to reverse the decision without too much bad PR or lose all your established customers, then watch it all die. Educate yourself about GNU/Linux!".


Valve probably ported their stuff to Linux as a hedge against the Windows and Mac App Stores. It barely even matters if people use it or not; they just need a way to convince Microsoft and Apple that breaking Steam isn't in their best interests.


Could also be a niche revenue stream in future if they start building it now: a few old games (and modern ones) run better on Linux via Valve's work than they do on Windows, which means Valve can still sell them.


Just from doing a little bit of research, the problem there seems to be that the netcode in that game is pretty old and has a lot of unfixed bugs that trolls are taking advantage of, not really anything to do with Linux. If they don't want to pay people to fix it, maybe they can open source the server code so someone else can fix the bugs?


Or, DRM removal, or, malware analysis, or...


Or forward porting your old software for which you've somehow lost the source. Or helping reverse some abandoned commercial code to write a compatible replacement. Or helping verify you're getting from the compiler what you expected from your source. Probably half a dozen other things neither of us has thought up.

This is the sort of software smart people can use for fresh and novel things the designer never even intended.


Discussion from when it came out: https://news.ycombinator.com/item?id=21394678


I thought this might be something like thread.join() but I don't get it otherwise.


I think it's like getting all the updates after you forked a process. You (child) get notified of all the cow/vm changes (in your parent) and they get merged into your process space? Or is it the opposite? :-)



