This is what I was thinking as well. Single address space OSes are actually quite powerful, and fork is 'trivial' if you have the correct page protection mechanisms. Basically, a fork consists of starting a new process context; each page that gets modified is copied to a new page, where pointers internal to the page are updated and external pointers are left alone. The trick there is managing a list of relocatable references for every page, tagged 'internal' or 'external', to facilitate fixups.
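To make the fixup idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration (the `Page` class, the `copy_on_write` helper, the addresses); it is not how any real OS or the Mill does it. It only covers pointers stored inside the copied page, not references into that page from elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page in a single address space (illustration only)."""
    base: int                                    # address of the page
    words: dict = field(default_factory=dict)    # byte offset -> stored value
    relocs: dict = field(default_factory=dict)   # byte offset -> "internal" or "external"


def copy_on_write(page, new_base):
    """Return a private copy of `page` at `new_base`, fixing up internal pointers."""
    delta = new_base - page.base
    new_words = {}
    for off, value in page.words.items():
        # Only slots tagged "internal" point into this page and must move with it.
        if page.relocs.get(off) == "internal":
            value += delta
        new_words[off] = value
    return Page(new_base, new_words, dict(page.relocs))


parent = Page(base=0x10000)
parent.words[0] = 0x10040
parent.relocs[0] = "internal"    # points into the same page
parent.words[8] = 0x20010
parent.relocs[8] = "external"    # points at some other page
parent.words[16] = 0x10040       # untagged: a plain integer, not a pointer

child = copy_on_write(parent, new_base=0x50000)
print(hex(child.words[0]), hex(child.words[8]), hex(child.words[16]))
# prints: 0x50040 0x20010 0x10040
```

The untagged slot survives untouched only because the relocation list says it is not a pointer, which is exactly the bookkeeping the scheme depends on.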
In particular, a single address space means that if you copy a page on write, you also have to copy any page that points to it, so that you can fix up pointers to it. And that's not transparent to userspace. With virtual address spaces, you can copy a page in physical memory without changing its virtual address.
Then again, many of the same considerations apply when using ASLR.
In practice, it'd be quite different because with ASLR, the compiler records the locations of all address references in the binary, so the OS knows what to fix up. At runtime, though, C programs normally leave no indication in memory of what is a pointer and what is pure data; even if you changed the compiler to emit this and the allocator to track it, you'd have problems with fairly common constructions like custom allocators, unions where it may be nontrivial to determine which alternative is in use (especially since the data may actually be uninitialized), tagged pointers, hashes based on the pointer value, et cetera. Garbage collectors for C run into the same problems and forbid some of those constructions, but they can always fall back on not collecting an allocation if they're unsure whether a reference to it is a real pointer or just an integer with the same value. If you're actually relocating things, you can't risk accidentally changing the value of some integer.
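A toy illustration of why "conservative" is fine for a collector but not for relocation. The function names and addresses are made up for the example; memory is modelled as a plain list of words with no type information.

```python
def conservative_mark(words, base, size):
    """GC-style check: does any word look like a pointer into [base, base + size)?"""
    return any(base <= w < base + size for w in words)


def conservative_relocate(words, old_base, new_base, size):
    """Relocation-style rewrite: treat every word in the old range as a pointer."""
    delta = new_base - old_base
    return [w + delta if old_base <= w < old_base + size else w for w in words]


OLD, NEW, SIZE = 0x10000, 0x50000, 0x1000

memory = [
    0x10040,  # a real pointer into the block being moved
    0x10040,  # an integer (say, a hash) that happens to have the same value
    7,        # obviously not a pointer
]

# A collector only needs a yes/no answer, so a false positive just keeps the block alive.
print(conservative_mark(memory, OLD, SIZE))
# prints: True

# A relocator has to rewrite, so the false positive silently corrupts the integer.
print([hex(w) for w in conservative_relocate(memory, OLD, NEW, SIZE)])
# prints: ['0x50040', '0x50040', '0x7']
```

Nothing in the data distinguishes the first two words, so any scheme that rewrites both is wrong for one of them, and any scheme that rewrites neither is wrong for the other.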
Honestly, I don't expect such a scheme to be implemented, considering how complicated (and limited) it would be. It sounds more realistic to provide the full fork semantics, even at a severe performance cost, as suggested in the article, and expect ported Mill programs to stop using fork without exec for anything important.
You could drop the copy-on-write mechanism, just use copy-always for fork, and have a base register that offsets every pointer access (analogous to the mostly disused SS register on x86). There could be some mechanism to detect fork-then-exec for performance.
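Roughly what I mean, as a toy model in Python. The `Process` class, `fork_copy_always`, and the memory layout are all hypothetical; the point is only that if stored pointers are offsets from a per-process base, a fork can copy the whole region with no fixups.

```python
class Process:
    """Hypothetical process whose pointers are offsets from a base register."""

    def __init__(self, mem, base, length):
        self.mem = mem          # the one shared address space (a list of words)
        self.base = base        # per-process base register
        self.length = length    # size of this process's region, in words

    def _addr(self, ptr):
        if not 0 <= ptr < self.length:
            raise IndexError("pointer outside process region")
        return self.base + ptr  # every access is offset by the base register

    def load(self, ptr):
        return self.mem[self._addr(ptr)]

    def store(self, ptr, value):
        self.mem[self._addr(ptr)] = value


def fork_copy_always(parent, child_base):
    """Copy the parent's whole region to `child_base`; no pointer fixups are needed."""
    mem, n = parent.mem, parent.length
    mem[child_base:child_base + n] = mem[parent.base:parent.base + n]
    return Process(mem, child_base, n)


memory = [0] * 64
parent = Process(memory, base=0, length=16)
parent.store(0, 4)     # slot 0 holds a pointer (an offset) to slot 4
parent.store(4, 99)    # slot 4 holds the data

child = fork_copy_always(parent, child_base=32)
child.store(child.load(0), 1)   # the child follows its copy of the pointer and writes

print(parent.load(parent.load(0)), child.load(child.load(0)))
# prints: 99 1
```

The cost is the eager copy of the whole region, which is why detecting fork-then-exec would matter.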
Yep. You need some mechanism for running the same code several times anyway. Anything that solves that also solves the problem of forking without copy-on-write.
The memory model is one of the features of the Mill that I'm most curious about (the other one being access control).
I can guess: on a kernel process switch, rewrite the TLB entry 'manually'. I used to have to do that on an old 8086 OS for software interrupt vectors (because there was no TLB).