Nit: not for a day, more like 8 hours, and that's because we were lazy and somebody said he "just happened" to have a cluster with unbalanced resources (mainly used for deep learning, but with all GPUs occupied and quite a lot of CPUs / RAM left), so we decided to brute force the last 16 bits :)
Also, the challenge host left useful state (which bit was flipped) in registers before running teams' code; without this I'm not sure it would even be possible.
Sure, all's fair in a CTF. That story came to me through the mouths of at least a handful of people, who might have a bit of an incentive to exaggerate given that they hadn't quite been able to get to zero and might be just a little sour :P
The state was quite helpful, yes. For x86 it seems like a "clean slate" shellcode would be quite difficult, if not impossible, to achieve, as we saw. However, I am left wondering how other ISAs would fare… perhaps worse, since x86 is notoriously dense. But maybe not? The fixed-width ones would probably be easy to try out, at least.
Maybe being notoriously dense is not a bad thing? While those ModRM bytes popping up everywhere are annoying as f* (too easy to flip an instruction into a form with an almost-guaranteed-to-be-invalid memory access), at least due to the density there won't be reserved bits. For example, in AArch64 if bit 28 and bit 27 are both zero the instruction will almost certainly be an invalid one (hitting an unallocated area), and every branch instruction is a single bit flip away from [28:27] = b'00...
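To make the AArch64 point concrete, here is a rough sketch (not anyone's actual tooling from the challenge) that takes a few branch encodings and lists which single-bit flips leave bits [28:27] both zero. The helper names and the sample list are made up for illustration; the encodings are the standard ones for branches with a zero offset, plus RET. Whether the resulting word really decodes as undefined depends on the core (some of that space is used by SVE/SME), which this sketch does not check.

```python
# Sketch only: helper names and the SAMPLES table are illustrative assumptions.

def field_28_27(insn: int) -> int:
    """Return bits [28:27] of a 32-bit AArch64 instruction word."""
    return (insn >> 27) & 0b11

def flips_landing_in_00(insn: int) -> list[int]:
    """Bit positions whose single flip makes bits [28:27] both zero."""
    return [bit for bit in range(32)
            if field_28_27(insn ^ (1 << bit)) == 0]

# A few branch-class encodings (zero offsets, register 0 where relevant).
SAMPLES = {
    "B":    0x14000000,
    "BL":   0x94000000,
    "B.EQ": 0x54000000,
    "CBZ":  0x34000000,
    "RET":  0xD65F03C0,
}

if __name__ == "__main__":
    for name, insn in SAMPLES.items():
        print(f"{name:5s} {insn:#010x} -> risky flips: {flips_landing_in_00(insn)}")
```

All of these have [28:27] = 0b10 to begin with, so the only flip that zeroes the field is bit 28 itself; the point is just that every branch has at least one such flip.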
Right, I was saying that the other ISAs would do worse because they aren't as dense and will hit something undefined much more readily. But the RISCs in general are much less likely to touch memory (only if you do a load/store from a register that isn't clean, maybe). From a glance, MIPS looks like it might work, since the opcode field seems to use all the bits and the remaining bits just encode reg/func/imm in various ways. The one caveat I see is that the top bit of the opcode seems to encode memory accesses, so you may be forced to deal with at least one.
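A quick way to poke at the MIPS caveat is to flip bit 31 (the top bit of the 6-bit opcode field in bits [31:26]) and look at the opcode that comes out. This is only a sketch with made-up helper names, and the "top opcode bit means memory access" rule is the rough observation from the comment above rather than something checked against every opcode. The example word is assumed to be `addiu $t0, $t0, 1`, which assembles to 0x25080001 in classic MIPS32.

```python
# Sketch only: helper names are illustrative assumptions.

def opcode(insn: int) -> int:
    """Return the 6-bit primary opcode, bits [31:26] of a MIPS32 word."""
    return (insn >> 26) & 0x3F

def flip_bit(insn: int, bit: int) -> int:
    """Flip one bit of a 32-bit instruction word."""
    return (insn ^ (1 << bit)) & 0xFFFFFFFF

def top_opcode_bit_set(insn: int) -> bool:
    """True if bit 31 is set, which roughly marks the load/store opcodes."""
    return bool(insn >> 31 & 1)

ADDIU_T0_T0_1 = 0x25080001  # addiu $t0, $t0, 1 (opcode 0x09)

if __name__ == "__main__":
    flipped = flip_bit(ADDIU_T0_T0_1, 31)
    print(f"before: {ADDIU_T0_T0_1:#010x} opcode={opcode(ADDIU_T0_T0_1):#04x}")
    print(f"after:  {flipped:#010x} opcode={opcode(flipped):#04x} "
          f"memory-ish={top_opcode_bit_set(flipped)}")
```

Here the flip turns opcode 0x09 (ADDIU) into 0x29, which is SH, a halfword store through $t0. So that one flip is the memory access you would have to survive.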