It'd definitely be tricky to get the computational performance that you'd want out of it. I'd imagine it'd be pretty easy to accidentally bust caches.
To solve for that, you could double your frame size and store the prev/next values interleaved, i.e. [n, p, n, p, n, p]. That way, when you XOR, you are always working on memory that sits right next to each other. You'd want to keep the frame basically global to avoid allocating.
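Roughly what I mean by the interleaved layout, as a minimal Rust sketch. `FRAME_LEN`, `InterleavedFrames`, `push` and `xor_delta` are names I made up for illustration, and the fixed 8-byte frame is just an assumption to keep it short. The buffer is created once and reused, so nothing allocates per frame.

```rust
/// Hypothetical frame size in bytes, chosen only for illustration.
const FRAME_LEN: usize = 8;

/// Holds the next and previous frames interleaved as [n0, p0, n1, p1, ...].
/// Created once and reused for every frame, so there is no per-frame allocation.
struct InterleavedFrames {
    data: [u8; FRAME_LEN * 2],
}

impl InterleavedFrames {
    fn new() -> Self {
        Self { data: [0; FRAME_LEN * 2] }
    }

    /// Move each current "next" value into its "prev" slot, then store the new frame.
    fn push(&mut self, frame: &[u8; FRAME_LEN]) {
        for (pair, &byte) in self.data.chunks_exact_mut(2).zip(frame.iter()) {
            pair[1] = pair[0];
            pair[0] = byte;
        }
    }

    /// XOR each adjacent next/prev pair into `out`, which the caller also reuses.
    fn xor_delta(&self, out: &mut [u8; FRAME_LEN]) {
        for (dst, pair) in out.iter_mut().zip(self.data.chunks_exact(2)) {
            *dst = pair[0] ^ pair[1];
        }
    }
}
```

Each XOR reads two neighbouring bytes, so both operands land on the same cache line. Whether that actually beats two separate frame buffers is something you'd have to measure.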
If you wanted to be super clever then you could probably SIMD this up doing something like [n, n, n, n, p, p, p, p]. I'm not sure how you'd turn that into RLE in SIMD. I'm not clever enough for that :D (but I'm sure someone has done it).
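For the blocked layout, here is a rough Rust sketch. It does not use explicit SIMD intrinsics. The XOR loop is just written in a shape the compiler can often auto-vectorize, which is not guaranteed. The RLE part stays plain scalar code, since that's the bit I don't know how to vectorize. `LANES`, `xor_blocks` and `rle` are made-up names, and the 255-length run cap is an arbitrary assumption.

```rust
/// Hypothetical lane count; 4 matches the [n, n, n, n, p, p, p, p] example.
const LANES: usize = 4;

/// XOR each block of LANES "next" values with the LANES "prev" values that follow it.
/// `blocks` is expected to be a multiple of 2 * LANES long; a trailing partial block is ignored.
fn xor_blocks(blocks: &[u8], out: &mut Vec<u8>) {
    out.clear(); // reuse the caller's allocation
    for block in blocks.chunks_exact(2 * LANES) {
        let (next, prev) = block.split_at(LANES);
        let mut lane = [0u8; LANES];
        for i in 0..LANES {
            lane[i] = next[i] ^ prev[i];
        }
        out.extend_from_slice(&lane);
    }
}

/// Plain scalar RLE producing (value, run length) pairs; runs are capped at 255.
fn rle(delta: &[u8], out: &mut Vec<(u8, u8)>) {
    out.clear();
    for &byte in delta {
        if let Some(last) = out.last_mut() {
            if last.0 == byte && last.1 < u8::MAX {
                last.1 += 1;
                continue;
            }
        }
        out.push((byte, 1));
    }
}
```

Unchanged regions XOR to zero, so the delta is mostly long zero runs, which is what makes the RLE step worth doing at all.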
But as you said, complexity would definitely increase rather than decrease even though you could get much better compute time.