If you consider this a two-sided game, then you can have the fakes quickly become so good that hardly anyone can tell.
And I'm not just talking about image fakes. You can have MCTS (Monte Carlo tree search) find the best arguments for ludicrous statements, along with other lines of justification for fake things that look just like real arguments.
That is the point of a GAN. We must be missing something about the problem space and/or human cognition; otherwise GAN output would already be indistinguishable from reality to all humans.
Yep, they're a 3D reconstruction of points from a 4D space (massively simplifying, since the 2D video frames themselves are high-dimensional data), and that's just the video. Bring a 3D camera into the game and the underlying distribution and see if hilarity ensues, IMO.
Furthermore, if the generator outpaces the discriminator by too much, then the generator stops learning or, worse, degenerates into mode collapse. Generators and discriminators have to be close to each other in capability for either to get anywhere.
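To make the "close in capability" point concrete, here is a rough sketch of the mirror-image imbalance, where the discriminator gets far ahead. It assumes the original minimax generator loss log(1 - D(G(z))) and writes the discriminator's output on a fake sample as sigmoid(logit). The function names are mine, purely for illustration. As the discriminator grows more confident that the sample is fake (very negative logit), the gradient reaching the generator shrinks toward zero, while the commonly used non-saturating loss -log D(G(z)) keeps a usable gradient.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(logit: float) -> float:
    # d/d(logit) of log(1 - sigmoid(logit)) = -sigmoid(logit)
    return -sigmoid(logit)


def non_saturating_grad(logit: float) -> float:
    # d/d(logit) of -log(sigmoid(logit)) = sigmoid(logit) - 1
    return sigmoid(logit) - 1.0


if __name__ == "__main__":
    # More negative logit = discriminator more certain the sample is fake.
    for logit in (0.0, -2.0, -5.0, -10.0):
        print(
            f"logit={logit:6.1f}  "
            f"saturating={saturating_grad(logit):+.6f}  "
            f"non_saturating={non_saturating_grad(logit):+.6f}"
        )
```

This only looks at the gradient with respect to the discriminator's logit on a single sample; real training dynamics depend on much more than that.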
It's feasible to design a reinforcement-learning-based network that uses output from a non-differentiable deepfake detector as a component of the loss function.
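A toy sketch of how that could work, using the score-function (REINFORCE) estimator. Everything here is an assumption for illustration: the "generator" is just a Gaussian with one learnable mean, and `black_box_detector` is a made-up hard-threshold function standing in for a real detector. The point is only that the update uses the detector's 0/1 verdict as a reward and never needs a gradient through the detector itself.

```python
import random


def black_box_detector(x: float) -> int:
    """Hypothetical stand-in for a deepfake detector: 1 = flagged fake, 0 = passes.

    The hard threshold makes it non-differentiable with respect to x.
    """
    return 0 if abs(x - 3.0) <= 1.0 else 1


def detection_rate(mu: float, sigma: float, rng: random.Random, n: int = 2000) -> float:
    """Fraction of samples from N(mu, sigma^2) that the detector flags."""
    return sum(black_box_detector(rng.gauss(mu, sigma)) for _ in range(n)) / n


def train_generator(steps: int = 300, batch: int = 512, lr: float = 0.5,
                    sigma: float = 1.0, seed: int = 0) -> float:
    """REINFORCE on a one-parameter Gaussian 'generator'; returns the learned mean."""
    rng = random.Random(seed)
    mu = 0.0  # the generator's only parameter
    for _ in range(steps):
        samples = [rng.gauss(mu, sigma) for _ in range(batch)]
        # Reward is 1 when the sample gets past the detector, 0 when it is flagged.
        rewards = [1 - black_box_detector(x) for x in samples]
        # Batch-mean baseline to reduce the variance of the gradient estimate.
        baseline = sum(rewards) / batch
        # Score function of a Gaussian: d/d(mu) log N(x; mu, sigma^2) = (x - mu) / sigma^2
        grad = sum((r - baseline) * (x - mu) / sigma ** 2
                   for r, x in zip(rewards, samples)) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu


if __name__ == "__main__":
    learned_mu = train_generator()
    rng = random.Random(1)
    print("learned mean:", learned_mu)
    print("flagged before training:", detection_rate(0.0, 1.0, rng))
    print("flagged after training: ", detection_rate(learned_mu, 1.0, rng))
```

In a real system the Gaussian would be replaced by an actual generative model and the reward would likely be only one term alongside the usual differentiable losses, but the estimator has the same shape.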
And can you imagine the impact of false positives? The prosecutor has the perp on video committing a heinous offense, but it's still insufficient evidence because a software bug flags the footage as fake.