Computer vision has made progress, but that progress has been relatively narrow so far, and I think the activation energy required to produce this level of quality would be more than it's worth. As others have mentioned, new footage comes out all the time.
However, I agree with the sentiment. Someday we will have a massive foundation model capable of producing any video with a little text conditioning, but we don't have such a model yet. In some sense, we're still in the era of easily verifiable video, and that era might end soon.