You are missing the compute aspect. R2 only solves the egress problem. YouTube accepts many formats and tries to store them efficiently, in a way that makes seeking to different timestamps in the video possible (and efficient). They perfected the art of encoding. Read about it.
I'm sure that infrastructure is valuable and useful, but it might not be necessary. We had a simple encoder farm back in my kink.com days, and it was a rounding error in our budget. I can only imagine it's cheaper and easier almost 20 years later.