Software transcoding can run at utterly stupid multiples of realtime at SD resolutions with only a few cores, depending on your quality target. Cores are very cheap.

Fancy GPUs tend to support 8 or more simultaneous HD streams, and even consumer cards can when using patched drivers.

Then you have dedicated accelerator hardware, which can pack a tremendous amount of transcode into a tiny package. For example, on AWS you have vt1 instances, which support 8 (or 16?) simultaneous full HD/SD/QHD ladders at 2x realtime for around $200/mo.
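To put that vt1 figure in perspective, here is a rough back-of-envelope sketch in Python. It only uses the numbers quoted above (8 or 16 ladders, 2x realtime, roughly $200/mo); the 730 hours per month and the 100% utilization are my assumptions, and the function name is made up for illustration.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def cost_per_content_hour(monthly_cost_usd, ladders, realtime_multiple):
    """Dollars per hour of source video run through a full ladder.

    Assumes the instance is kept busy 100% of the time, which real
    workloads rarely manage.
    """
    content_hours = ladders * realtime_multiple * HOURS_PER_MONTH
    return monthly_cost_usd / content_hours


# Both readings of the "8 (or 16?)" figure from the comment above.
low = cost_per_content_hour(200, 8, 2)
high = cost_per_content_hour(200, 16, 2)
print(f"8 ladders: ${low:.4f}/h, 16 ladders: ${high:.4f}/h")
```

Either way it comes out to a cent or two per hour of source video, before storage and bandwidth.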

In answer to your actual question: YouTube, at least, selectively transcodes using fancier/more specific methods according to the popularity of the content. They do the cheap thing in bulk and the high quality thing for the 1% of content folk actually watch.
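A toy sketch of that "cheap in bulk, expensive for the popular stuff" idea, in Python. This is not YouTube's pipeline, just an illustration: the view-count threshold, the function names, and the specific x264 preset/CRF choices are all my assumptions. It only builds an ffmpeg command line and does not run it.

```python
def pick_encode_args(view_count, popular_threshold=100_000):
    """Return ffmpeg video args: fast preset by default, slow preset once popular.

    popular_threshold is a hypothetical cutoff, not a known YouTube value.
    """
    if view_count >= popular_threshold:
        # Spend far more CPU per frame for better quality per bit.
        return ["-c:v", "libx264", "-preset", "veryslow", "-crf", "20"]
    # Bulk tier: cheap and fast, good enough for rarely watched content.
    return ["-c:v", "libx264", "-preset", "veryfast", "-crf", "23"]


def build_command(src, dst, view_count):
    """Assemble the full ffmpeg invocation as an argument list."""
    return ["ffmpeg", "-i", src, *pick_encode_args(view_count), "-c:a", "copy", dst]


print(" ".join(build_command("in.mp4", "out.mp4", view_count=12)))
print(" ".join(build_command("in.mp4", "out.mp4", view_count=5_000_000)))
```

In practice you would transcode everything with the cheap tier on upload and re-encode later once something crosses the threshold.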