It's worse than that: the input itself is only 8-bit; they scale it up internally, then back down to 8-bit on the output side.
But it works, for the same reason that audio engineers use 64-bit signal pipelines inside their DAW even though nearly all output equipment (and much input equipment) is 16-bit which is already at the limit of human perception.
If you have a 16-bit signal path, then every device on the signal path gets 16 bits of input and 16 bits of output. So every device in the path rounds to the nearest 16-bit value, which introduces an error of up to +/- 0.5 LSB per device.
However if you do a lot of those back to back they accumulate. If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16. Ideally some of them will cancel out, but in the worst case it's actually off by 16. "Off by 16" is equivalent to "off by log2(16) = 4 bits". So now you don't have 16-bit audio, you have 12-bit audio, because 4 of the bits are junk. And 12-bit audio is no longer outside the limit of human perception.
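Spelling out that arithmetic as code (worst case only; see the RSS reply further down for why typical error is much smaller):

```python
# Worst-case bound: assumes every stage's rounding error points the
# same way. 32 stages and +/- 0.5 LSB per stage are the figures above.
import math

steps = 32
per_stage_error = 0.5                  # LSB, rounding to the nearest value
worst_case = steps * per_stage_error   # 16 LSB total
junk_bits = math.log2(worst_case)      # 4 bits: 16-bit audio -> 12-bit
print(worst_case, junk_bits)           # 16.0 4.0
```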
Instead if you do all the math at 64-bit you still have 4 bits of error, but they're way down in the lowest 4 bits of the 64-bit word where nobody can ever hear them. Then you chop down to 16-bit at the very end and the quality is better. You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits.
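Here's a minimal simulation of the two paths (my own sketch, not anyone's actual engine; the gains, stage count, and lack of clipping are all arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(-2**15, 2**15, size=100_000).astype(np.float64)
gains = rng.uniform(0.9, 1.1, size=32)       # 32 arbitrary "devices"

per_stage, hi_res = x.copy(), x.copy()
for g in gains:
    per_stage = np.round(per_stage * g)      # 16-bit path: round every stage
    hi_res = hi_res * g                      # 64-bit path: keep full precision

exact = x * gains.prod()                     # reference result
err_16 = np.abs(per_stage - exact)
err_64 = np.abs(np.round(hi_res) - exact)    # rounded once, at the very end

print(f"16-bit path: max {err_16.max():.1f} LSB, rms {np.sqrt((err_16**2).mean()):.2f} LSB")
print(f"64-bit path: max {err_64.max():.2f} LSB")   # ~0.5: just the final rounding
```

Note the 16-bit path's measured error lands well below the +/- 16 worst case, because the per-stage errors mostly cancel; that's the RSS point raised in the reply below.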
tl;dr rounding errors accumulate inside the encoder
> 16-bit which is already at the limit of human perception
Nitpick: 16-bit fixed point is not at the limit of human perception. It's close, but I think 18-bit is required for fixed point. Floating point is a different issue.
> If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16.
Uncorrelated error doesn't accumulate like that. It accumulates as RSS (root of sum of squares). So, sqrt(32 * (0.5 * 0.5)) = sqrt(8), which is about 2.83 (about 1-2 bits).
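A quick Monte Carlo check of the RSS behavior (my sketch; strictly, a uniform rounding error in [-0.5, 0.5] has standard deviation 1/sqrt(12) ≈ 0.29 LSB, so the statistical spread is even smaller than the 2.83 figure, which plugs the 0.5 bound in as the per-stage term):

```python
# Sum 32 independent rounding errors, each uniform in [-0.5, 0.5] LSB,
# and look at the spread of the total across a million trials.
import numpy as np

rng = np.random.default_rng(1)
totals = rng.uniform(-0.5, 0.5, size=(1_000_000, 32)).sum(axis=1)

print(f"std of total error: {totals.std():.2f} LSB "
      f"(RSS with sigma=1/sqrt(12) predicts {np.sqrt(32 / 12):.2f})")
print(f"worst observed:     {np.abs(totals).max():.2f} LSB (hard bound is 16)")
```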
> You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits.
Generally the things that cause audible errors are effects like reverb, delay, phasers, compressors, etc. These are non-linear effects and consequently error can multiply and wind up in the audible range. Because error accumulates as RSS, it's really hard to get error to additively appear in the audible range.
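To illustrate how a non-linearity can stretch quantization error (a toy sketch; the square-root curve is just a stand-in for an extreme compression characteristic, not any real effect):

```python
# A memoryless sqrt "compressor" has enormous gain near zero, so the
# +/- 0.5 LSB rounding error on very quiet samples gets stretched far
# beyond 0.5 LSB at the output.
import numpy as np

FULL_SCALE = 2**15
t = np.linspace(0, 1, 48_000, endpoint=False)
x = 0.001 * np.sin(2 * np.pi * 100 * t)              # very quiet 100 Hz tone

def compress(v):
    return np.sign(v) * np.sqrt(np.abs(v))           # stand-in non-linearity

quantized = np.round(x * FULL_SCALE) / FULL_SCALE    # 16-bit version of input
err_in = np.abs(quantized - x) * FULL_SCALE          # <= 0.5 LSB by design
err_out = np.abs(compress(quantized) - compress(x)) * FULL_SCALE

print(f"input error:  {err_in.max():.2f} LSB")
print(f"output error: {err_out.max():.1f} LSB")      # two orders of magnitude worse
```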
tl;dr recording engineers like to play with non-linear effects which can eat up all your bits
> > 16-bit which is already at the limit of human perception
> Nitpick: 16-bit fixed point is not at the limit of human perception. It's close, but I think 18-bit is required for fixed point. Floating point is a different issue.
16-bit with shaped dither should be good enough to cover human perception.
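For reference, plain (unshaped) TPDF dither at the quantizer looks like this (a minimal sketch; actual shaped dither also filters the error spectrum toward less audible frequencies, which this doesn't attempt):

```python
# TPDF dither (sum of two uniforms) decorrelates rounding error from the
# signal: a tone well below 1 LSB truncates to silence without dither,
# but survives as signal-plus-noise with it.
import numpy as np

rng = np.random.default_rng(2)

def quantize_16bit(x, dither=True):
    scale = 2**15
    y = x * scale
    if dither:
        y = y + rng.uniform(-0.5, 0.5, y.shape) + rng.uniform(-0.5, 0.5, y.shape)
    return np.round(y) / scale

t = np.linspace(0, 1, 48_000, endpoint=False)
quiet = 1e-5 * np.sin(2 * np.pi * 1000 * t)          # ~0.33 LSB peak

plain = quantize_16bit(quiet, dither=False)
dithered = quantize_16bit(quiet, dither=True)

print("undithered output is pure silence:", not plain.any())
print("dithered output still tracks the tone:",
      np.corrcoef(dithered, quiet)[0, 1] > 0.1)
```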
This is the same reason we try to keep post-production workflows at 10-bit or better (my programs are all 32-bit floating point). A lot of cameras are capable of 16-bit internal processing but are limited to 8- or 10-bit for encoding (outside of some raw solutions). An ideal workflow is that raw codec (though it's often a 10-bit file instead of raw) going straight to color (me working at 32-bit), then delivering at 10-bit, from which the 8-bit final delivery files are generated (outside of theatrical releases, which work off 16-bit files and incidentally use 24-bit for the audio). So all of that makes sense to me.
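A toy version of the banding argument (my own sketch; the gamma curve is just a stand-in for a grade, and the numbers are arbitrary): push a smooth gradient through the grade in an 8-bit working space versus a float one, and count how many distinct 8-bit output levels survive.

```python
import numpy as np

ramp = np.linspace(0.0, 1.0, 1 << 16)        # an ideal smooth gradient

def grade(x):
    return x ** 2.2                           # stand-in for a color grade

# 8-bit working space: quantize on ingest and again on delivery
eight_bit = np.round(grade(np.round(ramp * 255) / 255) * 255) / 255
# float working space: quantize only once, on final delivery
float_pipe = np.round(grade(ramp) * 255) / 255

print("distinct 8-bit output levels, 8-bit pipeline:", np.unique(eight_bit).size)
print("distinct 8-bit output levels, float pipeline:", np.unique(float_pipe).size)
```

The 8-bit working space merges shadow codes before the grade runs, so fewer distinct levels survive to delivery; that's the banding a higher-precision intermediate avoids.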
I was mostly curious why people were converting what I assume are 8-bit files into 10-bit. The responses below about the bandwidth savings and/or quality increase in the final compressed version seem to be what I was missing!