Probably because they're uploading (and playing back) from a webpage, and Web Audio is weird and inconsistent, so sticking to a built-in codec is probably more reliable. As someone who trains on their data, it seems usable anyway: training on 1000 hours of Common Voice improves my model in very clear ways.
Yeah, compatibility with Apple browsers especially was very important for them. I'd added functionality to normalize audio for verification, but they removed it multiple times because it didn't work on Safari for various reasons.
In general I don't think normalization should happen on the backend. It's useful for training data to have multiple loudness levels, so that the network learns to handle them all.
https://caniuse.com/#search=mp3
https://caniuse.com/#search=opus
I got FLAC working for speech.talonvoice.com with an asm.js codec, so in theory they could do the same, but I do get some audio artifacts sometimes.