Only one data point, but I find it much harder to filter out speech from background noise in my second language than I do when people are speaking my first language in a noisy environment. My guess would be that we have multiple ways of processing language that use different bits of our brains.
I also notice that even though I can't read lips, I often find it much easier to understand what people are saying when I can see them. I think part of the reason speech recognition isn't generally at human level yet is that these systems don't receive the same amount or kind of data that we do.
An interesting experiment might be to include a speaker's native tongue when trying to recognize their speech in a different language. I bet speech recognition would work a lot better on, say, native Spanish speakers speaking English if the system knew to ignore the extra "e" sound they tend to add before "st" or "sp".
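Roughly something like this toy post-processing pass, just to show the idea (the rule list and vocabulary are made up; a real system would condition the acoustic/language model itself rather than patch the transcript):

```python
import re

# L1-specific substitution rules: native Spanish speakers often add
# an epenthetic "e" before word-initial s+consonant clusters
# ("stop" -> "estop", "speak" -> "espeak").
L1_RULES = {
    "es": [
        # strip a leading "e" before an s+consonant cluster
        (re.compile(r"^e(s[tpck]\w+)$"), r"\1"),
    ],
}

# Tiny stand-in vocabulary; a real system would use its language model.
VOCAB = {"stop", "speak", "spanish", "student", "school", "street"}

def postprocess(hypothesis: str, speaker_l1: str) -> str:
    """Apply L1-aware corrections to a recognized word sequence."""
    out = []
    for word in hypothesis.lower().split():
        for pattern, repl in L1_RULES.get(speaker_l1, []):
            candidate = pattern.sub(repl, word)
            # only accept the rewrite if it lands on a known word
            if candidate != word and candidate in VOCAB:
                word = candidate
                break
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    print(postprocess("the estudent espeak espanish", "es"))
    # -> "the student speak spanish"
```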
My son was so good at reading lips that we'd play a game where we only mouthed words and he'd tell us what we "said". In school we let the teacher know he needed to see her say the words on a spelling test. In those younger years that was the only way he could tell the difference between some sounds. The big one I remember was "th", and teaching him to look for the tongue on the front teeth.
As he is getting older he relies less on seeing lips. I'm hopeful at some point he'll outgrow most issues.