It kinda seems like a model trained on multiple languages might, to some extent, be better at English than a model trained only on English? So much of English comes from other languages, and understanding language as a concept transcends any specific one. Of course there are limits, and the model still needs strong English vocabulary and comprehension, but I'd expect the extra languages to help rather than hinder English performance.
https://bigscience.huggingface.co/blog/building-a-tb-scale-m...