
Not necessarily: only ~30% of the dataset is in English, so it likely won't be as good as a smaller model trained solely or mostly on English text.

https://bigscience.huggingface.co/blog/building-a-tb-scale-m...
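For anyone curious how a figure like "~30% English" is typically arrived at, the usual approach is a language-identification pass over the documents. Below is a minimal sketch using the langdetect package; the corpus file name and the one-document-per-line format are assumptions for illustration, not details from the linked post.

    # Rough sketch: estimate the per-language share of a corpus with the
    # langdetect package (pip install langdetect).
    # "corpus.txt" (one document per line) is an assumed placeholder.
    from collections import Counter
    from langdetect import detect

    counts = Counter()
    with open("corpus.txt", encoding="utf-8") as f:
        for line in f:
            text = line.strip()
            if not text:
                continue
            try:
                counts[detect(text)] += 1
            except Exception:
                counts["unknown"] += 1  # too short or undetectable

    total = sum(counts.values())
    for lang, n in counts.most_common(10):
        print(f"{lang}: {100 * n / total:.1f}%")

In practice, large corpora are sampled rather than scanned in full, and production pipelines tend to use faster classifiers (e.g. fastText's language ID model), but the idea is the same: classify each document's language and report the proportions.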




It kinda seems like a model trained on multiple languages could, to some extent, be better at English than a model trained only on English. So much of English comes from other languages, and understanding language as a concept transcends any specific language. Of course there are limits, and it still needs a good English vocabulary and understanding, but I'd expect the extra languages to help rather than hinder English performance.



