forgingahead on Nov 16, 2020 | on: Show HN: LibreASR – An On-Premises, Streaming Spee...

Thanks for sharing this project. What do you think of the Mozilla Common Voice data? The random sample I looked at a while back seemed pretty poor: background noise, stammering, delays before the speaker starts talking, etc.

I was hoping to use it as a training base, but the issues I encountered made me wary that the data quality would hurt the results.

iceychris on Nov 16, 2020

Depending on your objective, noisy data might be useful. I'd like LibreASR to also work in noisy environments, so training on noisy data should already help a bit with that. But yeah, stammering and delays are present not only in Common Voice but also in Tatoeba and YouTube.
