Thanks for sharing this project. What do you think of the data in Mozilla Common Voice? The random sample I looked at a while back seemed pretty poor -- background noise, stammering, delays before the speaker starts, etc.
I was hoping to use it as a good training base, but the issues I encountered made me wary that the data quality would adversely affect any outcomes.
Depending on your objective, noisy data might be useful.
I'd like LibreASR to also work in noisy environments, so training on data that is noisy should already help a bit with that.
But yeah - stammering and delays are present not only in Common Voice but also in Tatoeba and YouTube.