Thanks, the oversampling mention gives me a good reference to start.
The model itself has generalized pretty well to handle both single and multi word utterances I think, without a separate classifier, but I'm definitely not going to rule out multi-model recognition in the long run.
My main issues with single words right now are:
- The model sometimes plays favorites with numbers (ace vs eight)
- Collecting enough word-granularity training data for words-that-are-not-numbers (I've done a decent job of this so far, but it's a slow and painful process. I've considered building a frontend to turn sentence datasets into word datasets with careful alignment)
For that last point, forced alignment tools may be useful.
An issue to watch for though is elision: a word in a sentence can often be said differently to the individual words, eg saying "last" and "time" separately one typically includes the final t in last and yet said together, commonly it's more like "las time".
Yeah, I'm familiar with forced alignment. This is slightly nicer than the generic forced alignment, because my model has trained on the alignment of all of my training data already. My character based models already have pretty good guesses for the word alignment.
I think I'd be very cautious about it and use a model with a different architecture than the aligner to validate extracted words, and probably play with training on the data a bit to see if the resulting model makes sense or not. I do have examples of most english words to compare extracted words.
The model itself has generalized pretty well to handle both single and multi word utterances I think, without a separate classifier, but I'm definitely not going to rule out multi-model recognition in the long run.
My main issues with single words right now are:
- The model sometimes plays favorites with numbers (ace vs eight)
- Collecting enough word-granularity training data for words-that-are-not-numbers (I've done a decent job of this so far, but it's a slow and painful process. I've considered building a frontend to turn sentence datasets into word datasets with careful alignment)