Thanks, the oversampling mention gives me a good reference to start. The model i...

nmstoker · on July 1, 2020

For that last point, forced alignment tools may be useful.

An issue to watch for, though, is elision: a word in a sentence is often pronounced differently from the same word said on its own. For example, when saying "last" and "time" separately, one typically pronounces the final t in "last", yet when they're said together it's commonly more like "las time".
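A rough way to act on this is to flag word boundaries where elision is likely, so clips cut there can be reviewed or padded. The sketch below is a hypothetical spelling-based heuristic (final stop consonant followed by a consonant-initial word). It works on orthography rather than phonemes, so it will both miss cases and over-flag; a pronunciation dictionary would be a better basis.

```python
# Hypothetical heuristic: uses spelling, not phonemes, so it is only a rough filter.
STOPS = set("ptkbdg")    # letters that usually spell stop consonants
VOWELS = set("aeiou")

def likely_elision(words: list[str]) -> list[tuple[str, str]]:
    """Return (word, next_word) pairs where a final stop consonant is
    followed by a consonant-initial word, e.g. "last time"."""
    flagged = []
    for a, b in zip(words, words[1:]):
        a, b = a.lower(), b.lower()
        if a and b and a[-1] in STOPS and b[0] not in VOWELS:
            flagged.append((a, b))
    return flagged

print(likely_elision("the last time we met".split()))  # → [('last', 'time')]
```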

lunixbochs · on July 1, 2020

Yeah, I'm familiar with forced alignment. This is slightly nicer than generic forced alignment, because my model has already been trained on the alignment of all my training data. My character-based models already make pretty good guesses at the word alignment.
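For a CTC-style character model, word timings can be read off the per-frame best-path labels by collapsing repeats, dropping blanks and splitting on the space symbol. This is a minimal sketch assuming greedy per-frame labels, a blank symbol `"_"`, a `" "` word separator and 20 ms frames; all of these are assumptions, not details of the model discussed here. CTC outputs tend to be peaky, so the spans mark where characters are emitted rather than precise acoustic boundaries.

```python
from typing import List, Optional, Tuple

BLANK = "_"   # hypothetical CTC blank symbol
SPACE = " "   # hypothetical word separator emitted by the character model

def word_spans(frames: List[str], frame_sec: float = 0.02) -> List[Tuple[str, float, float]]:
    """Collapse greedy (best-path) CTC frame labels into (word, start_s, end_s)."""
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start: Optional[int] = None
    end = 0
    prev: Optional[str] = None

    def flush() -> None:
        # Emit the word collected so far, if any, and reset the buffer.
        nonlocal chars, start
        if chars and start is not None:
            words.append(("".join(chars), start * frame_sec, (end + 1) * frame_sec))
        chars, start = [], None

    for i, lab in enumerate(frames):
        if lab == prev:
            # CTC collapses consecutive repeats; only extend the current span.
            if lab not in (BLANK, SPACE) and start is not None:
                end = i
            continue
        prev = lab
        if lab == BLANK:
            continue
        if lab == SPACE:
            flush()
            continue
        if start is None:
            start = i
        chars.append(lab)
        end = i
    flush()
    return words

frames = ["_", "h", "h", "i", "_", " ", "_", "y", "o", "_", "u", "u", "_"]
print(word_spans(frames))
```

A blank between two identical letters keeps them distinct (`l _ l` gives "ll"), which is the standard CTC collapsing rule.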

I think I'd be very cautious about it: I'd use a model with a different architecture from the aligner to validate the extracted words, and probably experiment a bit with training on the data to see whether the resulting model makes sense. I do have examples of most English words to compare the extracted words against.
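The cross-check could be as simple as transcribing each extracted clip with the second model and keeping it only if the result is close to the expected word. In this sketch `recognize` is a hypothetical callable standing in for the other-architecture model, and the 0.2 character-error threshold is an arbitrary assumption to tune.

```python
from typing import Callable, Iterable, List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

Result = Tuple[str, str, str]  # (clip_id, expected_word, hypothesis)

def validate_clips(
    clips: Iterable[Tuple[str, str]],
    recognize: Callable[[str], str],   # hypothetical second-model transcriber
    max_cer: float = 0.2,              # assumed character error rate threshold
) -> Tuple[List[Result], List[Result]]:
    """Split (clip_id, expected_word) pairs into accepted and rejected lists."""
    accepted: List[Result] = []
    rejected: List[Result] = []
    for clip_id, expected in clips:
        hyp = recognize(clip_id).strip().lower()
        cer = edit_distance(hyp, expected.lower()) / max(len(expected), 1)
        if cer <= max_cer:
            accepted.append((clip_id, expected, hyp))
        else:
            rejected.append((clip_id, expected, hyp))
    return accepted, rejected

fake_model = {"c1": "last", "c2": "dime"}.get  # stand-in for a real recognizer
ok, bad = validate_clips([("c1", "last"), ("c2", "time")], fake_model)
```

Rejected clips are worth inspecting by hand, since they may reveal alignment errors or elision at the cut points.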

nmstoker · on July 1, 2020

Sounds like a great approach