It would be handy as a starting point for ADR (automated dialog replacement). In films you don't always get what you want in production sound: maybe a plane was flying overhead as the sun was setting on the last day that your Famous Actress was available, so you make the most of the visual opportunities and accept the inadequate sound. Then you bring the actors back to a recording studio later and have them re-read their lines. On large films, a lot of the dialog you hear in the final version is recorded this way, as much as 50% in action movies (because you have all this noisy equipment going on around the set, and getting good-quality sound recordings always has a lower priority). On indie films there's more location shooting and smaller post-production budgets, so you aim to minimize ADR requirements, to 0% if at all possible.
Actors hate doing ADR, and it's time-consuming and annoying for editors. This wouldn't automatically solve the problem, because dialog recorded in different acoustic environments won't match well, but it does have the potential to save a lot of grunt work, especially for background dialog where you can compromise on quality a bit.
Also, in post-production you often find yourself wanting to edit just one or two words in a scene, and you'd rather not bring the actor back for such a small problem. So you search other scenes, and other takes of the same scene, for places where the same word or syllable appears, then do a little cut-and-paste and blending: the audio equivalent of Photoshop retouching. It would be very useful for that.
I assume this will be useful to data scientists who want to process lyrics? What other intended or near-at-hand use cases are there?