> By clicking or tapping on a point, you will hear a standardized version of the corresponding recording. The reason for voice standardization is two-fold: first, it anonymizes the speaker in the original recordings in order to protect their privacy. Second, it allows us to hear each accent projected onto a neutral voice, making it easier to hear the accent differences and ignore extraneous differences like gender, recording quality, and background noise. However, there is no free lunch: it does not perfectly preserve the source accent and introduces some audible phonetic artifacts.
> This voice standardization model is an in-house accent-preserving voice conversion model.
I'm kind of curious whether it could keep my own voice but decouple it from my accent, i.e. could it convert a recording of me into a different accent while still sounding like me? If so, I wonder if that would make accent training easier, since you could hear yourself say things in a different accent.
That would be interesting for sure, but considering you don't hear yourself the same way someone else or a mic does, I'm not sure it would have the benefit you're expecting.