A suggestion and some surprise: I’m surprised by your assertion that there’s no clustering. I see the representation shows no clustering, and believe you that there is therefore no broad high-dimensional clustering. I also agree that the demo where Victor’s voice moves closer to Eliza’s sounds more native.
But how can it be that you can show directionality toward “native” without clustering? I would read this as a problem with my embedding, not a feature. Perhaps there are some lower-dimensional sub-axes that do encode what sort of accent someone has?
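To make the sub-axis idea concrete, here is a minimal sketch on purely synthetic data (not BoldVoice's embeddings; every name, dimension, and scale below is an assumption chosen for illustration). It builds vectors where most of the variance is unrelated "speaker" noise and the accent difference lives on a single low-variance axis. The top principal components then show no group separation, while the difference-of-means direction separates the two groups clearly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 2000, 16  # hypothetical sample count and embedding size

# Assumption: most variance is accent-unrelated; the last axis is low-variance.
scales = np.linspace(3.0, 1.0, dim)
scales[-1] = 0.5

native = rng.normal(size=(n, dim)) * scales
learner = rng.normal(size=(n, dim)) * scales
learner[:, -1] += 2.0  # the accent shift lives on one small axis only

X = np.vstack([native, learner])
labels = np.array([0] * n + [1] * n)


def separation(scores, labels):
    """Gap between group means, in units of pooled standard deviation."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled


# PCA via SVD: the top components follow the high-variance axes.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc_sep = [separation(Xc @ vt[i], labels) for i in range(2)]

# Difference-of-means direction: the "toward native" axis in this toy setup.
direction = learner.mean(axis=0) - native.mean(axis=0)
direction /= np.linalg.norm(direction)
dir_sep = separation(Xc @ direction, labels)

print("separation on top 2 PCs:", pc_sep)
print("separation on mean-difference direction:", dir_sep)
```

If real accent embeddings behave anything like this, a 2D projection dominated by high-variance dimensions could look unclustered even though a usable direction exists. Whether that is what is happening here is something only the actual data can answer.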
Suggestion for the BoldVoice team: if you’d like to go viral, I suggest you dig into American idiolects — two that are hard not to talk about / opine on / retweet are AAVE and Gay male speech (not sure if there’s a more formal name for this, it’s what Wikipedia uses).
I’m in a mixed-race family, and we spent a lot of time playing with ChatGPT’s AAVE abilities, which have, I think sadly, been completely nerfed over the releases. Chat seems to have no sense of shame when it says speaking like one of my kids is harmful; I imagine the well-intentioned OpenAI folks were sort of thinking the opposite when they cut it out. It seems to have a list of “okay” and “bad” idiolects baked in; for instance, it will give you a thick Irish accent, a Boston accent, a NY/Bronx accent, but no Asian/SE Asian accents.
I like the idea of an idiolect-manager, something that could help me move my speech more or less toward a given idiolect. Similarly, England is a rich minefield of idiolects, from Scouse to highly posh.
I’m guessing you guys are aimed at the call center market based on your demo, but there could be a lot more applications! Voice coaches in Hollywood (the good ones) charge hundreds of dollars per hour, so there’s a valuable if small market out there for much of this. Thanks for the demo and write-up. Very cool.
(Minor nitpick, but I think "dialect" is a more appropriate word than "idiolect" here—at least according to Wikipedia, "idiolect" refers to a single person's way of speaking, whereas AAVE et al. are shared and are therefore considered dialects.)
OK, good read for me here. Based on your feedback and some research, I think I should have used ‘sociolect’ for both, in that I was less complaining about ChatGPT’s unwillingness to use, say, finna, in a sentence, and more complaining about the vocalized accents. Anyway, good catch, thanks!
Sociolect is the right term for a dialect used by a particular social group. A related idea is "register" when multiple related and mutually understandable standards exist, and are used in different contexts.