People are allowed to recognize the realistic negative outcomes of technology, especially on a forum that frequently discusses the tradeoffs of modern, cutting edge technologies.


So many AI posts are overrun with this kind of complaining from folks with limited imaginations.

On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.


I would argue that being able to see the drawbacks and potential negative externalities of a new technology is not a sign of a "limited imagination", but quite the contrary. An actual display of a limited imagination is the inability to imagine how a new technology can (and will) be abused in society by bad actors.


Developing some insight into its negative potential could demonstrate imagination, but the claim that it could be used to scam people is by now just rote repetition - an obligatory point made in every article and under every post about this tech. It's also not something I think works out in practice the way most imagine: cold-call scam operations that dial numbers at huge scale, expecting most not to pick up, can't realistically obtain a voice clip prior to each automated call.

As for positive applications, some I see:

* Allowing those with speech impairments to communicate using their natural voice again

* Allowing those uncomfortable with their natural voice, such as transgender people, to communicate closer to how they wish to be perceived

* Translation of a user's voice, maintaining emotion and intonation, for natural cross-language communication on calls

* Professional-quality audio from cheap microphone setups (for video tutorials, indie games, etc.)

* Doing character voices for a D&D session, audiobook, etc.

* Customization of voice assistants, such as to use a native accent/dialect

* Movies, podcasts, audiobooks, news broadcasts, etc. made available in a huge range of languages

* If integrated with something like AirPods, Babel fish-like automatic isolation and translation of any speech around you

* Privacy from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only

* New forms of interactive media - customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.

* And of course: memes, satire, and parody

I appreciate HN's general view on technologies like encrypted messaging - not falling into "we need to ban this now because pedophiles could use it" hysteria. But for anything involving machine learning, I'm concerned how often the hacker mentality seems to go out the window and we instead get people advocating for it to be made illegal to host the code, for instance.


Of the 11 positive applications that you listed, only the 1st, 3rd, 11th and arguably the 4th would benefit from voice cloning, which is what's being promoted here. The rest are solved merely by (improved) TTS and do not require the cloning of any actual human voice.

Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine. However, the only use-case which would require cloning a specific human voice belonging to a third party, use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.


> Of the 11 positive applications that you listed, only the 1st, 3rd, 11th and arguably the 4th would benefit from voice cloning, which is what's being promoted here. The rest are solved merely by (improved) TTS and do not require the cloning of any actual human voice.

2, 5, 6, 9: It's true that in theory all you need is some way to capture the characteristics of a desired voice, but voice-cloning methods are currently the way to do this. If you want a voice assistant with a native accent, you fine-tune on the voice of a native speaker - as opposed to turning a bunch of dials manually.

7, 8, 10: Here I think there is a benefit specifically from sounding like a particular person. The dynamically generated lines of movie characters/videogame NPCs should be consistent with the actor's pre-recorded lines, for instance, and hearing someone in their own voice is more natural for communication and makes conversation easier to follow.

Pedantically, what's promoted here is a tool which features voice cloning prominently but not exclusively - other workflows demonstrated (like generating subtitles) seem mostly unobjectionable.

> Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine

I think all, outside of potentially 8 and 11, could be done with full consent of the voice being cloned - an agreement with the movie actor to use their voice for dubbing to other languages, for example. That's already a significant number of use-cases for this tool.

> use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.

IMO prohibition around satire/parody would be the slippery slope, particularly with the potential for selective enforcement.


This is a GitHub repo, not an article on the effects of TTS. Policy discussions at the level of the parent comment feel off topic.



