Lyebird: Clone Your Voice

diegorbaquero · on Sept 9, 2020

"Get Started" takes you to a download, the download gives you an installer, the app requires sign-up, and after signing up you go to Overdub and, oh, it's a paid feature.

Remove installer, delete app.

Ruhrbaron · on Sept 9, 2020

I found it pretty disturbing that an installer was immediately downloaded on clicking the 'Get Started' button. Won't use.

outsidetheparty · on Sept 9, 2020

Thank you, that saved me some time!

echelon · on Sept 9, 2020

Show HN:

I made https://vo.codes as a side project to democratize text to speech. It's got lots of celebrities, politicians, cartoon characters, and tech figures (Paul Graham, Sam Altman, Mark Zuckerberg, et al.)

Kids on YouTube found it and are making incredible music videos with it.

Homer Simpson and SpongeBob cover Money Machine: https://youtu.be/iBpqJF5LXX4

SpongeBob covers WAP (NSFW!) : https://youtu.be/dSgd4PoQofQ

SpongeBob covers 6IX9INE : https://youtu.be/IKs5iWVRE94

I'm planning to do the same thing as these folks after I finish my real time voice conversion system. It's a killer app for Discord and TeamSpeak - you can talk in real time as Gilbert Gottfried, SpongeBob, or Donald Trump.

JayStavis · on Sept 9, 2020

text2speech and speech2speech seem like very different problems, no? So much of someone's voice identity is based on speech patterns, contextual inflection, and rhythm/cadence, so how would real-time work? Just wondering whether you are thinking about real-time as in <10ms or real-time as in 1 second?
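One way to see why the two meanings of "real-time" are so different: a streaming converter has to buffer audio before it can process it, so the chunk size alone puts a floor under latency, before any model compute is added. A rough sketch, assuming an illustrative 22,050 Hz sample rate and a fixed per-chunk inference time (the numbers and function names are hypothetical, not taken from any system in this thread):

```python
def buffer_latency_ms(samples: int, sample_rate_hz: int) -> float:
    """Time spent just collecting `samples` audio samples before processing can start."""
    return samples / sample_rate_hz * 1000.0


def end_to_end_latency_ms(chunk_samples: int, sample_rate_hz: int,
                          inference_ms: float, lookahead_samples: int = 0) -> float:
    """Rough lower bound: buffering (chunk plus any model lookahead) plus compute time.

    Assumes inference_ms is a fixed per-chunk cost; real systems also add
    network, audio-device, and output-buffer delays on top of this.
    """
    return buffer_latency_ms(chunk_samples + lookahead_samples, sample_rate_hz) + inference_ms


if __name__ == "__main__":
    # Hypothetical settings: 22,050 Hz audio, 256- and 1024-sample chunks, 30 ms of compute.
    print(f"{buffer_latency_ms(256, 22050):.1f} ms to buffer 256 samples")
    print(f"{end_to_end_latency_ms(1024, 22050, 30.0):.1f} ms for a 1024-sample chunk plus compute")
```

Under those assumptions a 256-sample chunk already costs about 11.6 ms of buffering before the model even runs, which is why sub-10 ms voice conversion is a much harder target than sub-second.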

Awesome progress by the way, would totally follow along more closely on this project through an eng diary or mailing list.

ValentineC · on Sept 9, 2020

Neat stuff!

What would it take to get Morgan Freeman's voice onto it?

echelon · on Sept 9, 2020

Thanks!

I made one attempt at Morgan Freeman but I didn't have enough sample data. He narrated a few audio books, but they've either got background music that ruins the training or he collaborated with other speakers.

What data I gathered I obtained from interviews on YouTube. Unfortunately they're all so short that it's a lot of work to extract meaningful training data.

I really want his voice. I'll definitely get back to it.

ValentineC · on Sept 11, 2020

If you need training data and don't mind it being from grey-area sources, I can help with supplying the audio tracks from some of his narration or acting work. Email's in my profile.

failrate · on Sept 10, 2020

You really just need to get it to say "Morgan Freeman" over and over again.

one_comment · on Sept 9, 2020

Would it be possible to have shareable links? Like vo.codes/?speaker=Bart&text=I'm+out
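A shareable link like that is just the speaker and text packed into the query string. A minimal sketch using Python's standard urllib.parse, assuming hypothetical `speaker` and `text` parameters on vo.codes (the site did not support them at the time of this thread):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL and parameter names, taken from the suggestion above.
BASE_URL = "https://vo.codes/"


def make_share_link(speaker: str, text: str) -> str:
    """Encode speaker and text into a query string; spaces become '+'."""
    # safe="'" keeps apostrophes readable; '&', '=', etc. are still percent-encoded.
    query = urlencode({"speaker": speaker, "text": text}, safe="'")
    return f"{BASE_URL}?{query}"


def read_share_link(url: str) -> tuple[str, str]:
    """Decode a share link back into (speaker, text)."""
    params = parse_qs(urlsplit(url).query)
    return params["speaker"][0], params["text"][0]


if __name__ == "__main__":
    link = make_share_link("Bart", "I'm out")
    print(link)
    print(read_share_link(link))
```

Percent-encoding characters like `&` and `=` is what keeps arbitrary text from breaking the link when it is decoded on the other end.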

echelon · on Sept 9, 2020

Absolutely!

I want to add an API too.

andygcook · on Sept 9, 2020

I signed up for Descript and did the Overdub training a few weeks ago. It's impressive, albeit not a perfect match for my actual voice. You train the voice by reading 30 minutes of Wizard of Oz using a high-quality microphone. I used a Yeti in a closet at my house. I do plan to record the additional hour of supplemental reading, which might help make it more authentic.

(Side note - I didn't know Dorothy's shoes were actually silver in the book)

My use case for Overdub is to quickly correct errors in recorded demo/how-to screencast videos for my startup without having to redo the audio.

Here's a sample of the Overdub output vs. my actual recorded voice if you're curious: https://web.descript.com/084a416c-57d4-4df8-980b-24e0df82532...

jeremyw · on Sept 9, 2020

Thanks for the sample. There's a flatness to the audio in recreation, but the major difference is lack of inflection. I imagine the latter might be tunable in a future incarnation.

andygcook · on Sept 9, 2020

You can record your voice in different styles, which I think would help with inflection. I haven't had enough time to really experiment with it yet. The starting point is definitely an impressive effort on the Overdub/Descript team's side.

kundan2510 · on Sept 9, 2020

How does it compare with the training data you submitted to create your overdub voice? Sometimes, the voice sounds different because of the recording conditions, room tone, etc. and overdub voice copies these environmental factors as well.

andygcook · on Sept 9, 2020

Here's a sample clip of the training audio: https://web.descript.com/afdff9c8-17d0-41bb-84ec-77709120709...

I recorded on a high-quality microphone in a small closet with a foam mattress pad and blankets surrounding me. It was pretty soundproof. Upon re-listening to the audio, it does sound a little tinny, which likely has to do with the microphone settings.

I think the issue with the monotone is that my reading voice is different from my talking voice. When I'm freely talking, I'm more dramatic. When I'm reading, I'm less so because the only time I read out loud nowadays is to children before bedtime.

I do plan to resubmit the training script with a different style, but recording the full audio was admittedly tiring so I haven't sent that in yet. I'm sure with some experimentation I could get it closer to reality.

I did get some helpful advice from someone on the Descript team on how to use punctuation and misspellings to fix some of the intonation. Overall, it's really impressive technology, and I can think of a few different uses for my own voice in my business.

prophesi · on Sept 9, 2020

Before Lyrebird was bought by Descript, this was a simple API endpoint. You could generate short MP3s of your digital voice reading any text of your choosing, for free.

I used it extensively in one of my projects, only to find everything broken after leaving it alone for a few months.

All of that to say, are there any alternatives out there HN knows of?

This isn't an attack on Descript, either. The Lyrebird purchase made a lot of sense, and they've built an amazing product with the technology.

happycry · on Sept 9, 2020

Hey, founder of Resemble (https://resemble.ai) here. We're building a custom voice cloning service with APIs and controllability. Shoot me a message if you want to build a voice from raw MP3s.

derekja · on Sept 9, 2020

That's great! I'm glad to see any competition in the space. When I first tried Lyrebird a year or 18 months ago the quality wasn't great. It was OK and I could hear a resemblance to the voice it was meant to create, but there were hisses and fuzzy sections and it just wasn't long term listenable. Recently the examples I've heard from Descript have sounded really crisp. Keep on the path you're on! I like your business model better, but from the examples on your website the voices are still not quite as refined. Now what I'd really like is a local means of generating great TTS of my own voice, but that's a different offering.

prophesi · on Sept 9, 2020

Wow, this is perfect! It even has a Unity plugin. I'll certainly give Resemble a whirl.

Dramatize · on Sept 9, 2020

Zohaib and I can always be found on these threads :)

The two main non-Google/Amazon competitors are https://www.resemble.ai/ and https://replicastudios.com/

kundan2510 · on Sept 9, 2020

Hey! Lyrebird founder here! The simple API endpoint is work in progress. Please fill this form to get an early access: https://descript.typeform.com/to/Phn5zATR

prophesi · on Sept 9, 2020

Thanks kundan2510! I sent in a submission. Your API endpoint worked really well for me in the past, so your service will certainly take precedence.

blacksmith_tb · on Sept 9, 2020

Yes, I see that to register and install, you agree to these terms: https://www.descript.com/terms

Not the worst I have ever seen, but it is pretty clear that they will be using everything you do to improve their product, which doesn't seem absurd, given that you get to use it for free. But it might make you think twice about having it say anything you wouldn't want stored forever / passed around the office.

Riccardo_G · on Sept 9, 2020

Hey Prophesi, on Replica (https://replicastudios.com/) you can generate voices from our stock library for free, and of course you can create your own Replica voice too.

If you have more audio data you would want to upload to improve the quality of the voice, you can also get in contact with the team and they will help you out!

You can also access your own voice via the API as well as other stock voices.

failrate · on Sept 9, 2020

OP: "lyrebird" is misspelled in the title.

netsec_burn · on Sept 9, 2020

The live demo is incredible. I've never heard TTS sound so realistic; when did it get this good?

cblconfederate · on Sept 9, 2020

Actually, TTS voices have been great for years (e.g. https://www.nuance.com/omni-channel-customer-engagement/voic...); these new neural ones make them a lot more expressive, though.

summitsummit · on Sept 9, 2020

agreed. it is surprisingly good at profanity and vulgar language.

summarity · on Sept 9, 2020

Download page throws this error, might want to look into that. I guess it benefits the user that it's broken.

> Upscope.io: You have exceeded your Upscope subscription usage limits. We will collect data again once usage falls back within your subscription's limits.

cblconfederate · on Sept 9, 2020

I wonder who will make something like this a downloadable, locally executable program.

yonixw · on Sept 9, 2020

Since you probably need a high-end Nvidia GPU at scale, self-hosted is more likely.

kundan2510 · on Sept 9, 2020

We just launched a bunch of built-in stock voices as well: this is how they sound: https://twitter.com/andrewmason/status/1303384858249494529?s...

All these voices have been created via our voice cloning pipeline. Something to note: the generated audio is 44.1kHz and super crisp, ready to be useful for voice-overs or editorial corrections. Let me know if you have any questions about the tech. (I am one of the lyrebird founders.)

obiefernandez · on Sept 9, 2020

I would love to use this to "record" voiceovers for my weekly radio show. Would fix one of the least enjoyable parts of the process, which is actually recording the voiceovers.

shannifin · on Sept 9, 2020

I have not yet had any serious use cases with this tech, but I've enjoyed playing around with it and I love its possibilities.

(A while back, as an experiment, I recorded samples with a variety of silly character voices and fake accents; when the end result merged them into one, it sounded hilarious.)

Riccardo_G · on Sept 9, 2020

Hi Shannifin, at Replica (https://replicastudios.com/) we have a lot of serious use cases from our customers and increasingly more uses as the tech has been improving and the product becomes easier to use!

Currently we have seen users develop pretty much anything such as: games from indie to AAA, animations and other creative clips, audio books and spoken stories of all types, enterprise coaching videos, ads, and much more!

coupdejarnac · on Sept 9, 2020

I'll use this to dub over videos I'm making.

singhrac · on Sept 9, 2020

Voice acting for ... games, audiobooks, etc?

shannifin · on Sept 10, 2020

I think the tech still needs a bit more polish to replace great voice acting, but it's definitely an exciting possibility! (Although I've definitely heard some voice acting and audiobook narration (LibriVox) that's pretty bad and that this tech already beats!)

keenmaster · on Sept 9, 2020

Cloning your voice to TTS reader software so that your kids or loved ones can read/listen using your voice.

shannifin · on Sept 10, 2020

That would be awesome

russfink · on Sept 9, 2020

Robocalling, with some interactivity. "Hello! This is Camp Ayne running for city council. How are your kids Doris and Eugene?"

inetknght · on Sept 9, 2020

That's going to be completely indistinguishable from phishing. How would you protect yourself from that?

kevin_thibedeau · on Sept 9, 2020

Lenny will tie them up.

shannifin · on Sept 9, 2020

Or use it with GPT-3 to answer phone calls I don't want to take...

blastro · on Sept 9, 2020

Shoutout to Andrew Mason - I'm glad to see Detour/Descript continue on in some way. The guys that were building this back in the day were pretty sharp. Shoutout Ulf, Steve, DJW, Levi

obiefernandez · on Sept 9, 2020

Is Andrew Mason (Groupon CEO) involved in Descript? I see a note from him in the initial composition after installing.

andygcook · on Sept 9, 2020

Yes, Andrew Mason is the CEO. He's full time on Descript now and isn't working on Detour anymore.

Tepix · on Sept 9, 2020

So, does this upload your private personal voice audio data into their cloud? Has anyone read their privacy notice?