[flagged] Lyebird: Clone Your Voice (descript.com)
43 points by fctorial on Sept 9, 2020 | 49 comments



Get started takes you to download, download gives you an installer, app requires sign up, after sign up you go to Overdub and, oh, it's a paid feature.

Remove installer, delete app.


I found it pretty disturbing that an installer was immediately downloaded on clicking the 'Get Started' button. Won't use.


Thank you, that saved me some time!


Show HN:

I made https://vo.codes as a side project to democratize text to speech. It's got lots of celebrities, politicians, cartoon characters, and tech figures (Paul Graham, Sam Altman, Mark Zuckerberg, et al)

Kids on YouTube found it and are making incredible music videos with it.

Homer Simpson and SpongeBob cover Money Machine: https://youtu.be/iBpqJF5LXX4

SpongeBob covers WAP (NSFW!) : https://youtu.be/dSgd4PoQofQ

SpongeBob covers 6IX9INE : https://youtu.be/IKs5iWVRE94

I'm planning to do the same thing as these folks after I finish my real time voice conversion system. It's a killer app for Discord and TeamSpeak - you can talk in real time as Gilbert Gottfried, SpongeBob, or Donald Trump.


text2speech and speech2speech seem like very different problems, no? Especially when so much of someone's voice identity is based on speech patterns, contextual inflection, and rhythm/cadence, how would realtime work? Just wondering if you are thinking about realtime as in <10ms or realtime as in 1 second?

Awesome progress by the way, would totally follow along more closely on this project through an eng diary or mailing list.


Neat stuff!

What would it take to get Morgan Freeman's voice onto it?


Thanks!

I made one attempt at Morgan Freeman but I didn't have enough sample data. He narrated a few audio books, but they've either got background music that ruins the training or he collaborated with other speakers.

What data I gathered I obtained from interviews on YouTube. Unfortunately they're all so short that it's a lot of work to extract meaningful training data.

I really want his voice. I'll definitely get back to it.


If you need training data and don't mind it being from grey-area sources, I can help with supplying the audio tracks from some of his narration or acting work. Email's in my profile.


You really just need to get it to say "Morgan Freeman" over and over again.


Would it be possible to have shareable links? Like vo.codes/?speaker=Bart&text=I'm+out
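
A minimal sketch of how such a link could be built and read back, assuming the site accepted `speaker` and `text` query parameters as in the example above. Both parameter names and the helper functions are hypothetical; nothing in this thread confirms vo.codes supports them.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Site named in the thread; the query parameters below are hypothetical.
BASE_URL = "https://vo.codes/"


def build_share_link(speaker: str, text: str) -> str:
    """Encode the speaker and text into a URL query string."""
    # urlencode percent-escapes unsafe characters and turns spaces into '+'.
    query = urlencode({"speaker": speaker, "text": text})
    return f"{BASE_URL}?{query}"


def parse_share_link(url: str) -> Tuple[str, str]:
    """Recover (speaker, text) from a link produced by build_share_link."""
    params = parse_qs(urlsplit(url).query)
    return params["speaker"][0], params["text"][0]


link = build_share_link("Bart", "I'm out")
print(link)
print(parse_share_link(link))
```

Encoding the text instead of pasting it raw keeps apostrophes, ampersands, and spaces from breaking the link when it is shared.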


Absolutely!

I want to add an API too.


I signed up for Descript and did the Overdub training a few weeks ago. It's impressive, albeit not a perfect match for my actual voice. You train the voice by reading 30 minutes of Wizard of Oz using a high quality microphone. I used a Yeti in a closet at my house. I do plan to record the additional hour of supplemental reading which might help make it more authentic.

(Side note - I didn't know Dorothy's shoes were actual silver in the book)

My use case for Overdub is to quickly correct errors in recorded demo/how-to screencast videos for my startup without having to redo the audio.

Here's a sample of the Overdub output vs. my actual recorded voice if you're curious: https://web.descript.com/084a416c-57d4-4df8-980b-24e0df82532...


Thanks for the sample. There's a flatness to the audio in recreation, but the major difference is lack of inflection. I imagine the latter might be tunable in a future incarnation.


You can record your voice in different styles, which I think would help with inflection. I haven't had enough time to really experiment with it yet. The starting point is definitely an impressive effort on the Overdub/Descript team's side.


How does it compare with the training data you submitted to create your overdub voice? Sometimes, the voice sounds different because of the recording conditions, room tone, etc. and overdub voice copies these environmental factors as well.


Here's a sample clip of the training audio: https://web.descript.com/afdff9c8-17d0-41bb-84ec-77709120709...

I recorded on a high quality microphone in a small closet with a foam mattress pad and blankets surrounding me. It was pretty soundproof. Upon re-listening to the audio, it does sound a little tinny, which likely has to do with the microphone settings.

I think the issue with the monotone is that my reading voice is different from my talking voice. When I'm freely talking, I'm more dramatic. When I'm reading, I'm less so because the only time I read out loud nowadays is to children before bedtime.

I do plan to resubmit the training script with a different style, but recording the full audio was admittedly tiring so I haven't sent that in yet. I'm sure with some experimentation I could get it closer to reality.

I did get some helpful advice from someone on the Descript team on how to use punctuation and misspellings to fix some of the intonation. Overall, it's really impressive technology and I can think of a few different uses for my business for my own voice.


Before Lyrebird was bought by Descript, this was a simple API endpoint. You could generate short MP3s of your digital voice reading any text of your choosing, for free.

I used it extensively in one of my projects, only to find everything broken after leaving it alone for a few months.

All of that to say, are there any alternatives out there HN knows of?

This isn't an attack on Descript, either. The Lyrebird purchase made a lot of sense, and they've built an amazing product with the technology.


Hey, founder of Resemble (https://resemble.ai) here. We're building a custom voice cloning service with APIs and controllability. Shoot me a message if you want to build a voice from raw MP3s.


That's great! I'm glad to see any competition in the space. When I first tried Lyrebird a year or 18 months ago the quality wasn't great. It was OK and I could hear a resemblance to the voice it was meant to create, but there were hisses and fuzzy sections and it just wasn't long term listenable. Recently the examples I've heard from Descript have sounded really crisp. Keep on the path you're on! I like your business model better, but from the examples on your website the voices are still not quite as refined. Now what I'd really like is a local means of generating great TTS of my own voice, but that's a different offering.


Wow, this is perfect! It even has a Unity plugin. I'll certainly give Resemble a whirl.


Zohaib and I can always be found on these threads :)

The two main non Google/Amazon competitors are https://www.resemble.ai/ and https://replicastudios.com/


Hey! Lyrebird founder here! The simple API endpoint is a work in progress. Please fill out this form to get early access: https://descript.typeform.com/to/Phn5zATR


Thanks kundan2510! I sent in a submission. Your API endpoint worked really well for me in the past, so your service will certainly take precedence.


Yes, I see that to register and install you agree to these terms and conditions: https://www.descript.com/terms

Not the worst I have ever seen, but it is pretty clear that they will be using everything you do to improve their product, which doesn't seem absurd, given that you get to use it for free. But it might make you think twice about having it say anything you wouldn't want stored forever / passed around the office.


Hey Prophesi, on Replica (https://replicastudios.com/) you can generate voices from our stock library for free, and of course you can create your own Replica voice too.

If you have more audio data you would want to upload to improve the quality of the voice, you can also get in contact with the team and they will help you out!

You can also access your own voice via the API as well as other stock voices.


OP: "lyrebird" is misspelled in the title.


The live demo is incredible. I've never heard TTS sound so realistic, when did it get this good?


Actually, TTS voices have been great for years (e.g. https://www.nuance.com/omni-channel-customer-engagement/voic...); these new neural things make them a lot more expressive, though.


Agreed. It is surprisingly good at profanity and vulgar language.


Download page throws this error, might want to look into that. I guess it benefits the user that it's broken.

> Upscope.io: You have exceeded your Upscope subscription usage limits. We will collect data again once usage falls back within your subscription's limits.


I wonder who will make something like this a downloadable, locally executable program.


Since you probably need a high-end Nvidia GPU at scale, self-hosted is more likely.


We just launched a bunch of built-in stock voices as well: this is how they sound: https://twitter.com/andrewmason/status/1303384858249494529?s...

All these voices have been created via our voice cloning pipeline. Something to note: the generated audio is 44.1kHz and super crisp, ready to be useful for voice-overs or editorial corrections. Let me know if you have any questions about the tech. (I am one of the Lyrebird founders.)


I would love to use this to "record" voiceovers for my weekly radio show. Would fix one of the least enjoyable parts of the process, which is actually recording the voiceovers.


I have not yet had any serious use cases with this tech, but I've enjoyed playing around with it and I love its possibilities.

(A while back, as an experiment, I recorded samples with a variety of silly character voices and fake accents; when the end result merged them into one, it sounded hilarious.)


Hi Shannifin, at Replica (https://replicastudios.com/) we have a lot of serious use cases from our customers and increasingly more uses as the tech has been improving and the product becomes easier to use!

Currently we have seen users develop pretty much anything such as: games from indie to AAA, animations and other creative clips, audio books and spoken stories of all types, enterprise coaching videos, ads, and much more!


I'll use this to dub over videos I'm making.


Voice acting for ... games, audiobooks, etc?


I think the tech still needs a bit more polish to replace great voice acting, but it's definitely an exciting possibility! (Although I've definitely heard some voice acting and audiobook narration (LibriVox) that's pretty bad and that this tech already beats!)


Cloning your voice to TTS reader software so that your kids or loved ones can read/listen using your voice.


That would be awesome


Robocalling, with some interactivity. "Hello! This is Camp Ayne running for city council. How are your kids Doris and Eugene?"


That's going to be completely indistinguishable from phishing. How would you protect yourself from that?


Lenny will tie them up.


Or use it with GPT-3 to answer phone calls I don't want to take...


Shoutout to Andrew Mason - I'm glad to see Detour/Descript continue on in some way. The guys that were building this back in the day were pretty sharp. Shoutout Ulf, Steve, DJW, Levi


Is Andrew Mason (Groupon CEO) involved in Descript? I see a note from him in the initial composition after installing.


Yes, Andrew Mason is the CEO. He's full time on Descript now and isn't working on Detour anymore.


So, does this upload your private personal voice audio data into their cloud? Has anyone read their privacy notice?



