Get started takes you to download, download gives you an installer, app requires sign up, after sign up you go to Overdub and, oh, it's a paid feature.
I made https://vo.codes as a side project to democratize text to speech. It's got lots of celebrities, politicians, cartoon characters, and tech figures (Paul Graham, Sam Altman, Mark Zuckerberg, et al.).
Kids on YouTube found it and are making incredible music videos with it.
I'm planning to do the same thing as these folks after I finish my real time voice conversion system. It's a killer app for Discord and TeamSpeak - you can talk in real time as Gilbert Gottfried, SpongeBob, or Donald Trump.
text2speech and speech2speech seem like very different problems, no? Especially when so much of someone's voice identity is based on speech patterns, contextual inflection, and rhythm/cadence, how would realtime work? Just wondering if you are thinking about realtime as in <10ms or realtime as in 1 second?
Awesome progress by the way, would totally follow along more closely on this project through an eng diary or mailing list.
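To put rough numbers on the <10ms vs. 1 second question: the latency budget caps how much audio a streaming converter can buffer before it has to emit output. A minimal sketch of that arithmetic, assuming a 44.1kHz sample rate (the function name and the rate are illustrative assumptions, not anything from the vo.codes system):

```python
# Buffer-size arithmetic for a streaming voice converter (illustrative only).
SAMPLE_RATE_HZ = 44_100  # assumed sample rate; substitute your own


def samples_per_buffer(latency_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples that fit in a buffer of the given duration."""
    return round(sample_rate_hz * latency_ms / 1000)


for budget_ms in (10, 1000):
    print(f"{budget_ms} ms -> {samples_per_buffer(budget_ms)} samples")
```

At 10ms the buffer holds only a few hundred samples, which is far shorter than a syllable, so cadence and inflection would have to come from context the system has already heard. At 1 second there is room for a word or two of context per chunk.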
I made one attempt at Morgan Freeman but I didn't have enough sample data. He narrated a few audio books, but they've either got background music that ruins the training or he collaborated with other speakers.
What data I gathered I obtained from interviews on YouTube. Unfortunately they're all so short that it's a lot of work to extract meaningful training data.
I really want his voice. I'll definitely get back to it.
If you need training data and don't mind it being from grey-area sources, I can help with supplying the audio tracks from some of his narration or acting work. Email's in my profile.
I signed up for Descript and did the Overdub training a few weeks ago. It's impressive, albeit not a perfect match for my actual voice. You train the voice by reading 30 minutes of Wizard of Oz using a high quality microphone. I used a Yeti in a closet at my house. I do plan to record the additional hour of supplemental reading which might help make it more authentic.
(Side note - I didn't know Dorothy's shoes were actual silver in the book)
My use case for Overdub is to quickly correct errors in recorded demo/how-to screencast videos for my startup without having to redo the audio.
Thanks for the sample. There's a flatness to the audio in recreation, but the major difference is lack of inflection. I imagine the latter might be tunable in a future incarnation.
You can record your voice in different styles, which I think would help with inflection. I haven't had enough time to really experiment with it yet. The starting point is definitely an impressive effort on the Overdub/Descript team's side.
How does it compare with the training data you submitted to create your Overdub voice? Sometimes the voice sounds different because of the recording conditions, room tone, etc., and the Overdub voice copies these environmental factors as well.
I recorded on a high quality microphone in a small closet with a foam mattress pad and blankets surrounding me. It was pretty soundproof. Upon re-listening to the audio, it does sound a little tinny, which likely has to do with the microphone settings.
I think the issue with the monotone is that my reading voice is different from my talking voice. When I'm freely talking, I'm more dramatic. When I'm reading, I'm less so because the only time I read out loud nowadays is to children before bedtime.
I do plan to resubmit the training script with a different style, but recording the full audio was admittedly tiring so I haven't sent that in yet. I'm sure with some experimentation I could get it closer to reality.
I did get some helpful advice from someone on the Descript team on how to use punctuation and misspellings to fix some of the intonation. Overall, it's really impressive technology and I can think of a few different uses for my business for my own voice.
Before Lyrebird was bought by Descript, this was a simple API endpoint. You could generate short MP3s of your digital voice reading any text of your choosing, for free.
I used it extensively in one of my projects, only to find everything broken after leaving it alone for a few months.
All of that to say, are there any alternatives out there HN knows of?
This isn't an attack on Descript, either. The Lyrebird purchase made a lot of sense, and they've built an amazing product with the technology.
Hey, founder of Resemble (https://resemble.ai) here. We're building a custom voice cloning service with APIs and controllability. Shoot me a message if you want to build a voice from raw MP3s.
That's great! I'm glad to see any competition in the space. When I first tried Lyrebird a year or 18 months ago the quality wasn't great. It was OK and I could hear a resemblance to the voice it was meant to create, but there were hisses and fuzzy sections and it just wasn't long term listenable. Recently the examples I've heard from Descript have sounded really crisp. Keep on the path you're on! I like your business model better, but from the examples on your website the voices are still not quite as refined. Now what I'd really like is a local means of generating great TTS of my own voice, but that's a different offering.
Not the worst I have ever seen, but it is pretty clear that they will be using everything you do to improve their product, which doesn't seem absurd, given that you get to use it for free. But it might make you think twice about having it say anything you wouldn't want stored forever / passed around the office.
Hey Prophesi, on Replica (https://replicastudios.com/) you can generate voices from our stock library for free, and of course you can create your own Replica voice too.
If you have more audio data you would want to upload to improve the quality of the voice, you can also get in contact with the team and they will help you out!
You can also access your own voice via the API as well as other stock voices.
Download page throws this error, might want to look into that. I guess it benefits the user that it's broken.
> Upscope.io: You have exceeded your Upscope subscription usage limits. We will collect data again once usage falls back within your subscription's limits.
All these voices have been created via our voice cloning pipeline. Something to note: the generated audio is 44.1kHz and super crisp, ready to be useful for voice-overs or editorial corrections. Let me know if you have any questions about the tech. (I am one of the Lyrebird founders.)
I would love to use this to "record" voiceovers for my weekly radio show. Would fix one of the least enjoyable parts of the process, which is actually recording the voiceovers.
I have not yet had any serious use cases with this tech, but I've enjoyed playing around with it and I love its possibilities.
(A while back, as an experiment, I recorded samples with a variety of silly character voices and fake accents; when the end result merged them into one, it sounded hilarious.)
Hi Shannifin, at Replica (https://replicastudios.com/) we have a lot of serious use cases from our customers and increasingly more uses as the tech has been improving and the product becomes easier to use!
Currently we have seen users develop pretty much anything such as: games from indie to AAA, animations and other creative clips, audio books and spoken stories of all types, enterprise coaching videos, ads, and much more!
I think the tech still needs a bit more polish to replace great voice acting, but it's definitely an exciting possibility! (Although I've definitely heard some pretty bad voice acting and audiobook narration, on LibriVox for example, that this tech already beats!)
Shoutout to Andrew Mason - I'm glad to see Detour/Descript continue on in some way. The guys that were building this back in the day were pretty sharp. Shoutout Ulf, Steve, DJW, Levi
Remove installer, delete app.