If someone can come up with a voice cloning product that I can run on my own co...

dsign · on March 29, 2024

I couldn't agree more.

I've tried some of these ".ai" websites that do voice-cloning, and they tend to use the following dark strategy:

- Demand you create a cloud account before trying.

- Sometimes, demand you put your credit card before trying.

- Always: the product is crap. Sometimes it does voice-cloning sort of as advertised, but you have to wait in a queue for both the training and the execution, because it's a cloud product and cloud GPUs are expensive. At least that part could be avoided if they shipped a VST plugin one could run locally, even if it's restricted to NVidia GPUs[^2].

[^1]: To those who say "but the devs must get paid": yes. But subscriptions misalign incentives, and some updates are simply not worth the minutes of productivity lost while waiting for their shoehorned installation.

[^2]: Musicians and creative types are used to spending a lot on hardware and software, and there are inference GPUs which are cheaper than some sample libraries.

andrewstuart · on March 29, 2024

I don’t mind if the software is a subscription; it just has to be installable and not spyware garbage.

Professional consumer level software like a game or productivity app or something.

andoando · on March 29, 2024

I made a voice cloning site. https://voiceshift.ai No login, nothing required. It's a bit limited but I can add any of the RVC models. Working on a feature to just upload your own model.

I can definitely make it a local app.

riwsky · on March 29, 2024

How do you figure subscriptions misalign incentives? The alternative, of selling upgrades, incentivizes devs to focus on new shiny shit that teases well. I'd rather they focus on making something I get value out of consistently.

dsign · on March 29, 2024

- A one-off payment makes life infinitely simpler for accounting purposes. In my jurisdiction, a software license owned by the business is an asset and shows as that in the balance sheet, and can be subject to a depreciation schedule just as any other asset.

- Mental peace: if product X does what I need right now and I can count on being able to use product X five years from now to do the same thing, then I'm happy to pay a lump sum that I see as an investment. Even better, I feel confident that I can integrate product X in my workflows. I don't get that with a subscription product in the hands of a startup seeking product-market fit.

jeroenhd · on March 29, 2024

RVC does live voice changing with a little latency: https://github.com/RVC-Project/Retrieval-based-Voice-Convers...

The product isn't exactly spectacular, but most of the work seems to have been done. Just needs someone to go over the UI and make it less unstable, really.

rifur13 · on March 29, 2024

Wow, perfect timing. I'm working on a sub-realtime TTS (only on Apple M-series silicon). Quality should be on par with or better than XTTS2. Definitely shoot me a message if you're interested.

smusamashah · on March 29, 2024

But this one is supposed to be runnable locally. It has complete instructions on GitHub, including downloading the models locally, installing Python, setting it up, and running it.

andrewstuart · on March 29, 2024

I'm wanting to download an installer and run it - consumer level software.

endisneigh · on March 29, 2024

I see these types of comments all the time, but fact is folks at large who wouldn’t use the cloud version won’t pay. The kind of person who has a 4090 to run these sort of models would just figure out how to do it themselves.

The other issue is that paying for the software once doesn’t capture as much of the value as a pay per use model, thus if you wanted to sell the software you’d either have to say you can only use it for personal use, or make it incredibly expensive to account for the fact that a competitor would just use it.

Suppose there were such a thing - then folks may complain that it’s not open source. Then it’s open sourced, but then there’s no need to pay.

In any case, if you’re willing to pay $1000 I’m sure many of us can whip something up for you. Single executable.

andoando · on March 30, 2024

I have a 2070 and it works just fine, as long as you're not doing real time conversion. You can try it on https://voiceshift.ai if you're curious.

washadjeffmad · on March 29, 2024

I mean this at large, but I just can't get over this "sell me a product" mentality.

You already don't need to pay; all of this is happening publication to implementation, open and local. Hop on Discord and ask a friendly neon-haired teen to set up TorToiSe or xTTS with cloning for you.

Software developers and startups didn't create AGI, a whole lot of scientists did. A majority of the services you're seeing are just repackaging and serving foundational work using tools already available to everyone.

TuringTest · on March 29, 2024

I agree, but playing devil's advocate, it's true that people without the time and expertise to set up their own install can find this packaging valuable enough to pay for it.

It would be better for all if, in Open Source fashion, this software had a FLOSS easy-to-install packaging that provided for basic use cases, and developers made money by adapting it to more specific use cases and toolchains.

(This one is not FLOSS in the classic sense, of course. The above would be valid for MIT-licensed or GPL models).

_bkyr · on March 29, 2024

The answer is convenience. Why use dropbox when you can run Nextcloud? You can say the same thing about large companies. Why does Apple use Slack (or whatever they use) when they could build their own? Why doesn't Stripe build their own data centers?

If I had a need for an AI voice for a project I would pay the $9 a month, use it, and be done. I might have the skills to set this up on my machine but it would take me hours to get up to speed and get it going. It just wouldn't be worth it.

nprateem · on March 29, 2024

You can extend that reasoning to anything, but time and energy are limited.

ipsum2 · on March 29, 2024

How much would you pay? I can make it.

andrewstuart · on March 29, 2024

You can’t sell this cause the license doesn’t allow it.

pmontra · on March 29, 2024

"This repository is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which prohibits commercial usage"

People could pay somebody for the service of setting up the model on their own hardware, then use the model for non commercial usage.

GTP · on March 29, 2024

IANAL, but this looks like a grey area to me: it could be argued that the person/company getting paid to do the setup is using the model commercially.

GTP · on March 29, 2024

Doesn't allow it yet, but on the readme, they write "This will be changed to a license that allows Free Commercial usage in the near future". So someone will soon be able to sell it to you.

ipsum2 · on March 29, 2024

Not using this model, but something similar. How much would you pay?

ipsum2 · on March 29, 2024

Based on the lack of replies, the answer appears to be $0.

ddtaylor · on March 29, 2024

Bark is MIT licensed for commercial use.

palmfacehn · on March 29, 2024

XTTS2 works well locally. Maybe someone else here can recommend a front end.

ddtaylor · on March 29, 2024

I can show you how to use Bark AI to do voice cloning.

rexreed · on March 29, 2024

What local hardware is needed to run Bark AI? What is the quality? Looking for something as good as or better than Eleven Labs.

ddtaylor · on March 29, 2024

It can run on CPU without much issue, takes up a few gigs of RAM, and will produce audio in about real time. If you GPU accelerate, you only need about 8GB of video memory and it will be at least 5X faster.

Out of the box it's not as good as Eleven Labs based on their demos, but those are likely cherry picked. There are some tunable parameters for the Bark model and most consider the output high enough quality to pass into something else that can do denoising.
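
To put the rough speed figures above in concrete terms, here is a back-of-the-envelope sketch. It assumes "about real time" on CPU means exactly 1x and treats the 5X GPU figure as a lower bound; the function name and constants are made up for illustration and are not part of Bark.

```python
# Back-of-the-envelope estimate; the speed factors are the rough figures
# quoted in this comment, not measurements.
CPU_REALTIME_FACTOR = 1.0  # assumption: "about real time" on CPU taken as exactly 1x
GPU_SPEEDUP = 5.0          # assumption: "at least 5X faster" taken as the lower bound


def estimated_generation_seconds(audio_seconds: float, use_gpu: bool) -> float:
    """Approximate wall-clock seconds needed to generate `audio_seconds` of speech."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    cpu_time = audio_seconds * CPU_REALTIME_FACTOR
    # On GPU, divide the CPU estimate by the assumed speedup.
    return cpu_time / GPU_SPEEDUP if use_gpu else cpu_time


if __name__ == "__main__":
    clip = 60.0  # one minute of speech
    print(f"CPU: ~{estimated_generation_seconds(clip, use_gpu=False):.0f} s")
    print(f"GPU: at most ~{estimated_generation_seconds(clip, use_gpu=True):.0f} s")
```

So a minute of audio would take roughly a minute on CPU and no more than about 12 seconds on a GPU, if those figures hold on your hardware.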

mdrzn · on March 29, 2024

Please do!