If someone can come up with a voice cloning product that I can run on my own computer, not the cloud, and if it’s super simple to install and use, then I’ll pay.
I find it hard to understand why so much money is going into ai and so many startups are building ai stuff and such a product does not exist.
It’s got to run locally because I’m not interested in the restrictions that cloud voice cloning services impose.
Complete, consumer level local voice cloning = payment.
I've tried some of these ".ai" websites that do voice-cloning, and they tend to use the following dark strategy:
- Demand you create a cloud account before trying.
- Sometimes, demand you enter your credit card before trying.
- Always: the product is crap. Sometimes it does voice-cloning sort of as advertised, but you have to wait in a queue for both training and execution, because cloud GPUs are expensive and a cloud product has to ration them. At least that part could be avoided if they shipped a VST plugin one could run locally, even if it's restricted to NVIDIA GPUs[^2].
[^1]: To those who say "but the devs must get paid": yes. But subscriptions misalign incentives, and some updates are simply not worth the minutes of productivity lost while waiting for their shoehorned installation.
[^2]: Musicians and creative types are used to spending a lot on hardware and software, and there are inference GPUs which are cheaper than some sample libraries.
I made a voice cloning site. https://voiceshift.ai
No login, nothing required. It's a bit limited but I can add any of the RVC models. Working on a feature to just upload your own model.
How do you figure subscriptions misalign incentives? The alternative, selling upgrades, incentivizes devs to focus on new shiny shit that teases well. I'd rather they focus on making something I get value out of consistently.
- A one-off payment makes life infinitely simpler for accounting purposes. In my jurisdiction, a software license owned by the business is an asset, shows as such on the balance sheet, and can be subject to a depreciation schedule just like any other asset.
- Mental peace: if product X does what I need right now and I can count on being able to use product X five years from now to do the same thing, then I'm happy to pay a lump sum that I see as an investment. Even better, I feel confident that I can integrate product X into my workflows. I don't get that with a subscription product in the hands of a startup seeking product-market fit.
The product isn't exactly spectacular, but most of the work seems to have been done. Just needs someone to go over the UI and make it less unstable, really.
Wow, perfect timing. I'm working on a sub-realtime TTS (only on Apple M-series silicon). Quality should be on par with or better than XTTS2. Definitely shoot me a message if you're interested.
But this one is supposed to be runnable locally. It has complete instructions on GitHub, including downloading the models locally, installing Python, setting it up, and running it.
I see these types of comments all the time, but the fact is that folks at large who wouldn’t use the cloud version won’t pay. The kind of person who has a 4090 to run these sorts of models would just figure out how to do it themselves.
The other issue is that paying for the software once doesn’t capture as much of the value as a pay per use model, thus if you wanted to sell the software you’d either have to say you can only use it for personal use, or make it incredibly expensive to account for the fact that a competitor would just use it.
Suppose there were such a thing - then folks may complain that it’s not open source. Then it’s open sourced, but then there’s no need to pay.
In any case, if you’re willing to pay $1000 I’m sure many of us can whip something up for you. Single executable.
I mean this at large, but I just can't get over this "sell me a product" mentality.
You already don't need to pay; all of this is happening, from publication to implementation, in the open and locally. Hop on Discord and ask a friendly neon-haired teen to set up TorToiSe or xTTS with cloning for you.
Software developers and startups didn't create AGI, a whole lot of scientists did. A majority of the services you're seeing are just repackaging and serving foundational work using tools already available to everyone.
I agree, but playing devil's advocate, it's true that people without the time and expertise to set up their own install can find this packaging valuable enough to pay for it.
It would be better for all if, in Open Source fashion, this software had a FLOSS easy-to-install packaging that provided for basic use cases, and developers made money by adapting it to more specific use cases and toolchains.
(This one is not FLOSS in the classic sense, of course. The above would be valid for MIT-licensed or GPL models).
The answer is convenience. Why use Dropbox when you can run Nextcloud? You can say the same thing about large companies. Why does Apple use Slack (or whatever they use) when they could build their own? Why doesn't Stripe build their own data centers?
If I had a need for an AI voice for a project I would pay the $9 a month, use it, and be done. I might have the skills to set this up on my machine but it would take me hours to get up to speed and get it going. It just wouldn't be worth it.
Doesn't allow it yet, but on the readme, they write "This will be changed to a license that allows Free Commercial usage in the near future". So someone will soon be able to sell it to you.
It can run on CPU without much issue, takes up a few gigs of RAM, and will produce audio at about realtime. With GPU acceleration you only need about 8GB of video memory and it will be at least 5X faster.
Out of the box it's not as good as Eleven Labs based on their demos, but those are likely cherry-picked. There are some tunable parameters for the Bark model, and most consider the output high enough quality to pass into something else that can do denoising.