Hello. I made this. I'm a bit shocked at how many people are getting things wrong and taking things out of context, so perhaps I could answer some questions.
To clarify: I have not been with MIT in years. I was paid the minimum hourly rate (roughly $14 an hour) to work on a related project during my undergraduate years, which eventually evolved into this project years down the road. (In fact, I had to pay for my own compute to get my work started - MIT never offered me any credits.) Everything else has been paid out of pocket since then. Yes, it does indeed cost several thousands of dollars a month - that is not an exaggeration. This has been optimized so many times that the technology needs to improve first before I can cut down on costs.
The timer in the "TOS" was put there in hopes that people could understand where I was coming from in regards to misappropriating this kind of research. I did not expect people to get this riled up over a 10 second timer. (Especially not on Hacker News, of all places...)
Edit: I suppose it makes sense to include other information in this post so that others don't have to go hunting for my comments in this thread:
- >This is unrelated but what's with the fascination with HN users and My Little Pony? I've noticed this on a lot of posts in the past few months.
- Twilight Sparkle's voice is indispensable in getting emotional contextualizers to work properly. The logo and profile picture is an homage to that fact.
- >But seriously: how did you get the domain `15.ai`?
- I purchased it. It was definitely not cheap.
- >They named their tts "Deep Throat"? Why would you?
- It was a suggestion from a Twitter user, and I found it clever.
- >I heavily doubt it's "several thousands of dollars"...
- It is indeed several thousands of dollars a month. I can show you AWS invoices, if you're skeptical. Just send me an email and I'd be happy to show proof.
- >The disclaimer is a little ironic considering the site owner doesn’t own the model (MIT does) and doesn’t own the training data (the various shows and games do)
- I'm sorry to tell you that I do, in fact, own my own model. I have not been with MIT in years.
I have not been with MIT in years. I had a successful exit not too long after graduation, and I've been spending most of my earnings on this project.
As an undergrad, I was completely broke. I figured that keeping the project free to use was the best thing I could possibly do with my research as I continued to work on it.
>Can't you continue/do your research without a public website?
Yes, but the website has multiple purposes. It serves as a proof of concept of a platform that allows anyone to create content, even if they can't hire someone to voice their projects.
It also demonstrates the progress of my research in a far more engaging manner - by being able to use the actual model, you can discover things about it that even I wasn't aware of (such as getting characters to make gasping noises or moans by placing commas in between certain phonemes).
It also doesn't let me get away with picking and choosing the best results and showing off only the ones that work (which I believe is a big problem endemic in ML today - it's disingenuous and misleading). Being able to interact with the model with no filter allows the user to judge exactly how good the current work is at face value.
Despite what others here think, I personally find this admirable. I played with your models a while ago with some colleagues of mine, and we were all shocked at how good it was, and that it was free.
I'm no stranger to passion projects, I have a lot of respect for people like you. This is great stuff.
Thank you for the kind words. I know that HN is a tough crowd to please (myself included), so I hope that my next update will prove to be worth the work.
The Rise model in particular is amazingly good quality. I pranked a friend with a few generated lines of her a few minutes ago, and he chastised me for wasting my money on hiring a voice actor just to troll him.
Do you cache results from this (especially the random samples provided)? It seems to be regenerating those, which might be expensive if lots of people are using the same prompts (a rough sketch of the kind of caching I mean is below).
Edit: also wanted to thank you for the Chell voice, it sounds completely true to life to me! (minus some jumping noises)
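To illustrate the kind of caching meant above: a minimal sketch that keys the cache on the exact inference inputs, so identical prompts never hit the model twice. The `synthesize` function is a placeholder, since 15.ai's actual backend isn't public.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("tts_cache")
CACHE_DIR.mkdir(exist_ok=True)

def synthesize(character: str, text: str, emotion: str) -> bytes:
    # Placeholder for the expensive model call; real code would run TTS
    # inference here and return encoded audio bytes.
    return b"\x00" * 1024

def cache_key(character: str, text: str, emotion: str) -> str:
    """Derive a stable filename from the exact inference inputs."""
    payload = f"{character}|{emotion}|{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def get_or_synthesize(character: str, text: str, emotion: str = "neutral") -> bytes:
    """Return cached audio if this exact prompt was seen before, otherwise run inference once."""
    path = CACHE_DIR / f"{cache_key(character, text, emotion)}.wav"
    if path.exists():
        return path.read_bytes()  # cache hit: no GPU time spent
    audio = synthesize(character, text, emotion)
    path.write_bytes(audio)
    return audio

print(len(get_or_synthesize("GLaDOS", "The cake is a lie.")))
```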
Yes, I do. For the past three years, I have done nothing but work on this project nonstop. I've been working on massive improvements (that some have pointed out in this thread) that I've been stuck on for the past several months, but I'm getting close to finishing that up.
I don't feel comfortable publishing or releasing anything until I know for a fact that I can make no further improvements. It's not out of corporate greed or anything like that - I'm just really paranoid about getting out the best work possible.
Respectfully, the perfect is the enemy of the good, and it’s entirely reasonable to publish what you have now. If later you make further improvements, you can simply publish again.
You're completely correct, but I'm afraid this is more of a personal problem. I know I'll never be able to forgive myself if I figured out a solution to one of the more obvious problems with the model after I've already published it. I'd just be far more comfortable being happy with my own work before I release it to the wild. I know that this is selfish, and I apologize.
This is really cool. It's a text-to-speech system, and the gist seems to be that they synthesize voices from only a little audio.
The results are clearly synthetic and need work. However what's cool is that there are a ton of characters (from popular shows and video games) and there are useful statistics like inferred emotion (which is also in the output).
Honestly, it's a big problem how a lot of AIs are like "black boxes" where you really can't customize or see anything. Yeah, we have DALL-E and GPT, which can generate images and text, but the lack of customization or fine-tuning of the output afterwards severely hinders what's possible with them. Ultimately what you want is something interactive, where you can control how much or how little the AI generates, and give it really specific criteria.
But seriously: how did you get the domain `15.ai`?
I agree this is an amazing demonstration of what AI can do, but I think that the current method of "learn and repeat" that depends on having tons of computing resources available is still too inefficient in many ways. Personally I'm more interested in what parameterisable formant-based synths can do, since they are extremely efficient and can produce a theoretically infinite variation of voices, although the output quality is still not great. Example: https://news.ycombinator.com/item?id=31604299
.ai domains cost a couple hundred bucks a year, so they're widely available and not heavily used by domain squatters. (It's the country-code domain for the island of Anguilla, population ~15,000.)
In the case of text generation, we call this "Constrained Text Generation" and it is an active field of research. Without going into too many details (I have a paper out for review about this), it's pretty trivial to get "interactive control over how much or how little the AI generates" through a combination of filters on the LM's vocabulary and effective selection of the various hyperparameters in the decoder (top_p, top_k, temperature)...
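For instance, here is a minimal sketch of those knobs using the Hugging Face transformers library and GPT-2 (my choice of stack for illustration, not necessarily what the paper above uses): a crude vocabulary filter via `bad_words_ids`, plus top-k/top-p/temperature sampling and a hard length cap.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Constrain the vocabulary: forbid a few tokens outright.
banned = [tokenizer(w, add_special_tokens=False).input_ids for w in [" violence", " Violence"]]

prompt = "The pony cleared her throat and said:"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    do_sample=True,        # sample instead of greedy decoding
    top_k=50,              # keep only the 50 most likely next tokens
    top_p=0.9,             # ...and only as many as cover 90% of the probability mass
    temperature=0.8,       # <1 sharpens the distribution, >1 flattens it
    max_new_tokens=40,     # hard cap on how much the model may generate
    bad_words_ids=banned,  # crude vocabulary filter
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```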
OpenAI itself is a black box. Until I can reproduce their models or download them myself, and have unfettered access to them, it's just gatekept magic behind an API. So much for democratizing machine learning.
Unless this is a recent change, their mission isn't that:
> OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.
Fine tuning of GPT3 models is available via their public API. Costs credits, and you need to get their permission to use it in an actual application, but it’s not locked in a lab.
It’s not a matter of ‘figuring out’. The model supports fine tuning. It’s a core feature of the openai API. Running ‘fine tuned’ versions of GPT-3 that are created by customers is literally their SaaS model. They have examples in the documentation. Here: https://help.openai.com/en/articles/5528730-fine-tuning-a-cl...
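For reference, the flow with the 0.x-era openai Python package looked roughly like this (file names and model identifiers below are placeholders, and the fine-tuned model name format is only illustrative):

```python
import openai

openai.api_key = "sk-..."  # your API key

# 1. Upload a JSONL file of {"prompt": ..., "completion": ...} training pairs.
training_file = openai.File.create(
    file=open("my_training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Kick off a fine-tune against one of the base GPT-3 models.
job = openai.FineTune.create(training_file=training_file.id, model="curie")

# 3. Once the job finishes, the resulting model can be queried by name --
#    but only through the hosted API; the weights themselves never leave OpenAI.
completion = openai.Completion.create(
    model="curie:ft-your-org-2022-06-01",  # placeholder fine-tuned model name
    prompt="Write a cheerful greeting:",
    max_tokens=40,
)
print(completion.choices[0].text)
```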
I get that. But you don’t get the weights. Which means they limit what you can and can’t generate.
A friend of mine was going to build a writing assistant on top of GPT-3. She got a lot of encouragement from them. Then one day the social media storm hit OpenAI, and suddenly safety became a non-negotiable feature of their api. And along with that came the restriction “absolutely no products that can generate unlimited amounts of text, even if you can pay for the credits.”
Poof, no more business.
Imagine buying a keyboard that restricted what you could type.
All of these problems go away when you have access to the weights.
The whole history behind the project is fascinating: 4chan had a huge role in its development, and the project's work was stolen by an NFT company that a famous voice actor endorsed not too long ago.
Ah, I was wondering why they were so concerned about attribution.
The truth is that, today, if I were going to use a tool to generate voices (say, for YouTube), I wouldn't necessarily pick a small SaaS tool. I'd use Amazon Polly or some other cloud-platform voice creation tool. There are already a few products in the space, and their costs are so low as to be almost negligible (example: Polly, 5 million characters free; a minimal Polly call is sketched below). For a commercial project, I could probably stay on a free tier for a whole year.
With DALL-E, it seems like the only option, and it's such a superior option that a website could abuse it for commercial profit. But voice synthesis is already dirt cheap and commercially available without limitations.
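For scale, a Polly call with boto3 is only a few lines; the voice, region, and output path here are just examples, and you'd need AWS credentials configured:

```python
import boto3

polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="This is a quick test of a stock Polly voice.",
    OutputFormat="mp3",
    VoiceId="Joanna",   # one of Polly's stock voices
    Engine="neural",    # the higher-quality neural engine, where available
)

# The audio comes back as a streaming body; write it straight to disk.
with open("polly_sample.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```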
Unrelated, but years ago iOS introduced a new ML-based voice for Siri. It was quickly abused by people who found out how easy it was to make her say weird things [1]. Just by repeating the same character over and over, she would make weird sighs and other noises.
Seems this AI has the same 'problem'. lol
The copyright laws around this are fascinating. They're adamant it must be non-commercial, they must be credited, and it can't be mixed with any other generated content. Meanwhile their content is exclusively derived from popular commercial products. Oh and they also make money via Patreon donations.
I dunno. Feels a little gross to me. Eventually there is going to be a big copyright case about a model trained with copyrighted material. I have no idea how that will be resolved. Or maybe there will simply be new laws passed to make it either explicitly ok or explicitly not ok.
“Make money”? The creator loses several thousands of dollars a month hosting the site, and it’s done for free. The Patreon donations are all voluntary and only offer a pittance to the developer.
I highly suggest reading into the project first. The Wiki article I linked before (https://en.wikipedia.org/wiki/15.ai) answers all of your questions about copyright infringement.
Feel free to replace "make money" with "collect revenue". This is currently a research project (with funding). However, its long-term goal is to achieve commercial-quality voice acting and dubbing. It could be given away for free, sold directly, sold downstream, sold indirectly, or otherwise generate commercial value.
In terms of copyright infringement, your wiki link answers nothing. A court ruled that Google could use copyrighted book text to train an algorithm to improve search results because the copying was highly transformative and did not serve as a market substitute to the original work.
Meanwhile 15.ai is using copyrighted voice recordings to train an algorithm to synthesize new voice recordings that sound like they came from the original speaker. This is radically different from the Google case. Just because one instance of using copyrighted material to train an algorithm qualifies as fair use does not mean that all use of copyright material to train any algorithm also qualifies as fair use.
There is absolutely nothing about this that is settled law. In the next 20 years there are going to be lots of lawsuits, lots of settlements, possibly a few rulings, and maybe even a few new laws. I find the whole topic very interesting. YMMV.
Like you say, the law is not settled on this, but I assume if the author got a takedown request they would probably comply.
In many instances a policy of "ask for forgiveness rather than permission" can get you further, faster. While Nickelodeon are unlikely to grant you a license to the Spongebob voice because that has broader licensing and IP repercussions, they are likely to tolerate a research project using their characters (e.g. just as they have to-date tolerated The SpongeBob SquarePants Movie Rehydrated, which was a fan re-creation of one of their actual movies).
It is indeed several thousands of dollars a month. I can show you AWS invoices, if you're skeptical. Just send me an email and I'd be happy to show proof.
I would imagine it will end with a similar outcome to video game likenesses - a person owns their likeness, and you can't create products that include their likeness without their consent.
What would that mean for parodies, though? The death of satire? Can likenesses have fair use, or perhaps only for positive representations of the person?
Oh god, 50 shades of SpongePants. The future is wild in ways I never imagined. Star Trek style holodecks in what, 15 years?
So, creepy thought: should we be recording audio of our parents, so we can still "hear from them" once in a while after they die? People are going to want to reconstruct their lost loved ones with AI. This project seems to imply you only need an hour or so of audio.
After my dad died, we found that he had recorded every phone call he had with us. I thought about doing this combined with text generation to create plausible prompts but never got the guts to go through with it. He wouldn't care if I had done it, but it wouldn't ease the guilt from years of sighs and rolling my eyes when he called at always the wrong times.
There is a black mirror episode about this which then extends into a whole robotic replacement for a lost loved one. As with all black mirror episodes, it’s pretty dystopian.
I came back to the HN comments to ask the same thing, when I saw that.
I think this is kinda edgy humor that can work in small, select groups.
But maybe in most contexts in which people will be looking at AI method tech demos (e.g., within a company, or researchers at a university), we're still feeling the ongoing effects of multi-generational injustices. In such a context, no one wants to be invoking associations between women and some infamous '70s porn film. Doing that, and making light of it, seems like it'd rightly bother a lot of people.
When you're focused on a project, and maybe first discussing/showing in one small group, it can be easy to forget there are many additional things going on outside that group, some of which we also want to consider. I've made that mistake multiple times, including with the wrong humor for a context, still cringe when I think of instances of it, and this looks like that to me.
Maybe the developer will see this discussion, and decide to change some things, ASAP. It might still be relatively easy to change. (Maybe doable before Monday business hours, in the time zone of the university mentioned prominently.)
Imagine being so uptight that you see a harmless joke, get mad about it, make many assumptions about a person that you've never met, and not only tell them to change it but give them a deadline to do so.
I am just gonna take this opportunity to say: THANK YOU. For your work on the site and the joy it brings. And thank you for not censoring input. Your site is simply the best out there for making characters voice copypastas and such.
I think that depends on whether you subscribe to Chell not having a voice, or having a voice and simply not using it. The game is silent on whether Chell can't talk, or simply has nothing to say to the AI around her.
Interesting how there seems to be little correlation between source sample size and quality, e.g. the Portal sentry turret with 1.5 minutes of input vs. the 100+ minutes of the narrator from The Stanley Parable, which sounded like auto-tune had a stroke.
The AI seems to work best on high-pitched, female voices. The model seems to have improved in this regard since I last tried this website, but it still seems significantly biased towards female voices.
Much of it depends on refinement work on each specific model. Try the Daria voices, for example, which are easy to get results with that sound like they came straight out of the show.
I think it's because the underlying(?) TTS can't really portray how the narrator speaks, which is very exaggerated and varies a lot in tempo. The key idea of the app should be that we can easily transform the "voice AND emotional tone" of the underlying speech.
I have no involvement in either of these companies, but I'll mention that this seems like a beta version of Uberduck. Personally, I think Uberduck is awesome and probably worth a look.
The disclaimer is a little ironic considering the site owner doesn’t own the model (MIT does) and doesn’t own the training data (the various shows and games do)
MIT doesn’t own the model, where did you get that idea from? If you read through the website, it says that the developer alone owns everything related to the project, and the only funding he received from MIT was a small amount from the beginning.
It’s really strange reading these ignorant comments from HN…
The use of emojis to determine sentiment is incredibly clever.
> The DeepThroat model is able to generate voices of varying degrees of emotion despite never having been exposed to emotive data of the character during training. Furthermore, multiple characters can be trained simultaneously, significantly reducing the amount of time required compared to if one were to train the character models individually.
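My guess at how that might be wired up, purely as an illustration and not the author's actual architecture: treat the output distribution of an emoji/sentiment classifier as an emotion embedding and feed it into the text encoder alongside the text. A toy PyTorch sketch with made-up dimensions and module names:

```python
import torch
import torch.nn as nn

N_EMOJI = 64    # size of the emoji vocabulary the sentiment model predicts over
TEXT_DIM = 256  # dimensionality of the text encoder states
EMO_DIM = 32    # dimensionality of the learned emotion embedding

class EmojiConditioner(nn.Module):
    """Map an emoji probability distribution to a dense emotion embedding."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(N_EMOJI, EMO_DIM), nn.Tanh())

    def forward(self, emoji_probs: torch.Tensor) -> torch.Tensor:
        return self.proj(emoji_probs)  # (batch, EMO_DIM)

class ConditionedTextEncoder(nn.Module):
    """Toy text encoder whose every timestep sees the emotion embedding."""
    def __init__(self, vocab_size: int = 100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, TEXT_DIM)
        self.rnn = nn.GRU(TEXT_DIM + EMO_DIM, TEXT_DIM, batch_first=True)

    def forward(self, tokens: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)                                # (batch, T, TEXT_DIM)
        emo = emotion.unsqueeze(1).expand(-1, x.size(1), -1)  # broadcast over time
        out, _ = self.rnn(torch.cat([x, emo], dim=-1))
        return out  # a downstream decoder/vocoder would consume these states

# Fake probability distribution standing in for an upstream emoji classifier's output.
emoji_probs = torch.softmax(torch.randn(1, N_EMOJI), dim=-1)
emotion = EmojiConditioner()(emoji_probs)
states = ConditionedTextEncoder()(torch.randint(0, 100, (1, 12)), emotion)
print(states.shape)  # torch.Size([1, 12, 256])
```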
To be honest, I thought more people on HN would appreciate the little in-jokes that I added to the website (such as the Interjection copypasta and the multiple Rickrolls hidden throughout), but it's nice to see that some people do still share my lame attempt at humor :)
That's a lot of SpongeBob and My Little Pony characters. At this point, is it fair to say the attachment to kids' cartoons is a cultural (or pathological) phenomenon for under 30s?
I appreciate the intent, and I understand that many people will do the wrong thing so this was probably an attempt to get such folks to actually read and adhere to the TOS, but the obnoxious consent dialog with a mandatory countdown turned me off. It’s probably not effective, either.
On desktop, maybe I’d open dev tools and remove it. On mobile, I won’t be bothered. I hate that this is what the web has become and I choose to simply miss out on websites that behave this way.
Weird, I read through the text because I care about how I’m allowed to use the things people are giving me – and by the time I got to the Accept button, it was enabled.
Same here. And I figured if a hobby project had such a disclaimer it would be important and so I was interested to read what the rules were. But how spoiled and precious are we all today when we can’t even read a few paragraphs and accept some terms to be able to try something cool for free!
It kind of sucks: not accurate or convincing. If they spent less time making the disclaimer and more on the product…
Reminds me of the 90s when everyone had a secret weapon IP in the making, until open source showed by example how futile and silly that approach was. You want people to use your work, because then they need you.
Well, let's not confuse good intentions and hard circumstances with a bad product and poor execution. My criticism is of what I experienced as a user, and it really doesn't matter what the Hacker News community or I think; what the OP cares about is the user experience of a larger group. If you burn the user at the door, it sets the tone of the experience. And further, again, the product wasn't impressive and the IP isn't worth defending. I mean, where is this even going? If you train on actors' voices, they will come at you with an army of lawyers. SpongeBob SquarePants imitations aren't going to pay the rent. There's no game plan here.
I'm afraid it's a necessity after all the times that my work has been appropriated by companies and TikTokers/YouTubers. Yes, I am fully aware that most people will not read it. But at least I tried.
(I mean, what I was saying is by getting rid of the box in the DOM, you avoid the issue of the TOS altogether since you "walk around it" instead of agreeing to it.)
Aren't we all appropriating the work of Newton, Maxwell, Einstein, and others? It's not like Maxwell's equations are copyrighted.
You're entitled to your opinions, but as a PhD myself I'd rather my research get used by people than end up in a copyright junkyard of things people can't use.
I'd argue that there is a rather massive difference between invoking Maxwell's equations to invent GPS and literally plagiarizing my work by using it to broker partnerships with celebrities and subsequently selling my work as NFTs. (Yes, this really happened - I'm not making this up.)
Couldn't agree more. The web has become user-hostile, and I will boycott sites and their services and products if they disrespect me by wasting my time trying to trick me into agreeing to a list of small-print demands.
Oh yep. I'll be reading an article and a damn popup appears midway through a sentence. I usually just quit the website at that point. I hope their bounce detectors pick up on it.
> All code and models used for this website were written and trained as part of my research at the Massachusetts Institute of Technology (MIT). The code and models are privately owned and are not to be sold or distributed for unauthorized use.
Does anybody else find the irony in this statement absolutely amazing lol.
The author took someone else's IP as training data, trained a model on someone else's compute, and then gets extremely bent out of shape when others use the model without crediting them?
This entire thread is honestly so disturbing, this comment especially. Not only is it rife with misinformation (using copyrighted material for training is totally legal and the whole project is paid out of pocket), but is it really that big of a deal to want credit for the work they’ve done? The developer has had their work stolen by companies, influencers, and grifters, and people here are getting pissy that they can’t wait 10 seconds to wait for a popup.
I don’t know why, but I honestly expected more from HN.
You're right about the compute part being wrong. I never said it wasn't legal, just that they took someone else's work to train it. I would hope that voice synthesis is illegal without permission from the voice's owner, but I imagine it is untested so far.
But it's not just about the popup - it's more that when your work is fundamentally about reusing someone else's characters, it feels pretty hypocritical to be so focused on making sure you get credit.
If they are used in a tool that lets you generate someone's likeness as part of user-specified new content, yes. But unlike 15.ai that isn't their core purpose and no such tool exists.
The problem is that after having to wait for 10 seconds to reject their terms of service (which you should be able to reject right away) before even being able to see what the site is about, they are rickrolling you, effectively giving you the finger for not wanting to agree to their terms without context. That's quite unprofessional, counterproductive and antagonistic.
I share this sentiment entirely. There seems to be a growing trend on HN that negativity is popular. A project like this, to me at least, would seem to be right up HN's street.
Shame to see the toxicity over a passion project, whose creator generously went out of his way to answer the questions and ridiculous comments.
Making things up out of thin air like “the creator used someone else’s compute” goes beyond negativity because someone thinks the project is in the grey. That is just straight up disinformation.
"...MIT owns inventions made or created by MIT faculty, students, staff, and others participating in sponsored research projects or in MIT programs using significant MIT funds or facilities or those inventions developed pursuant to a written agreement with MIT..."
I got RickRolled as soon as arriving to the page. :-)
So is this a blanket approval for anyone with AI synthesis of voices, to sample hours of any _copyrighted content_ and come up with a TTS that is copyrighted to the new owner?
In other words, if I deep fake someone's photo on someone else's body, I own the rights of that 'model'?
This is from the MIT license, which is the school he's doing research for (emphasis mine):
> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so
His TOS is literally the antithesis to the free nature of the MIT license :)
This is my fault. I do see how that part is worded ambiguously (I'll fix it later), but I have not been with MIT in years. Copy pasting what I've written in another comment:
To clarify: I have not been with MIT in years. I was paid the minimum hourly rate (roughly $14 an hour) to work on a related project during my undergraduate years, which eventually evolved into this project years down the road. (In fact, I had to pay for my own compute to get my work started - MIT never offered me any credits.)
And to address the philosophy behind the MIT license (also copy pasted from another comment):
For the past three years, I have done nothing but work on this project nonstop. I've been working on massive improvements (that some have pointed out in this thread) that I've been stuck on for the past several months, but I'm getting close to finishing that up.
I don't feel comfortable publishing or releasing anything until I know for a fact that I can make no further improvements. It's not out of corporate greed or anything like that - I'm just really paranoid about getting out the best work possible.
Gotcha, I have no problem with wanting to keep your personal work closed source. I was just under the impression that this had been created as research funded by MIT. If that's not the case, then sorry for the confusion :)
https://en.wikipedia.org/wiki/My_Little_Pony:_Friendship_Is_... explains in detail - between 2010 and ~2015 there was a massive overlap between millennial geek culture and unironic fandom of the rebooted My Little Pony show, especially among millennial men. One dedicated fan hub averaged almost 400k page views per day over its first 3.5 years of existence. And throughout it all, programming projects abounded, such as the delightful FiM++ esoteric language (https://esolangs.org/wiki/FiM%2B%2B) styled after the show's framing device. For many in tech now, it was an inescapable part of internet culture of the early 2010s, and a fond memory for many.
I mean, 15.ai started as a 4chan project for /mlp/ users to generate voice lines from official voice actors now that Friendship is Magic is over (google Pony Preservation Project). Honestly, the more impressive part is that a bunch of nobodies on an imageboard leapfrogged the rest of the industry and made a now-famous voice transformer model.
In the greater sense, though? Ponies have always been this weird relic of internet absurdity and bear-baiting. Some people rep it ironically, other people are dead-serious, but the community has significant overlap with the STEM field. As a result, a lot of pony-related stuff would end up propagating into the tech world, much like this very project.
Aside from the casual brony references, this project originally featured a lot of My Little Pony voices because it needed meticulously annotated transcriptions of the input audio to be trained well.
The extremely dedicated brony subculture voluntarily put in a lot of work to get a corpus for the AI to learn from.
There's also another factor at play: this AI works best with high-pitched voices, which My Little Pony is just full of. Not only did MLP provide such a generous source of training data, its results were also much more impressive than the dry dictation many other corpora would've produced, adding to its fame.
I personally haven't seen any significant rise in MLP references, though that could be because I don't know the show so I don't catch references to it. It's also very possible that you've caught the Baader-Meinhof phenomenon.
My ML professor at the university I went to was also weirdly obsessed with MLP.
Weeaboo/furry data scientists are always ahead of the industry - I seem to recall an effective decensoring model that was called "DeepCreamPy" and had almost 10K github stars before it was nuked and rehosted.
I'm convinced that learning statistics is a zero-sum game with social skills.
It's basically the same as unironic appreciation of various child-targeted-but-adult-friendly 'slice of life' anime, just more incongruous-seeming because of the 'pony' thing.
A lot of people in or around tech are furries, are into things like japanese animation, or are into My Little Pony. I don't consider myself one, but people often jokingly say that furries run the Internet.
And it's not really specific to HN. For instance you have well-known people in the community who do vaccine R&D, or cryptography, or contribute to the C/C++ standards at ISO, or several other STEM things that are pretty outspoken about their interests.
This is made more obvious on Twitter, where people tend to blur their personal and work identities a lot.
Twilight Sparkle's voice is indispensable in getting emotional contextualizers to work properly. The logo and profile picture is an homage to that fact.