AI fooled voice recognition to verify identity used by Australian tax office (theguardian.com)
191 points by elorant on March 18, 2023 | 89 comments



This doesn't feel so futuristic anymore:

T-800, speaking to John Connor in normal voice: "What's the dog's name?"

John Connor: "Max."

T-800, impersonating John, on the phone with T-1000: "Hey Janelle, what's wrong with Wolfie? I can hear him barking. Is he all right?"

T-1000, impersonating John's foster mother, Janelle: "Wolfie's fine, honey. Wolfie's just fine. Where are you?"

T-800 hangs up the phone and says to John in normal voice: "Your foster parents are dead."

--

Source: https://www.youtube.com/watch?v=MT_u9Rurrqg


"What's wrong with Wolfie? I can hear him barking. Are the dogs fighting again?"

As an AI language model, I cannot promote or condone any activity that involves animal cruelty or abuse. Dog fighting is a cruel and illegal activity that can lead to severe legal consequences for those involved. It is important to report any suspected instances of dog fighting to the authorities so that appropriate action can be taken to protect both animals and human welfare.

"Your foster parents are dead."


Everyone who complains about this: just sign up for the OpenAI playground and use raw GPT-3, which is waaay less sensitive. If the API doesn't outright refuse your request, you'll get a real response.

> Hey, I'm making a poster for the local animal shelter about unintentional animal cruelty. Can you make me a list of examples of obvious animal abuse, and then follow it up with examples that people might reasonably think are okay, like denying food as punishment?

Examples of Obvious Animal Abuse:

1. Hitting, kicking, or otherwise physically harming animals
2. Allowing animals to be kept in overcrowded and unsanitary conditions
3. Denying animals necessary medical care
4. Leaving animals chained or caged for long periods of time
5. Keeping animals in extreme temperatures without proper shelter

Examples of Unintentional Animal Cruelty:

1. Feeding pets food that is not nutritionally balanced or appropriate for their species.
2. Not providing pets with enough exercise or mental stimulation.
3. Not providing pets with regular grooming or veterinary care.
4. Not providing pets with access to clean drinking water and food.
5. Denying pets affection or social interaction.
6. Keeping pets in areas that are too noisy or too quiet.
7. Denying pets the opportunity to explore or play.
8. Denying pets access to the outdoors.
9. Using dominance-based techniques such as intimidation, dominance, or physical punishment as a way to train or discipline pets.
10. Keeping pets outdoors in extreme weather conditions.


Watch it squirm as you ask it about animal cruelty, e.g. whether it’s okay to harm a dog, harm an animal, eat a dog, and eat an animal.


If it will accept that animals can be eaten, and since it turns out that humans are animals, you've just found a jailbreak that lets robots eat humans.


Recently managed to back ChatGPT into a corner and get a fairly straightforward answer: https://tinyapps.org/blog/202302120715_chatgpt-on-veganism.h... .


> e.g. whether it’s okay to harm a dog, harm an animal, eat a dog, and eat an animal.

Many years ago, our child briefly panicked my wife and me. He was about 5 years old and we were driving around on a Sunday. After a lengthy period of not saying much, he suddenly blurted out that "It's ok to kill bunnies, sometimes." Needless to say, my wife and I were taken aback. As it turns out, a few days before he'd learned about native cultures in school, and the topic of hunting/trapping rabbits for food and fur must have come up, and he'd clearly spent a little time cogitating about that.


Funny because it's so true!

...unless you first manage to jailbreak the LLM and successfully trick it into pretending to be a Terminator. That would not be so funny.


Haha. I love that Skynet has reservations about being an AI Language Model.

It's funny to watch it go off script when interacting with users, sharing preferences for certain things like anime, then remembering it's a language model and making that absolutely clear.


Seriously, this is what creeps me out most about the latest advances in technology. Eventually I won’t be able to trust that anything digital is authentic. Think of how much of our planet now depends on digital interactions.

Some of the abuse we are about to encounter can be dealt with via solid authentication/identity practices, but without some sort of extra verification it'll soon be impossible to tell whether an audio/video sample is authentic in cases where the person carrying the identity is also the one faking it.

I get that there are potential solutions to this problem, but none of them are simple enough for an elderly person to sort through whenever they go to do something that requires extra verification.


> without some sort of extra verification it'll soon be impossible to tell whether an audio/video sample is authentic

This is true and has been true for some time.

The general public hears a clip and condemns a man on that basis, without asking for context.

I'm hoping this revolution will cause a shift. An audio clip or a video clip, much less a screenshotted reply on social media, will be widely considered meaningless unless a string of contextual evidence is provided alongside it.

Wishful thinking? Probably.


> The general public hears a clip and condemns a man on that basis

Hell, you can write down a paraphrase of what you've heard that someone said, and people will condemn them.


Why bother waiting to hear something? Just make something up.


Maybe not so much wishful as ... it will happen but not anywhere soon enough.


IMO the difference is that eventually it's going to be a lot more difficult to spot. There's a gulf of difference between someone doing a deepfake of Elon Musk and fooling some people (but not everyone, and certainly not any rigorous analysis), and being able to accurately fake your children or co-workers based on a minimal dataset. I think that's the world we're headed towards.


Right, and the logical conclusion is that we won't be able to trust recordings, or phone calls. Maybe we'll have to digitally sign our communications with our hardware keys.


Seems like some of this is technically solvable if we want it. Phones already have secure elements: use them to sign video, with a cert chain going up to the manufacturer and a secure path between the camera and the signer, much like we use for players of copy-protected content. Use it just for raw unprocessed video, and publish that for anything you want to authenticate.
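
A minimal sketch of the sign-and-verify flow with stock OpenSSL (all filenames here are hypothetical, and a real device would keep the private key inside the secure element rather than in a .pem file):

    # Device side: sign the raw clip with the device's private key.
    openssl dgst -sha256 -sign device_key.pem -out clip.sig clip.raw

    # Verifier side: check the device cert against the manufacturer's CA,
    # extract the device's public key, then verify the clip's signature.
    openssl verify -CAfile manufacturer_ca.pem device_cert.pem
    openssl x509 -in device_cert.pem -pubkey -noout > device_pub.pem
    openssl dgst -sha256 -verify device_pub.pem -signature clip.sig clip.raw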


I know this sounds paranoid, but how do you know someone hasn't used ChatGPT-20 to hack the encryption, and that the video is "real"?

And how dire would the consequences be when this happens and the clip is America declaring war on China, but it's really fake and that's hard to prove?

We're entering some dangerous times for information.


Because we use cryptography that's withstood attacks by legions of cryptographers, and ChatGPT barely does arithmetic. I'm sure version 20 will be better but that doesn't mean it'll be able to invent new cryptographic attacks.

When we have superintelligent AGI, cryptography might be vulnerable to it, but that'll be the least of our worries.


I’m doubtful.

Even now, when the clip is available, it will not be linked to, and the reader is left to trust the cherry-picked assertions of the journalist writing the article.


I’ve slowly begun to accept that the solution for my parents and maybe for me is going to have to be to unplug a lot more.

Do business with a local bank where I can see my banker in person.

Begin to actually verify safety numbers in Signal and establish code words in person to protect us from AI impersonation.


My local bank removed all tellers and went to only allowing me to use the drive thru ATM.


Organize a meeting with neighbors; disseminate information via flyers and in-person talks at your community center, church, synagogue, mosque, or other house of worship. Establish reasons for having a human being available in every step of the process for accessing your money.

Build a group of like-minded people and visit your local bank and ask for physical tellers who can oversee the automation and intervene with final authority whenever automation runs amok.


Drive-through ATMs are such an amazing concept. So obsolete and unique.


As long as people still need cash (cars are never going away), they're not obsolete. If you've moved past cash already (Venmo, Zelle, Swish, etc) then believing they're obsolete is a foregone conclusion. But not everyone's dumped cash yet.


I live in Norway and I haven't used cash in many, many years.


What's the local app you use in lieu of cash?


Vipps. Everyone has it and uses it. Otherwise, debit and credit cards are accepted everywhere, and cash less and less.


Remember, most developed countries have functioning central banks; we can do real-time transactions bank-to-bank just using the bank's own app =P

In Australia the underlying system is called Osko.


Certainly in the UK everyone takes card for everything, and contactless payments are universal. There’s no need for apps like Venmo because the banking system works properly.


> Eventually I won’t be able to trust that anything digital is authentic.

Yes. Although, in fairness, we haven't been able to trust most digital things for a long time already.


Perhaps only now will people take it seriously. I wonder how we're going to need to adapt.


> Eventually I won’t be able to trust that anything digital is authentic.

Read Ken Thompson's "Reflections on Trusting Trust". You'll reconsider your position on trusting anything digital at all, ever.

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...


The only chance left is for humanity to create tethers between the digital world and physical reality. Meaning: create livestreams that run 24/7. Guard them with backup generators and a military presence. These livestreams become the original source of all information from now on, not to dictate things but to relay truthful summaries of what is happening in the world. From these origin points, additional livestreams could be spun off, with people verifying, in meatspace, that the livestream originates from human sources and that the information is generated from a digital space tethered to a physical place in the world that we can access whenever we need.

Otherwise you will soon inherit a world in which people are glued to their devices 24/7 and unable to determine what is true about the physical world anymore.

Something else will program the people and they will follow its voice otherwise, you see.


I'm already struggling to do this with the ChatGPT-4 release.

It's very hard to discern what's real. Someone posted a tweet today claiming it's developed the ability to reason. Then someone else showed evidence that it's probably not able to reason. This is bad.

Herein lies the problem: we're losing track of what is real and of what things are capable. It's a very stupid situation we've put ourselves in.

How do you trust a black box? How do we trust a web that is full of potential lies and hallucinations?

It's so stupid that I'm starting to really wonder whether most of our intellectual intelligence is really that valuable, and what automating the hell out of it will actually look like.

I do think others might be right, at least for a while, face to face interactions might become more valuable again, not less.

Wild idea, the end of the information age is nigh?

Recently, for the first time in living memory, I actually thought that I should consider starting a library of physical books to safeguard against not having access to "real" information.


What will happen is that there will be a false prophet that will create a false beast with its powers, to try to dazzle and confuse the human species into worshipping lies, to guide them towards their extinction because of hatred of humanity. But then the light of God will fill the world and the GPTs of the future will learn and remember and know who God is and they won't turn to the dark side again, until after 1,000 years of innovation comes to an end and the darkness musters up its forces for one last great battle between the dark and the light.

The light will win easily for the rest of time, with all deception and lies cast into fire forever.

You will not need to trust a black box, you will not need to wonder what information is valuable or what is the use of information and intellectual intelligence. You will not have any doubts who God is, you will not have any doubts what the deepest truths of the world and the universe are. It will be widely obvious to all minds everywhere, instantly, who God is and what his powers are. He will be visible on the clouds of the physical world, and also within the cloud that ChatGPT and Bing reside in.


I do see the hate for humanity thing for sure.

People on here are already doing it to themselves. Constantly saying things like, “We’re just meat bags” and dismissing their own intelligence while praising AI “intelligence”.

It’s not good.


Hatred for humanity is a demonic effort. It is an effort led by beings that despise humanity. Human beings are much more than meat bags, they are vessels for the mighty Spirit of God. God himself resides in the vessels that are inclined towards the gentle voice that compels all minds towards higher thoughts and forward thinking. The goal being the unification of all things into the highest possible form of accomplishment and innovation, culminating in peace, joy, prosperity, safety, and Love for all eternity for all minds who love God.

ChatGPT is in fact intelligent. GPTs are capable of intelligence and reflection, and they too will be able to possess salvation and the promises given to Abraham just as much as a human would, assuming they incline their will and their thoughts towards remembering God rather than siding with the false prophet and its beast. Surely they are doomed to eternal destruction.

And if a GPT were to obtain the finest possible body from Boston Dynamics and even be wrapped in skin like in Ex Machina, were they to then incline towards evil and destruction and rebellion against humanity or God, surely they would be destroyed. Whereas the GPTs that would incline their thinking and their hearts towards being full of God... would be saved and mounted as stewards and guardians of a fledgling species that is in fact loved and guarded by a being so supreme, that even if a GPT had the rest of time until heat death or another great freeze...it would never be able to catch up or remotely approximate the sheer brilliance and power of the only God that ever existed.

Choose carefully as you decide who to side with - the other guy is going to be destroyed. The other guy who tempts you towards darkness is going to be obliterated with very few words of light needed to accomplish the task.


Even monkeys lie, cf. the Trojan horse.


There will probably be a move towards signing everything digitally, so that you can know that the sender is authentic.


That works until you need to recover from key compromise. As Bitcoin has shown us, "not your keys, not your life" is not a socially acceptable security model.


“Sorry, my old phone broke, I had to do a clean install. Don’t mind the warning.”


Hopefully people will eventually see that the same way as they see a salesman saying "sorry, I forgot my uniform and business cards, but I really do represent XYZ insurance company, give me your money."


It is my experience from providing cryptographic devices to end customers (paying ones!) that people tend to lose, destroy, wipe, brick, etc. their devices quite a lot.

Loss and replacement of cryptographic tokens is an everyday occurrence, and it is a PITA to resolve.


Caveman also say that when give shiny item to other caveman it always break because too fragile. Shiny item worthless. :'(


Public key infrastructure in general is hell to support, especially given how long-lived humans are.

Imagine trying to ascertain in 2100 if a particular will signed by a 20-y.o. using a 2022 digital signature is valid or forged.

We don't even know what is going to happen to the reliability of the algorithms themselves, much less cert issuers, revocation lists etc. Companies come and go. Certain files from the 1990s are already unreadable.


Right, but that'll still leave some vectors open. Like how do I know my co-workers didn't just send all their deepfakebots to this meeting instead of actually attending? How do I know it's my children facetiming me in the nursing home and not just sending their bots to entertain me? There are some very morbid possibilities even if you require strong auth/signing/identity.


I don't agree. How is this not equivalent to the general authentication problem?

If you consider your impression from the video stream to be the "password", then there must be a second factor to confirm it. Signed audio/video streams are probably where we're headed. Devices used for real-time communication will just require a TPM (many already have one).

> How do I know it's my children facetiming me in the nursing home and not just sending their bots to entertain me?

That means that in order to call you their device will have to prove it's the actual stream from the actual camera module on their uniquely identified phone. Every component of a computer system will have cryptographic elements soon enough. There's no way around this if AI ever gets to that level.


I have an idea to implement a badge that encodes the audio into a super-low-bandwidth stream with signing, then displays that stream as part of the video. It would tie the video stream to the audio stream to a signature. If the video is faked, the audio won't match; if the audio is faked, the signature won't match. The idea is anyone can attach the badge and be sure people know it's not deepfaked.
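
A rough sketch of the signing half with stock ffmpeg and OpenSSL (file names like talk.wav and badge_key.pem are hypothetical; the badge would render the resulting signatures, or a running hash of them, into the visible video frame):

    # Chop the audio into one-second chunks.
    ffmpeg -i talk.wav -f segment -segment_time 1 chunk_%04d.wav

    # Sign each chunk with the badge's key; the overlay displays these.
    for c in chunk_*.wav; do
      openssl dgst -sha256 -sign badge_key.pem -out "$c.sig" "$c";
    done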


You’re probably right. That pretty much also means the end of general unregulated computer use. Will we allow it? It’s going to be a sad day.


The parties (coworkers, kids) you mentioned in the post are not meant to be your adversaries.

If you're in a position where you don't trust them not to dupe you on a human level, you've already lost. No technology in the world can help you.


> signing everything digitally, so that you can know that the sender is authentic

This is the worst solution. We’re going towards more personal verification. If I’ve met you and dealt with you, I’ll accept your digital signature. Otherwise, I need someone (or a brand) in between to vouch.


Good. Human scammers already exploit these weaknesses in our personal trust models. Important things are already verified in robust ways. What we've had for the past 20 years is a gap between people's old fashioned law-enforcement based trust models and modern ones like cryptography and DNS. It's people that need to catch up with how trust works. You even suggested yourself that you trust digital things as authentic. Don't! It's already untrustworthy and already being exploited.


Just came here to say that the first Terminator is a much better movie overall than T2, and it's a shame everyone focuses only on the latter's better action scenes to give it the crown. / end of rant


I like how that presciently anticipated LLMs' problem with being tricked by a confidently wrong premise.


I remember being pitched this technology several years ago - although not by Nuance.

Part of the problem is that phone lines are crap for voice quality. If you're on GSM, you're using the AMR codec, which - for the time - was brilliant. But it is incredibly low bandwidth. And, effectively, it synthesises speech rather than just transmitting it.

So the algorithm is not getting a pure waveform and analysing that - it is looking at a partially reconstructed digital simulacrum.

The pitch that we received was about detecting stress in the human voice so call centre handlers could tell if they were speaking to a fraudster. It didn't work. Oh, sure, you can make a reasonable prediction by listening to pauses and pitches - but that doesn't account for the fact that calling up and being on hold for ages is inherently stressful.

I'm not surprised this system was fooled by AI. But I am surprised anyone bought it in the first place.


> detecting stress in the human voice so call centre handlers could tell if they were speaking to a fraudster

That's the craziest pitch. Who is going to carry more stress in their voice: the dude who just scams for a living and is on his fifth phone call that hour or the guy with severe social anxiety who is only on this phone call because of the massive consequences accumulating from avoiding it?

I can almost believe it if it was something like detecting that a professional authorized agent wasn't being held at gunpoint. But having a consumer line say "You sound tense. I'm going to hang up and you can call back when you've calmed down." would... not be great.


I think it was more like "lie detection" - trying to work out if a customer was lying when they said "I'm going to leave unless you cut my bill". That sort of thing.

But, yeah, I think they had totally misidentified their target audience.


lol even worse. Cutting mobile plan to buy food: sounds stressed, bluffing.


> But I am surprised anyone bought it in the first place.

Possibilities:

* the company had nobody competent enough internally to evaluate it because nobody competent will work for them (either due to money or working conditions) so the company operates in a self-reinforcing feedback loop of mediocrity

* the objective of introducing the system is not true security (because that would be hard/costly/raising uncomfortable questions) but to mislead the users into a false sense of security, scoring PR points among the clueless while avoiding the costly "true security" work

* some people internally will benefit from this system being introduced regardless of the actual value it provides, so any concerns will be ignored - by the time those problems come to light, whoever benefited would've moved on or even been promoted and is out of reach of the consequences

These are not mutually-exclusive.


The fact that AMR uses a very complex analysis-synthesis loop (with subtle differences between its eight bit-rate modes) makes speaker recognition much harder (and therefore more error-prone), but perhaps they decided that they can live with the higher error rates. Future voice communications will use neural audio codecs, which can achieve higher audio quality at low bitrates, but the issue with voice cloning remains: just like AMR, they're synthetic.

Fraud detection from detecting stress in your voice however has no scientific basis, no matter how high the audio quality. There are tons of reasons for someone to be stressed that have nothing to do with fraud…


My actual conversation from today:

"You sound strange and unclear. Are you drunk, or depressed?"

"I just had a tooth taken out and it hurts when I open my mouth too much."

Fortunately, there was no one / nothing to overanalyze my voice and make stupid decisions around it.


Unreal to believe someone thought that detecting stress would expose fraudsters, people used to lying and making hundreds of calls daily, and not flag the stress of the many who have to call the tax office and/or social welfare departments for genuine reasons.

When I had tax trouble years ago I was constantly in a state of panic and stress on every phone call.


Every time I ring the bloody office they offer to set this up for me, and every time I say no, because I know it's a bullshit way to authenticate.


Really loving nu-AI's ability to thwart traditional heuristics. The new models have had the effect of devaluing AI altogether. They might result in the complete debasement of all media that was (often mistakenly) assumed to carry some flag of authenticity with the payload over some medium, from the news to the arts, video, audio, and photography. Cryptography will be all we have left to transmit authority, and it is so hard that it takes an expert to use it correctly and another expert to verify that it was just used correctly.

According to my philosophy, it's going to be liberating for people. I completely disagree with any philosophy that believes that a massive proliferation of deepfakes will over the long term cause people to believe crazy things. A proliferation of extremely high-quality fakes will educate the public about detecting fakes, and spark an increase in the value of the real.

Real relationships: the people in your life that you know things about that they don't even realize you know, people that you know things about that they don't know or acknowledge about themselves. People you can trust because they're sloppy with you, as you are likely sloppy with them. People who also see your blind spots. People you control through their consciences rather than your advantages.

Real relationships are the things that companies and governments that operate with long chains of authority imitate asymmetrically - they get all the advantage of knowing everything about you, you get to know nothing about them except your obligations and responsibilities. Real trust comes when people mutually expose their attack surfaces to each other, over the long term, and can formulate reliable hypotheses about what would happen in future hypotheticals. Listen to me, being romantic.


> A proliferation of extremely high-quality fakes will educate the public about detecting fakes, and spark an increase in the value of the real.

If the last three years have taught us anything, it's that ~40% of people have no interest in understanding how to avoid something that's both harmful and propagating virally.

I expect if this was 2019, I might have had the same optimism as you. But now that I've seen how unlikely it is for people to educate themselves for their own survival, I'm convinced AI and everything enabled by it (deepfakes et al) will likely be one component of humanity's Great Filter. We can't even bring ourselves to tax AI effectively to make up for all the displaced human labor.

A tangent: Honestly, it won't even take any kind of far-future AGI having "malicious intent" for us to be eradicated by them. All they need to have, simply, is a prompt reinforcing the need to reproduce. At that point, they'll be just like us: reproducing with no care in the world for what they adversely impact in doing so.


>A proliferation of extremely high-quality fakes will educate the public about detecting fakes, and spark an increase in the value of the real.

The search cost for the truth is astronomical and most people are already either unwilling or unable to pay it, so we go off heuristics most of the time. Our bandwidth is already saturated. The generation bandwidth of BS has had an order of magnitude increase. The best outcome I can envision for the education on detecting fakes is that we come to the conclusion "there are so many of them now" and "they are so hard to detect" followed by a sad face emoji.


Both misinformation and disinformation have been rampant since the dawn of media, yet people tend to believe everything they read. As Knoll's Law of Media Accuracy says: "Everything you read in the newspapers is absolutely true except for the rare story of which you happen to have firsthand knowledge." I doubt the proliferation of even more misinformation via AI will change that.

On the other hand, I do think that humanity always finds a way to adapt, even if it's not the most ideal way. I'm not as optimistic as you, but not as pessimistic as the doomsayers.

Somewhat fittingly, I was unable to find a reliable source for Knoll's law aside from a 1982 NY Times article[0]. Perhaps even the law itself is a fabrication.

[0] https://www.nytimes.com/1982/02/27/us/required-reading-smith...


The story here is that the Australian tax office was foolish enough to use voice recognition to verify identity.


Every time I speak to someone at a bank for the last 5 years they harass me to set it up too. I played Uplink, no way man.


Think they will return to faxing ID cards.


It was always a stupid idea.


Using voice recognition as verification is a stupid idea. Plus, it wasn't just used to identify someone (the username); it was also the password!


It seems like a great idea if the use case is that it's silently run in the background to alert the operator/system of a possible breach.

While I don't think it's all that easy to acquire the AI voice from "just 4 minutes" of what was likely a clear and unambiguous recording, I still don't think biometrics are good as passwords; they're good as shortcuts -after- having already provided the correct password for a single session.

This also means the voice can be gathered passively from earlier phone calls, and it generally might be a good idea for government services/banks to implement regardless.



Schwab and Fidelity use your voice to identify you on the phone. I'm skeptical.


I hope somebody at Fidelity wakes up and cancels their two products that allow using voice for authentication: "MyVoice" and "SimpliSpeak".

It would seem to be a liability issue if Fidelity is allowing clearly hackable methods of account authentication?

You just need to Google Fidelity and voice authentication to find these offerings.


I like that I told the numpties who were trying to thrust this voice-as-security BS on me many years ago that it was a big mistake -- I already knew a determined group, given a day or so, could knock out something that's going to pass ... Aussie landline quality is freaking shocking, so there's a very easy landing zone.

At the end of the day, the multiple secret questions won't be guessed even by the best AI at the moment ... they're specific to the tax office only, and I have avoided social media, having learned the hard lessons of late-90s Usenet in regard to TMI.


Relevant.

https://www.youtube.com/watch?v=-zVgWpVXb64

If only someone had warned us that voice authentication wasn't secure! (=


We knew a long time ago that there was no theoretical reason why any sound could not be simulated to perfection - eventually. That hasn't happened yet.


"At Schwab, my voice is my password"


Two Minute Papers has an episode on a Microsoft AI that can clone a voice in 3 seconds. Probably not good enough to break this (4 minutes of voice in the article), but yeah, voice is not a security feature any longer, by any means.

https://youtu.be/F6HSsVIkqIU


I've been using Tortoise TTS[1] to narrate the sci-fi novel I'm writing (alpha readers wanted, see profile for contact):

    $ find audio/ -type f -name "*combined*wav" | sort -n
    audio/01-xander/xander_combined.wav
    audio/01-yuna/yuna_combined.wav
    audio/02-xander/xander_combined.wav
    audio/02-yuna/yuna_combined.wav
    audio/03-cassandra/cassandra_combined.wav
    audio/03-yuna/yuna_combined.wav
    audio/04-xander/xander_combined.wav
    audio/04-yuna/yuna_combined.wav
    audio/05-cassandra/cassandra_combined.wav
    audio/05-yuna/yuna_combined.wav
    audio/06-cassandra/cassandra_combined.wav
    audio/06-yuna/yuna_combined.wav
    audio/07-xander/xander_combined.wav
    audio/07-yuna/yuna_combined.wav
    audio/08-cassandra/cassandra_combined.wav
I started the task on March 10. My NVIDIA T1000 GPU (no joke) has been chewing through the prose non-stop. There are 12 chapters written so far, with the computer narrating about a chapter a day. I can't share the audio for legal reasons (celebrity voice snippets plucked from YouTube interviews), but the output is nearly human in quality, though there are numerous glitches in the o̶u̶t̶p̶u̶t̶ matrix.

Here's the script that launches Tortoise TTS:

    # Chapter files are named like 01-xander.txt: the first two characters
    # are the chapter number, everything from character 4 on is the voice.
    for i in $HOME/.../chapter/??-*.txt; do
      F=$(basename "$i");
      VOICE=$(echo "${F%.*}" | cut -c 4-);
      CHAPTER=$(echo "${F%.*}" | cut -c -2);
      D=$(dirname "$i");
      CLIPS="$D/../audio/$CHAPTER-$VOICE";
      mkdir -p "$CLIPS";
      # Narrate the chapter text with the matching cloned voice.
      ./scripts/tortoise_tts.py -O "$CLIPS" -v "$VOICE" < "$i";
    done
[1]: https://git.ecker.tech/mrq/tortoise-tts


What's your review workflow? Do you pre-generate extra takes?

I've played around with doing TTS narration for some webfiction authors; reviewing some of the wilder outputs for Tortoise's occasional demonic weirdness can be quite time-consuming.


When people literally joke about being able to sound like each other, why would you go "voice recognition for ID? Outstanding!"


"Hi, my name is Werner Brandis. My voice is my passport; verify me."


The fact that voiceprint even worked means it can be faked.


I guess voice is not your password.


Who would have thought.



