I still remember that when the Kindle first came out, it would read to you, until publishers threatened to sue. With modern text-to-speech getting so good, there really is no reason an inference model couldn't run on every device to read to us. I guess book readers were the first to have their jobs replaced by AI, but contracts didn't let it happen.
I don’t know. I’ve been listening to an unabridged audiobook of The Count of Monte Cristo. The single narrator has done a fantastic job of giving a different voice to every single character. Even if a particular character hasn’t appeared for tens of hours, when they do reappear, I remember who they are from their unique voice.
Plus, the narrator does a great job of pronouncing names with a French accent (at least, it sounds legit to me, a non-French-speaking person). I wonder how a computer voice would do with speaking English with a distinct French accent. Would it understand when to go more heavily English vs French?
Absolutely. There's a world of difference between a professional voice actor narrating an audiobook and an AI or amateur doing it. Personally I can't listen to anything narrated by anyone other than (good) pro voice actors; it just kills the enjoyment.
On a similar note, Sanderson's own books in "graphic audio" format (multiple voice actors, sound effects, music, etc.) are a wonderful piece of art and are my preferred way to consume audiobooks when possible. I don't see that being replaced with AI any time soon.
> I wonder how a computer voice would do with speaking English with a distinct French accent. Would it understand when to go more heavily English vs French?
If you can get a synthesized voice to speak French in a French accent, you can also get it to speak English in a French accent. That part's easy.
> Would it understand when to go more heavily English vs French?
I like Neil Gaiman's quote in there, and I agree that if you record someone reading a book you can copyright that, but that copyright has no bearing on the copyright status of a recording of another person or machine reading the book. If I own the content, I can format shift; that is actually in law in the US. Audio is a format.
I think what actually happened is that they convinced Amazon that there was more money in audiobooks than it thought, and that it wasn't worth using it as a feature to sell Kindles. Now Amazon knows how much money is in audiobooks and isn't sharing. Are they surprised?
Copilot and ChatGPT are going to replace me, so I say this with all humility: don't protect jobs from AI. They are saving us effort we can spend on other things. This is just industrialization taken another step.
My worry is that you could do something 60% as good for 1% of the cost, at which point a well-narrated audiobook becomes an extreme luxury good. Most people will pay $5-$10 for the "good enough" algorithmic version, and there aren't enough people who care to pay the fixed cost of Michael Kramer doing a version.
ChatGPT can already parrot back some idea to you in the written "voice" of any famous historical figure you care to name, and it remembers context from earlier in your session to inform its written inflection as well. Presumably this implies we have "line of sight" to doing something analogous in the audio space, at least in this generation. Certainly if you fed a whole book into ChatGPT that it had never read before and asked it to describe the intonation of a character's voice, it would have some level of accuracy (e.g. "husky" vs "meek"), so I think we would want to do something similar for the AI reading. It could also probably pick up on context in what it's reading and read it with emotion.
In 2007 the state of playback was pitiful. Of course the Authors Guild saw the trend lines, but even in 2022, we know Siri/Alexa don't sound human at all (my pre-teens make fun of them).
It's all possible but I doubt it'll be here in 10 years.