Hacker News
Why computer voices still don't sound human (slate.com)
39 points by soundsop on March 4, 2009 | 8 comments



We are still far from the HAL 9000 in Stanley Kubrick's "2001". However, once software can understand us emotionally and speak to us with humanlike emotion, the line between personal and impersonal software will become greyer.

Check out this online demo, which shows roughly where real-time speech synthesis stands today (in many languages). It doesn't do natural pauses, for one thing, and it noticeably lacks tone and emotion. However, once real-time speech synthesis reaches the level of National Public Radio's "Selected Shorts" or books-on-tape, we'll be talking to our AI psychologists.

TTS demo (Flash-based)

http://www.acapela-group.com/text-to-speech-interactive-demo...

NPR's Selected Shorts

http://www.symphonyspace.org/shorts


The article doesn't mention this, but here's a potential problem: The Da Vinci Code doesn't have <goodnews> tags. For a text-to-speech system to read books naturally, it has to be able to parse emotion out of the text and decide how to read each passage based on the situation. We could have people annotate books, but a TTS system that depends on hand annotation would be limited to whatever people had marked up.
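To make the problem concrete, here is a minimal Python sketch of the naive alternative to hand annotation: guessing the emotion of a sentence from a keyword list and wrapping it in tags. The word lists, the `tag_emotion` function, and the `<badnews>` tag are hypothetical illustrations, not part of any real TTS system. The last example shows why this falls short: a negated sentence still gets tagged as good news.

```python
# Hypothetical toy lexicons; a real system would need far more than word lists.
POSITIVE = {"delicious", "wonderful", "great", "happy", "love"}
NEGATIVE = {"terrible", "dead", "awful", "sad", "hate"}


def tag_emotion(sentence: str) -> str:
    """Wrap a sentence in a hypothetical emotion tag based on keyword counts."""
    # Normalize words: strip surrounding punctuation and lowercase.
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return f"<goodnews>{sentence}</goodnews>"
    if score < 0:
        return f"<badnews>{sentence}</badnews>"
    return sentence  # no keywords found: leave untagged (neutral)


if __name__ == "__main__":
    print(tag_emotion("These cookies are delicious."))
    print(tag_emotion("The curator was dead."))
    # Negation fools the keyword approach: this is tagged as good news.
    print(tag_emotion("I do not love this book."))
```

Getting the last case right requires actually understanding the sentence, which word counting cannot do.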


They should go back and study Alexander Bell's work on human physiology and the voice. I wonder whether virtual models of human vocal anatomy could be used to produce more accurate sounds in real time.


Even if you just modeled a lung and rhythmic breathing, that could be really useful in figuring out when to put in more natural pauses.
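A minimal Python sketch of that idea, assuming a crude "lung" that holds enough air for a fixed number of words: the speaker breathes at clause and sentence punctuation, or is forced to breathe when the air runs out. The `insert_breath_pauses` function and its `capacity` parameter are hypothetical; a realistic model would presumably account for syllables, loudness, and phrasing rather than raw word counts.

```python
# Punctuation after which a speaker would naturally take a breath.
BREATH_PUNCTUATION = ",.;:!?"


def insert_breath_pauses(text: str, capacity: int = 12) -> list[str]:
    """Split text into breath groups using a toy model of lung capacity.

    `capacity` is a hypothetical number of words spoken per breath.
    A breath is taken after punctuation, or forced when the air runs out.
    """
    groups: list[str] = []
    current: list[str] = []
    for word in text.split():
        current.append(word)
        at_boundary = word[-1] in BREATH_PUNCTUATION
        out_of_air = len(current) >= capacity
        if at_boundary or out_of_air:
            groups.append(" ".join(current))
            current = []  # lungs refilled
    if current:
        groups.append(" ".join(current))
    return groups


if __name__ == "__main__":
    text = ("These cookies are delicious. I could eat them all day long "
            "without ever stopping to rest.")
    print(" <breath> ".join(insert_breath_pauses(text, capacity=8)))
```

With `capacity=8`, the forced breath lands between "without" and "ever", which shows the limitation: a lung model tells you when a pause is needed, but not where in the phrase it sounds natural.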


That still only addresses superficial issues, not the ones that involve actually understanding what you read.

True text-to-speech with that level of comprehension is AI-complete (http://en.wikipedia.org/wiki/AI-complete).


"Why is Amazon's text-to-speech system so bad?"

I find that harsh. Judging from the sample of "The Da Vinci Code", I think it's pretty good. It sounds a little like an old radio recording, but it could be mistaken for a human voice if not for the flat intonation.

In any case, it sounds much better than the current text-to-speech on a Mac, which, even though it has improved since 1984, doesn't sound that different from the original Macintosh's (it still has that synthesized quality).

So no, it's not perfect, and it sounds odd, especially in dialogue, but I was expecting something more like the Mac voices and was pleasantly surprised.


The last recording, <good news>These cookies are delicious</good news>, is amazing. "Delicious" sounds almost human.


In fact, this article missed a very strong research team in this area. Take a look at what AT&T Labs has achieved so far.

http://www.research.att.com/~ttsweb/tts/demo.php

Try the demo.



