Researchers reach human parity in conversational speech recognition

Eridrus · on Oct 18, 2016

The actual paper has a section on error analysis that is particularly enlightening: https://arxiv.org/abs/1610.05256

On the CallHome dataset humans confuse words 4.1% of the time, but delete 6.5% of words, most commonly deleting the word "I".

Their ASR system confuses 6.5% of words on this dataset, but only deletes 3.3% of words, so depending on how you view this their claim about being better than humans isn't definitely true, if you consider the task to be speech recognition, rather than transcription.

Also, while the overall "word error rate" is lower than humans it's not clear if this is because the transcription service they used is not seeking perfect output, but rather good enough output and the errors the transcription service makes may not be as bad as the errors the ASR system makes in terms of how well you can recover the original meaning from the transcription.
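To make the substitution/deletion distinction concrete, here is a minimal sketch of how a word error rate can be split into its parts with a standard edit-distance alignment. The function name `wer_breakdown` and the example sentences are made up for illustration; this is not the scoring tool used in the paper.

```python
def wer_breakdown(reference, hypothesis):
    """Align two transcripts by edit distance and count each error type."""
    ref, hyp = reference.split(), hypothesis.split()
    # cost[i][j] = (total_errors, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)          # words match, no error
            else:
                diag = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cost[i - 1][j]
            delete = (t + 1, s, d + 1, n)    # reference word missing
            t, s, d, n = cost[i][j - 1]
            insert = (t + 1, s, d, n + 1)    # extra hypothesis word
            cost[i][j] = min(diag, delete, insert)
    total, subs, dels, ins = cost[len(ref)][len(hyp)]
    return {"wer": total / len(ref), "sub": subs, "del": dels, "ins": ins}


reference = "i think i saw it"
dropped = wer_breakdown(reference, "think i saw it")        # a dropped "i"
confused = wer_breakdown(reference, "i think eye saw it")   # a sound-alike word
```

Both hypotheses get the same overall error rate here, but one error is a deletion and the other a substitution, which is the distinction the single number hides.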

It's clearly great work, but reaching human parity is marketing fluff.

xdh168 · on Oct 19, 2016

Conversational speech has been very difficult. The Switchboard task has been around for over 20 years. We used the exact same published data and evaluation protocols. The human parity claim on the Switchboard task is as scientific as we can be, to the best of our knowledge.

derekja · on Oct 19, 2016

Ah, you finally made a hackernews account! Congrats, XD, that's a great milestone you guys hit.

joe_the_user · on Oct 18, 2016

I was recently at a bar where they showed a movie with incomprehensible subtitles (English to English). I assume this was because they skimped and bought automatic subtitling.

I think one important aspect is while humans miss words, they often get the sentence meaning correct. When computers miss words, they tend to substitute words that sound similar. That's readable if you have time but not necessarily as a stream of text going by...

ChrisClark · on Oct 18, 2016

It might have also been a 'bootleg' DVD from China or downloaded a version from there. I've had quite a few where they had done a horrible job translating into Chinese for the subtitles already, and then did a literal machine translation back to English.

The character on the screen said "Hello", the English subtitle said "You good." Which would be the literal translation of nihao.

Arkaad · on Oct 19, 2016

see also the infamous Star Wars Episode 3 Chinese bootleg "Do no want".

joe_the_user · on Oct 19, 2016

It was English subtitling on a movie in English, as they often have at bars.

codekansas · on Oct 19, 2016

From an ML perspective, recently there's been a lot more work done on language modeling (e.g. billion word corpus). Good language models can usually fix those types of errors. So hopefully you can expect to see that change in the future!
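As a toy illustration of how a language model can pick between similar-sounding candidates, here is a sketch that rescores two hypotheses with an add-one smoothed bigram model. The corpus, the candidate sentences and the `log_prob` helper are all invented for the example; real systems use far larger models, and this is not how the paper's recognizer works.

```python
import math
from collections import Counter

# Tiny hypothetical corpus standing in for a large training set.
corpus = "i scream for ice cream . we all scream for ice cream .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)


def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a sentence."""
    words = sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(words, words[1:])
    )


# Two candidates that an acoustic model might find hard to tell apart.
candidates = ["we all scream for ice cream", "we all scream for i scream"]
best = max(candidates, key=log_prob)
```

The model prefers the first candidate only because "for ice" and "ice cream" appear in its corpus while "for i" does not, which is the kind of context a bigger language model supplies at scale.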

aswanson · on Oct 19, 2016

Hilarious that the grandparent states performance gained by deleting "I" from the training corpus and the child comment starts with that pronoun.

cloudjacker · on Oct 18, 2016

> I was recently at a bar where they showed a movie with incomprehensible subtitles (English to English). I assume this was because they skimped and bought automatic subtitling.

I don't think that is possible. Like, worse subtitling isn't even an option.

john_oshea · on Oct 18, 2016

"Trainspotting" would be an interesting challenge: <https://www.theguardian.com/books/2008/may/31/irvinewelsh>

WildUtah · on Oct 19, 2016

Don' You Go Rounin' Roun To Re Ro:

http://www.nbc.com/saturday-night-live/video/british-movie/n...

AnimalMuppet · on Oct 18, 2016

I'm not an expert on this at all, but my suspicion is that humans are more tolerant of missing words than we are of mistaken words. Especially a word like "I", we just assume it if it's missing.

So if this speech recognition is for creating a transcript for humans, this (in my uneducated opinion) isn't as good as humans do, at least not yet.

rhizome · on Oct 18, 2016

If nothing's changed in the past 15 years, anything under 95% accuracy is not accurate enough for automation without human intervention (copy editing).

nshm · on Oct 18, 2016

From the paper eight different recognizers combined together approached results of a single person reviewed once. Not really a parity.

rattray · on Oct 18, 2016

Isn't it substantially simpler + cheaper to combine 8 computer programs than to combine >1 people?

djsumdog · on Oct 18, 2016

Looks like they used various neural networks and trained based on data they currently already transcribe professionally. So it had to be trained and only represents a subset of people who speak and have their works transcribed at Microsoft events.

Being Microsoft, I'm sure the place is diverse, but there's no mention in the paper on accents or dialects that I can see (might just be missing it).

jpm_sd · on Oct 18, 2016

I look forward to being able to converse with Microsoft's research team as easily as I can with humans. I hope that one day, journalists can learn to write headlines with similarly low rates of error.

hashkb · on Oct 18, 2016

It's on purpose... journalists learned to do it this way.

avodonosov · on Oct 18, 2016

http://www.smbc-comics.com/comics/20090830.gif

namrog84 · on Oct 18, 2016

Linking the image instead of the site will cause more people to miss out on the red button extra comic frame, and on the meta hover text (xkcd style), which there usually is.

http://www.smbc-comics.com/?id=1623

mastazi · on Oct 18, 2016

Wait a second... I just realised I've been reading SMBC for years now and I've never noticed the red button! I'm mad at how much I must have missed out on but, at the same time, I'm glad you pointed that out!

uvesten · on Oct 18, 2016

Good for them! I'm a bit surprised that the researchers didn't already possess human-level speech recognition, though.

jameshart · on Oct 18, 2016

I wasn't sure which way round to parse it (ironically). Are they saying that Microsoft researchers are now almost as good at recognizing conversational speech as humans? Or that Microsoft researchers can now almost pass as human in conversational speech? Either way, good news for Microsoft Research, I think.

cptskippy · on Oct 18, 2016

Plot twist, Microsoft's researchers aren't human.

radarsat1 · on Oct 18, 2016

The term "human parity" refers to a comparison of the error rate, which is a single scalar summarizing performance in terms of mistakes made. It says nothing about the kind of mistakes, and I can easily imagine that machines qualitatively do not make at all the same kind of mistakes as humans. I'd be curious to know if the kind of mistakes machines make might strike human listeners as quite stupid, but maybe not... many algorithms are getting better at taking context and prior knowledge into account.

saidajigumi · on Oct 18, 2016

In my mind, this is analogous to the reasons why evaluation of lossy audio compression codecs requires human listening tests. Simply running some simplistic signal analysis like SNR (Signal-to-Noise Ratio) completely fails to capture the as-perceived quality of a compression implementation.

To explore that analogy: In the case of lossy audio compression, the compressor deliberately introduces quantization noise into the signal. It does so by running a "psychoacoustic model", which attempts to capture a broad quality of human hearing called auditory masking. There's a number of different kinds of masking[1]: a strong tonal sound creates an "umbrella" across nearby frequencies that can mask quieter noise-like sounds. Similarly there's noise-vs-noise masking, as well as forwards- and backwards- temporal masking. (Yes, backwards. A sound can mask perception of a sound that occurred before it.)

In the audio compression case, we've built an algorithm that attempts to characterize exploitable phenomena of human hearing. These masking characteristics aren't perceived quite the same by human listeners as the model, nor even the same between individual human listeners. Thus the need for human listening tests. These, due to the experimental care and human subjects required, are expensive.

Back to speech transcription. Say the end goal is "how well does a human comprehend this transcribed speech"? (e.g. vs some standard, such as the original speech, vs. the original speech transcribed by a skilled specialist, etc.) The problem starts to look pretty similar. We can cite numerical, word-centric error rates, but that fails to capture how well meaning is preserved and transmitted. Imagine a perverse algorithm that did a perfect transcription, but then dropped or altered words for maximum meaning obfuscation. It might equal or even beat the cited error rates but be much harder to actually comprehend.

[1] https://en.wikipedia.org/wiki/Auditory_masking

ArkyBeagle · on Oct 19, 2016

I've always evaluated audio codecs on differential signals - subtract A from A'. For telephony codecs, there are formal tests for MOS and/or PESQ.

saidajigumi · on Oct 19, 2016

Which isn't substantially different from evaluating a codec on introduced SNR.

These classic signal processing analyses tell you nothing about the correct operation of a psychoacoustic model codec design (i.e. MP3, AAC, Vorbis, etc.) An analysis of A (original signal) vs. A' (signal passed through a compression-decompression cycle) is just extracting the quantization noise introduced by the codec. That provides no information about how effective the codec was in masking that noise with the original signal content.

To illustrate, imagine a perversely designed codec: it runs two models, the first "good model" is a normal psychoacoustic model. The second "bad model" is the one the compressor uses: it applies the total amount of quantization noise allowed by the good model, but applies it in ways that are maximally annoying to human listeners. This isn't just avoiding masking, it's things like using noise correlated to the original signal, which is generally more obtrusive than uncorrelated noise, etc.

A codec using just the "good model" and one using the "good + bad model" would have (by definition) exactly the same introduced noise, but the latter would sound FAR worse to a human listener.
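A rough sketch of why the A vs. A' difference signal alone can't separate the two cases: below, two made-up residuals carry the same energy, so an SNR computed from the difference is the same for both, even though one spreads the error thinly and the other packs it into a short burst. The signals and the `snr_db` helper are assumptions for illustration only; nothing here models masking or audibility.

```python
import math


def snr_db(original, decoded):
    """SNR in dB, treating (decoded - original) as the noise."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((y - x) ** 2 for x, y in zip(original, decoded))
    return 10 * math.log10(signal_power / noise_power)


# A toy "original" signal: a sine tone, 5 cycles over 1000 samples.
n = 1000
original = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]

# Two residuals with equal energy: one spread over every sample,
# one concentrated in the first 10 samples.
spread = [0.01 * (1 if k % 2 == 0 else -1) for k in range(n)]
burst = [0.1 * (1 if k % 2 == 0 else -1) if k < 10 else 0.0 for k in range(n)]

snr_spread = snr_db(original, [x + e for x, e in zip(original, spread)])
snr_burst = snr_db(original, [x + e for x, e in zip(original, burst)])
```

The two numbers agree to within floating-point rounding, so any judgment about which version sounds worse has to come from somewhere other than this measurement.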

ArkyBeagle · on Oct 19, 2016

I am not familiar with the term "introduced SNR"... Googling.

No joy.

I can't really follow your thinking above; sorry.

In audio, there is distortion, which is correlated with the signal. Noise is uncorrelated. Codec error would seem very much to me to be at least much more correlated than uncorrelated. MP3 artifacts sound at least more "phase-ey" than they sound anything like quantization noise (at least to me). This may be because I have heard badly aligned tape machines make that sort of error happen. "Phase-ey" also triggers my (feeble) mind into thinking about allpass filters as the model for the error.

In terms of intelligibility, it's possible to improve intelligibility by adding noise, and by adding clipping - aviation comms does this at times. What destroys intelligibility is phonemes being destroyed by phase changes and bad amplitude errors (where there are actually good amplitude errors).

I've run an ABACUS voice quality analyzer several times, and I don't think it's purely a distortion analyzer - adding clipping at least can improve MOS/PESQ score surprisingly. Even more surprisingly, there's no mechanism available for calibrating gain staging on one.

saidajigumi · on Oct 20, 2016

Ah, that's a mistake, that should have read "introduced [quantization] noise", which reduces the SNR in A' vs A.

If I understand you correctly, your discussion around intelligibility primarily applies to signal processing of voice vs general audio. Codecs such as MP3, AAC, etc. don't have the luxury of making assumptions about the signal content, and so aren't designed along those principles. E.g. speech codecs can generally run well at much lower bitrates than general audio codecs because they operate on a constrained domain of audio (i.e. speech).

Regarding distortion management, see Rate-distortion optimization[1] for lossy audio codecs: where the purpose is to manage distortion within the limits of the bit rate supported by a communication channel or storage medium.

[1] https://en.wikipedia.org/wiki/Quantization_(signal_processin...

ArkyBeagle · on Oct 20, 2016

Thanks for clarifying.

bendykstra · on Oct 18, 2016

That reminds me of the photocopier that used a very poorly chosen image compression algorithm. It would sometimes substitute similar looking characters to save space. A $7000 figure in a budget might become $9000 and a person looking over the document would have no indication that the '9' had been cut and pasted from somewhere else on the page.

Google Voice transcription often makes easily detectable mistakes that I can mentally correct using context. Smarter, context-aware mistakes might be worse, if they are also more difficult to detect.

envy2 · on Oct 18, 2016

For anyone else wondering about that copier: http://www.theregister.co.uk/2013/08/06/xerox_copier_flaw_me...

szupie · on Oct 18, 2016

Section 9 in the paper[1] is all about comparing these mistakes between the system and humans. The most common mistakes for humans and the system are in tables 9-11.

> We find that the artificial errors are substantially the same as human ones with one large exception: confusions between backchannel words [acknowledgment words like “uh-huh”] and hesitations.

They did find a difference, but suspect it might be a result of the different transcription guidelines of the training corpus:

> we see that by far the most common error in the ASR system is the confusion of a hesitation in the reference for a backchannel in the hypothesis. People do not seem to have this problem.

[1]: https://arxiv.org/abs/1610.05256

radarsat1 · on Oct 18, 2016

Very cool that they investigated this! Thanks, I hadn't read the paper (obviously)

tominous · on Oct 19, 2016

The other day I asked Siri, "Remind me to pick up [daughter's name]" but it interpreted that as, "Remind me to pick up pussy." A more context-aware engine would realise we don't even have a cat. On the other hand it made me wonder to what extent Siri is trained on real-world data from other users.

WildUtah · on Oct 19, 2016

When SIRI reminds you to go grab that p-ssy, it's finally human enough to run for president.

WildUtah · on Oct 19, 2016

> The term "human parity" refers to a comparison of the error

The term 'human parity' refers to adding up all the bits in a human and taking the residue to a convenient modulus to detect human error.

radarsat1 · on Oct 19, 2016

Sounds messy ;)

Animats · on Oct 18, 2016

Very nice. How long before something this good is available as open source?

A tough test would be to hook this up to a police/fire scanner, or air traffic control radio.

ChuckMcM · on Oct 18, 2016

They did post the code on github (https://github.com/Microsoft/CNTK) with the Microsoft open source license.

Presumably you could feed it speech from a running instance of gnu-radio.

ar15saveslives · on Oct 18, 2016

CNTK is just a toolkit, like tensorflow or theano. Code for the paper was not published.

azinman2 · on Oct 18, 2016

Let alone the datasets used to train, which are worth a lot of money.

zump · on Oct 19, 2016

Aussie company (Appen) provides the datasets :D

mtrimpe · on Oct 19, 2016

That's the one that seems to be using illegal GCHQ wiretaps ... see the (Dutch) article below about an Appen translator getting a private voicemail of her ex to translate.

http://www.volkskrant.nl/tech/privegesprekken-van-duizenden-...

zump · on Oct 19, 2016

No, it's GCHQ contracting Appen. They only deliver the platform for distributed outsourced transcription.

ankitbko · on Oct 19, 2016

I guess it would come in https://www.cris.ai/

skoocda · on Oct 18, 2016

The novel elements of this have already been released individually, particularly the lattice-free MMI training which can be run through Kaldi's nnet3 configuration.

Also, don't assume that a 0.4% increase means drastically better real-world results. This dataset has been around longer than I have, so by this point Microsoft has just gotten really good at tuning.

nshm · on Oct 18, 2016

Open source Kaldi gives you 7.8%, not very far from 5.9, you can check the thread from the discussion of the first version of the paper https://news.ycombinator.com/item?id=12502119

grawlinson · on Oct 18, 2016

Or even mock emergency calls where the callers aren't in a rational state of mind.

grzm · on Oct 18, 2016

I don't have a background in this area, so I'm likely easily impressed, but this seems really impressive. And the acknowledgement that there's a lot of work to be done, such as discriminating between speakers and recognition in adverse environments. Yeah, it's Microsoft writing on their own technology, but they addressed in the text the questions I had already in mind from just reading the title. It didn't leave me with the feeling that it's just a marketing piece.

> Still, he cautioned, true artificial intelligence is still on the distant horizon

It's frustrating when technologies like image and speech recognition and robotics are conflated with AI.

tree_of_item · on Oct 18, 2016

> It's frustrating when technologies like image and speech recognition and robotics are conflated with AI.

Are you kidding? Of course these things are examples of artificial intelligence. I don't understand why people keep moving the goalposts wrt "AI".

arstin · on Oct 18, 2016

If you take the goalposts to have been set by Turing's 1950 "Computing Machinery and Intelligence", the only people moving them are researchers who want to trump up their own work or marketers who want to sell things.

My own taste uses the word "AI" as you do in a permissive way to include simpler tasks (which in themselves are more elemental than useful) like identifying an object in an image and presenting a few straightforward interpretations of sentences. But what Turing stipulated an actual AI would be able to do is "reach parity" with a fully human conversation, with all of our knowledge and values and desires and subtlety of motivation. When you really dig into what goes on in a real conversation---the joking, the shades of meaning, the individual quirks and tribal patterns, the lying, the compassion, ambition, insecurity---I think it's hard not to admit that we're still quite far from that. We may not even want it! But exactly the genius of the Turing test for setting a rough standard for AI is that it demands competences far beyond mere language processing.

abecedarius · on Oct 18, 2016

Turing wasn't trying there to articulate a single target for AI research, he was responding to the claim that artificial minds were not possible even in principle. His response was basically, try this, if in open-ended conversation you couldn't tell one from a human, would you really still say there's no mind there? As you say, that's a really high bar from where we are, then or now.

I'm sure Turing's own goals for AI were broader than eventually passing the Turing test. He worked on neural nets himself and would've considered progress in perception as partial progress in AI.

arstin · on Oct 18, 2016

Yep! The assertion that Turing was setting a "single target for AI research" in the way you're using the phrase was clearly not what he was doing or, I hope, what I said. My intent was the opposite: to draw attention to the massive range of "targets for research" which are already implicit in the original Turing Test in order to note that saying "there's still lots more to do" is hardly "moving the goalpost".

Just to link up the way you put things with the way I chose to here, his argument for how an artificial mind is possible proposes a reasonable, minimal standard which different sides can agree to---a criterion of, "well if it can do that then sure it's a mind!". And the choice of an open and unbounded conversation as the standard was brilliant because of the massive range of subcompetencies which are required for actually executing it (including, obviously, perception). Which, of course, we gradually continue to plod through in AI research.

abecedarius · on Oct 19, 2016

Agreed, and you're right that I mistook your intent.

madenine · on Oct 18, 2016

It's frustrating when people assume any reference to AI means AGI or strong AI.

AndrewKemendo · on Oct 18, 2016

It's not their fault really. TV and movies, especially ones like The Matrix, A.I., and I, Robot, call AGI "AI".

Doesn't make it any less frustrating, but also distinguishes the people in it versus those who aren't.

jcoffland · on Oct 18, 2016

I'm not so sure anyone was using the term AGI when the first Matrix movie came out. Anyway, you cannot expect Hollywood to always use the precisely correct term for things scientific without first pushing up your glasses.

gremlinsinc · on Oct 18, 2016

I don't get why that's frustrating, without image and speech recognition AI isn't possible. -- How intelligent could people be without sensory perception? Helen Keller is the exception, take away the ability for humans to process sound and images as a species and we wouldn't be nearly as advanced as we are today even if we still had the same brain structure and mental capacity.

Ray Kurzweil understood this, and that is why he paved the way for some of the first speech / image recognition platforms like ocr/fax/etc...

To me speech/image recognition is a precursor/adjunct of AI. You can have the former without the latter, but the latter will never be realized without those.

grzm · on Oct 18, 2016

Lots of good questions about what I mean by AI. I agree my statement wasn't very precise. And to be honest, that's a reflection of me not having a precise definition of what it is, exactly. However, I do think there's a distinction between speech/image recognition and what's commonly understood as AI, with speech and image recognition being sensory input systems, and AI in a more colloquial sense, such as learning and problem solving at an approximately human level. My statement was imprecise, and given I was expressing frustration at imprecision, my bad :)

Thanks for the feedback! Gives me more to think about.

ghurtado · on Oct 18, 2016

> I don't get why that's frustrating, without image and speech recognition AI isn't possible

Wait, what? The most commonly known (and perhaps oldest) AI test in no way requires either image or speech recognition.

Millions of blind and deaf human beings would like to disagree with your claim that they are not intelligent or sentient beings.

Seriously, what is the basis of this claim?

gremlinsinc · on Oct 18, 2016

I'm not saying one can't be smart/intelligent without being able to see/hear, I'm saying if you take away all senses from ALL human beings there won't be any way to communicate at all, or know that others exist, and to learn from them. Learning is what makes intelligence possible--and the passing of information through generations. Helen Keller fought extremely hard to overcome her sensory issues, but she had a good teacher, who presumably had someone else help them or guide them some. -- But to organically learn to speak when nobody can hear you, or to read when you can't even feel the braille, etc., would be nearly impossible.

nercht12 · on Oct 19, 2016

> Learning is what makes intelligence possible

Intelligence (at least in this case) is being able to take data and draw conclusions from it, but you just can't see such potential when you don't have input. Is a computer not a calculator when there's no software installed? No. It's still a calculator, but it just doesn't have inputs. One day, we may be able to give sight to the blind, but for now, considering such people without ANY learning capacity as "unintelligent" is still wrong.

ant512 · on Oct 18, 2016

On the other hand, if you had true AI first, you'd just say "here's an input device; figure out how to parse the data."

duaneb · on Oct 18, 2016

Perhaps because "true AI" is an illusion. There is always a better term that has meaning.

And yet people like Ray Kurzweil toss the term around as if it does have meaning. What's the point unless you define it? You might as well reference anti gravity or teleportation.

nix0n · on Oct 18, 2016

It's something to work towards, always on the horizon.

Kurzweil was a great inventor once but he's a prophet now and he needs to speak prophetically, not scientifically.

Retric · on Oct 18, 2016

The Turing test shows AI has little to do with image or speech recognition. An AI that operates in a purely virtual environment would still be revolutionary.

gingerrr · on Oct 18, 2016

The Turing test doesn't test for intelligence, simply human-like communication. Things like lying, typing errors, or reaction to insult aren't indicators of an artificial intelligence, but are requirements for passing the Turing test.

One of the biggest criticisms of the Turing test is, because it's a behavioral or functional test, there is no way to ensure that the computer is actually thinking intelligently at all, rather than following a very-well-articulated rules engine.

ghurtado · on Oct 18, 2016

I can think of no better test of whether some specific being is intelligent than fooling another human into thinking they are.

Can you think of a better one?

> because it's a behavioral or functional test

What else would you test for in an AI other than its output and behavior? Conversely: how would you test a human for intelligence other than through its output and behavior?

JoeAltmaier · on Oct 18, 2016

Surely that depends upon the human? An infant would do a terrible job. An institutionalized vegetable would fail to provide meaningful information.

A not-very-bright cousin of mine has been heard responding to robo-calls. Would his opinion do?

I think 'intelligence' is ambiguous, but many parts of it can be measured to some degree. Let's give the AI the SAT test perhaps? Or a test of hypothetical arguments. Or ask it how it would tell if you were an intelligent being yourself. Anything that's even a little bit 'meta' would confound most 'AIs'.

We can tell if an AI is functional in some environment or responds well to social cues by interacting with it. But not much else. Not how 'intelligent' it is for instance.

ghurtado · on Oct 18, 2016

> Anything that's even a little bit 'meta' would confound most 'AIs'.

This is precisely why the Turing Test is so tricky for machines. Knowing the parameters, any regular human could test even the most advanced present-day AI and reveal it as a machine with very little effort and only a few questions.

> We can tell if an AI is functional in some environment or responds well to social cues by interacting with it. But not much else. Not how 'intelligent' it is for instance.

The Turing Test does not intend to determine level of intelligence (it's not an IQ test). It is intended to determine whether the intelligence you're interacting with is advanced enough to fool you into thinking they are of human nature.

It is not a test of degree of intelligence, but rather of its nature. Human / not human are the only two possible outcomes.

Retric · on Oct 18, 2016

A test that confuses a human for a machine is not a problem. False negatives don't remove value from this test.

gingerrr · on Oct 18, 2016

That last bit is a false equivalence - we have no way with current science to truly introspect the "thought process" of another human, so functional judgment is the only kind we can exercise.

With AI - we built it. We can read the code that's running. We can make a distinction, in cases where output or behavior is identical, whether it's a neural network or a 12-million-line switch statement - and that value judgment means something to our determination of intelligence. There are plenty of ways other than (or in addition to) output to determine the "intelligence" of a machine that we simply can't exercise with other people.

Retric · on Oct 18, 2016

It directly tests for intelligence, you confuse false negatives with false positives. An intelligent person that fails the test because they don't know a common language is fine.

Note: Medical tests generate both false positives and false negatives, they are still useful.

gingerrr · on Oct 18, 2016

I don't understand what your example here about common language rebuts - that's something that would be solved for in the premise of the test, or it's not a fair test.

Is your argument that that person is still intelligent even if they've failed the test? Because that's the entirety of my point - the Turing test does not return a measure of intelligence, but of communicability.

Or is your point that if such a person fails it doesn't invalidate the test's measurement of intelligence? In a world where the test is implemented correctly, meaning where things like cross-language barriers are accounted for, failing to pass means failing to convince another person that you can communicate like a human would. If you fail, the only way you can consider that a "false negative" would be if you concede that the test is not a sufficient measure of intelligence but of ability to communicate - that's what makes it a FALSE negative.

Retric · on Oct 18, 2016

No, the point is false negatives don't invalidate a test. It requires both intelligence and communication abilities.

Cost and accuracy are obvious trade-offs, but sometimes you are willing to trade, say, cost and false positives to avoid false negatives: say, a mass screening for HIV in a blood sample. In that case you need a cheap test, and while a false positive has minimal cost, a false negative could be deadly.

Hiring is the opposite case. McDonald's wants a cheap test (interview + background) and as long as they get enough acceptable low-level candidates from their pool, that's enough.
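A small sketch of that trade-off, with invented scores and labels: the same test gives different mixes of false positives and false negatives depending on where the cutoff sits. The `screen` helper and the two thresholds are hypothetical and only stand in for the blood-screening and hiring cases.

```python
def screen(scores, labels, threshold):
    """Flag every sample scoring at or above the threshold; count both error types."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg


# Hypothetical test scores (higher = more likely positive) and true status.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, False, True, False, False, True, True, True]

# Low cutoff: misses nothing, at the price of some false alarms.
lenient = screen(scores, labels, threshold=0.35)
# High cutoff: no false alarms, but some true positives are rejected.
strict = screen(scores, labels, threshold=0.75)
```

With these numbers the lenient cutoff yields false positives only and the strict cutoff yields false negatives only; neither outcome makes the test useless, it just suits a different job.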

As such the Turing test can fit the second example. A hypothetical AI could hold a huge range of real jobs even if limited to purely text based communications.

PS: Let's flip it. A super intelligent AI in a box that can't communicate in any form. Without IO it's indistinguishable from a space heater.

gingerrr · on Oct 18, 2016

Your last example of the hyperintelligent space heater is exactly my point - the Turing test is a test of the (necessary) precondition of communication, but NOT a sufficient test of intelligence. In your example, the black box AI is assumed to be intelligent, but would still fail the Turing test - because intelligence is not a necessary condition for passing. Communicability IS necessary for the test.

A passing grade on the Turing test just indicates intelligence is possible, but excludes assured intelligence in the absence of communication ability, which is why I argue it is not a sufficient guarantee of intelligence.

You make a good point with the hiring example about the distinction between "intelligent" and "intelligent enough to do some things" - our disagreement may be stemming from us having differing definitions of "an intelligence". Need to think about it.

Retric · on Oct 19, 2016

Nothing you just said invalidates the test.

X is a demonstration of intelligence says nothing about not X.

If you can design something that passes arbitrary Turing tests, then yes, it is intelligent. For example I could teach it any subject that works in text format. It could then pass an open ended essay test. And get a reasonable essay.

Now, you could do the same thing with a pig and it would fail the test despite being more intelligent than the average dog. That just means there are limits.

makomk · on Oct 19, 2016

This is one of the practical limits of the Turing Test as usually proposed - the parties could just refuse to jump through the hoop of writing an essay, and that would be a perfectly normal reaction. In practice, attempts to pass the Turing Test have leaned heavily on hand-coded strategies for changing the subject and feigning refusal to co-operate, with some success - for example https://en.wikipedia.org/wiki/Eugene_Goostman

Retric · on Oct 19, 2016

What I find most interesting about that strategy is in practice it fails. The strength is it can push people to wait longer before judging. The weakness is it fails to demonstrate intelligence.

gingerrr · on Oct 19, 2016

So help me, I think you've convinced me. I don't know what to do with all this leftover xkcd/386 though.

Thanks for the great chat.

wbl · on Oct 18, 2016

This is also true for people. How do you know humans are thinking and not following lots of social rules they learn thanks to special neural hardware for learning social interaction?

Think about the intelligence conveyed in ritualized greetings.

JumpCrisscross · on Oct 18, 2016

You are recapitulating Descartes. Consciousness appears to be immeasurable. You can only know that you are conscious. It is polite to give others the benefit of doubt.

pharrington · on Oct 18, 2016

The commenter said nothing about consciousness. He simply said that just being able to hold a conversation is not a particularly useful measure of intelligence. The broader form of that is that we don't even know if humans in general are intelligent.

Scarblac · on Oct 18, 2016

You can know that you are conscious, but how do you know you are intelligent?

Lxr · on Oct 18, 2016

I'm curious, what is your definition of AI?

PravlageTiem · on Oct 18, 2016

A machine process qualifies as AI if it believes in a god.

EDIT: I see the nuances of epistemological problems are lost to HN and knee-jerk culture war atheism still rules supreme.

imh · on Oct 19, 2016

FYI about your edit: You aren't being downvoted because of "knee-jerk culture war" or because people here are too dumb to understand your enlightened view of epistemology. Your original comment is an incredibly unique view with no explanation that people probably disagree with, and your edit makes you sound petulant, like you can't handle disagreement. That's the reason for the downvotes.

PravlageTiem · on Oct 19, 2016

I'm being downvoted because my comment spat in the face of those who think general artificial intelligence automatically implies some superior Nietzschean post-human that is qualified to run the perfect communist utopia since it no longer has distracting human emotions like greed or envy. Somewhere, a Sanfraninite spurted himself to completion while watching a live feed of indigenous humans tending to some noble savage rice field he rented on AgroBnb with Bitcoin, mined entirely by solar powered GPUs made out of free trade copper.

When you drag people out of their fantasies, they used to kill you. At least they downvote you these days, so I suppose that's better.

Yes, this is an incredibly unique view. Yes, people are going to react negatively to it because it fundamentally undermines their revenge fantasies. Yes, I can call them out on their unspoken biases no matter how uncomfortable that makes them.

Scarblac · on Oct 18, 2016

This is 2016, people don't believe in gods anymore. It'd have to believe in the paleo diet instead.

ArkyBeagle · on Oct 19, 2016

On Westworld, one of the hosts referred to its creator in the presence of the character Robert Ford. Ford declined to identify himself as that person. Rather a nice riff, I thought.

ghurtado · on Oct 18, 2016

What a useless definition. I could just as easily say that no being (artificial or not) can be considered intelligent so long as it believes in flying spaghetti monsters.

PravlageTiem · on Oct 18, 2016

I didn't say "intelligent". I said "AI", as in "artificial intelligence" which implies a simulation of human intelligence. Machine processes already exist that demonstrate, say, ant-like intelligence.

jcoffland · on Oct 18, 2016

ELIZA could be made to believe in God.

PravlageTiem · on Oct 18, 2016

Being made to believe in a god isn't the same as believing in one.

Silicon Valley thinks AI is not an epistemological problem, as if neurons can be perfectly simulated atomically and that all intelligence processes can be categorized as structured vs. unstructured. Very naive conclusions.

Ironically, they BELIEVE if you simulate the axiomatic neuron perfectly, emergent properties of intelligence will mystically emerge after some undefined threshold of complexity. They are permanently unable to simulate the two billion years of neurological evolution and natural selection for such a belief to hold.

If you must emulate human intelligence, you must be able to navigate the realm of distilling reality from the overlap between truth and belief. Mocking how humans observe reality isn't enough.

Abrahamic religion explored the alternative intelligence problem in great depth over two thousand years ago. It's a pity the results have been lost to Progressive axe grinding.

joemi · on Oct 18, 2016

> Abrahamic religion explored the alternative intelligence problem in great depth over two thousand years ago. It's a pity the results have been lost to Progressive axe grinding.

I'll take the bait: What in the world are you talking about?

PravlageTiem · on Oct 18, 2016

What is monotheism but a comprehensive collection of thought experiments regarding the behavior of a Bronze Age singularity? :D

ralusek · on Oct 18, 2016

For Artificial Intelligence to believe in God would be for it to believe in its makers, which are human. A machine would be intelligent to be aware that it was made by man.

A man doesn't need to believe in God to be intelligent. In fact, the two are pretty much inversely related.

pbhjpbhj · on Oct 18, 2016

>For Artificial Intelligence to believe in God would be for it to believe in its makers //

That's a very limited sense of the idea of a god, and certainly doesn't match with definitions of God [a singular, eternal, omnipotent, omniscient, deity] that I've come across.

Merely making something doesn't make you a god, not even if that thing appears to display intelligence. Some sense of one of the characteristics of existing in a separate spiritual realm, having power/knowledge beyond that possible in the present realm, having an existence that's not bounded (eg physically) within the normally experienced space of the "mortals". They seem like a start for basic level definitions of a god.

ArkyBeagle · on Oct 19, 2016

There are a lot of very smart people who are at least at Deist levels of faith. I'd say it's orthogonal. An example? C.S. Lewis.

There is nothing wrong with that sort of metaphor, no matter how much stress you place on it. If you think of religion as a technology, then it can be abused but the abuse doesn't mean it's bad.

jcoffland · on Oct 19, 2016

Or Donald Knuth.

healthnutter · on Oct 19, 2016

> A man doesn't need to believe in God to be intelligent. In fact, the two are pretty much inversely related.

Claiming as fact. Citation, please?

PravlageTiem · on Oct 18, 2016

Believing in a god is unique to the human species. No other animal does this. Thus, if you want full and complete simulation of human intelligence, this is the ultimate benchmark to test against.

Furthermore, the purpose of this thought experiment is that you CANNOT know what god an AI would end up believing in. That should make you rethink everything you think you know about theology and reexamine it under epistemological terms.

aninhumer · on Oct 18, 2016

>Believing in a god is unique to the human species. No other animal does this.

Do you have some evidence of that? And even if it's true, is it the only thing that's unique to humans?

Indeed, I suspect the interesting part of "believe in god" in terms of intelligence is probably "construct narratives". Assigning a high probability to those narratives reflecting reality (i.e. believing) seems like an aside to the main complexity involved.

>"The anthropologists got it wrong when they named our species Homo sapiens ('wise man'). In any case it's an arrogant and bigheaded thing to say, wisdom being one of our least evident features. In reality, we are Pan narrans, the storytelling chimpanzee." - Terry Pratchett, The Science of Discworld II: The Globe

PravlageTiem · on Oct 18, 2016

The best case for animals demonstrating religious behavior (which is fundamentally different than believing in a god, mind you) is from a very convoluted interpretation of a food coma.

http://bigthink.com/videos/can-animals-be-religious

Outside of that considerable stretching, it doesn't exist. Also, I challenge your premise that conflates "belief" with "constructing narratives"

Do you believe you will be alive tomorrow? If so, are you constructing a grand-weaving narrative... or simply making a singular belief? Or, more simply, are you just extrapolating off of past experiences?

aninhumer · on Oct 19, 2016

>Also, I challenge your premise that conflates "belief" with "constructing narratives"

I didn't conflate belief and narratives, I conflated belief with assigning high probability to those narratives. Sure, there are also simpler predictions we can estimate as well, but I think the thing that makes "belief in god" interesting from the perspective of judging intelligence is the complexity of the narrative.

>Outside of that considerable stretching, it doesn't exist.

What evidence would convince you that an animal "believed in god"?

PravlageTiem · on Oct 19, 2016

To kill two birds with one stone, as soon as animals demonstrate something more akin to tribal-level human behavior of religious worship (the grand fusion of self, environment, social, and mystery into the realm of higher powers that must be feared) and not postmodernist definitions of worship (the grand disconnection of the self from all other factors to appeal to an idealized self that can never manifest thank you CIA involvement in the postmodernist art movement and their unique ability to capture scary mommies everywhere) then I'll consider the idea that animals are demonstrating religious behavior.

And furthermore, if animals DO have religious behavior, then artificial intelligence research across the entire board is WOEFULLY inadequate to represent such a core part of the neurological interaction that generates religious behavior... a core behavioral capacity that was somehow missed in every single behaviorist research paper ever published since mankind mastered animal husbandry.

You either get in bed with the idea that only humans worship gods or you have to fundamentally throw all of psychology, sociology, and animal behavior studies completely out the window.

circlefavshape · on Oct 19, 2016

Do you alone in the world know what animals are thinking?

PravlageTiem · on Oct 19, 2016

I provided the very best and most recent research for the absurd position that animals supposedly have their own animal gods, attend regular communal worshipping ceremonies, engage in animal religious ritual, and demonstrate religious behavior outside of a priori instinct and social fitnesses.

You provided... passive aggressiveness.

And if animals now engage in religious ritual, doesn't that make them stupid for not being atheist like you? I expect several YouTube videos of you trying to convert your cat into the enlightened ways of post-theism.

circlefavshape · on Oct 21, 2016

> I provided the very best and most recent research for the absurd position that animals supposedly have their own animal gods

You did nothing of the sort. You provided a link to a video of an anthropologist describing chimpanzee behaviour and indulging in a little light speculation about why they're doing what they're doing.

Nobody is claiming that animals believe in god(s), rather I'm disputing your bizarre assertion that you know that they don't. There is no way to tell.

ar15saveslives · on Oct 18, 2016

A bash script can be made to believe in god. )

jcoffland · on Oct 18, 2016

Obviously a bash script should believe in God. All it has to do is look around itself at the complexity of its world and conclude God must exist. I plan to shred any of my bash scripts that do not figure this out within their useful lifetimes. I will print out and file the ones that do. I've written a long text file which explains this in terms a bash script can understand and left it globally readable on my system.

ArkyBeagle · on Oct 19, 2016

... or at least root...

psyc · on Oct 18, 2016

I prefer a conceptualization where they are AI, AI is a gradient, and our current tech level is somewhere on the gradient.

windlep · on Oct 18, 2016

I'll admit I'm not very interested in speech recognition of this nature when it can't disambiguate the speaker, i.e. the way Amazon Echo and other voice recognition systems can't tell the difference between a human in the room and the TV, even when one is clearly a female voice and the other a male.

None of the voice recognition systems on the market learn my voice distinctly from my wife's or son's, and I don't want their speech triggering things by accident (especially my son's), so I don't use any of them.

I'll be more impressed when I can restrict Amazon Echo or one of these assistants to ignoring any voice that isn't at least rather similar to my own, not merely recognizing the words I'm speaking.

ankitbko · on Oct 19, 2016

Well they do have speaker identification. But I don't know how good it is. You may try them out. https://www.microsoft.com/cognitive-services/en-us/speaker-r... They also are having a private preview of CRIS which supports custom acoustic models https://www.cris.ai/

gusmd · on Oct 19, 2016

For what it's worth, the "OK Google" functionality in my Android phone is trained against my voice and does a pretty good job at rejecting my wife's and coworkers' commands.

sean2 · on Oct 19, 2016

For what it's worth, even after some training none of the phones in my office seem to pick up their owner's voice rather than someone else telling their own phone to run a search (we're all mid 30s male).

My phone rarely listens to me until I hold down the home button but one guy I know, who has a slow, deep voice, triggers Google to start listening in normal conversation all the time.

Maarten88 · on Oct 18, 2016

Nice to read this, being someone who uses lots of Microsoft products, but I have mixed feelings: after all these years Cortana still understands 0.0% of my native language (Dutch). Very disappointing, especially seeing that Google has no problems understanding Dutch.

jcoffland · on Oct 18, 2016

What do you think the chances are that Philips is working on this?

Maarten88 · on Oct 18, 2016

Zero. They focus on medical equipment these days, so their research is probably in different areas.

jcoffland · on Oct 19, 2016

They have a dictation product which sports some sort of voice recognition.

eb0la · on Oct 18, 2016

Knowing Microsoft this will be part of Cortana in a few weeks.

I hope it will be integrated soon with the Speech API as well ( https://msdn.microsoft.com/en-us/library/hh361633(v=office.1... ).

bpicolo · on Oct 18, 2016

It's not necessarily the case that they can do it in near-real-time (or near real time at reasonable scale). I would expect the true state of the art to take a while? Maybe part of the definition is real-time, but the article doesn't specify that.

ArkyBeagle · on Oct 19, 2016

They'll still call it Cortana?

I rather like Cortana, but it seems to get a lot of hate - comparisons to MS BOB and what not.

Kenji · on Oct 18, 2016

I have read too many human parity claims that left me disappointed to believe this one. Call me a pessimist or a cynic. I'll be very excited when I have the code running on my machine and when I can compose this comment verbally without a hassle.

cellis · on Oct 18, 2016

Ok, when's the next 2GB Xbox One update and will this fix the problem of me saying "Xbox watch NBC", and it 'hearing' "Xbox watch TV"?

wbhart · on Oct 18, 2016

So Microsoft finally have an AI that can "wreck a nice beach". Along with text autocompletion, we are all set for a decade of irritating miscommunication.

_kst_ · on Oct 18, 2016

I thought it was "wreck a nice peach".

raimue · on Oct 18, 2016

> [...] a speech recognition system that makes the same or fewer errors than professional transcriptionists.

How low would the error rate be for humans that can fully concentrate on listening instead of writing at the same time? Unfortunately, that cannot be tested.

nattyice · on Oct 18, 2016

Meaning will always escape us when it comes to language. Not only will there always be a disconnect between the speaker and his or her audience, there will always be a subjective perspective that cannot be tapped into. Can AI ever really be compared to a subjective perspective?

Although the article recognizes that perfection has not been assumed, parity might not even be a capacity.

Conversation is difficult to measure. Take a look at the philosophical viewpoint of Deconstruction. Food for thought.

http://www.iep.utm.edu/deconst/

aisofteng · on Oct 19, 2016

In that case, it's already 100% accurate.

Don't confuse handwaving with science.

chris_st · on Oct 18, 2016

Perhaps the folks at Microsoft's Lync (named after this gentleman, [1], no doubt) or maybe it's Skype for Business now could get some of this research.

We have this at work (alas), and it does "transcription" of voicemail, which it sends as an email. It's easily 90% wrong, regardless of speaker, unless it's a slightly bad connection, when it's worse.

[1] https://www.youtube.com/watch?v=NV9fKUkx76Q

dalys · on Oct 19, 2016

I think it's more impressive when you actually hear a sample from the switchboard task: https://catalog.ldc.upenn.edu/desc/addenda/LDC97S62.wav

From https://catalog.ldc.upenn.edu/LDC97S62

braindead_in · on Oct 19, 2016

I run a human-powered transcription service and I get really excited by such news. Typing is the first step of our process (of four), and any ASR system which can generate even an 80% accurate transcript of a file would be incredibly useful. We have tried several systems, but unfortunately none have been able to get there yet.

swagtricker · on Oct 18, 2016

Time files like an arrow, but fruit flies like a banana.

Wake me up when they can match human recognition of context.

tempestn · on Oct 18, 2016

At first I wondered whether you meant to type it that way to make a deeper point, but I'm assuming it's just a typo.

andulus · on Oct 19, 2016

I wonder if this success helps advance other applications of neural networks? Do these achievements translate easily to other domains, or is it just an isolated case?

loup-vaillant · on Oct 18, 2016

Great. Now Microsoft has the means to store every Skype conversation indefinitely; it's only text, now.

Seriously, great work, but just like facial recognition, this will cut both ways.

dlubarov · on Oct 18, 2016

Compressed speech doesn't take much space anyway. Narrowband AMR uses around 7 kbit/s (depending on the desired quality), or ~1 megabyte for a 20-minute call. The quality isn't great, but it's adequate for most purposes, including speech recognition with reasonable accuracy.
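
For anyone who wants to check that arithmetic, here is a rough sketch. The function name is made up for illustration, and it assumes a constant bitrate and ignores any container or framing overhead, so treat the result as a ballpark figure only.

```python
def call_size_bytes(bitrate_kbit_s, minutes):
    """Approximate size of a constant-bitrate recording, in bytes.

    Assumes 1 kbit = 1000 bits and no container/framing overhead.
    """
    total_bits = bitrate_kbit_s * 1000 * minutes * 60
    return total_bits / 8


size = call_size_bytes(7, 20)         # ~7 kbit/s for a 20-minute call
print(f"{size / 1_000_000:.2f} MB")   # prints "1.05 MB"
```

So the "~1 megabyte" figure holds up under those assumptions; a higher-quality mode would scale the size linearly with the bitrate.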

jarboot · on Oct 19, 2016

How long do you think it is until captioning companies / TRSs such as Captel downsize significantly because of tech like this?

aab0 · on Oct 22, 2016

Supply generates its own demand; by making captioning even cheaper, it can increase the demand for transcription services and people to check it over. There are a lot of podcasts and YT videos that could benefit from transcriptions but it's too expensive now.

mirekrusin · on Oct 19, 2016

Wouldn't just simple "word after list of words" probability help?
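
What's being described here is essentially an n-gram language model, which is a standard component of speech recognizers. A minimal sketch of the idea, assuming a toy three-sentence corpus and add-one smoothing (the corpus, function names and smoothing constant are all made up for illustration, not taken from the paper):

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_prob(counts, prev, word, vocab_size, alpha=1.0):
    """P(word | prev) with add-alpha smoothing so unseen pairs are not zero."""
    following = counts[prev]
    return (following[word] + alpha) / (sum(following.values()) + alpha * vocab_size)


def sentence_score(counts, sentence, vocab_size):
    """Product of the per-word conditional probabilities."""
    words = ["<s>"] + sentence.lower().split()
    score = 1.0
    for prev, cur in zip(words, words[1:]):
        score *= next_word_prob(counts, prev, cur, vocab_size)
    return score


# Toy corpus (hypothetical); a real model would be trained on far more text.
corpus = ["i want ice cream", "we want ice cream", "i scream when scared"]
vocab = {w for s in corpus for w in s.split()}
model = train_bigrams(corpus)

# Two transcriptions that sound alike but differ in wording.
likely = sentence_score(model, "i want ice cream", len(vocab))
unlikely = sentence_score(model, "i want i scream", len(vocab))
print(likely > unlikely)  # prints "True"
```

The two candidates have the same number of words, so the scores are directly comparable; with hypotheses of different lengths you would normally compare log-probabilities together with the acoustic score rather than raw products.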

nicklovescode · on Oct 18, 2016

Is there a demo or video of them using this? Would enjoy playing with it.

mirekrusin · on Oct 19, 2016

...still waiting for an English to/from dolphin translator.

plussed_reader · on Oct 18, 2016

Do I have to use Windows to leverage this new software setup?

botw · on Oct 19, 2016

Off topic but related: is the speech-to-text engine in Android open source? Can it work entirely offline?

dfgonzalez · on Oct 18, 2016

Did someone put it as SaaS already?

skoocda · on Oct 18, 2016

We're close to an alpha release of Spreza, which might be relevant to your question. Look us up! DM me if you've got questions.

braindead_in · on Oct 19, 2016

Very cool. How do you compare to speechmatics?

skoocda · on Oct 19, 2016

Very similar accuracy, timing and alignment. We don't do speaker diarization at all because the results seem consistently weak, even among competitors such as Speechmatics. I'd hazard to say our web editor is much better for providing an end-to-end solution, where accuracy and verification matter.

If I may ask, what software does your team use at Scribie?

EGreg · on Oct 18, 2016

In English, probably.

ahmetyas01 · on Oct 18, 2016

Any video or audio to get an idea of how close they are?