On the CallHome dataset humans confuse words 4.1% of the time, but delete 6.5% of words, most commonly deleting the word "I".
Their ASR system confuses 6.5% of words on this dataset but deletes only 3.3%, so the claim of being better than humans isn't definitely true if you consider the task to be speech recognition rather than transcription.
Also, while the overall "word error rate" is lower than the humans', it's not clear whether that is because the transcription service they used aims for good-enough output rather than perfect output. The errors the transcription service makes may also be less damaging than the errors the ASR system makes, in terms of how well you can recover the original meaning from the transcription.
It's clearly great work, but reaching human parity is marketing fluff.
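To make the substitution-versus-deletion point concrete, here is a minimal sketch of how a word error rate can be split into its parts. It is not the scoring tool used in the paper; the function names and the example sentences are made up for illustration, and the alignment is a plain word-level edit distance.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for one minimum-edit alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)      # delete every reference word
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)      # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)            # words match, no edit
            else:
                diag = (t + 1, s + 1, d, n)    # substitution
            t, s, d, n = cost[i - 1][j]
            delete = (t + 1, s, d + 1, n)      # reference word dropped
            t, s, d, n = cost[i][j - 1]
            insert = (t + 1, s, d, n + 1)      # extra hypothesis word
            # Tuples compare on total edits first, so the total stays minimal.
            cost[i][j] = min(diag, delete, insert)
    _, subs, dels, ins = cost[-1][-1]
    return subs, dels, ins


def wer(reference, hypothesis):
    """Word error rate: all edits divided by the number of reference words."""
    subs, dels, ins = align_counts(reference, hypothesis)
    return (subs + dels + ins) / len(reference.split())


# Hypothetical example: one transcript drops a word, the other confuses one.
reference = "i think i will go home now"
dropped_word = "think i will go home now"       # (0, 1, 0): one deletion
confused_word = "i sink i will go home now"     # (1, 0, 0): one substitution
```

Both made-up transcripts score the same WER (one edit over seven words), even though one merely drops an "I" and the other replaces a word with a similar-sounding one.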
Conversational speech has been very difficult. The Switchboard task has been around for over 20 years. We used the exact same published data and evaluation protocols. The human parity claim on the Switchboard task is as scientific as we can be, to the best of our knowledge.
I was recently at a bar where they showed a movie with incomprehensible subtitles (English to English). I assume this was because they skimped and bought automatic subtitling.
I think one important aspect is while humans miss words, they often get the sentence meaning correct. When computers miss words, they tend to substitute words that sound similar. That's readable if you have time but not necessarily as a stream of text going by...
It might have also been a 'bootleg' DVD from China, or a version downloaded from there. I've had quite a few where they had done a horrible job translating into Chinese for the subtitles already, and then did a literal machine translation back to English.
The character on the screen said "Hello", the English subtitle said "You good." Which would be the literal translation of nihao.
From an ML perspective, recently there's been a lot more work done on language modeling (e.g. the billion word corpus). Good language models can usually fix those types of errors. So hopefully you can expect to see that change in the future!
> I was recently at a bar where they showed a movie with incomprehensible subtitles (English to English). I assume this was because they skimped and bought automatic subtitling.
I don't think that is possible. Like, worse subtitling isn't even an option.
I'm not an expert on this at all, but my suspicion is that humans are more tolerant of missing words than we are of mistaken words. Especially a word like "I", we just assume it if it's missing.
So if this speech recognition is for creating a transcript for humans, this (in my uneducated opinion) isn't as good as humans do, at least not yet.
If nothing's changed in the past 15 years, anything under 95% accuracy is not accurate enough for automation without human intervention (copy editing).
Looks like they used various neural networks and trained based on data they currently already transcribe professionally. So it had to be trained and only represents a subset of people who speak and have their works transcribed at Microsoft events.
Being Microsoft, I'm sure the place is diverse, but there's no mention in the paper on accents or dialects that I can see (might just be missing it).
I look forward to being able to converse with Microsoft's research team as easily as I can with humans. I hope that one day, journalists can learn to write headlines with similarly low rates of error.
Linking the image instead of the site causes more people to miss out on the red-button extra comic frame, and on the meta hover text (xkcd style), which there usually is.
Wait a second... I just realised I've been reading SMBC for years now and I've never noticed the red button! I'm mad at how much I must have missed out on but, at the same time, I'm glad you pointed that out!
I wasn't sure which way round to parse it (ironically). Are they saying that Microsoft researchers are now almost as good at recognizing conversational speech as humans? Or that Microsoft researchers can now almost pass as human in conversational speech? Either way, good news for Microsoft Research, I think.
The term "human parity" refers to a comparison of the error rate, which is a single scalar summarizing performance in terms of mistakes made. It says nothing about the kind of mistakes, and I can easily imagine that machines qualitatively do not make at all the same kind of mistakes as humans. I'd be curious to know if the kind of mistakes machines make might strike human listeners as quite stupid, but maybe not: many algorithms are getting better at taking context and prior knowledge into account.
In my mind, this is analogous to the reasons why evaluation of lossy audio compression codecs requires human listening tests. Simply running some simplistic signal analysis like SNR (Signal-to-Noise Ratio) completely fails to capture the as-perceived quality of a compression implementation.
To explore that analogy: In the case of lossy audio compression, the compressor deliberately introduces quantization noise into the signal. It does so by running a "psychoacoustic model", which attempts to capture a broad quality of human hearing called auditory masking. There's a number of different kinds of masking[1]: a strong tonal sound creates an "umbrella" across nearby frequencies that can mask quieter noise-like sounds. Similarly there's noise-vs-noise masking, as well as forwards- and backwards- temporal masking. (Yes, backwards. A sound can mask perception of a sound that occurred before it.)
In the audio compression case, we've built an algorithm that attempts to characterize exploitable phenomena of human hearing. These masking characteristics aren't perceived quite the same by human listeners as the model, nor even the same between individual human listeners. Thus the need for human listening tests. These, due to the experimental care and human subjects required, are expensive.
Back to speech transcription. Say the end goal is "how well does a human comprehend this transcribed speech"? (e.g. vs some standard, such as the original speech, vs. the original speech transcribed by a skilled specialist, etc.) The problem starts to look pretty similar. We can cite numerical, word-centric error rates, but that fails to capture how well meaning is preserved and transmitted. Imagine a perverse algorithm that did a perfect transcription, but then dropped or altered words for maximum meaning obfuscation. It might equal or even beat the cited error rates but be much harder to actually comprehend.
Which isn't substantially different from evaluating a codec on introduced SNR.
These classic signal processing analyses tell you nothing about the correct operation of a psychoacoustic model codec design (i.e. MP3, AAC, Vorbis, etc.) An analysis of A (original signal) vs. A' (signal passed through a compression-decompression cycle) is just extracting the quantization noise introduced by the codec. That provides no information about how effective the codec was in masking that noise with the original signal content.
To illustrate, imagine a perversely designed codec: it runs two models, the first "good model" is a normal psychoacoustic model. The second "bad model" is the one the compressor uses: it applies the total amount of quantization noise allowed by the good model, but applies it in ways that are maximally annoying to human listeners. This isn't just avoiding masking, it's things like using noise correlated to the original signal, which is generally more obtrusive than uncorrelated noise, etc.
A codec using just the "good model" and one using the "good + bad model" would have (by definition) exactly the same introduced noise, but the latter would sound FAR worse to a human listener.
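A rough sketch of why SNR alone can't tell those two codecs apart: two error signals with the same power give the same SNR, whether or not the error is correlated with the original. The signals below are synthetic and purely illustrative (a sine wave, an unrelated tone, and a cubic distortion term), not the output of any real codec, and the code says nothing about which one sounds worse.

```python
import math


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(original, decoded):
    """SNR in dB, treating (decoded - original) as the noise."""
    noise = [d - o for o, d in zip(original, decoded)]
    return 10 * math.log10(power(original) / power(noise))


def scale_to_power(x, target):
    """Rescale a signal so its mean squared amplitude equals target."""
    gain = math.sqrt(target / power(x))
    return [gain * v for v in x]


def correlation(a, b):
    """Normalized correlation between two signals (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


n = 1000
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noise_power = 0.01  # the same error budget for both cases

# Error A: a tone at an unrelated frequency, uncorrelated with the signal.
error_a = scale_to_power(
    [math.sin(2 * math.pi * 173 * t / n + 0.3) for t in range(n)], noise_power
)
# Error B: a cubic distortion of the signal itself, strongly correlated with it.
error_b = scale_to_power([v ** 3 for v in signal], noise_power)

decoded_a = [s + e for s, e in zip(signal, error_a)]
decoded_b = [s + e for s, e in zip(signal, error_b)]

snr_a = snr_db(signal, decoded_a)
snr_b = snr_db(signal, decoded_b)
```

Here `snr_a` and `snr_b` agree to within floating-point error, while `correlation(signal, error_a)` is near zero and `correlation(signal, error_b)` is close to one. The single number hides exactly the property the "bad model" would exploit.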
I am not familiar with the term "introduced SNR"... Googling.
No joy.
I can't really follow your thinking above; sorry.
In audio, there is distortion, which is correlated with the signal. Noise is uncorrelated. Codec error would seem very much to me to be at least much more correlated than uncorrelated. MP3 artifacts sound at least more "phase-ey" than they sound anything like quantization noise (at least to me). This may be because I have heard badly aligned tape machines make that sort of error happen. "Phase-ey" also triggers my (feeble) mind into thinking about allpass filters as the model for the error.
In terms of intelligibility, it's possible to improve intelligibility by adding noise, and by adding clipping - aviation comms does this at times. What destroys intelligibility is phonemes being destroyed by phase changes and bad amplitude errors (where there are actually good amplitude errors).
I've run an ABACUS voice quality analyzer several times, and I don't think it's purely a distortion analyzer - adding clipping at least can improve MOS/PESQ score surprisingly. Even more surprisingly, there's no mechanism available for calibrating gain staging on one.
Ah, that's a mistake, that should have read "introduced [quantization] noise", which reduces the SNR in A' vs A.
If I understand you correctly, your discussion around intelligibility primarily applies to signal processing of voice vs general audio. Codecs such as MP3, AAC, etc. don't have the luxury of making assumptions about the signal content, and so aren't designed along those principles. E.g. speech codecs can generally run well at much lower bitrates than general audio codecs because they operate on a constrained domain of audio (i.e. speech).
Regarding distortion management, see Rate-distortion optimization[1] for lossy audio codecs: where the purpose is to manage distortion within the limits of the bit rate supported by a communication channel or storage medium.
That reminds me of the photocopier that used a very poorly chosen image compression algorithm. It would sometimes substitute similar looking characters to save space. A $7000 figure in a budget might become $9000 and a person looking over the document would have no indication that the '9' had been cut and pasted from somewhere else on the page.
Google Voice transcription often makes easily detectable mistakes that I can mentally correct using context. Smarter, context-aware mistakes might be worse, if they are also more difficult to detect.
Section 9 in the paper[1] is all about comparing these mistakes between the system and humans. The most common mistakes for humans and the system are in tables 9-11.
We find that the artificial errors are substantially the same as human ones with one large exception: confusions between backchannel words [acknowledgment words like “uh-huh”] and hesitations.
That is the difference they found, but they suspect it might be a result of the different transcription guidelines of the training corpus: we see that by far the most common error in the ASR system is the confusion of a hesitation in the reference for a backchannel in the hypothesis. People do not seem to have this problem.
The other day I asked Siri, "Remind me to pick up [daughter's name]" but it interpreted that as, "Remind me to pick up pussy." A more context-aware engine would realise we don't even have a cat. On the other hand it made me wonder to what extent Siri is trained on real-world data from other users.
That's the one that seems to be using illegal GCHQ wiretaps ... see the (Dutch) article below about an Appen translator getting a private voicemail of her ex to translate.
The novel elements of this have already been released individually, particularly the lattice-free MMI training which can be run through Kaldi's nnet3 configuration.
Also, don't assume that a 0.4% increase means drastically better real-world results. This dataset has been around longer than I have, so by this point Microsoft has just gotten really good at tuning.
Open source Kaldi gives you 7.8%, not very far from 5.9%. You can check the thread discussing the first version of the paper: https://news.ycombinator.com/item?id=12502119
I don't have a background in this area, so I'm likely easily impressed, but this seems really impressive. So does the acknowledgement that there's a lot of work to be done, such as discriminating between speakers and recognition in adverse environments. Yeah, it's Microsoft writing on their own technology, but they addressed in the text the questions I had already in mind from just reading the title. It didn't leave me with the feeling that it's just a marketing piece.
> Still, he cautioned, true artificial intelligence is still on the distant horizon
It's frustrating when technologies like image and speech recognition and robotics are conflated with AI.
If you take the goalposts to have been set by Turing's 1950 "Computing Machinery and Intelligence", the only people moving them are researchers who want to trump up their own work or marketers who want to sell things.
My own taste uses the word "AI" as you do in a permissive way to include simpler tasks (which in themselves are more elemental than useful) like identifying an object in an image and presenting a few straightforward interpretations of sentences. But what Turing stipulated an actual AI would be able to do is "reach parity" with a fully human conversation, with all of our knowledge and values and desires and subtlety of motivation. When you really dig into what goes on in a real conversation---the joking, the shades of meaning, the individual quirks and tribal patterns, the lying, the compassion, ambition, insecurity---I think it's hard not to admit that we're still quite far from that. We may not even want it! But exactly the genius of the Turing test for setting a rough standard for AI is that it demands competences far beyond mere language processing.
Turing wasn't trying there to articulate a single target for AI research, he was responding to the claim that artificial minds were not possible even in principle. His response was basically, try this, if in open-ended conversation you couldn't tell one from a human, would you really still say there's no mind there? As you say, that's a really high bar from where we are, then or now.
I'm sure Turing's own goals for AI were broader than eventually passing the Turing test. He worked on neural nets himself and would've considered progress in perception as partial progress in AI.
Yep! The assertion that Turing was setting a "single target for AI research" in the way you're using the phrase was clearly not what he was doing or, I hope, what I said. My intent was the opposite: to draw attention to the massive range of "targets for research" which are already implicit in the original Turing Test in order to note that saying "there's still lots more to do" is hardly "moving the goalpost".
Just to link up the way you put things with the way I chose to here, his argument for how an artificial mind is possible proposes a reasonable, minimal standard which different sides can agree to---a criterion of, "well if it can do that then sure it's a mind!". And the choice of an open and unbounded conversation as the standard was brilliant because of the massive range of subcompetencies which are required for actually executing it (including, obviously, perception). Which, of course, we gradually continue to plod through in AI research.
I'm not so sure anyone was using the term AGI when the first Matrix movie came out. Anyway, you cannot expect Hollywood to always use the precisely correct term for things scientific without first pushing up your glasses.
I don't get why that's frustrating; without image and speech recognition, AI isn't possible. -- How intelligent could people be without sensory perception? Helen Keller is the exception. Take away the ability for humans to process sound and images as a species, and we wouldn't be nearly as advanced as we are today, even if we still had the same brain structure and mental capacity.
Ray Kurzweil understood this, which is why he paved the way for some of the first speech / image recognition platforms like OCR, fax, etc.
To me speech/image recognition is a precursor/adjunct of AI. You can have the former without the latter, but the latter will never be realized without those.
Lots of good questions about what I mean by AI. I agree my statement wasn't very precise. And to be honest, that's a reflection of me not having a precise definition of what it is, exactly. However, I do think there's a distinction between speech/image recognition and what's commonly understood as AI, with speech and image recognition being sensory input systems, and AI in a more colloquial sense being things such as learning and problem solving at an approximately human level. My statement was imprecise, and given I was expressing frustration at imprecision, my bad :)
Thanks for the feedback! Gives me more to think about.
I'm not saying one can't be smart/intelligent without being able to see/hear, I'm saying if you take away all senses from ALL human beings there won't be any way to communicate at all, or know that others exist, and to learn from them. Learning is what makes intelligence possible--and the passing of information through generations. Helen Keller fought extremely hard to overcome her sensory issues, but she had a good teacher, who presumably had someone else help them or guide them some. -- But to organically learn to speak when nobody can hear you, or to read when you can't even feel the braille, etc. would be nearly impossible.
Intelligence (at least in this case) is being able to take data and draw conclusions from it, but you just can't see such potential when you don't have input. Is a computer not a calculator when there's no software installed? No. It's still a calculator, but it just doesn't have inputs.
One day, we may be able to give sight to the blind, but for now, considering such people without ANY learning capacity as "unintelligent" is still wrong.
Perhaps because "true AI" is an illusion. There is always a better term that has meaning.
And yet people like Ray Kurzweil toss the term around as if it does have meaning. What's the point unless you define it? You might as well reference anti gravity or teleportation.
The Turing test shows AI has little to do with image or speech recognition. An AI that operates in a purely virtual environment would still be revolutionary.
The Turing test doesn't test for intelligence, simply human-like communication. Things like lying, typing errors, or reaction to insult aren't indicators of an artificial intelligence, but are requirements for passing the Turing test.
One of the biggest criticisms of the Turing test is, because it's a behavioral or functional test, there is no way to ensure that the computer is actually thinking intelligently at all, rather than following a very-well-articulated rules engine.
I can think of no better test of whether some specific being is intelligent than fooling another human into thinking they are.
Can you think of a better one?
> because it's a behavioral or functional test
What else would you test for in an AI other than its output and behavior? Conversely: how would you test a human for intelligence other than through its output and behavior?
Surely that depends upon the human? An infant would do a terrible job. An institutionalized vegetable would fail to provide meaningful information.
A not-very-bright cousin of mine has been heard responding to robo-calls. Would his opinion do?
I think 'intelligence' is ambiguous, but many parts of it can be measured to some degree. Let's give the AI the SAT test perhaps? Or a test of hypothetical arguments. Or ask it how it would tell if you were an intelligent being yourself. Anything that's even a little bit 'meta' would confound most 'AIs'.
We can tell if an AI is functional in some environment or responds well to social cues by interacting with it. But not much else. Not how 'intelligent' it is for instance.
> Anything that's even a little bit 'meta' would confound most 'AIs'.
This is precisely why the Turing Test is so tricky for machines. Knowing the parameters, any regular human could test even the most advanced present-day AI and reveal it as a machine with very little effort and only a few questions.
> We can tell if an AI is functional in some environment or responds well to social cues by interacting with it. But not much else. Not how 'intelligent' it is for instance.
The Turing Test does not intend to determine level of intelligence (it's not an IQ test). It is intended to determine whether the intelligence you're interacting with is advanced enough to fool you into thinking they are of human nature.
It is not a test of degree of intelligence, but rather of its nature. Human / not human are the only two possible outcomes.
That last bit is a false equivalence - we have no way with current science to truly introspect the "thought process" of another human, so functional judgment is the only kind we can exercise.
With AI - we built it. We can read the code that's running. We can make a distinction, in cases where output or behavior is identical, whether it's a neural network or a 12-million-line switch statement - and that value judgment means something to our determination of intelligence. There are plenty of ways other than (or in addition to) output to determine the "intelligence" of a machine that we simply can't exercise with other people.
It directly tests for intelligence; you confuse false negatives with false positives. An intelligent person that fails the test because they don't know a common language is fine.
Note: Medical tests generate both false positives and false negatives, they are still useful.
I don't understand what your example here about common language rebuts - that's something that would be solved for in the premise of the test, or it's not a fair test.
Is your argument that that person is still intelligent even if they've failed the test? Because that's the entirety of my point - the Turing test does not return a measure of intelligence, but of communicability.
Or is your point that if such a person fails it doesn't invalidate the test's measurement of intelligence? In a world where the test is implemented correctly, meaning where things like cross-language barriers are accounted for, failing to pass means failing to convince another person that you can communicate like a human would. If you fail, the only way you can consider that a "false negative" would be if you concede that the test is not a sufficient measure of intelligence but of ability to communicate - that's what makes it a FALSE negative.
No, the point is false negatives don't invalidate a test. It requires both intelligence and communication abilities.
Cost and accuracy are obvious trade-offs, but sometimes you are willing to trade, say, cost and false positives to avoid false negatives: for example, a mass screening for HIV in blood samples. In that case you need a cheap test, and while a false positive has minimal cost, a false negative could be deadly.
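As a rough illustration of that trade-off, here is a small sketch with made-up test scores. The scores, labels, and thresholds are all hypothetical; the only point is that moving the cut-off shifts errors from one kind to the other.

```python
def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg


# Hypothetical screening results: a higher score means "more likely positive".
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [False, False, True, False, False, True, True, True]

strict = count_errors(scores, labels, 0.5)    # (1, 1): misses one true case
lenient = count_errors(scores, labels, 0.3)   # (2, 0): no misses, more false alarms
```

A mass screening would lean toward the lenient threshold and follow up the false alarms with a more expensive test; a cheap hiring filter can afford the strict one.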
Hiring is the opposite case. McDonald's wants a cheap test (interview + background check), and as long as they get enough acceptable low-level candidates from their pool, that's enough.
As such, the Turing test can fit the second example. A hypothetical AI could hold a huge range of real jobs even if limited to purely text-based communications.
PS: Let's flip it. A super intelligent AI in a box that can't communicate in any form. Without IO it's indistinguishable from a space heater.
Your last example of the hyperintelligent space heater is exactly my point - the Turing test is a test of the (necessary) precondition of communication, but NOT a sufficient test of intelligence. In your example, the black box AI is assumed to be intelligent, but would still fail the Turing test - because intelligence is not a necessary condition for passing. Communicability IS necessary for the test.
A passing grade on the Turing test just indicates intelligence is possible, but excludes assured intelligence in the absence of communication ability, which is why I argue it is not a sufficient guarantee of intelligence.
You make a good point with the hiring example about the distinction between "intelligent" and "intelligent enough to do some things" - our disagreement may be stemming from us having differing definitions of "an intelligence". Need to think about it.
X is a demonstration of intelligence says nothing about not X.
If you can design something that passes an arbitrary Turing test, then yes, it is intelligent. For example, I could teach it any subject that works in text format. It could then pass an open-ended essay test and produce a reasonable essay.
Now, you could do the same thing with a pig and it would fail the test despite being more intelligent than the average dog. That just means there are limits.
This is one of the practical limits of the Turing Test as usually proposed - the parties could just refuse to jump through the hoop of writing an essay, and that would be a perfectly normal reaction. In practice, attempts to pass the Turing Test have leaned heavily on hand-coded strategies for changing the subject and feigning refusal to co-operate, with some success - for example https://en.wikipedia.org/wiki/Eugene_Goostman
What I find most interesting about that strategy is in practice it fails. The strength is it can push people to wait longer before judging. The weakness is it fails to demonstrate intelligence.
This is also true for people. How do you know humans are thinking and not following lots of social rules they learn thanks to special neural hardware for learning social interaction?
Think about the intelligence conveyed in ritualized greetings.
You are recapitulating Descartes. Consciousness appears to be immeasurable. You can only know that you are conscious. It is polite to give others the benefit of doubt.
The commenter said nothing about consciousness. He simply said that just being able to hold a conversation is not a particularly useful measure of intelligence. The broader form of that is that we don't even know if humans in general are intelligent.
FYI about your edit: You aren't being downvoted because of "knee-jerk culture war" or because people here are too dumb to understand your enlightened view of epistemology. Your original comment is an incredibly unique view with no explanation that people probably disagree with, and your edit makes you sound petulant, like you can't handle disagreement. That's the reason for the downvotes.
I'm being downvoted because my comment spat in the face of those who think general artificial intelligence automatically implies some superior Nietzschean post-human that is qualified to run the perfect communist utopia since it no longer has distracting human emotions like greed or envy. Somewhere, a Sanfraninite spurted himself to completion while watching a live feed of indigenous humans tending to some noble savage rice field he rented on AgroBnb with Bitcoin, mined entirely by solar powered GPUs made out of free trade copper.
When you drag people out of their fantasies, they used to kill you. At least they downvote you these days, so I suppose that's better.
Yes, this is an incredibly unique view. Yes, people are going to react negatively to it because it fundamentally undermines their revenge fantasies. Yes, I can call them out on their unspoken biases no matter how uncomfortable that makes them.
On Westworld, one of the hosts referred to its creator in the presence of the character Robert Ford. Ford declined to identify himself as that person. Rather a nice riff, I thought.
What a useless definition. I could just as easily say that no being (artificial or not) can be considered intelligent so long as it believes in flying spaghetti monsters.
I didn't say "intelligent". I said "AI", as in "artificial intelligence" which implies a simulation of human intelligence. Machine processes already exist that demonstrate, say, ant-like intelligence.
Being made to believe in a god isn't the same as believing in one.
Silicon Valley thinks AI is not an epistemological problem, as if neurons can be perfectly simulated atomically and that all intelligence processes can be categorized as structured vs. unstructured. Very naive conclusions.
Ironically, they BELIEVE if you simulate the axiomatic neuron perfectly, emergent properties of intelligence will mystically emerge after some undefined threshold of complexity. They are permanently unable to simulate the two billion years of neurological evolution and natural selection for such a belief to hold.
If you must emulate human intelligence, you must be able to navigate the realm of distilling reality from the overlap between truth and belief. Mocking how humans observe reality isn't enough.
Abrahamic religion explored the alternative intelligence problem in great depth over two thousand years ago. It's a pity the results have been lost to Progressive axe grinding.
> Abrahamic religion explored the alternative intelligence problem in great depth over two thousand years ago. It's a pity the results have been lost to Progressive axe grinding.
I'll take the bait: What in the world are you talking about?
For Artificial Intelligence to believe in God would be for it to believe in its makers, which are human. A machine would be intelligent to be aware that it was made by man.
A man doesn't need to believe in God to be intelligent. In fact, the two are pretty much inversely related.
>For Artificial Intelligence to believe in God would be for it to believe in its makers //
That's a very limited sense of the idea of a god, and certainly doesn't match with definitions of God [a singular, eternal, omnipotent, omniscient, deity] that I've come across.
Merely making something doesn't make you a god, not even if that thing appears to display intelligence. Some sense of at least one of these characteristics seems like a start for a basic-level definition of a god: existing in a separate spiritual realm, having power/knowledge beyond that possible in the present realm, or having an existence that's not bounded (e.g. physically) within the normally experienced space of the "mortals".
There are a lot of very smart people who are at least at Deist levels of faith. I'd say it's orthogonal. An example? C.S. Lewis.
There is nothing wrong with that sort of metaphor, no matter how much stress you place on it. If you think of religion as a technology, then it can be abused but the abuse doesn't mean it's bad.
Believing in a god is unique to the human species. No other animal does this. Thus, if you want full and complete simulation of human intelligence, this is the ultimate benchmark to test against.
Furthermore, the purpose of this thought experiment is that you CANNOT know what god an AI would end up believing in. That should make you rethink everything you think you know about theology and reexamine it under epistemological terms.
>Believing in a god is unique to the human species. No other animal does this.
Do you have some evidence of that? And even if it's true, is it the only thing that's unique to humans?
Indeed, I suspect the interesting part of "believe in god" in terms of intelligence is probably "construct narratives". Assigning a high probability to those narratives reflecting reality (i.e. believing) seems like an aside to the main complexity involved.
>"The anthropologists got it wrong when they named our species Homo sapiens ('wise man'). In any case it's an arrogant and bigheaded thing to say, wisdom being one of our least evident features. In reality, we are Pan narrans, the storytelling chimpanzee." - Terry Pratchett, The Science of Discworld II: The Globe
The best case for animals demonstrating religious behavior (which is fundamentally different than believing in a god, mind you) is from a very convoluted interpretation of a food coma.
Outside of that considerable stretching, it doesn't exist. Also, I challenge your premise that conflates "belief" with "constructing narratives".
Do you believe you will be alive tomorrow? If so, are you constructing a grand-weaving narrative... or simply making a singular belief? Or, more simply, are you just extrapolating off of past experiences?
>Also, I challenge your premise that conflates "belief" with "constructing narratives"
I didn't conflate belief and narratives, I conflated belief with assigning high probability to those narratives. Sure, there are also simpler predictions we can estimate as well, but I think the thing that makes "belief in god" interesting from the perspective of judging intelligence is the complexity of the narrative.
>Outside of that considerable stretching, it doesn't exist.
What evidence would convince you that an animal "believed in god"?
To kill two birds with one stone: as soon as animals demonstrate something more akin to tribal-level human religious worship (the grand fusion of self, environment, social, and mystery into the realm of higher powers that must be feared) and not postmodernist definitions of worship (the grand disconnection of the self from all other factors to appeal to an idealized self that can never manifest; thank you, CIA involvement in the postmodernist art movement and their unique ability to capture scary mommies everywhere), then I'll consider the idea that animals are demonstrating religious behavior.
And furthermore, if animals DO have religious behavior, then artificial intelligence research across the entire board is WOEFULLY inadequate to represent such a core part of the neurological interaction that generates religious behavior... a core behavioral capacity that was somehow missed in every single behaviorist research paper ever published since mankind mastered animal husbandry.
You either get in bed with the idea that only humans worship gods or you have to fundamentally throw all of psychology, sociology, and animal behavior studies completely out the window.
I provided the very best and most recent research for the absurd position that animals supposedly have their own animal gods, attend regular communal worshipping ceremonies, engage in animal religious ritual, and demonstrate religious behavior outside of a priori instinct and social fitnesses.
You provided... passive aggressiveness.
And if animals now engage in religious ritual, doesn't that make them stupid for not being atheist like you? I expect several YouTube videos of you trying to convert your cat into the enlightened ways of post-theism.
> I provided the very best and most recent research for the absurd position that animals supposedly have their own animal gods
You did nothing of the sort. You provided a link to a video of an anthropologist describing chimpanzee behaviour and indulging in a little light speculation about why they're doing what they're doing.
Nobody is claiming that animals believe in god(s), rather I'm disputing your bizarre assertion that you know that they don't. There is no way to tell.
Obviously a bash script should believe in God. All it has to do is look around itself at the complexity of its world and conclude God must exist. I plan to shred any of my bash scripts that do not figure this out within their useful lifetimes. I will print out and file the ones that do. I've written a long text file which explains this in terms a bash script can understand and left it globally readable on my system.
I'll admit I'm not very interested in speech recognition of this nature when it can't disambiguate the speaker, i.e. the way Amazon Echo and other voice recognition systems can't tell the difference between a human in the room and the TV, even when one is clearly a female voice and the other a male one.
None of the voice recognition systems on the market learn my voice distinctly from my wife's or son's, and I don't want their speech triggering things by accident (especially my son's), so I don't use any of them.
I'll be more impressed when I can restrict Amazon Echo or one of these assistants to ignoring any voice that isn't at least rather similar to my own, not merely recognizing the words I'm speaking.
For what it's worth, the "OK Google" functionality in my Android phone is trained against my voice and does a pretty good job of rejecting my wife's and coworkers' commands.
For what it's worth, even after some training none of the phones in my office seem to pick up their owner's voice rather than someone else telling their own phone to run a search (we're all mid 30s male).
My phone rarely listens to me until I hold down the home button but one guy I know, who has a slow, deep voice, triggers Google to start listening in normal conversation all the time.
Nice to read this, being someone who uses lots of Microsoft products, but I have mixed feelings: after all these years Cortana still understands 0.0% of my native language (Dutch). Very disappointing, especially seeing that Google has no problems understanding Dutch.
It's not necessarily the case that they can do it in near-real-time (or near real time at reasonable scale). I would expect the true state of the art to take a while? Maybe part of the definition is real-time, but the article doesn't specify that.
I have read too many human parity claims that left me disappointed to believe this one. Call me a pessimist or a cynic. I'll be very excited when I have the code running on my machine and when I can compose this comment verbally without a hassle.
So Microsoft finally have an AI that can "wreck a nice beach". Along with text autocompletion, we are all set for a decade of irritating miscommunication.
> [...] a speech recognition system that makes the same or fewer errors than professional transcriptionists.
How low would the error rate be for humans that can fully concentrate on listening instead of writing at the same time? Unfortunately, that cannot be tested.
Meaning will always escape us when it comes to language.
Not only will there always be a disconnect between the speaker and his or her audience, there will always be a subjective perspective that cannot be tapped into. Can AI ever really be compared to a subjective perspective?
Although the article recognizes that perfection has not been assumed, parity might not even be a capacity.
Conversation is difficult to measure. Take a look at the philosophical viewpoint of Deconstruction. Food for thought.
Perhaps the folks at Microsoft's Lync (named after this gentleman, [1], no doubt) or maybe it's Skype for Business now could get some of this research.
We have this at work (alas), and it does "transcription" of voicemail, which it sends as an email. It's easily 90% wrong, regardless of speaker, unless it's a slightly bad connection, when it's worse.
I run a human-powered transcription service and I get really excited about news like this. Typing is the first step of our process (of four), and any ASR system that can generate even a roughly 80% accurate transcript of a file would be incredibly useful. We have tried several systems, but unfortunately none have been able to get there yet.
I wonder if this success helps advance other applications of neural networks? Do these achievements translate easily to other domains, or is it just an isolated case?
Compressed speech doesn't take much space anyway. Narrowband AMR uses around 7kbit/s (depending on the desired quality), or ~1 megabyte for a 20 minute call. The quality isn't great, but it's adequate for most purposes, including speech recognition with reasonable accuracy.
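To sanity-check that figure, here is a quick back-of-the-envelope sketch. It assumes a constant 7 kbit/s stream and ignores container and framing overhead; the function name is just made up for illustration.

```python
def audio_size_bytes(bitrate_kbit_per_s, duration_min):
    """Approximate size of a constant-bitrate audio stream, in bytes.

    Assumes 1 kbit = 1000 bits and no container/framing overhead.
    """
    total_bits = bitrate_kbit_per_s * 1000 * duration_min * 60
    return total_bits / 8


# 20-minute call at roughly 7 kbit/s
size = audio_size_bytes(7, 20)
print(f"{size / 1e6:.2f} MB")  # 1.05 MB
```

So 7 kbit/s for 20 minutes is 8.4 Mbit, or about 1.05 MB, which matches the ~1 megabyte estimate.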
Supply generates its own demand; by making captioning even cheaper, it can increase the demand for transcription services and people to check it over. There are a lot of podcasts and YT videos that could benefit from transcriptions but it's too expensive now.
Very similar accuracy, timing and alignment. We don't do speaker diarization at all because the results seem consistently weak, even among competitors such as Speechmatics. I'd hazard to say our web editor is much better for providing an end-to-end solution, where accuracy and verification matter.
If I may ask, what software does your team use at Scribie?