"As Axios[0] noted Thursday morning, there was something a little off in the conversations the A.I. had on the phone with businesses, suggesting that perhaps Google had faked, or at least edited, its demo. Unlike a typical business (Axios called more than two dozen hair salons and restaurants), the employees who answered the phone in Google’s demos don’t identify the name of the business, or themselves. Nor is there any ambient noise in Google’s recordings, as one would expect in a hair salon or a restaurant. At no point in Google’s conversations with the businesses did the employees who answered the phone ask for the phone number or other contact information from the A.I. Further, California is a two-party consent state, meaning that both parties need to consent in order for a phone conversation to be legally recorded. Did Google seek the permission of these businesses before calling them for the purposes of the demo? Was it staged in the simulated manner of reality TV?"
Two-party consent laws raise an interesting point: how could Duplex operate legally in California?
Google probably wants recordings of each call, so they'd have to include a preamble. Phone systems that say "Your call may be recorded for quality assurance purposes" usually:
1. Make you press a button before you speak to a human, thereby recording your consent.
2. Don't record before that point (since you're still in the phone tree).
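Those two steps could be sketched as a tiny state machine. The `Call` class below is a simulated stand-in, not any real telephony API; the point is only that recording never starts until the consent keypress has been captured:

```python
# Sketch of a phone-tree consent flow: recording starts only after the
# caller explicitly presses a key, so the consent itself is on the record.
# The Call class is a simulated stand-in, not a real telephony API.

class Call:
    def __init__(self, keypresses):
        self.keypresses = list(keypresses)  # simulated caller input
        self.recording = False
        self.connected = True

    def play(self, prompt):
        pass  # a real system would play the prompt audio here

    def wait_for_keypress(self):
        return self.keypresses.pop(0) if self.keypresses else None

    def hangup(self):
        self.connected = False

def handle_call(call):
    call.play("This call may be recorded for quality assurance. "
              "Press 1 to consent and continue.")
    if call.wait_for_keypress() != "1":
        call.hangup()          # no consent given: never start recording
        return
    call.recording = True      # consent captured before any recording began
    # ...transfer to a human agent here...

consenting = Call(["1"])
handle_call(consenting)
print(consenting.recording)    # True
```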
Seems unlikely Duplex will pass as human, for legal reasons.
Honest question: is one-party consent satisfied if the party that has "given consent" is an AI bot? I could see how someone would argue that there was only one party on the call capable of consent, and that he/she (the human caller) did not give it.
Another hypo: What if my AI bot calls the restaurant and talks to their AI bot? Is that a call with zero parties? Can consent for recording be given at all?
My guess is that it depends on the definition of "party". Cue the lawyers.
<IANAL> I would assume that the bot is operating on behalf of someone and it is their consent that matters. So if bot gives the consent in their name, it would be the same as if they gave the consent themselves. </IANAL>
So would that mean the person asking the bot is the one giving consent, or would it mean Google is the one giving consent? Or is the person giving Google permission to give consent? Would Google give permission for the recording of the call?
I believe generally the US courts have decided that the person who triggers the action is the pertinent party.
Something similar has come up in the firearm fabrication community. There are companies that sell things that are legally paperweights, but are 80% of the way towards being a firearm (e.g. [1]). There are CNC mills that come with programming to take one of these "80% lowers" and finish it, making a fully functional firearm, or at least the part that is considered a firearm by the BATF; other parts can be ordered and shipped online (e.g. [2]).
In this case it's not the builder of the machine or the writer of the code that instructs it that is the "manufacturer" of the gun. It's not even the person who placed the unfinished lower in the enclosure and bolted it down. Instead it's the person who pushed that button.
I'm going to guess that this will be similar. The user who pushes the button (or asks assistant to make an appointment on their behalf) is likely to be the one giving consent.
I'm the person who originally posed this question, and I actually am a (former) lawyer. I think these are all good analogies, and courts would think about things like this.
But neither the laws nor the judges will be uniform. The laws use different words to say slightly different things, and will have different legislative histories (the record from the officials who voted them into law).
Some judges will look to the actual words to interpret the law, and others will look to the legislative history. Still others might desire a "living Constitution" approach — applying the laws in a way that they think makes sense in today's world.
Considering the dozens of laws and thousands of judges that could opine, there will likely be considerable uncertainty in this area for years to come.
I think they may be able to get around two-party consent laws if they never save the audio stream to disk, and only record the output of their audio processing function.
Interesting question - would love to hear an opinion from a lawyer.
Side question: What makes recording voicemail legal? Is there just a presumption that people understand that voicemail is a recording? I don't think I've ever heard a two-party consent warning before leaving a message.
Does the beep legally function as informing the other party?
Would it be illegal to record voicemail without the beep?
Would be cool if the government created a "recording consent" sound, so you could play it in lieu of spending 5 seconds to explain that you'll be recording. I suspect a beep does not qualify in most situations, would need to be more distinctive.
Saying "leave your message after the beep" seems unambiguous, but are fake messages where someone pretends to answer the phone in a legal gray area?
(I'd guess voicemail is a legally-ambiguous loophole that was grandfathered in because so much of the population understands it, but curious)
>Two-party consent laws raise an interesting point: how could Duplex operate legally in California?
I wondered how this worked with things like Google's earphones that (supposedly) translate from one language to another. Those translations all get recorded and all get sent to Google's cloud (and, likely, get stored there too).
Does Duplex necessarily save recordings? i.e., does it save actual recordings of the called party's voice, or does it do real-time voice-to-text, never saving the voice after it's decoded?
Sure, there are going to be some ephemeral copies made in codecs, DSPs and such, but my cell phone does the same.
Google may want recordings so they can evaluate the system after a call, but they don't have to make them.
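That design could look something like the sketch below. `transcribe` is a placeholder for a real speech-to-text engine; the point is only that the raw audio never outlives the processing loop:

```python
# Sketch of a transcribe-and-discard pipeline: each audio chunk is decoded
# to text and then dropped, so only the transcript (not the voice) is kept.
# `transcribe` is a placeholder for a real speech-to-text engine.

def transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")  # stand-in: a real recognizer goes here

def process_stream(audio_chunks):
    transcript = []
    for chunk in audio_chunks:
        transcript.append(transcribe(chunk))
        # chunk goes out of scope here; nothing writes the audio to disk
    return " ".join(transcript)

print(process_stream([b"table", b"for two", b"at seven"]))
```

Whether a court would treat the ephemeral buffers as a "recording" is exactly the open legal question upthread.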
How does this work if someone in the house answers their cell on speaker and Alexa records that interaction? Maybe there's an exemption for inadvertent recording? Or maybe their "recording" includes enough abstraction that it qualifies as "detailed note taking"?
I would like to see a law mandating that all bots emulating any amount or type of artificial intelligence announce themselves at the start of the conversation.
The calls were likely edited. As far as two-party consent, maybe they called businesses outside of California... maybe they got consent after the call. Either way, something still seemed weird about the calls.
It doesn't matter if the business was outside of California. The California Supreme Court has ruled that if one side of the conversation is in California (e.g., Google calling from Mountain View) then California law applies. See Kearney v. Salomon Smith Barney, Inc., 39 Cal.4th 95 (2006).
Consent after the call would be enough to exempt someone from liability under California's privacy act.
Fair enough, maybe the calls originated from outside California. Either way, I think we can see there are some ways to get around the two-party requirement. But, there's still something fishy about the calls.
Google refused to provide the businesses' names to Axios (even with a guarantee not to publicly identify them), and refused to answer whether the calls were edited.
If the calls were real, I wonder how many recordings it took to get these perfect examples. How many times were the appointments scheduled correctly, and how many failed?
But this was just a flashy demonstration to drum up excitement over the Google brand. Until they write up a paper or release it as a product, we're unlikely to know how well it really works.
Precisely. What happens if the system makes a mistake and tells you you're confirmed when that's not the case? Then you show up for a reservation that doesn't exist and you wish you'd spent 3 mins making the call yourself.
The question will be what is the net time savings? If it works 99 out of 100 times, maybe that's good enough for many mundane tasks. But if it's 75 out of 100, that's almost certainly not worth it.
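A back-of-the-envelope version of that trade-off (the 3-minute call and 2-minute cleanup figures are assumptions for illustration, not measurements):

```python
# Expected time per booking when the bot fails with probability (1 - success_rate)
# and a failure costs the manual call you avoided plus cleanup overhead.
# All figures are illustrative assumptions.

def expected_minutes(success_rate, bot_cost=0.0, manual_call=3.0, cleanup=2.0):
    fail_rate = 1 - success_rate
    # On failure you still make the 3-minute call, plus cleanup overhead.
    return bot_cost + fail_rate * (manual_call + cleanup)

print(expected_minutes(0.99))  # 0.05 min vs. a 3-minute manual call
print(expected_minutes(0.75))  # 1.25 min, plus the hassle of failures
```

At 99% the expected cost is negligible; at 75% you recover only part of the 3 minutes and take on the failure risk, which is why the threshold matters so much.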
I'm not even sure 99% would be enough for my tastes. Imagine an important date (anniversary or similar) with your S.O. If there was even a 1% chance that didn't happen because I was too lazy to call I'd certainly be hesitant to use it. And if I can't use it habitually for the task of scheduling, I probably will continue to do what I do now and just make the call.
I think human perception of technology is often dominated by annoyance. From a little bit of research I did years ago, 99% accuracy was nowhere near good enough to satisfy customers of handwriting recognition--that's about one word mistake per paragraph.
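The arithmetic behind that figure, assuming a paragraph of roughly 100 words (the paragraph length is my assumption):

```python
# Expected word errors per paragraph at a given recognition accuracy,
# assuming ~100-word paragraphs.

def errors_per_paragraph(accuracy, words=100):
    return (1 - accuracy) * words

print(round(errors_per_paragraph(0.99), 2))   # 1.0: one mistake per paragraph
print(round(errors_per_paragraph(0.999), 2))  # 0.1: one per ten paragraphs
```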
Consider the new Apple keyboards, which attract hundreds of intense complaints every time they come up here on HN. Apple Insider [1] estimated the percentage of service tickets that were attributed to keyboards:
2014 - 5.6%
2015 - 6%
2016 - 11.8% (first year of new keyboard)
2017 so far - 8.1%
So, an approximate doubling of the prevalence of keyboard problems in the first year has been enough to convince a lot of people that the entire product is an unmitigated disaster.
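For what it's worth, the "approximate doubling" follows directly from those figures:

```python
# Year-over-year change in the share of service tickets attributed to
# keyboards, from the Apple Insider figures quoted above.

rates = {"2014": 5.6, "2015": 6.0, "2016": 11.8, "2017 (partial)": 8.1}
ratio = rates["2016"] / rates["2015"]
print(round(ratio, 2))  # 1.97: roughly a 2x jump in the new keyboard's first year
```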
One of the very interesting facts in that Apple Insider study is that the total number of tickets actually decreased from 2015 to 2016. If the keyboard was actually a design disaster, you would expect the total number of warranty service tickets to climb, but they didn't.
I think people will use these services for less-important tasks at first, and use more reliable/verifiable means for making important reservations like anniversaries.
Sundar Pichai specifically mentioned that it was a real-time call. It is highly unlikely that a person in that position would flat-out lie on camera. However, it was indeed odd that they avoided actually showing a person making a call from the stage. I tend to think Pichai told a white lie: the call was "real," but recorded earlier, perhaps after several trials in a well-conditioned environment.
Googler here. I've seen a few other demos, and there was even an internal dogfood, although I didn't use it myself. But I guess you could still choose not to take my word for it, since I'm just a random guy on the internet. Just wait until it launches, I guess.
Also, I distinctly remember on multiple occasions having to ask "is this <XX restaurant>?" when making reservations in the past. Does this really never happen to people that it feels more likely that Google is faking high-profile demos in the keynote that it can't actually deliver on? And wouldn't they want to pick demos that didn't include or redacted the name of the restaurant anyway?
No one is saying it was entirely fake, just that it was likely a staged event, perhaps even scripted. Maybe they had talked to that restaurant before hand and told them this was happening, then they edited it afterward to get rid of delays.
There's nothing entirely wrong with that, but not being open that such a thing occurred makes you look very dishonest...and if you can't answer basic questions about the event, you're probably hiding something. The tech isn't that different from what some telemarketing companies do, so it's not like people are saying this is impossible (some are very convincing.)
Of course it was a fake. The responding party was conveniently using sentences with all the keywords needed; otherwise the whole conversation would have gone wrong almost immediately.
Additionally - I have multiple google homes at my place. If you want to know how that demo would go in real life it would be something like this:
- (G) Hello, I want to make appointment for my client.
- Sure, what would you like and when?
- (G) Sorry, I don't understand
- What service would you like to set an appointment for and on which date?
I'd imagine Google assistant and duplex are two completely separate systems that share some common machine learning components so talking to your Google assistant doesn't mean anything useful necessarily.
Have you ever worked in a corporation? A corporation can work simultaneously on three extremely similar projects, and each team might not even know the other two exist. Not due to secrecy, just due to mess.
We're talking about assistants here (you may replace "assistant" with "AI"). There are many, many teams working on different aspects of it, but it's a single product in the end. It's not the same as spawning yet another chat app with another team. You may have different teams implementing similar features, but they would still be sitting on the shoulders of, and powered by, the same assistant/product. And I have a general idea of what it's capable of at the moment (see the original comment above).
Everyone who thinks there are multiple distinct assistants being worked on at Google (as opposed to separate features powered by THE ONE) - Apple would like to have a word with you. Yes, it's just like the chat apps - another team just needs to start thinking about promotions.
And since I've been caught talking nonsense let me explain how I (as an outsider) got these crazy ideas:
I overestimated the complexity involved in creating the smarts behind it; Siri and Bixby misled me. Glad to know that Google Assistant and Duplex are two absolutely different products and Duplex is not a new feature on the same (often confused by the same voice commands) foundation.
And I thought that, due to all the complexity of creating this original assistant/AI, it would be a high-level product where, even in messed-up corporate environments, no duplicate efforts would slip through (we all know how dysfunctional big companies are - or maybe it's just Google (they have multiple chat apps) and that's how they approach all their initiatives). Of course I was wrong, and they have multiple assistants being created. It's just a matter of having one more team tasked with it.
Those Apple and Samsung, looks like they're not even trying!
And about the submission itself - it was an absolutely real call to an unsuspecting regular person, who just happened to talk in that particular way, similar to how I try to provide all the context when doing web searches. (The funny thing is that it doesn't even matter - the demo's purpose was not to make a call to a real restaurant; it was just showing functionality being developed. They could even have put someone on stage to take the call, but they decided to insist that the demo call was an absolutely regular one instead.)
Anyway, I hope everybody is happy and not all worked up anymore. I was wrong on the internet, was put in my place, and I've accepted that.
They store billions of lines of code in a monorepo. If they don't have an organizational structure and automated testing tools to make sure they aren't having multiple teams of expensive engineers duplicate work, they might as well close up shop right now.
It's true. You're making a ton of unsupported statements.
Is it so crazy that they would have two teams, or more, working on something like this? Not really, especially if you're familiar with how Google does things (how many chat apps have they had? do you think those were from one team?).
You're making statements like they're facts but you haven't backed anything up at all.
There is no concrete evidence supporting the theory that the demo was faked, and you have multiple first-hand accounts, even just in this thread, that Googlers have been using this service already. So everyone is lying?
Just tried calling a bunch of restaurants in my neck of the woods. They all identified the restaurant right off the bat, along with the person speaking.
Is it some weird Bay Area thing that businesses don't identify themselves when they answer a call?
I have noticed that some Chinatown restaurants don't tend to identify themselves when I call in. I think part of it is a language / cultural barrier. It's a lot different calling a chain where they have huge manuals on how to interact with customers vs an immigrant family run business. Is that not the case for your Chinese restaurants?
More likely, though, I think they edited out the initial introduction to give the businesses privacy. They also probably told the businesses beforehand that there was a chance they would be calling in the future with bots, to get around California's two-party consent laws on recording (or maybe they got approval afterwards, although I'm not sure that's legal).
That's the point. "What you're about to hear is an actual call." ("...what you're not about to hear is that we have heavily edited it"? Or perhaps "lightly edited it"? Well, nobody knows - and since Google ain't telling, it is reasonable to assume the former.)
I think faking is the wrong word here, but it seems entirely possible that the feature was demonstrated using ideal conditions, etc. Consider that we still haven't seen the Google Photo object removal feature demoed last year.
Every company showcases ideal conditions during product demos. Literally every iPhone interaction during a WWDC has been totally scripted. I don't know why people would assume a company would do anything other than show their product in the best possible light.
Although for a product like this, having it work in less than ideal conditions is showcasing their product in the best possible light. I'd agree with you if they were doing a live demo.
Actually, it seems like the iPhone demos are pretty natural. In the last one, Craig had an issue unlocking with Face ID, so he had to switch to a backup device. Seems pretty authentic to me.
To be fair, there is a long and storied history of tech companies juicing their demos in various ways. IIRC, the original demo of Microsoft's OLE technology was a carefully constructed simulation, as were parts of the iPhone's demo at its unveiling in 2007.
As long as the final product works as advertised, no one seems to much care.
The biggest difference is that most software is not trying to figure out the vagueness of human language. It's much easier to get a few apps working and fix a few bugs than to get natural language processing working with a high level of accuracy.
I think another way to word the question is that if they're so confident with the technology, why didn't they demo it live? Why the recording? And why clearly edited recordings?
I wouldn't trust a live demo in that situation even without the AI in the mix. You have thousands of people in the room and millions online and you get "Hold please" when calling a business. Great demo.
"Dear aunt let's set so double the killer delete select all", et al. In other words, a demo is supposed to be awesome...actually running the real thing tends to crash and burn spectacularly. What the uproar seems to be about is that Google insinuates "this is not scripted" (doesn't quite claim it, it's extremely carefully worded: "Google Assistant actually calling a real salon" - never claims it is not rehearsed, just that it's a phone call and a hair salon), while showing something that almost sounds non-fake. Almost almost.
I'll say this again. Currently I have a Google Home hooked up to my home automation system. It has trouble correctly identifying commands for things like turning on my desk light, and it has trouble understanding South African accents when friends visit from there. My Alexa does a lot better with that. Yet somehow I'm to believe that they have made this massive step forward. Google used to under-promise and over-deliver. Now it has been taken over by marketers who tend to over-promise and under-deliver.
The truth is that they still make the majority of their profit from search, but need to look like they have something to back that up to keep the stock price up. Between this and Waymo, it really feels like they don't. Android and the Chromebooks are promising as technologies, and I like the refinements they are offering, but the monetization strategy doesn't seem clear or strong. It is definitely not as strong as Apple's ability to make a profit from selling Macs and iPhones...
Sorry for the irrelevant comment - I've just been looking at your past comments. Looks like your gripe with Google started 4-6 months ago and been continuing ever since. What happened?
The parent comment itself describes dissatisfaction while using Google products. Perhaps this use, or at least the dissatisfaction it inspires, began then? Regardless, this sort of questioning seems unfair.
There are certain things in the original comment that I agree with (in certain situations I too have found Alexa to perform better than Google Home), and certain things maybe not so much (like the statement "Google used to under promise and over-deliver. Now it has been taken over by marketers who tend to over promise and under deliver.").
One of the things I've decided to do these days when I find things that make no sense when I read them, is to figure out if the commenter had any pre-existing biases.
So true. Especially when people who don't do this on a regular basis hold presentations, it's often beyond recognition a) what is going on on the screen and b) what the point is.
A sign of a good manager/exec is their reaction to a demo. If they say "I don't believe you, this looks magical", they might actually be thinking about what it would take to bring your demo to production. If they say "wow that's great, amazing work guys!" it is time to worry.
I guess it's nice to give great feedback when the work was great. I don't see why people sometimes want to obsessively find a problem that doesn't really change things.
But generally I totally agree with you. When the manager only makes positive comments and doesn't engage, the person is not being helpful at all.
Apple also faked their "entire internet" iPhone demo in their 2008 keynote. At the time, if you visited the National Geographic website without Flash installed, you'd be shown a hero banner stating "this presentation requires Flash" with a large download button. Jobs' demo had the banner, but the Flash message was edited out.
What I am getting out of this is that even though it was mostly fake, it still managed to do what it was supposed to which was change the smartphone landscape and turn it on its head.
So probably the calls were faked, but so what? It is a glimpse into the future, it shows what this tech can do maybe not now but a couple of years in the future.
Is it dishonest? Yes, but then again Google is not the first company to put vaporware out. Microsoft did it first, and that did not stop them from becoming a leader in the industry.
They faked the phone's signal bars to always show five bars, since it was expected that the radio software would crash somewhere in the 90-minute presentation.
>The solution, he says, was to tweak the AirPort software so that it seemed to be operating in Japan instead of the United States. Japanese Wi-Fi uses some frequencies that are not permitted in the U.S.
Not only were these calls not faked, the dogfooders were required to go out and honor their reservations.
Internally, the criticism is brutal, because we want free-form, general, conversation abilities, and we don't consider the demo to be perfect.
But fake?
IMO, if the team wanted to fake a demo, they could have done that years ago when the project started.
But we are Google. We don't fake demos because we don't have to.
[edit: To add to what I said: If you think that a team at Google could fake something at this scale and have the face of the company back it at our most high-profile event of the year, just think of the aftermath internally. The code is available for all of us to study. The design docs are there for us to read. There were thousands of engineers that at least peeked at the codebase even on the same day of the demo.]
I understand that place of employment is a big part of one's identity, but remember that you're a single Googler, not speaking on behalf of 50,000 of us. You should not be using the word "we" the way you are - it's in poor taste.
I disagree with you. The company has an identity. Would I have misrepresented you if I had said that we are the best search engine in the world?
I tried to give a balanced viewpoint so as not to project more than what has been accomplished, but I feel personally offended when I read that we are faking demos, and I instinctively want to defend us to the best of my abilities without revealing non-disclosed information.
I'm looking past your "poor taste" editorialism, and I apologize if you feel that I offended you or singled you, or a part of the company, out.
It is just not a good idea to authenticate yourself as a Googler in these sorts of discussions. If there is doubt in the media about a tech from Google, Google's PR department is more than capable of handling that in due time.
I think part of this guideline applies, and following it should avoid disclosure, embarrassment, or being forced to speak on the defensive of an entire company (not a job that most developers are automatically good at).
> You probably know that our policy is to be extremely careful about disclosing confidential proprietary information. Consistent with that, you should also ensure your outside communications (including online and social media posts) do not disclose confidential proprietary information or represent (or otherwise give the impression) that you are speaking on behalf of Google unless you’re authorized to do so by the company. The same applies to communications with the press. Finally, check with your manager and Corporate Communications before accepting any public speaking engagement on behalf of the company. In general, before making any external communication or disclosure, you should consult our Employee Communications Policy and our Communications and Disclosure Policy.
While a compiler may block you from writing faulty code, the media will just take your faults, and then present them as truths coming from upper management.
He broke no policy here. He made the claim that it was real (the same as Google's public stance as evidenced by Pichai's statements at Google IO) and that he's seen the code (if you've seen enough publicly available Google talks you'd know all code is public for whoever wants to view it internally).
I'm not saying he broke any policy. People will simply misrepresent - or take advantage of - his just well-intended representation.
Just take out of context or read the following with a different job role and see why these guidelines make sense:
> we are Google ... These kind of systems need 99% precision ... I feel it was full of tiny imperfections ... Internally, the criticism is brutal ... It seems that I am "attacked" primarily by fellow googlers ... There are a dozen variations of this question already for TGIF ... the team wanted to fake a demo, they could had done that years ago ... a team at Google could fake something at this scale and have the face of the company back it at our most high-profile event of the year ... Would volkswagen be able to do what they did ... I've been told that there were cases were the human would react by saying "no, you are not a robot, you are human!" when they were told that the caller was a bot
Nobody forced me to defend the company, that should be obvious, plus I'm sure I'm within policy here.
It seems that I am "attacked" primarily by fellow Googlers, which speaks to my point: if this demo were fake, it would have been rightfully torn apart internally.
By the way, did you just copy-paste our internal policy on a public forum?
I am not trying to imply you are against the policy, just that it isn't a good idea to make yourself an accessible target in a "witch hunt".
The media is clearly trying to kick up some shit. They know Duplex is hot, and so they try to find another angle/drama/controversy to continue the clicks-cycle. If they had anything of substance, then plenty of AI researchers would be lining up to be cited, warning against AI-hype and winters. The article would be called: "Google faked its Big A.I. Demo!". Now they are still on the prowl for anyone that will dignify them with a soundbite, be that on social media.
Notice how few Facebookers stepped up here on Hacker News when Facebook was the target of a negative news cycle. There is just no winning, just a lesser of two evils: take a temporary hit to your pride, or let the media and fellow Hacker News posters take everything you say as an official company statement, attacking you and your colleagues while you weren't even directly involved in the project and can do little to alleviate any concerns or lies.
My 2 cents: The demo was not faked. But of course the samples were cherry-picked to make for a good demo. Also, a large part of the negative coverage stems from irrational fear or misunderstanding of AI, futurism potential, and the first uncanny valley for natural conversation.
> a large part of the negative coverage stems from irrational fear or misunderstanding of AI...
Irrational? No. At the heart of this demo is deception. After the deception comes "impressive tech" and all the rest.
Booking hair appointments is one thing, but we all know these systems will be babysitting our children, teaching them new things, and responding to their verbal prompts.
Emotional development in children is crucial for psychological health. FAKE synthetic emotion is not healthy, it's not cool.
Having "Googlers" at the top of the ethics pyramid for AI systems in our homes, is worth a healthy dose of fear and loathing. "We are Google. We don't need to fake demos [of our fake human voice]" is precisely why Google shouldn't be dictating the terms of AI standards around ethical concerns and communication disclosures.
This is about more than hair appointment bookings.
The article and comments you are responding to are about the demo being poorly faked.
You seem to conflate this with another issue with the demo: It was so good that it seemed real, and you deem this to be deception/deceptive.
While I may share some of your concerns, I can't help but compare it to the rants against video games: unhealthy, fake social interaction. Often used by politicians without any scientific backing of their claims.
I do understand the current attraction from the general public to the unsurprising AI research going on at top labs. You don't need any relation to the field to muse about killer robot singularities and 2000 year old ethics philosophy, and no one will brand you a fool like they did to the people warning about earth-eating black holes at CERN.
No, I was responding to your comment, and your comment only, which is why I quoted you. I'm not conflating anything. Your tangent about "irrational fear" is what I was responding to, there's nothing more I can add to make that clearer.
You mention video games. In a video game, the interactions via voice chat, or text chat are person to person. I am not aware of any politicians calling that "fake". It's not related though, as online multiplayer gaming is a sub-set of a specific type of digital activity, whereas AI and bots and voice recognition is all-pervasive. It will be everywhere, and deserves scrutiny because you won't need to be a "gamer" to be exposed to this technology. IMHO it's vital we continuously examine the ethical concerns.
I don’t think anyone is suggesting it’s “fake” in the sense of complete vaporware. It’s that the demo gives the impression that it works great already and is nearly ready to launch, but we suspect it may be much less complete and much further off than that. The various suspicious details back that up.
It’s not like there’s a conspiracy of silence between yourself and hundreds of other employees; more that hundreds of people all independently have over-inflated faith in the product and all independently cut corners and “cooked” the demo in various small ways. I can easily see that happening in an intense, secretive team, even if every individual’s intentions are basically good.
From my perspective it seems unfortunate that Google has mishandled these questions to the extent that we're seeing stories saying the demo was fake. Perhaps the goal is to generate a narrative that the technology is so astonishing that everyone said it was fake.
Was the demo that amazing, though? I feel it was full of tiny imperfections. (Not to be a nihilist; it was a big step forward, for sure.)
Anecdotally, I've been told that there were cases where the human would react by saying "no, you are not a robot, you are human!" when told that the caller was a bot, but I haven't been able to verify this.
The demo was not amazing. Take the original Eliza/Doctor program. Take the "Hello, this is Lenny!" program/recordings. Mix those with some basic speech recognition...
The software might be good, better than what other teams would have produced, but I can't really tell that from a couple of carefully selected (and edited) recordings.
Think about what Volkswagen did. Companies can, at very high levels, commit incredibly blatant illegal acts on a massive scale, and can successfully cover it up for a significant amount of time.
I suspect you'd be surprised what skeletons Google has in their closet.
Why do you think Google is so scared to answer journalists' questions, if you are confident in the veracity of the demo?
Oh no, as an individual, I cannot. But 1,000 engineers looking at it, even if they don't understand it in full, would be able to tell whether it's fake or not.
How could we say the code is fake? We don't have access to it. The code is not under discussion here. That the "demo" was fake is another matter entirely.
Note that the existence of code in a repository (even code that "works") does not mean the demo presented at I/O wasn't staged or heavily edited.
Again, if everything is above board here, why isn't Google answering questions? Perhaps pose the question at TGIF? If they have nothing to hide, why risk looking bad by not answering?
This reminds me of the advice: Don't write articles that can be answered with a single word "no".
Unlike startups, it's not like Google's business depends on investors supporting lofty goals. There would seem to be no benefit to faking a demo like this.
Not revealing the business location is likely just because they consider their business partners to be business data they don't want to give away to competitors. Clean audio could easily be achieved via a simple noise filter, just for the sake of the demo. When you have a little noise in your ear, it's not bad, but when you need to broadcast sound to an entire stadium, you need to rebalance it. This story is a load of nonsense.
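To make that concrete: the crudest possible "noise filter" is a gate that zeroes out quiet samples. This toy sketch (made-up sample values, not anything Google actually does; real demo cleanup would use something like spectral subtraction) just illustrates that studio-clean audio is a trivial post-processing step:

```python
# Toy noise gate: silence any sample whose amplitude falls below a
# threshold, leaving the louder speech samples untouched.
def noise_gate(samples, threshold=0.05):
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Loud speech peaks survive; low-level background hiss is zeroed out.
print(noise_gate([0.4, 0.01, -0.3, 0.02, 0.6]))
```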
How do we know the answer is no? Maybe Google’s software can do what they say but they didn’t demo that.
They supposedly made a call (they won't release proof) to a restaurant (that they won't name) and talked to a receptionist (who didn't mention the restaurant's name) who didn't ask what time the reservation was for. And you couldn't hear the restaurant in the background.
The demo is really suspicious. They deserve to be called on it. For all we know that was a recording of a fake training/test call with a Google employee.
If they manipulated the audio in some way (cut out an intro, filtered out noise, etc) they just have to say so.
Google deserves this kind of scrutiny. They're a MASSIVE company; they should be able to handle these kinds of questions about new products.
Especially those that are supposed to be released in the next few months.
Demo faking happens quite a lot, even when a company is not seeking external investment. I won't name names, but one high-profile demo was actually borderline fake. The CEO insisted on his desired demo and was promised it by his senior team months in advance. The whole keynote was structured around this demo. As usual, the project was behind schedule and hopeless. Ultimately the decision was to semi-fake things while walking a fine line.
> Unlike startups, it's not like Google's business depends on investors supporting lofty goals. There would seem to be no benefit to faking a demo like this.
If they don't need to impress anybody, why would they even have a demo at all?
> Unlike startups, it's not like Google's business depends on investors supporting lofty goals. There would seem to be no benefit to faking a demo like this.
You could say the exact same thing about Microsoft and the Milo demo.
> The snippets of conversation during Pichai’s demo, which can be heard in this clip, seem too polished and unrealistic to be real.
Reminder that Duplex explicitly uses Concatenative Text-to-Speech - i.e. they record humans saying phrases, and just play those soundbites back where appropriate. Sort of like a chess AI storing a dictionary of opening moves.
I believe most concatenative TTS systems nowadays are diphone/phone based, rather than playing back full phrases. But being phrase-based could explain how good the demos sound.
If you want to be able to say arbitrary text, then you pretty much have to go lower than word level, because the number of different words is effectively unbounded. However, if you're speaking a controlled language where there are clear limits to what the system would ever want to say, then there's no problem to record enough high quality words or whole phrases to concatenate them into something nice sounding.
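That phrase-level approach can be sketched in a few lines. Everything here is hypothetical (the clip names, the slot values, and the idea of storing clips as strings rather than audio); it's just meant to show why a closed-domain task like booking a table makes whole-phrase concatenation feasible:

```python
# Illustrative phrase-level concatenative "TTS" for a closed domain.
# A real system would play back recorded audio; strings stand in for clips.
CLIP_LIBRARY = {
    "greeting": "hi_um",
    "book_table": "id_like_to_book_a_table",
    "for_n": {2: "for_two", 4: "for_four"},        # one clip per slot value
    "at_time": {"7pm": "at_seven_pm", "8pm": "at_eight_pm"},
}

def render_reservation(party_size, time):
    """Assemble an utterance by concatenating pre-recorded phrase clips."""
    return [
        CLIP_LIBRARY["greeting"],
        CLIP_LIBRARY["book_table"],
        CLIP_LIBRARY["for_n"][party_size],
        CLIP_LIBRARY["at_time"][time],
    ]

print(render_reservation(2, "7pm"))
```

Because every slot value needs its own recording, this only scales when the set of things the system might ever say is small and known in advance, which is exactly the chess-opening-book analogy above.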
Remember, actually calling someone is the last resort. If Google can make a reservation for you online, then it will do that first.
If they have to call, I suspect it isn't 100% without human intervention. I am reminded of "Samantha West," the telemarketer "bot" that would robocall people and respond to them. It turned out it wasn't a bot at all, but someone pushing buttons on a soundboard to play the appropriate prerecorded response. http://newsfeed.time.com/2013/12/17/robot-telemarketer-saman...
For the small number of cases where Google can't make a reservation online for you, it would be pretty easy to pay some people in a call center to do this.
Not these days. Without code and reproducibility, most researchers would ignore your amazing demo. You should check out recent feud between Yann LeCun and Sophia.
I find that it's easy to get a proof-of-concept together. The challenging part is getting all the details right for a working product. Or, to put it another way: people who don't understand this easily confuse a demo with a working product.
It's probably just a proof of concept that doesn't work very well. They probably just played back one of the few attempts where it worked very well.
I even remember, when reading the article, a lot of caution in overestimating the state of the technology. So, given their disclaimer, I'm giving them the benefit of the doubt; and I'm thinking that everyone else overreacted because they don't understand software development.
Google's in a lose-lose here: if they relent and identify the businesses, the press will descend on them to get quotes about how they feel about "getting tricked" by a bot, and so on. But if Google hadn't left out the identifying information in the first place, the businesses would have been inundated by callers and Google would have been criticized for letting a small business get "doxed" like that. And finally, if they don't release any info, there will be more stories about how it's fake.
I guess they could have the bot call Dan Primack at Axios and clear it up...
I think the more important point, rather than whether or not it was "fake," is that it was certainly cherry-picked. This is an issue with a lot of machine learning and AI presentations and publications today. If you want to show how useful something is, there needs to be a functional demo.
Who cares? Even if it is not a "real restaurant", as long as it was really Duplex calling, say, a Google employee pretending to be a restaurant, does it really matter? [note: I did not watch the actual reveal, so this holds as long as they didn't make that exact claim]
Yes, it matters. One of my go-to "Hello World"-type programs that I used to write when learning a new framework or language was an Eliza-like chatbot. Whenever I chatted with it, it worked amazingly well, even when I was trying to test for corner cases.
The minute someone else used it, the illusion was shot. It's really easy to make a chatbot work when you keep it on the rails, even when you're not intending to.
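For anyone who hasn't written one: an Eliza-style bot is just a handful of regex rules with canned reflections (the rules below are made up for illustration). It seems uncannily smart when its author steers the conversation, and falls back to a stock phrase the moment someone goes off-script:

```python
import re

# Minimal Eliza-style chatbot: pattern -> response template pairs.
RULES = [
    (re.compile(r"\bi need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bbecause (.*)", re.I), "Is that the real reason?"),
]
FALLBACK = "Please, go on."

def respond(utterance):
    """Return the first matching rule's reflection, else a stock phrase."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(m.group(1).rstrip(".!?"))
    return FALLBACK

print(respond("I need a vacation"))                        # on the rails
print(respond("Colourless green ideas sleep furiously"))   # off the rails
```

The author knows which phrasings hit a rule; a stranger doesn't, which is exactly why the illusion dies on first contact with someone else.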
Voice synthesis is really fantastic at this point. But there is a critical flaw in every single actual use of it: they are obsessed with their "cloud" and don't do any of the processing locally. This guarantees a telltale latency lag that cannot be remedied. Conversation with any synthetic voice will always be a beat off due to the latency. It's the same as the annoying wait while Alexa or Siri sends your voice data off to a cloud to be processed (which is utterly unnecessary for the recognition part). You will always know you're talking to a robot, because the brain isn't in the head you're talking to and the speed of light is only so fast.
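A back-of-envelope budget shows why the cloud loop matters. All the numbers below are illustrative assumptions, not measurements of any real system; the point is just that network hops add delay that on-device processing avoids:

```python
# Rough per-turn latency budget: delay from the end of the caller's
# speech to the start of the bot's reply. All figures are assumed.
def turn_latency_ms(uplink_ms, asr_ms, nlu_ms, tts_ms, downlink_ms):
    return uplink_ms + asr_ms + nlu_ms + tts_ms + downlink_ms

# Cloud loop: audio goes up, reply audio comes back down.
cloud = turn_latency_ms(uplink_ms=60, asr_ms=150, nlu_ms=50,
                        tts_ms=100, downlink_ms=60)

# On-device: no network hops, even if local recognition is a bit slower.
on_device = turn_latency_ms(uplink_ms=0, asr_ms=200, nlu_ms=50,
                            tts_ms=100, downlink_ms=0)

print(f"cloud: {cloud} ms, on-device: {on_device} ms")
```

Human turn-taking gaps in conversation are commonly estimated at only a couple hundred milliseconds, so under these assumptions the network hops alone eat a large chunk of the budget, which is the "always a beat off" effect described above.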
> It's the same as the annoying wait while Alexa or Siri sends your voice data off to a cloud to be processed (which is utterly unnecessary for the recognition part).
True - in fact the iPhone actually does transcription on-device, AFAIK the cloud loop is for validation and to get answers.
Don't believe me? Put your iPhone in airplane mode and use the dictation button on your keyboard!
This. I believe the conversations were genuine, but I also believe they were two of hundreds or even thousands. Cherry picked. I am absolutely certain that some calls ended in complete confusion on both sides.
Just curious, but does it really matter for a tech demo? IMO, even if the person answering the phone was scripted but the AI portion was real, it's still impressive to me.
Is playing an edited video or audio piece a "demo?"
They clearly portrayed it as a real, unedited call. If they were dishonest, of course that matters. Arguably ethics don't matter as much as the underlying tech, I suppose. But it still says something about the individuals involved.
Probably more interesting regarding their ethics is that they've seemingly declined to answer most questions about the demo. They've declined to answer whether or not Duplex records calls, they've declined to confirm who they called with it, and they've declined to confirm whether they tested it in a state where recording the calls is legal...
They seem incredibly dodgy for something they announced openly in front of thousands of people.
But do they really need to answer those questions? Clearly they don't want to publicize the establishments they called. They most likely record calls (although they probably don't know if the final product will), and why would they confirm that they have done something illegal?
It doesn't hurt them to not answer these questions, and they definitely don't want to answer some of them, so why would they?
It could be that the audio is only edited in those first several seconds. That a normal call goes "Hello this is [business name]." Then Google starts recording as it replies that the call is being recorded. And then starts ordering.
And that's likely why, weirdly, no business identifies itself in the audio. They don't want to admit to ANY editing because it would open up MORE questions.
In the demo, the interaction was described as being "real":
> “What you’re going to hear is the Google assistant actually calling a real salon to schedule an appointment for you,” Pichai told the audience. “Let’s listen.”
Your employer can record your phone calls at work. Google's testing of this service would have simply required Google to obtain permission from the restaurants enabled in its tests.
Your article says it works that way, so Google only has to ask permission from the employer. The employer is the "person" who must grant permission to record the call.
> Also, any business that is recording its employees must notify customers.
Google did the recording, so that doesn't apply. Even if it did, it would make sense for Google to edit out the "Your call may be recorded for quality assurance purposes" in the demo.
Google AI hadn't published any major papers on ASR or TTS since WaveNet, and even the new WaveNet Google demo'd a few months ago wasn't even close to this demo's voices. The use of "Uh huh" and "Hmmm" is called back-channeling; it's a moon shot away given the current state of technology. It requires very low latency and precise timing so it doesn't cut the speaker off.
I believe that if this demo was real, it was tested and cherry-picked from a very particular environment.
They have published multiple major TTS papers since the original WaveNet paper in 2016. Including the recent Tacotron results with impressive style control.
I immediately discounted the whole "Uh Huh" "Hmmm" thing as just a cheap psychological trick - a way to humanize the interaction without requiring much effort at all. As for the rest of it, I didn't necessarily tag it as fake but I did recognize that the act of making a reservation is a fairly structured process, requiring the exchange of a few critical pieces of information in a quick and efficient manner. But not necessarily something to be bowled over by, and no doubt something well-rehearsed in this case. Given the choice, though, I will generally always go online to take care of something like this rather than interact with a human anyway, since there's always the possibility that they will just screw it up on their end.
BTW, I have no particular reason to trust (nor distrust) anything Google says or does.
This is a long shot but, do you think it is at all possible that they've had some kind of a breakthrough using their quantum processing work, and are using that to accelerate machine learning models and produce much better quality generative speech? After hearing about Bristlecone I can't rule it out.
0. https://www.axios.com/google-ai-demo-questions-9a57afad-9854...