The insistence on making the spoken interaction feel as "human" and "natural" as possible honestly introduces way more confusion than it needs to and makes the whole thing feel uncomfortable for its parasociability and stiltedness.
In Star Trek they were perfectly comfortable saying "Computer! Do the thing" in a more specific, 'computer' intonation. It was all fairly natural language, but there was no attempt to pretend the computer was a person. This made the thing feel more futuristic than what they're trying to do now.
It's not even that natural; with a new baby in the house I've really grown to dislike Alexa just for how much I have to yell at it. We're not a loud household, but talking to Alexa is like talking to my grandfather without his hearing aids. Everything has to be said at least three times, in increasing volume levels.
I just rewatched all of TNG, and that's actually not the case: there are several instances where a crew member (often Geordi?) would speak to the computer in a more "human" conversational way. I recall one episode in particular where Geordi's trying to set the mood in his quarters for an impending date, and he's very conversationally refining the music choice until he gets what he wants.
Yep, Star Trek computers understand addressing, and the conversation is modal: one does not need to begin every sentence with the keyword. A first use of the hotword (or an implicit trigger in some cases, like entering a turbolift) combined with a specific tone makes the computer “open” the conversation. From then on, tone alone is sufficient for the computer to know when it is being addressed. With a conversation opened, context is remembered.
I am flabbergasted that the following hasn’t been an option:
- hey Siri
- yes?
- what are the last three releases from <artist>?
- X Y and Z
- search again without EPs
- W X and Z
- Play the first one
- <playing W>
- thank you Siri
<conversation closed>
Also, with attention tracking that -already- exists with the FaceID array, the phone can know when it is addressed and when it’s not. You know, just like when you’re talking to someone, you usually look at them...
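For what it's worth, the "open once, keep context, close explicitly" flow above is basically a tiny state machine. Here is a rough sketch in Python, not how any real assistant works: the `Session` class, the `hear` method, and the wake and closing phrases are all made up for illustration, and it only tracks turns rather than doing any actual speech recognition or attention tracking.

```python
from dataclasses import dataclass, field

HOTWORD = "hey siri"           # assumed wake phrase, borrowed from the dialogue above
CLOSERS = ("thank you siri",)  # assumed phrase that closes the conversation


@dataclass
class Session:
    """Hypothetical modal conversation: opened once, context kept until closed."""
    is_open: bool = False
    context: list = field(default_factory=list)  # turns heard while open

    def hear(self, utterance: str) -> str:
        text = utterance.strip().lower()

        if not self.is_open:
            # While closed, anything that does not start with the hotword is ignored.
            if not text.startswith(HOTWORD):
                return "ignored"
            self.is_open = True
            rest = text[len(HOTWORD):].strip(" ,")
            if not rest:
                return "yes?"
            text = rest  # hotword and request in one breath

        elif text in CLOSERS:
            # Closing the conversation drops the remembered context.
            self.is_open = False
            self.context.clear()
            return "closed"

        # While open, every turn is kept so follow-ups can refer back to it.
        self.context.append(text)
        return f"turn {len(self.context)}: {text}"
```

Once the session is open, "search again without EPs" and "play the first one" go through without the hotword, and the same sentence is ignored again after "thank you Siri". A gaze signal like the one mentioned above could be one more way to flip `is_open`, but that part is left out here.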
As someone who recently acquired a Clapper... I had forgotten just how finicky it could be. Can't clap too quietly, too loudly, too slowly, or too quickly. Often takes me three or four tries! Not exactly an enjoyable experience for the person sitting next to you...
I vaguely remember a sitcom where someone had to watch an important event with friends on TV and someone else installed a clapper on said TV. Everything went well until the applause kicked in.
Yep. "That sounds like a clap from the next room" versus "that sounds like a quiet clap in this room" is not an impossible distinction to make (humans could do it most of the time), but not trivial either.
"Oh, let me demonstrate this. Ok Google, turn off the lights."
"Ok Google, turn on the lights now."
"Ok Google, mute."
"Ok Google, turn on the lights."
"Ok Google, turn on the lights, damn it."
"Ok Google, turn, on, the, lights---there you go. I swear it works better yesterday."
Might as well just flip the switch myself, if I have to debate the assistant half of the time in total darkness.