
There is a certain awkwardness I feel whenever I visit a friend who has smartified their house.

"Oh, let me demonstrate this. Ok Google, turn off the lights."

"Ok Google, turn on the lights now."

"Ok Google, mute."

"Ok Google, turn on the lights."

"Ok Google, turn on the lights, damn it."

"Ok Google, turn, on, the, lights---there you go. I swear it works better yesterday."

Might as well just flip the switch myself if I have to argue with the assistant half the time in total darkness.




Smart assistants need to be able to bind predefined activities to clapping. The Clapper had the UX nailed.

"Alexa, ask phillips hue to turn on the living room light"

vs

clap clap


The insistence on making the spoken interaction feel as "human" and "natural" as possible honestly introduces way more confusion than it needs to, and the parasociality and stiltedness make the whole thing uncomfortable.

In Star Trek they were perfectly comfortable saying "Computer! Do the thing" in a more specific 'computer' intonation. It was all fairly natural language, but there was no attempt to pretend the computer is a person. This made the thing feel more futuristic than what they're trying to do now.


It's not even that natural; with a new baby in the house I've really grown to dislike Alexa just for how much I have to yell at it. We're not a loud household, but talking to Alexa is like talking to my grandfather without his hearing aids: everything has to be said at least three times, at increasing volume.


I just rewatched all of TNG, and that's actually not the case: there are several instances where a crew member (often Geordi?) would speak to the computer in a more "human" conversational way. I recall one episode in particular where Geordi's trying to set the mood in his quarters for an impending date, and he's very conversationally refining the music choice until he gets what he wants.


Yep, Star Trek computers understand addressing, and the conversation is modal: one does not need to begin every sentence with the keyword. A first use of the hotword (or, in some cases, an implicit trigger like entering a turbolift), combined with a specific tone, makes the computer “open” the conversation. From then on, tone alone is enough for the computer to know when it is being addressed, and while the conversation is open, context is remembered.

I am flabbergasted that the following hasn’t been an option:

- hey Siri

- yes?

- what are the last three releases from <artist>?

- X Y and Z

- search again without EPs

- W X and Z

- Play the first one

- <playing W>

- thank you Siri

<conversation closed>

Also, with the attention tracking that *already* exists in the Face ID array, the phone can know when it is being addressed and when it's not. You know, just like when you're talking to someone, you usually look at them...
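
A toy sketch of that modal-session flow in Python (the hotword, the closing phrase, and the timeout are assumptions for illustration, not anything Siri actually exposes):

    import time

    class AssistantSession:
        """Toy model of a modal voice session: the hotword opens the
        conversation, follow-ups need no hotword, and the session closes
        explicitly or after an idle timeout. Names/values are illustrative."""

        TIMEOUT_S = 30.0  # assumed idle window before the session closes

        def __init__(self):
            self.open = False
            self.context = []      # turns remembered while the session is open
            self.last_heard = 0.0

        def hear(self, utterance):
            now = time.monotonic()
            if self.open and now - self.last_heard > self.TIMEOUT_S:
                self.open, self.context = False, []   # timed out silently

            if not self.open:
                if utterance.lower().startswith("hey siri"):
                    self.open, self.last_heard = True, now
                    return "yes?"
                return None        # not addressed: ignore ambient speech

            self.last_heard = now
            if utterance.lower().startswith("thank you"):
                self.open, self.context = False, []
                return "<conversation closed>"

            self.context.append(utterance)  # context carries across turns
            return "<answering with %d turns of context>" % len(self.context)

The point being that "search again without EPs" only makes sense if the previous turn is still sitting in that context list, which is exactly what the one-shot hotword model throws away.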


Context is a hard problem and even the best chatbots don't nail this.


Star Trek's computers also never had the "I don't know a device named 'Lights'" problem that Echo devices often suffer.


As someone who recently acquired a Clapper... I had forgotten just how finicky it could be. You can't clap too quietly, too loudly, too slowly, or too quickly. It often takes me three or four tries! Not exactly an enjoyable experience for the person sitting next to you...


Out of curiosity, is the proper clap speed the same as the song from their commercials? If so, I feel like I could nail that cadence every time.


Also: would the Clapper ad activate nearby Clappers?


I vaguely remember a sitcom where someone had to watch an important event with friends on TV, and someone else had installed a Clapper on said TV. Everything went well until the applause kicked in.


It's 2020, how can things like this be so hard?


That probably depends on where the product falls on the scale from an analog peak detector to some kind of AI/DSP FPGA monster.
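
For reference, the naive end of that scale is just an envelope threshold plus a timing window, which is also why it's so finicky. A sketch (all thresholds and gaps here are guesses, not the real Clapper's values):

    import numpy as np

    def detect_double_clap(samples, rate=16000, threshold=0.5,
                           min_gap=0.15, max_gap=0.60):
        """Naive threshold detector, roughly what a Clapper-style
        circuit does digitally. Peaks must land inside a fixed
        amplitude and timing window, which is exactly why claps that
        are too quiet, too loud, too fast, or too slow get rejected."""
        envelope = np.abs(samples)
        above = envelope > threshold
        # Rising edges: sample crosses the threshold from below.
        edges = np.flatnonzero(above[1:] & ~above[:-1]) / rate
        # Collapse edges closer than 50 ms into one clap event.
        claps = [edges[0]] if edges.size else []
        for t in edges[1:]:
            if t - claps[-1] > 0.05:
                claps.append(t)
        # A double clap is two events separated by an acceptable gap.
        return any(min_gap <= b - a <= max_gap
                   for a, b in zip(claps, claps[1:]))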


Yep. "That sounds like a clap from the next room" versus "that sounds like a quiet clap in this room" is not an impossible distinction to make (humans could do it most of the time), but not trivial either.


Sounds like (pun intended) multi-directional mics would solve this, i.e. analyzing echo patterns from different directions?
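
Two mics already give you a crude bearing via time-difference of arrival. A sketch of the idea (the mic spacing and sample rate are assumptions; a real device would use GCC-PHAT and more capsules):

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s at room temperature

    def direction_of_arrival(left, right, rate=48000, mic_spacing=0.1):
        """Estimate a sound's bearing from the delay between two mics,
        found by plain cross-correlation of the two channels."""
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)   # delay in samples
        delay = lag / rate
        # Clamp: |delay| cannot exceed spacing / c for a far-field source.
        max_delay = mic_spacing / SPEED_OF_SOUND
        delay = max(-max_delay, min(max_delay, delay))
        angle = np.degrees(np.arcsin(delay * SPEED_OF_SOUND / mic_spacing))
        return angle  # 0 degrees is broadside, +/-90 along the mic axis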


Yeah, then they’ll have caught up to the 80s.

https://en.wikipedia.org/wiki/The_Clapper



