AirPods as a Platform (julian.digital)
128 points by sc90 on Feb 16, 2021 | 137 comments



The article overlooks one of the major reasons for wanting a non-voice interface to an audio experience: Being quiet.

Most of the time, if I'm wearing headphones, it's so as to not disturb others around me. Otherwise I'd play it out loud.

This benefit goes away when everyone around me suddenly hears me bark out loud to adjust the volume, or send a text, or what-have-you.

I'm a big believer in audio-as-a-platform (particularly the AR possibilities), but I hate audibly trying to speak to a computer. It's by far the worst input interface.

(On the other hand, much like cameras, the best interface is the one you have with you. I yell at my Google speakers all the damn time, because my hands are busy around the house. But those are also speakers, not headphones, and therefore a different use case.)


So far the only thing I find myself using voice commands for is to set alarms on my iPhone. Saying "Hey Siri, set an alarm for 15 minutes" is so much easier than doing it manually through the clock app.

Other than that though, I haven't found any practical use for them. Maybe once AI improves a lot, and it feels like you're talking to an actual person, it might be more useful.


Similarly the only use Siri gets on the kitchen HomePod is for hands-free timers while cooking.

In my eyes the primary thing that's standing in the way of voice assistants being useful isn't even as high of a bar as general AI, but just the ability to parse a command into multiple, potentially chained commands. Even without the ability to figure out things like context, that would boost usability a lot. For example, it'd allow commands like, "set timers for 5 minutes, 10 minutes, and an hour" instead of having to make each request separately.
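To make the "one utterance, several timers" idea concrete, here is a minimal sketch in Python. It is only an illustration: the unit table, the tiny number-word list, and the function name `parse_timers` are all made up for this example, and no real assistant is claimed to work this way.

```python
import re
from typing import List

# Hypothetical lookup tables; a real assistant would cover far more phrasings.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "five": 5, "ten": 10, "fifteen": 15}

# A number (digits or a word) followed directly by a time unit.
DURATION = re.compile(r"\b(\d+|[a-z]+)\s+(second|minute|hour)s?\b")


def parse_timers(utterance: str) -> List[int]:
    """Return one duration in seconds for each timer named in the utterance."""
    durations = []
    for amount, unit in DURATION.findall(utterance.lower()):
        if amount.isdigit():
            value = int(amount)
        elif amount in NUMBER_WORDS:
            value = NUMBER_WORDS[amount]
        else:
            continue  # unknown word before a unit: skip rather than guess
        durations.append(value * UNIT_SECONDS[unit])
    return durations


print(parse_timers("set timers for 5 minutes, 10 minutes, and an hour"))
```

The point is only that splitting one request into several identical commands does not need anything like general AI; the hard part in practice is the speech recognition and the long tail of phrasings this sketch ignores.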


Google Home supports this nicely. On my Lenovo Smart Display, I just tried "Hey Google, set a ten second timer and a fifteen second timer" and she did just that.

They also support custom "routines" that you can program. We have a Clever Dripper [1] and use Tom's (of Sweet Maria's) recipe: stir after 90 seconds, drip after 4 minutes (2:30 after the stir). So I created a routine called "coffee timer" that triggers those two timers, and when we pour the hot water into the Clever Dripper we can just say "Hey Google, coffee timer" and it sets both of them.

Actually I put in four variations so we don't have to get the language perfect:

"coffee timer"

"coffee timers"

"set coffee timer"

"set coffee timers"

[1] https://www.sweetmarias.com/clever-coffee-dripper-large.html
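For anyone who wants to see the shape of that routine written down, a rough sketch in Python follows. The table layout and the names `ROUTINES`, `timers_for` and `format_gap` are invented for illustration; this is not how Google Home stores routines, just the same idea of several trigger phrases pointing at one pair of timers.

```python
# Timer offsets from the recipe above, both measured from the pour.
STIR_AFTER_S = 90        # stir after 90 seconds
DRIP_AFTER_S = 4 * 60    # drip after 4 minutes

COFFEE_TIMERS = [("stir", STIR_AFTER_S), ("drip", DRIP_AFTER_S)]

# Hypothetical routine table: every phrase variant triggers the same timers.
ROUTINES = {
    phrase: COFFEE_TIMERS
    for phrase in (
        "coffee timer",
        "coffee timers",
        "set coffee timer",
        "set coffee timers",
    )
}


def timers_for(phrase):
    """Look up the timers for a spoken phrase, ignoring case and padding."""
    return ROUTINES.get(phrase.strip().lower())


def format_gap(seconds):
    """Format a number of seconds as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# The drip happens 2:30 after the stir, matching the recipe.
print(format_gap(DRIP_AFTER_S - STIR_AFTER_S))
```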


The first one to support a "conversation", even with the same level of intelligence and understanding as today, will be a MASSIVE improvement. If it would listen while talking, so you can interrupt it, it would be so much more useful.


I feel like I want the opposite. For me, command lines work. They are precise, they are consistent, they are easy to reason about, and they are composable, which allows me to do very powerful things on the fly. There is a reason why I do not have a natural language interpreter wrapped around my Linux command line.

What a lot of these natural language systems do is they force me to intuit their interfaces instead of just looking them up. It's more complex, harder to learn, and much less powerful. What I want is very reliable, accurate voice detection with a well-designed, composable interface.

I want to treat Alexa like a computer, not like a person. People are not the most convenient interfaces to interact with, computers are better to interact with for me than people are.

I understand that different people are in different positions, some people want to have a conversation, but you're not going to make a voice assistant that's good for me if you follow that goal. At some point the ecosystem needs to fracture and diverge so that normal people can use whatever NLP interface Google/Amazon/Apple spits out of their AI opaque-boxes, and people like me can use a voice interface that is designed around well-tested decades-old computer UX principles that have been proven to work well for power users.

My vision of a voice-operated utopia isn't treating Siri like a person, it's on-the-fly composing a complicated new task by voice that is saved for later use. It's using timers as a trigger for other tasks with some kind of pipe command so I can tell Siri to send an email after ten minutes, or so I can have Siri look up a search and seamlessly pipe the result into some kind of digital notebook.
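As a rough sketch of what "composable" could mean here, the Python below chains small commands the way a shell pipes programs together and saves the result under a name for later. Everything in it (`pipe`, `search`, `append_to_notebook`, `save_task`) is a hypothetical stand-in; no assistant exposes an interface like this today as far as I know.

```python
from functools import reduce
from typing import Callable, Dict, List

Command = Callable[[str], str]


def pipe(*commands: Command) -> Command:
    """Compose commands left to right, like `a | b | c` in a shell."""
    def run(value: str) -> str:
        return reduce(lambda acc, command: command(acc), commands, value)
    return run


# Hypothetical stand-in commands; real ones would call a search or notes service.
notebook: List[str] = []


def search(query: str) -> str:
    return f"results for {query!r}"


def append_to_notebook(text: str) -> str:
    notebook.append(text)
    return text


# Tasks composed on the fly are stored by name so they can be reused later.
SAVED_TASKS: Dict[str, Command] = {}


def save_task(name: str, task: Command) -> None:
    SAVED_TASKS[name] = task


save_task("research", pipe(search, append_to_notebook))
print(SAVED_TASKS["research"]("voice shells"))
```

The design choice is the same one shells make: each command takes text and returns text, so any two can be joined without knowing about each other.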


My issue is I want to open a "voice shell". I want the same obvious commands but with more rapid responses and without needing to say "Hey Siri, X" every time. There should be a natural way to enter a mode where the back and forth is still not natural language but voice.


This is another area where I feel like the good answer is not necessarily the technically complicated one.

Saying "hey Siri" is fine if I'm in bed or in the shower, I don't need quick access to a shell in those places necessarily. That's fine to have as a backup. But for normal operation, if I'm wearing a smartwatch, it will pretty much always be more convenient and faster for me to tap and hold on that watchface than it will be for me to say "hey Siri".

I mean, that's a boring answer, but there's also a reason why my computers have buttons. I wouldn't want to use my phone because that's in another room or in my pocket. But a watch will always be reachable in less than a second, and the modern watches are waterproof, and I don't need to look at anything to use it -- I can just tap my watchface and start talking. And if my hands are dirty, or I'm carrying groceries, or I'm in bed, falling back to "hey Siri" isn't the end of the world in those scenarios.

In practice, when I see people interact with voice assistants today, they stop what they're doing, they give the command, they listen for a confirmation, and then they start what they're doing again. The biggest bottleneck there for their speed is precision -- they intuitively know that they need to stop what they're doing and optimize for the device. The precision, and the delays that are built into the UX to confirm what's happening -- that's the bottleneck. So if there's an operating mode that is just as fast and way more precise, we should just do that, we don't need to use voice triggers 100% of the time.

Bonus points if we're wasting processing time for a voice assistant to make a round trip and process the audio clip to try and figure out who's speaking. The person who pressed their watch is speaking, boom, we can get rid of that response delay now. How much time are we wasting trying to come up with wake words that optimize for both speed and precision -- when using wake words only as a fallback would allow us to make them more precise because they could be longer, more deliberate phrases?


Alexa will shut up immediately if you say “Alexa shut up” or “Alexa SILENCE” when she’s talking. We got an echo dot which came with our Alexa enabled microwave, and now that I have it set up to stream Apple Music, we use it all the time. Our thermostat also has it although when we change the temperature she’ll sometimes inexplicably change the house temp to 60°F which we won’t realize until 3am when we’re freezing in bed.


Wait... Alexa enabled microwave? I am genuinely interested in what the point of that is.


Well it’s also an oven, I often say “Alexa preheat the oven to 400” or “Alexa air fry for 10 minutes”. For soups instead of having them explode I say “Alexa microwave at power level 3 for 9 minutes” which avoids the exploding soup problem. Of course there’s also “bake at 400 for 30 minutes”.

I also have a five year old who is pretty competent at talking to Alexa even though she can’t intuit what the button controls mean on the microwave oven.


As someone who last used microwaves on a regular basis when they just had two dials, I could see the use for Alexa to figure out how the more modern ones with like 40 buttons work.

The first time I tried the office kitchen microwave, I had to ask someone from accounting standing around, how to heat my cup, because that stupid thing just responded with a condescending beep to whatever I pressed :/


Was this a microwave where "hit digits to type in a time, then hit start" doesn't work? If yes then yikes. If no then you only have to learn it once and trying to use alexa sounds much more complicated.


To sell more microwaves!!!!

/S


"Alexa stop" works too and is a bit more polite in my opinion :D


Hah yeah most of the time Alexa is playing Cocomelon music so I guess I’m letting out some frustration at being subjected to nursery rhymes all day.


>Similarly the only use Siri gets on the kitchen HomePod is for hands-free timers while cooking.

Same here. Just wish it could handle setting multiple timers (yes I realize you can set multiple alarms, but those aren't one-time use).


Homepod can set multiple timers now, just not in a single utterance.


Ahhh. I actually don't have a Homepod, so I use my iPhone in the same way. As far as I know it's still limited to a single timer.


Ah true. Maybe some day the iOS team will catch up with Homepod. Or even Android ;)


Chaining works on Google Home (and, I believe, Alexa?). I use it all the time, e.g., "Cancel the 5 minute timer and set a timer for 6 minutes" or your example of setting multiple timers.


This particular use case works for me for HomePod too. More complicated ones fail, though.


When I go to bed it's:

Hey Siri, turn out all of the lights.

Hey Siri, play Ocean from Ambient Sounds.

Not a huge deal, but enough of a bump to break the flow.


You could do a shortcut for this?

I've got one for "Siri, good night" that turns off the lights, sets the phone to DND, turns down the brightness and starts Sleep Cycle in the correct mode depending on the day (alarm on/off).


I have a HomePod in the kitchen and use Siri for two main things :

1. Hey Siri, add Olive Oil to the grocery list. Super easy. When I’m cooking and running low on something, no need to break my stride and pull out my phone, or clean my hands. It’s immediately into my grocery list and out of my mental inbox.

2. Hey Siri, play some Jazz. Or whatever music. This is nice and easy to get some music on the HomePod for either cooking, working, or dinner background. The only annoyance is that Siri can be super particular at times unlike when searching Apple Music on my phone. Also sometimes my kids hijack my music selection with their own, hehe.


I have a couple HomeKit devices. “Hey Siri turn the lamp on” is easier than walking across the room as I enter the house in the dark.


These types of things are my favorite use cases. I set up a little test with a beacon inside my bedroom and now it automatically turns on the lights for me when I walk in (if I'm wearing my Apple Watch) and then turns them off when I leave. It's not perfect but it's very convenient.

I also have a few automations that are set up for when I leave the house to make sure music turns off and my thermostat is set to "Away" mode.


Yea, I have during the winter months a "when arriving home after sundown turn the lamp on" automation as well. During summer months I turn that one off.

I've generally really enjoyed basic smart home stuff. I have a handful of smart wall switches that work well, and for the lamp a wall-wart that I use. One of these days I'll just retrofit some actual lighting in the living room and can skip the lamp situation but until then... this is a nice workaround.


Could you share some details on the beacon setup? I'd like to try messing around with something like that.


My 10 year old discovered "Hey Siri turn off the TV" works... and it's amazing. Can use the light of the TV to walk upstairs then turn it off at the top.


> Hey Siri, play some Jazz.

"OK, playing Supertramp."

Somewhere along the way that became my normal Siri experience, and it wasn't always like that.


Things that work for voice commands for me (iOS/watchOS):

* Call my wife (or $person)

* tell $person $message - sends a text eg: “tell Dave ETA 25min”

* ping $myname iPad - plays sound on my iPad so I can find it.

* set timer for $duration (I wish Siri wouldn’t be so loud in acknowledging)

* remind me $task on $date $time (you can switch the arguments around)

* take me home

* redial

* take me to $location (mostly used in my car)


Apple has a list of commands somewhere - even if it’s just the source code. That I cannot find that list and have to try and guess the magic words pisses me off to no end.


A default list (for core iOS) would be nice. I imagine the entire possible list could be infinite given Shortcuts and app-specific key phrases.

Edit: found a list online [1] - seems to cover a lot of default commands

[1] https://www.sparhandy.de/apple/info/siri-commands/


> I wish Siri wouldn’t be so loud in acknowledging

Hey Siri, speak quieter / speak at 25%


If you whisper to Alexa, she'll whisper back.

Example (NSFW/NSFL/gross): https://www.reddit.com/r/MakeMeSuffer/comments/l8d0qo/who_kn...


I live in a country where a good six months I'm wearing a warm hat and gloves and my phone is buried deep in my clothes to keep the battery over the freezing point.

It's a lot easier to say "Siri, play podcasts" (Which triggers a Shortcut to start playing a specific playlist in Overcast) rather than opening my jacket, digging out the phone, taking off my gloves and fumbling with it to get the podcasts running.


Also, AirPods Pro were not made with cold weather in mind. The squeeze can become impossible with gloves on.


I have a couple other small ones:

* Hey siri what’s the weather

* Hey siri add broccoli to my grocery list

* Hey siri remind me when I get home to take out the recycling

Same principle. Simple request, hard to get wrong, low stakes, faster than clicking


Also,

* Hey siri remind me every other Tuesday at 8pm when I'm at home to take out the recycling

Some things are easier to say than to configure with 20+ taps in the Reminders app.


Especially the shopping list. I most often notice that I need something when I'm in the middle of using the last of it, which means my hands are full and often dirty or wet. But I also don't want to put it off because it's something I do need to do.


I have multiple lists set up for shopping so I can say "Hey, Siri, add wood nails to my Home Depot list" or "Hey, siri, add butter to my Costco list".

Works beautifully for me and the lists are shared with our household so it updates for everyone.


How did you originally set up the lists?


You create lists in the Reminders app, then just use the name of the list when adding a reminder.


I'm also a fan of "hey siri, turn off the tv" when leaving.

I have huge gripes with "what's the weather". Siri will for sure say the temperature, which means nothing in a city where the windchill is often ten degrees lower.


If you have a HomePod and an Apple Watch, you can have Siri turn off all media devices when you leave the house. If you have a CRC receiver for your TV hooked up to an Apple TV, you can have it power down the whole entertainment system when you leave.


"Hey siri, lights off" is amazing if you have smart lamp


Fun fact - make a scene named whatever you want - say “rude word the rude word” and then you can tell Siri that and the scene will be enabled! No more “that’s not nice” replies.


>* Hey siri add broccoli to my grocery list

Are you using a specific app for this or is it just a list in Notes called 'grocery list'?


It's under Reminders. If you don't have a list named Grocery, it'll create it and populate it with broccoli.


I find myself setting alarms too.. the problem is I'm trying to set timers haha

"timer 1 hour 30" is parsed into "Make an alarm at 01:30 named Timer."

Every time I do laundry I get woken up at stupid o'clock the next morning haha.

Edit: Sorry to those below for the confusion I seeded. To clarify I mean the 1 hour 30 is the bit that doesn't work. If I add minutes to the end of it it works perfectly fine.


I definitely recommend just saying a sentence. The system isn’t designed to understand requests the way you’re presenting them.


I'm not going to try changing my behaviour for a voice assistant lmao. It works or it doesn't.

That's how I talk. That's how I've talked for 32-$childhood years.


But you are changing your behavior. If you told a human “timer 1 hour 30” they’d look at you very strangely. My suggestion is to stop using special phrasings for voice assistants.


If I told a human to set me a timer they'd look at me strangely anyway. I want quick, done, command, action. It is a computer.

I'm not having a conversation with the thing, I want it to do something. Command, parameters.


I told bash

   computer, list the invisible files in my home folder
but in a shocking turn of events, when you use it wrong it doesn't work


Come off it. If I say "timer 1 hour 30 minutes" it works fine. It's the British way of giving duration that doesn't work.

In what world does "timer one hour 30" parse into "Set an alarm for 1:30am and call it timer"

Presumably if a French person had trouble you'd tell them to just speak English?


Oh sorry, I thought your complaint was about having to phrase things as sentences. Yeah, Siri is pretty stupid about inferring that the number after hours is minutes.

Parsing time related stuff in general seems to be an issue. "Set a timer for a minute fifteen" makes an alarm named "Timer" set for 3 PM.

But if you say "Set a timer for a minute fifteen seconds" it works fine.

Curious if anyone else can duplicate the "a minute fifteen == 3 o'clock" issue or if it's somehow hearing me wrong.
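For what it's worth, the rule people seem to expect ("a bare number after a unit means the next smaller unit") is easy to write down. Below is a minimal Python sketch of that rule only. The word list and the name `parse_duration` are assumptions for illustration, and this says nothing about how Siri actually parses durations.

```python
import re
from typing import Optional

UNITS = ["hour", "minute", "second"]  # ordered largest to smallest
UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}
# Tiny hypothetical word list, just enough for the phrases in this thread.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "fifteen": 15, "thirty": 30}


def to_number(token: str) -> Optional[int]:
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def parse_duration(phrase: str) -> Optional[int]:
    """Return the duration in seconds, or None if no duration was found."""
    total = 0
    pending = None     # a number still waiting for its unit
    last_unit = None   # the most recent unit that was spoken explicitly
    for token in re.findall(r"[a-z]+|\d+", phrase.lower()):
        number = to_number(token)
        unit = token[:-1] if token.endswith("s") else token
        if number is not None:
            pending = number
        elif unit in UNIT_SECONDS and pending is not None:
            total += pending * UNIT_SECONDS[unit]
            last_unit, pending = unit, None
    if pending is not None and last_unit in ("hour", "minute"):
        # Trailing bare number: read it as the next smaller unit.
        smaller = UNITS[UNITS.index(last_unit) + 1]
        total += pending * UNIT_SECONDS[smaller]
    return total or None


print(parse_duration("timer 1 hour 30"))                   # seconds in 1h30
print(parse_duration("set a timer for a minute fifteen"))  # seconds in 1m15
```

Phrasings like "an hour and a half" are not handled here; the sketch only covers the trailing-number case discussed above.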


I used your exact wording and it did exactly what you said: created an alarm named "Timer" set for 3PM. I also did it with Siri's language set to "English (United Kingdom)" and "English (Ireland)" and it still did the same thing, so this idiosyncrasy appears to be independent of which language Siri is set to (at least within the set of varieties of English it supports).


Also replicated in English (Australia), so I agree. I would have phrased this as "one and a quarter minutes", which works, or "75 seconds" which also works just fine. For the 1 hour 30 request above, I would usually phrase this as "an hour and a half" which works just fine. I agree Siri can be very picky though - things need to be phrased a certain way, and adapting speech patterns to that is frustrating.

Other simple time-based requests such as "what's the time difference to singapore?" don't work on siri either, which is irritating as I work across multiple time zones and I'm forever figuring out time differences.


This is hilarious considering the etymology of the term "computer". The first computers were people who performed computations. A computer, as you're using the term, is a digital computer.


Here are my use cases:

1. Timers. Super useful when cooking. Also my kids can use it as they are doing remote learning and need to know when to get back to their video calls.

2. Playing videos while cooking. Sometimes I enjoy watching a sitcom while cooking on my Echo Show. Or I might want a cooking video though that’s more rare. My roommate uses it all the time for questions like “how long do you bake salmon?” With mixed results.

3. Controlling lights. “Alexa give me a light” as I walk into a room is way easier than turning on three separate switches in separate parts of the room. Turning them off is equally nice.

4. “Alexa tell me a kids joke” is a frequent thing we use.

5. Answers to random questions while at the dinner table. “Alexa who is the prime minister of New Zealand?” That type of stuff. It feels more natural than whipping out our phones.

6. We tried using Echo Show’s drop in feature but it is just too intrusive as compared to something like FaceTime. The other side doesn’t have to pick up the call. You just are in their house, camera on and all.

7. My kids really like when we play Harry Potter quiz. It’s a silly app but it is somewhat entertaining.

8. Really funny “routines” (their word for scripts). “Alexa, set condition two throughout the ship” to turn off all the lights. “Alexa, release the kraken” to set off the Roomba, etc.

9. My kids listen to podcasts all the time.

10. My kids use it to help them spell difficult words.

What I wish I could do a bit more with it is integrate it with things like random status dashboards. I combined a power metering AC plug with my washing machine and Home Assistant to know when it’s done running. Would be nice to be able to say “Alexa notify [roommate’s name] when the washer is done.”

Overall I think something much simpler that does processing locally could replace it for me but so far these things are cheap enough (echo dot) to put in every room.
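The washer idea above mostly comes down to spotting when the power draw drops and stays low. A minimal Python sketch of that logic is below; the thresholds, the sample format, and the name `washer_finished_at` are assumptions for illustration, and this is not Home Assistant's API, just the detection rule such an automation might encode.

```python
from typing import Optional, Sequence


def washer_finished_at(
    samples: Sequence[float],
    running_watts: float = 10.0,   # assumed: anything above this means "running"
    idle_samples: int = 3,         # assumed: this many quiet readings means "done"
) -> Optional[int]:
    """Return the index of the reading at which the cycle is judged finished."""
    was_running = False
    quiet = 0
    for index, watts in enumerate(samples):
        if watts >= running_watts:
            was_running = True
            quiet = 0          # a pause between wash phases resets the count
        elif was_running:
            quiet += 1
            if quiet >= idle_samples:
                return index
    return None


readings = [1, 1, 200, 150, 2, 180, 1, 1, 1, 1]
print(washer_finished_at(readings))
```

Requiring several quiet readings in a row is what keeps a short pause mid-cycle from firing the notification early.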


Shopping list while I'm cooking. "Hey Siri, add tomatoes to my shopping list".

(And then said list is shared with my family.)


I was the same way until I got a HomePod. After a little bit of trial and error, I use Siri a lot for automations, timers, shortcuts, spelling assistance, and a whole slew of other things. That has now translated back to my phone so it's nice that the interface across all my devices is the same for this.


In addition to that I frequently use "Hi Siri, call the cute one" (probably bad translation but feels natural in my native tongue and amuses my kids.)


Agreed. Lately I've been asking Siri how to spell words (not that I'm worse recently, I just found that it worked). Much easier than the alternative.


I use VoiceControl while feeding my infant. The phone is mounted nearby and I can navigate using voice commands like “Swipe up”, “Tap [button]”.


I use it for that and home control stuff mainly. Turn off the heater, turn on the lights, etc. Sometimes I’ll set a reminder.


Setting timers is basically the main use case for any voice assistant that people use it for.


Curious — is the use case mostly cooking? Or is it prep for X mins before Zoom calls?


I use timers for making tea, cooking and knowing when my car has finished heating up.

Unfortunately iOS only supports one timer at a time so I can't make tea while my car is plugged in.


Weirdly, HomePods do support multiple (optionally named) timers. Since they're also based on the general iOS family, there's a bit of hope for the feature eventually making its way to the phones. I'd assume it's held up on them having to redo the UI, if anything.


The spatial audio feature makes me think that Apple has some level of head movement tracked with airpods.

I'm imagining a system where you can just use nods/head shakes to move through some sort of binary decision tree to execute some basic interactions, like reacting to incoming alerts/messages.

new text, music pauses, siri reads it to you: "Your mother asks if you'll be home by dinner, Would you like to respond?"

shake head no -> interaction cancelled, music resumes

nod head yes -> "Ok, how would you like me to respond? Yes, you will, or no, you won't?"

gesture head for appropriate response

Easy? Dumb/ridiculous? Sure, you can't get suuuuper deep with the decision tree and it's tough for non-binary responses, but it's enough of an interface to have a meaningful, non-verbal engagement with a computer.
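The nod/shake flow above is just a binary tree, which can be sketched in a few lines of Python. The `Node` class, the `walk` function, and the choice that a nod means "yes, you will" at the second step are all assumptions made for this example; nothing here reflects an actual AirPods API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    prompt: str                        # what the assistant says at this step
    on_nod: Optional["Node"] = None    # next step if the user nods
    on_shake: Optional["Node"] = None  # next step if the user shakes their head


def walk(node: Node, gestures: List[str]) -> List[str]:
    """Follow a sequence of 'nod'/'shake' gestures and collect the prompts."""
    spoken = [node.prompt]
    for gesture in gestures:
        node = node.on_nod if gesture == "nod" else node.on_shake
        if node is None:
            break  # reached a leaf; ignore any further gestures
        spoken.append(node.prompt)
    return spoken


root = Node(
    "Your mother asks if you'll be home by dinner. Would you like to respond?",
    on_nod=Node(
        "Nod for 'yes, you will', shake for 'no, you won't'.",
        on_nod=Node("Sending: Yes, I will."),
        on_shake=Node("Sending: No, I won't."),
    ),
    on_shake=Node("Cancelled. Resuming music."),
)

for line in walk(root, ["nod", "shake"]):
    print(line)
```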


The platform is AirPods, Apple Watch, and the yet to be announced AR product.

Just like with the M1, if you squint, Apple is testing and iterating in the open. The spatial audio is a good example but so is the Watch’s auto-detected hand washing countdown.

There are public sprinkles of this coming platform elsewhere, such as in Apple fitness workout HUD Rings widget. The proximity-based handoff is another.

The author is correct that Siri is not a good platform but for reasons they do not identify.

Voice based interaction model is weak from a UX perspective. But for Apple it is even weaker because the company is unable to use any of the unique advantages it holds over the competitors.

For example, apple’s array of services, control over the technology stack, reliable and secure intra-device communication, the App Store, the iPhone as a unified configurator, access point, update manager and biometric authenticator.

You can’t pull that stuff out of a hat.

Siri sucks. I have a few HomePods, use plenty of homekit and try to get the most out of it. But it is bad at almost everything it sets out to do.

Siri is clearly not the focus for the company, and if anything it sent competitors scrambling to own a space Apple doesn’t even want.

The interaction model includes physical hardware, like the big crown on the new APMs, but I suspect it is likely going to be based largely on eye movement. Something not too twitchy.

The enormous amount of sensor data from watch and AirPods are like the gps and gyroscope of iPhone. Apps can require either or none.

So I think the author is right that AirPods are important but they are not the center. They are a component of the next platform.


In my view, it's a mistake. Apple's disinvestment in AI is why I typically pick the flagship Google phone over the iPhone nowadays.


I wonder if we'll ever get technology that can interpret very, very quiet vocalizations, almost a subvocalization.

That would definitely be something I would buy into.


There's been a fair amount of research into this, eg: https://news.mit.edu/2018/computer-system-transcribes-words-...


I agree, but don't throw the baby out. One of the most useful uses of voice I've found is using a Fire stick with Alexa. Just press the mic button on the remote and say an actor/movie/tv series and it presents everything it finds from all your subscriptions. It's rare it doesn't understand and as a Glaswegian that's impressive.


This is true for me, on Roku. Considering the context, that's far from a platform, it's not even an interface. It's just an alternative input to the physical tv remote interface. If I were to simply re-task a tablet or old phone as my TV remote (easy to do with Roku), I'd stop using voice, and get much more out of the second screen.


> Most of the time, if I'm wearing headphones, it's so as to not disturb others around me. Otherwise I'd play it out loud.

Maybe that's the key to its network effect as a platform. If you don't want to hear everyone talking to themselves in the library, you'll need noise canceling headphones too.


Noise canceling headphones don't seem to do much for people talking in the background, or really any sudden loud noise.


In science fiction, the step before mind-machine interface is sub-vocalization. What would it take to allow an interface device to be able to hear you but others not?


I think sub-vocalization-driven interfaces could be amazing, I was super excited when I saw this research (posted it elsewhere in this thread too): https://news.mit.edu/2018/computer-system-transcribes-words-...


This is what I want:

When golfing, I want to keep track of how far I hit the ball, the club I used, and where I landed. There are apps for this, of course, but I can't use them. By the time I've arrived at my ball, I'm not going to stop, take out my phone, and start fiddling with UI controls to select a club or confirm a location.

I would love to have an app that let me keep one AirPod in my ear, and allow me to track my golf game. The UX would be something like this:

1. Arrive at course, and use phone to select the tees and confirm the course I'm playing. Start the round.

2. Tap my AirPod and say, "Teeing off on hole 1 using driver"

3. Hit the ball

4. When I arrive at my ball, tap again and say, "hitting seven iron"

5. When I sink a putt, tap and say, "next hole".

From just those interactions, the app could keep track of every shot, and also keep my score and number of putts. I could choose to not announce each and every shot if I wanted to, and instead say, "add three strokes" once I'm done with the hole.

I could also ask, "How far to the middle of the green?" and get a distance in my ear. "What did I hit last time on this hole, for this shot?" (Answer: "You used a nine iron, and hit it 107 yards")

All that would be killer for me. Nicer than staring at my phone screen in the sunlight, and looking like I'm farting around to the players waiting for me to clear the fairway.

Anything like this exist today?
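The bookkeeping side of this is small. Here is a minimal Python sketch of a round tracker driven by the phrases listed above. The `Round` class, the number-word table, and the assumption that putts are announced as "hitting putter" are all invented for illustration; distances would need GPS and are left out.

```python
import re
from collections import defaultdict

# Hypothetical number words for "add three strokes" style corrections.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


class Round:
    def __init__(self):
        self.hole = 1
        self.shots = defaultdict(list)  # hole number -> clubs used, in order

    def hear(self, utterance):
        """Update the round from one transcribed voice command."""
        text = utterance.lower().strip()
        tee = re.fullmatch(r"teeing off on hole (\d+) using (.+)", text)
        hit = re.fullmatch(r"hitting (.+)", text)
        add = re.fullmatch(r"add (\w+) strokes?", text)
        if tee:
            self.hole = int(tee.group(1))
            self.shots[self.hole].append(tee.group(2))
        elif hit:
            self.shots[self.hole].append(hit.group(1))
        elif add and add.group(1) in NUMBER_WORDS:
            extra = NUMBER_WORDS[add.group(1)]
            self.shots[self.hole].extend(["unannounced"] * extra)
        elif text == "next hole":
            self.hole += 1

    def strokes(self, hole):
        return len(self.shots[hole])

    def putts(self, hole):
        return self.shots[hole].count("putter")


game = Round()
for said in [
    "Teeing off on hole 1 using driver",
    "hitting seven iron",
    "hitting putter",
    "hitting putter",
    "next hole",
    "add three strokes",
]:
    game.hear(said)

print(game.strokes(1), game.putts(1), game.strokes(2))
```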


that sounds like a fun product to work on! it seems there are some apple watch golf apps already https://www.golfpass.com/travel-advisor/articles/apple-watch... and some apps specifically mention leaving your phone in your cart and using bluetooth audio cues https://play.google.com/store/apps/details?id=com.freecaddie... but none that specifically use bluetooth audio / cues as the main input method?


I don't think there is any Siri integration yet but disc golf has an app with some of that functionality in the uDisc app. You can "map" a round and it can track all of your throws. I think you can input disc (club) selection as well? No AirPods integration like that but it does have Apple watch integration I think.

[Blog post announcing the feature](https://udisc.com/blog/post/picture-this-udisc-unveils-map-s...)


Doesn't seem to be anything to do with airpods, and a phone with a hot mic in your pocket should be all you need for this.

An alternative solution would be a smartwatch app which would spare you chanting out on a course.

You might also want to check these: https://www.wareable.com/golf/best-golf-wearables-gps-watche...


Check out Arccos! Requires you to add little sensors to the butt of each club, but it’s seamless as far as tracking goes.


Has anyone actually TRIED using the AirPods in some form programmatically? It's impossible, you lock down your phone TouchID/FaceID and attempt to do a simple list by voice and "you need to unlock your Phone first" ... there is no trust in the pairing of hardware. I wanted to do a delivery platform based ONLY on the AirPods for all parties (pickup/dropoff/billing) but it's just not possible. I hope it changes in the future to reflect some of what is in this post.


The Apple Watch is a lot better at this. It is considered trusted because it locks as soon as you take it off your arm


Given that airpods already know when they get removed from your ears, I wouldn't be surprised for the next iteration to have a form of "unlocking" once you unlock your phone or watch while connected to the airpods. There's no reason for you to not be able to do tasks as long as you don't remove the airpods.
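The trust rule being proposed here is the same one the watch uses: unlock once while worn, stay trusted until removed. A minimal Python sketch of that state machine follows. The class and method names are hypothetical and this is a guess at the behaviour described, not anything Apple has documented for AirPods.

```python
class EarbudSession:
    """Tracks whether voice commands should be treated as authenticated."""

    def __init__(self):
        self.in_ear = False
        self.trusted = False

    def insert(self):
        # Putting the earbuds in does not grant trust on its own.
        self.in_ear = True

    def phone_unlocked(self):
        # Trust is granted only if the earbuds are worn at unlock time.
        if self.in_ear:
            self.trusted = True

    def remove(self):
        # Taking the earbuds out always revokes trust.
        self.in_ear = False
        self.trusted = False

    def can_run_sensitive_command(self):
        return self.in_ear and self.trusted


session = EarbudSession()
session.insert()
session.phone_unlocked()
print(session.can_run_sensitive_command())
session.remove()
session.insert()
print(session.can_run_sensitive_command())
```

Revoking on removal is the important part: once the earbuds leave your ears, a fresh unlock is needed, which is what makes the pairing trustworthy.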


The Apple H2 chip - featuring EarID ;-)


Voice fingerprinting (ideally done locally) would also pretty much solve the issue.


Author's site appears to be slow/unresponsive; cached copy is at http://web.archive.org/web/20210216181726/https://julian.dig...


This starts to touch on one of the reasons every "personal assistant" is crap: they're basically just a verbal command line.

Their only real value proposition is basic interactions when the user's hands and/or eyes are busy with something else.

Until they can get smart or fluent enough that they can rival the effectiveness and accuracy of hands-on-screen interaction, the use cases will remain fairly niche.

More programmable, multi-modal interaction is a step in the right direction, but it'll require a lot more.


Pro tip: You can shorten voice commands for Siri, Alexa, et al.

"Hey Siri, what's the weather today?" ==> "Hey Siri, weather"

"Hey Siri, set a timer for 10 minutes" ==> "Hey Siri, 10 minute timer"


Airpods (or any modern earphones) are absolutely the first really useful AR product available to consumers.

Transparency mode is a critical success even if it’s so boring as to be unremarked upon by most people. An AR device is most useful if it’s ubiquitously available. Transparency mode makes that possible (even if it could use improvement). The device also needs to avoid a negative social stigma. Airpods have largely achieved that as well (at least among younger people). That is partially marketing/brand image - but it is also based on utility. Older folks would generally find it rude to leave headphones in while having a conversation because they assume the listener isn’t listening, but I work with a lot of teens, and it seems like they couldn’t care less. It’s understood that the speaker can still be heard. A quick tap/squeeze is the social signal.

The author is right that there is huge untapped potential in auditory augmentations, but the focus on verbal input control is misplaced. It’s simply too obtrusive for public environments.

If I were betting, I’d say Apple won’t open up this kind of functionality until the (cross device) input control is generally codified, and that scheme will be intrinsically linked to a forward facing camera/sensor package to provide contextual awareness and implied user attention & intentions (i.e. glasses or similar).

Working on AR UX would be incredibly exciting.


Quality comment and helped me think about Airpods in a different way - thank you.


It’s fun stuff to notice and consider though. Anyone have a good AR slack/discord/... to pass along?


> A quick tap/squeeze is the social signal.

It does leave room for improvement. Perhaps some visual signal that shows that you can hear your surroundings clearly would be better.


I don't get what this has to do with AirPods exactly compared to other Bluetooth headphones. Besides the author using a lot of Apple words it seems like they're proposing a new product that has nothing to do with Apple.


"The most obvious choice here is Siri, which is already integrated into every pair of AirPods."

- Not entirely true. It's the pairing of the device with an iOS or capable watchOS device.

"Why has no one thought about additional buttons or click mechanisms that allow users to interact with the actual content?"

- It's called a smartwatch (the Pebble was really nice at this), or more generically, Bluetooth radio controls.

I wish these design analyses talked about the material input costs needed to produce the thing we might perceive as a 'platform'. I just see more batteries, wear, expense, etc.


The newer ones support ‘hey Siri’ which happens in the AirPods, otherwise they’d have to continually run audio to your phone.


I agree with the author about the potential of audio interfaces + some simple additional inputs + integration with certain apps.

For me personally there is a suite of tools involving audio books and note taking that would change my life: A remote with a few physical buttons to rewind, switch to record-mode, skip sections. Speech to text with full text search. Voice recordings tied to what I’m listening to. Basically, I want to be able to work through a difficult audiobook while walking around.


That was a lot of words to say “what if the AirPods had a programmable button so apps could favorite songs or add a bookmark?”


Not that patents are predictive of what will actually make it into the product, but Apple does have at least a few covering hand gesture and other inputs to AirPods:

https://patents.google.com/patent/US10757491

https://patents.google.com/patent/US10873798


> Why has no one thought about additional buttons or click mechanisms that allow users to interact with the actual content?

Microsoft added PowerPoint forward/back control to their earbuds.

https://www.businessinsider.com/microsoft-surface-earbuds-pr...


It could end up being quite hilarious if Apple ends up using various "signals" like clenching your teeth, making grimaces and weird noises to invoke commands.

You'd no longer be sure if someone is having a seizure or trying to stop the podcast she is listening to.


You joke but clicking your tongue or tapping your teeth would be innocuous (if even noticeable) to others.


It makes sense considering Apple is all about diminishing physical technology (iMac: the screen is the computer). AirPods, Watch or something else as tiny or completely invisible will be the next platform, once they solve the performance and UI problem, which they will.


> Apple is all about diminishing physical technology

That's an interesting perspective. Theoretically, the best way to get rid of hardware is to move as much as possible to "the cloud"[1], yet Apple isn't very good at cloud. (At least, not as good as Google and Amazon.)

So let's say we're headed to a future where the only physical electronics anyone owns are wearables: watch, glasses, ear buds. No phones, tablets, laptops, or desktops. Just wearables.

In that scenario, who wins? Apple is best suited for making that hardware (by a long mile), but Google and/or Amazon are better suited for handling the software in the cloud.

I'd place my bets on Apple catching up on cloud faster than Google or Amazon catching up on hardware.

However, if we took it a step further and went full Manna[2], tapping right into the nervous system, my bet would be on Google winning that one. They have the cloud capabilities and expertise, but Alphabet also has some experience in health and biology (if I'm not mistaken).

--

[1] I know, I know. "Cloud" is just someone else's computer. It's also more than that.

[2] https://marshallbrain.com/manna1


I think Apple is very good at cloud. They don't sell cloud services so comparisons to GCP or AWS are unfair, but their cloud integrations are pretty top notch from my perspective. My phone backs up automatically. My photos are available on all devices with the swipe of a single slider. iCloud is so tightly integrated with their products that a lot of people don't even know they are using it. I think that's a pretty good implementation of cloud.


I'm not saying Apple is bad at cloud, per se. (Although I think there's an argument to be made there.)

Rather, I'm saying they're nowhere near as good at it as Amazon or Google, and I anticipate that this gap is only going to grow.


I would argue that Apple has fewer cloud products, but most of their cloud products are very good. Amazon and Google have many cloud products of varying degrees of quality.


That's a fair and accurate assessment, I think.


> Apple is all about diminishing physical technology

> That's an interesting perspective

It's Steve Jobs' perspective. He talked repeatedly about technology disappearing into the background, and one day we would have technology so good that we wouldn't even see it. It would disappear into the walls.

To me, it's the ultimate expression of making computers work for us, not the other way around, which is mostly what we have now.


I hadn't heard that, so I may be late to the party on this perspective.


> Theoretically, the best way to get rid of hardware is to move as much as possible to "the cloud"[1]

An alternative view is that Apple’s biggest product puts enough compute in your pocket to make “cloud” unnecessary in a lot of cases. My phone is somewhere between a t3.medium and a t3.2xlarge (based on ram and cpu cores respectively). That can provide a lot of local compute for my wearables. And those wearables are gonna need network anyway, so either they all end up with 5G cellular radios (and 4/3G fallback) and enough battery to run that, or one device provides the network hub and the tiny things in your ears and the glasses sitting on your nose can have lower power radios and smaller batteries.

I reckon watches/glasses/earbuds(/cars/tvs/etc) all relying on a phone is a reasonably sensible model, rather than each of those devices having completely stand alone capabilities.

(And, the idea of Google tapping into my nervous system??? No thanks... I’ll proudly be a data center smashing neo-Luddite before that happens to me...)


I really enjoy the data addons for iPad and Apple Watch because they allow me to be connected without having a phone in my pocket. What I desire is connectivity (be able to see messages, make calls, etc), but iPhone often encourages disconnection via mindless scrolling. I enjoy the times I can get away from my iPhone, and I am not excited about this hub future you describe.


> Apple isn't very good at cloud

Apple isn't very fond of cloud, which a lot of people appreciate.


I actually think Apple is about integrating different hardware to create unique interface experiences which are cross-device. I suspect it's not going to be 'one technology' any time soon, it's going to be lots of devices working together seamlessly.

For example, if I am playing music on my Airpods from my iPhone and my phone is in my pocket, turning the crown on my apple watch is a really neat and intuitive way to change the volume. The first time I did it and it worked it felt like magic.

Similarly walking down a street and getting audio directions on AirPods almost works - but if that's combined with a small map on my watch it works much better - better than a phone.

But at the same time, neither a watch nor Airpods are going to be the right way to send a private text message on a quiet bus, and because a giant new unifying technology isn't with us yet, I suspect a hybrid approach is going to be with us for a while.


As far as I am aware, no one has been able to make a successful Voice First product. Some products, like Spotify or Audible, are greatly enhanced with Voice, but voice is just another access point.


There's a successful Windows app called VoiceAttack that will let you use voice instead of controls in games. They also offer voice packs spoken by famous actors that are popular. Having Kirk's voice update you on the status of your space ship is enticing.


I'd be happy with being able to program in triple tap actions as well as the double tap.

Two commands (and that's if you've even got both in) is not enough!
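
For what it's worth, telling double taps from triple taps is mostly a timing problem. Here's a rough sketch, assuming the earbud reports tap timestamps in seconds; the `max_gap` value and the action names are made up for illustration and have nothing to do with how AirPods actually do it:

```python
from typing import Dict, List

# Hypothetical mapping from taps-per-burst to an action name.
ACTIONS: Dict[int, str] = {2: "play_pause", 3: "next_track"}


def group_taps(timestamps: List[float], max_gap: float = 0.35) -> List[int]:
    """Group sorted tap times (seconds) into bursts and return each burst's size.

    Two taps belong to the same burst when they are at most max_gap apart.
    """
    bursts: List[int] = []
    previous = None
    for t in timestamps:
        if previous is not None and t - previous <= max_gap:
            bursts[-1] += 1  # close enough: extend the current burst
        else:
            bursts.append(1)  # too far apart: start a new burst
        previous = t
    return bursts


def actions_for(timestamps: List[float], max_gap: float = 0.35) -> List[str]:
    """Translate tap bursts into action names; unknown burst sizes are ignored."""
    return [ACTIONS.get(n, "ignored") for n in group_taps(timestamps, max_gap)]
```

The catch is that a double tap can't fire until `max_gap` has passed without a third tap, so every extra gesture adds a bit of latency to the shorter ones.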


If it could detect different kinds of ear pressure, you could control it that way using your air passages. Two big sniffs to skip a song.

Copyrighting this right here btw.

I'll take 10% of all future sales please.


If you're taking the piss I don't know where I've tripped up from your comment. Is there something infeasible about triple taps? If you jailbreak you can do it..


No, I'm serious, I wasn't reflecting on your comment. I'm not a doctor, so I'm not sure if air pressure in the ear canal could be detected and how much a human can control it. But it would be easier than voice or touch from the user's point of view.


Apologies. Wound too tight today.


Eh I think that interface stinks. It's just physically very uncomfortable to tap on the headphones in your ear because of the loud noise and the headphone pressing deeper into your ear. I have a knockoff bluetooth "Pods" and I absolutely hate the triple tap.


A trick I found out is you can tap on the back of your ear and it also triggers the tap action without banging the thing into your ear.


It would be cool if you could shake your head briefly when a new song comes up on Spotify, signalling that you don't like it.


The thing he wants a hardware button for is probably only used by .01% of users.

Edit: Removed the points that he already addressed after seeing comments below.


His footnotes begin with:

> The input mechanism I describe doesn’t have to be a physical button. In fact, gesture-based inputs might be even more convenient. If AirPods had built-in accelerometers, users could interact with audio content by nodding or shaking their heads. Radar-based sensors like Google’s Motion Sense could also create an interesting new interaction language for audio content.


...and the 2nd one:

> You could also think about the Apple Watch as the main input device. In contrast to the AirPods, Apple opened the Watch for developers from the start, but it hasn’t really seen much success as a platform. Perhaps a combination of Watch and AirPods has a better chance of creating an ecosystem with its own unique applications?


All AirPods actually do contain an accelerometer, and the Pro contain a gyroscope as well, however I'm not aware of them being opened to developers, so they're still not of use to anything unless Apple decides to open them up or implement the feature themselves.
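
If the motion data were available, the nod/shake idea from the footnote wouldn't need much. A minimal sketch, assuming you get a stream of (pitch rate, yaw rate) samples in rad/s; the sample format, threshold and function names are all hypothetical, not an Apple API:

```python
from typing import List, Tuple


def count_swings(rates: List[float], threshold: float) -> int:
    """Count direction reversals among samples whose magnitude reaches threshold."""
    swings = 0
    last_sign = 0
    for r in rates:
        if abs(r) < threshold:
            continue  # ignore small head movements
        sign = 1 if r > 0 else -1
        if last_sign != 0 and sign != last_sign:
            swings += 1
        last_sign = sign
    return swings


def classify_gesture(
    samples: List[Tuple[float, float]],
    threshold: float = 1.0,
    min_swings: int = 2,
) -> str:
    """Return "nod", "shake" or "none" for a window of (pitch_rate, yaw_rate) samples."""
    pitch_swings = count_swings([p for p, _ in samples], threshold)
    yaw_swings = count_swings([y for _, y in samples], threshold)
    if yaw_swings >= min_swings and yaw_swings > pitch_swings:
        return "shake"  # side-to-side motion dominates
    if pitch_swings >= min_swings and pitch_swings > yaw_swings:
        return "nod"  # up-and-down motion dominates
    return "none"
```

A real implementation would have to filter out walking, running and looking around, which is probably the hard part.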


Apple has them in the pencil too and they kind of suck. For the AirPods it’s not too bad for the double tap action but on the pencil it’s horrible. It’s such a difficult action to make when using it and I trigger it randomly all the time. Would much prefer a Wacom style button bar.


hug of death, can't load the site



