I don't think he touched the scroll wheel in the keynote. Why is it there? What does it do? I found it awkward that he scrolled by touching the screen right over the damn scroll wheel. Just use it. At least in the keynote.
And then there are so many questions about the hardware capabilities of this device:
- where is the inference running? I don't believe it's on the device. And if it's in the cloud, then why claim it's under 500ms? Is that just a "don't you guys have low-latency 4G at home?" moment?
- how is that tiny camera capable of parsing a small-text table with 100% accuracy? The optics just don't allow it.
- what's the battery life with that kind of usage? If inference is running on device it must be very low. Considering that my GPU pulls 50-100W on average (with spikes to 200W) just to suggest code, I still don't think Rabbit is doing anything on device; for scale, even a sustained 5W draw would drain a phone-sized ~4Wh battery in under an hour. If it's cloud-based, then 4G is also a battery destroyer. Maybe that's why the device is so big: huge battery inside.
The "teach" session was definitely cool. But at this point it must be magic because there's no way that thing browsed to a discord server, authenticated with hallucinated credentials and it just worked.
From
> rabbit OS operates apps on our secured cloud, so you don’t have to. Log into the apps you’d like rabbit to use on your system through the rabbit hole to relay control. You only need to do this once per app.
it seems that this device doesn't run anything locally, everything is in the cloud.
> it seems that this device doesn't run anything locally, everything is in the cloud.
For apps w/o a web interface, presumably they have an Android emulator running somewhere, controlling it with their LAM and feeding the results back to end users. They can even feed sensor data from the Rabbit phone (e.g. GPS) to the emulator.
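If that's the architecture, the sensor-relay part is mundane plumbing. A minimal sketch, assuming a stock Android emulator reachable over adb (the serial number and coordinates here are invented):

```python
import subprocess

EMULATOR = "emulator-5554"  # hypothetical serial of the cloud-side emulator

def relay_gps(lat: float, lon: float) -> None:
    """Forward a GPS fix from the handheld to the emulator.

    The emulator console command is 'geo fix <longitude> <latitude>',
    longitude first.
    """
    subprocess.run(
        ["adb", "-s", EMULATOR, "emu", "geo", "fix", str(lon), str(lat)],
        check=True,
    )

relay_gps(lat=37.7749, lon=-122.4194)  # e.g. a fix reported by the handheld
```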
It is a brilliant solution to the problems they are facing, but their cost model must be extreme. Given "no subscription required", I'm left wondering what their "pay as you go" pricing will be!
You will pay with your data. Read their privacy policy. They can track you, collect your private data, they have access to your accounts. As long as it's closed source and cannot be self-hosted, it's a big no-no.
I have the "two back tap" accessibility feature set up to start ChatGPT voice. It works great. I just double back tap and look at the phone and it's ready to start listening.
I tried that back tap thing and was surprised how poorly it worked, to be honest. Maybe there is a knack to it.
Thing is, these AI devices will become actually interesting when they start being observers in your life too, pre-empting rather than the Siri-style ask-and-response, and that just isn't possible on a locked-down OS like iOS.
My really old LG Android had wake-on-double-tap for the screen. Worked really well. All newer phones have failed miserably or completely dropped the feature. Still miss it; it was much more responsive than any new phone I've bought.
And then immediately be squashed by the Apple App Store.
The answer is absolutely to do these types of products on the devices we already own. There is only one thing stopping this type of innovation and experimentation.
There’s something about eliminating the friction of calling up the assistant. For example, I’m not using Google Assistant on my iDevice because I need to go unlock the device. It’s really a system level issue.
>I don't think he touched the scroll wheel in the keynote. Why is it there?
Same vibe as Humane barely using the projector for anything and then explicitly saying "You don't have to use it" in an interview. Like, why add it then?
It's using a Mediatek Helio P35, a SoC from 2018. It's slower than an iPhone 6S.
There's no 5G support either, unless they've purchased an external modem. So even if you do live in an area with low-latency 5G, you aren't gonna get it.
I wouldn't be surprised if this turns out to be running just a cheap Android fork that's locked in kiosk mode with a single app (or webapp).
Thank you, this is an important detail. Teenage Engineering isn't mentioned anywhere on the linked page, so I was confused about the HN title. I figured Rabbit was a new venture by TE.
But it would be an omission if we didn't also include the fact that TE hardware often has weird flaws: OP-Z cases never fit together properly even years after release, and more recently there were problems with shipping the KO II sampler and its broken faders.
From a maker's perspective, TE products (they have quite a few more than I remembered!) still read as a frontal sketch in a not-quite-Sony style with depth added afterwards, leaving something to be desired in manufacturability, durability, and usability.
e.g. in this very device, the thin stem of the scroll wheel implies it's actually shaped like a LEGO minifig head, like 中, which doesn't inspire confidence w.r.t. lateral play. It could easily be made 土, 王, or even 凸 shaped internally, with only the middle part exposed; that would not just eliminate play and prevent debris ingress, but would also hide the areas around the wheel edges, so quality control there becomes less of an issue. Or: the shell seen here is shaped like a bathtub with a lid closing off the backside, which means the cross-section of the tub has to be ever so slightly tapered towards the front, which hurts grip when picking it up. And the backside has no bumps, nor are there creases in the sides, which compromises pickup further. Or, e.g., the painted wheel: it's going to be a PITA to paint.
Granted, I'm biased, perpetually feeling defeated by the superb aesthetics of their products. But their design always seems, how do I say it, not as deeply connected to engineering ideals as it is to aesthetics, the way it is in Sony or Apple products.
I understood probably 20% of what you wrote, but I think you have some good points on TE being more interested in the aesthetics of engineering than in engineering beautiful aesthetics.
I think the hardware is just a gimmick. It gives the proposition more 'body' and is great for marketing.
But really, this is screaming to be an app. There is literally nothing this thing can do that the phone in all of our pockets can't. Especially integrated with our smartwatches for ease of use. So why have a separate device for it? It makes no sense.
I think the hardware is just there to make it not 'just another app' in the press, really. And that the app version will come soon after the physical product releases and will take over 99% of the installed base.
And of course if this takes off they'll be quickly acquired by Apple, Microsoft, Google or Samsung. I bet that's what they're aiming for.
Their LAM is actually a bit scary: why even develop mobile apps when they're just a human-interface translation of a machine interface? The LAM will do it for you.
I don't think the hardware is the main product. I think the AI is, but they didn't want to be "just an app"; they want to be the first OS for the new way of computing. So they designed a new device. I wouldn't be surprised if they open up to OEMs to start making all kinds of devices.
I’m extremely bearish on this app. But Pininfarina basically saved Ferrari, revived Ferrari’s little brother Maserati and gave life back to Volvo. If done right (clearly defined roles, collaboration, design maturity etc) it works well.
I'm not sure people want an LLM assistant that can actually do things like spend their money.
A minor convenience when it works well. A major inconvenience when it doesn't. (And it doesn't take a lot of imagination to come up with nightmare scenarios.)
Love the gadget, though. I feel like I want to eat it or build a Lego castle around it, or just look at it more.
How do you know the LLM is going to do what you confirmed?
There's a fundamental tension here: either you limit the LLM to a set of fixed actions that a user can individually understand and confirm. Or you let it figure out what to do and how to do it given a higher-level goal.
In the first case, it's limited and not really better than a well-designed site or app. In the second it's powerful but can run amok.
E.g., did it really fill out that four page order form for that e-bike you asked it to order? Maybe it uses the debit card instead of your credit card and your checking account is overdrawn. Maybe it sets the delivery address to your brother's address. Maybe it orders two bikes or 10 or the wrong model or wrong size.
OR
It asks/confirms each step of the way so there's little chance for mistakes.
This is an inherent problem with any kind of delegation. You either micro manage the details or trust your agent to get them right.
The whole system isn't an LLM. At the point they are asking for confirmation, they've already parsed the required information and handed it off to normal code. It's not going to change again.
Ultimately, an LLM only has the capability to take a string as input and output another string. Any other functionality (Uber, search, travel, etc) has to be manually programmed in via APIs. For the foreseeable future, a fixed set of actions is the only way to do this reliably and is what this device does so talking about the second case of an LLM run amok is a moot point.
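For what it's worth, that fixed-actions version is just a dispatch table around the model. A minimal sketch of the pattern; the `llm` stub and the action names are invented, not anything Rabbit has published:

```python
import json

def llm(prompt: str) -> str:
    """Stand-in for whatever hosted model they use; returns a JSON string."""
    raise NotImplementedError

# The fixed, manually programmed set of actions.
ACTIONS = {
    "order_ride": lambda args: print(f"booking ride to {args['destination']}"),
    "play_music": lambda args: print(f"queueing {args['query']}"),
}

def handle(user_request: str) -> None:
    # The model's only job: map free text onto one of the fixed actions.
    raw = llm(
        f"Pick one action from {list(ACTIONS)} for: {user_request!r}. "
        'Reply as JSON: {"action": ..., "args": {...}}'
    )
    call = json.loads(raw)
    ACTIONS[call["action"]](call["args"])  # from here on it's normal code
```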
They can call it whatever they want, but it's essentially IFTTT + DOM scraping/automation ("teaching", lol) for places without an official API, with a super thin AI veneer. Mostly for suggestions.
Maybe because "change this" is vague, more of a broad instruction, and may need more clarification. Users may have clearer commands, like "cancel this", "I don't like this, provide an alternative", or "can you go back one step?" etc. Users can just speak those requests out loud since the device is listening.
$199 with no subscription? You must have to bring your own data sim card. Or can it get its connection from your phone? And is the AI stuff running locally on the device? Impressive if so. Suspicious if not; there's no way the service remains free forever.
Pretty compelling price, and I'm certain the vision of AI agents that can use any existing app or website to take actions on your behalf is the future of computing. But there's no room in my pocket for a second device. I don't see how a device is going to succeed when an equivalent app for existing phones seems around the corner.
It's indeed suspicious. You're sending your voice samples, your various services accounts, your location and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.
It's not a matter of "catching on". There are smaller models and they have been looking into putting useful models on device from day one.
It's just that the models that can currently run on a mobile device are not effective enough. This is about the most memory- and compute-intensive type of application ever. In particular, models that can reason or follow instructions reliably and for general purposes are too big to run quickly on mobile devices.
They are putting models on phones, but they do not have general purpose assistant capabilities.
> the industry is only starting to catch on to on-device language models.
i mean, there are technical limitations and tradeoffs to running LLM-size models locally. doesn't help to ascribe it to lack of foresight when it is a known Hard Problem.
In the linked keynote, Jesse Lyu mentions that LLMs won't actually help us do tasks - there are currently no so-called "agents" that can do something simple like book a flight - the best way to do it is still to click the buttons yourself.
Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you. I'm not sure this is the right approach - if it is successful, it will lead to more centralisation around Rabbit.
I agree this is a problem, but I feel a better approach would be to have a market of agents that for a small fee actually handle the whole transaction for you. So there might be multiple parties that say they can buy Delta Flight DL101 tomorrow 21:10 for various prices - some might be a service like the Rabbit LAM, others might be booking platforms, and there might even be airlines themselves. And now an agent-concierge that you choose once at the start will look at all the parties, and then pick and buy the right flight for you. This will make the problem a problem of an open market, where good speedy service is promoted, and prices get ever lower. And if the Rabbit LAM gets outcompeted by an ever better speedier solution, that would be a good thing. (This will also allow us to move away from our current dreaded attention-based economy where e.g. a booking websites tries to exploit your required presence during waiting times, which the LAMs would also solve, but, like I said, let's not move towards more centralisation.)
> Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you. I'm not sure this is the right approach - if it is successful, it will lead to more centralisation around Rabbit.
The LAM is a genius hack to get around the thousands of closed gardens that apps have created.
It also may have been easier than teaching an LLM how to make tons of API calls, and if done right I presume their LAM adapts to UI changes, vs writing integrations against breaking / deprecating APIs.
90% of use cases will be covered by an official API.
They’ll cover the other 10% with “teaching”. Essentially you telling the AI what the lazily written markup actually means. Then they save it into an automation template. QA teams have only been doing that for the better part of 3 decades.
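That kind of "teaching" plausibly compiles down to a stored automation template that a replay engine walks through; a sketch of what I mean (all the field names and selectors are guesses):

```python
# A "taught" task as a stored template: labeled steps over lazily written markup.
TEMPLATE = {
    "task": "log_into_service",  # invented example
    "steps": [
        {"action": "fill",  "selector": "input[name=email]",    "value": "{email}"},
        {"action": "fill",  "selector": "input[name=password]", "value": "{password}"},
        {"action": "click", "selector": "button[type=submit]"},
    ],
}

def replay(template: dict, driver, params: dict) -> None:
    """Walk the template with any driver exposing fill() and click()."""
    for step in template["steps"]:
        if step["action"] == "fill":
            driver.fill(step["selector"], step["value"].format(**params))
        elif step["action"] == "click":
            driver.click(step["selector"])
```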
I know a company that employs a building of 1,000 people doing nothing but performing one click each. They put a human in the scraping/automation loop so they don't violate the sites'/services' TOS.
Uber wants people in its app, they want to show ads for their subscription membership services, and they want to upsell you on services, and they want you to see sponsored restaurants first when you order food. Uber wants to own the relationship with customers, so they can ~exploit the customers more~ extract more value.
VC backed and publicly listed companies need endless growth, user-centric systems like what Rabbit is offering break those business models apart. Which is why I predict everyone is going to be fighting super hard against making UIs that just get shit done.
Watching the keynote, I found myself thinking how unhappy Uber would be with skipping over interacting with them entirely: there's no "Uber experience" you have when you're in the car, so what do you get from Uber that any random company with a tie in to Rabbit can't get you?
Option 1: a shift in devices/models like Rabbit pulls the magic carpet out from under companies like Uber, and everything becomes purely transactional.
Option 2: rabbit-like market creates exclusivity-based need, to ensure Uber is the number-one (or only) rideshare choice, so it doesn't matter that customers aren't "experiencing" Uber. Uber relinquishes the experience to the agent (unlikely).
Option 3: Uber et al wage war against agents and make their use impossible
But if we're not careful this will circle back to apps/silos.
What I'd like to see is the Smalltalk approach: data providers that are able to send/receive messages, and can be connected together to achieve a goal. Even better if the connecting is done by the "machine" after I issue a command.
it's been such a long year, I still remember the month of gpt...what was it, not gpt4all...gpt...ah whatever. The "running an LLM in a loop will solve it" approach. I'm not a big fan; I'd need to see something truly transformative.
This seems to be a LangChain wrapper, where the LangChain part is a prompt + retrieval over a few documents.
> Rabbit means to solve that by creating a "LAM", a "Large Action model", which is a service by Rabbit that will click interfaces for you.
https://openadapt.ai is an open source app that runs on your local machine and clicks interfaces for you, but only for repetitive tasks that you show it how to do.
QA teams have been doing this sort of stuff for decades. With a little know-how and an hour, you could record a user doing something in the DOM and play it back. There's no magic here.
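Indeed, and today the tooling does the recording for you: Playwright's `codegen` command writes the script while you click. Replay is then a few lines, as in this sketch (the site and selectors are placeholders):

```python
# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/login")    # placeholder site
    page.fill("#username", "me@example.com")  # selectors as recorded by codegen
    page.fill("#password", "hunter2")
    page.click("text=Sign in")
    print(page.title())                       # confirm the playback landed
    browser.close()
```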
This is exactly what Siri wanted to be. Compare with the original Siri keynote https://vimeo.com/5424527
I can't find the exact ~2010 article from before being bought by Apple. I remember in an interview they were talking about making a web agent that could operate and perform tasks on any website, to avoid being locked out by APIs.
I'm very interested in what Apple does with LLMs on iDevices.
They have the right hardware for it and they have all the motivation, with their focus on on-device processing. OTOH they also have a pretty bad history with their AI assistant.
I've been surprised that Apple hasn't done more to keep pushing beyond the app boundary. The primary pitch for Rabbit from the keynote is basically "it's a layer that sits atop the broken model of modern phones." I think we all agree that the 'evolved state' of phones and apps is disappointing compared to where it could be / where we expected it would go.
They (Apple) are now in the position of being seen as laggards, caught with their pants down by companies releasing products whose core conceits are built atop the inefficiencies of their "core" models.
For both Humane and Rabbit, it's hard for me to imagine that there's enough "there" there for these to be beyond niche products that don't get starved out by the border-expansion of Apple / Android over the next few years … but I would have also guessed that A+A would have been further out front of this.
What's the advantage of this vs a smartphone? Realistically, are you going to carry two gadgets with you? I feel like a lot of people don't like the smartphone, because, being a general purpose computer, it takes away the excuse to buy all sorts of different gadgets.
Yet people readily pay $99/year for apps all the time. I think the reason this isn’t an app is because the gate keepers don’t want competition. Looking at you Siri.
Do they really? I don't even pay that much for my entire O365 suite.
The most expensive app I pay for is Telegram Premium at $30 per year, and that's really worth it to me for the translation; otherwise I would have thought twice.
having a purpose-built device reduces the number of clicks and other friction. currently phones are hostile to AI interaction (not that I'm entertaining this specific device)
I read the comments before I watched the video and I'm actually way more optimistic about this now than I was expecting to be.
Yes, google or apple will probably ship LLMs on their phones eventually and make an overall more compelling product than Rabbit _if_ you're only willing to have one device.
Considering the relatively low price (and assuming they actually deliver it in a timely fashion) I can see this being useful as a pocket assistant between now and whenever google et al eventually catch up, and potentially past then if rabbit keeps innovating.
This is a pretty neat device. I think they might be more successful if they targeted the anti-smartphone crowd, but I can see why they would want to chase the bigger market.
Lyu hit the nail on the head when he said that smartphones are mostly for entertainment/wasting time these days. I definitely know of friends who want to (or try to) go smartphone free, but the apps mentioned in the keynote make it a large inconvenience. A $200 device that offers much of the convenience of a smartphone without the distraction sounds like a good fit for that crowd.
I don't use my Playdate as much as I'd like but I just love clicking around in the menus, pulling out and docking the crank. Even just how it looks sitting on my desk, with the bright yellow plastic and bright purple flip cover. It's a beautiful piece of hardware.
I see this as less of a competitor to a smartphone, and more of a competitor to a Light Phone [1]. Seems like it might help folks avoid doom-scrolling in a similar way, while providing a higher level of functionality.
Yeah, the price difference between this and Teenage Engineering's own products is wild! It's worth noting the Playdate (by Panic) is designed by Teenage Engineering and also just $200. https://shop.play.date
I need to start my Teenage Engineering collab collection. Such wonderful and refreshing experiences. I've used my friend's Playdate but never took the plunge; it's wonderful, haptics and everything.
I feel like the R1 is getting a lot of weird hate here on HN. I get it, it's a bit pretentious and wants to be very Apple-like, yadda yadda. Probably not that much value generation, it's overpriced, and no one wants to carry another thing in their pocket. And yes, the company will likely go bankrupt (or pivot into something less exciting). It's also all in the cloud (no local inference) so it doesn't really even "do" that much. But seriously: this is (or should be) the future of computing. The fact that things like OS self-organization, system-wide (+cloud) search, basic automation, etc. is so broken on modern operating systems is just plain bizarre.
For example, if I wanted to do a simple find-and-replace on my 2021 M1, I'd have to use something like grep (a tool originally written by Ken Thompson in 1973—a half century before my laptop was built) and look up flags and syntax I inevitably constantly forget. We should have been past this at least 15 years ago.
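Concretely, what I have to do today is something like this (in Python, since I can never remember the flags; strictly speaking, grep only finds and sed does the replacing; the glob and strings below are just examples):

```python
from pathlib import Path

def find_and_replace(root: str, old: str, new: str, glob: str = "*.txt") -> None:
    """Recursively replace `old` with `new` in files matching `glob`."""
    for path in Path(root).rglob(glob):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))

find_and_replace(".", "colour", "color")
```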
Assistants are cool, but I want better tools. Give me a timeline with all my emails, chats, documents, and pictures shown together. Full text search on my entire browsing history. Exporting my Booking.com search results as a spreadsheet. Controlling my phone from my desktop, and vice versa.
As a bonus, those capabilities don't require uploading your data to the cloud or risking LLM hallucinations.
> Exporting my Booking.com search results as a spreadsheet.
Yeah except they actively don’t want you to have access to your data. They want you locked in to their shitty platform. And these companies will actually fight your ability to get data out, like Reddit or Twitter when they shut down their public APIs.
In theory yes, has anyone built this tool yet? I’d love an LLM<->browser agent that can do something useful besides summarize Yelp reviews. I’m thinking something that, for example, accesses the company intranet and finds me information from my paystub without me doing anything.
I want easily pluggable/connectable systems for everything. I want to see all my messages and all my contacts and all my stuff in interfaces I choose. The future is definitely not searching email, Whatsapp, Signal and Instagram to find the message my friend sent me the other day.
> As a bonus, those capabilities don't require uploading your data to the cloud or risking LLM hallucinations.
Ignoring that half those services are dependent on the cloud already, this is such a bold and hard problem. A solution would be magical but LLMs are our best chance.
Without clearly defined protocols (eg SMTP, CalDav) and a ton of boring to write and use case specific glue code, you’re really dependent on LLMs to make those M*N connections from one service/format to another. An LLM can turn any current or future input format into any current or future output format. That’s really powerful and basically unsolvable at scale with the tools we have now.
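Concretely, the LLM-as-universal-adapter amounts to one function instead of M*N connectors. A sketch; the `llm` call is a stand-in for any instruction-following model, and the example formats are invented:

```python
def llm(prompt: str) -> str:
    """Stand-in for any instruction-following model."""
    raise NotImplementedError

def convert(payload: str, source_desc: str, target_desc: str) -> str:
    """One adapter instead of M*N handwritten connectors."""
    return llm(
        f"Input ({source_desc}):\n{payload}\n\n"
        f"Rewrite this as {target_desc}. Output only the converted data."
    )

# e.g. a chat-export line into a calendar event:
ics = convert(
    "[12/01/24, 09:15] Anna: dinner friday 7pm at Luigi's",
    "WhatsApp chat export line",
    "an iCalendar VEVENT",
)
```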
The strangest thing about AI is that a lot of its value comes from the fact that it's better at using the tools we designed for humans than we are. It's essentially an admission of failure.
I'm not sure what conclusions to draw from this observation however.
I would argue that it’s not that we’re bad at using tools that we designed for humans, but more that multiple shifts in the tech world over the last 50 years have intentionally (and unintentionally) broken those tools. AI does not need to use the tools at all, it knows the knowledge that the tools gate-keep and it can write code that does the data manipulation that underlies the most powerful software.
It essentially side-steps the whole mess that is the current tech industry (and to be honest a bunch of other industries).
Automated programs are not limited by stamina, attention span, sleep cycles, etc. so I don't think AIs being better at something is necessarily an indicator of failure for humans.
Dogs were used to rotate roasting spits in Britain during the Early Modern era: https://en.wikipedia.org/wiki/Turnspit_dog Does this mean humans were inferior to dogs at the task? Absolutely not. It was just a more efficient alternative based on what was available to them at that point in time.
I think that was always the dream, even in early 20th century sci-fi. It's not a failure, it is a success.
However, in my opinion it's interesting that the dreamers underestimated the difficulty of robotics. Almost everyone thought manual jobs would be the first ones to be replaced. Right now it seems to be the other way around.
Every time I need to use Google to change a setting in an app like Facebook, I wonder about exactly that. Where is my AI assistant, so I can just tell it what I want? Why do I have to end up on Reddit via Google, where a random internet user explains how to change even the settings that should be the most obvious to find?
> if I wanted to do a simple find-and-replace on my 2021 M1, I'd have to use something like grep (a tool originally written by Ken Thompson in 1973—a half century before my laptop was built) and look up flags and syntax I inevitably constantly forget. We should have been past this at least 15 years ago.
Pretty much any text editor worth its salt has find-and-replace features, regex-compatible ones at that. Even the extremely rudimentary TextEdit on your MacBook supports it.
The fact that foundational tools like grep still work over half a century later is a feature, not a bug, of our field.
How many cloud based services (most if not all of which rely on those foundational half-a-century-old building blocks, by the way) that are pushed to customers today will still be around in half a century?
Even a typical Product Manager, let alone a regular layperson, would have no idea where to start with regex, so it's a super weird thing to bring up. Of course grep is a great tool, just as the hammer is a great tool (invented around 3 million years ago); but we've moved past that. You can't use a hammer to do EUV lithography.
A good example is something I was doing today: taking some text and formatting it as a markdown table with proper spacing. Copilot made it super easy to highlight the lines, type out what I wanted, and keep moving. Manually it was lots of adding spaces, deleting spaces, and adding |s.
> * For example, if I wanted to do a simple find-and-replace on my 2021 M1, I'd have to use something like grep (a tool originally written by Ken Thompson in 1973—a half century before my laptop was built) and look up flags and syntax I inevitably constantly forget. We should have been past this at least 15 years ago.*
What does the oldness (and long usefulness) of grep have to do with the problem you described, which is the very hard problem of translating natural human speech into machine commands?
I think the hate is that the website and the promotion videos are too advanced, to the point where there were no MVPs, no visible progress, and no discovered limitations in between. On the other hand, the terms they use are too strong to be true. Are they building an OS from the ground up, or just calling the ChatGPT API to do stuff and calling it an OS?
And of course, looking at LinkedIn, the founder(s) are NOT convincing either.
I mean… sure, but wanting something and getting it are different things right?
We agree that those things are things we do want, but do you really think we’re getting any of them here?
I think it’s reasonably legitimate to go: cool idea, I want it on my phone, as an app.
There is no reason for this to be a hardware device, and indeed it mostly isn't; it's mostly a cloud service with a mystifying and useless custom piece of hardware tacked on the edge.
You’re doing on device inference? Cool, sign me up, you have my attention.
You have a cloud service for AI integration with existing apps? Mm… like, how is that different from any of the other AI startups?
If your differentiator is "my SaaS comes with a plastic box and no subscription fee" you gotta expect people to laugh; it's a joke.
How in the world are they going to make money on this? Perhaps the hardware is cheap to manufacture, but with no subscription I feel like they are going to get taken to the cleaner on LLM API fees. Even if they're running the LLM themselves, it's not cheap. Could they really be getting enough margin on the device to pay their staff _and_ all of that infra?
EDIT: it's 100% the razor model; they want the device out there so they own your interaction with "service providers", i.e. they take a cut of everything you do. Middlemen.
"No subscription" definitely sounds good to get people in the door but it seems short sighted in the long run, can't be making much on $200 and the more it's used the more that margin is eaten. Presumably they will roll out a sub at some point for extra features but then you get the backlash of "You said no subscription".
Cool product but I agree with others that their site does not really demonstrate the value proposition of the device in a clear way. You shouldn't have to watch the keynote to understand what it is.
I also have a hard time believing it will work as well as they say it does gen-1 at a $200 price point. I'm very skeptical.
The teenage engineering design is nice though, looks like something out of the movie Her.
The value proposition here seems pretty weak. AI voice assistants are, in my experience, one of the worst ways to interact with a computer. What does this device offer that I can't already do on my cell phone?
Quick feedback if the founders are watching this: please allow controls on your keynote video. The video production quality looks great, but I am not watching it if I don't know whether it's a 30-second video or a 30-minute one.
Edit: My feedback is that it's too slow. I wish the features were shown off before I had to get a layman's education for several minutes. I'm also really concerned that his uber will arrive before the pizza.
I like the device. It seems easy enough for my elderly father to use, and he won't lose it as "one of the websites" he can ask for help. There's something important about the focus a physical device brings; he can remember how to use it, where it's at, etc. The GPT UI I built him gets lost in his bookmarks, and he forgets it exists.
A while ago I came across a website with a video player that gaslights you with a logarithmic progress bar. It had me fooled for a while into thinking the video had only 2-3 minutes left, before I finally clicked the pop-out button in Firefox and realized I was only 20% of the way through a 45-minute video.
Is there a Firefox extension for Android that can take over the JS crap that people call video players and give me something decent? I know there are "open in X" options, but that isn't seamless. I just want a video player that is standard across the board.
Like how Safari on iOS used to turn any video in the browser into a standard video player window with normal controls.
Not the same website, but the video player was made by a Brazillian company called VTurb and they're using the same logarithmic progress bar on their front page. If you inspect element it's literally called "smartplayer-fake-bar"
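For the curious, the trick is just a nonlinear map from real progress to displayed progress. A sketch of one plausible curve (the constant is made up; the real player may use something else):

```python
import math

def displayed_progress(t: float, duration: float, k: float = 60.0) -> float:
    """Map real progress t/duration onto a log curve that front-loads the bar."""
    return math.log1p(k * t / duration) / math.log1p(k)

# 20% into a 45-minute video, the bar already reads ~62% done:
print(displayed_progress(t=9 * 60, duration=45 * 60))  # ~0.62
```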
Also, please describe it in pictures and words too. Video-only really limits where I can watch this. I’m in a waiting room currently and I’m not about to just play a video out loud like some sort of psycho.
> I’m in a waiting room currently and I’m not about to just play a video out loud like some sort of psycho.
Funny enough, I think them releasing only a video perfectly aligns with their product and why I think this will fail. They seem to have completely overestimated how often people want to read or write and underestimated the number of scenarios where needing to talk or listen to my phone is inappropriate.
I'm assuming if you had this phone, you'd feel like just as much of a psycho if you had to dictate every command to it while sitting in that same waiting room.
Teenage Engineering is always like this. The hardware looks really cool (aesthetically, I mean; no value judgment on quality or such) but they seem to have looked at Apple presentations and thought those were not pretentious enough.
- the keynote video isn't scrollable behind the giant "share" button
- the text on the keynote page is incredibly difficult to read (small text + very low contrast)
- the homepage is full of dizzying animations that don't tell me anything about what the "phone" is actually supposed to do
- "How it works" is just a set of animated cards. Once you figure out they're clickable (hoverable?), they spin around to reveal... half-formed ad copy? I wanted to know how it worked.
I'm willing to believe this is a real product only because of the "research" page, but it feels more like an aspiring web designer's portfolio piece. The flashiness of it all makes it really difficult to figure out what anything actually is.
I only skimmed the video, but I have to agree. The biggest problem I see is that I don't want to talk to my phone for a lot of things. If I'm having a text conversation or trying to look something up on the internet while I'm in a crowded room, I don't want to use my voice for that. In fact, while there are some things I would find this helpful for, I would say the majority of my interactions with my phone are things I would actually prefer to have a keyboard for.
While this thing does have a touchscreen keyboard it seems to me like the biggest selling point of the phone is actually something I don't want to use most of the time. I can't really see this succeeding for that reason.
It’s a really interesting product that I’m not going to buy!
Declarative interfaces that let you describe what you want, with agents that go out to different services and chain them together, are a cool idea.
I don’t want to spend time using dozens of different apps with different (often poorly designed) interfaces.
Having a push to talk hardware button instead seems less clunky than a “hey siri” key phrase (I use Siri dozens of times a day but unfortunately ‘raise to talk’ feature on Apple Watch has never worked well for me).
I’m curious how their LAM works with interfaces being updated- if they need to retrain with UI updates or if it’s flexible enough to be stable with UI changes and new features etc.
I currently use ChatGPT sessions to dive into various topics I’m interested in and explore ideas. I do like the idea of dedicated hardware for this, but it’s something I imagine I’d keep on the coffee table at home; I don’t want to get a dedicated SIM and data connection or carry around another device.
They’ve raised $30M and I wish them well, I hope they survive and have me as a customer in the future.
So many questions. Is this a YC company? Did they apply? Does anyone know anything about them at all? It just screams scam to me. Who would give them $200 for vapor? [Edit: I have been told the device has been demoed in a comment below, so perhaps it might be real.]
Let’s say Apple were to do this. Apple would take years to get it right but when they announced it would be with a date with an accuracy to within a certain quarter. And it would just work on day one. Of course it would be saddled with subscriptions, but it would be real.
It just doesn’t smell right that a team led by a guy who can’t manage to pronounce his own first name clearly could think through all the issues and have a coherent vision within a year of when LLMs really took off.
Another possibility my paranoid mind comes up with here is the team/founder caught wind of some half baked research from inside Apple and decided they would try to pull a fast one and whip out a product before Apple could. And by the way, again of course it’s vapor, until it’s not, but we are still at the “it is” stage.
> It just doesn’t smell right that a team led by a guy who can’t manage to pronounce his own first name clearly could think through all the issues and have a coherent vision within a year of when LLMs really took off.
Weird ad hominem. Your entire comment reeks of a bundle of biases.
A few colleagues and I were baffled by the launch. Anecdotally, between the group, we consume a huge amount of tech news daily, and we were all caught off guard. How were 10k units sold in a day if no one had heard of this company before?
I was confused and skeptical until I saw Teenage Engineering involvement. They are an all star company that has never come even close to half-assing a product. Their gear and design chops are absolutely ace.
Everyone wants so badly to be first to market with the "next big thing" after smart phones. And all of them fail the "so it's a phone, but just less capable?" test.
A neat toy for hobbyists, but most people can't justify carrying two devices. And as soon as you add phone functionality to it... you've just made a new (more awkward) smartphone.
They’re all computers; whether you can make phone calls with them is actually not that interesting.
My takeaway of their angle is: the companies making the apps and services that get used a lot often have incentives that aren’t aligned with their users'. People want to see baby photos, Facebook wants to sell ads; people want to find information, Google wants to sell ads; people want to see what their friends are doing, Instagram wants to sell ads. The “LAM” promises to wrap a lot of stuff under one interface.
The device itself is just the access point.
I like the angle. I don’t see it working with today’s quality of LLM understanding of human intent, nor with all of the siloed platforms.
I must be missing something here. I used to work on a voice assistant product and everything demoed is pretty bog standard stuff: play music, shop, order food, get Uber rides, ask random queries, recognize images.
This also runs into the same core issue as voice assistants, which is that it's way easier to transmit complex thoughts/actions on a nice big visual UI than it is to convert those thoughts/actions into verbal instructions.
For example, when people order food from an online menu, most people are not fine with "get me the closest pizza available". They're usually weighing complex tradeoffs between lots of factors: alternative restaurants nearby, how visually appealing the pictures of the food are, health content, other options on the menu including their prices and contents and how well they complement other dishes, etc. Figuring out how to express these preferences via a back-and-forth with an AI is more tedious than just looking at a visual interface and letting your subconscious process all this information into a choice.
I can be driving in my car, say "OK Google, open Spotify" and Google happily replies "OK, opening Spotify on your TV".
Lovely.
I have a Spotify playlist called "kid's music".
"OK Google, play the Spotify playlist Kid's Music", and it'll go and play someone else's public playlist called Kid's Music.
Even sending messages barely works.
"Ok Google, send a message to <wife's name because I cannot say "my wife" due to how unintelligent Google assistant is> on Facebook Messenger saying ...."
"Sorry, I cannot send messages through Facebook Messenger".
It can read messages from Facebook Messenger, but not send them, lovely.
I had music playing on my phone and wanted to know what song it was. "Ok Google, what song is playing?" As soon as I say that, the music stops and now Google Assistant is listening to nothing.
Right now I cannot say "OK Google, go to Discord channel <foo> and summarize the day's conversation".
> which is that it's way easier to transmit complex thoughts/actions on a nice big visual UI
Depends on the task. Some are better on a UI (I want to see pictures of the dishes before I order), some are better with voice or natural language input.
Even on my PC, I could see some tasks being easier with an LLM
"Close all code editors I haven't touched in a week."
"Clean up my browser tabs by closing out all YouTube, reddit, and hacker news tabs. Keep open any tabs involving AI research or LLMs."
It does have a screen for output; in the demo they showed it displaying a stock graph. As for input, I suspect articulating your desires verbally is a muscle that gets developed, like learning to program or to write. The assistants of yore were so error-prone and slow that people may never have gotten to that level; LLMs might be the difference there.
I don't see it. Base model Pixel 7 can be found in multiple places for $250 right now, so you're not coming in much cheaper than that, but more importantly, I don't really see a whole lot of demand for a second phone even as a cool toy. This feels like the type of thing people would buy, play with for a week or two, and then it would end up in a drawer never to be used again.
At the moment you're right, but as Google starts to implement LLM assistants into their products, it's going to eat this thing's lunch. I would guess maybe 18-24 months max before Google Assistant can do everything this thing can, and they already have a phone at a similar price. Once that happens, I just don't see what niche this fills other than "it's not Apple or Google".
My thoughts exactly. I'm willing to give it a try because it's exactly how I imagined a companion device for intelligent note-taking. Given that it's a collaboration with TE, I trust the build quality. My only concerns rest with the lack of a subscription given the OS's reliance on the cloud, and, as I'm sure many others feel, security: a fledgling device acting as an agent of you is definitely an important consideration.
"Doomed to fail," probably. I certainly never expected to see a product so unabashedly quixotic this side of a zero-interest-rate era. But man, what a crazy swing for the fences this is.
By my reckoning, this approach is probably ten years ahead of its time. I simply cannot imagine that the tech is there right now to make this nearly as seamless as it will need to be to actually supplant the UI paradigms we have today, and it's going to take billions invested by the duopoly to get to that point.
But I really do think this is at least a fuzzy picture of where we're headed. Your iPhone in 2034 won't look like this, but you'll likely be able to trace some things back to it. There are a lot of pieces missing that we barely even know are missing yet, but it's incredibly exciting to see a startup try to jumpstart a step change like this.
Idk, I think an iPhone in ten years will look a lot like it does now just with better Siri. Better Siri will open up a lot of things and it’ll be wonderful but we’ll still want a screen (we’ll still interact with the world largely via reading) and an iPhone just looks like a screen.
Apple and Google presumably know about LLMs so I don’t really see this device being very influential.
Maybe. That'd be interesting I suppose. Those Meta smart glasses do look more usable than anything on the market previously. Maybe because they omit screens.
Perhaps at some point technology will allow glasses to have screens that are functional without making you look like, as they called them ten years ago, a Glasshole. Ten years may be enough time.
Mobile ads and app stores make Apple and Google too much money for them to truly revolutionize mobile phone user interfaces.
Anything that even threatens those business models will be strangled before it gets released.
IMHO Microsoft has the most to gain here. If they manage to turn Apple and Android into platforms that interface with MS made/ran/hosted LLMs, MS wins the day.
The problem is monetization: these LLM services are expensive, and telling users they need to pay another $30 a month to use their phone will be a hard sell. Hopefully LLM hosting costs go down, but if hosting LLMs becomes too affordable, MS's giant war chest of money and hardware becomes less of an advantage.
The other danger is Apple/Google throwing a bunch of restrictions in place, but all the anti-trust scrutiny they are under right now could give them pause on attempting that.
to me the pitch is compelling. i don't want to use any more fucking apps -- i want to talk to my computer, like i talk to chatgpt. there's real opportunity for a revolution here.
i don't think apple or google are well-positioned to build this revolution, because they are too conservative and too bought-in on the old interaction model.
(if jobs were still around, different story. alas.)
Google's primary revenue driver is search ads. I'm sure IAP fees and web advertising don't hurt either. If you don't do searches, browse the web or use apps, where does the revenue come from?
Even if Google builds something like this, gets market share and sells at a profit, it might still be a net loss for them because of all the advertising money and click tracking data they won't be getting.
Google definitely has the expertise to build this, but they also have an extremely risk-averse attitude resulting in layers upon layers of bureaucracy, and this product is literally cannibalizing their most important markets. I can't see this going over very well at all the internal reviews.
Apple was a lot leaner than Google when they built the iPhone (which also cannibalized one of their main products, the iPod), but they still had to set up a completely independent team with no oversight except for Steve Jobs to get it right.
We also know how bad Google is at hardware. They seem to have gotten marginally better in recent years, but Pixels are still far from successful, even in countries where they are sold, which honestly isn't that many in the first place.
I'm very bearish on Google in this fight. Apple probably stands a much better chance, they've already proven they can cannibalize their own products, they need to do far less of that in the first place, as most of their revenue comes from hardware sales and subscriptions (which is perfect for something like this), they do have a lot of hardware expertise and they've already made major steps towards AR glasses, which might end up as a better form factor for such a device than a phone. Their pro-privacy attitude might be an impediment, this stuff works a lot better if you run it on a beefy GPU in the cloud instead of a tiny, battery-conserving chip in your phone, but I hope they find a good enough compromise.
That seems unlikely given that both Apple and Google have been employing the world's top ML scientists for years, have unlimited budgets, have better access to customers than any startup (to find out what customers want), and ... need I go on? Yes, it's nice that "Rabbit" is exploring this area and being innovative, but unless their particular take on mobile phones catches the world unexpectedly by storm, nobody will mourn their passing when the money runs out later this year.
> That seems unlikely given that both Apple and Google have been employing the world's top ML scientists for years, have unlimited budgets, have better access to customers than any startup.
Both of those companies had it handed to them, like, literally got completely smoked by OpenAI, a company with a thousand employees +/- in San Francisco. The giants are incredibly vulnerable, just like the giants that Google and Apple disrupted such as IBM, Yahoo, AOL, etc.
It's a lot harder for startups to win over large companies in hardware, even with a superior product, since shipping hardware is tremendously capital-intensive. An example that comes to mind is Pebble, which had an excellent smartwatch that worked way better than the one from Fitbit, but nevertheless ran out of money and got sold to the latter.
> That seems unlikely given that both Apple and Google have been employing the world's top ML scientists for years, have unlimited budgets, have better access to customers than any startup (to find out what customers want), and ... need I go on? Yes, it's nice that "Rabbit" is exploring this area and being innovative, but unless their particular take on mobile phones catches the world unexpectedly by storm, nobody will mourn their passing when the money runs out later this year.
I worked with one of the (many) teams at Microsoft who worked on Cortana.
The way the team leader explained it to me is that Cortana could do a lot more, but internal corporate politics prevented it. Rather than implementing the best solutions to user's problems, they had to do things like ensure Bing search handled certain results, to make sure that team stayed happy.
Or to take it to the extreme, if someone at Google came up with a device that directly beamed 100% correct search results into your brain, Google would never release the product because of the loss of search ad revenue.
I think an acquisition does seem pretty likely actually. Google still has yet to make assistant do anything interesting, despite leading the way on the research side of things. Also keep in mind that they lost a ton of AI talent to startups.
1. OpenAI is building in a brand new space. Mobile phones are well established. LLMs as a product for consumers are brand new. While it’s true Rabbit are trying to merge LLMs with mobile, the elephant in the room is the mobile incumbency. The existing players just have to add LLMs to their existing dominant platforms and Rabbit is done for. OpenAI, on the other hand, was unopposed launching ChatGPT. They had a genuine technical edge and consumers were hungry for it. Is everyone hungry for Rabbit’s concept? Give me a break.
2. OpenAI raised an order of magnitude more capital. Rabbit’s $30M isn’t going to get them much farther than a prototype device. My impression is the founder here managed to convince some VCs to give him money during the boom times and leveraged the generative AI hype train more recently. But where is he getting his next round? The one that he will need to actually make phones at scale. That will cost billions ultimately, and the incumbents own the supply chain he needs to access. His effort is all but doomed.
3. OpenAI’s formula was easier for a startup to master. All they needed was money for the best AI engineers and scientists and money for GPUs, and they could create a blockbuster product. Rabbit needs the top engineers as well as extensive capital for manufacturing and distribution. There is a reason that hardware favors massive scale and a reason why hardware startups tend to focus on pinpoint innovations. The energy barrier is extreme.
These are three reasons why Rabbit is in an entirely different situation than OpenAI was circa 2022.
I watched the first 2:51 or so of the keynote. The guy made a series of assertions that he seemed to think were self-evident. I disagreed with every one, without exception. I gave up when he described the LLM as “artificial intelligence”. I don’t think I’ll be a customer.
I think this is Apple's plan for the apple watch. I see my gf leave the house without her beloved iphone, making calls, listening to messages etc with her airpods connected to her watch. She doesn't even touch it.
I might even have tried it myself if Siri didn't suck so much. My experiments in this regard were...discouraging. Perhaps she radically restricts her use to things Siri understands (unlikely in her case), or she just knows how to speak in a way aligned with Siri's expectations.
Though the other night I did wake to her firmly saying "flashlight. flashlight. flashlight." for a bit, followed by "shit" and a bit of fumbling at her wrist.
i do the same when i go to the gym or go on a bike ride. some apps are a bit of a pain to use on their own (trying to join a meeting from a calendar invite doesn't work because hrefs aren't tappable), but overall, i really like the experience and wish i could do it more often.
All I really saw is an awkward looking device (sorry TE) that probably wouldn't fit in my pocket, and would probably break if dropped. And whatever they paid for the keynote was a waste of money. If you can't explain how this works in about 30 seconds - I don't care.
there are tools that exist that teach agents what to do, but the ability to have them on command and get back a nice little summary in a neat UI is actually kind of compelling. It also feeds the company routine information to fine-tune agents, and maybe results in a dev ecosystem of some kind? need to learn more about how the system prevents some of the concerns heard here, like taking undesirable actions on your behalf
I wish they had put it next to something; I have difficulty understanding its size. People in this thread are talking about it like it’s tiny, but I got the impression from the pictures that it was Game Boy-sized.
I know the $1000 device in my pocket is capable of all the things they implemented ... do I just wait for a Rabbit clone app? Or will Apple/Google just enhance their native offerings to include this sort of thing? I absolutely agree with the premise. Our phones are just mini desktop computers accumulating unused programs that don't integrate with each other, storing random bits of data all over the place. The whole device seems to be a more polished version of the semantic web we were promised years ago, albeit brute-forced by AI because interop never evolved.
Apple would have to walk back some of their stance on privacy and/or their App Store rules. I hope they do, but I'm not holding my breath. Until then I'm cheering for people who are daring to think different.
I think what’s impressive here is their approach to “Teach Mode”, which essentially teaches the average consumer how machine learning works. I think that’s already a much better approach than Humane AI's and provides a better long-term path for where this could go. A Large Action Model that can control any of your apps by watching you use them is a unique implementation for a hardware device. The hardware isn’t super impressive, but the approach has staying power. It’s also way less invasive than the Humane AI Pin; it’s deliberate usage.
This was what I had hoped might be possible years ago working on a voice assistant for a browser (Firefox Voice). That was all prototypish and desktop-focused, but the real potential I saw was in remote control of a cloud-based browser. Voice assistants without a way to execute actions are pretty limited. Google/Siri/Alexa built up those actions through sheer effort, creating integrations for everything, carefully handling the conflicts, etc. A cloud-based browser has the potential to maintain context and perform actions across everything right away, without that huge investment and need for market share, nor buy-in from service providers.
I'm not sure how exactly they handle the communication between that browser and the phone, but at least my sense of this is that they are doing what I imagine is "standard" browser automation (which is much more advanced with LLMs) and then something to reinterpret the output for the phone. You can do a lot with the mobile view of the sites, but they seem to be going further. (For something like Spotify I wonder if they are doing something site-specific)
The use of a browser offers interesting possibilities for auditing what is happening. With something like GPT Plugins you can see the JSON, but it's not really equivalent to what you as a user understand.
Looking at them using Discord, it makes me think they must be doing automation through vision, detecting pixel locations, etc, as Discord is deliberately hard to automate.
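Vision-driven automation like that is doable with off-the-shelf pieces. A minimal sketch using pyautogui (the template image name is a placeholder; `confidence=` needs opencv-python installed):

```python
# pip install pyautogui opencv-python
import pyautogui

def click_button(template_png: str) -> bool:
    """Find a UI element on screen by image matching and click its center."""
    try:
        point = pyautogui.locateCenterOnScreen(template_png, confidence=0.8)
    except pyautogui.ImageNotFoundException:
        return False  # newer pyautogui raises when nothing matches
    if point is None:  # older versions return None instead
        return False
    pyautogui.click(point.x, point.y)
    return True

click_button("send_button.png")  # placeholder screenshot of the target button
```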
Accessibility APIs are another way to do it. You can't break them because you have a bunch of angry blind people at your doorstep, but accessibility and automation are very closely related. If you expose your UI to a screen reader, you also expose it to whatever other apps want access.
IMHO, based on the few screenshots they showed in the background when talking about their LAM, I think they are using something like the accessibility APIs to interact with the UI tree directly.
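On Android, the UI-tree route needs no pixels at all; a sketch that taps a labeled element over adb (the label and temp paths are placeholders):

```python
import re
import subprocess
import xml.etree.ElementTree as ET

def tap_by_text(label: str) -> None:
    """Dump the UI hierarchy over adb, find a node by text, tap its center."""
    subprocess.run(["adb", "shell", "uiautomator", "dump", "/sdcard/ui.xml"], check=True)
    subprocess.run(["adb", "pull", "/sdcard/ui.xml", "ui.xml"], check=True)
    for node in ET.parse("ui.xml").getroot().iter("node"):
        if node.get("text") == label:
            # bounds come back as "[x1,y1][x2,y2]"
            x1, y1, x2, y2 = map(int, re.findall(r"-?\d+", node.get("bounds")))
            subprocess.run(
                ["adb", "shell", "input", "tap",
                 str((x1 + x2) // 2), str((y1 + y2) // 2)],
                check=True,
            )
            return
    raise LookupError(f"no visible element labelled {label!r}")

tap_by_text("Send")  # placeholder label
```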
Voice chat with your computer is possibly the worst input method. UIs exist for a reason. Clicking a few buttons is much faster than a conversation. The future of computing doesn't look like a bot that's automating some website UI actions.
A legal aside: if an airline wants to make a sucky UI because it's trying to upsell, should it be legal for users to skip over it? Is a company's website any different from their physical storefront?
The hardware device itself is a bit of a red herring. It's really a server-side LLM that is trained to use software on your behalf, presumably against server-side Android/Chrome VMs that you authenticate to various apps/services with OAuth.
At first it seems a bit like a kludge, but I don't really see any other way to create the dream sci-fi robot assistant. It has to be able to do things as you, and it would be impossible to manually integrate with every website or service.
It doesn't cover that many months of expensive GPU compute and Chrome VMs, though. The "No Subscription" promise is very dubious unless this is a play for selling user data.
It's a neat idea but I'm not sure what the audience is. "It's simpler", but you still need to manage a bunch of integrations from a computer? It can't seem to make phone calls?
I just want something that I can give my 93 year old grandfather so he can order a ride and get reminders about his prescriptions. Rabbit seems like a device that misses an audience that actually needs a simpler smartphone.
I think the intended audience is really anyone and everyone. It’s a very grand vision.
I definitely see the appeal. I do not like owning and using a smartphone. A more functional device with a pared-back interface is exactly what I would want, and I think there _is_ a decent slice of people out there who agree. You see that in the dumb-phone and minimalist-phone market that’s popped up.
That said my experiences with LLMs are that they are woefully underbaked for this kind of thing, and it’s a really tough sell to me on a privacy basis as well.
But for older folks I’m not sure I agree that the market is missed. I suspect the idea would be that you would manage the web portion for your grandfather and then he can just chat to the device. Unclear if it can make calls though, that does seem like a miss!
My elderly parents were my idea too, and the fact that I can manage the integration from my computer is actually a plus here. It's not like you have to do it all the time.
It has to be able to make phone calls and text on WhatsApp/Signal too, though.
the screen looks like it may be too small for people who have vision problems - no idea if one can adjust the text size used. But the device does "say" the text, so if the user has decent hearing, they could rely on the audio versus reading the text.
Now if multiple people in the same area were using this device, it might get chaotic/confusing, but no more so than multiple people having loud voice conversations in one place.
Heh, Rabbit (another one) was a location-specific mobile phone company in the UK back in the early 1990's. You had to be near an antenna to make or receive a call. It's the first thing that came to mind when this post came up.
Smart phones are closer to smart cameras. I use the phone app maybe once a month. I'd argue it might be viable to build a smart phone that has no phone.
Yeah, it's not a TE product, just a TE design I think. Though to be honest the price is very suspect; TE is usually very expensive, but this is too cheap for what it claims to be.
The future is now. At least the next 5 minutes, but then it will keep changing, and what you build with an intended shelf life of at least a year will be badly obsolete and moldy in a month.
That is the problem with building end-user hardware tied to a technology that is changing very fast, where most of the innovation may happen out of your sight, carried by your main competitors.
That's because it is. Telling my phone to "edit images on Photoshop" (one of their examples) just doesn't seem feasible in the majority of interactions.
They sold 10k units in the first 24h, so there must be other fools like me who think they're onto something. Looks like a new category of devices being born.
The biggest takeaway from this is the LAM-over-email, which was hidden away in one of the demos. This has a lot more potential (esp. in a business context), in my opinion, than the phone itself.
Being able to directly email an autonomous agent or even CC one later into a conversation with instructions can be a game-changer. Not only would the replies be faster but the result could be cheaper compared to a human equivalent.
One use case that comes to mind, which was also demo'd, is making trip arrangements. In the demo, the spoken request to book a trip to London was long and precise; no way I would get that right the first time on any push-to-talk device. At my company, we use a travel agent for business trips whom we email; they book everything and send over receipts and tickets. I could just as well be emailing an LLM/LAM.
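A toy sketch of what CC-ing an agent into a thread could reduce to. imaplib/smtplib are in the Python standard library, while ask_llm(), the host names, and the agent mailbox are invented for illustration:

    import email
    import imaplib
    import smtplib
    from email.message import EmailMessage

    AGENT = "agent@example.com"  # hypothetical agent mailbox

    def ask_llm(prompt: str) -> str:
        """Stand-in for the model (or LAM) call."""
        raise NotImplementedError

    imap = imaplib.IMAP4_SSL("imap.example.com")
    imap.login(AGENT, "app-password")
    imap.select("INBOX")
    _, ids = imap.search(None, "UNSEEN")
    for msg_id in ids[0].split():
        _, data = imap.fetch(msg_id, "(RFC822)")
        msg = email.message_from_bytes(data[0][1])
        # Assumes a plain-text, non-multipart message for brevity.
        body = msg.get_payload(decode=True).decode(errors="replace")
        reply = EmailMessage()
        reply["From"] = AGENT
        reply["To"] = msg["From"]
        reply["Subject"] = "Re: " + (msg["Subject"] or "")
        reply.set_content(ask_llm("Act on this request and report back:\n" + body))
        with smtplib.SMTP_SSL("smtp.example.com") as smtp:
            smtp.login(AGENT, "app-password")
            smtp.send_message(reply)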
This is never going to work for me because I don’t want to talk to my computer. When I’m in public my voice is noise pollution at best and is telling everyone around me what I’m doing at worst.
Call me weird but I don’t want everyone around me essentially hearing every thought I’m looking up in real time. Feels creepy.
I'd totally buy a ChatGPT+ device for my 7yo and I imagine people around the house would use it quite a bit.
Not sure I want to replace any other actions yet. Probably someday, but touch + apps is actually just incredibly good. A sub-1yo can start learning to use a smartphone / ipad, and a 6yo can put together an uber eats order. I'm not saying it can't be improved on, and I fully want a competent voice assistant in my phone. But it's gonna be real hard to do better than what we already have in our pockets on the things they're already good at.
Where this presentation fell flat for me was when it was talking about doing things pretty much everyone already does just fine on their phone. But it was very exciting to imagine a ChatGPT+ device sitting around the house for anyone to pick up and interact with.
I don't know about others, but I don't want something else to carry in my pocket... especially something that duplicates what I can already do, just in a different way.
seems like it's connecting to external services but i'm not sure why siri couldn't do that.
The website and the first 5 minutes of the keynote fail to convey what the device does exactly, other than that it's meant to replace smartphones, which can't be right. This form factor would be inconvenient for watching videos, and that's one of the most common uses of smartphones. They should just market it as a pocket assistant and make it do one thing exceptionally well. It's a losing strategy to try to be everything for everyone instead of trying to dominate a small beachhead market. Smartphones with better assistants are coming, but there is no reason to compete with them. Just differentiate the damn product.
But the actual rabbit device in the keynote could have come later. The cloud backend including the LLM is independent of the rabbit device. They could've demo'ed an app on an iPhone to show the VCs - there's nothing rabbit-device-specific that an iPhone or Android phone couldn't do.
It's a serious question. I am stuck in a loop of taking clients' money to bootstrap my startup, but I need much more to pay a few other engineers to show companies like these that they produce garbage. Even Apple is running in the totally wrong direction.
Until now I was under the impression I'd need to hustle more to earn cash & build the prototype. But that won't be fast enough. I can't watch people who just don't get it but try to imitate Apple (website, keynote, lol) with just some bullshit.
The more I watch from the sidelines, the more I am reassured that we are building something incredible that nobody sees coming. But we are located in neither the EU nor the US. So I am not sure what I should do.
The device itself is really cute. I'm not sure about handing oauth tokens to all my accounts to a third party for them to run huginn/selenium on a backend that might not be online for more than a year. I'm barely comfortable with Alexa having a connection to my iTunes for podcasts. What happens when Uber or whoever decides to throw a captcha between Rabbit and the web frontend?
I'd like to see it do more than help me buy things. Teach mode seems to be the main way to do productivity tasks?
These demos are always the same: “<Product> find me the closest pizza”
Who actually interacts with a device like that? Is the problem important enough to pay $199 for, when I can do the same thing (often even better) with my phone?
No. I could go deep into the "rare language, weird slang, local dialect" hole, but a simple "No" should be enough. Their website is exceedingly user-unfriendly, too.
It's a client device to an LLM that can use software on your behalf, presumably running on Android and Chrome/Linux VMs that they host. You log in to their web portal, authenticate various apps as you would on your own phone, and then the LLM can do everything you can do. You can also train it to do specific tasks.
When you press the button on the side, you speak as you would type to ChatGPT, but now the LLM can do arbitrary things as you.
I can't see this product outcompeting Apple, Google, Samsung, etc. At best, if it gets enough interest, it will be copied and improved, like what happened with Pebble and smartwatches.
Since Teenage Engineering has been mentioned as a design partner, I'd look at the specs for the Playdate and give it a spec bump.
Based on the rabbit keynote showing a colour screen: take a Playdate device, turn it 90 degrees counterclockwise, and reduce the height (now width), since you don't need a controller, just a roller wheel, pushbutton, and rotating camera.
A quick web search says a Playdate device has a 168MHz ARM Cortex-M7 and a 400x240 B&W LCD screen.
Either some sort of Linux with custom GUI, OR bone-stock AOSP + custom app. If I were in some sort of CTO role I'd push AOSP route, as an individual dreaming idealist I'd be inclined towards the former.
I am definitely not the target audience for this. I am that person that keeps typing while people send voice notes, and now the only way to interact with this device is via voice? :(
Haven't seen anyone mention just how bad the website design is here. The homepage ran at single-digit FPS on my (pretty powerful) laptop, and I noticed performance hits where there really shouldn't have been.
Besides that, there are too many animations where a few sentences or pictures would do. Elsewhere, thick walls of text, sometimes barely distinguishable in color from the background.
The site seems to be an omen of the product itself: cool in concept, but touch-and-go in implementation.
On my phone, I get a big "Share" button right in front of the keynotes video player. It also prevents me from skipping around the video, no way I'm watching all of that.
I tried to go into landscape to fix it, and I get a black screen asking me to go back to portait mode.
I don't know how you can get such basic things so wrong.
I think you mean portrait mode. But yeah that immediately made me not care about whatever this is. If someone is too lazy to make their site responsive, that's fine. But when they go out of their way to make the site unusable, that's another thing entirely.
The website: looks like it's trying to emulate Apple.
The keynote: the speaker's hand movements and gestures are funny, emulating Apple again.
The demo of the device: faked.
Funny: when it switches cameras back to the guy holding the device and he's just nodding, it looks totally fake...
All in all... fake and trying to copy Apple. Can't people be original anymore?
The AI first OS is a step in the right direction. I just don't think that a new device in this format is necessary yet. Regardless - I'm missing visual feedback in the current version. It's too open-ended so a list of suggestions on what to do would be helpful for example. Overall pretty cool MVP tho. Excited to see what's next!
Without a complete accounting of this company's ties to the CCP, there's literally ZERO chance I'd ever use their man-in-the-middle attack vector of a device. Giving all my most sensitive login credentials to a Chinese company that's trying very hard not to be seen as a Chinese company seems like a totally sane and safe thing to do...
- Not sure if people would want to "talk to" a device in public.
- Anything and more that this device do can be done by a smart phone that people already have.
- I feel like people prefer using a UI instead of speech. An example of this behavior might be using delivery apps instead of calling a restaurant.
This keynote really makes it seem like a scam. For example, the "plan travel" section hardly shows anything. He just says "It's all been planned out, I just confirm, confirm, confirm..." while the camera cuts to him tapping the device without showing anything on the screen.
I'd buy that for my elderly parents who are overwhelmed with apps.
Probably not the first model but the next one for sure if it can text and phone too.
I think there is a market for this and even though I despise talking to a device, I can see the use and I can see how it could bridge a world which is getting too complicated.
I genuinely think this tech is really cool for old or disabled people who can all of a sudden interface with their devices with just natural language, that said I legitimately just do not see the appeal for regular users but cool tech nonetheless.
Has any startup succeeded by starting out offering shiny hardware running innovative new software, all of which they have to develop?
It seems like a fatal dilution of focus to have to worry about the design and logistics of a fancy dumb terminal widget when you also have to get the software/AI/app integration stuff right.
Just make an app with text and voice interaction. Accept that the thing in our pockets with a screen and an internet connection is going to be a smartphone. You will not build an own-hardware moat with these weird little bits of e-waste.
I don't think there has been a single successful hardware startup in the last decade, so the answer to your question is safely "no" without even going into specifics.
Which is sad, because I'm sure there's room for a lot more innovative devices in the world outside of a single glass rectangle in your pocket that everyone must plug into in some way. The economics of the industry just makes it very hard for them to survive, and we all lose out because of it.
There are tons of successful hardware startups in recent history.
You just don't hear about them because they're not selling to you. They make business, commercial, and industrial hardware.
Consumer hardware is very hard because consumers are extremely demanding of hardware. Just look at how difficult it is to convince people to spend even $5-10 on useful software or sign up for a $100/year SaaS product with near zero marginal cost per customer. Consumers are really hard to please and consumer price points are difficult to serve.
In just a few years around 2007-2012 we got Oculus, Nest, Ring, Blink, Fitbit, Beats, Oura, Square, Pebble, Tile, Dropcam, SmartThings, Makerbot, Neato, Raspberry Pi... All pure consumer hardware startups with popular products and successful exits. So it's not like the category is somehow fundamentally not viable. It just needs VCs and consumers to both shift from the smartphone-only mindset and start taking some risks.
I would argue it is. I've got a Quest 3 sitting next to me and I think it's great.
I can't speak to the deranged expectations, hype cycles, and backlashes of the last 5 years. And I think Meta's R&D budget is pretty hard to justify. But I don't think that reflects on their current product range (which could have been matched with a much smaller budget, and to some degree has been).
Hardware companies have (1) greater costs to get off the ground, (2) longer periods of development, (3) higher incremental cost per sale (so harder to scale), (4) slower iteration speed, and (5) overall a lot more risk than pure software. A VC fund is going to see a startup with a groundbreaking, innovative hardware device next to one building a cookie-cutter SaaS app and still invest in the latter, because it just makes more business sense for them. No one outside of Apple/Google/Microsoft and the like is pouring 10 years and billions of dollars into releasing a new device.
While I do agree with those points and read them mostly as resistance to change, I would add that bootstrapping your community is always a good idea for a successful business, so VCs are not essential. On the other hand, I feel that control over personal data, and how willing people are to hand over so many daily tasks, are hard barriers over the long run.
Not having a subscription is shocking. I really like the device and LAM sounds promising. I just don't see how they are going to make money offering unlimited LLM + LAM capabilities without a subscription service.
the keynote failed me at "Shake the device to create a keyboard on the screen".
if the whole damn point of suffering with your goofy wildly colored mostly-incompatible mobile device is a talkable LLM, then "shake to keyboard" is a failure.
at $200 i'd rather just buy any of the plethora of SBC mini devices with cell phone antennas and play with LLMs myself, without the black box that I can't touch. a uconsole or similar is, generally speaking, cheaper AND open-ish -- it's a fool's errand to trust a non-paid subscription service to continue existing.
Sidenote: I find it funny that somebody that is trying to create a product that competes with Apple and Google ends up showing both of their logos in many places of their most important marketing video.
It's a bit hard to imagine this doesn't get eaten by Siri and the like. The main issue with bots like Siri is they are very stupid. But w/ ChatGPT, the tech is now available to fix that.
Yes, of course a hardware startup has a better product than OpenAI. Definitely not an entirely faked demo… my prediction is that this will not happen for at least 5 years.
> No one, including us, will be able to use personally identifiable information (your name, phone number, email address, etc.) for any purpose other than serving you.
Is there a specific paper or something you can point me to? Or are you talking about like llama.cpp? Because I thought that referred to the fact that it was originally one c++ file named llama.cpp?
I hope they succeed. Would buy this if it worked offline. Wish Pebble hadn't gone broke and had launched their Pebble Core device. Always thought it was a great idea.
I was going to spend some time criticizing/ridiculing this ("analog scroll wheel" in the feature list made me lol) but then I saw the price. $200? Maybe.
watching the keynote, I was about to click Buy until I saw all the steps that required a laptop...
i feel a revolution shouldn't require so much tethering to other services through the hardware you are looking to replace?
if they keep iterating towards a truly standalone product, I think they'll hit all the notes we've been waiting for from a post-cellphone portable device
Yes, we currently sell to customers in the United States, Canada, United Kingdom, certain countries in the European Union (Denmark, France, Germany, Ireland, Italy, Netherlands, Spain, Sweden), Korea, and Japan. We will ship r1 devices only to mailing addresses in those countries/jurisdictions.
When will my r1 be shipped?
We expect to begin shipping r1 to US addresses for Pre-Sale Purchases in March-April 2024. Shipping to non-US addresses for international orders and fulfillment of Pre-Orders are expected to begin later in 2024, but we do not have a more definite estimate at this time. Shipping estimates are subject to change based on our manufacturing capacity and component availability. An exact shipping date is not available beyond the expected shipping window listed above. Please be assured that we will fulfill orders as quickly as possible in the order they were received. We will update this FAQ page with regard to shipping within the United States and shipping for international orders with the latest information on shipping window estimates as soon as we have that information.
Small enough that I would carry it in my shirt pocket or back pocket for anything. Looks very slick and at a very good price. Will buy three for my siblings this Christmas.
In case I'm not making sense, imagine if OpenAI made an iOS/Android app with subscription, what chance does a new piece of hardware have to compete against that low-friction install base?
Only after getting the users should smartphones be replaced with a bespoke device. The only reason to do the device first is if the service is effectively commoditized, so they need to differentiate somehow.
This may just be the shittiest product website I've ever seen. It conveys absolutely nothing of value about the product. What problem does it solve?
It's just "ooh, pretty pictures, buttons and camera." Seriously? I'm supposed to trust a product from a company that can't even tell me what it's about without making me sit through a 25-minute keynote? Are they out of their minds?
How can you so badly fail at even saying what the hell this is on your shitty glossy web site? What is this? The page is just a list of hardware 'features' with large but unclear pictures.
"push to talk button" ok, so?
"far-field mike" ok, so?
"360 degree rotational eye" so what?
"analog scroll wheel" that does what?
"usb-c plus sim card slot" which gives me what?
There's a video but I shouldn't have to watch a video.
well, it tells me enough: the only input method is the mic, so you cannot silently enter any input into this device. I don't need everyone around me to hear my Kagi searches or text messages dictated to my device. So, it's not for me.
Since it's all cloud-based and has a far-field mic, I would also ask people around me not to use this device, if they had one, and turn it off if they are around me.
As far as I can tell, this product is just a bugging device.
This is a bit harsh. I feel that you can figure out what the device is before even scrolling. It’s a companion device that you can interact with to get assistance with various things, and the features you mention are how it gets instructions and information.
What is "assistance with various things?" That's incredibly vague and unhelpful. What is "a companion device?" Again, vague and unhelpful. You haven't explained anything, and the web site fails to explain anything, and I'm being too harsh?
There is zero information on the web site except an overly long video badly aping Apple's product announcement style. Including as-seen-on-TV "doing things is so hard" crap.
Hell, the FAQ is entirely about purchasing, not about the "product." It's therefore also meaningless. Rather than harsh I'm being generous. This web site is useless.
Yea I didn't want to watch the video and had no idea what it was, my best guess was it was a quirky non-smart phone, never occurred to me it might be a "companion device" whatever that is exactly.
As usual with Teenage Engineering, I love the hardware design, but this aspect of the software is a letdown:
> rabbit OS operates apps on our secured cloud, so you don’t have to. Log into the apps you’d like rabbit to use on your system through the rabbit hole to relay control. You only need to do this once per app.
So things don't run on the device, but on the cloud. Unfortunate.
The device is $200 and appears to be smaller than a phone so yeah, it does not run AI locally. I assume Siri doesn't run locally on Apple Watch or AirPods either.
Yeah it sets timers, that's about it. Not saying this product is good but the intention of what they are trying to enable here is a world apart from what Apple is running on the watch.
The keynote is near science fiction, but the actual implementation is leaving me badly confused.
- Did it really scan the whole table to add a new column? As in, OCR? What would happen if I had more than one screenful of data, and who in their right mind would trust the OCR'ed output?
- How can it access the Discord account for the "teach mode"? Is it stealing your (sensitive and expiring!) cookies to reproduce the actions on their cloud? Or worse yet, running instructions from the AI on your local machine?
- It shows structured results from Spotify and Uber on the r1's screen. Does it have custom integration with all those apps? What happens when the API changes (Twitter), or they don't support an app I need (local taxi), or the app is averse to third-party integration (WhatsApp), or I need an "advanced" feature (payment methods on Uber)?
- Why in the VC hell are they basing their revenue on a one-time sale of a clearly unnecessary electronic gadget (my phone already has all that!), when their biggest variable costs will be recurring cloud expenses?
> How can it access the Discord account for the "teach mode"? Is it stealing your (sensitive and expiring!) cookies to reproduce the actions on their cloud? Or worse yet, running instructions from the AI on your local machine?
That's what I caught too. It's mentioned somewhere on their site that you'll have to provide login credentials through their Rabbit Hole dashboard for all the services you want to use.
It breaks the services' ToS but I guess most people don't care much about giving their passwords. There's no other practical way to integrate with multiple services.
Given the collab with Teenage Engineering, and the price, I think this is pretty much in line with the kind of stuff they do.
Will it change the world? Unlikely. But there is definitely a niche that will love this, even if it's just for the form factor and design rather than anything practical. I'd also expect it to fit into TE's range of musical equipment in some form.
They largely use semantic metadata (like alt tags on images). This is part of why the "google docs is moving to canvas rendering"[1] caused a big stir there (accessibility would need to be implemented from scratch, possibly by using a hidden parallel DOM)
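Concretely, the metadata that screen readers (and, by extension, automation built on the same hooks) consume looks like this. BeautifulSoup is a real library; the HTML is made up for the example:

    from bs4 import BeautifulSoup

    html = ('<img src="cat.png" alt="a cat asleep on a keyboard">'
            '<img src="spacer.gif" alt="">')
    for img in BeautifulSoup(html, "html.parser").find_all("img"):
        # An empty alt="" deliberately tells assistive tech the image is decorative.
        print(img.get("alt") or "(decorative, skipped)")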
Most of these questions become irrelevant when you recognise that most of these kind of companies were most likely created to be purchased by Apple or Google.
I'm not sure where the founder is from, but it seems he has Asian connections. And over there, super apps (WeChat) that can do everything for you are quite common. It would also make sense to centralise everything, if it weren't for the 800 lb surveillance-capitalism gorilla in our metaphorical internet room.
It seems much of the screen content in this keynote was pre-rendered (or in other words fake), so many of the answers might very well be "it can't actually do these things".
> Why in the VC hell are they basing their revenue on a one-time sale of a clearly unnecessary electronic gadget (my phone already has all that!), when their biggest variable costs will be recurring cloud expenses?
It looks like they'll also be allowing people who create their own automations to sell them in their marketplace. Presumably they'll take a cut of the sales here.
> who in their right mind would trust the OCR'ed output?
> Is it stealing your (sensitive and expiring!) cookies
> running instructions from the AI on your local machine
Based on my interactions with LLM enthusiasts, there are plenty of people who would happily bet their career, give up all their credentials, and grant full access to all their devices to an LLM if they are promised something cool. See the stories about lawyers giving LLM-generated materials to the judge, or journalists submitting LLM-generated articles.
Yes, this occasionally ruins someone's day. No, this would not stop a true enthusiast from trusting the model again.
One of the "lawyer pwned by LLM" cases appeared to be confusion about the difference between Google search, and Google Bard, thinking that Bard was just a next-generation Google search so asking it for relevant legal citations would find actual citations. The lawyer wasn't an LLM enthusiast, he didn't even know that an LLM was involved.
It's a striking condemnation of the current state of things that my $1K+ mobile phone does not do anything close to what's in this demo. I would pay $200 just to have a voice assistant that isn't totally incapable of playing songs requested like "Play SONG_TITLE from ALBUM_TITLE".
As an example, yesterday my daughter asked for a song from Snow White.
EDITED FOR CORRECTNESS (originally I said I asked for Heigh Ho from Cinderella)
In the car I say, "Hey Siri, play Heigh Ho from the Snow White Soundtrack". Siri: "Sure, here's Snow (Hey Oh) by Red Hot Chili Peppers"
(try it yourself!)
Why are Alexa and Siri still so useless, inaccurate and inconsistent? Why can't I yet ask Siri to "book me a ride via Uber from location X to location Y" or "reorder the same thing I got last Tuesday from Uber Eats"? I assume it's down to compute costs, but I would absolutely pay an additional subscription fee for more intelligence behind these voice assistants.
> Prompt: Give me the lyrics for Heigh Ho from the Cinderella Soundtrack
> ChatGPT: "Heigh-Ho" is actually a song from the soundtrack of Disney's "Snow White and the Seven Dwarfs," not "Cinderella." The song is famously sung by the seven dwarfs as they head to and from their work at a mine. Here are the lyrics:
Sorry, I did actually ask for Snow White in the car! And it played a song called "Snow". Haha, reproducing it here at my desk I mistakenly said "Cinderella"
There are worse songs to accidentally get! That said, I sympathize. Given how easy it is to get, say, the passenger to put on a song, it's absurd how many hoops you have to jump through with any voice assistant to get it right.
For example, I often wish I could just queue up a song to play next like I can over touch, but there's no elegant way for me to do that without picking up my phone in the car. If my wife were riding with me, all I'd have to say is "Hey play [XYZ] next", a feat yet unmatched by any voice assistant
I kind of agree with this. I'm not sure why the product is getting so much heat here on HN. I do think it's probably overkill and won't be super sticky, but it's at least iterating in the right direction.
Yeah the "OS" appears to be their backend that boils down to your login credentials + something like Selenium WebDriver + ChatGPT. The R1 device is just a thin client.
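i.e., plausibly something like this under the hood. The Selenium calls are real; the site, field names, and credentials are invented, and this is a guess at the architecture, not their actual code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://service.example.com/login")  # hypothetical service
    # Replay the credentials the user handed over via the web portal.
    driver.find_element(By.NAME, "username").send_keys("user@example.com")
    driver.find_element(By.NAME, "password").send_keys("correct horse battery staple")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    # From here the backend would snapshot driver.page_source, ask the LLM
    # what to do next, and relay a stripped-down summary to the thin client.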
Kind of a hilarious take if you watch the keynote where significant time and a moving backdrop are devoted to highlighting the miasma of navigating across 100 apps on a smartphone. Decrying "just install another app" is a big part of the raison d'etre for the whole product!
Giving your credentials to an AI to buy stuff online by scraping a web interface is a perfectly sane and safe idea that won't have any unexpected consequences whatsoever.
1. at 13:13 there's a demo ordering a ride 'to home', then the user requests a car change to fit six people, and what's shown as a seamless switch from UberX to UberXL also updates the destination from the home address to LAX airport.
2. at 14:05 the device confirms and recites a pizza order, but the screen displays "chesse" with a typo while the voice reads out "cheese", so either the audio or the visuals is faked. My guess is that all of the on-device graphics were hand-written and hand-animated which would explain both mistakes.
I stopped watching at that point. Am sort of sad to see Teenage Engineering associated with a product that seems so sloppy and/or shady.
Right after you stopped watching, he books an entire vacation - flights, hotel, car rental, activities - and says he is given multiple options to choose from but shows none of the options, just an itinerary that he quickly scrolls through.
The idea that anyone is going to book a vacation this way is ridiculous. Vacations take extensive thought. There’s no way a voice interface would work for this even if it was the Star Trek computer.
No one ever shows what these should and could actually be used for - incredibly boring and repetitive tasks. "Hey rabbit, respond to the support ticket you just told me about with XYZ". "Hey rabbit, find all of the sales meetings I had last week, cross reference the emails of the attendees with existing emails threads, and then generate follow ups for each. Keep them short, to the point, and read them back to me before you send"
Like where is this? This is what I want. Not some voice assisted way of booking an Uber or some shit.
It's a pitch to get VC funding and this is exactly how that audience treats vacations. The idea of saving or planning to do something is just unimaginable to the super rich this is aimed at getting investment money from
At 15:55 the two screens de-sync. He's telling it that the schedule is too intense and on one screen he lifts his finger off before he ends and the other is still on the button. It was definitely shot as a single take. But hey, I guess if Google can do it you can pitch LLM WeChat. At $200, I'm tempted to buy one, they could be as famous as Theranos.
1) Price seems too low for unlimited LLM usage and there's no monthly fee... so maybe you are the product?
2) Website has no information on the people behind this.
3) In the keynote, the demo of logging in to other services like Spotify appears to just be stealing the auth tokens from the laptop and shipping them to the R1 device. Not a good sign.
4) The founder's voice-over insists that "we value privacy" and "we do not hack" and "we do not create fake users." Protests too much?
5) They're not really addressing the trust-building necessary to convince people to use this like a personal assistant. It's waaaay too opaque.
Neat product, though. Maybe they're just launching super early, nothing is figured out yet, the website isn't done, etc.
> 3) In the keynote, the demo of logging in to other services like Spotify appears to just be stealing the auth tokens from the laptop and shipping them to the R1 device. Not a good sign.
I don't think the tokens are getting transferred to the device. It seems everything happens on the cloud, so the tokens must have been stored on the cloud too.
I think the tokens are stored on device. I think what is happening is that this is effectively an Android device (or web browser) that you are somewhat normally logging into apps on. Then the LAM (which is running on device) is interacting with those authenticated apps behind the scenes, and only showing you the slick black screen with the rabbit and results on it.
THIS - pricing and security/privacy. Especially #1, 2, 3. Saying this despite having ordered one. There is definitely a lot of scope to this new way of interacting. Really hoping these get addressed soon.
> 1) Price seems too low for unlimited LLM usage and there's no monthly fee... so maybe you are the product?
It only needs to last a few months until all the devices are palmed off to kids. There is a 0% chance this makes it to year 2.
> 2) Website has no information on the people behind this.
All the better to run with the money. Given there won't be a second device, they should have charged more. I suspect they charged less because it simply performs _that_ poorly.
> 3) In the keynote, the demo of logging in to other services like Spotify appears to just be stealing the auth tokens from the laptop and shipping them to the R1 device. Not a good sign.
It's worse, it's not processed locally at all. That means that it's all done in some random cloud.
> 4) The founder's voice-over insists that "we value privacy" and "we do not hack" and "we do not create fake users." Protests too much?
I've never murdered anybody and there's no point looking in the forest!
> Record your actions, explain them with your voice, and play them to rabbit OS. LAM will learn the nuances and create a rabbit that can be applied to various scenarios.
> What if you create a rabbit that could be useful to others? You can monetize and distribute it on our upcoming rabbit store.
This is neat right up until "monetize", where everything flips from being a potentially cool community group of reproducible actions, to a pile of spam and bot-generated bullshit
The "keynote" (is that what we're calling advertising videos now?) was pretty awful, and I'm unconvinced that making people unable to even use smartphones (which has already supplanted computer literacy for most) is progress.
Extra negative taste points for wanting a "cool SUV" for a trip to London.
I'm still not talking to computers. It's a border I just can't cross. Like, I'd literally rather stop and type on my car display than try to voice in a destination. And if I'm cooking and my hands are greasy, I'd rather wash my hands, pick up my phone, and tap in a timer for 5 minutes than try some "set timer for 5 minutes" voice command. Not sure I ever will start talking to computers, but not this year at least. Anyone else feel the same way? It's partly because in public it feels extremely dumb. But mostly I think it's the frustration. At least fumbling on touchscreen characters is a known frustration. Trying to voice in a Scandinavian street name is worse.
I set timers while cooking with speech all the time and I don’t see why that seems weird to you. I just think of it like some natural language parser helping me do something instead of talking to myself. Is it talking to yourself you find strange?
Funny you say this, using Siri to set timers when cooking was my "gateway drug" to using voice features on my phone more broadly.
For the longest time it was literally the only thing I used Siri for (cooking timers), but now I use Siri for setting reminders and other similar transactional stuff on my phone, and I use voice dictation for texts all the time in the car, and even sometimes when I'm walking in the cold and just don't want to take my gloves off.
And it works great. And it doesn't feel weird at all.
I'd buy a device just to do the timer setting, weather checking, and light control consistently and without selling my data. I have found it enormously useful and get frustrated/miss it when I don't have it. I also love the music playing feature but don't see how I can get that without being tracked/updates being pushed that require maintenance.
I have found that relying on these devices to have consistent usability when the processing is all done in the cloud and out of my control was a foolish mistake. I do wish I could revert my Google Home to ~two years ago.
I would never do it in public, but I have gotten used to voice commands for a few things.
For example, adding things to our grocery list. It is quite handy to just say “add butter to the grocery list” when I am looking in the fridge. I am able to not stop cooking, but add things we need to the list as I find we are low.
I respect your feeling/position on this, but I don't understand it...
What feels "dumb" about it?
Is the perception of others towards you when you do this? Because that ship has sailed, people Vlog on the sidewalk and do TikTok videos in the subway, and Facetime their mom or brother at the grocery store or in the mall and have long video conversations with each other out loud..
I'm not saying this to be dismissive. As a GenX'er it took me a while to normalize this behaviour even to myself, but I've accepted now that these kinds of interactions with devices are just everyday occurrences and literally no one cares.
They can be obnoxious for sure, but just using TTS/Siri on your phone is like the nicest version of this interaction I can imagine. ;-)
> people Vlog on the sidewalk and do TikTok videos in the subway
Not without me giving them the "Have you escaped from an institution" look they don't. And you judge yourself as you judge others. If I were to say "Add eggs to shopping list" on the bus, I'd instantly feel the stares of a dozen people as I look on my phone in shame (That's whether or not I'm alone on the bus). And my shame is justified too.
Yes, if it's 100% reliable and I'm alone, then I'd have a low threshold. But it feels weird having a UX that works when I'm alone and isn't accessible when I'm not. On your phone you can just enter "eggs" into your shopping list. But if devices move away from the general-purpose input UX, then you'd have to talk to it on the bus. And when it's only 90% reliable, you'd have people shouting shopping lists on buses...
He is the founder of RavenTech. Baidu acquired RavenTech and he became the leader of Baidu's smart hardware division, and then launched the Raven H, which you may have heard of.
That’s actually really helpful context, and it absolutely moved the needle from “no-track-record vaporware company salesman” to “maybe he actually has the experience to pull this off”.
The hardware was designed by Teenage Engineering but the company producing the device (and presumably operating the cloud services it depends on) is separate.
The hardware looks cool as can be; I have total faith in Teenage Engineering, and no faith in this other random startup's cloud AI (especially since we're in an AI hype bubble and their funding is going to run out at some point)
Completely agree with you along with a single caveat.
I do think some of the UX of these devices offers something your mobile phone doesn't, but in the name of testing the market, it would make more sense for them to build devices that are accessories to your phone, see which use case gets traction, and then decide whether a self-contained device is necessary.
What I find surprising about these devices is that not only am I paying a lot for the device, presumably because they over-built the software so they could have a stand-alone experience, but I also have to have a 2nd SIM for it?
Is everyone paying $50/month for each mobile connection?
The fact that the Vision Pro is passthrough VR instead of AR on glass (as in: when the battery is dead you see black, not the room without the AR) says that it's far away.
This is pretty much what everyone saw coming when LLMs became a reality: that is, eliminating the usual human-machine interface, because now the machine can understand what we say and instructing it through keypresses and clicks becomes unnecessary.
In a few years everyone will carry a device like this one, and "call Jim, the one I met the other day, not my brother" or "start writing an email to all my associates" etc. will become a reality, and visual interfaces will be relegated to contexts where we actually want visual interaction, such as photos, movies, games, etc.
All cool, but unfortunately there's a catch: a device that can run AI locally and connect to the Internet without depending on a proprietary cloud, as of today can't fit in a pocket and can't cost $199, therefore it will depend on services that will have a cost in money and privacy.
Also, since this looks like nothing more than an audio/video interface to a remote cloud AI, everything it does can be replicated by a smartphone app.
I mostly agree but think you underestimate the “contexts where we actually want visual interaction” - a screen is not just for consuming visual media. It might be that all input is spoken, but output will always need a screen, because spoken word is just incredibly low bandwidth compared to a good text layout. Imagine an order confirmation page for a complex food delivery order, where you’ve made a couple of item modifications and you’re using a promotional deal for a discount on the total price. You can take in all that info in 2 seconds if it’s nicely laid out on a screen. Compare that to having to listen carefully to a voice reciting it all for you. And same goes for things like just browsing other songs by the same artist you’re listening to. Text on a screen, with good typographical layout for fast mental processing, will still be the best output format for many use cases.
> In a few years everyone will carry a device like this one, and ...
Yes, but that device is a smartphone. I can't see a future in which standalone devices that are simply ChatGPT API shells are actually viable, unless they add some significant capabilities that a phone app or built-in phone assistant doesn't have.
A smartphone is still too difficult to use and confusing for many people. There is a large market of users that would benefit from AI tools if it can be packaged in a user-friendly way. This is the first device I've seen that seems to succeed at this, if their keynote can be trusted.
Not to mention the price. This is a fifth of the price of most flagship smartphones.