This is awesome. How much effort does it take to go from this to a generalist robot: “Go to the kitchen and get me a beer. If there isn’t any I’ll take a seltzer”.
It seems like the pieces are there: the ability to “reason” that the kitchen is a room in the house, that to get to another room the agent has to go through a door, that to get through a door it has to turn and pull the handle, etc. Is the limiting factor robotic control?
Notice where the funding is coming from on this though. Seems like the initial use case is more killer robots than robot butlers: situational awareness and target identification, under the guise of "common sense for robots."
If a killer robot doesn't have a practical military application, it could be used as a chef in the kitchen, fetching vegetables and meats and cutting them up to serve, though it would likely be used in commercial kitchens before it saw service in every home kitchen. Also, it would be good to hire a kitchen robot chef after its term of service is up, to reintegrate it back into society and boost the local economy. Strange that Infantry is a different MOS than Culinary Specialist.
Oh, actually if you ask ChatGPT to pretend to be a Military Killbot AI, it gets censored while planning to take out the enemy. But if you ask it to pretend to be Mr. Gutsy...
I think the limiting factor is the interface between ML models and robotics. We can't really train ML models end to end, since to train the interaction the model needs to interact, which limits the amount of data the model gets trained on. And simulations are not good enough for robust handling of the world. But I think we are getting closer.
TBH we're reaching a point where it's no longer about training a single model end-to-end. We now have computer vision models that can solve well-scoped vision tasks, robots that can carry out higher-level commands (going into rooms, opening doors, interacting with devices, etc.), and LLMs that can take a very high-level prompt and decompose it into the "code" that needs to run.
This all thus becomes an orchestration problem. It's just gluing together APIs, admittedly at a higher level. And then you need to think about compute and latency (power consumption for these ML models is significant).
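Roughly, that glue layer might look like this. A minimal sketch; every name here is a hypothetical stand-in for the actual planner, vision model, and robot SDK:

    # An LLM produces a plan; each step is dispatched to whichever
    # model or API handles it. Everything here is a stub for illustration.

    def plan_with_llm(goal: str) -> list[str]:
        # In practice this is an LLM call; hard-coded here so it runs.
        return ["navigate:kitchen", "detect:beer", "grasp:beer", "navigate:living_room"]

    def dispatch(step: str) -> None:
        action, _, target = step.partition(":")
        if action == "navigate":
            print(f"robot.go_to({target!r})")    # nav stack / robot SDK
        elif action == "detect":
            print(f"vision.find({target!r})")    # well-scoped vision model
        elif action == "grasp":
            print(f"robot.grasp({target!r})")    # manipulation controller

    for step in plan_with_llm("get me a beer"):
        dispatch(step)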
I suspect if an LLM were used to control a robot, it would do so through a high-level API it's given access to: things like stepForward(distance) or graspObject(matchId)
The API's implementation may use AI tech too, but that fact would be abstracted.
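Something like this, as a sketch (the method names just mirror the examples above; a real robot SDK would look different):

    class RobotAPI:
        """The surface the LLM is allowed to call. How each method is
        implemented (classical control, learned policy, ...) stays hidden."""

        def stepForward(self, distance: float) -> bool:
            # Could be an inverse-kinematics walk controller or a learned gait.
            raise NotImplementedError

        def graspObject(self, matchId: str) -> bool:
            # Could be a learned grasp policy behind the scenes.
            raise NotImplementedError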
End-to-end training on robots is often done via simulations. Physics simulations at the scale of robots we think of are quite accurate and can be played forward orders of magnitude faster than moving a physical robot in space.
I'd expect to find some end to end reinforcement learning papers and projects that use a combination of simulated experience with physical experience.
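The shape of that is usually something like the toy sketch below (no real RL library, everything stubbed so it runs):

    import random

    def rollout(env_step, n):
        # env_step() -> (observation, action, reward) tuple
        return [env_step() for _ in range(n)]

    def sim_step():  return ("sim_obs", "action", 0.0)    # cheap, fast, approximate
    def real_step(): return ("real_obs", "action", 0.0)   # expensive, slow, accurate

    replay_buffer = []
    for epoch in range(10):
        replay_buffer += rollout(sim_step, 1000)   # bulk of the data from simulation
        replay_buffer += rollout(real_step, 10)    # a trickle of real-world experience
        batch = random.sample(replay_buffer, 64)
        # update_policy(batch)  # gradient step on the mixed batch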
Yes, the problem is when trying to take the system out of the sim. Usually it doesn't survive contact with reality.
At least if we're talking simulators like Gazebo or Webots, they all use game-tier physics engines (e.g. Bullet/PhysX), which are barely passable for that purpose. If you want to simulate at a higher rate you'll need to either sacrifice accuracy or throw an absurd amount of resources at it. Likely both for sufficient speed.
But yes, overall I agree with your last point: it'll get the models into the ballpark, but they'll need lots and lots of extra tuning on real-life data to work at all. Unfortunately that data changes if you change the robot or its dynamics, so you're always starting from zero in that sense.
But are we starting from zero? Changing, e.g., a pivot point of a robot seems like it could be amenable to transfer learning. (Model-based RL in particular should build up a representation of its environment.) I haven't worked with robots in a long time, though … I may be over-enthusiastic?
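The transfer-learning hope, sketched (purely illustrative; it assumes the policy splits into a world-model part worth keeping and a control head worth retraining):

    import torch
    import torch.nn as nn

    class Policy(nn.Module):
        def __init__(self):
            super().__init__()
            self.world_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU())  # env representation
            self.control_head = nn.Linear(128, 8)                            # joint commands

        def forward(self, obs):
            return self.control_head(self.world_model(obs))

    policy = Policy()
    # policy.load_state_dict(torch.load("old_robot_checkpoint.pt"))  # hypothetical checkpoint
    for p in policy.world_model.parameters():
        p.requires_grad = False  # keep what it learned about the environment
    # ...then fine-tune only control_head on data from the modified robot.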
GPT-5 figures out that if it picks up the knife instead of the bag of chips, it can prevent the human with the stick from interfering with carrying out its instructions.
And ViperGPT will take said knife and make the muffin division fair when there are an odd number of muffins, by slicing either a muffin or a boy in half.
The Boston Dynamics dog can open doors and things like that. It should be capable of performing all of the actions necessary to go get a beer. So I think it would be plausible to pull it all together, if you had enough money. It might take a bunch of setup first to program routes from room to room and things like that.
Might look something like this: determine the current room from a 360-cam image, select a path from the current room to the target room, and tell it to execute that path. Then take another 360-cam image and find the fridge. Tell it to move closer to the fridge, open the fridge, and take an image of the fridge contents with the arm camera. Use that to find a beer or seltzer, grab it, and then determine the return route and come back with the drink.
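As a rough control loop (every function below is a made-up stand-in for "ask the vision model" or "call the robot", stubbed so the sketch actually runs):

    # Hypothetical stubs; in reality each is a vision-model call or robot command.
    def capture_360():          return "360_image"
    def capture_arm_camera():   return "arm_image"
    def identify_room(img):     return "living_room"
    def plan_path(a, b):        return [a, b]
    def follow_path(path):      print("walking:", path)
    def find_object(img, name): return name if name != "beer" else None  # pretend we're out of beer
    def approach(obj):          print("approaching:", obj)
    def open_fridge():          print("opening fridge")
    def grasp(obj):             print("grasping:", obj)

    def fetch_drink(preferences=("beer", "seltzer")):
        room = identify_room(capture_360())          # which room are we in?
        follow_path(plan_path(room, "kitchen"))      # pre-programmed route
        approach(find_object(capture_360(), "fridge"))
        open_fridge()
        contents = capture_arm_camera()
        for drink in preferences:                    # beer first, fall back to seltzer
            if find_object(contents, drink):
                grasp(drink)
                follow_path(plan_path("kitchen", room))
                return drink
        return None

    print(fetch_drink())  # -> "seltzer" in this stubbed run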
But, not so sure I would want to have it controlling 35+ kg of robot without an extreme amount of testing. And then there are things like: Go to the kitchen and get me a knife. Maybe not the best idea.
The point is to avoid the need to "program routes" or "determine current room". The LLM is supposed to have the world-understanding that removes the need to manually specify what to do.
Determining the current room is a step GPT-4 could take care of by looking at the surroundings. The one thing I wasn't sure it could do was figure out the layout of the house and determine a route from that. And I would rather provide it with some routes than have it wander around the house for an hour. I didn't figure real-time video is what it was going to be best at. But it can certainly say the robot is in the living room and needs to go down the hall to the kitchen, and if the robot already knows how to get there, it just tells the robot to go. I'm sure there is another model out there that could be slotted in, but as far as just the robot plus GPT-4 goes, it might not quite be there. Just guessing at how they could fit together right now.
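Concretely, "provide it with some routes" could be as simple as this sketch (prompt wording and names are made up): the model only has to name the current room and pick a known route, not invent one.

    ROUTES = {
        ("living_room", "kitchen"): ["hall", "kitchen_door"],
        ("kitchen", "living_room"): ["kitchen_door", "hall"],
    }

    def room_prompt(image_description: str) -> str:
        rooms = sorted({room for pair in ROUTES for room in pair})
        return (
            "The robot's camera sees: " + image_description + "\n"
            "Which room is it in? Answer with one of: " + ", ".join(rooms)
        )

    # current_room = ask_gpt4(room_prompt(describe(camera_image)))  # hypothetical call
    current_room = "living_room"                                    # stubbed answer
    print("execute route:", ROUTES[(current_room, "kitchen")])      # robot already knows these waypoints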
I think we’re pretty much there. Like the other comment pointed out, PaLM-E is a glimpse of it. Eventually I think this kind of thing will work its way into autonomous cars and a lot of other mundane stuff (like Roombas) as it becomes easier to do this kind of reasoning at the edge.
I think that even when these systems are extremely accurate, the mistakes they make are very un-human. A human might forget something, or misunderstand, but those errors are relatable and understandable. Automated systems might have the same success rate as a human, but their errors can be very counterintuitive, like a Tesla coming to a stop on a freeway in the middle of traffic. There are things that humans would almost never do in certain situations.
So yeah, I think that's the future, but I think the user experience will be wonky at times.