The Boston Dynamics dog can already open doors and the like, so it should be capable of all the actions needed to go get a beer. I think it would be plausible to pull it all together if you had enough money. It might take a bunch of setup first, though: programming routes from room to room and things like that.
Might look something like this: determine the current room from a 360-cam image, select a path from that room to the target room, and tell the robot to execute it. Then take another 360-cam image to find the fridge, move closer, open it, and capture the contents with the arm camera. Use that image to find a beer or seltzer, grab it, and then determine the route back and return with the drink.
But I'm not so sure I would want it controlling 35+ kg of robot without an extreme amount of testing. And then there are requests like "Go to the kitchen and get me a knife." Maybe not the best idea.
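The steps above can be sketched as a simple control loop. Everything here is hypothetical: the `vision` and `robot` calls stand in for whatever Spot's SDK and the vision model actually expose, the room names and routes are made up, and the fakes at the bottom exist only so the loop can run without hardware.

```python
# Pre-programmed room-to-room paths, as suggested above (hypothetical names).
ROUTES = {
    ("living_room", "kitchen"): ["hallway", "kitchen"],
    ("kitchen", "living_room"): ["hallway", "living_room"],
}

def fetch_drink(vision, robot, wanted=("beer", "seltzer")):
    """Localize, walk a known route, find the fridge, grab a drink, return."""
    room = vision.identify_room(robot.capture_360())      # which room are we in?
    robot.execute_path(ROUTES[(room, "kitchen")])         # walk a pre-programmed route
    robot.approach(vision.locate(robot.capture_360(), "fridge"))
    robot.open_fridge()
    shelf = robot.capture_arm_camera()                    # close-up of the contents
    item = next(i for i in shelf if i in wanted)          # vision model would do this
    robot.grab(item)
    robot.execute_path(ROUTES[("kitchen", room)])         # bring it back
    return item

# Minimal fakes so the loop can be exercised without a robot:
class FakeVision:
    def identify_room(self, img): return "living_room"
    def locate(self, img, name): return name

class FakeRobot:
    def capture_360(self): return "360-image"
    def capture_arm_camera(self): return ["milk", "seltzer"]
    def execute_path(self, path): pass
    def approach(self, target): pass
    def open_fridge(self): pass
    def grab(self, item): pass

print(fetch_drink(FakeVision(), FakeRobot()))  # seltzer
```

The point of the structure is that the language model only makes high-level decisions (which room, which item); the risky low-level motion stays in pre-tested routines.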
The point is to avoid the need to "program routes" or "determine current room". The LLM is supposed to have the world-understanding that removes the need to manually specify what to do.
Determining the current room is a step GPT-4 would take care of by looking at the surroundings. The one thing I wasn't sure it could do was figure out the layout of the house and work out a route through it. And I would rather provide it with some routes than have it wander around the house for an hour; I didn't figure real-time video is what it was going to be best at. But it can certainly say the robot is in the living room and needs to go down the hall to the kitchen, and if the robot already knows how to get there, it just tells the robot to go. I'm sure there is another model out there that could be slotted in, but as far as just the robot plus GPT-4 goes, it might not quite be there. Just guessing at how they could fit together right now.
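One way to "provide it with some routes" without hand-programming every room pair: store the house layout as a small adjacency graph and derive routes with a breadth-first search. Then the model only has to name the current room and the target room. The graph below is a hypothetical floor plan, not anything Spot or GPT-4 provides.

```python
from collections import deque

# Hypothetical house layout: which rooms connect directly to which.
HOUSE = {
    "living_room": ["hallway"],
    "hallway": ["living_room", "kitchen", "bedroom"],
    "kitchen": ["hallway"],
    "bedroom": ["hallway"],
}

def route(start, goal):
    """Shortest room-to-room path by breadth-first search; None if unreachable."""
    seen, queue = {start}, deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in HOUSE[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route("living_room", "kitchen"))  # ['living_room', 'hallway', 'kitchen']
```

With this in place, "the robot knows how to get there already" just means each graph edge maps to a pre-tested walking routine between two adjacent rooms.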