I’ve been thinking quite a bit about the recursive prompting.
The other day I considered feeding computer vision (with objects ID’d and spatial depth estimated) data into an robot embodied LLM repeatedly as input and asking what it should do next to achieve goal X
You could have the LLM express the next action to take based on a set of recognizable primitives (ex: MOVE FORWARD 1 STEP) Those primitive commands it spits out could be parsed by another program and converted to electromechanical instructions for the motors.
Seems a little terminator-es que for sure. After thinking about it I went to see if anyone was working on it and sure enough this seems close: https://palm-e.github.io/ though their implementation is probably more sophisticated than my naive musings
when I was experimenting with gpt I found that it's pretty bad at responding to numerical questions with numbers, but it does a pretty good job at generating mathematica code that then produces the right answer. I feel like some robust "glue" to improve the interface between such software packages may be a force multiplier.
Maybe your prompts are better, but so far I have found it fails at producing the right math code too regularly. For example, calculating an average of averages instead of a simple mean or producing code that doesn't run.
Not just in a linear sequence, but it should have some concept of recursion -- starting with very high-level tasking and calling into more and more specific prompts, only returning the summary of low-level tasking.
The other day I considered feeding computer vision (with objects ID’d and spatial depth estimated) data into an robot embodied LLM repeatedly as input and asking what it should do next to achieve goal X
You could have the LLM express the next action to take based on a set of recognizable primitives (ex: MOVE FORWARD 1 STEP) Those primitive commands it spits out could be parsed by another program and converted to electromechanical instructions for the motors.
Seems a little terminator-es que for sure. After thinking about it I went to see if anyone was working on it and sure enough this seems close: https://palm-e.github.io/ though their implementation is probably more sophisticated than my naive musings