
I’ve been thinking quite a bit about recursive prompting.

The other day I considered feeding computer vision data (with objects identified and spatial depth estimated) into a robot-embodied LLM repeatedly as input, asking it what to do next to achieve goal X.

You could have the LLM express the next action to take using a set of recognizable primitives (e.g. MOVE FORWARD 1 STEP). Those primitive commands could then be parsed by another program and converted into electromechanical instructions for the motors.
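
The parsing layer could stay dead simple if the LLM is constrained to a whitelisted grammar. A minimal Python sketch of that loop, where query_llm and drive_motors are hypothetical stand-ins for the actual LLM API and motor driver:

    import re

    def query_llm(prompt: str) -> str:
        """Hypothetical LLM call; returns e.g. 'MOVE FORWARD 1 STEP'."""
        raise NotImplementedError

    def drive_motors(direction: str, steps: int) -> None:
        """Hypothetical motor driver turning a primitive into motor signals."""
        print(f"driving {direction} for {steps} step(s)")

    # Whitelist of primitives the parser accepts; anything else is rejected.
    COMMAND_RE = re.compile(r"^(MOVE|TURN) (FORWARD|BACK|LEFT|RIGHT) (\d+) STEPS?$")

    def act(scene_description: str, goal: str) -> None:
        prompt = (
            f"Scene (objects with estimated depth): {scene_description}\n"
            f"Goal: {goal}\n"
            "Reply with exactly one primitive, e.g. MOVE FORWARD 1 STEP."
        )
        reply = query_llm(prompt).strip().upper()
        match = COMMAND_RE.match(reply)
        if match is None:
            raise ValueError(f"LLM produced an unrecognized primitive: {reply!r}")
        verb, direction, steps = match.group(1), match.group(2), int(match.group(3))
        drive_motors(direction, steps)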

Seems a little Terminator-esque, for sure. After thinking about it I went to see if anyone was working on this, and sure enough this seems close: https://palm-e.github.io/, though their implementation is probably more sophisticated than my naive musings.




When I was experimenting with GPT, I found that it's pretty bad at answering numerical questions with numbers directly, but it does a pretty good job of generating Mathematica code that then produces the right answer. I feel like some robust "glue" to improve the interface between such software packages could be a force multiplier.
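
The glue could be a thin layer that extracts the generated code and hands it to the real evaluator instead of trusting the LLM's own arithmetic. A rough Python sketch, assuming a hypothetical call_llm wrapper and the wolframscript CLI on PATH:

    import re
    import subprocess

    def call_llm(question: str) -> str:
        """Hypothetical LLM call returning Mathematica code, possibly fenced."""
        raise NotImplementedError

    def solve_numeric(question: str) -> str:
        reply = call_llm(f"Write Mathematica code that computes: {question}")
        # Pull out the first fenced code block; fall back to the raw reply.
        m = re.search(r"```(?:mathematica)?\n(.*?)```", reply, re.DOTALL)
        code = m.group(1) if m else reply
        # Evaluate with the actual Wolfram Engine, not the LLM's arithmetic.
        result = subprocess.run(
            ["wolframscript", "-code", code],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()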


Maybe your prompts are better, but so far I have found it produces the wrong math code too regularly. For example, it calculates an average of averages instead of a simple mean, or produces code that doesn't run.
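
To make the first failure concrete: with unequal group sizes, the average of group averages and the pooled mean disagree:

    # Average of averages vs. the pooled mean with unequal group sizes.
    groups = [[1, 2, 3], [10]]  # sizes 3 and 1
    avg_of_avgs = sum(sum(g) / len(g) for g in groups) / len(groups)        # (2 + 10) / 2 = 6.0
    pooled_mean = sum(sum(g) for g in groups) / sum(len(g) for g in groups)  # 16 / 4 = 4.0
    print(avg_of_avgs, pooled_mean)  # 6.0 4.0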


Like the plugins it just released.


Not just in a linear sequence; it should have some concept of recursion, starting with very high-level tasking, calling into more and more specific prompts, and returning only a summary of the low-level tasking.
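
A rough Python sketch of that shape, where call_llm is a hypothetical LLM wrapper and the one-subtask-per-line / DONE protocol is an assumption:

    def call_llm(prompt: str) -> str:
        """Hypothetical LLM call; returns plain text."""
        raise NotImplementedError

    def run_task(task: str, depth: int = 0, max_depth: int = 3) -> str:
        """Recursively decompose a task; each level returns only a summary."""
        if depth >= max_depth:
            return call_llm(f"Do this directly and report the result: {task}")
        plan = call_llm(
            "Break this task into subtasks, one per line, or reply DONE "
            f"if it is simple enough to do directly:\n{task}"
        )
        if plan.strip() == "DONE":
            return call_llm(f"Do this directly and report the result: {task}")
        # Recurse into each subtask; the parent sees only summaries,
        # never the full low-level transcript.
        summaries = [
            run_task(sub, depth + 1, max_depth)
            for sub in plan.splitlines()
            if sub.strip()
        ]
        return call_llm("Summarize the combined outcome of:\n" + "\n".join(summaries))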


GPT-4 can take image input directly, but the API for it isn’t public yet.



