
> Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.

While working on smol-developer I also eventually landed on the importance of planning, as mentioned in my Agents writeup https://www.latent.space/p/agents . I feel some hesitation suggesting this because it implies I'm not deep-learning-pilled, but I really wonder how far next-token prediction can go with planning. When I think about planning I think about mapping out a possibility space, identifying trees of dependencies, assigning priorities, and then solving for some kind of weighted shortest path. That's an awful lot of work to expect of a next-token predictor (goodness knows it's scaled far beyond what anyone thought - is there any limit to next-token prediction?).
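
For concreteness, the kind of computation I have in mind looks something like the toy sketch below - the task graph, costs, and names are all invented for illustration, and a real planner would be far richer:

    import heapq

    # Toy possibility space: each edge (next_task, cost) hangs off a task.
    # Tasks and costs are invented purely for illustration.
    tasks = {
        "start":            [("design schema", 2), ("spike prototype", 1)],
        "design schema":    [("write migrations", 3)],
        "spike prototype":  [("write migrations", 5), ("ship demo", 4)],
        "write migrations": [("ship demo", 2)],
        "ship demo":        [],
    }

    def cheapest_path(graph, source, goal):
        """Plain Dijkstra: lowest-cost sequence of tasks from source to goal."""
        frontier = [(0, source, [source])]
        settled = {}
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return cost, path
            if settled.get(node, float("inf")) <= cost:
                continue
            settled[node] = cost
            for nxt, step_cost in graph.get(node, []):
                heapq.heappush(frontier, (cost + step_cost, nxt, path + [nxt]))
        return None

    print(cheapest_path(tasks, "start", "ship demo"))
    # -> (5, ['start', 'spike prototype', 'ship demo'])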

If there were one focus area for GPT-5, my money would be on a better architecture capable of planning.



I don't know how far "pure" next-token prediction can go with planning, although I wouldn't count it out until performance starts noticeably plateauing. But the tree-of-thoughts architecture is a very similar concept to what you're discussing; you should definitely give it a read: https://arxiv.org/abs/2305.10601

It's not everything involved in traditional planning, but it may be a framework to use more traditional planning algorithms on LLM output.
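
Very roughly, the paper's search loop looks like the sketch below: a breadth-first search over partial chains of thought, with the model itself scoring candidates. The `propose` and `score` callables are placeholders for whatever LLM API you use, and the real paper has more machinery (DFS variants, voting, backtracking):

    def tree_of_thoughts(problem, propose, score, beam_width=3, max_depth=4):
        """Simplified Tree-of-Thoughts-style search (BFS with beam pruning).

        propose(problem, partial) -> list of candidate next thoughts (an LLM call)
        score(problem, partial)   -> float rating how promising a partial chain is (an LLM call)
        Both are stand-ins for a real model API.
        """
        frontier = [[]]  # each element is a list of thoughts so far
        for _ in range(max_depth):
            candidates = [
                partial + [thought]
                for partial in frontier
                for thought in propose(problem, partial)
            ]
            if not candidates:
                break
            # Keep only the most promising partial chains, as judged by the model.
            candidates.sort(key=lambda c: score(problem, c), reverse=True)
            frontier = candidates[:beam_width]
        return frontier[0] if frontier else []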


They're already noticeably plateauing when it comes to coding, which similarly requires you to navigate tree structures and plan how to connect existing things together along shortest paths rather than inventing things out of thin air.


I would love to see the evidence for plateauing when it comes to coding! Specifically, I'd like to see that new larger models are not achieving performance increases relative to previous smaller models.


I think that in the end predicting words is suboptimal; most things we want to do relate to the internal representations of concepts that exist deeper in the layers.

I, at least, do not want to predict the next token; I want to predict the next concept in a chain of reasoning. But it seems that currently we are stuck using the same representation for autoregression that we use for training.

Maybe we can come up with a better way to construct these chains once we understand the models better.



The weird thing is that training on text encodes information on related concepts.

from u/gmt2027

>An extreme version of the same idea is the difference between understanding DNA vs the genome of every individual organism that has lived on earth. The species record encodes a ton of information about the laws of nature, the composition and history of our planet. You could deduce physical laws and constants from looking at this information, wars and natural disasters, economic performance, historical natural boundaries, the industrial revolution and a lot more.

and u/thomastjeffery

>That entropy is the secret sauce: the extra data that LLMs are sometimes able to model. We don't see it, because we read language, not text.


How would you represent or interpret the "next concept" if not with some kind of token, though?

Language is a communicated abstraction of concepts, and it would seem that internal representations of those concepts can emerge from something optimized for token prediction. Or at least: internal representations of the speaker to be predicted, including the knowledge they may possess.


Language is indeed a communicated abstraction of concepts, but it emerged under a lot of constraints (our auditory system, our brain's inherent bias towards visual stimuli, etc.). Predicting in this constrained system is most likely suboptimal.

Imagine translating language into an optimized representation free from human constraints, doing autoregressive prediction in that domain, and only then translating back.

As far as I understand current models, this is not yet how they work.
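
What I picture is something like the toy sketch below: map tokens into a "concept" space, do the autoregression there, and only translate back to tokens at the end. This is not how GPT-style models work, and every dimension and module here is an arbitrary placeholder - it's only meant to show the shape of the idea:

    import torch
    import torch.nn as nn

    class LatentAutoregressor(nn.Module):
        """Toy shape of "predict the next concept, then translate back to words".
        Not a real architecture; sizes and components are placeholders."""

        def __init__(self, vocab_size=1000, latent_dim=64):
            super().__init__()
            self.encode = nn.Embedding(vocab_size, latent_dim)                # tokens -> concept space
            self.predict = nn.GRU(latent_dim, latent_dim, batch_first=True)   # autoregression in concept space
            self.decode = nn.Linear(latent_dim, vocab_size)                   # concepts -> tokens, only at the end

        def forward(self, token_ids):
            concepts = self.encode(token_ids)       # (batch, seq, latent_dim)
            predicted, _ = self.predict(concepts)   # next-"concept" prediction happens here
            return self.decode(predicted[:, -1])    # project the final concept back to token logits

    model = LatentAutoregressor()
    logits = model(torch.randint(0, 1000, (1, 12)))  # logits over the next token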


A chain of LLMs can work in that regard, using intermediary prompts that feed answers to the next prompt. Make the LLM build a list of sections, then make it fill them with examples, then make it enrich the text. Maybe a last layer for error correction, clarity, removing mentions of "as an AI model", etc.
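
Something like the sketch below, where `llm(prompt) -> str` stands in for whatever model call you're using and the prompts are just illustrative:

    def write_article(llm, topic):
        """Toy prompt chain: outline -> draft sections -> cleanup pass.
        llm(prompt) -> str is a placeholder for the actual model call."""
        outline = llm(f"List the sections for an article about {topic}, one per line.")
        sections = [
            llm(f"Write the section '{name}' of an article about {topic}, with concrete examples.")
            for name in outline.splitlines() if name.strip()
        ]
        draft = "\n\n".join(sections)
        # Last layer: error correction, clarity, strip "as an AI model" boilerplate.
        return llm(f"Edit this draft for clarity and remove any AI disclaimers:\n\n{draft}")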


> When I think about planning I think about mapping out a possibility space, identifying trees of dependencies, assigning priorities, and then solving for some kind of weighted shortest path.

But that's two things: planning and executing a plan. The plan is the dependency graph (!); assigning priorities and finding the shortest path is execution. This is very important in the context of agents, autonomous or not. If you want the agent to self-correct, it has to understand that there can be multiple start points or multiple end points (or both) so it can backtrack and pivot. And as long as it is a glorified "next token predictor", it cannot really do that.

Of course, some tasks are indeed IFTTT-style, linear-ish flows where next-token prediction may prove adequate. However, if your agent is incapable of understanding non-linear flows, can it reasonably back off from the one true way when faced with such a flow?
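
To make the distinction concrete, here's a toy sketch: the plan is just a dependency graph with more than one valid starting point, and the executor walks it, skipping and retrying steps instead of committing to a single linear path. Step names and the retry logic are invented for illustration:

    # The plan: each step lists its prerequisites. Note there are two steps with
    # no prerequisites, i.e. multiple valid starting points.
    plan = {
        "fetch data":   [],
        "load schema":  [],
        "clean data":   ["fetch data"],
        "build report": ["clean data", "load schema"],
    }

    def execute(plan, run_step):
        """Execute steps in any prerequisite-respecting order; if a step fails,
        back off and retry it on a later pass instead of aborting the whole flow."""
        done, pending = set(), set(plan)
        while pending:
            progressed = False
            for step in sorted(pending):
                if set(plan[step]) <= done and run_step(step):
                    done.add(step)
                    pending.discard(step)
                    progressed = True
            if not progressed:
                break  # genuinely stuck: every remaining step is blocked or keeps failing
        return done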


There is a fair amount of existing work related to planning in continuous and unbounded spaces under uncertainty. It seems likely some of the existing techniques combined with modern language models could be quite effective. A couple entry points:

Deep Reinforcement Learning with an Unbounded Action Space - https://arxiv.org/abs/1511.04636v3

Efficient Planning in a Compact Latent Action Space - https://arxiv.org/abs/2208.10291


I feel this is more of a limitation of single-scratchpad agents.

Having a tasking agent refine prompts seems a much more robust approach.

Bonus points if it can decompose the required data properly. I'm working on a world-database prompt with limited success, but implementing the retrieval with prompts is time-consuming and I don't have much time. The idea is along the lines of "translate the user question into SQL; you have a database of all known facts with a table for each entity", and then you handle the retrieval of each table's data step by step.

I think you'd then be able to use the Postgres planner as the planner, but while I've seen projects where Postgres retrieves data from arbitrary sources, I don't remember the specifics.
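
The retrieval loop I'm poking at looks roughly like this sketch - `llm(prompt) -> str` is a stand-in for the model call, SQLite stands in for Postgres, and in a real version you'd validate the generated SQL before running it:

    import sqlite3

    def answer(llm, db_path, question):
        """Toy question -> SQL -> answer loop; llm(prompt) -> str is a placeholder."""
        conn = sqlite3.connect(db_path)
        schema = "\n".join(
            row[0]
            for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
            if row[0]
        )
        sql = llm(
            f"You have a database of known facts with this schema:\n{schema}\n"
            f"Write one SQL query (no commentary) that answers: {question}"
        )
        rows = conn.execute(sql).fetchall()  # a real version would sanity-check the SQL first
        return llm(
            f"Question: {question}\nQuery result: {rows}\nAnswer the question in one sentence."
        )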


We'll probably stay with the next-token-prediction task for training the foundation. It generalizes really well.

However, transformers are not able to loop; they have to produce a prediction in a fixed number of compute steps. For long-term planning, we probably want something more recursive/iterative (but that will be harder to train).

Today we can also compensate for that with wrappers (e.g. LangChain), but ultimately the machine will learn end to end.
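
The wrapper-style compensation is basically an outer loop around the model, giving it more sequential steps than a single fixed-depth forward pass allows. A toy sketch, with `llm(prompt) -> str` and the stopping criterion as placeholders:

    def refine(llm, task, max_iters=5):
        """Crude outer loop: let the model critique and revise its own answer,
        since one forward pass has a fixed compute budget. llm() is a placeholder."""
        answer = llm(f"Task: {task}\nGive a first attempt.")
        for _ in range(max_iters):
            critique = llm(
                f"Task: {task}\nAttempt:\n{answer}\nList remaining problems, or reply DONE."
            )
            if "DONE" in critique:
                break
            answer = llm(
                f"Task: {task}\nAttempt:\n{answer}\nProblems:\n{critique}\nRevise the attempt."
            )
        return answer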


> (..) I really wonder how far next-token-prediction can go with planning. When I think about planning I think about mapping out a possibility space, identifying trees of dependencies, assigning priorities, and then solving for some kind of weighted shortest path.

What you really do is more like: you've already formed a half-baked plan before the problem even fully registers, in one cognitive step of pulling the ready-made if somewhat fuzzy graph out of the depths of the unconscious - and then you start doing all those things you mention, while trying hard not to be biased towards or anchored to that original plan-shaped blob of words, images and emotions.

I think creating that first plan ex nihilo and putting it into words is exactly what next-token-prediction can do, because that structure of options and dependencies and priorities and a path through them is something that's encoded wholesale in the latent space, and needs to be projected out of it. That's at least my understanding of what current transformer LLMs are good at - encoding every imaginable relationship between tokens and token sequences into proximity in a four- or five-digit dimensional space, and then projecting out of it on inference.

Going beyond that - beyond that first plan prototype, which may or may not be good - means getting increasingly explicit and thus increasingly hard. I think you can ultimately coax an LLM to do all the steps you listed and then some, by walking it through them like an infinitely patient parent of a 3-year-old - but not within the context window, so you need to split it into multiple invocations. This grows in cost superlinearly: you'll have multiple results to summarize and feed into the next stage, and if you're computing enough of those results they won't fit into the context window of the summarizer, so you need to add an intermediate level - and then eventually the summaries may not fit into the context window of the next step, so you need another intermediate level, etc. The context window is the limiting factor here.
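
That blow-up is essentially a recursive map-reduce over summaries; the shape is roughly the sketch below, where `llm(prompt) -> str` is a placeholder and a character budget stands in for the real token budget:

    def summarize(llm, text, budget=4000):
        """Recursively summarize text that won't fit in one context window.
        llm(prompt) -> str is a placeholder; budget approximates the window size."""
        if len(text) <= budget:
            return llm(f"Summarize:\n{text}")
        chunks = [text[i:i + budget] for i in range(0, len(text), budget)]
        partials = "\n".join(llm(f"Summarize:\n{c}") for c in chunks)
        # The combined partial summaries may still exceed the window, which is
        # exactly the extra intermediate level described above.
        return summarize(llm, partials, budget)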

And if you're going to use an LLM to emulate a mechanical/formalized process, then it's better to just extract structured information from the LLM, feed it to regular software optimized for the purpose, and feed the results back. I.e. something similar to the LLM+P approach - or to any case where you get an LLM to "use tools". If it can call out to a calculator, it can just as well call out to some PERT tool to get your list of steps sorted and the critical path highlighted.
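
Concretely, the pattern is something like the sketch below: get the model to emit its steps as structured data, hand them to ordinary planning code, and feed the result back. `llm(prompt) -> str` is a placeholder, and `critical_path` is the kind of PERT-ish tool I mean (one possible version of it is sketched further down):

    import json

    def plan_with_tool(llm, goal, critical_path):
        """Extract structured steps from the model, run real planning software on
        them, feed the result back. llm(prompt) -> str is a placeholder."""
        raw = llm(
            f"Goal: {goal}\n"
            'Return JSON: {"steps": [{"name": ..., "depends_on": [...], "days": ...}]}'
        )
        steps = json.loads(raw)["steps"]        # a real version would validate and retry
        ordered, total = critical_path(steps)   # the "calculator", but for schedules
        return llm(
            f"The critical path for '{goal}' is {ordered}, taking {total} days. "
            "Explain the schedule to the user."
        )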

On a tangent:

> identifying trees of dependencies

Are you sure those are trees? :). My pet peeve with all the hip software used to manage projects and track work in our industry is that it treats work as decomposing into flat trees. In my experience, work decomposes into directed acyclic graphs (and in certain cases it might be useful to allow cycles to make the representation more compact). There's hardly a tool that can deal with that in a work-planning context, not unless you step over to "old-school" engineering (but those tools aren't cheap or well-known). Which is ironic, because at least when it comes to build systems, software devs recognize the DAG-like nature of dependencies.
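
For what it's worth, the `critical_path` tool above is just a longest path over that DAG in dependency order. The sketch below (with invented work items) includes a diamond - two items both depending on "spec", and "integrate" depending on both - which is exactly the shape a flat tree can't represent:

    def critical_path(steps):
        """Longest path through a dependency DAG (PERT-style). Each step is a dict
        with "name", "depends_on", "days", matching the sketch above."""
        days = {s["name"]: s["days"] for s in steps}
        deps = {s["name"]: s["depends_on"] for s in steps}
        finish, via = {}, {}

        def earliest_finish(name):
            if name not in finish:
                prev = max(deps[name], key=earliest_finish, default=None)
                via[name] = prev
                finish[name] = days[name] + (finish[prev] if prev else 0)
            return finish[name]

        end = max(deps, key=earliest_finish)
        path, node = [], end
        while node:
            path.append(node)
            node = via[node]
        return list(reversed(path)), finish[end]

    steps = [
        {"name": "spec",      "depends_on": [],                      "days": 2},
        {"name": "backend",   "depends_on": ["spec"],                "days": 5},
        {"name": "frontend",  "depends_on": ["spec"],                "days": 3},
        {"name": "integrate", "depends_on": ["backend", "frontend"], "days": 1},
    ]
    print(critical_path(steps))  # -> (['spec', 'backend', 'integrate'], 8)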



