
> (..) I really wonder how far next-token-prediction can go with planning. When I think about planning I think about mapping out a possibility space, identifying trees of dependencies, assigning priorities, and then solving for some kind of weighted shortest path.

What you really do is more like: you've already formed a half-baked plan before the problem even fully registers, in one cognitive step of pulling the ready-made, if somewhat fuzzy, graph out of the depths of the unconscious - and then you start doing all those things you mention, while trying hard not to be biased towards or anchored to that original plan-shaped blob of words, images and emotions.

I think creating that first plan ex nihilo and putting it into words is exactly what next-token-prediction can do, because that structure of options and dependencies and priorities and a path through them is something that's encoded wholesale in the latent space, and needs to be projected out of it. That's at least my understanding of what current transformer LLMs are good at - encoding every imaginable relationship between tokens and token sequences into proximity in a space with a four- or five-digit number of dimensions, and then projecting out of it on inference.

Going beyond that first plan prototype, which may or may not be good, means getting increasingly explicit, and thus increasingly hard. I think you can ultimately coax an LLM to do all the steps you listed and then some, by walking it through them like an infinitely patient parent of a 3-year-old - but not within the context window, so you need to split the work into multiple invocations. This grows in cost superlinearly: you'll have multiple results to summarize before feeding them into the next stage, and if you're computing enough of those results, they won't fit into the context window of the summarizer, so you need to add an intermediate level - and then eventually the summaries may not fit into the context window of the next step, so you need another intermediate level, and so on. The context window is the limiting factor here.
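That layered summarization could be sketched roughly like this. The `llm_summarize` function is a stand-in for a real LLM call (here stubbed as simple truncation so the control flow is runnable), and the window size is an arbitrary toy number:

```python
CONTEXT_WINDOW = 100  # toy limit: max characters we pretend fit in one invocation

def llm_summarize(text: str) -> str:
    """Stub for an LLM summarization call; pretend it compresses ~4x."""
    return text[: max(1, len(text) // 4)]

def reduce_to_one_context(results: list[str]) -> str:
    """Keep adding summary layers until everything fits one window."""
    while sum(len(r) for r in results) > CONTEXT_WINDOW:
        grouped, batch = [], ""
        for r in results:
            # Start a new batch whenever the current one would overflow.
            if batch and len(batch) + len(r) > CONTEXT_WINDOW:
                grouped.append(llm_summarize(batch))
                batch = ""
            batch += r
        if batch:
            grouped.append(llm_summarize(batch))
        results = grouped  # one level up the summary tree
    return "".join(results)
```

Each pass over the results is another level of the tree, which is where the superlinear cost comes from: more leaf results mean more intermediate summaries, which can themselves overflow and need summarizing.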

And if you're going to use an LLM to emulate a mechanical/formalized process, then it's better to just extract structured information from the LLM, feed it to regular software optimized for the purpose, and feed the results back. I.e., something similar to the LLM+P approach - or to any setup where you get an LLM to "use tools". If it can call out to a calculator, it can just as well call out to some PERT tool to get your list of steps sorted and the critical path highlighted.
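The "PERT tool" side of that is just a longest-path computation over the task DAG, which the stdlib makes short. A toy sketch (task names and durations are invented for illustration):

```python
from graphlib import TopologicalSorter

durations = {"design": 3, "api": 2, "ui": 4, "tests": 1, "deploy": 1}
deps = {  # task -> set of prerequisites (a DAG)
    "design": set(),
    "api": {"design"},
    "ui": {"design"},
    "tests": {"api", "ui"},
    "deploy": {"tests"},
}

finish = {}         # earliest finish time per task
critical_pred = {}  # which prerequisite determined that finish time

for task in TopologicalSorter(deps).static_order():
    start = max((finish[p] for p in deps[task]), default=0)
    critical_pred[task] = max(deps[task], key=finish.get) if deps[task] else None
    finish[task] = start + durations[task]

# Walk back from the last-finishing task to recover the critical path.
task = max(finish, key=finish.get)
path = [task]
while critical_pred[task]:
    task = critical_pred[task]
    path.append(task)
path.reverse()
print(path, finish[path[-1]])  # -> ['design', 'ui', 'tests', 'deploy'] 9
```

The LLM's job would be the extraction step - producing `durations` and `deps` as structured output - while the scheduling itself stays deterministic.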

On a tangent:

> identifying trees of dependencies

Are you sure those are trees? :). My pet peeve with all the hip software used to manage projects and track work in our industry is that it treats work as decomposing into flat trees. In my experience, work decomposes into directed acyclic graphs (and in certain cases, it might be useful to allow cycles to make the representation more compact). There's hardly a tool that can deal with that in a work-planning context, not unless you step over to "old-school" engineering (but those tools aren't cheap or well-known). Which is ironic, because at least when it comes to build systems, software devs recognize the DAG-like nature of dependencies.
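A minimal way to see why a tree doesn't cut it (work items here are made up): in a tree every item has at most one "parent", but real work routinely has one item that several others depend on, which a flat tree view would have to duplicate.

```python
from collections import Counter

deps = {  # work item -> the items it depends on
    "ship": ["frontend", "backend"],
    "frontend": ["schema"],
    "backend": ["schema"],  # "schema" has two dependents -> not a tree
    "schema": [],
}

# Count how many items depend on each item; >1 means the structure is a DAG.
dependents = Counter(d for ds in deps.values() for d in ds)
shared = [item for item, n in dependents.items() if n > 1]
print(shared)  # -> ['schema']
```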
