Regardless of that, personally I'd really like it if they could actually learn from interacting with us. From a user's perspective, what I'd like is to be able to "save" the discussion/session/chat/whatever, with everything the LLM has learned so far, to a file, then later be able to restore it and have the LLM "relearn" whatever is in it. Now, you can already do this with various frontend UIs, but the important part of what I'd want is that a) this "relearn" should not affect the current context window (TBH I'd like that entire concept to be gone, but that is another aspect) and b) it should not be some sort of lossy relearning that loses information.
There are some solutions, but they are all band-aids over fundamental issues. For example, you can occasionally summarize whatever has been discussed so far and restart the discussion, but that is obviously just a form of lossy memory compression (I do not care that humans do the same; LLMs are software running on computers, not humans). Or you could use some sort of RAG, but AFAIK this works via "prompt triggering" - i.e. only via your "current" interaction - so even if the knowledge is in there, if whatever you are doing now doesn't trigger its index, the LLM will be oblivious to it.
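To make the "prompt triggering" point concrete, here is a toy sketch of how retrieval is typically keyed only on the current query (the `embed` helper and the in-memory index are hypothetical stand-ins, not any particular RAG framework's API):

```python
# Toy sketch of the "prompt triggering" problem with RAG.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: hash words into a small vector.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

facts = [
    "function foo barfizes moo objects",
    "the build uses cmake and ninja",
]
index = [(fact, embed(fact)) for fact in facts]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scored = sorted(index, key=lambda fe: float(q @ fe[1]), reverse=True)
    return [fact for fact, _ in scored[:k]]

# Retrieval is keyed only on the current prompt. If the query never mentions
# moo objects or barfization, the stored fact about foo may never be pulled in.
print(retrieve("how do I join splarfers?"))
```

The retriever only sees the current prompt, so knowledge that is sitting in the index but isn't "triggered" by it never reaches the model - which is exactly the blind spot described above.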
What I want is, e.g., if I tell the LLM that there is some function `foo` used to barfize moo objects, then go on and tell it other stuff way beyond whatever context length it has, save the discussion or whatever, restore it the next day, go on and tell it other stuff, then ask it about joining splarfers, it should be able to tell me that I can join splarfers by converting them to barfized moo objects, even if I haven't mentioned anything about moo objects or barfization since my previous session yesterday.
(Also, as a sidenote, this sort of memory save/load should be explicit, since I'd want to be able to start from a clean slate - but that clean slate should be because I want it, not as a workaround for the technology's limitations.)
You want something that requires an engineering breakthrough.
Models don't have memory, and they don't have understanding or intelligence beyond what they learned in training.
You give them some text (as context), and they predict what should come after (as the answer).
They’re trained to predict over some context size, and what makes them good is that they learn to model relationships across that context in many dimensions. A word in the middle can affect the probability of a word at the end.
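A toy single-head attention computation (just numpy, nothing model-specific) shows the idea - the query at the last position can put almost all of its weight on a key sitting in the middle of the context:

```python
# Toy single-head attention: a token in the middle of the context can
# dominate what gets predicted at the end.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
seq_len, d = 8, 16
keys = rng.normal(size=(seq_len, d))
values = rng.normal(size=(seq_len, d))

# Make the query at the last position closely aligned with the key at
# position 3 (a "middle" token).
query = keys[3] + 0.1 * rng.normal(size=d)

weights = softmax(keys @ query / np.sqrt(d))
print(weights.round(3))     # position 3 gets the bulk of the weight
print(int(weights.argmax()))  # -> 3
output = weights @ values   # so the middle token dominates the output
```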
If you scale training and inference insanely far to handle massive contexts, which is currently far too expensive, you run into another problem: the model can't reliably tell which parts of that huge context are relevant. Irrelevant or weakly related tokens dilute the signal and bias it in the wrong direction; the distribution flattens or just ends up in the wrong place.
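A rough back-of-the-envelope of that dilution effect: one strongly relevant token competing with N weakly related ones in a softmax (the scores here are made up, just to show the shape of the problem):

```python
# How irrelevant tokens dilute attention: one strongly relevant key
# among N weakly related ones.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relevant_weight(n_irrelevant: int, relevant_score=4.0, noise_score=1.0) -> float:
    scores = np.full(n_irrelevant + 1, noise_score)
    scores[0] = relevant_score
    return float(softmax(scores)[0])

for n in (10, 1_000, 100_000):
    print(n, round(relevant_weight(n), 4))
# With 10 distractors the relevant token still gets roughly 2/3 of the mass;
# with 100k it gets a tiny fraction, so the signal is drowned out.
```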
That's why you have to make sure you give it relevant, well-curated context - aka context engineering.
It won't be able to look at a 100kloc code base and figure out what's relevant to the problem at hand, and what is irrelevant. You have to do that part yourself.
Or, what some people do is try to automate that part a little by using another model to go research and build that context. That's where the research->plan->build loop people talk about comes from. And it's best to keep to small tasks, otherwise the context needed for a big task will be too big.
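A very rough sketch of that loop, with a hypothetical `call_llm(prompt)` helper standing in for whatever model/client you actually use:

```python
# Rough sketch of the research -> plan -> build loop. call_llm() is a
# placeholder, not a real API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def run_task(task: str, codebase_notes: str) -> str:
    # 1. Research: distil only the context relevant to this task.
    context = call_llm(
        f"Task: {task}\n\nCodebase notes:\n{codebase_notes}\n\n"
        "List only the files, functions and facts relevant to this task."
    )
    # 2. Plan: turn that narrow context into concrete steps.
    plan = call_llm(
        f"Task: {task}\n\nRelevant context:\n{context}\n\n"
        "Write a short step-by-step plan."
    )
    # 3. Build: execute the plan with only the curated context in the window.
    return call_llm(
        f"Follow this plan:\n{plan}\n\nContext:\n{context}\n\nProduce the change."
    )
```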
> You want something that requires an engineering breakthrough.
Basically, yes. I know the way LLMs currently work wouldn't be able to provide what I want, but what I want is a different approach that does :-P (perhaps not even using LLMs).
I'm using a "memory" MCP server which basically just stores facts to a big JSON file and makes them searchable. There's a directive in my system prompt that tells the LLM to store facts and search for them when it starts up.
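For anyone curious, the core of that kind of fact store is tiny - something like this (a minimal sketch of the idea, not the actual MCP server's code; `facts.json` and the helper names are just placeholders):

```python
# Minimal JSON-backed fact store with naive keyword search.

import json
from pathlib import Path

STORE = Path("facts.json")

def load() -> list[str]:
    return json.loads(STORE.read_text()) if STORE.exists() else []

def store_fact(fact: str) -> None:
    facts = load()
    if fact not in facts:
        facts.append(fact)
        STORE.write_text(json.dumps(facts, indent=2))

def search(query: str) -> list[str]:
    # Naive substring match; anything fancier (embeddings, BM25) bolts on here.
    terms = query.lower().split()
    return [f for f in load() if any(t in f.lower() for t in terms)]

store_fact("function foo barfizes moo objects")
print(search("moo"))   # -> ['function foo barfizes moo objects']
```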
It seems to work quite well and I'll often be pleasantly surprised when Claude retrieves some useful background I've stored, and seems to magically "know what I'm talking about".
Not perfect by any means, and I think what you're describing is maybe a little more fundamental than bolting a janky database onto the model - but it does seem better than nothing.
I routinely ask the LLM to summarise the high level points as guidance and add them to the AGENTS.md / CONVENTIONS.md etc. It is limited due to context bloat but it's quite effective at getting it to persist important things that need to carry over between sessions.