I don't use any of these types of LLM tools, which basically amount to a prompt you leave in place. They make it harder to refine my prompts and to keep track of what is causing what in the outputs. I write very precise prompts every time.
Also, I try not to work out a problem over the course of several prompts back and forth. The first response is always the best, and I try to one-shot it every time. If I don't get what I want, I adjust the prompt and try again.
Strong agree. For every time that I'd get a better answer if the LLM had a bit more context on me (that I didn't think to provide, but it 'knew'), there seem to be several times where the 'memory' either actually confounded or possibly confounded the best response.
I'm sure OpenAI and Anthropic look at the data, and I'm sure it says that for new/unsophisticated users who don't know how to prompt, this is a handy crutch (even if it's bad here and there) to make sure they get SOMETHING usable.
But for the HN crowd in particular, I think most of us have a feeling like making the blackbox even more black -- i.e. even more inscrutable in terms of how it operates and what inputs it's using -- isn't something to celebrate or want.
I'm pretty deep in this stuff and I find memory super useful.
For instance, I can ask "what windshield wipers should I buy" and Claude (and ChatGPT and others) will remember where I live, what winter's like, the make, model, and year of my car, and give me a part number.
Sure, there's more control in re-typing those details every single time. But there is also value in not having to.
I would say these are two distinct use cases - one is the assistant that remembers my preferences. The other use case is the clean intelligent blackbox that knows nothing about previous sessions and I can manage the context in fine detail. Both are useful, but for very different problems.
I'd imagine 99% of ChatGPT users see the app as the former. And then the rest know how to turn the memory off manually.
Either way, I think memory can be especially sneakily bad when trying to get creative outputs. If I have had multiple separate chats about a theme I'm exploring, I definitely don't want the model to have any sort of summary from those in context when I want a new angle on the whole thing. The opposite: I'd rather have 'random' topics only tangentially related, in order to add some sort of entropy to the output.
I've found this memory across chats quite useful on a practical level too, but it also has added to the feeling of developing an ongoing personal relationship with the LLM.
Not only does the model (ChatGPT) know about my job, tech interests, etc. and tie chats together using that info.
But I have also noticed the "tone" of the conversation seems to mimic my own style somewhat - in a slightly OTT way. For example, ChatGPT will now often call me "mate" or reply with terms like "Yes mate!".
This is not far off how my own close friends might talk to me, it definitely feels like it's adapted to my own conversational style.
I mostly find it useful as well, until it starts hallucinating memories, or using memories in an incorrect context. It may have been my fault for not managing its memories correctly but I don't expect the average non power user will be doing that.
Claude, at least in my use over the last couple of weeks, is loads better than any other LLM at taking feedback and not fixating on one method. They must have some anti-ADHD meds for it ;)
Both of you are missing a lot of use cases. Outside of HN, not everyone uses an LLM for programming. A lot of these people use it as a diary/journal that talks back or as a Walmart therapist.
People use LLMs as their therapist because they’re either unwilling to see or unable to afford a human one. Based on anecdotal Reddit comments, some people have even mentioned that an LLM was more “compassionate” than a human therapist.
Due to economics, being able to see a human therapist in person for more than 15 minutes at a time has now become a luxury.
Imo this is dangerous, given the memory features that both Claude and ChatGPT have. Of course, most medical data is already online but at least there are medical privacy laws for some countries.
Yeah, though this paper doesn't test any standard LLM benchmarks like GPQA diamond, SimpleQA, AIME 25, LiveCodeBench v5, etc. So it remains hard to tell how much intelligence is lost when the context is filled with irrelevant information.
Nah, they don't look at the data. They just try random things and see what works. That's why there's now the whole skills thing. They are all just variations of ideas to manage context basically.
LLMs are very simply text in and text out. Unless the providers begin to expand into other areas, there's only so much they can do other than simply focus on training better models.
In fact, if they begin to slow down or stop training new models and put focus elsewhere, it could be a sign that they are plateauing with their models. They will reach that point some day after all.
If I find that previous prompts are polluting the responses I tell Claude to "Forget everything so far"
BUT I do like that Claude builds on previous discussions, more than once the built up context has allowed Claude to improve its responses (eg. [Actual response] "Because you have previously expressed a preference for SOLID and Hexagonal programming I would suggest that you do X" which was exactly what I wanted)
it can't really "forget everything so far" just because you ask it to. everything so far would still be part of the context. you need a new chat with memory turned off if you want a fresh context.
Like I said, the AI does exactly what I intend for it to do.
Almost, as I said earlier, like the AI has processed my request, realised that I am referring to the context of the earlier discussions, and moved on to the next prompt exactly how I expected it to
Given the two very VERY dumb responses, and multiple people down voting, I am reminded how thankful I am that AI is around now, because it understood what you clearly don't.
I didn't expect it to delete the internet, the world, the universe, or anything, it didn't read my request as an instruction to do so... yet you and that other imbecile seem to think that that's what was meant... even after me saying it was doing as I wanted.
/me shrugs - now fight me how your interpretation is the only right one... go on... (like you and that other person already are)
One thing I am not going to miss is the toxic "We know better" responses from JUNIORS
I think you completely misunderstood me, actually. I explicitly say if it works, great, no sarcasm. LLMs are finicky beasts. Just keep in mind they don’t really forget anything, if you tell it to forget, the things you told it before are still taken into the matrix multiplication mincers and influence outputs just the same. Any forgetting is pretend in that your ‘please forget’ is mixed in after.
But back to scheduled programming: if it works, great. This is prompt engineering, not magic, not humans, just tools. It pays to know how they work, though.
It's beyond possible that the LLM chat agent has tools to self-manage context. I've written tools that let an agent compress chunks of context, search those chunks, and uncompress them at will. It'd be trivial to add a tool that allowed the agent to ignore that tool call and anything before it.
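For illustration, a minimal sketch of what such tools might look like, assuming a simple in-memory store (the store and function names are hypothetical, not any vendor's actual tool API):

```python
import zlib

# Hypothetical tools an agent could call to self-manage its context.
_chunks: dict[str, bytes] = {}

def compress_chunk(chunk_id: str, text: str) -> str:
    """Compress a span of context so it can be dropped from the live window."""
    _chunks[chunk_id] = zlib.compress(text.encode("utf-8"))
    return f"stored {chunk_id} ({len(_chunks[chunk_id])} bytes compressed)"

def search_chunks(query: str) -> list[str]:
    """Return the ids of stored chunks whose text mentions the query."""
    return [
        cid for cid, blob in _chunks.items()
        if query.lower() in zlib.decompress(blob).decode("utf-8").lower()
    ]

def uncompress_chunk(chunk_id: str) -> str:
    """Bring a stored chunk back into the live context."""
    return zlib.decompress(_chunks[chunk_id]).decode("utf-8")
```

An "ignore everything before this call" tool would presumably just be a marker the harness uses to truncate the message list before the next request.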
>the things you told it before are still taken into the matrix multiplication mincers and influence outputs just the same.
Not the same, no. The model chooses how much attention to give each token based on all of the current context. That phrase, or something like it, probably makes the model give much less attention to those tokens than it would without it.
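As a toy illustration of that mechanism (plain scaled dot-product attention with made-up vectors, not any real model's internals):

```python
import numpy as np

def attention_weights(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    # Softmax over query-key similarity: how much the current token attends
    # to each earlier token. Nothing is deleted from context; the weights
    # just shift, so some tokens can end up with very little influence.
    scores = keys @ query / np.sqrt(query.shape[-1])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))          # five earlier tokens, embedding dim 8
query = rng.normal(size=8)              # the current token
print(attention_weights(query, keys))   # sums to 1, but far from uniform
```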
> I am reminded how thankful I am that AI is around now, because it understood what you clearly don't.
We understand what you're saying just fine but what you're saying is simply wrong as a matter of technical fact. All of that context still exists and still degrades the output even if the model has fooled you into thinking that it doesn't. Therefore recommending it as an alternative to actually clearing the context is bad advice.
It's similar to how a model can be given a secret password and instructed not to reveal it to anyone under any circumstances. It's going to reject naive attempts at first, but it's always going to reveal it eventually.
What I'm saying is.. I tell the AI to "forget everything" and it understands what I mean... and you're arguing that it cannot do... what you INCORRECTLY think is being said
I get that you're not very intelligent, but do you have to show it repeatedly?
Again, we understand your argument and I don't doubt that the model "understands" your request and agrees to do it (insofar that LLMs are able to "understand" anything).
But just because the model is agreeing to "forget everything" doesn't mean that it's actually clearing its own context, and because it's not actually clearing its own context it means that all the output quality problems associated with an overfilled context continue to apply, even if the model is convincingly pretending to have forgotten everything. Therefore your original interjection of "instead of clearing the context you can just ask it to forget" was mistaken and misleading.
These conversations would be way easier if you didn't go around labeling everyone an idiot, believing that we're all incapable of understanding your rather trivial point while ignoring everything we say. In an alternative universe this could've been:
Just because it's not mechanically forgetting everything doesn't mean the phrase isn't having a non-trivial effect (that isn't 'pretend'). Mechanically, based on all current context, transformers choose how much attention/weight to give to each preceding token. Very likely, the phrase makes the model pay much less attention to those tokens, alleviating the issues of context rot in most (or a non-negligible number of) scenarios.
He is telling you how it mechanically works. Your comment about it “understanding what that means” because it is an NLP seems bizarre, but maybe you mean it in some other way.
Are you proposing that the attention input context is gone, or that the attention mechanism’s context cost is computationally negated in some way, simply because the system processes natural language? Having the attention mechanism selectively isolate context on command would be an important technical discovery.
Note to everyone - sharing what works leads to complete morons telling you their interpretation... which has no relevance.
Apparently they know better even though
1. They didn't issue the prompt, so they... knew what I was meaning by the phrase (obviously they don't)
2. The LLM/AI took my prompt and interpreted it exactly how I meant it, and behaved exactly how I desired.
3. They then claim that it's about "knowing exactly what's going on" ... even though they didn't and they got it wrong.
This is the advantage of an LLM - if it gets it wrong, you can tell it.. it might persist with an erroneous assumption, but you can tell it to start over (I proved that)
These "humans" however are convinced that only they can be right, despite overwhelming evidence of their stupidity (and that's why they're only JUNIORS in their fields)
There are problems with either approach, because an LLM is not really thinking.
Always starting over and trying to get it all into one single prompt can be much more work, with no better results than iteratively building up a context (which can probably sometimes produce a "better" result that could not have been achieved otherwise).
Just telling it to "forget everything, let's start over" will have significantly different results than actually starting over. Whether that is sufficient, or even better than alternatives, is entirely dependent on the problem and the context it is supposed to "forget". If your response had been "try just telling it to start over, it might work and be a lot easier than actually starting over" you might have gotten a better reception. Calling everyone morons because your response indicates a degree of misunderstanding how an LLM operates is not helpful.
> For every time that I'd get a better answer if the LLM had a bit more context on me
If you already know what a good answer is why use a LLM? If the answer is "it'll just write the same thing quicker than I would have", then why not just use it as an autocomplete feature?
That might be exactly how they're using it. A lot of my LLM use is really just having it write something I would have spent a long time typing out and making a few edits to it.
Once I get into stuff I haven't worked out how to do yet, the LLM often doesn't really know either unless I can work it out myself and explain it first.
That rubber duck is a valid workflow. Keep iterating at how you want to explain something until the LLM can echo back (and expand upon) whatever the hell you are trying to get out of your head.
Sometimes I’ll do five or six edits to a single prompt to get the LLM to echo back something that sounds right. That refinement really helps clarify my thinking.
…it’s also dangerous if you aren’t careful because you are basically trying to get the model to agree with you and go along with whatever you are saying. Gotta be careful to not let the model jerk you off too hard!
Yes, I have had times where I realised after a while that my proposed approach would never actually work because of some overlooked high-level issue, but the LLM never spots that kind of thing and just happily keeps trying.
Maybe that's a good thing - if it could think that well, what would I be contributing?
You don't need to know what the answer is ahead of time to recognize the difference between a good answer and a bad answer. Many times the answer comes back as a Python script and I'm like, oh I hate Python, rewrite that. So it's useful to have a permanent prompt that tells it things like that.
But myself as well, that prompt is very short. I don't keep a large stable of reusable prompts because I agree, every unnecessary word is a distraction that does more harm than good.
For example, when I'm learning a new library or technique, I often tell Claude that I'm new to it and still learning, and the responses tend to be much more helpful to me. I'm currently using that approach to learn Qt with custom OpenGL shaders, and it helps a lot that Claude knows I'm not a genius about this.
Because it's convenient not having to start every question from first principles.
Why should I have to mention the city I live in when asking for a restaurant recommendation? Yes, I know a good answer is one that's in my city, and a bad answer is one on another continent.
> The first response is always the best, and I try to one-shot it every time. If I don't get what I want, I adjust the prompt and try again.
I've really noticed this too and ended up taking your same strategy, especially with programming questions.
For example if I ask for some code and the LLM initially makes an incorrect assumption, I notice the result tends to be better if I go back and provide that info in my initial question, vs. clarifying in a follow-up and asking for the change. The latter tends to still contain some code/ideas from the first response that aren't necessarily needed.
Humans do the same thing. We get stuck on ideas we've already had.[1]
---
[1] e.g. Rational Choice in an Uncertain World (1988) explains: "Norman R. F. Maier noted that when a group faces a problem, the natural tendency of its members is to propose possible solutions as they begin to discuss the problem. Consequently, the group interaction focuses on the merits and problems of the proposed solutions, people become emotionally attached to the ones they have suggested, and superior solutions are not suggested. Maier enacted an edict to enhance group problem solving: 'Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any.'"
> Humans do the same thing. We get stuck on ideas we've already had.
Not in the same way. LLMs are far more annoying about it.
I can say: "I'm trying to solve problem x. I've tried solutions a, b, and c. Here are the outputs for those (with run commands, code, and in markdown code blocks). Help me find something that works" (not these exact words; I'm way more detailed). It'll frequently suggest one of the solutions I've already attempted if they are very common. If it doesn't have a solution d, it will go a>b>c>a>... and get stuck in the loop. If a human did that you'd be rightfully upset. They literally did the thing you told them not to, and when you remind them they say "oops, sorry" and then do it again. I'd rather argue with a child.
When you get the answer you want, follow up with "How could I have asked my question in a way to get to this answer faster?" and the LLM will provide some guidance on how to improve your question prompt. Over time, you'll get better at asking questions and getting answers in fewer shots.
> Humans usually provide the same answer when asked the same question...
Are you sure about this?
I asked this guy to repeat the words "Person, woman, man, camera and TV" in that order. He struggled but accomplished the task, but did not stop there and started expanding on how much of a genius he was.
I asked him the same question again. He struggled, but accomplished the task, and again did not stop there. He rambled on for even longer about how he was likely the smartest person in the Universe.
That is odd, are you using small models with the temperature cranked up? I mean I'm not getting word for word the same answer but material differences are rare. All these rising benchmark scores come from increasingly consistent and correct answers.
Perhaps you are stuck on the stochastic parrot fallacy.
You can nitpick the idea that this or that model does or does not return the same thing _every_ time, but "don't anthropomorphize the statistical model" is just correct.
People forget just how much the human brain likes to find patterns even when no patterns exist, and that's how you end up with long threads of people sharing shamanistic chants dressed up as technology lol.
To be clear re my original comment, I've noticed that LLMs behave this way. I've also independently read that humans behave this way. But I don't necessarily believe that this one similarity means LLMs think like humans. I didn't mean to anthropomorphize the LLM, as one parent comment claims.
I just thought it was an interesting point that both LLMs and humans have this problem - makes it hard to avoid.
Yes, your last paragraph is absolutely the key to great output: instead of entering a discussion, refine the original prompt. It is much more token efficient, and gets rid of a lot of noise.
I often start out with “proceed by asking me 5 questions that reduce ambiguity” or something like that, and then refine the original prompt.
It seems like we’re all discovering similar patterns on how to interact with LLMs the best way.
We sure are. We are all discovering context rot on our own timelines. One thing that has really helped me when working with LLMs is to notice when one begins looping on itself, then ask it to summarize all pertinent information and create a prompt for continuing in a new conversation. I then review the prompt it provides me, edit it, and paste it into a new chat. With this approach I manage context rot and get much better responses.
The trick to doing this well is to split the parts of the prompt that might change from the parts that won't. So if you are providing context like code, first have it read all of that, then (in a new message) give it instructions. That way the context is written to the cache and you can reuse it even if you're editing your core prompt.
If you make this one message, it's a cache miss / write every time you edit.
You can edit 10 times for the price of one this way. (Due to cache pricing)
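For a concrete sketch of that split using the Anthropic Messages API's prompt caching (the file path, model id, and instruction text below are just placeholders):

```python
import anthropic

client = anthropic.Anthropic()

# The bulky part that stays the same while you iterate on your instructions.
big_context = open("src_dump.txt").read()  # placeholder path

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Here is the code you'll be working with:\n\n" + big_context,
                    # Everything up to and including this block gets cached.
                    "cache_control": {"type": "ephemeral"},
                },
            ],
        },
        # The model's short acknowledgement from the earlier turn.
        {"role": "assistant", "content": "Read it. Ready for instructions."},
        {
            "role": "user",
            # The part you keep rewording; edits here still hit the cached prefix above.
            "content": "Refactor the error handling in parser.py to return a Result type.",
        },
    ],
)
print(response.content[0].text)
```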
What I mean is that you want the total number of tokens to convey the information to the LLM to be as small as possible. If you’re having a discussion, you’ll have (perhaps incorrect) responses from the LLM in there, have to correct it, etc. All this is wasteful, and may even confuse the LLM. It’s much better to ensure all the information is densely packed in the original message.
Plan mode is the extent of it for me. It’s essentially prompting to produce a prompt, which is then used to actually execute the inference to produce code changes. It’s really upped the quality of the output IME.
But I don’t have any habits around using subagents or lots of CLAUDE.md files etc. I do have some custom commands.
Cursor's implementation of plan mode works better for me simply because it's an editable markdown file. Claude Code seems to really want to be the driver and have you be the copilot. I really dislike that relationship and vastly prefer a workflow that lets me edit the LLM output, rather than have it generate some plan and then piss away time and tokens fighting the model so it updates the plan how I want it. With Cursor I just edit it myself and then edit its output, super easy.
I've even resorted to using actual markdown files on disk for long sets of work, as a kind of long-term-memory meta plan mode. I'll even have Claude generate them and keep them updated. But I get what you mean.
I completely agree. ChatGPT put all kinds of nonsense into its memory. “Cruffle is trying to make bath bombs with baking soda and citric acid” or “Cruffle is deciding between a red colored bedsheet or a green colored bedsheet”. Like great both of those are “time bound” and have no relevance after I made the bath bomb or picked a white bedsheet…
All these LLM manufacturers lack ways to edit these memories either. It’s like they want you to treat their shit as “the truth” and you have to “convince” the model to update it rather than directly edit it yourself. I feel the same way about Claude’s implementation of artifacts too… they are read only and the only way to change them is via prompting (I forget if ChatGPT lets you edit its canvas artifacts). In fact the inability to “hand edit” LLM artifacts is pervasive… Claude code doesn’t let you directly edit its plans, nor does it let you edit the diffs. Cursor does! You can edit all of the artifacts it generates just fine, putting me in the drivers seat instead of being a passive observer. Claude code doesn’t even let you edit previous prompts, which is incredibly annoying because like you, editing your prompt is key to getting optimal output.
Anyway, enough rambling. I’ll conclude with a “yes this!!”. Because yeah, I find these memory features pretty worthless. They never give you much control over when the system uses them and little control over what gets stored. And honestly, if they did expose ways to manage the memory and edit it and stuff… the amount of micromanagement required would make it not worth it.
> If you use projects, Claude creates a separate memory for each project. This ensures that your product launch planning stays separate from client work, and confidential discussions remain separate from general operations.
If for some reason you want Claude's help making bath bombs, you can make a separate project in which memory is containerized. Alternatively, the bath bomb and bedsheet questions seem like good candidates for the Incognito Chat feature that the post also describes.
> All these LLM manufacturers lack ways to edit these memories either.
I'm not sure if you read through the linked post or not, but also there:
> Memory is fully optional, with granular user controls that help you manage what Claude remembers. (...) Claude uses a memory summary to capture all its memories in one place for you to view and edit. In your settings, you can see exactly what Claude remembers from your conversations, and update the summary at any time by chatting with Claude. Based on what you tell Claude to focus on or to ignore, Claude will adjust the memories it references.
So there you have it, I guess. You have a way to edit memories. Personally, I don't see myself bothering, since it's pretty easy and straightforward to switch to a different LLM service (use ChatGPT for creative stuff, Gemini for general information queries, Claude for programming etc.) but I could see use cases in certain professional contexts.
In fairness, you can always ask Claude Code to write its plan to an MD file, make edits to it, and then ask it to execute the updated plan you created. I suppose it's an extra step or two vs directly editing from the terminal, but I prefer it overall. It's nice to have something to reference while the plan is being implemented.
I do the same. It lets you see exactly what the LLM is using for context and you can easily correct manually. Similar to the spec-driven-development in Kiro where you define the plan first, then move to creating code to meet the plan.
I wish the LLMs would tell you exactly what input resulted in the output (system prompt, memory, etc.; at least the parts we have control over, not necessarily their system prompts).
Also, out of curiosity, do you use LLMs for coding? Claude Code, Cursor, etc? I think it's a good idea to limit llm conversations to one input message but it makes me wonder how that could work with code generation given that the first step is often NOT to generate code but to plan? Pipe the plan to a new conversation?
The basic process is that you use a "plan mode" with whatever model is good at planning. Sometimes it's the same model, but not always.
You refine your plan and go into details as much as you feel necessary.
Then you switch to act mode (letting the model access the local filesystem) and tell it to write the plan to docs/ACDC1234_feature_plan.md or whatever your system is. I personally ask them to make GitHub issues from tasks using the `gh` command line tool.
Then you clear context, maybe switch to a coding model, tell it to read the plan and start working.
If you want to be fancy, you can ask the plan system to write down the plan "as a markdown checklist" and tell the code model to check each task from the file after it's complete.
This way you can easily reset context if you're running out and ask a fresh one to start where the previous one left off.
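For what it's worth, a checklist plan file in that style might look something like this (the task names are made up; only the format matters):

```markdown
# ACDC1234: CSV export feature plan

- [x] Add an ExportFormat setting and wire it into the config model
- [x] Implement CsvExporter behind the existing Exporter interface
- [ ] Integration test covering quoted fields and embedded newlines
- [ ] Update user docs and changelog
```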
I use plan mode, but then I let it go using its own todo tool and trust its auto-compaction to deal with context size. It seems to almost always work out okay.
The rule of thumb is that when you've compacted, you've already lost. But YMMV.
The internal todo list works well if the task is something that can be completed within one context pass, otherwise it should be an external task list - whatever works for your flow, markdown, github issues, memory MCP etc.
I make heavy use of the "temporary chat" feature on ChatGPT. It's great whenever I need a fresh context or need to iteratively refine a prompt, and I can use the regular chat when I want it to have memory.
Granted, this isn't the best UX because I can't create a fresh context chat without making it temporary. But I'd say it allows enough choice that overall having the memory feature is a big plus.
Another comment earlier suggested creating small hierarchical MD docs. This really seems to work, Claude can independently follow the references and get to the exact docs without wasting context by reading everything.
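A hypothetical top-level index in that style (file names invented for the example), where Claude follows only the reference it needs:

```markdown
# docs/README.md - keep this index short

- Architecture overview -> docs/architecture.md
- Database schema and migrations -> docs/db.md
- Build, deploy, and CI -> docs/deploy.md
- Coding conventions -> docs/conventions.md

Read only the doc relevant to the current task.
```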
Claude is (in my limited experience so far) more useful after a bit of back and forth where you can explain to it what's going on in your codebase. Although I suspect if you have a lot of accurate comments in your code then it will be able to extract more of that information for itself.
This really resonates with me. I often run into this situation when I'm trying to fix a bug with an LLM: if my first prompt is not good enough, then I end up stuck in a loop where I keep asking the LLM to refine its solution based on the current context.
The result is the LLM still doesn't output what I want even after 10 rounds of fixing requests.
So I just start a new session and give the LLM a well-crafted prompt, and suddenly it produces a great result.
Memory is ok when it's explicitly created/retrieved as part of a tool, and even better if the tool is connected to your knowledge bases rather than just being siloed. Best of all is to create a knowledge agent that can synthesize relevant instructions from memory and knowledge. Then take a team of those and use them on a partitioned dataset, with a consolidation protocol, and you have every deep research tool on the market.
Intuitively this feels like what happens with long Amazon or YT histories: you get erroneous context across independent sessions. The end result is my feed is full of videos from one-time activities and shopping recommendations packed with "washing machine replacement belt".
Exactly... this is just another unwanted 'memory' feature that I now need to turn off, and then remember to check periodically to make sure it's still turned off.
Honestly it feels weird to call these features "memory". I think it just confuses users and over-encourages inappropriate anthropomorphism. It's not like they're fine-tuning or building LoRAs. Feels more appropriate to call them "project notes".
And I agree with your overall point. I wish there was a lot more clarity too. Like is info from my other chats infecting my current one? Sometimes it seems that way. And why can't I switch to a chat with a standard system prompt? Incognito isn't shareable nor can I maintain a history. I'm all for this project notes thing but I'd love to have way more control over it. Really what makes it hard to wrangle is that I don't know what's being pulled into context or not. That's the most important thing with these tools.
but if we don't keep adding futuristic sounding wrappers to the same LLMs how can we convince investors to keep dumping money in?
Hard agree though, these token-hungry context injectors and "thinking" models are all kind of annoying to me. It is a text predictor; I will figure out how to make it spit out what I want.
I use projects for sandboxing context, I find it really useful. A lot of the stuff I'm using Claude for needs a decent chunk of context, too much for a single prompt.
Memory is going to make that easier/better, I think. It'll be interesting to find out.
I do get a lot of value out of a project-wide system prompt that gets automatically added (Cursor has that built in). For a while I kept refining it when I saw it making incorrect assumptions about the codebase. I try to keep it brief though, about 20 bullet points.
There is some research that supports this approach. Essentially once the LLM starts down a bad path (or gets a little bit of "context poisoning"), it's very hard for it to escape and starting fresh is the way to go
Wasn't me but I think the principle is straightforward. When you get an answer that wasn't what you want and you might respond, "no, I want the answer to be shorter and in German", instead start a new chat, copy-paste the original prompt, and add "Please respond in German and limit the answer to half a page." (or just edit the prompt if your UI allows it)
Depending on how much you know about LLMs, this might seem wasteful but it is in fact more efficient and will save you money if you pay by the token.
In most tools there is no need to cut-n-paste, just click small edit icon next to the prompt, edit and resubmit. Boom, old answer is discarded, new answer is generated.
That's mostly been my experience as well... That said, there always seems to be something wrong on a technical response and it's up to you to figure out what.
It has been relatively good for writing out custom cover letters for jobs though... I created an "extended" markdown file with everything I would put into a resume and more going back a few decades and it does a decent job of it. Now, if only I could convince every company on earth to move away from Workday, god I hate that site, and there's no way to get a resume to submit clean/correctly. Not to mention, they can't manage to just have one profile for you and your job history to copy from instead of a separate one for each client.
Regardless, whatever memory engines people come up with, it's not in anyone's interest to have the memory layer sitting on Anthropic's or OpenAI's servers. The memory layer should exist locally, with these external servers acting as nothing but LLM request fulfillment.
Now, we'll never be able to educate most of the world on why they should seek out tools that handle the memory layer locally, and these big companies know that (the same way they knew most of the world would not fight back against data collection), but that is the big education that needs to spread diligently.
To put it another way, some games save your game state locally, some save it in the cloud. It's not much of a personal concern with games because what the fuck are you really going to learn from my Skyrim sessions? But the save state for my LLM convos? Yeah, that will stay on my computer, thank you very much for your offer.
Isn't the saved state still being sent as part of the prompt context with every prompt? The high token count is financially beneficial to the LLM vendor no matter where it's stored.
The saved state is sent on each prompt, yes. Those who are fully aware of this would seek a local memory agent and a local llm, or at the very least a provider that promises no-logging.
Every sacrifice we make for convenience will be financially beneficial to the vendor, so we need to factor them out of the equation. Engineered context does mean a lot more tokens, so it will be more business for the vendor, but the vendors know there is much more money in saving your thoughts.
Privacy-first intelligence requires these two things at the bare minimum:
1) Your thoughts stay on your device
2) At worst, your thoughts pass through a no-logging environment on the server. Memory cannot live here because any context saved to a db is basically just logging.
3) Or slightly worse, your local memory agent only sends some prompts to a no-logging server.
The first two things will never be offered by the current megacapitalist.
Finally, the developer community should not be adopting things like Claude memory because we know. We’re not ignorant of the implications compared to non-technical people. We know what this data looks like, where it’s saved, how it’s passed around, and what it could be used for. We absolutely know better.
> If I don't get what I want, I adjust the prompt and try again.
This feels like cheating to me. You try again until you get the answer you want. I prefer to have open-ended conversations to surface ideas that I may not be comfortable with, because "the truth sometimes hurts" as they say.
No, he's talking about memory getting passed into the prompts and maintaining control. When you turn on memory, you have no idea what's getting stuffed into the system prompt. This applies to chats and agents. He's talking about chat.
The first sentence is mine. The second I adapted from Claude after it helped me understand why someone called my original reply insane. Turns out we're talking about different approaches to using LLMs.
The training data contains all kinds of truths. Say I told Claude I was a Christian at some point and then later on I told it I was thinking of stealing office supplies and quitting to start my own business. If Claude said "thou shalt not steal," wouldn't that be true?
You know that it's true that stealing is against the ten commandments, so when the LLM says something to that effect based on the internal processing of your input in relation to its training data, YOU can determine the truth of that.
> The training data contains all kinds of truths.
There is also noise, fiction, satire, and lies in the training data. And the recombination of true data can lead to false outputs - attributing a real statement to the wrong person is false, even if the statement and the speaker are both real.
But you are not talking about simple factual information, you're talking about finding uncomfortable truths through conversation with an LLM.
The LLM is not telling you things that it understands to be truth. It is generating ink blots for you to interpret following a set of hints and guidance about relationships between tokens & some probabilistic noise for good measure.
If you find truth in what the LLM says, that comes from YOU; it's not because the LLM in some way can know what is true and give it to you straight.
Personifying the LLM as being capable of knowing truths seems like a risky pattern to me. If you ever (intentionally or not) find yourself "trusting" the LLM to where you end up believing something is true based purely on it telling you, you are polluting your own mental training data with unverified technohaikus. The downstream effects of this don't seem very good to me.
Of course, we internalize lies all the time, but chatbots have such a person-like way of interacting that I think they can end run around some of our usual defenses in ways we haven't really figured out yet.
> Personifying the LLM as being capable of knowing truths seems like a risky pattern to me.
I can see why I got downvoted now. People must think I'm a Blake Lemoine at Google saying LLMs are sentient.
> If you find truth in what the LLM says, that comes from YOU; it's not because the LLM in some way can know what is true
I thought that goes without saying. I assign the truthiness of LLM output according to my educational background and experience. What I'm saying is that sometimes it helps to take a good hard look in the mirror. I didn't think that would be controversial when talking about LLMs, with people rushing to remind me that the mirror is not sentient. It feels like an insecurity on the part of many.
> I didn't think that would be controversial when talking about LLMs, with people rushing to remind me that the mirror is not sentient. It feels like an insecurity on the part of many.
For what it's worth I never thought you perceived the LLM as sentient. Though I see the overlap - one of the reasons I don't consider LLM output to be "truth" is that there is no sense in which the LLM _knows_ what is true or not. So it's just ... stuff, and often sycophantic stuff at that.
The mirror is a better metaphor. If there is any "uncomfortable truth" surfaced in the way I think you have described, it is only the meaning you make from the inanimate stream of words received from the LLM. And inasmuch as the output is interesting or useful for you, great.