> part of your llm prompt (usually your most recent question?) gets fed as a query for the embedding/vector database
How is it embedded? Using a separate embedding model, like BERT or something? Or do you use the LLM itself somehow? Also, how do you create the content for the vector database keys themselves? Is it just some arbitrary off-the-shelf embedding, or do you train it as part of training the LLM?
Yeah, it's completely separate. The LLM just gets some extra text in the prompt, that's all. The text you want to insert is "encoded" (embedded) into the database, which is not particularly compute-expensive. You can read about one such implementation here: https://github.com/chroma-core/chroma
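To make that concrete, here's a minimal sketch of the workflow using Chroma's Python client. The collection name and documents are made up, and by default Chroma embeds text with an off-the-shelf embedding model (a small sentence-transformer), completely independent of the LLM:

```python
import chromadb

# In-memory client; Chroma embeds documents with its default
# off-the-shelf embedding model, not the LLM itself.
client = chromadb.Client()
collection = client.create_collection(name="my_docs")  # hypothetical name

# "Encode" the text you want to make retrievable into the database.
collection.add(
    documents=[
        "The billing API rate limit is 100 requests per minute.",
        "Support tickets are archived after 90 days.",
    ],
    ids=["doc1", "doc2"],
)

# At query time, the user's question is embedded the same way, the
# nearest chunks are pulled out, and that text is pasted into the LLM
# prompt as extra context.
results = collection.query(
    query_texts=["How fast can I call the billing API?"],
    n_results=1,
)
print(results["documents"][0][0])
```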
One thing I don't understand is how feeding the entire conversation back as a prefix for every prompt doesn't waste the entire 4K-token context almost immediately. I'd swear that a given ChatGPT window is stateful, somehow, just for that reason alone... but everything I've read suggests that it's not.
Have you tried something like Memory Transformers https://arxiv.org/abs/2006.11527 where you move the k/v pairs that don't fit in the context window to a vector db? Seems like a more general approach, but I haven't tested them against each other.
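The rough idea can be sketched in a few lines: key/value pairs that fall out of the local window go into an external store, and at attention time you look up the nearest keys to the current query and attend over them alongside the local context. Everything below (the dimensions, the brute-force lookup, the function names) is illustrative, not the paper's implementation; a real system would use an approximate-nearest-neighbor index instead of a full dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # per-head dimension (made up)

# External "memory": keys and values evicted from the local context window.
mem_keys = rng.standard_normal((10_000, d)).astype(np.float32)
mem_vals = rng.standard_normal((10_000, d)).astype(np.float32)

def retrieve_topk(query, k=32):
    """Brute-force nearest-key lookup; stand-in for an ANN / vector-db query."""
    scores = mem_keys @ query                 # dot-product similarity
    idx = np.argpartition(-scores, k)[:k]     # top-k indices (unordered)
    return mem_keys[idx], mem_vals[idx]

def attend(query, keys, values):
    """Standard scaled dot-product attention over the retrieved pairs."""
    logits = keys @ query / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values

q = rng.standard_normal(d).astype(np.float32)
k_ret, v_ret = retrieve_topk(q)
out = attend(q, k_ret, v_ret)
print(out.shape)  # (64,)
```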