> Since decoder-only transformer memory requirements scale with the square of sequence lengths, things would probably slow down significantly for very long sequences, which would be required for a back-and-forth conversation.
You can use tricks to keep the sequence length down even if the conversation goes on for a long time. For example, you can use the model to summarize the first n-1 lines of the conversation and append the last line, as-is, to that summary.
I don't have any sources to refer to, but "text summarization" is one of the common NLP tasks that LLMs are benchmarked on. All of these general-purpose LLMs can do a decent job at text summarization (some, such as ChatGPT, can do zero-shot summarization at high quality, whereas others need to be fine-tuned for the task).

If your problem is that feeding a large amount of text to the model is slow/expensive, then summarization will obviously mitigate that issue. After summarizing most of the input text, you still feed in the latest input without summarization, so that, for example, if the user asks a question, the LLM can accurately answer it. (If all of the input goes into the summarization, that last question may not even appear in the summary, so results will be crap.)
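A rough sketch of the idea, using a Hugging Face summarization pipeline (the model name, length limits, prompt wording, and the `compress_history` helper are all illustrative assumptions, not a specific recommended setup):

```python
from transformers import pipeline

# Illustrative model choice; any summarization-capable model would do.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def compress_history(turns: list[str], max_summary_tokens: int = 150) -> str:
    """Summarize all but the last turn, then append the last turn verbatim."""
    if len(turns) < 2:
        return "\n".join(turns)
    history = "\n".join(turns[:-1])
    summary = summarizer(
        history,
        max_length=max_summary_tokens,
        min_length=30,
        do_sample=False,
    )[0]["summary_text"]
    # The latest user message is left unsummarized so the model sees it
    # exactly as written and can answer it directly.
    return f"Summary of the conversation so far: {summary}\n{turns[-1]}"

turns = [
    "User: I'm planning a trip to Japan in April.",
    "Assistant: Great choice, that's cherry blossom season...",
    "User: What should I pack for the weather?",
]
prompt = compress_history(turns)  # feed this to the chat model instead of the full history
```

The point is simply that the prompt length stays roughly constant no matter how long the conversation gets, at the cost of an extra summarization call per turn.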