I see people say this all the time, and it sounds like a pretty cosmetic distinction. Like, you could wire up an LLM to a systemd service or a cron job, and then it wouldn't be "waiting" — it could be constantly processing new inputs (a rough sketch of what I mean is below). And some of the more advanced models already have ways of compressing the older parts of their context window to achieve extremely long context lengths.
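
To be concrete about the "wire it to a cron job" part, here's a minimal sketch of what I have in mind: a script that drains an inbox directory through an LLM on every run, so a scheduler (cron, a systemd timer, whatever) keeps it processing without anyone prompting it. This assumes the OpenAI Python client; the inbox/outbox paths and the model name are just placeholders I made up for illustration.

```python
#!/usr/bin/env python3
# Minimal sketch: an LLM "loop" driven by a scheduler rather than a user.
# Assumes the OpenAI Python client (openai>=1.0); the directory paths and
# model name below are hypothetical placeholders, not a real deployment.
from pathlib import Path
from openai import OpenAI

INBOX = Path("/var/llm-agent/inbox")    # hypothetical drop directory for new inputs
OUTBOX = Path("/var/llm-agent/outbox")  # hypothetical directory for responses

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def process_pending() -> None:
    """Send every pending input file to the model, write the reply, consume the input."""
    OUTBOX.mkdir(parents=True, exist_ok=True)
    for msg in sorted(INBOX.glob("*.txt")):
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": msg.read_text()}],
        )
        (OUTBOX / msg.name).write_text(reply.choices[0].message.content or "")
        msg.unlink()  # consumed; the next scheduled run picks up only new inputs

if __name__ == "__main__":
    process_pending()
```

A crontab line like `* * * * * /usr/bin/python3 /opt/llm-agent/poll.py` (path hypothetical) would then run it every minute, and a systemd timer unit would do the same job. The point isn't that this is an agent architecture anyone actually ships — it's that the "always running" property is a few lines of ordinary scheduling glue, not something deep about the model itself.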