Yes, it's like surfing porn in the early internet years using a dial-up modem. One line at a time, until you can finally see enough of the picture (reply) to realize it was not the reply you were looking for.
LLM streaming must be a cost-saving feature to prevent you from overloading the servers by asking too many questions within a short time frame. Annoying feature IMHO.
How is hiding it behind a loading spinner any better? You still can't spam it with questions, since you have to wait for it to finish either way. With streaming you can at least hit the stop button if the answer looks wrong, so if anything you can fire off more questions with it enabled.
For me, the constant visual changes as new parts stream in are annoying and straining on the eyes. Ideally, web frontends would honor `prefers-reduced-motion` and buffer the response when it's set; a sketch of that idea follows below.
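A minimal sketch of what that could look like, assuming a hypothetical `readTokens` async iterator over the streamed tokens (`matchMedia` is the real browser API; everything else is illustrative):

```ts
// Honor prefers-reduced-motion in a chat frontend (sketch).
// `readTokens` is a hypothetical source of streamed tokens.
const reduceMotion =
  window.matchMedia("(prefers-reduced-motion: reduce)").matches;

async function render(readTokens: AsyncIterable<string>, el: HTMLElement) {
  if (reduceMotion) {
    // Buffer the whole response and paint it once.
    let buffer = "";
    for await (const token of readTokens) buffer += token;
    el.textContent = buffer;
  } else {
    // Append tokens as they arrive (the usual streaming effect).
    for await (const token of readTokens) el.textContent += token;
  }
}
```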
Personally, I've fallen in love with the visual effect of streaming text you're talking about. It's a bit Pavlovian, but in my head it signifies that I'm reading something high-signal (even though it isn't always).
It's more about UX, to reduce the perceived delay. LLMs inherently stream their responses, but if you wait until the LLM has finished inference, the user is sitting around twiddling their thumbs.
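To make that concrete, here's a rough sketch of consuming a streamed HTTP response in the browser; the `/v1/chat` endpoint and request body are placeholders, not any particular vendor's API. The point is that the first chunk can be rendered after the model's time-to-first-token instead of after the entire generation:

```ts
// Consume a streamed completion chunk by chunk (sketch, placeholder endpoint).
async function* streamCompletion(prompt: string): AsyncGenerator<string> {
  const res = await fetch("/v1/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each decoded chunk can be shown to the user immediately.
    yield decoder.decode(value, { stream: true });
  }
}
```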