
I'd still want to see the entire response all at once. Having it stream in while I read would be very distracting and make the text hard to follow.



It's a request the front-end developer should handle, not OpenAI.

The website could just as well buffer the incoming stream until the user clicks an area to request the next block of the response, once they have finished reading the initial sentences.
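
Something like the sketch below would do it on the client side (rough sketch only; the /chat endpoint, element IDs, and request shape are made up for illustration):

    // Buffer streamed chunks; only reveal what has arrived when the user asks for it.
    const buffered: string[] = [];

    async function readStream(res: Response) {
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffered.push(decoder.decode(value, { stream: true }));  // keep it off-screen
      }
    }

    // Hypothetical "show more" button reveals everything buffered so far.
    document.getElementById("show-more")!.addEventListener("click", () => {
      const output = document.getElementById("response")!;
      output.textContent += buffered.splice(0).join("");
    });

    fetch("/chat", { method: "POST", body: JSON.stringify({ prompt: "..." }) })
      .then(readStream);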


Yes, it's like surfing porn in the early internet years on a dialup modem: one line at a time, until you can finally see enough of the picture (reply) to realize it wasn't the one you were looking for.

LLM streaming must be a cost-saving feature to prevent you from overloading the servers by asking too many questions within a short time frame. Annoying feature IMHO.


How is hiding it behind a loading spinner any better? You still can't spam it with questions, since you need to wait for it to finish either way. With streaming you can at least hit the stop button if the answer looks wrong, so if anything you can spam it more with streaming enabled.


For me, the constant visual changes as new parts stream in are annoying and a strain on the eyes. Ideally, web frontends would honor `prefers-reduced-motion` and buffer the response when it's set.
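
A frontend could check that media query and fall back to rendering the full response in one go; rough sketch below (matchMedia and prefers-reduced-motion are standard web APIs, but the element ID and the stream parameter are stand-ins):

    // If the user prefers reduced motion, show the whole response at once
    // instead of animating tokens in as they arrive.
    const reduceMotion = window.matchMedia("(prefers-reduced-motion: reduce)").matches;
    const output = document.getElementById("response")!;  // hypothetical output element

    async function render(stream: AsyncIterable<string>) {
      if (reduceMotion) {
        let full = "";
        for await (const chunk of stream) full += chunk;   // buffer everything first
        output.textContent = full;                         // then render once
      } else {
        for await (const chunk of stream) {
          output.textContent += chunk;                     // stream in as it arrives
        }
      }
    }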


Personally, I've fallen in love with the visual effect of streaming text you're talking about. It's a bit Pavlovian, but in my head it signifies that I'm reading something high-signal (even though it isn't always).


It's more about UX, to reduce perceived delay. LLMs inherently generate their responses token by token, but if you wait until inference has finished before showing anything, the user is left sitting around twiddling their thumbs.
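
Back-of-the-envelope (illustrative numbers, not measured): at ~50 tokens/second, a 500-token reply takes ~10 seconds to finish, but the first tokens arrive in a fraction of a second, so streaming lets the user start reading almost immediately instead of staring at a spinner:

    // Rough comparison of buffered vs streamed time-to-first-content.
    // All numbers are assumptions for illustration, not benchmarks.
    const tokens = 500;
    const tokensPerSecond = 50;

    const timeToFullResponse = tokens / tokensPerSecond;  // ~10 s before anything shows (buffered)
    const timeToFirstToken = 1 / tokensPerSecond;         // ~0.02 s before reading starts (streamed)

    console.log({ timeToFullResponse, timeToFirstToken });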



