This feels ill-advised, and I don't believe HTTP streaming was designed with this pattern in mind.
Perhaps I'm wrong, but I believe HTTP streaming is for chunking large blobs. I worry that if you use this pattern and treat streaming like a pub/sub mechanism, you'll regret it. HTTP intermediaries don't expect this traffic pattern (e.g., NGINX, CloudFlare, etc.). And I suspect every time your WiFi connection drops while the stream is open, the fetch API will raise an error as if the request failed.
However, I agree you probably don't need WebSockets for many of the ways they're used—server-sent events are a simpler solution for many situations where people reach for WebSockets... It's a shame SSEs never received the same fanfare.
> I don't believe that HTTP streaming was designed with this pattern in mind
> server-sent events are a simpler solution
FWIW, Server-Sent Events are a protocol on top of HTTP streaming.
In fact I'm somewhat surprised that the article doesn't mention it, instead rolling its own SSE alternative that looks (to my non-expert eyes) like a lower-level version of the same thing. It seems a bit weird to me to use chunks as a message boundary; I'd worry that has weird edge cases (e.g., won't large responses be split into multiple chunks?)
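To illustrate that edge case: a transport chunk is not a message, so any consumer has to buffer until it sees an explicit delimiter. A minimal sketch in Python, using a blank-line delimiter in the style of the SSE wire format (the event contents here are made up):

```python
# Sketch: chunk boundaries can't be trusted as message boundaries.
# A parser must buffer bytes and only emit complete events, because one
# network read may contain half an event or several events at once.

def parse_events(chunks):
    """Yield complete events from an iterable of byte chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        # a blank line ("\n\n") terminates an event, as in SSE
        while b"\n\n" in buf:
            raw, buf = buf.split(b"\n\n", 1)
            yield raw.decode()

# The same two events come out regardless of how the transport split them:
split_a = [b"data: hello\n\ndata: world\n\n"]            # one big chunk
split_b = [b"data: hel", b"lo\n\ndata: wo", b"rld\n\n"]  # arbitrary splits

assert list(parse_events(split_a)) == list(parse_events(split_b)) == [
    "data: hello", "data: world"]
```

If you use the chunk itself as the boundary, a proxy or TLS layer re-chunking the stream silently corrupts your framing; an in-band delimiter (or length prefix) survives any split.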
I pretty much always prefer SSE over WebSockets just because of the end-to-end simplicity. It's "just HTTP", so all the HTTP-based tech and tools apply out-of-the-box, without the special configuration that WS requires. curl (or even netcat) "just works"; no special client. I don't have to do any special CDN configuration to proxy connections or terminate SSL, aside from turning off buffering.
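For a concrete sense of how small that configuration is: when terminating SSE behind nginx, "turning off buffering" usually amounts to a few directives in the location block (the path and upstream name here are hypothetical):

```nginx
location /events {
    proxy_pass http://app_backend;   # hypothetical upstream name
    proxy_http_version 1.1;          # needed for streamed/chunked responses
    proxy_set_header Connection "";  # don't forward a "close" to the backend
    proxy_buffering off;             # flush each event to the client immediately
    proxy_read_timeout 1h;           # let the stream stay open
}
```

Compare that with WS, where the proxy also has to handle the Upgrade handshake and switch protocols mid-connection.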
WebSockets require almost a completely new L7 stack and tons of special configuration to handle the Upgrade handshake, text and binary frames, etc. And once you're out of "HTTP mode" you have to implement the primitive mechanics of basically everything yourself: auth, redirects, sessions, etc.
It's why I originally made Tiny SSE which is a purpose-built SSE server written in Rust and programmable with Lua.
You could do a lobotomized WebSockets implementation that was an extremely thin layer on top of http, similarly to this.
In this way SSE and WebSockets are exactly the same: they are HTTP requests that you keep open. To firewalls and other network equipment both look the same. They look like long-lived HTTP requests, because that is what they are.
If you only care about events in one direction, it's a perfectly fine solution, but if you need something other than that, things might get awkward using SSE and regular HTTP calls, even with long-lived HTTP connections.
> once you're out of "HTTP mode" you now have to implement the primitive mechanics of basically everything yourself, like auth, redirects, sessions, etc.
WebSockets do support authentication via cookies or custom headers, don't they?
>If you only care about events in one direction, it's a perfectly fine solution
I feel like clients sending requests to servers is a pretty well-solved problem with regular HTTP? I can't imagine how that could be the difficult part of the equation.
Not if you need bidirectional communication, for example a ping-pong of request/response. That is solved with WS, but hard to do with SSE + requests. The client's requests may not even hit the same SSE server, depending on your setup. There are workarounds, obviously, but it complicates things.
> WebSockets do support authentication via cookies or custom headers, don't they?
It will depend on how the websocket architecture is implemented. A lot of systems will terminate the HTTP connection at the CDN or API gateway and just forward the upgraded TCP socket to the backend without any of the HTTP semantics intact.
Sure. If you need HTTP header / cookie-based auth with WebSockets, then you need the full HTTP request with all the headers intact. This is the common case, or at least something that is pretty straightforward to architect for.
Authenticating a websocket is just as easy as authenticating a regular http request. Because it is exactly the same.
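To make "it is exactly the same" concrete: the WebSocket handshake is an ordinary HTTP GET carrying normal headers (including Cookie), and the server's side of the upgrade is a fixed transform defined by RFC 6455. A sketch in Python; the request text below is the example handshake from the RFC with an illustrative cookie added:

```python
# The handshake request is plain HTTP, so a server can authenticate the
# Cookie (or Authorization) header before agreeing to upgrade.
# Sec-WebSocket-Accept is defined by RFC 6455 as base64(SHA-1(key + GUID)).
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed constant from RFC 6455

def accept_value(sec_websocket_key: str) -> str:
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

request = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Cookie: session=abc123\r\n\r\n"      # normal cookie auth, nothing special
)
headers = dict(
    line.split(": ", 1)
    for line in request.split("\r\n")[1:]
    if ": " in line
)
# The session cookie is right there for any standard auth middleware:
assert headers["Cookie"] == "session=abc123"
print(accept_value(headers["Sec-WebSocket-Key"]))  # RFC 6455's worked example
```

Everything after this exchange is WS framing rather than HTTP, which is where the earlier point about terminating proxies dropping the HTTP semantics comes in.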
Interesting, do you have any examples for that? I haven't used WebSockets in such a context yet but was always curious how it would be exposed to the application servers.
Because of TCP, large chunks are always split into smaller segments; it's just that at the HTTP level we don't know and don't see it. UDP forces people into designing their own protocols if the data is a defined package of bytes. Having done some socket coding, my impression is WebSockets would be good for a high-end browser-based game, browser-based simulations, or maybe a high-end trading system. At that point the browser is just a shell/window. As others have pointed out, there are already plenty of alternatives for web applications.
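The "design your own protocol" part usually comes down to restoring message boundaries on top of a byte stream. A common convention (not anything specific to WS or SSE) is a length prefix; a minimal sketch:

```python
# Sketch: length-prefixed framing over a byte stream. Each message is
# preceded by a 4-byte big-endian length, so the receiver can reassemble
# messages no matter how TCP splits the bytes.
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length."""
    return struct.pack("!I", len(payload)) + payload

def unframe(buf: bytes):
    """Return (complete payloads, leftover partial bytes)."""
    msgs = []
    while len(buf) >= 4:
        (n,) = struct.unpack("!I", buf[:4])
        if len(buf) < 4 + n:
            break  # message incomplete: wait for more bytes
        msgs.append(buf[4:4 + n])
        buf = buf[4 + n:]
    return msgs, buf

stream = frame(b"ping") + frame(b"pong")
msgs, rest = unframe(stream[:10])          # deliberately cut mid-message
assert msgs == [b"ping"]                   # only the complete message is delivered
msgs2, rest2 = unframe(rest + stream[10:])  # leftover + next read
assert msgs2 == [b"pong"] and rest2 == b""
```

WebSocket frames and SSE's blank-line delimiter are both solutions to this same problem, just standardized.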
The problem for things like video games and trading is that WebSockets run over TCP. Technologies like WebRTC, which can run over UDP, allow for much faster updates.
I think WebSockets certainly have their uses. Mostly in systems where SSE isn't available quickly and easily, or when sending a bunch of quick communications one after another, since there's no way to know whether the browser will pipeline the requests automatically or open a whole bunch of separate connections.
With the current AI/LLM wave, SSE has received a lot of attention again, and most LLM chat frontends use it. At least from my perception, as a result of this, support for SSE in major HTTP server frameworks has improved a lot in the last few years.
It is a bit of a shame though, that in order to do most useful things with SSEs you have to resort to doing non-spec-compliant things (e.g. send initial payload with POST).
Arguably it’s also because of serverless architecture where SSE can be used more easily than WS or streaming. If you want any of that on Lambda and API Gateway, for example, and didn’t anticipate it right off the bat, you’re in for quite a bit of pain.
The issue I have with SSE, and with what's being proposed in this article (which is very similar), is the very long-lived connection.
OpenAI uses SSE for callbacks. That works fine for chat and other "medium"-duration interactions, but when it comes to fine-tuning (which can take a very long time), SSE always breaks and requires client-side retries to get it to work.
So, why not instead use something like long polling + HTTP streaming (a slight tweak on SSE)? Here is the idea:
1) Make a standard GET call /api/v1/events (using standard auth, etc)
2) If anything is in the buffer / queue return it immediately
3) Stream any new events for up to 60s. Each event has a sequence id (similar to the article). Include keep alive messages at 10s intervals if there are no messages.
4) After 60s close the connection - gracefully ending the interaction on the client
5) Client makes another GET request using the last received sequence
What I like about this is that it's very simple to understand (like SSE; it basically is SSE), has low latency, is just a standard GET with standard auth, and works regardless of how load balancers, etc., are configured. Of course, there will be errors from time to time, but dealing with timeouts / errors will not be the norm.
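The client side of steps 1–5 can be sketched in a few lines. Everything here is an illustrative assumption, not a real API: the `seq` field, the `keepalive` event type, and the newline-delimited JSON encoding are placeholders for whatever the server actually sends.

```python
# Sketch: processing one 60s connection's worth of streamed events.
# Assumed wire format: one JSON object per line; "keepalive" heartbeats
# carry no sequence id and are dropped.
import json

def consume(lines, last_seq):
    """Return (delivered events, sequence id to resume from)."""
    events = []
    for line in lines:
        event = json.loads(line)
        if event.get("type") == "keepalive":
            continue  # 10s heartbeat, nothing to deliver
        events.append(event)
        last_seq = event["seq"]
    return events, last_seq

# One server window might look like this on the wire:
window = [
    '{"seq": 41, "type": "message", "data": "hi"}',
    '{"type": "keepalive"}',
    '{"seq": 42, "type": "message", "data": "bye"}',
]
events, last_seq = consume(window, last_seq=40)
assert [e["seq"] for e in events] == [41, 42] and last_seq == 42
# After the server closes at ~60s, the client simply issues the next
# standard GET (e.g. /api/v1/events?after=42) with its normal auth.
```

Because each iteration is a plain, short-lived GET, a dropped connection just means the next request resumes from `last_seq`; there's no special reconnect machinery.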
My issue with EventSource is that it doesn't use standard auth. Including the JWT in a query string is an odd one out, requiring alternate middleware, and feels like there's a high chance of leaking the token in logs, etc.
I'm curious though, what is your solution to this?
Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
Finally, I just don't like the idea of things failing all the time with something working behind the scenes to resolve the issues. I'd like errors / warnings in logs to mean something, personally.
>> I don't understand the advantages of recreating SSE yourself like this vs just using SSE
This is more of a strawman; I don't plan to implement it. It's based on experiences consuming SSE endpoints as well as creating them.
> I'm curious though, what is your solution to this?
Cookies work fine, and are the usual way auth is handled in browsers.
> Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).
That's fair. It still seems easier, to me, to save any browser-based clients some work (and avoid writing your own spec) by using existing technologies. In fact, what you described isn't even incompatible with SSE: just have the server close an otherwise normal SSE connection every 60 seconds, and all of your points are covered except the auth one. (I've never actually seen bearer tokens used in a browser context, to be fair; you'd have to allow cookies like every other web app.)
GP is talking about intermediary proxies, CDNs etc. that might be unhappy about long-running connections with responses trickling in bit by bit, not doubting that it works on the client side.
That said, I'd be surprised if proxy software or services like Cloudflare didn't have logic to automatically opt out of "CDN mode" and switch to something more transparent when they see "text/event-stream". It's not that uncommon, all things considered.