You might not need WebSockets (hntrl.io)
405 points by hntrl 2 days ago | 263 comments

This feels ill-advised, and I don't believe that HTTP streaming was designed with this pattern in mind.

Perhaps I'm wrong, but I believe HTTP streaming is for chunking large blobs. I worry that if you use this pattern and treat streaming like a pub/sub mechanism, you'll regret it. HTTP intermediaries don't expect this traffic pattern (e.g., NGINX, CloudFlare, etc.). And I suspect every time your WiFi connection drops while the stream is open, the fetch API will raise an error as if the request failed.

However, I agree you probably don't need WebSockets for many of the ways they're used—server-sent events are a simpler solution for many situations where people reach for WebSockets... It's a shame SSEs never received the same fanfare.


> I don't believe that HTTP streaming was designed with this pattern in mind

> server-sent events are a simpler solution

Fwiw Server-Sent Events are a protocol on top of HTTP Streaming.

In fact I'm somewhat surprised that the article doesn't mention it, instead rolling their own SSE alternative that looks (to my non-expert eyes) like a lower-level version of the same thing. It seems a bit weird to me to use chunks as a message boundary; I'd worry that has weird edge cases (e.g. won't large responses be split into multiple chunks?)


I pretty much always prefer SSE over websockets just because of the simplicity end-to-end. It's "just HTTP", so all the HTTP-based tech and tools apply out-of-the-box, without the special configuration that WS requires. Curl (or even netcat) "just works", no special client. I don't have to do any special CDN configuration to proxy connections or terminate SSL aside from just turning off buffering.

Websockets require almost a completely new L7 stack and tons of special configuration to handle Upgrade, text or binary frames, etc. And once you're out of "HTTP mode" you now have to implement the primitive mechanics of basically everything yourself, like auth, redirects, sessions, etc.

It's why I originally made Tiny SSE which is a purpose-built SSE server written in Rust and programmable with Lua.

https://tinysse.com

https://github.com/benwilber/tinysse


Everything 'just works', yet you needed to create your own server for it which needs scripting support?

IMO 'just works' means Apache supports it out of the box with a simple config file and you can just start sending messages to client IPs.


"just works" in the sense that this is a complete SSE client application:

    while true; do
        curl -sN example.com/sse | handle-messages.sh
    done


Because it's just text-over-http. This isn't possible with websockets without some kind of custom client and layer 7 protocol stack.
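
And the server side is barely more than that. A minimal Node sketch (untested, purely to illustrate; the tick payload is made up):

    import http from "node:http";

    http.createServer((req, res) => {
      res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
      });
      let n = 0;
      const timer = setInterval(() => res.write(`data: tick ${n++}\n\n`), 1000);
      req.on("close", () => clearInterval(timer));
    }).listen(8080);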

You could do a lobotomized WebSockets implementation that was an extremely thin layer on top of http, similarly to this.

In this way SSE and WebSockets are exactly the same. They are HTTP requests that you keep open. To firewalls and other network equipment both look the same. They look like long lived http requests, because that is what they are.


It’s functional, but I wouldn’t say it’s complete without Last-Event-ID handling.

If you only care about events in one direction, it's a perfectly fine solution, but if you need something other than that, things might get awkward using SSE and regular HTTP calls, even with long-lived HTTP connections.

> once you're out of "HTTP mode" you now have to implement the primitive mechanics of basically everything yourself, like auth, redirects, sessions, etc.

WebSockets do support authentication via cookies or custom headers, don't they?


>If you only care about events in one direction, it's a perfectly fine solution

i feel like clients sending requests to servers is a pretty well-solved problem with regular http? i can't imagine how that could be the difficult part of the equation.


not if you need bidirectional communication, for example a ping-pong of request/response. That is solved with WS, but hard to do with SSE+requests. The client requests may not even hit the same SSE server depending on your setup. There are workarounds obviously, but it complicates things.

> WebSockets do support authentication via cookies or custom headers, don't they?

It will depend on how the websocket architecture is implemented. A lot of systems will terminate the HTTP connection at the CDN or API gateway and just forward the upgraded TCP socket to the backend without any of the HTTP semantics intact.


Sure. If you need http header / cookie based auth with websockets, then you need the full http request with all the headers intact. This is the common case, or at least something that is pretty straightforward to architect for.

Authenticating a websocket is just as easy as authenticating a regular http request. Because it is exactly the same.


Interesting, do you have any examples for that? I haven't used WebSockets in such a context yet but was always curious how it would be exposed to the application servers.

Because of TCP, large chunks are always split into smaller chunks. It’s just that at the HTTP level we don’t know and don’t see it. UDP forces people into designing their own protocols if the data is a defined package of bytes. Having done some socket coding my impression is web sockets would be good for a high end browser based game, browser based simulations, or maybe a high end trading system. At that point the browser is just a shell/window. As others have pointed out, there are already plenty of alternatives for web applications.

The problem for things like video games and trading is that websockets only support TCP by default. Technologies like WebRTC allow for much faster updates.

I think websockets certainly have their uses. Mostly in systems where SSE isn't available quickly and easily, or when sending a bunch of quick communications one after another as there's no way to know if the browser will pipeline the requests automatically or if it'll set up a whole bunch of requests.


My problem with SSE is that it has a very low connection limit of 6 per domain across the entire browser session.

You just use HTTP/2. It's a solved problem.

That's an HTTP 1.1 problem, not SSE. Websockets has the same restriction.

With the current AI/LLM wave, SSE has received a lot of attention again, and most LLM chat frontends use it. At least from my perception, as a result of this, support for SSE in major HTTP server frameworks has improved a lot in the last few years.

It is a bit of a shame though, that in order to do most useful things with SSEs you have to resort to doing non-spec-compliant things (e.g. send initial payload with POST).


Same with graphql subscriptions.

Arguably it’s also because of serverless architecture where SSE can be used more easily than WS or streaming. If you want any of that on Lambda and API Gateway, for example, and didn’t anticipate it right off the bat, you’re in for quite a bit of pain.


SSE limitations in the browser are still a drag for this, too.

Also MCP uses it

> Perhaps I'm wrong, but I believe HTTP streaming is for chunking large blobs.

You are wrong in the case of Chrome and Firefox. I have tried it, and streamed elements (e.g. an unordered list) are displayed instantly.

But for Safari, "text/html" streaming happens in 512 byte chunks[1].

[1] https://bugs.webkit.org/show_bug.cgi?id=265386


GP is talking about intermediary proxies, CDNs etc. that might be unhappy about long-running connections with responses trickling in bit by bit, not doubting that it works on the client side.

That said, I'd be surprised if proxy software or services like Cloudflare didn't have logic to automatically opt out of "CDN mode" and switch to something more transparent when they see "text/event-stream". It's not that uncommon, all things considered.


The issue I have with SSE, and with what is being proposed in this article (which is very similar), is the very long-lived connection.

OpenAI uses SSE for callbacks. That works fine for chat and other "medium" duration interactions but when it comes to fine tuning (which can take a very long time), SSE always breaks and requires client side retries to get it to work.

So, why not instead use something like long polling + http streaming (a slight tweak on SSE)? Here is the idea:

1) Make a standard GET call /api/v1/events (using standard auth, etc)

2) If anything is in the buffer / queue return it immediately

3) Stream any new events for up to 60s. Each event has a sequence id (similar to the article). Include keep alive messages at 10s intervals if there are no messages.

4) After 60s close the connection - gracefully ending the interaction on the client

5) Client makes another GET request using the last received sequence

What I like about this is it is very simple to understand (like SSE - it basically is SSE), has low latency, is just a standard GET with standard auth and works regardless of how load balancers, etc., are configured. Of course, there will be errors from time to time, but dealing with timeouts / errors will not be the norm.
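
For illustration, the whole client loop could look roughly like this (a sketch only; the endpoint, `token`, `handle`, and the newline-delimited JSON framing are all hypothetical, and error handling is omitted):

    let lastSeq = 0;
    while (true) {
      const res = await fetch(`/api/v1/events?after=${lastSeq}`, {
        headers: { Authorization: `Bearer ${token}` },
      });
      const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
      let buf = "";
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break; // server closed after ~60s; loop around and reconnect
        buf += value;
        let i;
        while ((i = buf.indexOf("\n")) >= 0) {
          const line = buf.slice(0, i);
          buf = buf.slice(i + 1);
          if (!line || line.startsWith(":")) continue; // skip keep-alives
          const event = JSON.parse(line);
          lastSeq = event.seq; // resume point for the next request
          handle(event);
        }
      }
    }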


I don't understand the advantages of recreating SSE yourself like this vs just using SSE.

> SSE always breaks and requires client side retries to get it to work

Yeah, but these are automatic (the browser handles it). SSE is really easy to get started with.


My issue with EventSource is it doesn't use standard auth. Including the JWT in a query string is an odd step out, requiring alternate middleware, and feels like there is a high chance of leaking the token in logs, etc.

I'm curious though, what is your solution to this?

Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).

Finally, I just don't like the idea of things failing all time with something working behind the scenes to resolve issues. I'd like errors / warnings in logs to mean something, personally.

>> I don't understand the advantages of recreating SSE yourself like this vs just using SSE

This is more of a strawman; I don't plan to implement it. It is based on experiences consuming SSE endpoints as well as creating them.


> I'm curious though, what is your solution to this?

Cookies work fine, and are the usual way auth is handled in browsers.

> Secondly, not every client is a browser (my OpenAI / fine tune example is non-browser based).

That's fair. It still seems easier, to me, to save any browser-based clients some work (and avoid writing your own spec) by using existing technologies. In fact, what you described isn't even incompatible with SSE - all you have to do is have the server close the connection every 60 seconds on an otherwise normal SSE connection, and all of your points are covered except for the auth one (I've never actually seen bearer tokens used in a browser context, to be fair - you'd have to allow cookies like every other web app).


> it doesn't use standard auth

I'm not sure what this means because it supports the withCredentials option to send auth headers if allowed by CORS


I mean Bearer / JWT

SSE can be implemented over HTTP GET; there is no difference in handling of JWT tokens in headers.
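
With fetch (unlike the EventSource API) you can set whatever headers you want. A minimal sketch (the endpoint and `jwt` variable are made up):

    const res = await fetch("/api/v1/stream", {
      headers: {
        Authorization: `Bearer ${jwt}`,
        Accept: "text/event-stream",
      },
    });
    const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // value is the raw "data: ..." text; parse the SSE fields here
    }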

It's a minor point in the article, but sending a RequestID to the server so that you get request/response cycles isn't weird nor beyond the pale.

It's pretty much always worth it to have an API like `send(message).then(res => ...)` in a serious app.

But I agree. The upgrade request is confusing, and it's annoying how your websocket server is this embedded thing running inside your http server that never integrates cleanly.

Like instead of just reusing your middleware that reads headers['authorization'] from the websocket request, you access this weird `connectionParams` object that you pretend are request headers, heh.

But the idiosyncrasies aren't that big of a deal (ok, I've just gotten used to them). And the websocket browser API is nicer to work with than, say, EventSource.


It's a good, well-worn tactic. You list in very high detail every single step of any process you don't like. It makes that process seem overly complex, then you can present your alternative and it sounds way simpler.

For example, making a sandwich: You have to retrieve exactly two slices of bread after finding the loaf in the fridge. Apply butter uniformly after finding the appropriate knife, be sure to apply about a 2.1mm level of coating. After all of that you will still need to ensure you've calibrated the toaster!


On the other hand, we're doing the worse tactic of getting held up on the first tiny subheader instead of focusing on the rest of a decent article.

Also, their alternative is just a library. It's not like they're selling a SaaS, so we shouldn't be mean spirited.


> ...we shouldn't be mean spirited.

Am I on the right website? checks URL

People find anything to be mean about on here.


But it is frowned upon.

The loaf shouldn't be in the fridge, and 2.1mm is way too much butter, especially if applied before putting the bread in the toaster

Too much butter? You're not living if that's too much butter!

Sandwich code review is what HN is for.

I think we need a function that returns the correct butter height given the dimensions of the input bread. We may also need an object containing different kinds of bread and the ideal amount of butter for each depending on the absorption characteristics of the bread, etc. The user's preference for butter might also need to be another parameter.

sanwy.ch is the name of the YC25 startup tackling AI sandwich tech.

Pretty much. In this case, WebSockets is simpler to implement than HTTP2; it's closer to raw TCP, you just send and receive raw packets... It's objectively simpler, more efficient and more flexible.

It's a tough sell to convince me that a protocol designed primarily for resource transfer via a strict, stateless request-response mode of interaction, with server push tacked on top as an afterthought, is simpler than something built from the ground up to be bidirectional.


I fixed a few bugs in a WebSocket client and was blown away by the things they do to trick old proxies into not screwing it all up.

I would be interested in those tricks

A big one is 'masking' all client requests so that a proxy can't effectively cache the response, since the request always changes.

The RFC explains it: https://datatracker.ietf.org/doc/html/rfc6455#section-5.3
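
The gist of §5.3: every client-to-server frame is XOR-ed with a random 4-byte key chosen per frame, so no two requests look alike on the wire and a misbehaving proxy can't cache them. A sketch of the transform:

    // octet i of the transformed data is the original octet XOR'd with
    // octet (i mod 4) of the masking key; unmasking is the same operation
    function maskPayload(payload, key) {
      const out = new Uint8Array(payload.length);
      for (let i = 0; i < payload.length; i++) {
        out[i] = payload[i] ^ key[i % 4];
      }
      return out;
    }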


Aren't websockets the only way to get some sort of actual multi-core and threaded code in JavaScript, or is it still subject to the single background thread limitation and it just runs like node does?


You butter bread before it’s toasted? My mind is honestly blown (as I move to kitchen to try this).

Absolutely. The author conveniently leaves out the benefit that websockets enable ditching the frontend js code, including the library the author is plugging. The backend shouldn't send back an error message to the frontend for rendering, but, instead, a rendered view.

This is how I used to do it over TCP, 20 years ago: each request message has a unique request ID which the server echoes and the client uses to match against a pending request. There is a periodic timer that checks if requests have been pending for longer than a timeout period and fails them with an error bubbled up to the application layer. We even had an incrementing sequence number in each message so that the message stream can resume after a reconnect. This was all done in C++, and didn't require a large amount of code to implement. I was 25 years old at the time.

What the author and similar web developers consider complex, awkward or difficult gives me pause. The best case scenario is that we've democratized programming to a point where it is no longer limited to people with highly algorithmic/stateful brains. Which would be a good thing. The worst case scenario is that the software engineering discipline has lost something in terms of rigor.


Every web browser already has a built in system for matching requests and responses and checking if requests have been pending too long. There is no need to reinvent the wheel.

The real problem with the software engineering discipline is that we are too easily distracted from solving the actual business problem by pointless architecture astronautics. At best because of boredom associated with most business problems being uninteresting, at worst to maliciously increase billable hours.


> The real problem with the software engineering discipline is that we are too easily distracted from solving the actual business problem by pointless architecture astronautics.

There are two pervasive themes in software engineering:

- those who do not understand the problem domain complaining that systems are too complex.

- those who understand the problem domain arguing that the system needs to be refactored to shed crude unmaintainable hacks and further support requirements it doesn't support elegantly.

Your comment is in step 1.


Yes, but we’re just talking about making websites here. Rolling your own HTTP is overkill, you’re not Google.

There is a huge difference between guaranteeing algorithmic security of an endpoint, e.g. getting authentication correct, and anticipating every security issue that often has nothing to do with developer code. The former is possible, the latter is not. I understand the author here not wishing to deal with the websocket upgrade process - I would be surprised if there aren’t zero-days lurking there somewhere.

I am beginning to see this increasingly. Apps that make the most basic of mistakes. Some new framework trying to fix something that was already fixed by the previous 3 frameworks. UX designs making no sense or giving errors that used to be solved. From small outfits (that's fair) to multi-billion-dollar companies (you should know better), I feel that rigor is definitely lacking.

A framework was recently posted here where the author was comparing how great their Rust-to-WASM client side state management could handle tens of thousands of records which would cause the JS version of their code to stack overflow...

...and yes, the stack overflow in the JS version was trivially fixable and then the JS version worked pretty well.


That’s basically RPC over WS.

This article conflates a lot of different topics. If your WebSocket connection can be easily replaced with SSE+POST requests, then yeah you don’t need WebSockets. That doesn’t mean there aren’t a ton of very valid use cases (games, anything with real time two-way interactivity).


> games, anything with real time two-way interactivity

No need for WebSockets there as well. Check out WebTransport.


It's even mentioned as the spiritual successor to WebSocket for certain cases in the MDN docs:

https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_...


"if your application requires a non-standard custom solution, then you should use the WebTransport API"

That's a pretty convincing use-case. Why use something standard if it can be non-standard custom instead!


Your projects require holistic and craft solutions. Simple, working ways are the wrong path!

WebTransport is great but it's not in Safari yet.

> No need for WebSockets there as well. Check out WebTransport.

Isn't WebTransport basically WebSockets reimplemented in HTTP/3? What point were you trying to make?


> Isn't WebTransport basically WebSockets reimplemented in HTTP/3?

No.


> No.

Thanks for your insight.

It seems you need to urgently reach out to the people working on WebTransport. You seem to know better and their documentation contradicts and refutes your assertion.

https://github.com/w3c/webtransport/blob/main/explainer.md


Where does that document say that WebTransport is just WebSockets over HTTP/3? The only thing in common is that both features provide reliable bi-directional streams, but WebTransport also supports unreliable streams and a bunch of other things. Please read the docs. There is also RFC 9220 Bootstrapping WebSockets with HTTP/3, which is literally WebSockets over HTTP/3.

> (...) but WebTransport also supports unreliable streams and a bunch of other things.

If you take some time to learn about WebTransport, you will eventually notice that if you remove HTTP/3 from it, you remove each and every single feature that WebTransport touts as a change/improvement over WebSockets.


He said "basically", which should be interpreted as "roughly"? Then it seems his assertion is roughly correct?

Maybe? Isn't WebSockets basically TCP? Roughly? I wrote that WebSockets provide reliable bi-directional streams, but it actually doesn't. It implements message framing. WebTransport also doesn't support "unreliable streams", it's actually called "datagrams". WebTransport doesn't even have to be used over HTTP/3 per the latest spec, so is it basically WebSockets reimplemented in HTTP/3? No.

Last time I checked, none of the common reverse proxy servers (most importantly nginx) supported WebTransport.

> sending a RequestID to the server so that you get request/response cycles isn't weird nor beyond the pale.

To me the sticking point is what if the "response" message never comes? There's nothing in the websocket protocol that dictates that messages need to be acknowledged. With request/response the client knows how to handle that case natively

> And the websocket browser API is nicer to work with than, say, EventSource.

What in particular would you say?


Yeah, you'd need a lib or roll your own that races the response against a timeout.
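
Something like this gets you most of the way (a rough sketch, assuming `ws` is an open WebSocket and the server echoes each request's `id` back in its response):

    const pending = new Map();
    let nextId = 0;

    function send(message, timeoutMs = 5000) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        const timer = setTimeout(() => {
          pending.delete(id);
          reject(new Error(`request ${id} timed out`));
        }, timeoutMs);
        pending.set(id, { resolve, timer });
        ws.send(JSON.stringify({ id, ...message }));
      });
    }

    ws.addEventListener("message", (e) => {
      const msg = JSON.parse(e.data);
      const entry = pending.get(msg.id);
      if (!entry) return; // unsolicited push, or a response that already timed out
      clearTimeout(entry.timer);
      pending.delete(msg.id);
      entry.resolve(msg);
    });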

Kind of like how you also need to implement app-layer ping/pong over websockets for keepalive even though TCP already has its own keepalive mechanism. -_-

As for EventSource, I don't remember exactly, something always comes up. That said, you could say the same for websockets, since even implementing non-buggy reconnect/backoff logic is annoying.

I'll admit, time for me to try the thing you pitch in the article.


I have only a little experience programming with WebSockets, but I thought the ping/pong mechanism is already built into the protocol. Does it have a timeout? Does it help at the application layer?

Ref: https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_...


You only need to implement it yourself if you've catastrophically fucked up the concurrency model on the client or server side and they can't respond out of band of whatever you're waiting on.

Discord implements its own heartbeat mechanism. I've heard websocket-native ping is somehow unreliable. Maybe in case the websocket connection is fine but something happened at the application layer?

"Unreliable" is a bit harsh - the problem arises imho not from the websocket ping itself, but from the fact that client-side _and_ server-side need to support the ping/pong frames.

The WebSocket browser APIs don't support ping/pong

Native EventSource doesn’t let you set headers ([issue](https://github.com/whatwg/html/issues/2177)), so it’s harder to handle authentication.

> sending a RequestID to the server so that you get request/response cycles isn't weird nor beyond the pal

There's even a whole spec for that: JSON-RPC, and it's quite popular.


IMAP uses request IDs.

>> If it wasn’t, we couldn’t stream video without loading the entire file first

I don't believe this is correct. To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.


> I don't believe this is correct.

Yes, the statement is patently wrong. There are a few very popular video formats whose main feature is chunking through HTTP, like HTTP Live Streaming or MPEG-DASH.


I believe that's standard for Netflix, etc, but is it also true for plain webms and mp4s in a <video> tags? I thought those were downloaded in one request but had enough metadata at the beginning to allow playback to start before the file is completely downloaded.

Yes it is true.

Browsers talking to static web servers use HTTP byte ranges requests to get chunks of videos and can use the same mechanism to seek to any point in the file.

Streaming that way is fast and simple. No fancy technology required.

For MP4 to work that way you need to render it as fragmented MP4.


Why would the browser send byte range requests for video tags if it expects to play the file back linearly from beginning to end anyway? Wouldn't that be additional overhead/round-trips?

> Why would the browser send byte range requests for video tags if it expects to play the file back linearly from beginning to end anyway?

Probably because byte range is required for seeking, and playing from the beginning is equivalent to seeking at 0.

> Wouldn't that be additional overhead/round-trips?

No because the range of the initial byte range request is the whole file (`bytes=0-`).
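
You can watch this in the dev tools network tab, or reproduce it by hand (hypothetical URL; just a sketch):

    // ask for the whole file as a range; a range-aware server answers 206
    const res = await fetch("/video/sample.mp4", {
      headers: { Range: "bytes=0-" },
    });
    console.log(res.status); // 206 Partial Content (or 200 if ranges are unsupported)
    console.log(res.headers.get("Content-Range")); // e.g. "bytes 0-1048575/1048576"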


My original comment was about the commenter I replied to saying:

> To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.

Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"? Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?


> Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"?

Yes, it would (though a better description would be "a single, long lived http request" because this doesn't have anything to do with connections), and wewewedxfgdf also replied Yes.

> Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?

Yes.


The client doesn't want to eat the whole file, so it uses a range request for just the beginning of the file, and then the next part as needed.

The client would actually request the whole file and then terminate the request if the file is no longer needed. This is what browsers do at least.

Both are possible, and in fact I could imagine not all servers being too happy with having to trickle data over a persistent HTTP connection through the entire length of the video, with an almost always full TCP send buffer at the OS level.

> Both are possible

It is possible if you are in control of the client, but no browser would stream an mp4 file request by request.

> with an almost always full TCP send buffer at the OS level

This shouldn't be a problem because there is flow control. Also the data would probably be sent to the kernel in small chunks, not the whole file at once.


> It is possible if you are in control of the client, but no browser would stream an mp4 file request by request.

I believe most browsers do it like that, these days: https://developer.mozilla.org/en-US/docs/Web/Media/Guides/Au...

> This shouldn't be a problem because there is flow control.

It's leveraging flow control, but as I mentioned this might be less efficient (in terms of server memory usage and concurrent open connections, depending on client buffer size and other variables) than downloading larger chunks and closing the HTTP connection in between them.

Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.


> I believe most browsers do it like that, these days

Nope. Browsers send a byte range request for the whole file (`0-`), and the corresponding time range grows as the file is being downloaded. If the user decided to seek to a different part of the file, say at byte offset 10_000, the browser would send a second byte range request, this time `10000-`, and a second time range would be created (if this part of the file has not already been downloaded). So there is no evidence there that any browser would stream files in small chunks, request by request.

> in terms of server memory usage

It's not less efficient in terms of memory usage because the server wouldn't read more data from the filesystem than it can send with respect to the flow control.

> concurrent open connections

Maybe if you're on HTTP/1, but we live in the age of HTTP/2-3.

> Many wireless protocols also prefer large, infrequent bursts of transmissions over a constant trickle.

AFAIK browsers don't throttle download speed, if that's what you mean.


Ah, interesting, I must have mixed it up/looked at range request based HLS playlists in the past. Thank you!

> AFAIK browsers don't throttle download speed, if that's what you mean.

Yeah, I suppose by implementing a relatively large client-application-side buffer and reading from that in larger chunks rather than as small as the media codec allows, the same outcome can be achieved.

Reading e.g. one MP3 frame at a time from the TCP buffer would effectively throttle the download, limited only by Nagle's Algorithm, but that's probably still much too small to be efficient for radios that prefer to sleep most of the time and then receive large bursts of data.


Realistically you wouldn’t be reading anything from the TCP buffer because you would have TLS between your app and TCP, and it’s pretty much guaranteed that whatever TLS you’re using already does buffering.

That's effectively just another small application layer buffer though, isn't it? It might shift what would otherwise be in the TCP receive buffer to the application layer on the receiving end, but that should be about all the impact.

Oh you’re right, I’m just so used to making the TLS argument because there is also the cost of syscalls if you make small reads without buffering, sorry xD

Are you sure browsers would try to download an entire, say, 10h video file instead of just some chunks of it?

Common sense tells me there should be some kind of limit, but I don't know what it is, whether it's standardized and whether it exists. I just tested, and Firefox _buffered_ (according to the time range) the first ~27,000 seconds, but in the dev tools the request appeared as though still loading. Chrome downloaded the first 10.2 MB (according to dev tools) and stopped (but meanwhile the time range was growing from zero, approximately one second every second, even though the browser had already stopped downloading). After it played for a bit, Chrome downloaded 2.6 more MB _using the same request_. In both cases the browser requested the whole file, but didn't necessarily download the whole file.

Seconded, I've done a userland 'content-range' implementation myself. Of course there were a few ffmpeg-specific parameters the mp4 needed to work right still.

It’s not true, because throwing a video file as a source on a video tag provides no information about the file being requested until the headers are pushed down. Hell, back in 2005 Akamai didn’t even support byte range headers for partial content delivery, which made resuming videos impossible. I believe they pushed out the update across their network in 06 or 07.

If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.

Just putting a url in my Chromium-based browser’s address bar to an mp4 file we have hosted on CloudFlare R2 “just works” (I expect a video tag would be the same), supporting skipping ahead in the video without having to download the whole thing.

Initially skipping ahead didn’t work until I disabled caching on the CloudFlare CDN, as that breaks the “accept-ranges” capability on videos. For now we have a negligible amount of viewership of these mp4s, but if it becomes an issue we’ll use CloudFlare’s video serving product.


> If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.

No. When you play a file in the browser with a video tag, it requests the file. It doesn’t ask for a range. It does use the range if you seek, or if you write the JavaScript to fetch based on a range. That’s why if you press play and pause it buffers the whole video. Only if you write the code yourself can you partially buffer a file like YouTube does.


Nah, it uses complex video specific logic and http range requests as protocol. (At least the normal browsers and servers. You can roll your own dumb client/server of course.)

> That’s why if you press play and pause it buffers the whole video.

Browsers don't do that.


Obviously it doesn’t initially ask for a range if it starts from the beginning of the video, but it starts playing video immediately without requiring the whole file to download; when you seek, it cancels the current request and then does a range request. At no point does it “have” to cache the entire file.

I suppose if you watch it from start to finish without seeking it might cache the entire file, but it may alternatively keep a limited amount cached of the video and if you go back to an earlier time it may need to re-request that part.

Your confidence seems very high on something which more than one person has corrected you on now, perhaps you need to reassess the current state of video serving, keeping in mind it does require HTTP servers to allow range requests.


You can learn it here:

https://www.zeng.dev/post/2023-http-range-and-play-mp4-in-br...

You can also watch it happen - the Chrome developer tools network tab will show you the traffic that goes to and from the web browser to the server and you can see this process in action.


Who cares what happened in 2005? This is so rare nowadays, I've only really seen it on websites that are constructing the file as they go, such as the Github zip download feature.

2005 is basically the dark ages of the web. It’s pre-Ajax, and IE6 was the dominant browser. Using this as an argument is like saying apps aren’t suitable because the iPhone didn’t have an App Store until 2008.

> It’s not true because throwing a video file as a source on video tag has no information about the file being requested until the headers are pushed down.

And yet, if you stick a web server in front of a video and load it in chrome, you’ll see just that happening.


Can load a video into a video tag in chrome. Press play and pause. See it makes a single request and buffers the whole video.

If you stick:

  <video controls>
    <source src="/video/sample.mp4" type="video/mp4">
    Your browser does not support the video tag.
  </video>
into an html file, and run it against this pastebin [0], you'll see that chrome (and safari) both do range requests out of the box if the file is big enough.

[0] https://pastebin.com/MyUfiwYE


Tried it on a 800mb file. Single request.

I tried it on 4 different files, and in each case my browser sent a request, my server responded with a 206 and it grabbed chunks as it went.

They can play back while loading, as long as they are encoded correctly fwiw (faststart encoded).

When you create a video from a device, the header is actually at the end of the file. Understandable: it's where the file pointer was, and mp4 allows this, so your recording device writes it at the end. You must re-encode with faststart (puts the moov atom at the start) to make it load reasonably on a webpage though.


> Understandable, it’s where the file pointer was and mp4 allows this so your recording device writes it at the end.

Yet formats like WAVE, which use a similar "chunked" encoding, just use a fixed-length header and a single seek() to get back to it when finalizing the file. Quicktime and WAVE were released around nearly the same time in the early 90s.

MP2 was so much better I cringe every time I have to deal with MP4 in some context.


At the expense of quite some overhead though, right?

MPEG-2 transport streams seem more optimized for a broadcast context, with their small frame structure and everything – as far as I know, framing overhead is at least 2%, and is arguably not needed when delivered over a reliable unicast pipe such as TCP.

Still, being able to essentially chop a single, progressively written MPEG TS file into various chunks via HTTP range requests or very simple file copy operations without having to do more than count bytes, and with self-synchronization if things go wrong, is undoubtedly nicer to work with than MP4 objects. I suppose that's why HLS started out with transport streams and only gained fMP4 support later on.


> and is arguably not needed when delivered over a reliable unicast pipe such as TCP.

So much content ended up being delivered this way, but there was a brief moment where we thought multicast UDP would be much more prevalent than it ended up being. In that context it's perfect.

> why HLS started out with transport streams and only gained fMP4 support later on.

Which I actually think was the motivation to add fMP4 to base MP4 in the first place. In any case I think MPEG also did a better job with DASH technically but borked it all up with patents. They were really stupid with that in the early 2010s.


Multicast UDP is widely used - but not on the Internet.

We often forget there are networks other than the Internet. Understandable, since the Internet is most open. The Internet is just an overlay network over ISPs' private networks.

SCTP is used in cellphone networks and the interface between them and legacy POTS networks. And multicast UDP is used to stream TV and/or radio throughout a network or building. If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP. The TV/internet company has end-to-end control of this network, so they use QoS to make sure these packets never get dropped. There was a write-up posted on Hacker News once about someone at a hotel discovering a multicast UDP stream of the elevator music.


> If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP.

That's a good point: I suppose it's a big advantage being able to serve the same, unmodified MPEG transport stream from a CDN, as IP multicast over DOCSIS/GPON, and as DVB-C (although I’m not sure that works like that, as DVB usually has multiple programs per transponder/transport stream).


The long answer is "it depends on how you do it". Unsurprisingly, video and voice/audio are probably the most different ways that you can "choose" to do distribution.

This. You can't just throw it into a folder and have it stream. The web server has to support it, and then there is encoding and formats.

Yea this works for mp4 and HN seems confused about how.

The MOOV atom is how range requests are enabled, but the browser has to find it first. That's why it looks like it's going to download the whole file at first. It doesn't know the offset. Once it reads it, the request will be cancelled and targeted range requests will begin.


The two are essentially the same thing, modulo trading off some unnecessary buffering on both sides of the TCP pipe in the "one big download" streaming model for more TCP connection establishments in the "range request to refill the buffer" one.

For MP4s the metadata is at the end annoyingly enough.

MP4 allows the header at the start or the end.

It’s usually written at the end, since it’s not a fixed size and it’s a pain for recording and processing tools to rewrite the whole file on completion just to move the header to the start. You should always re-encode to move the header to the start for the web though.

It’s something you see too much of online once you know about it but mp4 can absolutely have the header at the start.


You can pass `-movflags faststart` to ffmpeg when encoding (e.g. `ffmpeg -i in.mp4 -c copy -movflags faststart out.mp4`) to place it at the beginning.

implementations may request the metadata range at the end in this case, if the content length is known

For "VOD", that works (and is how very simple <video> tag based players sometimes still do it), but for live streaming, it wouldn't – hence the need for fragmented MP4, MPEG-DASH, HLS etc.

It does work for simpler codecs/containers though: Shoutcast/Icecast web radio streams are essentially just endless MP3 downloads, optionally with some non-MP3 metadata interspersed at known intervals.


Correct. HLS and Dash are industry standards. Essentially the client downloads a file which lists the files in various bitrates and chunks and the client determines which is best for the given connectivity.

And even if you are using a "regular" video format like mp4, browsers will still use range requests [1] to fetch chunks of the file in separate requests, assuming the server supports it (which most do).

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Ran...


Correct

> Bonus: Making it easy with eventkit

Why not just use SSE? https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...


I've noticed some weird behaviors with the EventSource impl that browsers ship with. Chief among them: the default behavior is to reconnect forever after the server closes the stream, so you have to coordinate some kind of special stop event to stop the client from reconnecting. You wouldn't have that problem with the stream object from Response.body

The SSE protocol is actually just a long-running stream like I mentioned but with specific formatting for each chunk (id, event, and data fields)

as a side note, eventkit actually exports utilities to support SSE both on client and server. The reason you'd want to use eventkit in either case is because it ships with some extra transformation and observability goodies. https://hntrl.github.io/eventkit/guide/examples/http-streami...
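
to make that concrete, reading the stream with fetch instead of EventSource leaves reconnecting entirely up to you (a minimal sketch; hypothetical endpoint):

    const res = await fetch("/events");
    const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break; // server closed the stream; nothing reconnects unless you do
      console.log(value); // each decoded chunk as it arrives
    }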


The reconnect thing is actually quite helpful for mobile use cases. Say the user switches the tab, closes their browser or loses network and then they return. Since SSE is stateless from the client's perspective, the client can just reconnect and continue receiving messages. Whereas with WS there are handshakes to worry about--and also other annoyances like what to do with pending requests before the connection was lost.

SSE is great. Most things with websockets would be fine with SSE.

Also I don't see it being much easier here than a few primitives and learning about generator functions if you haven't had experience with them. I appreciate the helper, but the API is pretty reasonable as-is IMO


I’m experimenting with SSE for realtime project deployment logs in https://lunni.dev/ and it’s been extremely pleasant so far.

The only problem is, if you want to customize the request (e.g. send a POST or add a header), you have to use a third-party implementation (e.g. one from Microsoft [1]), but I hope this can be fixed in the standards later.

[1]: https://www.npmjs.com/package/@microsoft/fetch-event-source


The helper example was a shameless attempt to plug the project I've been working on (tinkering with it is how I came up with the example). The library I plugged has much more to do with enabling a more flexible reactive programming model in js, but it just so happens to plug into the stream API pretty handily. Still an interesting look IMO if you're into that kind of stuff

No worries, I know how it feels! (I said, plugging away my own project in a sibling comment, lol)

I do like the reactive approach (in fact, I’ve reinvented something similar over SSE). I feel a standards-based solution is just ever so slightly more robust/universal.


SSE doesn't support binary data without encoding to something base64 first. These days I'd recommend a fetch stream with TLV messages first, followed by WebSocket.
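
TLV framing over a fetch stream is only a few lines. A sketch with a made-up layout (1 type byte, 4-byte big-endian length, then the payload):

    // feed it the accumulated bytes; returns complete frames plus the leftover
    function parseFrames(buf /* Uint8Array */) {
      const frames = [];
      let off = 0;
      while (buf.length - off >= 5) {
        const view = new DataView(buf.buffer, buf.byteOffset + off, buf.length - off);
        const type = view.getUint8(0);
        const len = view.getUint32(1); // big-endian by default
        if (buf.length - off < 5 + len) break; // incomplete frame; wait for more bytes
        frames.push({ type, payload: buf.subarray(off + 5, off + 5 + len) });
        off += 5 + len;
      }
      return { frames, rest: buf.subarray(off) };
    }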

Based on my read, this basically is SSE but doesn't use the same protocol.

Do CDN, such as Cloudflare, support SSE? The last time I looked, they didn't, but maybe things have changed.

Cloudflare doesn't officially support SSE, but if you send keepalive events every 15 or 20 sec or so, you can reliably use SSE for 40+ minutes in my experience.

No server traffic for 100+ sec officially results in a 524, so you could possibly make that keepalive interval longer, but I haven't tested it.

Make sure to have the new style cache rule with Bypass cache selected and absolutely make sure you are using HTTP/2 all the way to the origin.

The 6-connections-per-browser limit of HTTP/1.1 SSE was painful, and I am pretty sure auto negotiation breaks, often in unexpected ways, with an HTTP/1.1 origin.
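
The keepalive itself is trivial, since any line starting with ":" is an SSE comment that clients ignore. Inside whatever connection handler you have (a sketch; `req`/`res` are the usual Node objects):

    // nudge the connection every 15s so intermediaries don't see it as idle
    const keepalive = setInterval(() => res.write(": keepalive\n\n"), 15000);
    req.on("close", () => clearInterval(keepalive));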


I have a demo of this for CF workers https://github.com/hntrl/eventkit/tree/main/examples/workers...

(it's not SSE in particular, but it demonstrates that you can have a long running stream like SSE)


On top of the comments below about SSE, I'd also point out that Cloudflare is doing some interesting stuff around serverless resumable websockets. They also have stuff for WebRTC.

Yes, they are

SSE is the way to roll.

The problem selects the solution.

That said, I like SSE for unidirectional string-encoded events.


I don't know why people keep trying desperately to avoid the simplicity and flexibility of WebSockets.

A lot of times, what people need is a bidirectional connection yet somehow they convince themselves that SSE is better for the job... But they end up with two different types of streams; HTTP for writes and responses and SSE for passively consuming real-time data... Two different stream types with different lifecycles; one connection could fail while the other is fine... There is no way to correctly identify what is the current connection status of the app because there are multiple connections/statuses and data comes from multiple streams... Figuring out how to merge data coming from HTTP responses with data coming in passively from the SSE is messy and you have no control over the order in which the events are triggered across two different connections...

You can't enforce a serial, sequential, ordered flow of data over multiple connections as easily, it gets messy.

With WebSockets, you can easily assign an ID to requests and match it with a response. There are plenty of WebSocket frameworks which allow you to process messages in-order. The reason they work and are simple is because all messages pass over a single connection with a single state. Recovering from lost connections is much more straight forward.


I don't know why everyone... proceeds to use their own experience as proof of what everyone needs.

These are tools, not religions.

Websockets have some real downsides if you don't need bidirectional comms.


who's to say your data is coming from multiple streams? You can propagate any updates you need to make in the application to a single stream (like SSE or a long-lived response) in place of a WebSocket. your http responses can just be always 204 if all they're doing is handling updates and pushing events to aforementioned single stream.

https://en.wikipedia.org/wiki/Command%E2%80%93query_separati...


SSE does not require a separate connection, unlike WebSockets.

MDN disagrees. See the huge red warning here https://developer.mozilla.org/en-US/docs/Web/API/EventSource

Unless you mean on HTTP2? But aren't WS connections also multiplexed over HTTP2 in that case?


> MDN disagrees. See the huge red warning here https://developer.mozilla.org/en-US/docs/Web/API/EventSource

It should say "When used over HTTP/1" instead of "When not used over HTTP/2" because nowadays we also have HTTP/3, and browsers barely even use HTTP/1, so I would say it's pretty safe to ignore that warning.

> Unless you mean on HTTP2?

Any version of HTTP that supports multiplexing.

> But aren't WS connections also multiplexed over HTTP2 in that case?

There is RFC 8441 but I don't think it's actually implemented in the browsers.


> There is RFC 8441 but I don't think it's actually implemented in the browsers.

Found this: https://github.com/mattermost/mattermost/issues/30285


https://chromestatus.com/feature/6251293127475200

It looks like it's supported in Chrome and Firefox but not in Safari.


It's javascript, anything simple needs a framework.

The problem with HTTP2 is that the server-push aspect was tacked on top of an existing protocol as an afterthought. Also, because HTTP is a resource transfer protocol, it adds a whole bunch of overhead like request and response headers which aren't always necessary but add to processing time. The primary purpose of HTTP2 was to allow servers to preemptively push files/resources to clients to avoid round-trip latency; to reduce the reliance on script bundles.

WebSockets is a simpler protocol built from the ground up for bidirectional communication. It provides a lot more control over the flow of data as everything passes over a single connection which has a single lifecycle. It makes it a lot easier to manage state and to recover cleanly from a lost connection when you only have one logical connection. It makes it easier to process messages in a specific order and to do serial processing of messages. Having just one connection also greatly simplifies things in terms of authentication and access control.

I considered the possibility of switching the transport to HTTP2 for https://socketcluster.io/ years ago, but it's a fundamentally more complex protocol which adds unnecessary overheads and introduces new security challenges so it wasn't worth it.


> The primary purpose of HTTP2 was to allow servers to preemptively push files/resources to clients to avoid round-trip latency; to reduce the reliance on script bundles.

No, it was not. The primary goal of HTTP/2 was to get over traditional connection limits through connection multiplexing because browsers treat TCP connections as an extremely scarce resource. Multiplexing massively improves the ability to issue many asynchronous calls, which are very common -- and H2 went on to make the traditional HTTP stack more efficient across the board (i.e. header compression.) Some of the original HTTP/2 demo sites that popped up after Google first supported it in Chrome were of loading many images over HTTP/1 vs HTTP/2, which is very common. In one case of my own (fetching lots of small < 1kb files recursively from S3, outside the browser) HTTP/2 was like a 100x performance boost over HTTP/1 or something.

You're correct Server Push was tacked on and known to be flawed very early on, and it took a while before everyone pulled the plug on it, but people fixated on it because it just seemed really cool, from what I can tell. But it was never the lynchpin of the thing, just a (failed and experimental) boondoggle.


> The primary purpose of HTTP2 was to allow servers to preemptively push files/resources to clients to avoid round-trip latency; to reduce the reliance on script bundles.

The primary purpose for HTTP2 was to allow multiple simultaneous asynchronous http calls, which is a massive loading performance boost for most websites. Server push was very much a tacked-on afterthought.


How can server push be a problem with HTTP/2 if nobody supports server push? It's dead. And what about multiplexing and header compression? Not worth it?

Server push is dead though, SSE is a different idea with completely different semantics (and tradeoffs).

Agree. After banging my head against http2 for years, I now really enjoy how simple websockets are and their universal support.

Me: For this POC you've given me, I will do an old-fashioned HTTP form submit, no need for anything else.

Architect: But it must have websockets!

Me: Literally nothing in this POC needs XHR, much less websockets. It's a sequential buy flow with nothing else going on.

Architect: But it has to have websockets, I put them on the slide!

(Ok he didn't say the part about putting it on the slide, but it was pretty obvious that's what happened. Ultimately I caved of course and gave him completely unnecessary websockets.)


My strategy for this kind of situation is to avoid direct rejection. Instead of saying stuff like "it's unnecessary" or "you are wrong", I push for trying first without.

I would say:

> Once we have a working MVP without websockets we can talk again to think about using websocket.

Most times, once something is working, they stop caring, or we have other priorities by then.


I always try to push back on those beliefs, asking for the reasoning behind why they believe it will be faster or more efficient than some other solution.

I've found, if you could typecast those people, they would be a tech architect who only uses "web scale" items. (Relevant link: https://www.youtube.com/watch?v=5GpOfwbFRcs )


I call them Powerpoint architects.

Having deployed WebSockets into production, I came to regret it over the next years. Be it nginx terminating connections after 4/8 hours, browsers not reconnecting after sleep, and other issues, I am of the opinion that WebSockets and other forms of long-standing connections should be avoided if possible.

Not to mention, some major parts of the websocket API have been broken in Google Chrome for over two years now.

Chrome no longer fires Close or Error events when a websocket disconnects (well, at least not when they happen, they get fired about 10 minutes later!). So, your application won't know for 10 minutes that the connection has been severed (unless the internet connection is also lost, but that isn't always the case when a websocket is disconnected).

Here's the chrome bug:

https://issuetracker.google.com/issues/362210027?pli=1

From that bug report it looks like the Chrome bug is less than a year old, but it was originally mentioned here in April 2023, in a question about a similar bug in iOS (the iOS bug has been resolved):

https://stackoverflow.com/questions/75869629/ios-websocket-c...

I kind of suspect Chrome is actually doing this intentionally. I believe they do this so a tab can recover from background sleep without firing a websocket close event. That's helpful in some cases, but it's a disaster in other cases, and it doesn't matter either way... it breaks the specification for how websockets are expected to work. WebSockets should always fire Close and Error events immediately when they occur.


If you want to use websockets, then you are most definitely going to need some library that wraps the websocket, because websockets themselves are very simple and don't do things like reconnect on their own.

This one is pretty simple and pretty great: https://github.com/lukeed/sockette

I did my own which provides rpc functionality and type safety: https://github.com/samal-rasmussen/smolrpc


Even load balancers force you to have a frequent heartbeat all the way to the client for each connection.

People interested in HTTP streaming should check out Braid-HTTP: https://braid.org. It adds a standard set of semantics that elegantly extend HTTP with event streaming into a robust state synchronization protocol.

Oof, what a headline to be top of hn the day after you implement websockets into a project.

We've had a production app with them for over 10 years and it's generally great. The only thing to be aware of is this Chrome bug:

https://issuetracker.google.com/issues/362210027?pli=1

You can add a recurring ping/pong between the client/server so you can know with some recency that the connection has been lost. You shouldn't have to do that, but you probably want to until this bug is fixed.
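
Client-side it can be as little as this (a rough sketch; `ws`, `reconnect()`, and JSON messages are assumed, and the intervals are arbitrary):

    // ping every 15s; if no pong lands within 10s, assume the socket is dead
    let pongTimer;
    const pingTimer = setInterval(() => {
      ws.send(JSON.stringify({ type: "ping" }));
      pongTimer = setTimeout(() => {
        clearInterval(pingTimer);
        ws.close(); // Chrome may not fire close/error promptly, so give up ourselves
        reconnect();
      }, 10000);
    }, 15000);

    ws.addEventListener("message", (e) => {
      if (JSON.parse(e.data).type === "pong") clearTimeout(pongTimer);
    });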


60s heartbeat interval, job done.

We've got multiple internal apps using WebSockets in production, for years. I have to say I don't really get all the concern in the article about upgrading the connection - any decent backend framework should handle this for you without a problem.

Hacker News articles on new libraries generally live in the 1% of the 1%. For lots of websites, they don't need a websocket because they are just doing CRUD. For the 1% doing live updates, websockets are great and straightforward. For whatever specialised use case the article has, sure there's something even less well supported you can pivot to.


We use a much shorter heartbeat interval because our use case is real-time control and monitoring, so our users need to know immediately if the connection is lost.

Websockets work great, don't worry too much about it.

I don't know why the topic of websockets is so weird. 80% of the industry seems to have this skewed, idealised perception of websockets as the next frontier of their web development career and cannot wait to use them for anything remotely connected to streaming/realtime use cases. When you point out the nuances, and that websockets should actually be avoided wherever they are not absolutely needed, people get defensive and offended, killing every healthy discussion about realistic tradeoffs.

Websockets have a huge number of downsides, especially losing many of the niceties and simplicity of http: the tooling, the ease of reasoning about it, the shared knowledge, and the operational experience. As many here have pointed out, the goto solution for streaming server changes is h2/h3 and SSE. Everything in the other direction that can be accomplished with batching, landing in the ballpark of at most 0.5 req/s per client, does NOT need websockets.

There is no reason to avoid WebSockets. This is a conclusion people come to because they are familiar with HTTP round trips and cannot imagine anything different.

There are no nuances to understand. It’s as simple as fire and forget.

The only downside to WebSockets is that they are session oriented. Conversely, compared to WebSockets, the only upside to HTTP is that it's sessionless.


I just realized that modern web applications are a group form of procrastination. Procrastination is a complex thing. But essentially, it's putting something off because of some perceived pain, even though the thing may be important or even inevitable, and eventually the procrastination leads to negative outcomes.

Web applications were created because people were averse to creating native applications, for fear of the pain involved with creating and distributing native applications. They were so averse to this perceived pain that they've done incredibly complex, even bizarre things, just so they don't have to leave the web browser. WebSockets are one of those things: taking a stateless client-server protocol (HTTP) and literally forcing it to turn into an entirely new protocol (WebSockets) just so people could continue to do things in a web browser that would have been easy in a native application (bidirectional stateful sockets, aka a tcp connection).

I suppose this is a normal human thing. Like how we created cars to essentially have a horseless buggy. Then we created paved roads to make that work easier. Then we built cities around paved roads to keep using the cars. Then we built air-scrubbers into the cars and changed the fuel formula when we realized we were poisoning everyone. Then we built electric cars (again!) to try to keep using the cars without all the internal combustion issues. Then we built self-driving cars because it would be easier than expanding regional or national public transportation.

We keep doing the easy thing, to avoid the thing we know we should be doing. And avoiding it just becomes a bigger pain in the ass.


I agree with a lot of that. But, it's a lot easier to get someone to try your web app than install a native app. It's also easier to get the IT department to allow an enterprise web app than install a native app. Web apps do have some advantages over native apps.

Yes, all of that is the reason why we procrastinate. "Easy" is the excuse we give ourselves to do the things we would otherwise have no justification for, and avoid the difficult things we know would be better. It's not my fault; it's not my responsibility; I shouldn't have to do extra work; it's too complicated; it'll be hard; it'll be long; it'll be painful; it's not perfect; it might fail. No worries; there's something easier I can do.

Thus we see the flaws in the world, and shrug. When someone else does this, we get angry, and indignant. How dare someone leave things like this! Yet when we do it, we don't make a peep.


You left out the part where you explain why native apps are so much better for users and developers than web apps?

I can't tell why you think WebSockets are so bizarre.


Many advantages, for example web apps get suspended if you’re not browsing the tab. But I do agree it’s much more attractive to write web apps mainly for portability.

Native apps also get suspended, or can be, like on iOS (not being a fanboy, I appreciate this mechanism). Native desktop apps can also be mostly suspended while not in use.

Web apps are just way easier to build (rarely well), so many people are building them without real engineering or algorithms knowledge, producing trash every day. The article speaks in the same voice: it paints one protocol as completely bad, mentions only the issues both approaches share, and silently omits those same issues when describing "the only way", the artisanal, holistic, Rust-and-WASM-based solution (not a plug, of course).


> Native apps also get suspended, or can be, like on iOS (not being a fanboy, I appreciate this mechanism).

On iOS web apps get suspended very aggressively, and there is no way for a web app to signal to the browser to not suspend it. I never developed native mobile apps, but I assume it’s less aggressive for native apps and/or native apps have a way to prevent themselves from being suspended. This doesn’t seem to be an issue on desktop though.


> bidirectional stateful sockets, aka a tcp connection

Which is not "easy" to do over the internet, so the native app folks ended-up using HTTP anyway. (Plus they invented things like SOAP.)


TCP is easy to do over the internet. Did you mean the middleboxes? Ah, the middleboxes, the favorite scapegoat of the world wide web's cabal of committees. You'd think they were absolutely powerless. Like firewalls and application proxies are a fundamental principle of nature; unable to be wrestled, only suffered under. Yet the web controls a market share 500 times larger.

WebSocket messages are not meant to be consumed as streams (the TCP equivalent) but as datagrams, aka packets (the UDP equivalent). Correct me if I am wrong, but the WebSocket API in browser JavaScript is pretty poor: the only backpressure signal is a byte counter you have to poll yourself, and it cannot make assertions about delivery. If you want to use websockets as TCP-like streams, including session handling, great care should be taken, as this is natively available in neither RFC 6455 nor the browser.
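For what it's worth, the closest the browser API gets to backpressure is polling bufferedAmount before sending; a crude sketch (the threshold and retry interval are arbitrary):

  // bufferedAmount is the number of bytes queued but not yet handed to
  // the network. There is no "drained" event, so you have to poll it.
  const MAX_BUFFERED = 1024 * 1024 // 1 MiB

  function sendWithBackpressure(ws, data) {
    if (ws.bufferedAmount > MAX_BUFFERED) {
      // A real implementation would queue messages to preserve order
      setTimeout(() => sendWithBackpressure(ws, data), 100)
    } else {
      ws.send(data)
    }
  }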

I usually start with long polling/SSE and migrate to WebSockets when needed. It is cheap and reliable, with almost no performance overhead compared to WebSockets.

You don't need websockets; SSE works fine for realtime collaborative apps.

Websockets sound great on paper. But, operationally they are a nightmare. I have had the misfortune of having to use them at scale (the author of Datastar had a similar experience). To list some of the challenges:

- firewalls and proxies, blocked ports

- unlimited non-multiplexed connections (so bugs lead to DDoS)

- load balancing nightmare

- no compression

- no automatic handling of disconnect/reconnect.

- no cross site hijacking protection

- Worse tooling (you can inspect SSE in the browser).

- Nukes mobile battery because it hammers the duplex antenna.

You can fix some of these problems with websockets, but these fixes mostly boil down to sending more data... to send more data... to get you back to your own implementation of HTTP.

SSE on the other hand, by virtue of being regular HTTP, works out of the box with headers, multiplexing, compression, disconnect/reconnect handling, h2/h3, etc.
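For comparison, this is roughly the entire client side of an SSE stream, reconnects included (the /events endpoint and render function here are made up):

  // EventSource reconnects automatically on drop and re-sends the
  // Last-Event-ID header so the server can resume where it left off
  const es = new EventSource('/events')
  es.onmessage = (e) => render(JSON.parse(e.data))
  es.onerror = () => console.log('disconnected, the browser will retry')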

If SSE is not performant enough for you then you should probably be rolling your own protocol on UDP rather than using websockets. Or wait until WebTransport is supported in Safari (any day now…).

Here's a demo of a real time multiplayer Game of Life that's using SSE and compression:

https://example.andersmurphy.com

It's doing a lot of other dumb stuff explained a bit more here, but the point is you really really don't need websockets (and operationally you really don't want them):

https://andersmurphy.com/2025/04/07/clojure-realtime-collabo...


Useful take, thanks for mentioning specifics. Some of these I wasn't aware of.

- What makes load balancing easier with SSE? I imagine that balancing reconnects would work similarly to WS.

- Compression might be a disadvantage for binary data, which WS specializes in.

- Browser inspection of SSE does sound amazing.

- Mobile duplex antenna is way outside my wheelhouse, sounds interesting.

Can you see any situation in which websockets would be advantageous? I know that SSE has some gotchas itself, such as limited connections (6) per browser. I also wonder about the nature of memory and CPU usage for serving many clients on WS vs SSE.

I have a browser game (few players) using vanilla WS.


Thanks.

- Load balancing is easier because your connection is stateless. You don't have to connect to the same server when you reconnect. Your up traffic doesn't have to go to the same server as your down traffic. Websockets tend to come with a lot of connection context. With SSE you can easily kill nodes, and clients will reconnect to other nodes automatically.

- The compression is entirely optional. So when you don't need it don't use it. What's great about it though is it's built into the browser so you're not having to ship it to the client first.

- The connection limit of 6 only applies to HTTP/1.1, not HTTP/2/3. If you are using SSE you'll want HTTP/2/3. But generally you want HTTP/2/3 from your proxy/server to the browser anyway, as it has a lot of performance/latency benefits (you'll want it for multiplexing your connection anyway).

- In my experience CPU/memory usage is lower than with websockets. Obviously, languages with virtual/green threads (Go, Java, Clojure) make them more ergonomic to use. But a decent async implementation can scale well too.

Honestly, and this is just an opinion, no I can't see when I would ever want to use websockets. Their reconnect mechanisms are just not reliable enough and their operational complexity isn't worth it. For me at least it's SSE or a proper gaming net code protocol over UDP. If your browser game works with websockets it will work with SSE.


I appreciate the answers. For others reading, I also just ran across another thread where you posted relevant info [0]. In the case of my game, I'm going to consider SSE, since most of the communication is server to client. That said, I already have reconnects etc implemented.

In my research I recall some potential tradeoffs with SSE [1], but even there I concluded they were minor enough to consider SSE vs WS a wash [2] even for my uses. Looking back at my bookmarks, I see that you were present in the threads I was reading, how cool. A couple WS advantages I am now recalling:

SSE is one-way, so for situations with lots of client-sent data, a second connection will have to be opened (with overhead). I think this came up for me since if a player is sending many events per second, you end up needing WS. I guess you're saying to use UDP, which makes sense, but has its own downsides (firewalls, WebRTC, WebTransport not ready).

Compression in SSE would be negotiated during the initial connection, I have to assume, so it wouldn't be possible to switch modes or mix in pre-compressed binary data without reconnecting or base64-ing binary. (My game sends a mix of custom binary data, JSON, and gzipped data which the browser can decompress natively.)

Edit: Another thing I'm remembering now is order of events. Because WS is a single connection and data stream, it avoids network related race conditions; data is sent and received in the programmatically defined sequence.

0: https://news.ycombinator.com/item?id=43657717

1: https://rxdb.info/articles/websockets-sse-polling-webrtc-web...

2: https://www.timeplus.com/post/websocket-vs-sse


Cool. I didn't notice either. :)

With http2/3 it's all multiplexed over the same connection, and as far as your server is concerned that up request/connection is very short lived.

Yeah, mixed formats are probably a use case for compression (like you said, once you commit to compression with SSE there's no switching during the connection). But then you still need to configure compression yourself with websockets. The main compression advantage of SSE is that it's not per message, it's for the whole stream. The implementations of compression with websockets I've seen have mostly been per-message compression, which is much less of a win (I'd get around 6:1, maybe 10:1 with the game example, not 200:1, and pay a much higher server/client CPU cost).

Websockets have similar issues with firewalls and TCP. So in my mind if I'm already dealing with that I might as well go UDP.

As for ordering, that's part of the problem that makes websockets messy (with reconnects etc). I prefer to build resilience into the system, so in the case of that demo I shared, if you lose your connection and reconnect you automatically get the latest view (there's no playback of events that needs to happen). SSE automatically sends the last received event id on reconnect (so you can play back missed events if you want, not my thing personally). I mainly use the event ID as a hash of content: if the hash is the same, don't send anything, the client already has the latest state.
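A sketch of that server side in Node (hand-rolled SSE; latestState and hashOf are stand-ins):

  import http from 'node:http'

  http.createServer((req, res) => {
    res.writeHead(200, {
      'content-type': 'text/event-stream',
      'cache-control': 'no-cache',
    })
    const send = (state) => {
      // The id: field becomes the Last-Event-ID header on reconnect
      res.write(`id: ${hashOf(state)}\ndata: ${JSON.stringify(state)}\n\n`)
    }
    // On reconnect the browser tells us the last event it saw, so we
    // can skip re-sending state the client already has
    if (req.headers['last-event-id'] !== hashOf(latestState()))
      send(latestState())
    // ...then keep the response open and call send() on every change
  }).listen(3000)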

By design, that's the way I build things with CQRS: up events never have to be ordered with down events. Think about a game loop; my down events are basically a render loop. They just return the latest state of the view.

If you want to order up events (rarely necessary), you can batch on the client to preserve order. You can use a client timestamp/hash of the last event (if you want to get fancy), and the server orders and batches those events in sync with the loop, i.e. everything it got in the last X time (like blockchains/trading systems). This is only per-client ordering, not distributed client ordering; otherwise you get into Lamport clocks etc.

I've been burnt too many times by thinking websockets will solve the network/race conditions for me (and then failing spectacularly), so I'd rather build the system to handle disconnects rather than rely on ordering guarantees that sometimes break.

Again, though my experience has made me biased. This is just my take.


What do you mean by "inspect in browser"? All major browsers' devtools have supported WebSocket inspecting for many years.

Many of the other issues mentioned are also trivial to solve (reconnects, cross-origin protection).

Also, doesn't WebTransport have many of the same issues? (e.g. with proxies and firewalls). And do you have any data for the mobile battery claim? (assuming this is for an application in foreground with the screen on)


The fact that you are saying they are trivial to solve means you probably need more visibility on your system. Reliable reconnect was the nightmare we saw regularly.

Unfortunately, I can't go into much detail on the mobile battery stuff, but I can give you some hints. If you do some reading on how antennas on phones work, combined with websocket heartbeat ping/pong, you should get the idea.


> If you do some reading on how antennas on phones work, combined with websocket heartbeat ping/pong, you should get the idea.

The implication is that the ping/pong keeps the system active when it wouldn't otherwise be necessary, but how else are you receiving data or detecting a lost connection with the other mechanisms? The lower layers have their own keepalives, so what's different?

I looked into it a little since it didn't make sense to me, unless you're comparing apples and oranges, but the only research I could find either didn't seem to support your stance or compared WebSockets to the alternative of just simply not being able to receive data in a timely manner.


You can also use long polling, which keeps a connection alive so the server can respond immediately when there's new data. For example:

Server

  const LONG_POLL_SERVER_TIMEOUT = 8_000

  function longPollHandler(req, response) {
    // e.g. client can be out of sync if the browser tab was hidden while a new event was triggered
    const clientIsOutOfSync = parseInt(req.headers.last_received_event, 10) !== myEvents.count
    if (clientIsOutOfSync) {
      sendJSON(response, myEvents.count)
      return
    }

    function onMyEvent() {
      myEvents.unsubscribe(onMyEvent)
      sendJSON(response, myEvents.count)
    }
    // If no event arrives before the timeout, respond with the current
    // count anyway, so the client re-polls and intermediaries don't
    // kill an idle connection
    response.setTimeout(LONG_POLL_SERVER_TIMEOUT, onMyEvent)
    req.on('error', () => { // client went away; stop waiting
      myEvents.unsubscribe(onMyEvent)
      response.destroy()
    })
    myEvents.subscribe(onMyEvent)
  }



Client (polls when tab is visible)

  // Keep in sync with the server's LONG_POLL_SERVER_TIMEOUT
  const LONG_POLL_SERVER_TIMEOUT = 8_000

  // Initialize the poll state before the first request, so the first
  // last_received_event header isn't the string "undefined"
  pollMyEvents.isPolling = false
  pollMyEvents.oldCount = 0

  pollMyEvents()
  document.addEventListener('visibilitychange', () => {
    if (!document.hidden)
      pollMyEvents()
  })
  async function pollMyEvents() {
    if (pollMyEvents.isPolling || document.hidden)
      return
    try {
      pollMyEvents.isPolling = true
      const response = await fetch('/api/my-events', {
        signal: AbortSignal.timeout(LONG_POLL_SERVER_TIMEOUT + 1000),
        headers: { last_received_event: pollMyEvents.oldCount }
      })
      if (response.ok) {
        const nMyEvents = await response.json()
        if (pollMyEvents.oldCount !== nMyEvents) { // because it could be < or >
          pollMyEvents.oldCount = nMyEvents
          setUIState('eventsCount', nMyEvents)
        }
        pollMyEvents.isPolling = false
        pollMyEvents()
      }
      else
        throw response.status
    }
    catch (_) {
      pollMyEvents.isPolling = false
      setTimeout(pollMyEvents, 5000)
    }
  }

Working example at Mockaton: https://github.com/ericfortis/mockaton/blob/6b7f8eb5fe9d3baf...

Yep, have used long polling with no downsides for ~20 years. 95% of the time I see web sockets it's unnecessary.

> We can’t reliably say “the next message” received on the stream is the result of the previous command since the server could have sent any number of messages in between now and then.

Doing so is a protocol decision though, isn't it?

If the protocol specifies that the server either clearly identifies responses as such, or only ever sends responses, and further doesn't send responses out of order, I don't see any difference to pipelined HTTP: The client just has to count, nothing more. (Then again, if that's the use case, long-lived HTTP connections would do the trick just as well.)
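Concretely, with such a protocol the client-side bookkeeping is just a queue of pending resolvers (a sketch; send() stands in for whatever writes a command to the stream):

  // Works only if the server answers every command, in order
  const pending = []

  function request(command) {
    send(command)
    return new Promise((resolve) => pending.push(resolve))
  }

  function onResponse(msg) { // call for each response read off the stream
    pending.shift()(msg)
  }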


What happens if a message somehow gets lost? Dropped packets, error, etc? Or is that completely precluded by using http streaming?

TCP provides a lossless, in-order stream, and errors are corrected at that layer and below, so HTTP streaming and WebSockets are equivalent in that regard.

Why not use a library like socket.io? It handles the socket lifecycle, reconnection etc.

I think an article like this would benefit from focusing more on protocols, rather than particular APIs for working with them: referencing the specifications and providing examples of messages. I am pretty sure the article is about chunked transfer encoding [1], but that is not mentioned anywhere. Possibly it tries to cover newer HTTP versions as well, abstracting from the exact mechanisms, in which case "JS API" in the title would clarify it.

As for the tendency described, this seems to be an instance of the law of the instrument [2], combined with some instruments being more trendy than others. Which comes up all the time, but raising awareness of more tools should indeed be useful.

[1] https://en.wikipedia.org/wiki/Chunked_transfer_encoding

[2] https://en.wikipedia.org/wiki/Law_of_the_instrument


We are looking into adopting bidirectional streams, and have identified gRPC as a likely ideal candidate. It provides a layer on top of the blobs (partial responses) sent by either side, and takes over the required chunking and dechunking. And it doesn't have the authentication issues that Websockets have. I'd appreciate any insights on this matter.

What does this solve? Genuine question. You still have to manage connectivity and synchronization. Also, I'm not so sure that reading the stream will necessarily yield chunks quantized to the updates the server sent.

If you use a proper framework, you don't have to manage the socket lifecycle and it doesn't complicate your server.

WebSockets are full duplex, so both sides of a connection are equally transmitting sides. The first section fails to understand this and then builds some insane concern about state on top of that faulty notion. WebSockets don't care about your UI framework, just like your car doesn't care what time you want to eat dinner.

> You have to manage the socket lifecycle

You have to do the very same thing with HTTP keep-alive or use a separate socket for each and every HTTP request, which is much slower. Fortunately the browser makes this stupid simple in regards to WebSockets with only a few well named events.
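Indeed, the entire browser-side lifecycle surface is four events (the URL here is made up):

  const ws = new WebSocket('wss://example.com/socket')
  ws.addEventListener('open', () => ws.send('hello'))
  ws.addEventListener('message', (e) => console.log(e.data))
  ws.addEventListener('error', (e) => console.error(e))
  ws.addEventListener('close', () => { /* reconnect if you care */ })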

> When a new WebSocket connection is initiated, your server has to handle the HTTP “upgrade” request handshake.

If the author cannot split a tiny string on CRLF sequences they likely shouldn't be programming and absolutely shouldn't be writing an article about transmission. There is only 1 line of data you really need from that handshake request: Sec-WebSocket-Key.

Despite the Upgrade header, the handshake is not actually HTTP. According to RFC 6455 it is a tiny bit of text conforming to the syntax of RFC 2616, which is basically just: lines separated by CRLF, terminated by two CRLFs, and headers separated from values with a colon. Really it's just RFC 822 framing, per RFC 2616.

This is not challenging.
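The one non-obvious step is computing the accept token, which RFC 6455 defines as the SHA-1 of the client's key concatenated with a fixed GUID, base64 encoded. A Node sketch:

  import { createHash } from 'node:crypto'

  // Sec-WebSocket-Accept = base64(sha1(key + fixed GUID from RFC 6455))
  function acceptKey(secWebSocketKey) {
    return createHash('sha1')
      .update(secWebSocketKey + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11')
      .digest('base64')
  }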

I take it this article is written by a JavaScript framework junkie that cannot program, because there is so much in the article that is just wrong.

EDITED: because people get sad.


You're very confrontational, yet your post doesn't really refute the author's main points.

What the author means with "transactional" is that WebSockets have no built-in request-response mechanism, where you can tell which response belongs to which request. It's a weird word choice, but alas.

I do agree that the bit about "handshakes are hard" feels a bit ill-advised btw, but it's not the core argument nor the core idea of this post. The core idea is "do request-response via HTTP, and then use some sort of single-direction stream (maybe over WS, doesn't matter) to keep client state in sync". This is a pretty good idea regardless of how well or how badly you know the WebSocket RFCs by heart.

(I say this as someone who built a request-response protocol on top of websockets and finds it to work pretty well)


> What the author means with "transactional" is that WebSockets have no built-in request-response mechanism

It's not HTTP and does not want to be HTTP. In WebSockets the request/response mechanism is for one side to send a message and then the other side to send a message. If you want to associate a message from one side with a message from the other, put a unique identifier in the messages.
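A sketch of that identifier approach on the client (the message shape and handleServerEvent are made up; ws is assumed to be an open WebSocket):

  // Tag each outgoing message with an id and keep a map of pending
  // promises; the server is assumed to echo the id in its reply
  const pending = new Map()
  let nextId = 0

  function request(payload) {
    const id = nextId++
    ws.send(JSON.stringify({ id, payload }))
    return new Promise((resolve) => pending.set(id, resolve))
  }

  ws.addEventListener('message', (e) => {
    const msg = JSON.parse(e.data)
    if (pending.has(msg.id)) {
      pending.get(msg.id)(msg) // a reply: resolve its request
      pending.delete(msg.id)
    } else {
      handleServerEvent(msg) // everything else is a server push
    }
  })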

If you really want the request/response round trip then don't use WebSockets. I would rather messages just transmit as each side is ready, completely irrespective of any round trip or response, because then everything is fully event oriented and free from directionality.


> If you really want the request/response round trip then don't use WebSockets.

Yes! That's the whole point of the article! You agree with the author!


The world needs more of these "you might not need" articles.

Too many technology fads make things needlessly complicated, and complexity makes systems unreliable.

You might not need Kubernetes

You might not need The Cloud

You might not need more than SQLite

...and so on.


Genuine question, because I agree that there are a lot of overcomplicated systems. I often see people say all you need is SQLite. Do you implement replication yourself? Or are you just accepting that if something happens to your server your data is just gone? I always default to managed Postgres, and that seems to be the simplest, most boring solution.

There's a huge class of applications for which Litestream provides all the replication of SQLite databases you need.

https://litestream.io

https://github.com/benbjohnson/litestream


Replication in SQLite:

   cp data.db <backup location>
On modern cloud systems you shouldn’t have data loss anyway

SQLite is absolutely not suitable if you need non-trivial amounts of write concurrency - SQLite locks the file when writing, and doesn't even notify the next writer when done - writers poll to see if it's unlocked yet. If you don't use WAL mode, then readers have to wait for writers, too.

You can still back up your SQLite database file; you just shouldn't do it in the middle of a write. Either use the SQLite backup API to manage concurrency for you, or back it up in SQL dump format. This isn't one of the usual reasons you shouldn't use SQLite. If you need synchronous replication, then you shouldn't use SQLite.

SQLite is robust against process crashes and even operating system crashes if fsync works as it should (big if, if your data is important), but not against disk failure.

In most of the cases when you shouldn't use SQLite, you should still just upgrade one step to Postgres, not some random NoSQL thing or Google-scale thing.


I'm still waiting for "You might not need React"

We have multiple mission-critical, industrial-grade WebSocket monitoring applications that have been running rock-solid for the last eight years without any hiccups in manufacturing environments. It seems like you're taking an easy-to-maintain codebase and turning it into a complex monstrosity.

Websockets are very low level, so first you want to use a library in order to work seamlessly with all 100 different implementations of websockets, but then you need to make your own protocol on top of it. And implement ping and reconnect.

I wrote a subsystem the other day that used websockets for a server to distribute video conversion tasks.

After futzing with silly things like file transfers and communication protocols, I chucked it out and rewrote it so the client does HTTP long polling of the server and uploads its renders via HTTP POST.

So much easier.


That used to be called “Comet” back in the early 2000s.

Did you try using an established library like socket.io, connectRPC etc? They handle a lot of the complexity.


Long polling is easy - all it means is your server does not immediately respond - nothing more to it than that.

Not really the case for user-facing applications. Proxies can time out, detecting stalls is hard, reconnection is expensive, TCP slow start means higher latency, the overhead is huge for small messages. Implementing it properly is not trivial, the WebSocket standard was created precisely to improve on those shortcomings. Good for you that it works for your case, though if all you need is to listen to a stream you might also be better served by SSE.

I was asking since Socket.io, for example, takes care of file uploads, reconnection, the whole HTTP upgrade flow, and is extremely easy to use, both on client and server. On top of that it can fall back to long-polling if WS is not available.

Here's a link for educational purposes: https://en.wikipedia.org/wiki/Comet_(programming)


Long polling is great for most things that don't need a realtime push. It just gets to be a strain on a server if you've got to set up and tear down lots of those connections from lots of users. Keeping a socket alive is a lot less resource intensive. Maybe it sounds stupid, but I've even converted PHP code that responded to long polling to handle the same polling over a socket to save resources. Most of my apps that need some kind of lazy updates actually work this way, and fall back to REST polling the same services if the socket is down.

I liked vert.x's strategy of seamlessly downgrading the form of connection based on what is available.

Vert.x is great! I'm missing it lately with Node. At least with Vert.x you get a stack trace when you block the event loop by accident...

> It makes your server code more complex.

And that is why we have frameworks that, at least in the case of WebSockets, make things as easy as regular old REST.


I personally view WebSockets as a nicer TCP that has all the messaging functionality you end up building anyway than as an alternative to HTTP.

Maybe I'm naive, but I thought it's: if you need stateful, use websockets; else, use short/long polling or SSE.

With HTTP streaming the browser shows that it's still loading data. Is there some mitigation for it after the initial loading?

That sounds less like a problem with HTTP streaming (initiated from JavaScript) and more like a page with some hanging resource.

The fetch API is asynchronous. The initial page load would deliver the payload that then initiates the streaming connection in the background.

I'm guessing you would use JS to fetch() the stream resource separately.

Sure, it would be just cool to not have to do that

Not sure what you mean. How else are you going to make the request if not using fetch()?

You could use an iframe as well, with certain caveats.

Reads like a series of strawman arguments if you replace "WebSockets" with socket.io.

  - "messages aren’t transactional": You can process a request and return a value to the sender in the socket.io application layer. Is that transactional enough?
  - "If you’re sending messages that don’t necessarily need to be acknowledged (like a heartbeat or keyboard inputs), then Websockets make a great fit". But socket.io has acknowledgements.
  - "When a new WebSocket connection is initiated, your server has to handle the HTTP “upgrade” request handshake.". You can bypass the handshake and go straight to WS even with plain Websockets, and if you don't, socket.io handles the upgrade for you pretty nicely, so you're not parsing HTTP headers yourself.

It's a good thing I didn't then :shrug:

Websockets are a web standard, socket.io is a userland framework


It's like arguing web components suck because there are all these problems you need to solve, while pretending the frameworks (React, Vue, Angular, ...) that solve all those problems don't exist.

Why do you need to implement your own web socket server? Why not use AWS appsync events?

One thing I couldn’t get working with websockets is keeping connections active during code deployments without disconnecting currently connected clients.

Sounds very tricky to me to get right even at scale.


The trick is to make the connection stateless, i.e. any client can connect to any server (just like plain HTTP). Then when there's a new deployment the websocket connection will be terminated and the client can reconnect instantly, automatically finding the next available server.
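The client side of that is a small reconnect loop (a sketch; the URL and backoff numbers are arbitrary):

  // Since any server can take the connection, the client just dials
  // again after a deploy kills its socket
  function connect(attempt = 0) {
    const ws = new WebSocket('wss://example.com/live')
    ws.addEventListener('open', () => { attempt = 0 })
    ws.addEventListener('close', () => {
      const backoff = Math.min(30_000, 1000 * 2 ** attempt)
      setTimeout(() => connect(attempt + 1), backoff)
    })
    return ws
  }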

You probably do. Reliable SSE is a complete nightmare.

Why?

Discord and Slack do the article's suggestion of using websockets for the receiving side only (mostly) and having you switch to HTTP on the calling side. It works pretty well. You have to keep two sets of books, but the websocket side is almost always for events that you, the client, should be responding to, so it works somewhat like a reverse HTTP that still plays nicely with your firewall. It also allows Discord to implement sharding that is trivial from the client's perspective. It really is clever as hell: scaling up a bot/integration is as easy as just turning on sharding and launching multiple instances of your bot. It handles spreading events across them.

The author throws away their own suggestion but it clearly works, works well, and scales well into "supermassive" size. They don't even mention the real downside to web sockets which is that they're stateful and necessarily tied to a particular server which makes them not mesh at all with your stateless share-nothing http servers.


I don't need them but I do like them.

I see the shiny thing and I'm not delusional enough to think I need it.


I think at this point in my career my goal is to continue to never, ever, work on a public-facing website. 20 years into this foray of a career and I’ve avoided it so far.

My HTTP streaming has slowed to more of a trickle the last couple of years.

setInterval.

just use meteor.js https://www.meteor.com/ ?

WebSockets can't go through proxies.

I think what you are getting at is that websockets aren't as simple as http traffic through a proxy, but you absolutely can use proxies and ws connections just fine and for a variety of reasons.

For all the other comments: parent is probably talking about forward proxies, and to their point, many forward/enterprise proxies have configurations which cause websockets to break, and it is a pain to debug this if you have many enterprise customers.

Echoing this. At $DAYJOB some 5-10% of customers will fail to initiate a websocket connection, even over wss:// despite plain HTTPS requests working fine. This is a client-side issue with whatever outdated HTTP CONNECT implementation the enterprise has.

This isn't based on any facts

Works completely fine in Haproxy

I use them through nginx/Cloudflare. They work fine.

I've definitely used websockets through nginx

They even go through Cloudflare.

or Fly.io

Says who?


The article forgot to mention that websockets add state to the server! Load balancing will require sticky sessions. At scale this tends to mean separating websocket servers completely from http servers.
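For reference, the sticky-session shape of that in nginx looks something like this (upstream addresses are made up; ip_hash is the crudest stickiness option):

  upstream ws_backend {
    ip_hash;                  # same client IP -> same backend
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
  }
  server {
    location /ws {
      proxy_pass http://ws_backend;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_read_timeout 1h;  # don't kill idle sockets
    }
  }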


