The SO_LINGER page, or: why is my TCP not reliable (2009) (netherlabs.nl)
68 points by bluestreak on June 12, 2021 | 8 comments



This is what we used to call a "Berkeleyism". It's an error in the UC Berkeley implementation of TCP in the kernel, which, being free, became standard. (Yes, there was TCP/IP before Berkeley UNIX, at least four implementations.)

The trouble is that UNIX does not take "close()" very seriously. This is because "close()" is implicit on exit, and having "exit()" stall waiting for I/O is undesirable. So close is non-blocking. When you close a file after writing, the close returns immediately, even if the data has not yet been written out. "Calling close() DOES NOT guarantee that contents are on the disk as the OS may have deferred the writes."
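A minimal sketch of that gap (the file name and message are made up for illustration): write() followed by close() gives no durability guarantee, and only an explicit fsync() forces the data to stable storage first.

    /* Hypothetical example: close() does not flush to disk; fsync() does. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("example.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char msg[] = "hello\n";
        if (write(fd, msg, sizeof msg - 1) < 0) { perror("write"); return 1; }

        /* Without this, close() may return before the data reaches the disk. */
        if (fsync(fd) < 0) perror("fsync");

        if (close(fd) < 0) perror("close");
        return 0;
    }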

So sockets got the same unchecked semantics as files.

Now, logically, you'd call "fsync()" to get a socket fully flushed. But, another Berkeleyism is that sockets were originally distinct from the file system, because networking was bolted on as an afterthought and not well integrated. Hence "send" and "recv" instead of "read" and "write". Being able to use read and write on sockets came later. So, sockets have "shutdown".
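For sockets, the rough equivalent of that flush is the pattern the linked article recommends: shutdown() the sending side, then read until the peer closes. A sketch, assuming fd is an already-connected TCP socket and with error handling trimmed:

    #include <sys/socket.h>
    #include <unistd.h>

    void graceful_close(int fd) {
        char buf[4096];

        /* Tell the peer we are done sending; data already queued is still flushed. */
        shutdown(fd, SHUT_WR);

        /* Drain until the peer closes its side, so we know it saw our FIN
           and has finished sending whatever it still wanted to say. */
        while (read(fd, buf, sizeof buf) > 0)
            ;

        close(fd);
    }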


The semantics of close() are the way they are not because of issues around exit, but because of shared descriptors.

Calls that duplicate descriptors, such as fork() or dup(), result in multiple descriptors referring to the same network socket. These descriptors need to be closable individually via close() without actually shutting down the socket or modifying its state.

If close() terminated a connection à la shutdown(), it would be impossible for a process holding a network socket to fork a temporary subprocess: the child exiting, and thereby implicitly closing its copy of the descriptor, would tear down the parent's connection.

A different call is necessary to indicate that you wish to terminate the underlying network socket state.
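A sketch of the distinction, assuming sock is a connected socket (names are illustrative): close() only drops one reference to the shared socket, while shutdown() affects every descriptor that refers to it.

    #include <sys/socket.h>
    #include <unistd.h>

    void demo(int sock) {
        int copy = dup(sock);       /* second descriptor, same underlying socket */

        close(copy);                /* connection stays up: sock still refers to it */

        shutdown(sock, SHUT_RDWR);  /* tears down the connection for all duplicates */
        close(sock);                /* release the last reference */
    }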


Usually when people criticize UNIX I/O behavior for not being proper, what they want is for it to do that thing NT does where processes freeze indefinitely at the first sign of trouble and you have to pull out the Task Manager and kill the thing. The nice thing about UNIX is that pretty much the only time things get that bad is when you implement SIGCHLD handling wrong, and in the cases where we do want that behavior, like with linger, we can specify the number of seconds it should hang. In fact, NT is so zealous about the process-freezing thing that even for the signals it's required to be able to deliver by the ANSI C standard, it will spawn a thread in your program just to do it. Being able to drop connections yolo style without the kernel having strong opinions is pretty important if, for example, your app is being hit by a Slowloris attack and you need to shed resources while still responding to requests from the main process. That's why proper operating systems are rarely production worthy.


Yes, as succinctly captured by the "worse is better" philosophy: https://www.jwz.org/doc/worse-is-better.html



Those better engineers picked the wrong hill to die on with EINTR, since it was never worse. It harmoniously reflects the way hardware works and hasn't threatened UNIX's long-term survival. Gabriel's ideas could be more appropriately applied today to Node and Rust, which made a devil's bargain by adopting a quadratic packaging model.


"When the socket is closed as part of exit(2), it always lingers in the background"

In other words, not closing the socket before exiting might actually be the easiest way to not have the connection aborted with pending data?

Note that Windows has slightly different, and perhaps the more expected, semantics for closing a socket with pending data:

https://docs.microsoft.com/en-us/windows/win32/api/winsock/n...

However, any data queued for transmission will be sent, if possible, before the underlying socket is closed. This is also called a graceful disconnect or close. In this case, the Windows Sockets provider cannot release the socket and other resources for an arbitrary period, thus affecting applications that expect to use all available sockets. This is the default behavior for a socket.


Actually, your quoted text applies when the SO_LINGER option is used, and the semantics are exactly the same on both platforms for this option.
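For reference, a sketch of opting in to that behavior explicitly with SO_LINGER (the five-second timeout is an arbitrary illustrative value); close() then blocks until the queued data is sent or the timeout expires:

    #include <sys/socket.h>
    #include <unistd.h>

    void linger_close(int sock) {
        struct linger lg = { .l_onoff = 1, .l_linger = 5 };  /* wait up to 5 seconds */
        setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
        close(sock);  /* blocks until queued data is sent or the timeout expires */
    }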

Matching the semantics of Unix for most things was a goal of Winsock. In those days the NT TCP stack was new (compared to Berkeley TCP), so a lot of effort was spent getting it to match that behavior so that applications did not hit unexpected glitches.




