So the basic takeaway seems to be: don't bother using async patterns for single, low-latency connections to a server on your local network.
For anything where you're dealing with thousands of connections from random Internet hosts, "just spawn a thread for it" does not cut it. If you take that approach, you're setting yourself up to be accidentally DoS'd at some point in the near future. Async, on the other hand, has more than proven itself to be apt for this kind of scenario.
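To make the contrast concrete, here's a minimal sketch of the async approach using Python's asyncio (an illustrative echo-style server, not anyone's production code): each connection is a cheap task on one event loop rather than an OS thread, so tens of thousands of idle connections cost very little.

```python
import asyncio

async def handle(reader, writer):
    # Each connection is a lightweight coroutine task, not an OS thread;
    # while this one awaits I/O, the loop services other connections.
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Port 0 asks the kernel for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # Act as our own client to exercise the server once.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'HELLO\n'
```

The per-connection logic still reads linearly; the scheduling just happens at the `await` points instead of in the kernel's thread scheduler.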
I'd want data. The system I work on does in fact spawn a thread to handle each and every connection, and each connection thread spawns numerous child threads to exploit available parallelism within the request. The code is fully blocking and linear, and anyone can read it and see what it is doing. The system in question is one of the largest public network services on earth.
I am very skeptical of the idea that you must not handle thousands of connections with a thread per connection. High tens of thousands of threads per core is the minimum level where I would start to worry.
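For comparison, the thread-per-connection style being defended here looks something like this (a toy sketch, not the actual system described above): fully blocking, linear per-connection code.

```python
import socket
import threading

def handle(conn):
    # Fully blocking, linear per-connection logic: read, reply, close.
    # Anyone can read this top to bottom and see what it does.
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def serve(server):
    while True:
        conn, _ = server.accept()
        # One OS thread per connection.
        threading.Thread(target=handle, args=(conn,)).start()

server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

# Exercise the server once as a client.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
```

Each thread costs stack memory and scheduler overhead, but on modern Linux that overhead is small enough that the readability argument above carries real weight until connection counts get very large.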
Additionally, a TCP connection is essentially an operating system resource: you need to set aside a port and space for a send and receive buffer. It might seem fine for a client to open hundreds of connections, but imagine being a server with thousands of clients each opening hundreds of connections to you. You very quickly run out of resources and either have to close connections or reject new ones.
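One concrete place this resource limit bites (on Unix-like systems, assuming the stdlib `resource` module is available): every open connection consumes a file descriptor, and the per-process descriptor limit is often a few thousand by default.

```python
import resource

# Each open TCP connection costs one file descriptor, plus kernel memory
# for its send and receive buffers, so RLIMIT_NOFILE caps concurrency.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# A server expecting many clients typically raises the soft limit
# up to the hard limit (raising the hard limit itself needs privileges).
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Even with the descriptor limit raised, the kernel buffer memory per connection (tunable via sysctls like `net.ipv4.tcp_rmem`/`tcp_wmem`) still adds up.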
Linux starts to act weird around 200,000 concurrent connections in my experience, even with aggressive sysctl tuning. You end up with weird edge cases like netstat literally taking 15 minutes of CPU time (in kernel) before it dumps the list of connections to stdout.
The problem is that it doesn't scale linearly. There's some O(n^4) algorithm being used in netstat or some kernel syscalls or something. Once you go over 250k, things get _really_ weird.
Solutions that pull the TCP stack out of the kernel perform so much better because they're bypassing all the internal bureaucracy that the kernel otherwise performs to make it as easy as possible for userspace applications to use the network without stepping on other applications' toes.
The kernel socket API is designed so that programs have to do as little thinking as possible to get their own personal slice of the shared and noisy network. It provides an easy abstraction, and that requires the kernel to do a lot of messy stuff for you:
- When you're using TCP sockets, the kernel copies everything your application writes into a buffer and holds it there until the other side acknowledges receipt, in case it needs to be retransmitted. If the socket's send buffer fills up, your application blocks on I/O until some space is freed.
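You can watch that buffering behavior directly. Here's a small sketch (using a Unix socket pair for portability; TCP sockets behave the same way): with the socket set non-blocking, writes succeed until the kernel's send buffer is full, at which point the kernel signals "would block" instead of stalling the process.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

chunk = b"x" * 65536
sent = 0
try:
    while True:
        # Each successful send() is the kernel copying our bytes into
        # its internal buffer; nothing has been read on the other end.
        sent += a.send(chunk)
except BlockingIOError:
    # The kernel buffer is full. A blocking socket would stall right
    # here until the peer drained some data and freed up space.
    pass

print(sent)  # total bytes the kernel buffered before pushing back
a.close()
b.close()
```

This is exactly the backpressure mechanism the bullet above describes, just made visible.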
- It holds closed connections in a lingering TIME_WAIT state long after close() returns, just in case the last bytes need to be retransmitted. This behavior can be sidestepped (e.g., via SO_LINGER), but it's the default.
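For illustration, here's a sketch of opting out of that lingering behavior with SO_LINGER (setting it before close; this is the standard sockets-API knob, shown via Python):

```python
import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# struct linger { int l_onoff; int l_linger; }
# l_onoff=1, l_linger=0: close() sends an RST instead of a FIN, and the
# socket skips TIME_WAIT entirely. Use with care: in-flight data is lost.
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))

onoff, linger = struct.unpack(
    "ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8)
)
print(onoff, linger)  # 1 0
s.close()
```

Servers drowning in TIME_WAIT sockets more often reach for `SO_REUSEADDR` or sysctl tuning instead, since the abortive close trades correctness for resources.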
- It takes care of all the congestion control for you, but it's tuned for the general case, and as a result there are a lot of edge cases which perform very badly for the problem they're trying to solve. Redis is probably one such edge case.
Of course, all of this is fine and desirable for general applications, but it ends up being problematic if you're trying to solve a problem where performance is the chief concern.
It's tempting to say the problem is that the kernel has to do way too much to provide that easy abstraction, but really the problem is that the kernel provides no way around it. You pretty much have the option of using its cushy stream abstraction at the cost of performance, or you use a userspace TCP stack on raw sockets, which requires running as root and disabling TCP in the kernel (otherwise the kernel stomps all over your TCP negotiations[1]).
There are some other transport layer protocols (SCTP, DCCP, etc.), as well as application layer protocols built on UDP, that remove some of the abstractions TCP provides and as a result require less in-kernel bureaucracy, but those solutions don't seem to be very popular or well-supported.
It would be nice if the kernel would provide some lower level system calls that could be selectively used to move parts of TCP into the application (e.g., retaining copies of data in case of re-transmission). Alas, I don't think there's much push for that, because a) it's hard, and b) the current situation is fine for 99% of network applications.
The primary advantage GridFTP has over simply using tar+netcat for performance is that GridFTP can multiplex transfers over multiple TCP connections. This is helpful as long as the endpoint systems limit the per-connection buffer size to some value less than the bandwidth-delay product (BDP) between them. If you've got to bug sysadmins to get GridFTP set up for you on both endpoints, you might as well just ask them to increase the maximum TCP buffer size to match the BDP.
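To put a number on that BDP claim, here's the arithmetic for a hypothetical path (the link speed and RTT below are made-up illustration values):

```python
# Bandwidth-delay product: the number of bytes that must be "in flight"
# (unacknowledged) to keep the pipe full.
# Hypothetical path: 1 Gbit/s link with a 100 ms round-trip time.
bandwidth_bits_per_s = 1_000_000_000
rtt_s = 0.100

bdp_bytes = int(bandwidth_bits_per_s / 8 * rtt_s)
print(bdp_bytes)  # 12500000, i.e. ~12.5 MB
```

So a single connection on that path needs roughly 12.5 MB of TCP buffer to saturate the link; if the buffer is capped at, say, 4 MB, you'd need several parallel connections (or a bigger buffer) to fill the pipe.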
EDIT: Sorry, "multiplex" is not the right word to describe that. It's more like GridFTP "stripes" files across multiple connections; it divides the file into chunks, sends the chunks over parallel connections, and reassembles the file at the destination.
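The striping idea is simple enough to sketch (a toy model of the chunk/reassemble logic, not GridFTP's actual protocol): tag each chunk with its offset so the receiver can place chunks correctly even when they arrive out of order over the parallel connections.

```python
def stripe(data: bytes, n: int):
    # Divide data into n contiguous chunks, each tagged with its offset,
    # as a striped transfer would before sending over n connections.
    size = -(-len(data) // n)  # ceiling division
    return [(off, data[off:off + size]) for off in range(0, len(data), size)]

def reassemble(chunks):
    # Chunks may arrive in any order; offsets say where each one goes.
    out = bytearray(sum(len(c) for _, c in chunks))
    for off, c in chunks:
        out[off:off + len(c)] = c
    return bytes(out)

data = b"abcdefghij"
shuffled = list(reversed(stripe(data, 3)))  # simulate out-of-order arrival
print(reassemble(shuffled))  # b'abcdefghij'
```

Each connection gets its own congestion window and kernel buffer, which is why striping helps when per-connection buffers are capped below the BDP.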