I tried DNS over TLS (somewhat similar) and it has some potential. But not with those strict timeouts: 1.1.1.1 closes the TCP connection almost immediately after the query response, while 9.9.9.9 waits a bit longer, about 10 seconds (I need to check again).
So every time you want to make a query, you have to wait several RTTs before getting a response.
The connection needs to be kept open for as long as possible, at least 5 minutes.
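For what it's worth, you can measure the idle lifetime yourself. A rough sketch (it opens an idle TLS session on port 853 without sending a query, which the server may treat differently from a post-query idle connection):

```python
import socket, ssl, time

def idle_lifetime(host, port=853, max_wait=60):
    """Seconds until the DoT server closes an idle TLS connection, or None."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.settimeout(max_wait)
            start = time.monotonic()
            try:
                if tls.recv(1) == b"":           # b"" means the peer closed cleanly
                    return time.monotonic() - start
            except socket.timeout:
                return None                       # still open after max_wait seconds
    return None

print("1.1.1.1:", idle_lifetime("one.one.one.one"))
print("9.9.9.9:", idle_lifetime("dns.quad9.net"))
```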
I used stubby as a forwarder with idle_timeout: 6500000 (the idle timeout, in milliseconds). The connection still gets closed by the remote party, not by stubby.
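For reference, a minimal stubby.yml excerpt along those lines (the upstream entries are illustrative, not my exact config):

```yaml
# Excerpt, not a complete stubby.yml.
resolution_type: GETDNS_RESOLUTION_STUB
dns_transport_list:
  - GETDNS_TRANSPORT_TLS
idle_timeout: 6500000            # keep-alive, in milliseconds
upstream_recursive_servers:
  - address_data: 1.1.1.1
    tls_auth_name: "cloudflare-dns.com"
  - address_data: 9.9.9.9
    tls_auth_name: "dns.quad9.net"
```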
It doesn't matter what they were designed for. Over TCP they need to behave that way; otherwise this is a solution only for people with <10 ms latency to the server, which is not a whole lot of people.
I'll argue that the TCP and TLS handshakes take more processing power than keeping the connection open.
Which I doubt is a problem for Cloudflare or Quad9. In any case, a TCP-based DNS service needs to consider these things; otherwise it becomes unusable due to very high response times.
A standard 8 GB system running Debian 9 gives me 1048576 max file descriptors, and I'm sure that can still be tuned higher.
The default socket receive and send buffers are ~200 KB each, so you would actually need about 400 GB of memory to have each of those 1048576 file descriptors backed by a unique socket.
And if you were keeping connections open for 5 minutes as suggested, that would still limit you to only ~3,500 new clients per second.
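Back-of-the-envelope, the numbers work out like this (a sketch; actual buffer sizes vary by kernel and sysctl settings):

```python
# Rough capacity math for a DoT server holding connections open.
# Assumes the ~200 KB default per-direction socket buffer mentioned above.
fds = 1_048_576                   # max file descriptors on the 8 GB Debian 9 box
buf_per_socket = 2 * 200 * 1024   # ~200 KB receive + ~200 KB send

memory_needed = fds * buf_per_socket
print(f"{memory_needed / 2**30:.0f} GiB")       # ~400 GiB just for socket buffers

idle = 5 * 60                     # 5-minute idle timeout, in seconds
print(f"{fds / idle:.0f} new clients/second")   # ~3495
```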
I do actually agree that they need a longer idle timeout on these connections, but I just wanted to point out that comparisons with the processing power required to set up a TLS connection aren't apt.
I'm pretty sure they don't HAVE to use the defaults, and for something like DNS they probably shouldn't. The buffer should probably be limited to the largest request segment needed to set up the TLS/HTTPS connection in the first place, which, just guessing, would be closer to 1 KB.
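For example, a sketch of shrinking the buffers via the standard SO_RCVBUF/SO_SNDBUF socket options (4096 bytes is an illustrative figure, not a tested recommendation):

```python
import socket

# Sketch: shrink per-connection socket buffers instead of taking the
# ~200 KB defaults. The kernel rounds (typically doubles) the requested
# size and enforces a minimum.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Set on the listener so accepted connections inherit the small buffers.
srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
srv.bind(("0.0.0.0", 853))   # DoT port, assuming this process is the listener
srv.listen()

conn, addr = srv.accept()
# getsockopt shows the kernel-adjusted (typically doubled) value.
print(conn.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```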