Million WebSockets and Go (2017) (gbws.io)
210 points by riobard on Dec 23, 2019 | hide | past | favorite | 34 comments



So some time ago, when I was playing around with my toy project (RaspChat), I noticed that creating 2 channels and a goroutine for every incoming WebSocket connection is not the answer. I was designing RaspChat to run on a 512MB Raspberry Pi, and I was bottlenecked by GC and memory consumption around 3-4K connections. After loads of optimizations I got it to around 5K. Digging deeper, I found that I would have to maintain a pool of goroutines (like a threadpool) and write an event loop. I was instantly pulling my hair out. I was sacrificing so much of the simplicity and flexibility of Node.js just because I was trying to avoid an event loop and wanted to use channels (I had done too much Erlang in the months before starting the project and couldn't think in anything other than processes and messages). I got a backlash on my release (https://github.com/maxpert/raspchat/releases/tag/v1.0.0-alph...) from the Go community telling me how I was misusing deserializers, leaving loopholes in file upload, and didn't know shit about the language.

At that time I found uws (https://github.com/uNetworking/uWebSockets.js), which easily got me to 10K, and I thought, "I would rather bet on a community investing in an efficient WebSocket event loop than write my own sh*t". Don't get me wrong; I love Golang! Seriously, I love it so much I have been pushing my company to use it. I just don't want to glorify the language as a silver bullet (which its fanboys usually do). I would never implement complicated business logic in it that involves many moving pieces. When my business requires dealing with the shape of an object and mixing and matching things to pass data around, I would rather choose a language that lets me deal with object shapes. Go has its specific use-cases and strengths; people advertising "move it to Go and it will be faster than Java/C#/Node.js" either haven't done it or haven't dealt with the complexity of maintaining it.


The overhead of goroutines is well known. It's often advertised as being only 4 KB per goroutine, but as your case shows, sometimes even that is too much.

You got bitten by that, but that's not the fault of Go.

The OP was bitten as well and describes a solution in Go. You've solved it by using Node.

Still, your post is quite destructive. Just get over it.


The way these guys did it is quite interesting:

http://marcio.io/2015/07/handling-1-million-requests-per-min...

I have experimented with something similar, i.e. a pool of goroutines to which work is dispatched (in my case, invoking anonymous functions passed via the input channels).
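A minimal sketch of that pattern — a fixed pool of worker goroutines draining anonymous functions from a channel. The names (parallelSum, jobs) are invented for illustration, not from the linked article:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum dispatches one closure per integer to a fixed pool of
// worker goroutines and returns the total. The pool bounds the number
// of live goroutines regardless of how many jobs arrive.
func parallelSum(workers, upto int) int {
	jobs := make(chan func(), 64)

	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for job := range jobs {
				job() // invoke the anonymous function passed via the channel
			}
		}()
	}

	var mu sync.Mutex
	sum := 0
	for i := 1; i <= upto; i++ {
		i := i // capture the loop variable per iteration
		jobs <- func() {
			mu.Lock()
			sum += i
			mu.Unlock()
		}
	}
	close(jobs) // workers exit once the channel drains
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(parallelSum(4, 100)) // 5050
}
```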


> When my business requires dealing with the shape of an object and mixing and matching things to pass data around, I would rather choose a language that lets me deal with object shapes.

Could you elaborate on this a little?


Since he mentioned Erlang, I bet he's talking about pattern matching.


It’s quite frustrating, for me anyway, using other languages after Erlang. Pattern matching and its related features are addictive.
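For readers wondering what's missing on the Go side: Go's nearest analogue is a type switch on an interface value, which dispatches on a value's dynamic type but cannot destructure nested shapes the way Erlang or Rust patterns can. A hedged sketch with invented message types (Join, Say):

```go
package main

import "fmt"

// Two hypothetical message shapes a chat server might receive.
type Join struct{ Room string }
type Say struct{ Room, Text string }

// describe dispatches on the message's dynamic type. Unlike Erlang or
// Rust pattern matching, the switch can only bind the whole value; any
// destructuring of fields happens in ordinary follow-up code.
func describe(msg interface{}) string {
	switch m := msg.(type) {
	case Join:
		return "join " + m.Room
	case Say:
		return fmt.Sprintf("say %q in %s", m.Text, m.Room)
	default:
		return "unknown message"
	}
}

func main() {
	fmt.Println(describe(Join{Room: "lobby"})) // join lobby
	fmt.Println(describe(Say{Room: "lobby", Text: "hi"}))
}
```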


Pardon the obligatory throwing in of Rust, but it sounds like you were okay switching languages anyway - have you considered Rust as an option? It doesn't have GC and has a very healthy ecosystem (recently with async primitives officially supported by the syntax). It also has the pattern matching you seem to mean. Perhaps it would help you solve your optimization needs? Otherwise, I'd love to hear why it's not a good use case for it since I'm still exploring the language myself.


This kind of promotion creates the tense atmosphere around Rust in the community.

I wonder if anyone has read the linked article?

The overhead of goroutines is well known. The article describes the problem and a solution.

Now someone who got bitten by the overhead of goroutines complains in an (understandably) slightly bitter tone. He has a good explanation of the issue and of why he chose Node rather than Rust.

Citation:

>> I started exploring various options ranging from Rust, Elixir, Crystal, and Node.js. Rust was my second choice, but it doesn't have a good, stable, production ready WebSocket server library yet. Crystal was dropped due to conservative nature of Boehm GC, and Elixir also used more memory than I expected. Node.js surprisingly gave me a nice balance of memory usage and speed.

Then someone who doesn't seem to have read any of that comes around and smartly calls out "Use the awesome Rust".

Even as a Rust user myself I get annoyed.


Where is that citation from? Are you quoting from somewhere? I can't find it in the article.


At that point (3 years back) Rust had no good async IO library. All the recent progress in Rust and Tokio now makes it an interesting choice.


This is still super interesting two years later, but does anyone have an update?

Susheel Aroskar, a Netflix engineer, gave a talk about push notifications: https://www.infoq.com/presentations/neflix-push-messaging-sc... (2018)


https://lwn.net/Articles/775238/

Dave Doyle and Dylan O'Mahony also did something pretty amazing with WebSockets for Bose.


It's surprising to me that you apparently have to fight for memory usage for these cases when using Go.

A while ago I ran a (quite naively written) nodejs application that maxed out at ~700k WebSocket connections per server - using only 4GB of RAM. Here CPU became the bottleneck.


Go's concurrency design trades off memory usage for productivity; instead of red-blue functions where you have to explicitly design for function interrupt/yield points with the async keyword, you can just write sequential code and the runtime will handle the rest. The downside to this approach is that often the stack will have to be copied during the switching process vs the stackless approach preferred by Node.js, Rust, C# etc.

See the excellent Fibers Under a Magnifying Glass paper by Microsoft Research: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...


This is not how Go "works" overall; you're talking about the size of the goroutine stack, which is 4KB by default. In a scenario with a lot of connections, yes, it's going to add up if you use a 1:1 connection/goroutine model, but outside of that Go uses less memory than Node / C# / Java / Python etc.

So I wouldn't say "Go trades off memory usage for productivity", since Go is widely used precisely for its low memory footprint.

Same reason why Go makes sense in services like Kubernetes, where each pod is in the range of two-digit MBs; it wouldn't be possible with the languages mentioned above.

Edit: In your edit context it makes more sense :)


That said, as an operator of a number of fairly large Kubernetes clusters (100-500 nodes), I might prefer Java's memory tunability to Go's garbage collector simply not caring about the max memory a system can offer.

At scale, we need to heavily oversize the kubernetes masters to deal with the spikiness of go memory consumption. It's possible for a kube apiserver to average 2GB of memory use and then jump to 18GB after suddenly handling more requests than usual and get OOM-killed. I'd much rather it simply slow down a bit than behave so erratically.

This is a common thing across all Go programs that handle data: since Go's garbage collector doesn't try to keep memory use within some upper bound, if you allocate and throw away a bunch of objects you'll quickly run out of memory on any low-memory system or container even though there's a ton of memory ready to be reclaimed.

Even a major library like the official AWS SDK's S3 client had this wrong as recently as 2017 (the solution was to use sync.Pool to avoid throwing away buffers): https://github.com/aws/aws-sdk-go/pull/1784

Python should perform much better from a memory perspective in these situations: its weakness would be the deserialized size that objects blow up to in memory, but its use would be bounded. Java would also handle this pretty effortlessly, though with a high enough rate of garbage production might hit a few short GC pauses that last far less time than a process restart.

Really wish Go were better about this. Imagine that a simple Go cron job that uploads a backup to S3 at under 160MB/sec could risk OOM-killing your server if you don't set cgroup memory caps on all your Go processes.
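The sync.Pool fix referenced in that pull request looks roughly like this — a hedged sketch with invented names (bufPool, uploadPart), not the SDK's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

const partSize = 5 * 1024 * 1024 // 5 MB, a typical multipart-upload part size

// bufPool hands out reusable part-sized buffers. Without it, every part
// upload allocates a fresh 5 MB slice that sits as garbage until the next
// GC cycle — exactly the spiky memory profile described above.
var bufPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, partSize)
	},
}

// uploadPart borrows a buffer, lets the caller fill it, and returns the
// buffer to the pool for reuse instead of discarding it.
func uploadPart(fill func([]byte) int) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // recycle instead of leaving garbage behind
	n := fill(buf)
	// ... here the real code would send buf[:n] to S3 ...
	return n
}

func main() {
	n := uploadPart(func(b []byte) int {
		copy(b, "hello")
		return 5
	})
	fmt.Println(n) // 5
}
```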


> where each pod is in the range of two-digit MBs; it wouldn't be possible with the languages mentioned above.

Not sure about NodeJS and Python, but it's certainly doable with Java and C#. It's just that people don't take the time to configure the JVM/CLR correctly.

There's nothing magical about golang that you can't do in C# (and soon enough, in Java with the addition of value types). Arguably, C# and Java's value type implementations are superior anyway.


Kubernetes was released in 2015; at that time it wasn't possible to run some Java / C# servers with heap settings below 256MB (-Xms). I'm pretty sure it's still the case with Java as of today. Try to run some service with -Xmx/-Xms at 128MB and tell us how it goes.


> Try to run some service with -Xmx/-Xms at 128MB and tell us how it goes.

What does the service do? An API call that returns the current time? A batch processor? A payment portal? The memory usage obviously depends on the type of work performed.

Furthermore, there are already offerings like https://quarkus.io/, Micronaut, and others that make use of native-image compilation for even smaller footprints.


You don't even want to use -Xmx/-Xms in a container setting anyway. You can set the JVM to use a percentage of the container's memory (e.g. -XX:MaxRAMPercentage).


Productivity in this case being in the eye of the beholder? I'd argue that people experienced in how Node works wouldn't have to think too much, since async is the default. I agree with your general sentiment though.


Async by default can often lead to weird race conditions if you are not careful.


Yes; you still need to lock/synchronize shared resources in an async context. Even if the environment is single-threaded.


Do you have an example?

If it is single-threaded, and we are talking about shared variables, I thought I could assume the runtime is not going to pause my code execution, switch contexts, and run other code midway through my callback handler.

If we are talking about shared external resources (e.g. who can update a cloud blob) then we could have a proxy for that as a variable. You might need retry logic, and it could get tricky in that respect.


You can rely on Node preventing data races, but those are distinct from race conditions. A race condition in the logic of your code can happen any time two "threads" of execution (here, a thread could be a chain of asynchronous callbacks) interleave their operations. One of them can do something with a resource the other was using, unless you use some kind of synchronization to keep the other from touching the resource until the first thread is done with it.

For example, two callback chains could start using the same database connection object. Perhaps one chain was in the middle of setting up a transaction when it needed to wait for some other async resource to load, and the other chain comes in and does something with the connection. Now it's in an unintended state, because the object was used by two different "threads" of callback chains.
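The distinction carries over to Go: the sketch below is free of data races (every access to the balance holds a mutex), yet a check-then-act sequence spread across two lock acquisitions is still a race condition. A hedged illustration with invented names (account, withdrawRacy, withdrawSafe), not code from the thread:

```go
package main

import (
	"fmt"
	"sync"
)

// account is free of data races: every read and write holds the mutex.
type account struct {
	mu      sync.Mutex
	balance int
}

// withdrawRacy checks and then acts under two separate lock acquisitions.
// Another goroutine can run between the check and the act, so two
// withdrawals can both pass the check against the same balance — a race
// condition with zero data races.
func (a *account) withdrawRacy(amount int) bool {
	a.mu.Lock()
	ok := a.balance >= amount
	a.mu.Unlock()
	if !ok {
		return false
	}
	a.mu.Lock()
	a.balance -= amount
	a.mu.Unlock()
	return true
}

// withdrawSafe holds the lock across the whole check-then-act sequence,
// which is the synchronization the parent comment is describing.
func (a *account) withdrawSafe(amount int) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.balance < amount {
		return false
	}
	a.balance -= amount
	return true
}

func main() {
	a := &account{balance: 100}
	fmt.Println(a.withdrawSafe(60)) // true
	fmt.Println(a.withdrawSafe(60)) // false: insufficient funds
}
```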


This is not the case, if your callback then goes off and does something asynchronous.


Yes sorry I was thinking in terms of code that uses callbacks, not the async keyword.

What I mean is that it's distinct from threading, where the following code could be interrupted between the first and second line of the function by something that updates global:

    var global = 0

    function addTheseToGlobal(a, b) {
      const im = a + global
      return im + b
    }


I’d then have Node handle the “simple” socket part, and have the complexity, which is race-condition prone, handled by a language better suited for that, responding to the async Node implementation? Edit: spelling


You don't have to fight for memory; it's the overhead of goroutines / default HTTP connections in a scenario with a lot of connections. By default, Go uses far less memory than Node.

And in Node, the only way to get semi-decent performance is to use ultra-optimized external C/C++ libraries.


CPU was probably the bottleneck because of GC. You could probably tune the GC to get better results.


There's a brief mention of the load balancer (nginx) in front of the Go servers; I'm curious whether there's anything interesting happening there. I'd imagine that if you lose a server, all of its clients will try to reconnect and the traffic will be spread across the existing servers. That's all fine and good, but presumably when you bring up a new server to replace the failed one, it'll be seriously underutilized. Is there some easy solution here in nginx-land?


For WebSocket? Yes (https://github.com/SocketCluster/loadbalancer), but you would have to introduce another layer (AFAIK) that detects failure and reconnects to a healthy target without informing the client.


For mail.ru, I was expecting [1] that you would use Tarantool for this task.

[1]https://hackernoon.com/tarantool-when-it-takes-500-lines-of-...




