Million WebSockets and Go (2017) (gbws.io)
210 points by riobard on Dec 23, 2019 | hide | past | favorite | 34 comments



So some time ago, when I was playing around with my toy project (RaspChat), I noticed that creating 2 channels and a goroutine for every incoming WebSocket connection is not the answer. I was designing RaspChat to run on a 512MB Raspberry Pi, and I was bottlenecked by GC and memory consumption around 3-4K connections. After loads of optimizations I got it to around 5K. Digging deeper, I found that I would have to maintain a pool of goroutines (like a threadpool) and write an event loop. I was instantly pulling my hair out. I was sacrificing so much of the simplicity and flexibility of Node.js just because I was trying to avoid an event loop and wanted to use channels (I had done too much Erlang in the months before starting the project and couldn't think in anything other than processes and messages). I got a backlash on my release (https://github.com/maxpert/raspchat/releases/tag/v1.0.0-alph...) from the Go community telling me how I was misusing deserializers, leaving loopholes in file upload, and didn't know shit about the language.

At that time I found uws (https://github.com/uNetworking/uWebSockets.js), which easily got me to 10K, and I thought, "I would rather bet on a community investing in an efficient WebSocket event loop than write my own sh*t". Don't get me wrong; I love Golang! Seriously, I love it so much I have been pushing my company to use it. I just don't want to glorify the language as a silver bullet (which its fanboys usually do). I would never implement complicated business logic in it that involves many moving pieces. When my business requires dealing with the shape of an object and mixing and matching things to pass data around, I would rather choose a language that lets me deal with object shapes. Go has its specific use-cases and strengths; people advertising "move it to Go and it will be faster than Java/C#/Node.js" either haven't done it or haven't dealt with the complexity of maintaining it.


The overhead of goroutines is well known. It's often advertised as being only 4 KB per goroutine, but as your case shows, sometimes even that is too much.

You got bitten by that, but that's not the fault of Go.

The OP was bitten as well and describes a solution in Go. You've solved it by using Node.

Still, your post is quite destructive. Just get over it.


The way these guys did it is quite interesting:

http://marcio.io/2015/07/handling-1-million-requests-per-min...

I have experimented with something similar, i.e. a pool of goroutines to which work is dispatched (in my case, invoking anonymous functions passed via the input channels).
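A minimal sketch of that pattern — a fixed pool of worker goroutines draining anonymous functions from a channel. The names (parallelSum, jobs) are invented for illustration, not from the linked article:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum dispatches one closure per integer to a fixed pool of
// worker goroutines and returns the total. The pool bounds the number
// of live goroutines regardless of how many jobs arrive.
func parallelSum(workers, upto int) int {
	jobs := make(chan func(), 64)

	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for job := range jobs {
				job() // invoke the anonymous function passed via the channel
			}
		}()
	}

	var mu sync.Mutex
	sum := 0
	for i := 1; i <= upto; i++ {
		i := i // capture the loop variable per iteration
		jobs <- func() {
			mu.Lock()
			sum += i
			mu.Unlock()
		}
	}
	close(jobs) // workers exit once the channel drains
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(parallelSum(4, 100)) // 5050
}
```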


> When my business requires dealing with the shape of an object and mixing and matching things to pass data around, I would rather choose a language that lets me deal with object shapes.

Could you elaborate on this a little?


Since he mentioned Erlang, I bet he's talking about pattern matching.


It’s quite frustrating, for me anyway, using other languages after Erlang. Pattern matching and its related features are addictive.
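For readers wondering what's missing on the Go side: Go's nearest analogue is a type switch on an interface value, which dispatches on a value's dynamic type but cannot destructure nested shapes the way Erlang or Rust patterns can. A hedged sketch with invented message types (Join, Say):

```go
package main

import "fmt"

// Two hypothetical message shapes a chat server might receive.
type Join struct{ Room string }
type Say struct{ Room, Text string }

// describe dispatches on the message's dynamic type. Unlike Erlang or
// Rust pattern matching, the switch can only bind the whole value; any
// destructuring of fields happens in ordinary follow-up code.
func describe(msg interface{}) string {
	switch m := msg.(type) {
	case Join:
		return "join " + m.Room
	case Say:
		return fmt.Sprintf("say %q in %s", m.Text, m.Room)
	default:
		return "unknown message"
	}
}

func main() {
	fmt.Println(describe(Join{Room: "lobby"})) // join lobby
	fmt.Println(describe(Say{Room: "lobby", Text: "hi"}))
}
```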


Pardon the obligatory throwing in of Rust, but it sounds like you were okay switching languages anyway - have you considered Rust as an option? It doesn't have GC and has a very healthy ecosystem (recently with async primitives officially supported by the syntax). It also has the pattern matching you seem to mean. Perhaps it would help you solve your optimization needs? Otherwise, I'd love to hear why it's not a good use case for it since I'm still exploring the language myself.


This kind of promotion creates the tense atmosphere around Rust in the community.

I wonder if anyone has read the linked article?

The overhead of goroutines is well known. The article describes the problem and a solution.

Now someone who got bitten by the overhead of goroutines complains in an (understandably) slightly bitter tone. He has a good explanation of the issue and of why he chose Node rather than Rust.

Citation:

>> I started exploring various options ranging from Rust, Elixir, Crystal, and Node.js. Rust was my second choice, but it doesn't have a good, stable, production ready WebSocket server library yet. Crystal was dropped due to conservative nature of Boehm GC, and Elixir also used more memory than I expected. Node.js surprisingly gave me a nice balance of memory usage and speed.

Then someone who doesn't seem to have read any of that comes around and smartly calls out "Use the awesome Rust".

Even as a Rust user myself I get annoyed.


Where is that citation from? Are you quoting from somewhere? I can't find it in the article.


At that point (3 years back) Rust had no good async IO library. All the recent progress in Rust and Tokio now makes it an interesting choice.


This is still super interesting two years later, but does anyone have an update?

Susheel Aroskar, a Netflix engineer, gave a talk about push notifications: https://www.infoq.com/presentations/neflix-push-messaging-sc... (2018)


https://lwn.net/Articles/775238/

Dave Doyle and Dylan O'Mahony also did something pretty amazing with WebSockets for Bose.


It's surprising to me that you apparently have to fight for memory usage for these cases when using Go.

A while ago I ran a (quite naively written) nodejs application that maxed out at ~700k WebSocket connections per server - using only 4GB of RAM. Here CPU became the bottleneck.


Go's concurrency design trades off memory usage for productivity; instead of red-blue functions where you have to explicitly design for function interrupt/yield points with the async keyword, you can just write sequential code and the runtime will handle the rest. The downside to this approach is that often the stack will have to be copied during the switching process vs the stackless approach preferred by Node.js, Rust, C# etc.

See the excellent Fibers Under a Magnifying Glass paper by Microsoft Research: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...


This is not how Go "works" overall; you're talking about the size of the goroutine stack, which is 4KB by default. In a scenario with a lot of connections, yes, it's going to add up if you use a 1:1 connection/goroutine model, but outside of that Go uses less memory than Node / C# / Java / Python etc.

So I wouldn't say "Go trades off memory usage for productivity", since Go is widely used precisely for its low memory footprint.

Same reason why Go makes sense in services like Kubernetes, where each pod is in the range of two-digit MBs; it wouldn't be possible with the languages mentioned above.

Edit: In your edit context it makes more sense :)


That said, as an operator of a number of fairly large Kubernetes clusters (100-500 nodes), I might prefer Java's memory tunability to Go's garbage collector simply not caring about the max memory a system can offer.

At scale, we need to heavily oversize the kubernetes masters to deal with the spikiness of go memory consumption. It's possible for a kube apiserver to average 2GB of memory use and then jump to 18GB after suddenly handling more requests than usual and get OOM-killed. I'd much rather it simply slow down a bit than behave so erratically.

This is a common thing across all Go programs that handle data: since Go's garbage collector doesn't try to keep memory use within some upper bound, if you allocate and throw away a bunch of objects you'll quickly run out of memory on any low-memory system or container even though there's a ton of memory ready to be reclaimed.

Even a major library like the official AWS SDK's S3 client had this wrong as recently as 2017 (the solution was to use sync.Pool to avoid throwing away buffers): https://github.com/aws/aws-sdk-go/pull/1784

Python should perform much better from a memory perspective in these situations: its weakness would be the deserialized size that objects blow up to in memory, but its use would be bounded. Java would also handle this pretty effortlessly, though with a high enough rate of garbage production might hit a few short GC pauses that last far less time than a process restart.

Really wish Go were better about this. Imagine that a simple Go cron job that uploads a backup to S3 at under 160MB/sec could risk OOM-killing your server if you don't set cgroup memory caps on all your Go processes.
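The sync.Pool fix referenced in that pull request looks roughly like this — a hedged sketch with invented names (bufPool, uploadPart), not the SDK's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

const partSize = 5 * 1024 * 1024 // 5 MB, a typical multipart-upload part size

// bufPool hands out reusable part-sized buffers. Without it, every part
// upload allocates a fresh 5 MB slice that sits as garbage until the next
// GC cycle — exactly the spiky memory profile described above.
var bufPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, partSize)
	},
}

// uploadPart borrows a buffer, lets the caller fill it, and returns the
// buffer to the pool for reuse instead of discarding it.
func uploadPart(fill func([]byte) int) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // recycle instead of leaving garbage behind
	n := fill(buf)
	// ... here the real code would send buf[:n] to S3 ...
	return n
}

func main() {
	n := uploadPart(func(b []byte) int {
		copy(b, "hello")
		return 5
	})
	fmt.Println(n) // 5
}
```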


> where each pod is in the range of two-digit MBs; it wouldn't be possible with the languages mentioned above.

Not sure about NodeJS and Python, but it's certainly doable with Java and C#. It's just that people don't take the time to configure the JVM/CLR correctly.

There's nothing magical about golang that you can't do in C# (and soon enough, in Java with the addition of value types). Arguably, C# and Java's value type implementations are superior anyway.


Kubernetes was released in 2015; at that time it wasn't possible to run some Java / C# servers with heap settings below 256MB (-Xms). I'm pretty sure it's still the case with Java as of today. Try to run some service with -Xmx/-Xms at 128MB and tell us how it goes.


> Try to run some service with -Xmx/-Xms at 128MB and tell us how it goes.

What does the service do? An API call that returns the current time? A batch processor? A payment portal? The memory usage obviously depends on the type of work performed.

Furthermore, there are already offerings like https://quarkus.io/, Micronaut, and others that make use of native-image compilation for even smaller footprints.


You don't even want to use -Xmx/-Xms in a container setting anyway. You can set the JVM to use a percentage of the container's memory (e.g. -XX:MaxRAMPercentage).


Productivity in this case being in the eye of the beholder? I'd argue that people experienced in how Node works wouldn't have to think too much, since async is the default. I agree with your general sentiment though.


Async by default can often lead to weird race conditions if you are not careful.


Yes; you still need to lock/synchronize shared resources in an async context. Even if the environment is single-threaded.


Do you have an example?

If it is single-threaded, and we are talking about shared variables, I thought I could assume the runtime is not going to pause my code execution, switch contexts, and run other code midway through my callback handler.

If we are talking about shared external resources (e.g. who can update a cloud blob) then we could have a proxy for that as a variable. You might need retry logic, and it could get tricky in that respect.


You can rely on Node preventing data races, but those are distinct from race conditions. A race condition in the logic of your code can happen any time two "threads" of execution (here, a thread could be a chain of asynchronous callbacks) interleave their operations. One of them can do something with a resource the other was using, unless you use some kind of synchronization to keep the other from touching the resource until the first thread is done with it.

For example, two callback chains could start using the same database connection object. Perhaps one chain was in the middle of setting up a transaction when it needed to wait for some other async resource to load, and the other chain comes in and does something with the connection. Now it's in an unintended state, because the object was used by two different "threads" of callback chains.
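The distinction carries over to Go: the sketch below is free of data races (every access to the balance holds a mutex), yet a check-then-act sequence spread across two lock acquisitions is still a race condition. A hedged illustration with invented names (account, withdrawRacy, withdrawSafe), not code from the thread:

```go
package main

import (
	"fmt"
	"sync"
)

// account is free of data races: every read and write holds the mutex.
type account struct {
	mu      sync.Mutex
	balance int
}

// withdrawRacy checks and then acts under two separate lock acquisitions.
// Another goroutine can run between the check and the act, so two
// withdrawals can both pass the check against the same balance — a race
// condition with zero data races.
func (a *account) withdrawRacy(amount int) bool {
	a.mu.Lock()
	ok := a.balance >= amount
	a.mu.Unlock()
	if !ok {
		return false
	}
	a.mu.Lock()
	a.balance -= amount
	a.mu.Unlock()
	return true
}

// withdrawSafe holds the lock across the whole check-then-act sequence,
// which is the synchronization the parent comment is describing.
func (a *account) withdrawSafe(amount int) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.balance < amount {
		return false
	}
	a.balance -= amount
	return true
}

func main() {
	a := &account{balance: 100}
	fmt.Println(a.withdrawSafe(60)) // true
	fmt.Println(a.withdrawSafe(60)) // false: insufficient funds
}
```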


This is not the case, if your callback then goes off and does something asynchronous.


Yes sorry I was thinking in terms of code that uses callbacks, not the async keyword.

What I mean is that it's distinct from threading, where the following code could be interrupted between the first and second line of the function by something that updates global:

    var global = 0

    function addTheseToGlobal(a, b) {
      const im = a + global
      return im + b
    }


I’d then have Node handle the “simple” socket part, and have the complexity, which is race-condition prone, handled by a language better suited for that, responding to the async Node implementation? Edit: spelling


You don't have to fight for memory; it's the overhead of goroutines / default HTTP connections in a scenario with a lot of connections. By default, Go uses far less memory than Node.

And in Node, the only way to get semi-decent performance is to use ultra-optimized external C/C++ libraries.


CPU was probably the bottleneck because of GC. You could probably tune the GC to get better results.


There's a brief mention of the load balancer (nginx) in front of the Go servers; I'm curious whether there's anything interesting happening there. I'd imagine that if you lose a server, all of its clients will try to reconnect and the traffic will be spread across the existing servers. That's all fine and good, but presumably when you bring up a new server to replace the failed one, it'll be seriously underutilized. Is there some easy solution here in nginx-land?


For WebSocket? Yes (https://github.com/SocketCluster/loadbalancer), but you would have to introduce another layer (AFAIK) that detects failure and reconnects to a healthy target without informing the client.


For mail.ru, I was expecting [1] that you would use Tarantool for this task.

[1]https://hackernoon.com/tarantool-when-it-takes-500-lines-of-...




