
Creating millions of coroutines or "goroutines" isn't really interesting by itself. The real questions for me are:

1. Is the Go scheduler better than the Linux scheduler if you have a thousand concurrent goroutines or threads?

2. Is creating goroutines really that much faster than creating native threads?

3. Are goroutines relevant for long-lived, computationally intensive tasks, or just for short-lived I/O tasks?

4. What is the performance of channels between goroutines compared to communication between native threads?

tbh, I have read several respected articles that criticize Go's goroutines in terms of performance, and I am not really sure that Go's only virtue, imho (simple concurrency), is performant at all.




Go's scheduler is not preemptive, which means that goroutines can sometimes hog the processor. If a loop contains no function calls, allocations, or other preemption points, it can prevent other goroutines from running. A simple example is:

    package main
    func main() {
      go println("I ran")
      for {}
    }
If you run with:

    GOMAXPROCS=1 go run main.go
It will never print the statement. This doesn't come up frequently in practice, but the Linux scheduler does not suffer from this problem.


I heard Go got a preemptive scheduler recently.

EDIT: https://github.com/golang/go/issues/24543


This just happened to me. I wanted to run a goroutine and then not exit the program, so I ran an empty for loop and it didn't work. Would a `for { continue }` have worked?


`for { continue }` is semantically equivalent to `for {}` since the closing bracket of the loop block implies a `continue`.

As a sibling correctly notes, `select {}` is what you want to do instead. You need to do something that can block the goroutine in such a way that control returns to the scheduler. Selecting is one of those ways.


You can use select{} to block forever and yield the goroutine.


Yield?


Interesting find.


These are good questions and I'd be interested in more authoritative answers, but I'll hazard guesses.

1. Go's scheduler is better at scheduling Go. There are well-defined points at which a context switch can occur, and the scheduler is in userland, so you never really have expensive context switches.

2. Not sure. I would guess so, if only because it saves a couple context switches.

3. Goroutines are absolutely relevant for both. The stdlib webserver is a high-performance, production-grade implementation and it forks a goroutine for each request, and applications frequently fork more goroutines within a request handler.

4. I assume you're comparing Go channels to pipes? In which case the answer is probably always "faster", but how much faster depends on the size of the thing you're transferring since IPC requires serialization and deserialization while in Go the overhead is just some locking and a pointer copy.

5. Go has dynamically-sized stacks. While you can change stack size on Linux (to use very small stacks and thus spin up more threads), I don't think you can change the size of an individual stack. Plus with Go, you don't have to configure stack size at all; it works out of the box on every platform.

For me, that Go's goroutines are performant is just gravy. I like that they're a platform agnostic concurrency abstraction. I don't have to deal with sync vs async ("What color is your function"; which incidentally cost me the last 1.5 workdays tracking down a performance bug in Python) or posix threads vs Windows threads.


Re 5: pthread allows setting the stack size of each individual thread, although this is seldom done in practice, as on 64-bit systems address space is not at a premium.

GCC also, on some platforms, allows using segmented stacks in C and C++ programs, which means that a thread uses only as much physical and virtual memory as required. I don't think segmented stacks have seen much use, though, as they have non-zero cost and, for optimal space usage, ideally require recompiling every dependency, including glibc, with them.

Interestingly, segmented stacks were implemented to support gccgo.


How does Go handle non-async syscalls these days? If you spawn a thousand goroutines which all call stat(), does this still spawn a thousand kernel threads?

This took some of the bloom off the rose for me. For each goroutine I had to predict ahead-of-time whether it would require a kernel thread, and if so send it through a rate limiter. Effectively it was sync-vs-async but hidden.


Yes, it spawns a kernel thread if all the threads are currently blocked. I ran into this issue running Go on a FUSE mounted file system. If the network hung, Go would spawn thousands of threads and OOM. Go can't _not_ spawn them, as that would risk deadlock.


(disclaimer: I know very little about Go)

An expected way to manage this would be to have a limit on the number of threads that run blocking syscalls. Up to you how to do it (a thread pool for syscalls, any thread can run a syscall but checks a mutex first, etc.).

I don't think there's a danger of deadlocks here -- your blocking syscalls shouldn't be dependent on other goroutines. Eventually the calls will succeed or time out (one hopes) and the next call will commence.

In my experience, you can usually find a safe number of parallel syscalls to run -- and it's often not very many.


Imagine 1000 pipes, each of which has a goroutine calling write() and another calling read(). If you schedule all of the writers and none of the readers (or vice versa) you'll get a deadlock.


AFAIK when a goroutine "calls write" (on a channel, socket, stream, mutex, etc.) and the write blocks, it yields execution and the scheduler can activate something else, which can be a reader goroutine (after all the writers have blocked, for example). So there's no deadlock as long as you have at least one reader for every writer.


That requires the underlying syscall to support an async or non-blocking mode, though. Disk I/O and stat() don't on Linux, for example. The usual alternative is some sort of background pool for blocking tasks (which adds non-trivial overhead), or, where supported by the OS, scheduler activations.


You are right, see Synchronous System Calls in

https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part...


Sorry, I also expected those things that can be translated into non-blocking calls with select-ish APIs for readiness to be translated as well, because that's the right thing to do in this type of environment.

To be more specific, most socket calls should be async + select-ish, but file system calls would likely be done synchronously, because async generally doesn't work for those. Anyway, limiting the number of outstanding file I/O requests tends to help throughput.


stat() cannot (on platforms I am familiar with) be performed in a selectish/nonblocking way. That call can block forever. Local filesystem access in general is still not async-capable in very important ways.


Yes, filesystem calls are going to block, but putting them into a limited thread pool shouldn't deadlock your system: you can't do a blocking read on a file while waiting for it to be filled by another thread (unless/until you start playing with FUSE, I guess).


Re 3. Funny you should mention net/http's goroutine-per-request model. fasthttp massively outperforms the stdlib on every TechEmpower benchmark using a goroutine worker-pool model.

https://stackoverflow.com/questions/41627931/in-golang-packa...

https://www.techempower.com/benchmarks/

http://marcio.io/2015/07/handling-1-million-requests-per-min...


To be clear, that doesn’t invalidate my claim (goroutines per request are suitable, not maximally performant).


> so you never really have expensive context switches

That's only possible if there are no more OS threads than cores. And anyway, switching between async routines on the same OS thread still requires switching the routines' contexts (going to memory and updating caches).


It seems to me that Green threads exist primarily for convenience.

They are definitely not more performant than native threads with a work-stealing approach (producer-consumer pattern) that is tuned to the number of cores available. How could they be more performant? Even the simplest switching adds some cost; from a performance perspective it would never make sense to run thousands of threads on 8 cores, whether they are scheduled by the OS or by the language implementation.

So to answer your 3rd question: short lived I/O tasks.


I guess it's interesting if it makes the programming model easier, but with similar or better performance than normal threads.


If it's about simplicity, then JavaScript and Node are better at concurrency for me when it comes to I/O. Node is very performant too.


Node is definitely not performant in any like-for-like benchmark, but the async model does lend itself to simplicity in some non-sequential workloads. Even then, it gets stomped by Vert.x running on the JVM. I think Node's most valuable contribution is democratizing package management for SPAs as we move towards thick clients again.


I find goroutines and channels much easier to work with than promises, callbacks, or async/await, and the former can parallelize computation as well. Not to mention that Go is typically faster than Node in single-threaded execution.


You can easily implement channels in ES6. I've done it, and my implementation is probably 40-50 lines of code. You can then use async/await to simulate goroutines. It's actually a bit nicer than Go because of the way async coordinates with promises. You can more easily synchronise the routines (for example, in situations where you might need a wait group in Go).

I keep expecting someone to write one and put it up on NPM (I haven't looked recently, though). Perhaps I should clean up mine and stick it up there. It provides a very nice abstraction of continuations. I did it merely as a kata to show that continuations do not require concurrency.


I've read a few articles on implementing CSP in JavaScript, and there are definitely some packages available, e.g.

https://github.com/olahol/node-csp

https://github.com/ubolonton/js-csp

https://github.com/gozala/channel


Yes, yes, both, wat.

Are you asking to compare mutexes and channels? The article goes through multiple reasons why the Go scheduler has more information about how something should execute than the Linux kernel does. Reread and then maybe rephrase?



