This article isn't bad... but it misses several important points of Go. I also note the article is 9 months old. In the hope that my criticism will be taken as constructive, with apologies for not writing detailed explanations:
1. Goroutines are not threads
2. type inference allows you to elide types in var declarations:
var host = flag.String(...
3. Go's convention is to use camel case, not underscores.
4. Calling os.Exit all over the place is unusual imo - it may be better to panic().
5. fmt.Fprintf exists, so os.Stderr.WriteString(fmt.Sprintf... is unnecessary.
6. An explanation of why the standard log package isn't suitable would be nice, although I see the format used is slightly different.
7. Ignored errors when writing. Why do you wish to sync at every packet?
8. Massive race condition by re-using b. Line 62 overwrites b. b is then passed to c.logger (68) and c.binary_logger (70) for them to process asynchronously. c.logger is then passed another byte slice (72), which forces it to finish using b. c.binary_logger is not passed anything else, allowing it to delay until the next time something is sent on that channel, which would be after b is overwritten by the next packet. I think that simply moving line 58 to between 61,62 would fix this.
The author has not quite grokked the concept of "don't share memory". b is shared, and undefined behaviour results :(
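A minimal sketch of the fix suggested in point 8, assuming the reader hands each packet to one or more channels. The article's code is not reproduced here, so `forward` and `collect` are hypothetical names. Each chunk is copied into a fresh slice before it is sent, so a slow receiver never sees bytes overwritten by a later read.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// forward reads chunks from r and sends each chunk to every channel in outs.
// Every receiver gets its own freshly allocated copy, so the read buffer is
// never shared across goroutines. (Hypothetical helper, not the article's code.)
func forward(r io.Reader, bufSize int, outs ...chan<- []byte) error {
	buf := make([]byte, bufSize) // touched only by this goroutine
	for {
		n, err := r.Read(buf)
		if n > 0 {
			for _, out := range outs {
				chunk := make([]byte, n)
				copy(chunk, buf[:n])
				out <- chunk
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// collect simulates a slow consumer: it holds on to every chunk and only
// looks at the contents after the reader has finished.
func collect(input string, bufSize int) []string {
	out := make(chan []byte)
	go func() {
		defer close(out)
		if err := forward(strings.NewReader(input), bufSize, out); err != nil {
			fmt.Println("read error:", err)
		}
	}()
	var held [][]byte
	for chunk := range out {
		held = append(held, chunk)
	}
	got := make([]string, len(held))
	for i, c := range held {
		got[i] = string(c)
	}
	return got
}

func main() {
	fmt.Println(collect("abcdefghij", 4)) // [abcd efgh ij]
}
```

If the channel carried `buf[:n]` directly instead of a copy, the chunks held by `collect` would all alias the same backing array.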
Your critique (which seems valid; I'm not a Go expert) does not exactly instill confidence in the Pragmatic Programmer (PP) brand.
When I read material from them, I would have assumed I'd be seeing "idiomatic code from an expert"...not "hey, I'm a newbie hacking my way around in a new language, here's what I typed out in 2 days of playing with it".
I mean, I understand they're big on the "learning a new language on a regular basis thing", but when I see code published like this, specifically from their brand, I go into it thinking "okay, this should be high quality, idiomatic code."
...guess I'll be more careful with that assumption.
(EDIT: If this is just some personal blog, I guess I'd understand; I saw "magazine" in the URL and got the impression this was published as part of their magazine/PDF series.)
They are not kernel-level threads, but they are threads in every other meaningful sense; they have their own stack and execution pointer, and data operations in goroutines are not guaranteed to be atomic with respect to other goroutines.
While the current official Go implementation never preemptively schedules goroutines except on I/O and on runtime.Gosched(), nothing in the spec precludes a different, more kernel-thread-like scheduling system.
> I think that simply moving line 58 to between 61,62 would fix this.
Doing that would erase the whole benefit of allocating a static array in the first place (you really don't want to be allocating memory for every packet that comes through). The funny thing is the author correctly handles this type of synchronization problem elsewhere, see lines 97-100. The right thing to do is to wait for acknowledgement from both loggers before reusing shared memory, but that sort of defeats his broader argument: if you are explicitly guarding every shared memory access with a barrier, then you might as well have been using any shared memory language besides Go. Buffered channels in both directions used to implement barriers are every bit as error-prone and complex as using a reader-writer lock explicitly.
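For comparison, a rough sketch of the "wait for acknowledgement from both loggers" approach, with hypothetical names (`packet`, `consumer`, `broadcast`). One buffer is reused, and the producer does not touch it again until every consumer has acknowledged the current packet.

```go
package main

import "fmt"

// packet carries a view into the shared buffer plus a channel on which the
// consumer reports that it is finished with that view.
type packet struct {
	data []byte
	done chan<- struct{}
}

// consumer records each packet and then acknowledges it.
func consumer(in <-chan packet, seen chan<- string) {
	for p := range in {
		seen <- string(p.data) // string() copies the bytes out of the shared buffer
		p.done <- struct{}{}
	}
}

// broadcast pushes every message through one reused buffer. Messages longer
// than the buffer are truncated; that is good enough for a sketch.
func broadcast(msgs []string, nConsumers int) []string {
	buf := make([]byte, 64)
	seen := make(chan string, len(msgs)*nConsumers)
	acks := make(chan struct{}, nConsumers)
	ins := make([]chan packet, nConsumers)
	for i := range ins {
		ins[i] = make(chan packet)
		go consumer(ins[i], seen)
	}
	for _, m := range msgs {
		n := copy(buf, m)
		for _, in := range ins {
			in <- packet{data: buf[:n], done: acks}
		}
		for range ins {
			<-acks // barrier: buf must not be overwritten before this point
		}
	}
	for _, in := range ins {
		close(in)
	}
	out := make([]string, 0, cap(seen))
	for i := 0; i < cap(seen); i++ {
		out = append(out, <-seen)
	}
	return out
}

func main() {
	fmt.Println(broadcast([]string{"first", "second"}, 2)) // [first first second second]
}
```

The extra `acks` channel is exactly the bookkeeping the comment above is complaining about: it works, but it is a hand-rolled barrier.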
> Doing that would erase the whole benefit of allocating a static array in the first place (you really don't want to be allocating memory for every packet that comes through).
The benefit is dubious. Performance is unlikely to be important in a single-connection logger. My advice is to lean on the garbage collector. Allocate away.
Buffered channels are certainly more complex, but if you're never sharing memory, they shouldn't hurt you.
How are goroutines not threads? Do you mean because it's possible for them to communicate without shared mutable state?
Edit: Oh, apparently you all mean OS threads. So say so. (For example, in Haskell they're called threads without any implication that each one is an OS thread. Haskell's not unusual that way.)
The notion that goroutines aren't threads seems to exist primarily because Golang also makes use of OS-native threads in order to schedule goroutines, and so needs to draw a distinction in order to explain how the runtime works.
But that's just a detail. In reality, goroutines are threads; they're just userland, non-preemptive threads. Similar constructions have been available, even to C programmers, for well over a decade (and probably much longer).
Programming language support for threading predates direct operating system support (at least in mainstream operating systems) by a lot of years, from what I can tell.
It is really common for Go programmers (I've even seen members of the Go core team do it) to say 'thread' when they mean goroutine. It is important to recognize there is a difference, but calling someone out for saying 'thread' when they mean 'goroutine' is pedantic bullshit on the order of calling someone out for calling a class method in an OO language a function.
Sorry if where I replied made this confusing, I wasn't really talking about you nor even did I mean it as a personal thing against Jabbles who did the actual "calling out" of mixing threads and goroutines.
As I mentioned, knowing the difference is actually a good thing so your post is really helpful. My point is just that the two concepts are so intertwined and overlapping that I think it is silly for someone to correct someone's terminology on this (as Jabbles did, though I admit I share some of his(?) other concerns with the OP), particularly in cases where it is clear the person is pretty familiar with goroutines and how they operate.
It's not a commonly accepted "definition" - "thread" is a general concept that encompasses several different forms of concurrency. It may be commonly assumed that the term "thread", in the absence of any other qualifier, refers to OS-level threads; however, even that is context dependent. The Wikipedia page introduction is poorly written.
However the more salient point here is the OP was either being lazy by saying "goroutines are not threads", or actually doesn't understand the concept of lightweight threads. Along the lines of Rob Pike's own explanation on the matter, he maybe should have said "goroutines provide concurrency but not parallelization".
Thread usually means an in-process concurrency construct controlled by calls to the operating system. A goroutine is managed by the Go runtime.
It's an important distinction because using many (read: hundreds of) OS threads in a single process isn't performant, while using hundreds or even tens of thousands of goroutines is just fine.
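To make the scale concrete, here is a small sketch that starts ten thousand goroutines and waits for all of them. The function name `sumConcurrently` is made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// sumConcurrently starts one goroutine per integer in [1, n] and adds the
// values into a total that is protected by a mutex.
func sumConcurrently(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sumConcurrently(10000)) // 50005000
}
```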
Edit: And more importantly, there are wrappers for the standard data structures designed around different concurrency use-cases (sync, async, coordinated, uncoordinated)
Refs are for Coordinated Synchronous access to Many Identities.
Atoms are for Uncoordinated synchronous access to a single Identity.
Agents are for Uncoordinated asynchronous access to a single Identity.
Vars are for thread local isolated identities with a shared default value.
And you can use all (all!) of the Java concurrency tooling as desired, including raw threads (for which Clojure has a wrapper as well).
Part of the reason I use Clojure rather than Go is because it doesn't try to force you into a one-size-fits-all method for handling concurrency. I have no problem with CSP but it doesn't fit everything I do. Sometimes I just want to defer work or wrap it in a future. Or I want to use an intelligent coordinated data structure rather than trying to meld flesh and bone to steel in order to make a concurrency-naive data structure behave how I want in a concurrent environ.
Go doesn't "force" you to use one method of concurrency.
It has more than one: you can use an Erlang-style share-nothing approach, or the Java/C++ style of using mutexes to protect shared state from concurrent access.
I know nothing about Clojure, so it's possible it has more features, but that's not necessarily a good thing. Is the complexity of 4 different solutions worth it? (By "it" I mean: a programmer has to learn all of them and when to use what; the implementor has to implement them and write wrappers for all standard data structures (what about third-party libraries?), etc.)
Feature bloat has a cost.
Go gives you all you need to easily write concurrent programs and it does it with refreshingly simple design (both for people to learn and to implement).
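The two styles mentioned above, side by side, as a minimal sketch. Both functions are hypothetical examples: one guards a shared counter with a mutex, the other gives the counter to a single goroutine and has workers send it messages.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex: shared state, guarded by a lock (the Java/C++ style).
func countWithMutex(workers, perWorker int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

// countWithChannel: one goroutine owns the counter; everyone else only sends
// increments to it (the share-nothing style).
func countWithChannel(workers, perWorker int) int {
	incs := make(chan int)
	result := make(chan int)
	go func() {
		total := 0
		for d := range incs {
			total += d
		}
		result <- total
	}()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				incs <- 1
			}
		}()
	}
	wg.Wait()
	close(incs) // tells the owner that no more increments are coming
	return <-result
}

func main() {
	fmt.Println(countWithMutex(8, 1000), countWithChannel(8, 1000)) // 8000 8000
}
```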
So why use Go instead of node.js? You can probably take that reasoning and substitute "Go" for "Clojure", and "node.js" for "Go".
Go people would probably object that the things that Go adds, and that node.js lacks, aren't just window dressing -- sure, there are situations where a node.js-style fast event loop that avoids blocking operations is all you need, but there are also situations where you want something more like real threads, because the problem demands it.
I'm a lisper but not a Clojure expert - but I'd assume that Clojure people don't consider the existence of e.g. Actors to be "feature bloat". My impression is more that the difficult/special concurrency-enabling feature of the language is STM, and language-level support for different concurrency paradigms, implemented on top of STM, are probably low-hanging fruit once you've got it.
Yep, futures can be done easily in Go with a goroutine that sends a single value on a channel, but: 1) you can't dereference the value multiple times in Go, so you have to manage saving it yourself, and 2) there's no syntactic sugar. Pity.
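A sketch of what "manage saving it yourself" can look like, assuming an `int` result to keep it short. `Future`, `NewFuture` and `Get` are names invented for this example, not a standard library API.

```go
package main

import (
	"fmt"
	"sync"
)

// Future wraps a goroutine that produces a single value on a channel and
// caches that value so it can be read any number of times.
type Future struct {
	once  sync.Once
	ch    chan int
	value int
}

// NewFuture starts f in its own goroutine right away.
func NewFuture(f func() int) *Future {
	fut := &Future{ch: make(chan int, 1)}
	go func() { fut.ch <- f() }()
	return fut
}

// Get blocks until the value is ready. Only the first call receives from the
// channel; later calls return the saved value.
func (f *Future) Get() int {
	f.once.Do(func() { f.value = <-f.ch })
	return f.value
}

func main() {
	fut := NewFuture(func() int { return 6 * 7 })
	fmt.Println(fut.Get(), fut.Get()) // 42 42
}
```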
> 6. Didn't try to convince anyone runtime.GOMAXPROCS(runtime.NumCPU()) was a good idea (it's not)
So what is an appropriate GOMAXPROCS? As someone who has only dabbled in a few Go tutorials, I would imagine that you would want GOMAXPROCS to be NumCPU() (or even greater) so the goroutine thread pool could "fire on all pistons". Why does Go's scheduler default to GOMAXPROCS=1 instead of NumCPU()?
Or turning that around, do you think it requires all 8 of my cores to copy data over the network? Do you think the two lines of code + justification text provides sufficient value for this application to distract from the point of it in order to show why someone should override the default behavior?
Do you believe users shouldn't have any control over the number of cores any particular application consumes?
Have you measured the CPU contention of the application and determined that using more cores is worth the increased overhead of multi-thread exclusion (vs. simpler things happening directly in the scheduler)?
Overall, it has nothing to do with this article and now even more people are going to copy it in more unnecessary places as a cargo-cult "turbo button" for their programs.
If you are going to use an idiom like that, the least you could do is check for the GOMAXPROCS environment variable and only do this as a default when the user hasn't specified otherwise.
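Roughly what that check could look like. `chooseProcs` is a hypothetical helper; `os.Getenv`, `runtime.NumCPU` and `runtime.GOMAXPROCS` are the real standard library calls.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// chooseProcs returns the value to pass to runtime.GOMAXPROCS, or 0 when the
// current setting should be left alone because the user set the variable.
func chooseProcs(env string, numCPU int) int {
	if env != "" {
		return 0 // the user asked for something specific; respect it
	}
	return numCPU
}

func main() {
	if n := chooseProcs(os.Getenv("GOMAXPROCS"), runtime.NumCPU()); n > 0 {
		runtime.GOMAXPROCS(n)
	}
	// Passing 0 only queries the current value without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

Since Go 1.5 the runtime already defaults GOMAXPROCS to the number of CPUs, so on current toolchains this only illustrates the idiom being discussed.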
In my experience the majority of well-written concurrent programs become I/O-bound on a single processor. It's pointless to add more processors to that, and can only slow you down, and Go programs are far better behaved in the non-parallel case.
At other times you should think about the number of processors you want to occupy. If the objective is to behave like an appliance, then 1:1 schedulers:cpus is not a bad ballpark.
The default is 1 isn't it? As I pointed out, this will serve the majority of concurrent code.
The best number of processes to use is equal to the parallelism of the solution. Even with highly concurrent problems, this is still most often 1. If you get it wrong performance will suffer. But in practical terms we have more to worry about, and if you're talking to the disk and the network more than you're computing, parallelism will only increase the contention on those resources. The extra processes will consume more CPU without doing any more useful work.
So the default is pretty good.
By the way 1:1 isn't the limit either. Sometimes you will want more. If the problem truly is parallel enough to exceed your CPUs, you may want additional processes anyway. This will keep things up to speed thanks to the host's scheduler which is typically preemptive, unlike Go's. This sometimes works much better if you can pick and choose which routines run on which schedulers, and I'm not sure if Go exposes that.
I have no idea what my response was about. As I said, I was pretty sick that day. Sorry you had to type all this stuff to explain to me that I'm a moron. :)
This has finally let me figure out what annoys me about Go: it's a cargo-cult language. People saw that Erlang's Actors/processes were really popular and made it easy to write good, concurrent software.
They then went away and implemented their own language with lightweight processes and message passing, but missed the fact that actors are the price you have to pay for the benefits of not sharing mutable data.
And Go completely skipped that part (the most important part).
Doesn't Go just implement Hoare's communicating sequential processes, as does Erlang? They share the same inspiration.
You don't need to share state between your goroutines if you don't want to, just like you don't have to use Mnesia to share state between Erlang processes if you don't want to.
I don't think you can really accuse Go of being a cargo-cult language either; Rob Pike has implemented CSP multiple times (http://swtch.com/~rsc/thread/).
> Doesn't Go just implement Hoare's communicating sequential processes, as does Erlang? They share the same inspiration.
Nope: The original Communicating Sequential Processes model[24] published by Tony Hoare differed from the Actor model because it was based on the parallel composition of a fixed number of sequential processes connected in a fixed topology, and communicating using synchronous message-passing based on process names (see Actor model and process calculi history). Later versions of CSP abandoned communication based on process names in favor of anonymous communication via channels, an approach also used in Milner's work on the Calculus of Communicating Systems and the π-calculus.¹
You're still sharing v here; we can just see from the snippet that the shared v isn't used inside the anon function. But I suspect what the GP meant by "easy" is not having to think about this sort of thing. Your solution is good when you know you need to do this, but it can't even happen in Erlang, so it's not a "gotcha" to watch out for.
Sharing v isn't the problem. The problem occurs when v is evaluated.
The parent's post has no race condition, as v is evaluated before the goroutine starts.
The top example of my post has a race condition because there is no guarantee when v will be evaluated with respect to the loop.
The bottom example has no race condition because v is evaluated on every iteration and assigned to s, which is used by the goroutine at some point afterwards.
My point still stands: to fix this, you have to realize that (a) v could be shared here and (b) that sharing could be a problem. I suspect the first time most newbies get hit with a race condition here they're going to be beyond baffled.
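The snippets this subthread argues about are not reproduced here, so the following is only a sketch of the safe pattern being described: evaluate v on each iteration, assign it to s, and let the goroutine use s. The function `upperAll` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// upperAll upper-cases each word in its own goroutine. The loop variable is
// copied into s before the goroutine starts, so each goroutine sees the value
// from its own iteration no matter when it is scheduled.
func upperAll(words []string) []string {
	results := make(chan string, len(words))
	var wg sync.WaitGroup
	for _, v := range words {
		s := v // evaluated now, on this iteration
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- strings.ToUpper(s)
		}()
	}
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // goroutines finish in no particular order
	return out
}

func main() {
	fmt.Println(upperAll([]string{"b", "a", "c"})) // [A B C]
}
```

Starting with Go 1.22, each loop iteration gets its own copy of the loop variable, so the explicit `s := v` is no longer needed on current toolchains. On the Go versions this thread is about, leaving it out is the race being discussed.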
Channels are basically queues with beefed up language support. Not sure what's so annoying about that considering it's a very common concurrency pattern across many languages.
Ugh, I just finished writing the XMPP frontend for an XMPP/IRC bot I'm working on (http://www.getinstabot.com). The frontends are in Go for concurrency, and ferry messages back and forth from the channels to the backend.
Let me tell you, that problem is hard. Go coped pretty well, but the final thing is a mess of global states, and it's pretty elegant for what the problem is. I was hoping to avoid having many moving parts, but it ended up needing a lot of shared state between all processes.
Some problems are just hard, and, no matter how well-designed the language is, they'll still be hard. Something I miss from the language after implementing that is the ability to, say, monitor one goroutine from another to see if it returns (so the former can return as well). I know it's possible with channels, but when one goroutine is blocked on network Recv(), there's not much of a chance to listen to channels.
Anyway, yes, parallelism in Go is great, but not everything is magically all rainbows and unicorns (that's Django). Some problems will be messy and dirty, and even more so when you use channels.
Does sound like you want to use channels, and break your logic into small independent parts.
You will have one goroutine using blocking Read() in a loop and feeding data to some channel. When it's done, you write to another channel that exists only for signaling:
    defer {
        doneChannel <- true
    }
    for {
        data := make([]byte, 65535)
        _, err := conn.Read(data)
        if err != nil {
            errorChannel <- err
            break;
        }
        dataChannel <- data
    }
and then in your other goroutine:
    for {
        select {
        case data := <- dataChannel:
            // Handle data
        case <- doneChannel:
            // Other goroutine is done
        }
    }
The only things shared here are the channels.
The way to avoid too much state and moving parts is to break the problem into isolated, manageable parts that communicate with channels. Often you will have hierarchical relationships like this, where one piece of dumb code exists to pass data from something lower down to somewhere higher up.
It's not hard, although some of the code gets a bit ugly and disjoint at times, especially in how anything synchronous has to use channels and goroutines. For example, today I wrote a simple worker pool implementation that runs a given function in parallel via goroutines and can adjust the number of workers dynamically at runtime. That function has to be declared not just as "func()" but as "func(abortChannel chan bool)", and the worker function has to honour the abort signal when it arrives from the pool. So channels do leak everywhere, even into APIs. (Yeah, I know I can use "chan struct{}" to avoid any storage, but I think "<- true" looks nicer than "<- struct{} {}".)
What is harder is to intelligently handle complex cascading failures. That's what Erlang, with its supervisor tree, is good at. Go's goroutines are "fire and forget" and cannot even be terminated programmatically from elsewhere in the program.
> I know I can use "chan struct{}" to avoid any storage, but I think "<- true" looks nicer than "<- struct{} {}".
I disagree. struct{}{} tells me that the value isn't important. Whenever I use map as a set, rather than a key-value store I use map[string]struct{} (say), rather than map[string]bool. Then I am forced to use the double assignment to check for membership of the set. And that's exactly what I want. I'm able to make my intent more obvious in the code I write. No one will ever look at it and say "but what if it's false?" - I dislike using booleans instead of empty structs in the same way I dislike other C programmers using integers as booleans.
> I dislike using booleans instead of empty structs in the same way
Eh? If you have `map[keyType]bool`, then a key lookup is simply the set membership function. If a key exists, it returns true. Otherwise, false. That certainly doesn't seem analogous to abusing integers as booleans...
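The two set representations being argued over, as a short sketch with made-up helper names. The difference shows up when a key is stored with the value false.

```go
package main

import "fmt"

// inStructSet needs the two-value lookup: membership is "is the key present".
func inStructSet(set map[string]struct{}, key string) bool {
	_, ok := set[key]
	return ok
}

// inBoolSet uses the stored value: a missing key yields the zero value false.
func inBoolSet(set map[string]bool, key string) bool {
	return set[key]
}

func main() {
	structSet := map[string]struct{}{"go": {}}
	boolSet := map[string]bool{"go": true, "c": false}

	fmt.Println(inStructSet(structSet, "go"), inStructSet(structSet, "c")) // true false
	fmt.Println(inBoolSet(boolSet, "go"), inBoolSet(boolSet, "c"))         // true false
	fmt.Println(len(boolSet))                                              // 2: "c" is stored but reads as absent
}
```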
Closing the channel is fine, I suppose, although the supervising goroutine now looks a bit odd:
    select {
    case _, _ = <- doneChannel:
        // Other goroutine is now done
    }
It's so implicit that you pretty much have to add a comment to the effect of "this will trigger when the channel is closed", whereas the "case <- doneChannel" is so obvious it doesn't need explaining.
Also, I rather prefer the supervising goroutine to "own" the channel, so it should be the one to close it.
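For reference, a sketch of signalling by closing a channel rather than sending on it. `runAndWait` is a hypothetical function; the point is that a single `close` releases every goroutine blocked on the channel.

```go
package main

import (
	"fmt"
	"sync"
)

// runAndWait starts the given number of goroutines, all blocked on the same
// done channel, then closes the channel and reports how many were released.
func runAndWait(waiters int) int {
	done := make(chan struct{})
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		released int
	)
	for i := 0; i < waiters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}
	close(done) // this will trigger every pending receive
	wg.Wait()
	return released
}

func main() {
	fmt.Println(runAndWait(3)) // 3

	done := make(chan struct{})
	close(done)
	_, ok := <-done
	fmt.Println(ok) // false: the channel is closed and empty
}
```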
> you can't just have a defer block without a function invocation
Yeah, I was not thinking Go there for a moment. Should have been "defer func() { doneChannel <- true }()".
Yeah, the channel approach was rewrite #3 :P It required me to have shared state in a horrible way as well, ruining the whole thing. There really isn't an elegant/functional way to do it, and sharing channels is error-prone itself. I ended up closing the connection to terminate goroutine B from A, and messaging on an already-shared channel to terminate A from B.
Having cascading failures in Go would be fantastic. The problem, unfortunately, is not very amenable to elegant solutions. I might do a writeup at some point.
> I know it's possible with channels, but when one goroutine is blocked on network Recv(), there's not much of a chance to listen to channels.
Could you elaborate a bit more on this point? This seems like a natural pattern for a channel/goroutine. Spin up a goroutine that reads from a connection and send whatever is read on a channel. Then other goroutines can synchronize on the channel.
What other solution would you want? You could use `SetDeadline` if you like. But closing the connection seems reasonable too.
I think you're framing the problem wrong. You're not asking to shutdown a goroutine, you're asking "How do I abort from a synchronous read from a network connection that is blocked?"
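Both options, sketched against an in-memory `net.Pipe` so it runs without a network. The helper names are invented; `SetReadDeadline`, `io.ErrClosedPipe` and `os.ErrDeadlineExceeded` are the real standard library identifiers. A real TCP connection returns a different error on close than the pipe does.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// describe maps the errors this sketch expects to short labels.
func describe(err error) string {
	switch {
	case errors.Is(err, io.ErrClosedPipe):
		return "closed"
	case errors.Is(err, os.ErrDeadlineExceeded):
		return "deadline"
	default:
		return "other"
	}
}

// readUntilClosed parks a goroutine in Read, then closes the connection from
// outside to make that Read return.
func readUntilClosed() error {
	local, remote := net.Pipe()
	defer remote.Close()
	result := make(chan error, 1)
	go func() {
		buf := make([]byte, 16)
		_, err := local.Read(buf) // blocks: nothing is ever written
		result <- err
	}()
	time.Sleep(10 * time.Millisecond) // give the reader time to block (illustrative only)
	local.Close()
	return <-result
}

// readWithDeadline lets the read give up on its own after d.
func readWithDeadline(d time.Duration) error {
	local, remote := net.Pipe()
	defer local.Close()
	defer remote.Close()
	if err := local.SetReadDeadline(time.Now().Add(d)); err != nil {
		return err
	}
	buf := make([]byte, 16)
	_, err := local.Read(buf)
	return err
}

func main() {
	fmt.Println(describe(readUntilClosed()))                       // closed
	fmt.Println(describe(readWithDeadline(20 * time.Millisecond))) // deadline
}
```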
Don't get too hung up on that, it was the least of the problems. Closing the connection works fine, it's just a bit unorthodox. The basic structure of the problem was much more problematic to implement.
Observing other channels, or seeing when a channel does fire, is what the select structure is for. It is one of the features that makes channels and goroutines very nice in Go.
This is a nice article. I would encourage the author to use the term "goroutine" instead of "thread" when referring to Go's goroutines, because they aren't threads. The pros and cons of threads do not apply to goroutines. What's cool (for me) about this article is that the author knows this fact (and states so at the outset), but he re-discovers it and internalizes it over the course of his implementation.
This article, and many like it, suffers from one of my huge pet peeves: absolutely terrible coding conventions.
I am a person who likes to scan articles, I'm busy and generally make a read now, read later, read never decision. The code from first scan was unreadable, short 1 character variable names, "why is there a hardcoded date marked 2006.01.02-15.04.05 there??", etc.
Readable code takes a little more time - but it's worth it!
Further, the entire example seems contrived. Am I right in thinking that simply firing up wireshark would solve this problem? Why is the author continuing to write something in nearly every language when a tool exists for exactly this purpose, is multi-featured and pluggable?
Even further, message passing! With the rise in recent years of message passing libraries in nearly every language, multi-threaded, distributed applications are becoming trivial to write. I do not see the Go code presented as anything other than messy; I have seen C++ code utilizing message passing libs that is smaller, prettier and infinitely more maintainable - again with no mutexes or condition variables!
If you want to sell me on Go, make the code pretty, and present a USP.
I really like how Go does message passing, and other than channels being first-class types, I don't think that's really the "killer feature" of Go. The killer feature is that goroutines are green threads, scheduled in M:N fashion onto OS threads. This encourages concurrent programming because spinning up a goroutine is cheap compared to spinning up an OS thread. It's difficult to do this kind of programming in most other languages (sans Erlang, Rust and Haskell).
Joe Armstrong made this argument years ago. He compared the limited ability to start processes in most languages as being similar to limiting how many objects you could create in your program.
If you want to see examples of concurrent programming in Go, go straight to the source: http://golang.org --- The tour is good, there are some codewalks, talks, articles, etc.
I'm afraid many of the conventions you complain about are standard Go, err, conventions... (Although see my complaint elsewhere.) Because they are so widely used, people who use Go won't bat an eyelid. Unfortunately that doesn't make the article a great introduction to Go.
Short variable names generally reflect the idea that you know what a variable is for just by knowing its type. Thus you have a file named f, a time t, a variadic argument called v. When the type is not enough, a longer name is recommended. Naming things is hard though...
The hardcoded date is a wonderful piece of the time package, which I fully appreciate will look bizarre at first. (And therefore isn't a great thing to use in a first look at Go, without explanation). See the official documentation http://golang.org/pkg/time/#pkg-constants
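A small sketch of how that layout string works. The reference time is Mon Jan 2 15:04:05 2006, and a layout shows how that particular moment should be written. The helper names `stamp` and `example` are made up.

```go
package main

import (
	"fmt"
	"time"
)

// stamp formats t using the layout from the article: each field of the
// reference time (2006, 01, 02, 15, 04, 05) marks where the matching field
// of t goes.
func stamp(t time.Time) string {
	return t.Format("2006.01.02-15.04.05")
}

// example formats a fixed moment so the output is reproducible.
func example() string {
	t := time.Date(2012, time.March, 4, 17, 6, 7, 0, time.UTC)
	return stamp(t)
}

func main() {
	fmt.Println(example()) // 2012.03.04-17.06.07
}
```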
> Notably, if a package is included but not used, Go treats this as an error and enforces removing unused declarations
A good illustration that the Go designers didn't think their ideas through. This is a real pain in the butt when you are writing code and regularly commenting in and out sections of code while you are testing things. And every time you do this, you need to remove or restore the imports. And since Go's tooling is nonexistent, there is no IDE to do this automatically for you.
This kind of thing belongs in a compiler plug-in (if it was designed with such a thing in mind, which is not the case for Go), macros (if the languages supports them, ideally the hygienic and statically typed kind) or an external tool, not in the compiler.
> A good illustration that the Go designers didn't think their ideas through.
Please don't misconstrue disagreement as sloppiness. If you read the mailing list, it's pretty clear they thought it through.
> This is a real pain in the butt when you are writing code and regularly commenting in and out sections of code while you are testing things.
Not for me. I love it, actually.
> And since Go's tooling is nonexistent, there is no IDE to do this automatically for you.
Vim does this for me.
Also, Go has some of the most wonderful tools of any programming language I've ever used.
> This kind of thing belongs in a compiler plug-in (if it was designed with such a thing in mind, which is not the case for Go), macros (if the languages supports them, ideally the hygienic and statically typed kind) or an external tool, not in the compiler.
I think reasonable people can disagree on this point.
False. The Go designers have explained many times why they made unused imports an error. In particular unused dependencies slow down compilation. There's even a FAQ: http://golang.org/doc/faq#unused_variables_and_imports.
Maybe they did think it through, but their conclusion reveals some inexperience.
It's impractical on many levels. Thus the need for kludgy solutions like blank identifiers. I'd rather see a strict mode or some other type of compiler flag.
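For anyone who hasn't seen the blank-identifier workaround being called kludgy here, a minimal sketch. The commented-out call stands in for code disabled while debugging, and `greeting` is just a placeholder function.

```go
package main

import (
	"fmt"
	"os" // would be an "imported and not used" error without the line below
)

// Assigning a package member to the blank identifier counts as a use, so the
// import can stay while the real call is commented out.
var _ = os.Exit

func greeting() string {
	return "still compiles"
}

func main() {
	fmt.Println(greeting())
	// os.Exit(1) // temporarily disabled during debugging
}
```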
Maybe you don't realize that "they" includes Ken Thompson himself... You may disagree with the Go design team's decisions, but it's amusingly absurd to accuse them of inexperience.
> Maybe you don't realize that "they" includes Ken Thompson himself... You may disagree with the Go design team's decisions, but it's amusingly absurd to accuse them of inexperience.
Not really, if anything, Go shows that its designers have a lot of inexperience when it comes to modern language design.
Go would have been a killer language in the late 90's but it seems to ignore everything that we've learned about language design in the past decade.
Even if the language is like that, if it helps improving the situation where young developers learn that strong typing does not have anything to do with VMs, I find it quite positive.
Jesus Jumping Blue Christ, why is it that every time Golang comes up in any type of discussion, there's always someone out there who complains that Go is horrible because it doesn't implement their favorite pet non-mainstream language features, which is obviously because the Go designers were inexperienced newbies with only 50 years of experience designing groundbreaking software, and therefore lacked your deep insights into how things should really be organized?
> good illustration that the Go designers didn't think their ideas through. This is a real pain in the butt when you are writing code and regularly commenting in and out sections of code while you are testing things.
How could it be that bad?
It's as simple as a compile, a click on the error message and adding a one line comment.
No worse than the standard practice of making sure your C/C++ code compiles without warnings with the maximum warning level set.
People develop at different speeds. For me, anything like this that slows me down and interrupts my flow with 3-4 seconds of "click on the error message and add a one-line comment" is incredibly annoying.
Development has several modes. One mode is "hacking", just hashing out what you want until it works and is elegant enough as a solution, perhaps changing your mind frequently when you see how it works in practice. Another is "polishing", carefully annotating, cleaning up, documenting, burning off loose threads, making sure the test coverage is top notch, etc.
The problem is that Go's compile-time strictness lends itself to the "polishing" phase, but not to the "hacking" phase.
> Development has several modes. One mode is "hacking",
> just hashing out what you want until it works and is
> elegant enough as a solution, perhaps changing your mind
> frequently when you see how it works in practice.
> Another is "polishing", carefully annotating, cleaning
> up, documenting, burning off loose threads, making sure
> the test coverage is top notch, etc.
>
> The problem is that Go's compile-time strictness lends
> itself to the "polishing" phase, but not to the
> "hacking" phase.
When I write Go, or indeed in any programming language, I generally start with, and stay in, what you call the "polishing" phase. Experimentation occurs in my head, and what makes it through to my fingers is the polished form of that experiment.
That Go is not conducive to writing sloppy (or "hacking" phase) code is I think only a good thing.
Without measuring how productive we (you and I) are as programmers, such a qualitative judgement is largely meaningless.
Anecdotally, I have had colleagues who always plan ahead meticulously, using pen and paper and diagrams and plenty of note-taking before ever writing a single line of code, and the first line of code is often a test. And yet those people were terrible programmers. They take a long time to produce working code, and it's often deeply flawed. They will spend half a day or an entire day trying to hunt down a bug that I found to be trivially obvious even without knowing the codebase. Meticulousness does not imply quality.