Most of the performance difference between string concatenation and strings.Builder in your example is explained by memory allocation. Every iteration of the concat loop results in a new allocation, while the builder, which uses a []byte internally, only allocates when length equals capacity, and the newly allocated slice will be approx. twice the capacity of the old slice (see: https://golang.org/src/strings/builder.go?#L62).
Therefore, 500,000 rounds of concat is about 500,000 allocations, while 200,000,000 rounds of builder is ~ 27.5 allocations (=log2(200000000)).
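To make the allocation argument concrete, here is a minimal sketch that appends bytes one at a time to a plain []byte and counts how often the capacity changes. countGrowths is a name made up for illustration, and the exact count depends on the runtime's growth policy, which as far as I know is not a strict doubling for large slices, so expect the number to be somewhat above log2(n) while staying tiny compared to n.

```go
package main

import "fmt"

// countGrowths (hypothetical helper) appends n bytes one at a time and
// counts how often the backing array is reallocated, detected as a
// change in capacity.
func countGrowths(n int) int {
	var buf []byte
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(buf)
		buf = append(buf, 'x')
		if cap(buf) != before {
			growths++
		}
	}
	return growths
}

func main() {
	for _, n := range []int{1000, 1000000} {
		fmt.Printf("%d appends -> %d reallocations\n", n, countGrowths(n))
	}
}
```

The point is only the order of magnitude: the number of reallocations grows roughly logarithmically with the number of appends, whereas += on a string allocates on essentially every step.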
I would suggest a different benchmark to approximate real world usage:
    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var str string
            str += "x"
            str += "y"
            str += "z"
        }
    }

    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString("x")
            builder.WriteString("y")
            builder.WriteString("z")
            builder.String()
        }
    }
Which still shows a significant performance advantage for using builder (-40% ns/op):
That's absolutely the same as the statement without assignment from a data flow POV. However, the compiler will likely not optimize it away, since it would be too burdensome to prove that strings.Builder.String() does not have any side effects. The Go compiler prides itself on fast compilation, so I would not expect it to perform cross-package control/data flow analyses.
I would mention that gc (the official Go compiler) specially optimizes the string concatenation operator (+). If the number of strings to be concatenated is known at compile time, using + to concatenate them is the most efficient option.
    package a

    import (
        "strings"
        "testing"
    )

    var strA, strB string
    var x, y, z = "x", "y", "z"

    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            strA = x + y + z
        }
    }

    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString(x)
            builder.WriteString(y)
            builder.WriteString(z)
            strB = builder.String()
        }
    }
Note that this is directly contradicted by another comment[1] on this post, where three fixed strings are concatenated with +=, yet that was still slower.
Perhaps the use of += as separate statements is the difference, but one would hope that gc wasn't so fragile as to be unable to identify those sequences as identical.
The way he uses b.N is wrong. b.N is different for different loops, so he's e.g. comparing 100 iterations of string '+' against 1000 iterations of builder.WriteString().
Also, the compiler can completely eliminate no-op function calls (those without side effects), so in benchmarks it's a good idea to assign the value being calculated to e.g. a global variable.
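A minimal sketch of the global-variable trick, written as a standalone program so it can be run without go test. The names sink and BenchmarkConcatSink are made up for illustration; the only real API used is testing.Init and testing.Benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable (hypothetical name). Storing each
// result here makes the value observable outside the loop, so the
// compiler cannot simply treat the computation as dead code.
var sink string

// Package-level variables rather than constants, so the concatenation
// is not folded at compile time.
var x, y, z = "x", "y", "z"

func BenchmarkConcatSink(b *testing.B) {
	for n := 0; n < b.N; n++ {
		sink = x + y + z
	}
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	res := testing.Benchmark(BenchmarkConcatSink)
	fmt.Println(res)
}
```

In a normal _test.go file you would drop main and just keep the sink variable next to the benchmark function.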
His use of b.N is correct. The code you have is simply multiplying N by 100 with the inner for loop - so your times are 100x of what each "concat" operation (+,WriteString) takes.
You are also allocating a new string/buffer/builder for every run - which is not useful if you want to just benchmark concat.
I think there's another bug in the generateSlice function if the intention is to create a slice with n random numbers.
    func generateSlice(n int) []int {
        s := make([]int, n)
        for i := 0; i < n; i++ {
            s = append(s, rand.Intn(1e9))
        }
        return s
    }
As it is now, the function creates a slice with n zeros followed by n random numbers. I suppose you meant to say make([]int, 0, n). You could just as well assign directly to each slice element instead of using append, which would be more efficient.
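For reference, a sketch of both fixes side by side. generateSliceIndexed is a name made up here for the direct-assignment variant, and math/rand is assumed as the rand package since the original snippet doesn't show its imports.

```go
package main

import (
	"fmt"
	"math/rand" // assumption: the original does not show which rand it imports
)

// generateSlice allocates capacity for n elements but starts at length 0,
// so append fills exactly n random numbers.
func generateSlice(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, rand.Intn(1e9))
	}
	return s
}

// generateSliceIndexed (hypothetical name) allocates n elements up front
// and assigns to each index directly, avoiding append altogether.
func generateSliceIndexed(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = rand.Intn(1e9)
	}
	return s
}

func main() {
	fmt.Println(len(generateSlice(5)), len(generateSliceIndexed(5)))
}
```

Both versions return a slice of length n; the buggy original returns length 2n.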
I made the exact same mistake quite a few times myself.
While I don't doubt that strings.Builder is quicker than += concat for many iterations, to make it a fair comparison you probably need to pull out the string at the end rather than just writing to the buffer. It's also not obvious what the difference is with just two strings, for example if I need to join two strings together 40 trillion times or whatnot.
Nice collection of microbenchmarks though. Interesting to see magnitude differences from e.g. regexp compile
The string benchmark has the issue that the amount of work done varies with each pass through the loop, since the string just keeps getting appended to. A proper benchmark, like the ones in the comments here, does the same amount of work on every iteration.
Note that you can also get the number of bytes processed per second by calling the SetBytes method. This is very useful for some benchmarks (hashing, base64, ...):
    func benchmarkHash(b *testing.B, h hash.Hash) {
        data := make([]byte, 1024)
        rand.Read(data)
        b.ResetTimer()
        // SetBytes takes an int64, so the length must be converted.
        b.SetBytes(int64(len(data)))
        for n := 0; n < b.N; n++ {
            h.Write(data)
            h.Sum(nil)
        }
    }
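A possible way to wire this up end to end, as a standalone sketch. BenchmarkSHA256 and the choice of SHA-256 are just examples, crypto/rand is an assumption (the snippet above doesn't say which rand it uses), and the h.Reset() call is my addition so that each iteration hashes the same amount of data. Note that SetBytes takes an int64.

```go
package main

import (
	"crypto/rand" // assumption: the original does not show which rand it imports
	"crypto/sha256"
	"fmt"
	"hash"
	"testing"
)

func benchmarkHash(b *testing.B, h hash.Hash) {
	data := make([]byte, 1024)
	rand.Read(data)              // fill the input with random bytes
	b.SetBytes(int64(len(data))) // bytes processed per iteration
	b.ResetTimer()               // exclude the setup above from the timing
	for n := 0; n < b.N; n++ {
		h.Reset() // added here so every iteration hashes exactly 1024 bytes
		h.Write(data)
		h.Sum(nil)
	}
}

// BenchmarkSHA256 is an example caller (hypothetical name).
func BenchmarkSHA256(b *testing.B) { benchmarkHash(b, sha256.New()) }

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	res := testing.Benchmark(BenchmarkSHA256)
	fmt.Println(res) // the printed result includes an MB/s column
}
```

With SetBytes in place, go test -bench reports MB/s alongside ns/op, which is usually the number you actually want to compare between hash functions.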
> The following benchmarks evaluate various functionality with the focus on real-world usage patterns.
I can't say I write much code that does one thing many times in a really tight loop. It would be a lot more interesting if the code combined multiple functions into the loop body in a better attempt to simulate "real-world usage patterns."
Good point, thanks! The idea behind these benchmarks is to make the results usable in real-world programs, rather than benchmarking real-world programs. I rephrased that sentence to avoid any confusion.
I've always wanted to ask this. I'm a full-stack developer with good knowledge of Java and JavaScript. I'm currently learning Go, especially for its concurrency idioms. It makes concurrent code easy to write, but people keep bringing up actors and saying they are much better than channels. I have never used actors before. What are your thoughts on this?
Even though this is clearly a benchmarking game, I don't like that it does not explain that the things benchmarked against each other sometimes have drastically different use cases.
I can assure you that someone is going to use these numbers to argue that crypto/rand needs to be replaced by math/rand BECAUSE SPEED, or that MD5 should be preferred over SHA2/3.