Practical Go benchmarks (stackimpact.com)
148 points by minaandrawos on March 6, 2018 | 23 comments



The majority of the performance difference between string concat and builder in your example is explained by memory allocation. Every iteration of concat results in a new allocation, while the builder - which uses a []byte internally - only allocates when length equals capacity, and the newly allocated slice has approximately twice the capacity of the old one (see: https://golang.org/src/strings/builder.go?#L62).

Therefore, 500,000 rounds of concat means about 500,000 allocations, while 200,000,000 rounds of builder means ~28 allocations (log2(200,000,000) ≈ 27.6).
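
You can watch that geometric growth directly; a minimal sketch, assuming Go 1.12+ where strings.Builder has a Cap method:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        var b strings.Builder
        prev := 0
        for i := 0; i < 1000000; i++ {
            b.WriteString("x")
            if c := b.Cap(); c != prev { // capacity changed => a reallocation happened
                fmt.Println("grew to", c)
                prev = c
            }
        }
    }

Only a few dozen "grew to" lines print for a million writes.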

I would suggest a different benchmark to approximate real world usage:

    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var str string
            str += "x"
            str += "y"
            str += "z"
        }
    }

    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString("x")
            builder.WriteString("y")
            builder.WriteString("z")
            builder.String()
        }
    }
This still shows a significant performance advantage for the builder (-40% ns/op):

    BenchmarkConcatString-4     20000000            93.5 ns/op
    BenchmarkConcatBuilder-4    30000000            54.6 ns/op
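
For what it's worth, b.ReportAllocs() (or running with -benchmem) adds B/op and allocs/op columns, which makes the allocation story above visible; a sketch:

    func BenchmarkConcatBuilder(b *testing.B) {
        b.ReportAllocs() // report B/op and allocs/op alongside ns/op
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString("x")
            builder.WriteString("y")
            builder.WriteString("z")
            builder.String()
        }
    }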


Won't the compiler just ignore the "builder.String()" line unless the return value is actually used?


Easy fix: assign to the blank identifier;

    _ = builder.String()
The compiler should not optimize that out.


That's exactly the same as the statement without the assignment from a data-flow point of view. However, the compiler will likely not optimize it away, since it would be too burdensome to prove that strings.Builder.String() has no side effects. The Go compiler prides itself on fast compilation, so I would not expect it to perform cross-package control/data-flow analyses.


I would mention that gc (the official Go compiler) applies a special optimization to the string concatenation operator (+). If the number of strings to be concatenated is known at compile time, using + to concatenate them is the most efficient option.

    package a
    
    import "testing"
    import "strings"
    
    var strA, strB string
    var x, y, z = "x", "y", "z"
    
    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            strA = x + y + z
        }
    }
    
    func BenchmarkConcatBuilder(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var builder strings.Builder
            builder.WriteString(x)
            builder.WriteString(y)
            builder.WriteString(z)
            strB = builder.String()
        }
    }
Result:

    goos: linux
    goarch: amd64
    BenchmarkConcatString-2    	20000000	        83.7 ns/op
    BenchmarkConcatBuilder-2   	20000000	       102 ns/op


Note that this is directly contradicted by another comment[1] on this post, where three fixed strings are concatenated with +=, yet that was still slower.

Perhaps the use of += as separate statements is the difference, but one would hope that gc isn't so fragile as to be unable to recognize those sequences as identical.

---

[1] https://news.ycombinator.com/item?id=16533650


The optimization made by gc is only valid for the form: s0 + s1 + ... + sn.
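
A sketch that would isolate the two forms (sink is a package-level variable so the compiler can't discard the results): the single expression compiles to one concatenation call that allocates the result once, while the += form makes a separate call (and usually a separate allocation) per statement.

    var sink string
    var x, y, z = "x", "y", "z"

    // One expression: a single concatenation, one allocation.
    func BenchmarkConcatExpr(b *testing.B) {
        for n := 0; n < b.N; n++ {
            sink = x + y + z
        }
    }

    // Separate statements: a concatenation call per +=.
    func BenchmarkConcatStmts(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var s string
            s += x
            s += y
            s += z
            sink = s
        }
    }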


I suspect the optimization is based on those being concatenated references rather than string literals.

I don't quite understand why string literals wouldn't be even easier to optimize but there it is.


Does Go intern strings? That could mess things up with this bench.

edit: doesn't sound like it does by default


String benchmarks are so broken.

The way he uses b.N is wrong. b.N differs from benchmark to benchmark, so he's e.g. timing 100 iterations of string '+' against 1000 iterations of builder.WriteString().

Also, the compiler can completely optimize away calls without side effects, so in benchmarks it's a good idea to assign the value being calculated to e.g. a global variable.

The corrected code is: https://gist.github.com/kjk/6a7d7135ae1e5fa6cd1f0db23d2eaf4d

An example of correctly benchmarking:

    func BenchmarkConcatString(b *testing.B) {
        for n := 0; n < b.N; n++ {
            var str string
            for i := 0; i < 100; i++ {
                str += "x"
            }
            gStr = str
        }
    }

After the fixes it paints a significantly different picture:

    go test -bench=. -benchmem
    goos: darwin
    goarch: amd64
    BenchmarkConcatString-8    	  300000	      5148 ns/op	    5728 B/op	      99 allocs/op
    BenchmarkConcatBuffer-8    	 1000000	      1046 ns/op	     368 B/op	       3 allocs/op
    BenchmarkConcatBuilder-8   	 1000000	      1177 ns/op	     248 B/op	       5 allocs/op


His use of b.N is correct. The code you have simply multiplies N by 100 with the inner for loop - so your times are 100x what each "concat" operation (+, WriteString) takes.

You are also allocating a new string/buffer/builder on every iteration - which is not useful if you want to benchmark just the concat.
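
One way to time just the WriteString path is to do the setup once, outside the timed region; a sketch (memory use grows with b.N, so treat it as a sketch only):

    func BenchmarkWriteString(b *testing.B) {
        var builder strings.Builder
        builder.Grow(b.N) // preallocate so growth doesn't dominate
        b.ResetTimer()    // exclude the setup from the measurement
        for n := 0; n < b.N; n++ {
            builder.WriteString("x")
        }
    }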


Thanks for pointing it out. The benchmark clearly should not depend on the number of iterations. It's fixed now.


I think there's another bug in the generateSlice function if the intention is to create a slice with n random numbers.

    func generateSlice(n int) []int {
        s := make([]int, n) // bug: creates length n (n zeros), not just capacity n
        for i := 0; i < n; i++ {
            s = append(s, rand.Intn(1e9)) // appends after the n zeros
        }
        return s
    }
As it is now, the function creates a slice with n zeros followed by n random numbers. I suppose you meant to say make([]int, 0, n). You could just as well assign directly to each slice element instead of using append, which would be more efficient.
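
Both fixes, sketched:

    // Zero length, capacity n, then append:
    func generateSlice(n int) []int {
        s := make([]int, 0, n)
        for i := 0; i < n; i++ {
            s = append(s, rand.Intn(1e9))
        }
        return s
    }

    // Or length n with direct assignment, avoiding append entirely:
    func generateSlice2(n int) []int {
        s := make([]int, n)
        for i := range s {
            s[i] = rand.Intn(1e9)
        }
        return s
    }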

I made the exact same mistake quite a few times myself.


Yep, that meant to be capacity, not length. Corrected. Thanks!


While I don't doubt that strings.Builder is quicker than += concat for many iterations, to make it a fair comparison you probably need to pull the string out at the end rather than just writing to the buffer. It's also not obvious, for example, what the difference is with just two strings to join, if I need to join two strings together 40 trillion times or whatnot.

Nice collection of microbenchmarks though. Interesting to see magnitude differences from e.g. regexp compile


Fun fact: the crypto rand "number" benchmark depends on the number you pass into it:

    BenchmarkCryptoRand27-8   	 5000000	       388 ns/op
    BenchmarkCryptoRand28-8   	 3000000	       356 ns/op
    BenchmarkCryptoRand29-8   	 5000000	       335 ns/op
    BenchmarkCryptoRand30-8   	 5000000	       327 ns/op
    BenchmarkCryptoRand31-8   	 5000000	       331 ns/op
    BenchmarkCryptoRand32-8   	 5000000	       322 ns/op
    BenchmarkCryptoRand33-8   	 3000000	       480 ns/op
    BenchmarkCryptoRand34-8   	 3000000	       474 ns/op
for benchmarks like

    func BenchmarkCryptoRand32(b *testing.B) {
        for n := 0; n < b.N; n++ {
            _, err := crand.Int(crand.Reader, big.NewInt(32))
            if err != nil {
                panic(err)
            }
        }
    }
This is because the crypto/rand library is very very careful to give you unbiased random numbers.
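
Roughly, it draws just enough bits to encode max-1 and retries on overshoot, so a power of two like 32 never rejects, while 33 rejects nearly half the draws. A simplified sketch of the idea (not the actual crypto/rand source):

    import (
        crand "crypto/rand"
        "encoding/binary"
        "math/bits"
    )

    func intn(max uint64) uint64 {
        k := bits.Len64(max - 1) // bits needed to encode max-1
        mask := uint64(1)<<uint(k) - 1
        var buf [8]byte
        for {
            if _, err := crand.Read(buf[:]); err != nil {
                panic(err)
            }
            v := binary.LittleEndian.Uint64(buf[:]) & mask
            if v < max { // always true for max = 32; ~52% of the time for 33
                return v
            }
        }
    }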


The string benchmark has the issue that the amount of work done varies with each pass through the loop, since the string just keeps getting appended to. A proper benchmark, like the ones in the comments here, does the same amount of work on every iteration.


Note that you can also get the number of bytes processed per second by calling the SetBytes method. This is very useful for some benchmarks (hashing, base64, ...):

    func benchmarkHash(b *testing.B, h hash.Hash) {
        data := make([]byte, 1024)
        rand.Read(data)

        b.ResetTimer()
        b.SetBytes(int64(len(data))) // SetBytes takes an int64
        for n := 0; n < b.N; n++ {
            h.Write(data)
            h.Sum(nil)
        }
    }
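
A typical wrapper to drive it (sha256 here just as an example):

    func BenchmarkSHA256(b *testing.B) {
        benchmarkHash(b, sha256.New())
    }

With SetBytes in place, go test reports an MB/s column for that benchmark alongside ns/op.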


> The following benchmarks evaluate various functionality with the focus on real-world usage patterns.

I can't say I write much code that does one thing many times in a really tight loop. It would be a lot more interesting if the code combined multiple functions in the loop body to better simulate "real-world usage patterns."


Good point, thanks! The idea behind these benchmarks is to make the results usable in real-world programs, rather than benchmarking real-world programs. I rephrased that sentence to avoid any confusion.


I always wanted to ask this. I'm a full-stack developer with good knowledge of Java and JavaScript. I'm currently learning Go, especially for its concurrency idioms. It's good and easy to write concurrent code, but people always bring up actors, which are said to be very good compared with channels. I have never used actors before. What are your thoughts on this?


Even though this is clearly a benchmarking game, I don't like that it doesn't explain how the things benchmarked against each other sometimes have drastically different use cases.

I can assure you that someone is going to use these numbers to argue that crypto/rand needs to be replaced by math/rand BECAUSE SPEED, or that MD5 should be preferred over SHA-2/3.


It's worth noting that the first number in a benchmark result is how many loop iterations (for n := 0; n < b.N) Go ran to produce the result.

The nanoseconds, bytes, and allocs per operation are the important part.
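
For example, reading one line from the results above:

    BenchmarkConcatBuilder-8    1000000    1177 ns/op    248 B/op    5 allocs/op

the -8 suffix is the GOMAXPROCS value, 1000000 is the iteration count the framework settled on, and the remaining columns are time, heap bytes, and heap allocations per operation.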



