Go crypto: bridging the performance gap

travjones · on May 7, 2015

It's great to see that companies can turn a profit AND contribute to open source projects that benefit the web as a whole. Thanks, Cloudflare.

Intermernet · on May 7, 2015

I didn't know the name Vlad Krasnov (I'm not that familiar with the crypto world), but from looking at these [1], [2], it seems that he would know what he's talking about from both a crypto and performance viewpoint.

> attempting to make them part of the official Go build for the good of the community

Would be interesting to hear what Adam Langley thinks about this, but I couldn't find anything recent on the golang-dev list.

[1]: http://dblp.uni-trier.de/pers/hd/k/Krasnov:Vlad

[2]: http://stackoverflow.com/users/1516766/vlad-krasnov

stock_toaster · on May 7, 2015

I found this: https://github.com/cloudflare/go/issues/5

CyberShadow · on May 7, 2015

I don't understand, why reimplement this if it is already implemented with adequate performance/quality in OpenSSL? I thought Go didn't have issues calling C code?

jgrahamc · on May 7, 2015

Sure, you could do that but imagine the nightmare of modifying the Go standard libraries for stuff like TLS to hook into the OpenSSL C interface. We actually did that for a while and it was horrible: http://blog.jgc.org/2013/01/integrating-openssl-crypto-funct...

4ad · on May 7, 2015

Calling C code from Go is something you do as a last resort. It breaks some desirable properties of Go programming, like fast compilation, cross compiling (at least without effort), and static binaries (without serious effort). Code called through cgo will never be idiomatic Go code, unless perhaps it is hidden under an abstraction layer, which takes time and effort to write and adds overhead.

On top of that, the cgo overhead is significant, not only in time but in threads, so you'd want to use it only for big chunks of data and write the program to account for it.

Another advantage of Go crypto is having been written in a memory safe language, and with runtime bounds checking. The assembly fast paths break some of this, but not most of it.

andrewchambers · on May 7, 2015

Go has moving memory so it generally means calling C requires a bunch of copying. This is slow and annoying.

sph · on May 7, 2015

> Go has moving memory so it generally means calling C SHOULD/WILL require a bunch of copying.

I've been working on a few Go frontends to C libraries and researching the exact memory model when calling C. Yes, Go has moving memory and you _shouldn't_ pass a Go pointer to C, and it's actively discouraged [1], but since many libraries just ignore that advice [2] for performance reasons, Go currently (AFAIK) doesn't move any pointer passed through cgo.

That will probably change in Go 1.5+. I reckon an official post from the Go developers about the current and future state of C/Go memory interaction would help clarify this.

1: https://github.com/golang/go/issues/8310

2: comment #31 of https://github.com/golang/go/issues/8310

robmccoll · on May 7, 2015

Are you sure that's true? Perhaps they have reserved the ability to do that in the future (can you point to a doc on that?), but right now I'm fairly certain that Go does no such thing. The only copies necessary when calling C code are for strings, since they aren't guaranteed / required to be null-terminated byte arrays in Go. And cgo has C ABI compatibility, so function call overhead is nonexistent.

ominous_prime · on May 7, 2015

Function call overhead in cgo is considerable. You're right that it's not from copying, but the runtime scheduler still has to coordinate the blocking call, and the stack needs to be switched out to the C stack, kind of like a context switch.

robmccoll · on May 7, 2015

True, just benchmarked it and found the function call overhead to be about 1.85834729e-7 seconds (185 ns). Which isn't much, but the pure C version would obviously be single nanoseconds for the handful of instructions needed depending on the function call.

fdej · on May 7, 2015

185 ns is worse than function call overhead in Python (about 100 ns on this machine)

robmccoll · on May 7, 2015

Python calling Python, Python calling out to a native module, or Python calling through ctypes?

fdej · on May 11, 2015

Python calling a no-op Python function def foo(): pass

vardump · on May 7, 2015

185 ns is enough to compute over 17k floating point operations on a single 3 GHz CPU (Intel Haswell/Broadwell) core.
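The 17k figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a Haswell-class core running at its theoretical single-precision peak (two FMA units, eight float32 lanes each, two operations per FMA); the function name `peakFLOPs` is made up for illustration, and real code rarely sustains this rate.

```go
package main

import "fmt"

// peakFLOPs estimates how many floating point operations one core could
// issue in the given time, assuming it sustains flopsPerCycle throughout.
// This is an upper bound for illustration, not a measurement.
func peakFLOPs(nanoseconds, clockGHz float64, flopsPerCycle int) int {
	cycles := nanoseconds * clockGHz // a 1 GHz clock ticks once per nanosecond
	return int(cycles) * flopsPerCycle
}

func main() {
	// Assumed peak: 2 FMA units * 8 float32 lanes * 2 ops per FMA.
	const flopsPerCycle = 2 * 8 * 2

	// 185 ns is the cgo call overhead quoted in the thread.
	fmt.Println(peakFLOPs(185, 3.0, flopsPerCycle))
}
```

At 3 GHz, 185 ns is 555 cycles, and 555 cycles at 32 operations per cycle gives 17,760 operations, which matches the "over 17k" claim.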

ominous_prime · on May 7, 2015

No, the slow part is switching stacks, and coordinating with the scheduler. Go's GC doesn't yet move memory, and even so there will probably be allowances for passing pointers from Go to C when it is implemented.

andrewchambers · on May 7, 2015

The stacks currently get copied when they run out of space, that may break things already.

pcwalton · on May 7, 2015

cgo is slow due to Go's stack growth and userspace threading model. (The limitations are basically fundamental to the way Go handles concurrency.)

larssorenson · on May 7, 2015

I think it's the overhead of the call or maybe their distaste for the state of repair of OpenSSL, e.g. Heartbleed, that drove them to it. Alternatively it could just be the chance to write some crypto in Go assembly and have it be included in their code base.

istvan__ · on May 7, 2015

I don't think that OpenSSL is a great example of OSS: there are several areas where they don't follow best practices, for no reason, and you can just look at their history of serious security flaws. Why would you want to integrate with that? In this case it is better and cleaner to implement the functionality in your own code (in this case Go).

0xdeadbeefbabe · on May 7, 2015

Having a history of security flaws is better than having a future of security flaws.

schmichael · on May 7, 2015

Arguably OpenSSL has both.

istvan__ · on May 7, 2015

There is about the same number of security bugs in almost any software, and the number shows some correlation with the lines of code count. Strictly from the code point of view, you can follow best practices and actively train the staff on security. This costs money and time, and nobody really wants to do it. The companies that started to do this invested serious amounts of money into the project, and it shows in the statistics.

http://www.gfi.com/blog/most-vulnerable-operating-systems-an...

In open source this is more of a community thing with little discipline. The nature of the software development is less tight, and this leads to mediocre results.

I guess that at Apple security is about as far from the design as it can be, probably not a high priority.

Knowing the historical flaws is only useful if you invest in mining them and act on the results.

baby · on May 7, 2015

jsprogrammer · on May 7, 2015

http://en.wikipedia.org/wiki/Open-source_software

pjc50 · on May 7, 2015

OpenSSL no longer really qualifies as "adequate safety".

vitriol83 · on May 7, 2015

I agree. ECDH and AES-GCM are sufficiently complex to implement that it makes sense to call into OpenSSL. OpenSSL has had its problems, but these tend to be with TLS protocol handling. The underlying cryptographic constructions have had a lot of attention.

giovannibajo1 · on May 7, 2015

Well, to tell the truth, the underlying crypto functions are also the ones that are harder to get wrong, thanks to test vectors. The hardest part is probably making them constant-time, but AGL is also the author and maintainer of the Go crypto library, and he did extensive research on constant-time functions and contributed code to OpenSSL as well. He even hacked valgrind to check for constant-time behavior at runtime, which blows my mind (https://www.imperialviolet.org/2010/04/01/ctgrind.html).

So I think the general advice not to rewrite a TLS library doesn't fully apply to the Go team like it would apply to us.
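For anyone who hasn't seen test vectors in practice, here is a minimal sketch of a known-answer test. It uses SHA-256 rather than AES-GCM or P256 only because its published vectors are short; the type and function names (`knownAnswer`, `checkVectors`) are hypothetical, and this is not how the Go tree organises its own tests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// knownAnswer pairs an input with the digest a correct implementation
// must produce for it.
type knownAnswer struct {
	input  string
	digest string // lowercase hex
}

// Two widely published SHA-256 vectors: the empty string and "abc".
var sha256Vectors = []knownAnswer{
	{"", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
	{"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"},
}

// checkVectors runs every vector through the implementation under test
// and reports the first mismatch, if any.
func checkVectors() error {
	for _, v := range sha256Vectors {
		sum := sha256.Sum256([]byte(v.input))
		if got := hex.EncodeToString(sum[:]); got != v.digest {
			return fmt.Errorf("SHA-256(%q) = %s, want %s", v.input, got, v.digest)
		}
	}
	return nil
}

func main() {
	if err := checkVectors(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all vectors pass")
}
```

Vectors like these catch wrong output, which is the point being made above. They say nothing about timing behaviour, which is why constant-time checking needs separate tooling such as ctgrind.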

pjmlp · on May 7, 2015

The more C code that gets replaced out there, the better for system security.

slashdev · on May 7, 2015

Because assembly is safer than C? Did you read the article?

pjmlp · on May 7, 2015

There is a difference between writing a well contained piece of code inside a memory safe language, and using a memory unsafe language for user space applications.

Additionally, removing the dependency on C proves the point that C isn't the only game in town, especially given that the same approach would require Assembly with C anyway.

slashdev · on May 7, 2015

Agreed, but this is for go only, so it's replacing a mix of go and assembly with a mix of go and assembly. How does that make anything safer?

pjmlp · on May 7, 2015

By not introducing a dependency into OpenSSL, which is the point being discussed.

slashdev · on May 7, 2015

Missed the indent. I agree then.

4ad · on May 7, 2015

I will happily look over these once they are part of upstream Go, or I will happily review them once they are proposed upstream, but I will take a pass on their "special fork of Go".

biftek · on May 7, 2015

I'm not sure what's involved with getting assembly working in Go (my experience has only been with servers/clients and CLIs), but could this not have been implemented as a standalone package? What's the benefit of a fork in this case?

jzelinskie · on May 7, 2015

I'm also wondering the answer to this question. To my understanding, go-crypto is actually maintained separately from the Go stdlib by the Go developers[0]. So why isn't this just a fork of crypto? Why fork the entire language? Why can't this be upstreamed?

[0]: https://github.com/golang/crypto

sdevlin · on May 7, 2015

This is a library of supplemental crypto algorithms. You won't see things like AES, SHA-2, or the NIST curves in here. Those things are part of the standard library in Go.

sdevlin · on May 7, 2015

I can't speak for Cloudflare, but I would guess that they want these changes to be merged upstream. If that happens, the benefit is that consumers of the stdlib crypto API will get increased security and performance for free in a future update.
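To illustrate what a "consumer of the stdlib crypto API" looks like, here is a minimal AES-128-GCM sketch. Only the `crypto/aes`, `crypto/cipher` and `crypto/rand` calls are real standard library API; the helper names `sealGCM` and `openGCM` are made up here. Whether an assembly fast path sits behind these calls depends on the Go version and CPU, which is exactly why callers would not have to change anything.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AEAD from a 16-byte key (AES-128).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealGCM encrypts and authenticates plaintext, returning nonce || ciphertext || tag.
func sealGCM(key, plaintext, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends to its first argument, so the nonce ends up as a prefix.
	return aead.Seal(nonce, nonce, plaintext, aad), nil
}

// openGCM reverses sealGCM and fails if anything was modified.
func openGCM(key, sealed, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < aead.NonceSize() {
		return nil, errors.New("sealed message too short")
	}
	nonce, ciphertext := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ciphertext, aad)
}

func main() {
	key := make([]byte, 16) // all-zero key, for demonstration only
	sealed, err := sealGCM(key, []byte("hello"), nil)
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plaintext, err := openGCM(key, sealed, nil)
	fmt.Println(string(plaintext), err)
}
```

None of this code mentions assembly or CPU features. If the faster implementation lands upstream, a program like this would pick it up on recompilation.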

elithrar · on May 7, 2015

> I can't speak for Cloudflare, but I would guess that they want these changes to be merged upstream. If that happens, the benefit is that consumers of the stdlib crypto API will get increased security and performance for free in a future update.

Yes, they do: https://go-review.googlesource.com/#/c/8968/

amelius · on May 7, 2015

I'm also wondering, why fork the compiler? What if another company forks Go to implement, say, a faster spam filter. How would one ever combine these two forks?

ashearer · on May 7, 2015

Nice work! The article benchmarks a > 20X speedup for AES-128-GCM, for performance described in the text as "on par" with OpenSSL. It would be helpful for reference to have an OpenSSL column added to the benchmark table.

michaelt · on May 7, 2015

Cloudflare's Universal SSL is pretty great - I used to host my static website on S3, but that means no SSL or ipv6. Putting everything through cloudflare sorts both those out.

The original announcement [1] mentioned they were planning support for adding in the HSTS header - as jgrahamc is here responding to comments, I'd be interested to hear how far they've got with that :)

[1] https://blog.cloudflare.com/introducing-universal-ssl/

jgrahamc · on May 7, 2015

We added it during "Week of SSL": https://blog.cloudflare.com/enforce-web-policy-with-hypertex...

thomasahle · on May 7, 2015

"Let's just rewrite all the crypto ourselves! In assembly!"

Sorry to be a buzzkill, but that sounds like a recipe for disaster.

sdevlin · on May 7, 2015

A couple points on this.

First, as some have noted, serious crypto primitive implementations are written in assembly. This is both to achieve state-of-the-art performance as well as data-independent execution times. The latter is important to prevent timing attacks.

Second: I'm not sure if this was your point, but some have invoked Heartbleed and other native code disasters. But the kind of problems that lead to Heartbleed aren't likely to be a problem in low-level crypto implementations. This is because they tend to operate on fixed-size buffers using algorithms with little or no conditional logic. While there could certainly be mathematical flaws (i.e. producing the wrong output), something like a buffer overrun is not likely here.

If you look in basically any crypto library, you will find important primitives implemented in assembly. This is even true in the main Go repository, where AES is implemented in assembly.

4ad · on May 7, 2015

All the low-level crypto is written in assembly. And not only for speed, but for ensuring properties like constant-time execution, etc. It's the same for OpenSSL, and the same for commercial crypto libraries. Go is not at all different here.

The difference is that the higher-level crypto is written in Go, not in C; Go is memory safe, much more strongly-typed in general, and with run-time bounds checking which eliminate buffer overflows.

The bugs are almost never in the low-level algorithms, they are in the higher-level components.

zurn · on May 7, 2015

OpenSSL does optional assembly implementations for many primitive+platform combinations, but also many of the algorithms under crypto/ have zero or one architectures covered. And many asm implementations predate widespread concerns about timing attacks.

alfiedotwtf · on May 7, 2015

There was a post on HN a few weeks ago arguing that "ensuring properties like constant-time execution" isn't possible, as instruction timings don't take into account things like caching, pipelining, task switching, microcode optimisations etc.

sdevlin · on May 7, 2015

When people describe crypto code as "constant-time", they typically just mean its execution time is data-independent.
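A small sketch of what "data-independent" means in code, using byte-slice comparison as the example. `leakyEqual` and `steadyEqual` are names invented for this illustration; `subtle.ConstantTimeCompare` is the real standard library function. The hand-written `steadyEqual` only shows the idea: a compiler is free to transform it, so in real code the library function is the safer choice.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// leakyEqual returns at the first mismatching byte, so its running time
// depends on how long a prefix of the secret the caller guessed correctly.
func leakyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false // early exit: timing reveals the mismatch position
		}
	}
	return true
}

// steadyEqual always touches every byte and has no branch on the data,
// so the work done does not depend on where (or whether) the inputs differ.
func steadyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false // lengths are assumed to be public
	}
	var diff byte
	for i := range a {
		diff |= a[i] ^ b[i] // accumulate differences without branching
	}
	return diff == 0
}

func main() {
	secret := []byte("correct horse")
	guess := []byte("correct horse") // hypothetical attacker guess, same length

	fmt.Println(leakyEqual(secret, guess), steadyEqual(secret, guess))

	// The standard library version returns 1 for equal, 0 otherwise.
	fmt.Println(subtle.ConstantTimeCompare(secret, guess) == 1)
}
```

Both functions return the same answers. The difference is only in which inputs influence how long they take, and caches, pipelines and task switches add noise on top of that without changing it.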

pjc50 · on May 7, 2015

Yeah, the point of moving from C to Go would be to get away from all the disasters. But if the core is small enough using short bursts of inline assembly to actually calculate the cipher, while all the complex protocol handling and buffer manipulation is done in the safe layer, it would be a good solution.

Edit: that does indeed appear to be what they've done. In particular using the AESENC instruction. https://github.com/cloudflare/go/blob/master/src/crypto/aes/...

grittygrease · on May 7, 2015

The OpenSSL AES-GCM and P256 assembly code was also written by Vlad. There's no better person to write the Golang version.

sdevlin · on May 7, 2015

Neat, I didn't even notice this.

Vlad Krasnov is also a co-author of this paper on state-of-the-art P256 implementation: https://eprint.iacr.org/2013/816.pdf.

thomasahle · on May 7, 2015

That's a good argument. However (unless the changes get merged upstream) it is still one more crypto library for 'bad people' to find weaknesses in.

jussij · on May 7, 2015

And only a year or two ago OpenSSL was found to have a major hole (remember Heartbleed?), caused by a buffer overrun bug that had been around for years. If I remember correctly, I also read on the Go forums that Go didn't have that issue, only because it had been fully rewritten.

imaginenore · on May 7, 2015

If it's tested extensively, it should be fine. Somebody has to write the crypto.

vitriol83 · on May 7, 2015

Without formal methods it's impossible to fully test crypto implementations, such as ECDH, because the number of possible inputs is enormous. Bugs in a small proportion of inputs can lead to fault attacks. Furthermore, side channel attacks are very common.

imaginenore · on May 7, 2015

That's true of every crypto implementation.

vitriol83 · on May 7, 2015

Sure, that's why I think there's an argument to use ones which are more 'battle tested'.

amaranth · on May 7, 2015

That's not an option here though because of the poor performance of Go's C FFI infrastructure. It's like Java in this regard, unless you're handing off large batches of work to the C level all at once it's more efficient to just do the work in Go. Except in this case pure Go isn't fast enough either thus assembly.

Twirrim · on May 7, 2015

Go's existing crypto library is far from battle tested either (as well as being comparatively slow)

tptacek · on May 7, 2015

Go's crypto library is probably the best of all the "standard library" crypto implementations. The modal standard crypto library among other languages is a set of bindings to OpenSSL.

lmm · on May 7, 2015

The most popular language around is still Java, I think? Which comes with a reputable, non-OpenSSL crypto implementation in its standard library.

TheLoneWolfling · on May 7, 2015

Except that it's quite literally impossible to ensure data-independent timing in Java - or, for that matter, in any language that does optimizations without a way to disable them. Yes, this includes standard-compliant C / C++, ironically enough.

JITters are especially bad for this - what is data-independent today may not be data-independent tomorrow. Or even in a couple minutes when it decides to re-optimize.

You ultimately have to dip down to assembly, or something that can be relied on to not do data-dependent optimizations, to ensure resilience against timing attacks.

JNI can work, as can inline assembly in things like C / C++, or specifying compilers. But that's just punting things to another language. And you lose portability, among other things. Or worse, you end up with something that looks like language X, and is valid code in language X, but breaks evilly if it's ever run as though it was in language X.

tptacek · on May 7, 2015

No, I do not think Java's crypto library is as well regarded as Golang's. For example, didn't Java SSL recently manage to reincarnate the Bleichenbacher padding oracle?

tedunangst · on May 7, 2015

And the message skipping vuln, which put basically all TLS clients in the "no security" category.

wolf550e · on May 7, 2015

For those wondering: https://www.smacktls.com/#skip

wolf550e · on May 7, 2015

"the JSSE implementation of TLS has been providing virtually no security guarantee (no authentication, no integrity, no confidentiality) for the past several years." from https://www.smacktls.com/#skip

ufo · on May 7, 2015

> Given the many vulnerabilities related to the use of AES-CBC with HMAC

What are they talking about here? Are there any important ones if you MAC after encryption? The only vulnerabilities I know of are when you MAC before you encrypt.

tptacek · on May 7, 2015

They're talking about the TLS CBC constructions, not AES-CBC and HMAC in the abstract.
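For readers unfamiliar with the ordering ufo is asking about, here is a minimal encrypt-then-MAC sketch with AES-CBC and HMAC-SHA256. It is a generic illustration, not the TLS record format (the TLS CBC suites MAC first and then encrypt, which is the source of the problems mentioned above). The names `sealEtM` and `openEtM` and the message layout are assumptions made for this example.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

var errInvalid = errors.New("invalid message")

// sealEtM encrypts with AES-CBC, then MACs IV || ciphertext.
// Output layout (an assumption of this sketch): IV || ciphertext || tag.
func sealEtM(encKey, macKey, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	// PKCS#7 padding: add 1..16 bytes, each holding the pad length.
	pad := aes.BlockSize - len(plaintext)%aes.BlockSize
	padded := append(append([]byte{}, plaintext...), bytes.Repeat([]byte{byte(pad)}, pad)...)

	out := make([]byte, aes.BlockSize+len(padded))
	iv := out[:aes.BlockSize]
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(out[aes.BlockSize:], padded)

	mac := hmac.New(sha256.New, macKey)
	mac.Write(out) // the tag covers the IV and the ciphertext
	return mac.Sum(out), nil
}

// openEtM verifies the tag before touching the ciphertext, so padding is
// only ever inspected on messages that are known to be authentic.
func openEtM(encKey, macKey, sealed []byte) ([]byte, error) {
	bodyLen := len(sealed) - sha256.Size
	if bodyLen < 2*aes.BlockSize || bodyLen%aes.BlockSize != 0 {
		return nil, errInvalid
	}
	body, tag := sealed[:bodyLen], sealed[bodyLen:]

	mac := hmac.New(sha256.New, macKey)
	mac.Write(body)
	if !hmac.Equal(tag, mac.Sum(nil)) { // constant-time tag comparison
		return nil, errInvalid
	}

	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	iv, ciphertext := body[:aes.BlockSize], body[aes.BlockSize:]
	plaintext := make([]byte, len(ciphertext))
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(plaintext, ciphertext)

	// Simplified unpadding: only the final length byte is checked.
	pad := int(plaintext[len(plaintext)-1])
	if pad < 1 || pad > aes.BlockSize {
		return nil, errInvalid
	}
	return plaintext[:len(plaintext)-pad], nil
}

func main() {
	encKey := make([]byte, 16) // all-zero key, for demonstration only
	macKey := []byte("hypothetical mac key")
	sealed, err := sealEtM(encKey, macKey, []byte("attack at dawn"))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plaintext, err := openEtM(encKey, macKey, sealed)
	fmt.Println(string(plaintext), err)
}
```

With MAC-then-encrypt the receiver has to decrypt and strip padding before it can check authenticity, and that ordering is what padding oracle attacks exploit.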

donpark · on May 7, 2015

Wish Cloudflare spread some of that Go asm love to Ed25519.

higherpurpose · on May 7, 2015

Indeed. Couldn't they have supported it for a small part of users like they're doing with ChaCha20-Poly1305?

donpark · on May 7, 2015

ChaCha20-Poly1305's 3-4x performance gain over AES-GCM made it worthwhile for them where Ed25519's benefits are not as clear cut to non-proponents. Too bad.

tav · on May 7, 2015

Nice work! Do you guys also have a similar implementation of ChaCha20-Poly1305 for Go? If so, any chance you could share that too?

jgrahamc · on May 7, 2015

That work has been done for OpenSSL, I guess we could add for Go: https://blog.cloudflare.com/do-the-chacha-better-mobile-perf...

tav · on May 7, 2015

That would be pretty awesome! Also, do you happen to know if your current patch is safe to apply on top of OpenSSL 1.0.2a?

jvermillard · on May 7, 2015

Strange that they don't mention AES-CCM at all. I'm using it a lot in IoT applications (DTLS). Is it not frequent in the "web" world?

atlbeer · on May 7, 2015

Off-topic but, they probably shouldn't be using a Southeast train image in this blog post as they are notoriously bad for delays and slow service in the UK[1] :)

[1] http://www.which.co.uk/home-and-garden/leisure/reviews-ns/be...

jgrahamc · on May 7, 2015

I added that. As someone who lives in London I'm well aware of how bad their service is. It was a deliberate ploy to make you spend more time on the blog post by getting you to shake your head about the image and then read the post.

aikah · on May 7, 2015

I'm curious whether the Go team will merge Go crypto into the core or not. It will show how open they are to third party contributions.

4ad · on May 7, 2015

Your comment is significantly out of place. The Go team is extremely open to quality external contributors. 463 people have contributed to Go so far, (obviously) most of them outside the core Go team. The Windows port was exclusively done and maintained by contributors. I have done the arm64 Go compiler, which is upstream now, the Solaris port, and I am now doing the sparc64 compiler. Many 3rd party contributors do many things every day.

The Go project is an extremely open project. More than half of the people who have direct commit access are external contributors.

coldtea · on May 7, 2015

>Your comment is significantly out of place. The Go team is extremely open to quality external contributors.

I think his comment is still valid. Adopting something major like this is not the same as accepting bugfixes from hundreds of people, or ports to a different architecture.

Even more different would be accepting some code for the standard library whose API wasn't designed by the core team.

From what I've seen the core team is quite opinionated and micro-managing things.

nickcw · on May 7, 2015

The go team was quite happy to accept my (individual; non core developer) contributions of md5 and sha1 written in ARM assembler given the appropriate review, so I don't see why Cloudflare's contribution should be any different.

DannyBee · on May 7, 2015

well, for starters, to accept it, the licensing would need to be changed slightly.

Parts have

  +// Copyright 2015 The Go Authors. All rights reserved.
  +// Use of this source code is governed by a BSD-style
  +// license that can be found in the LICENSE file.
  +
  +// Copyright 2015 Intel Corporation
  +// Copyright 2015 CloudFlare, Inc.
  +
  +// This file contains constant-time, 64-bit assembly implementation of
  +// P256. The optimizations performed here are described in detail in:
  +//   S.Gueron and V.Krasnov, "Fast prime field elliptic-curve cryptography with
  +//                            256-bit primes"
  +"

The additional copyright notices would not be okay. They would cause everyone who uses this library to have to reproduce not just the standard go copyright notices, but those, too.

twotwotwo · on May 7, 2015

They seem OK to me. A quick grep of my Go tree turns up lots of third-party copyright notices. It's OK as long as their code was released under licenses compatible with Go's.

These additional copyright notices could be removed by CloudFlare and Intel going in the AUTHORS file (which defines "The Go Authors"), presumably after they and Google do any required paperwork. Red Hat, Dropbox, and Fastly are in AUTHORS. But that needn't be a condition of integrating their code as long as it's licensed properly.

The paper citation doesn't appear to be copyright-related, and others like it are sprinkled around the codebase, e.g., package sort's source cites some papers on efficient sorting.

DannyBee · on May 8, 2015

"They seem OK to me. A quick grep of my Go tree turns up lots of third-party copyright notices. It's OK as long as their code was released under licenses compatible with Go's."

Okay, let me rephrase: "they aren't okay". It's actually my job to make these decisions and tell teams what is and what isn't okay :)

The other issues you mention are in the process of being fixed.

"These additional copyright notices could be removed by CloudFlare and Intel going in the AUTHORS file (which defines "The Go Authors")"

Yes, they could, but that requires agreement from more than just Cloudflare. This is code Intel donated to OpenSSL, not to Go, so it's simply not as trivial as Cloudflare saying "sure, here's some code". Intel has to agree to have their copyright notice changed, etc.

"The paper citation doesn't appear to be copyright-related, and others like it are sprinkled around the codebase, e.g., package sort's source cites some papers on efficient sorting."

I have no care in the world about this part.

twotwotwo · on May 8, 2015

I get that Google always really really wants a CLA since it does things the BSD license doesn't (patent grant!). I also agree that, practically speaking, legal stuff is absolutely part of the process of getting the asm crypto stuff merged in. And I know you're qualified.

But I read the initial comment as saying, specifically, no third-party copyright notices, ever.

I have code up under Go's license with more than one set of copyright notices (https://github.com/twotwotwo/sorts). From a grep, Go 1.4 has ~1,026 non-"The Go Authors" copyright notices in ~271 files in ~35 dirs (Lucent and other Plan 9 copyright holders, Sun, individuals, MPEG and yacc authors).

If there is stuff I should read/learn to have any hope of understanding what's OK (to keep my own stuff clean, and generally), it would help me to know.