It really depends on what you're doing: if your network is deep and does a lot of matrix multiplication, GPUs definitely do speed things up. Libraries like cuDNN have built-in optimized convolution ops that will also make convolutions a lot faster.

In my experience (not TensorFlow related, I mainly work on my own library now: https://github.com/chewxy/gorgonia), even with the cgo call penalty, deep networks do train faster on a GPU. I've never dabbled much in CNNs (convolutions tend to do my head in), so I can't say much about those.