I disagree with his position on memory (he mentions in the post that anything above 1.5GB should be fine). In my experience, anything below 3GB can be pretty uncomfortable these days, if you want to work on serious problems.
It's not just about fitting the parameters into GPU memory, but also all the operations you perform on them, which can require a lot of intermediate storage. The example he gives only has fully-connected layers, but convolutional neural networks tend to require more space, especially some more recent implementations (e.g. FFT-based convolutions or the GEMM approach used by Caffe).
He mentions that his network (fully connected) has 52M parameters and compares it to Krizhevsky's 2012 ImageNet network (convolutional), which had 60M. But Krizhevsky actually explicitly mentions in his paper that memory was an issue:
"A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs." (from http://papers.nips.cc/paper/4824-imagenet-classification-wit... )
Well, I did say "in my experience" :) I certainly didn't mean to imply that problems requiring less than 3GB of GPU RAM are laughable, or anything like that. I should have said something like "problems that people are currently writing papers about", maybe.
This is a generic wrapper with ndarrays for CUDA and normal BLAS operations. Deeplearning4j (my deep learning project) also has GPU support (it's built on ND4J).
Stable version coming soon =D
For those of you in Python land, I would look into Theano.
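A minimal sketch of what that looks like (assuming Theano is installed and configured with device=gpu and floatX=float32; the layer sizes are made up):

    import numpy as np
    import theano
    import theano.tensor as T

    # Symbolic input and shared parameters (these live on the GPU if device=gpu).
    x = T.matrix('x')
    W = theano.shared(np.random.randn(784, 100).astype(theano.config.floatX), name='W')
    b = theano.shared(np.zeros(100, dtype=theano.config.floatX), name='b')

    # A single dense layer; Theano compiles the graph down to GPU kernels.
    y = T.nnet.sigmoid(T.dot(x, W) + b)
    f = theano.function([x], y)

    out = f(np.random.randn(32, 784).astype(theano.config.floatX))
    print(out.shape)  # (32, 100)

PyLearn2 sits on top of this and gives you ready-made models and training loops.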
Hey Adam, I just want to say I loved the recent talks you've given. The one at Hadoop Summit with Josh Patterson was a really nice general overview. Keep it up!
The best value right now are used AMD GPUs. GPU cryptocurrency mining has become unprofitable so the used market is saturated with 290, 280x, and other AMD GPUs. If you have a limited budget this is probably the way to go.
My guess is that while AMD GPUs likely offer better raw performance/$, their version of CUDA is sorely lacking compared to nVidia's offering.
Sorry, when I say "version of CUDA" I meant "whatever the hell they have on AMD" :)
Last I heard, CUDA on nVidia generally outperforms OpenCL on AMD at the same price point, largely because CUDA is in-house and closer to the hardware. So if you just care about compute performance, you'd go nVidia. If AMD offered their own Mantle-based compute stack, it would probably shift the other way.
The computing model? No, nothing fundamentally different. It comes down to tooling and profiling under Linux. Also, nVidia has slightly beefier but fewer cores, whereas AMD has more of them (as I heard). So for me, CUDA is the more complete toolchain, with a proper compiler (nvcc), profilers (nvprof, nvvp) and libraries (cuBLAS, cuDNN, cuFFT).
Unfortunately I don't know which of these are representative of typical deep learning workloads, i.e. lots of GEMM calls, basically. There's an ongoing discussion about this on the G+ Deep Learning community as well: https://plus.google.com/+SanderDieleman/posts/7ua9oCdRFV7
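If you want a rough feel for it yourself, here's a crude Theano GEMM timing sketch (assuming device=gpu and floatX=float32; GPU timing is fiddly because kernel launches can be asynchronous, so treat the numbers as ballpark only):

    import time
    import numpy as np
    import theano
    import theano.tensor as T

    n, reps = 4096, 20  # matrix size and repetitions, chosen arbitrarily

    A = theano.shared(np.random.randn(n, n).astype('float32'))
    B = theano.shared(np.random.randn(n, n).astype('float32'))
    C = theano.shared(np.zeros((n, n), dtype='float32'))

    # Each call performs one n x n GEMM entirely on the device.
    step = theano.function([], updates=[(C, T.dot(A, B))])

    step()  # warm-up (triggers compilation and transfers)
    start = time.time()
    for _ in range(reps):
        step()
    C.get_value()  # pull the result back so pending GPU work has finished
    elapsed = time.time() - start

    # 2*n^3 floating point operations per GEMM.
    print("%.1f GFLOP/s (rough)" % (reps * 2.0 * n ** 3 / elapsed / 1e9))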
If the article is correct (saying that the memory bandwidth is very important) then a 780Ti is still interesting (336GB/s[1] vs 224GB/s[2]). They increased the memory clock but they decreased the memory interface width from 384 to 256-bit, for some reason.
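For what it's worth, peak bandwidth is just the effective memory clock times the bus width; a quick sketch (the ~7 Gbps effective GDDR5 data rate is an assumption taken from the published spec sheets):

    def peak_bandwidth_gb_s(effective_clock_gbps, bus_width_bits):
        """Peak memory bandwidth in GB/s: Gbps per pin * number of pins / 8 bits per byte."""
        return effective_clock_gbps * bus_width_bits / 8.0

    # Assuming ~7 Gbps effective GDDR5 per the spec sheets:
    print(peak_bandwidth_gb_s(7.0, 384))  # 336.0 GB/s, 384-bit bus (780 Ti class)
    print(peak_bandwidth_gb_s(7.0, 256))  # 224.0 GB/s, 256-bit bus (GTX 980 class)

So the narrower bus is what accounts for most of the gap in the quoted numbers.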
nVidia released the first-level cut-down chip (GM204) as the "80" part, when it really should be the 960 at most - the full/big Maxwell core hasn't been released yet.
Chances are the first fabbed version simply didn't work and they're waiting for the GM210.
Hmm. I am thinking about getting two additional GPUs for my setup. The 980 indeed sounds interesting, but the memory is too small. Any rumors about when NVIDIA plans to release a 6GB+ card at a similar price point to the Titan?
No rumors, but it would make sense to come out about a year after they released the Titan black, which itself was about a year after they released the original Titan.
The g2.2xlarge instances will run you $0.650 per hour of compute time and have a K520 (?) in them, which works pretty well. There are a few preconfigured images, but if you want to make your own and go through the hassle, it will take about two hours to set up. When you're spinning up your instance, do a search for CUDA and you'll see a few Ubuntu preconfigured images with Theano+PyLearn2 already installed. Spawn them on the aforementioned GPU instance type and you're good to go.
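If you'd rather script the launch, here's a minimal sketch with boto (the AMI ID and key pair name are placeholders, not real values; look up an actual image via the CUDA search mentioned above):

    import boto.ec2

    # Connect to a region where the GPU instances are available to you.
    conn = boto.ec2.connect_to_region('us-east-1')

    # 'ami-xxxxxxxx' is a placeholder: substitute one of the CUDA/Theano
    # community AMIs from the catalog, plus your own key pair name.
    reservation = conn.run_instances(
        'ami-xxxxxxxx',
        instance_type='g2.2xlarge',
        key_name='my-key-pair',
    )
    print(reservation.instances[0].id)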