Which GPUs to Get for Deep Learning (timdettmers.wordpress.com)
59 points by jonbaer on Sept 26, 2014 | 29 comments



I disagree with his position on memory (he mentions in the post that anything above 1.5GB should be fine). In my experience, anything below 3GB can be pretty uncomfortable these days, if you want to work on serious problems.

It's not just about fitting the parameters into GPU memory, but also all the operations you perform on them, which can require a lot of intermediate storage. The example he gives only has fully-connected layers, but convolutional neural networks tend to require more space, especially some more recent implementations (e.g. FFT-based convolutions or the GEMM approach used by Caffe).

He mentions that his network (fully connected) has 52M parameters and compares it to Krizhevsky's 2012 ImageNet network (convolutional), which had 60M. But Krizhevsky actually explicitly mentions in his paper that memory was an issue:

"A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs." (from http://papers.nips.cc/paper/4824-imagenet-classification-wit... )


Serious is a spectrum; someone working on more "serious" problems than you might laugh at the idea of doing any work at all on a single consumer GPU.


Well, I did say "in my experience" :) I certainly didn't mean to imply that problems requiring less than 3GB of GPU RAM are laughable, or anything like that. I should have said something like "problems that people are currently writing papers about", maybe.


For those looking to do it on the JVM, I have a prepackaged scientific computing framework that might be interesting:

http://nd4j.org/

This is a generic ndarray wrapper for CUDA and normal BLAS operations. Deeplearning4j (my deep learning project) also has GPU support (it's built on nd4j)

Stable version coming soon =D

For those of you in Python land, I would look into Theano


Hey Adam, I just want to say I loved the recent talks you've given. The one at Hadoop Summit with Josh Patterson was so cool as a general overview. Keep it up!


Thanks! Things are coming along.


I'll give it a shot over the next few weeks and let you know my comments.


I look forward to it. It's obviously still very new, but I think the idea of a common interface for ndarrays on the JVM has potential.


The best value right now is used AMD GPUs. GPU cryptocurrency mining has become unprofitable, so the used market is saturated with 290s, 280Xs, and other AMD GPUs. If you have a limited budget, this is probably the way to go.


What DL packages target AMD GPUs?


Is there any reason why there aren't any AMD GPUs? Mantle?


My guess is that while AMD GPUs likely offer better raw performance per dollar, their equivalent of CUDA is sorely lacking compared to nVidia's offering.


AMD GPUs have no CUDA support; if you want to do some computing on them, OpenCL is usually the way to go.


Sorry, when I said "version of CUDA" I meant "whatever the hell they have on AMD" :)

Last I heard, CUDA on nVidia generally outperforms OpenCL on AMD at the same price point, largely because CUDA is developed in-house and sits closer to the hardware. So if you just care about compute performance, you would go with nVidia. If AMD offered their own Mantle-based compute stack, it would probably shift the other way.


There isn't really anything fundamental that would make CUDA faster than OpenCL. There aren't any huge semantic differences between them.


The computing model, no, there's nothing fundamentally different. It comes down to tooling and profiling under Linux. Also, NVidia has slightly beefier cores but fewer of them, whereas AMD has more cores (as I heard). So, for me, CUDA is the more complete toolchain, with a proper compiler (nvcc), profilers (nvprof, nvvp), and libraries (cuBLAS, cuDNN, cuFFT).


There is an OpenCL profiler for AMD, and library equivalents of those in clBLAS / clFFT.


Is there an equivalent of cuBLAS for OpenCL?


clBLAS


The new Maxwell 970/980 kit is very interesting from a compute perspective.

970: $329 for 3494 SP / 109 DP GFLOPS @ 145W TDP

980: $549 for 4612 SP / 144 DP GFLOPS @ 165W TDP
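For what it's worth, those SP/DP figures follow directly from the core counts and base clocks (a quick sketch; it assumes NVIDIA's listed base clocks and the 1/32-rate FP64 of consumer Maxwell):

    # SP GFLOPS = cores * 2 FLOPs per clock (FMA) * clock in GHz.
    # Consumer Maxwell runs FP64 at 1/32 of the FP32 rate.
    def gflops(cores, base_mhz):
        return cores * 2 * base_mhz / 1e3

    for name, cores, mhz in [("GTX 970", 1664, 1050), ("GTX 980", 2048, 1126)]:
        sp = gflops(cores, mhz)
        print("%s: %4.0f SP / %3.0f DP GFLOPS" % (name, sp, sp / 32))
    # GTX 970: 3494 SP / 109 DP GFLOPS
    # GTX 980: 4612 SP / 144 DP GFLOPS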


Anandtech has some compute benchmarks for the 980: http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-re...

Unfortunately, I don't know which of these are representative of typical deep learning workloads, i.e. lots of GEMM calls, basically. A discussion about this is ongoing in the G+ Deep Learning community as well: https://plus.google.com/+SanderDieleman/posts/7ua9oCdRFV7

This thread on the NVIDIA forums is also interesting; someone mentions achieving > 6 TFLOPS for SGEMM on an overclocked GTX 980: https://devtalk.nvidia.com/default/topic/776043/cuda-program...
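If you have one of these cards, a crude way to probe GEMM throughput yourself is a timed large matrix multiply, e.g. in Theano (just a sketch, assuming a CUDA-enabled Theano install run with THEANO_FLAGS=device=gpu,floatX=float32; it's indicative only, not a tuned benchmark):

    import time
    import numpy as np
    import theano
    import theano.tensor as T

    n, iters = 4096, 10
    A = theano.shared(np.random.randn(n, n).astype('float32'))
    B = theano.shared(np.random.randn(n, n).astype('float32'))
    C = theano.shared(np.zeros((n, n), dtype='float32'))

    step = theano.function([], updates=[(C, T.dot(A, B))])

    step()                      # warm-up (triggers compilation)
    t0 = time.time()
    for _ in range(iters):
        step()
    C.get_value()               # pull the result back so all GPU work has finished
    elapsed = time.time() - t0

    print("~%.0f GFLOP/s" % (2.0 * n ** 3 * iters / elapsed / 1e9))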


If the article is correct that memory bandwidth is very important, then a 780 Ti is still interesting (336 GB/s [1] vs 224 GB/s [2]; quick arithmetic below the links). The memory clock stayed at 7 Gbps, but they decreased the memory interface width from 384-bit to 256-bit, for some reason.

[1] http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780...

[2] http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980...
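The bandwidth figures are just bus width times effective memory data rate; both spec pages above list 7 Gbps effective GDDR5 (a quick check, no new data):

    # Peak memory bandwidth = (bus width in bytes per transfer) * effective data rate in GT/s.
    for name, bus_bits, gbps in [("GTX 780 Ti", 384, 7.0), ("GTX 980", 256, 7.0)]:
        print("%s: %.0f GB/s" % (name, bus_bits / 8 * gbps))
    # GTX 780 Ti: 336 GB/s
    # GTX 980: 224 GB/s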


nVidia released the first level cutdown (GM204) as the "80" part, while it really should be the 960 at most - the full/big Maxwell core hasn't been released yet.

Chances are the first fabbed version simply didn't work and they're waiting for the GM210.


Hmm. I am thinking about getting two additional GPUs for my configuration. The 980 indeed sounds interesting, but the memory is too small. Any rumors about when NVIDIA plans to release a 6GB+ card at a similar price point to the Titan?


No rumors, but it would make sense for one to come out about a year after they released the Titan Black, which itself came about a year after the original Titan.


Does anyone know if there are options to use GPUs on AWS to do calculations using Python for Deep Learning?


Amazon has GPU-powered instances [1]

Python has PyCUDA [2] (a minimal sketch follows the links)

Python has PyOpenCL [3], amusingly maintained by the same person as PyCUDA

[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_clu...

[2] http://documen.tician.de/pycuda/

[3] http://documen.tician.de/pyopencl/
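To give a flavour of [2], here is a minimal PyCUDA sketch (it assumes the CUDA toolkit and pycuda are installed on the instance): square a vector on the GPU and check the result against NumPy.

    import numpy as np
    import pycuda.autoinit                     # creates a CUDA context on import
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # Small elementwise kernel: y[i] = x[i] * x[i]
    square = ElementwiseKernel(
        "float *y, const float *x",
        "y[i] = x[i] * x[i]",
        "square_kernel")

    x = np.random.randn(1 << 20).astype(np.float32)
    x_gpu = gpuarray.to_gpu(x)
    y_gpu = gpuarray.empty_like(x_gpu)
    square(y_gpu, x_gpu)

    print(np.allclose(y_gpu.get(), x ** 2))    # True if everything is wired up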


The g2.2xlarge instances will run you $0.650 per hour of compute time and have a GRID K520 in them, which works pretty well. There are a few preconfigured images, but if you want to make your own and go through the hassle, it will take about two hours to set up. When you're spinning up your instance, search for CUDA and you'll see a few preconfigured Ubuntu images with Theano + PyLearn2 already installed. Spawn one on the aforementioned GPU instance type and you're good to go.
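Once the instance is up, a quick sanity check (a sketch adapted from the standard Theano GPU test; it assumes the AMI's Theano and CUDA toolkit are working) confirms jobs actually run on the K520 rather than the CPU:

    # Run with: THEANO_FLAGS=device=gpu,floatX=float32 python check_gpu.py
    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    f = theano.function([x], T.exp(x))

    print(f.maker.fgraph.toposort())   # should list Gpu ops (e.g. GpuElemwise) when on the GPU
    print(f(np.random.randn(3, 3).astype('float32')))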




