I disagree with his position on memory (he mentions in the post that anything above 1.5GB should be fine). In my experience, anything below 3GB can be pretty uncomfortable these days, if you want to work on serious problems.
It's not just about fitting the parameters into GPU memory, but also all the operations you perform on them, which can require a lot of intermediate storage. The example he gives only has fully-connected layers, but convolutional neural networks tend to require more space, especially some more recent implementations (e.g. FFT-based convolutions or the GEMM approach used by Caffe).
He mentions that his network (fully connected) has 52M parameters and compares it to Krizhevsky's 2012 ImageNet network (convolutional), which had 60M. But Krizhevsky actually explicitly mentions in his paper that memory was an issue:
"A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs." (from http://papers.nips.cc/paper/4824-imagenet-classification-wit... )
Well, I did say "in my experience" :) I certainly didn't mean to imply that problems requiring less than 3GB of GPU RAM are laughable, or anything like that. I should have said something like "problems that people are currently writing papers about", maybe.
This is a generic wrapper with ndarrays for CUDA and normal BLAS operations. Deeplearning4j (my deep learning project) also has GPU support (it's built on ND4J).
Stable version coming soon =D
For those of you in Python land, I would look into Theano.
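A minimal sketch of what that looks like (assuming Theano is installed and configured with device=gpu and floatX=float32; the layer sizes are made up):

    import numpy as np
    import theano
    import theano.tensor as T

    # Symbolic input and shared parameters (these live on the GPU if device=gpu).
    x = T.matrix('x')
    W = theano.shared(np.random.randn(784, 100).astype(theano.config.floatX), name='W')
    b = theano.shared(np.zeros(100, dtype=theano.config.floatX), name='b')

    # A single dense layer; Theano compiles the graph down to GPU kernels.
    y = T.nnet.sigmoid(T.dot(x, W) + b)
    f = theano.function([x], y)

    out = f(np.random.randn(32, 784).astype(theano.config.floatX))
    print(out.shape)  # (32, 100)

PyLearn2 sits on top of this and gives you ready-made models and training loops.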
Hey Adam, I just want to say I loved the recent talks you've given. The one at Hadoop Summit with Josh Patterson was a really nice general overview. Keep it up!
The best value right now are used AMD GPUs. GPU cryptocurrency mining has become unprofitable so the used market is saturated with 290, 280x, and other AMD GPUs. If you have a limited budget this is probably the way to go.
My guess is that while AMD GPUs likely offer better raw performance/$, their version of CUDA is sorely lacking compared to nVidia's offering.
Sorry, when I say "version of CUDA" I meant "whatever the hell they have on AMD" :)
Last I heard, CUDA on nVidia generally outperforms OpenCL on AMD at the same price point, largely because CUDA is in-house and closer to the hardware. So if you just care about compute performance, you'd go nVidia. If AMD offered their own Mantle-based compute stack, it would probably shift the other way.
The computing model? No, nothing fundamentally different. It comes down to tooling and profiling under Linux. Also, nVidia has slightly beefier but fewer cores, whereas AMD has more of them (as I heard). So for me, CUDA is the more complete toolchain, with a proper compiler (nvcc), profilers (nvprof, nvvp) and libraries (cuBLAS, cuDNN, cuFFT).
Unfortunately I don't know which of these are representative of typical deep learning workloads, i.e. lots of GEMM calls, basically. There's an ongoing discussion about this on the G+ Deep Learning community as well: https://plus.google.com/+SanderDieleman/posts/7ua9oCdRFV7
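If you want a rough feel for it yourself, here's a crude Theano GEMM timing sketch (assuming device=gpu and floatX=float32; GPU timing is fiddly because kernel launches can be asynchronous, so treat the numbers as ballpark only):

    import time
    import numpy as np
    import theano
    import theano.tensor as T

    n, reps = 4096, 20  # matrix size and repetitions, chosen arbitrarily

    A = theano.shared(np.random.randn(n, n).astype('float32'))
    B = theano.shared(np.random.randn(n, n).astype('float32'))
    C = theano.shared(np.zeros((n, n), dtype='float32'))

    # Each call performs one n x n GEMM entirely on the device.
    step = theano.function([], updates=[(C, T.dot(A, B))])

    step()  # warm-up (triggers compilation and transfers)
    start = time.time()
    for _ in range(reps):
        step()
    C.get_value()  # pull the result back so pending GPU work has finished
    elapsed = time.time() - start

    # 2*n^3 floating point operations per GEMM.
    print("%.1f GFLOP/s (rough)" % (reps * 2.0 * n ** 3 / elapsed / 1e9))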
If the article is correct (saying that the memory bandwidth is very important) then a 780Ti is still interesting (336GB/s[1] vs 224GB/s[2]). They increased the memory clock but they decreased the memory interface width from 384 to 256-bit, for some reason.
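For what it's worth, peak bandwidth is just the effective memory clock times the bus width; a quick sketch (the ~7 Gbps effective GDDR5 data rate is an assumption taken from the published spec sheets):

    def peak_bandwidth_gb_s(effective_clock_gbps, bus_width_bits):
        """Peak memory bandwidth in GB/s: Gbps per pin * number of pins / 8 bits per byte."""
        return effective_clock_gbps * bus_width_bits / 8.0

    # Assuming ~7 Gbps effective GDDR5 per the spec sheets:
    print(peak_bandwidth_gb_s(7.0, 384))  # 336.0 GB/s, 384-bit bus (780 Ti class)
    print(peak_bandwidth_gb_s(7.0, 256))  # 224.0 GB/s, 256-bit bus (GTX 980 class)

So the narrower bus is what accounts for most of the gap in the quoted numbers.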
nVidia released the first-level cut-down chip (GM204) as the "80" part, when it really should be the 960 at most - the full/big Maxwell core hasn't been released yet.
Chances are the first fabbed version simply didn't work and they're waiting for the GM210.
Hmm. I am thinking about getting two additional GPUs for my setup. The 980 indeed sounds interesting, but the memory is too small. Any rumors about when NVIDIA plans to release a 6GB+ card at a similar price point to the Titan?
No rumors, but it would make sense to come out about a year after they released the Titan black, which itself was about a year after they released the original Titan.
The g2.2xlarge instances will run you $0.650 per hour of compute time and have a K520 (?) in them, which works pretty well. There are a few preconfigured images, but if you want to make your own and go through the hassle, it will take about two hours to set up. When you're spinning up your instance, do a search for CUDA and you'll see a few Ubuntu preconfigured images with Theano+PyLearn2 already installed. Spawn them on the aforementioned GPU instance type and you're good to go.
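If you'd rather script the launch, here's a minimal sketch with boto (the AMI ID and key pair name are placeholders, not real values; look up an actual image via the CUDA search mentioned above):

    import boto.ec2

    # Connect to a region where the GPU instances are available to you.
    conn = boto.ec2.connect_to_region('us-east-1')

    # 'ami-xxxxxxxx' is a placeholder: substitute one of the CUDA/Theano
    # community AMIs from the catalog, plus your own key pair name.
    reservation = conn.run_instances(
        'ami-xxxxxxxx',
        instance_type='g2.2xlarge',
        key_name='my-key-pair',
    )
    print(reservation.instances[0].id)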