Somebody who has almost no money isn't going to be able to equip a desktop with a GTX 1050 Ti ($175), fast disk ($50), and RAM ($50) on an entry level cpu/motherboard/power supply/case/monitor/peripherals ($300) and pay for the electricity used during training. Colab can be accessed from a free public computer or a cheap Chromebook ($200).
I would imagine no more than it owns the files you upload to Google Drive. The disk on the Colab instance is ephemeral, so you will need external storage for your dataset anyway.
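For what it's worth, the usual workaround is to keep the dataset on Drive and mount it inside the notebook; a minimal sketch (the dataset path is just a placeholder):

```python
# Minimal Colab sketch: mount Google Drive so the dataset outlives the
# ephemeral VM, then stage a copy on the instance's local disk for speed.
from google.colab import drive
import shutil

drive.mount('/content/drive')  # prompts for authorization on first run

# "my_dataset" is a placeholder path -- point it at wherever your data lives.
shutil.copytree('/content/drive/My Drive/my_dataset', '/content/my_dataset')
```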
If you have the programming skills necessary to develop deep learning applications, it should be assumed that you can also easily get a well-paying job so this isn't really even relevant.
Maybe you want to learn. Maybe you want to keep your current job because you like it, even if it pays less than you could earn elsewhere, and you still want to develop your other skills. Or maybe you want to stay close to your family, or in a place where people speak your mother tongue. Or maybe you just want to spend your money carefully even if you have plenty.
The 2080Ti numbers are likely going to be a lot lower than that.
We’ve benched the 1080Ti vs the Titan V, and the Titan V is nowhere near 2x faster at training than the 1080Ti as suggested in that graph. We observed a 30% to 40% speedup during our benchmarking.
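For context, the timing loop behind numbers like that is nothing exotic; here's a rough PyTorch sketch (the model, batch size, and iteration counts are arbitrary placeholders, not our actual workload):

```python
# Rough GPU training micro-benchmark (PyTorch).  The model, batch size,
# and iteration counts below are arbitrary placeholders.
import time
import torch
import torch.nn as nn

device = torch.device('cuda')
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 4096, device=device)
y = torch.randint(0, 1000, (256,), device=device)

def step():
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

for _ in range(10):       # warm-up: CUDA context, cuDNN autotune, etc.
    step()
torch.cuda.synchronize()

start = time.time()
for _ in range(100):
    step()
torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
print(f'{(time.time() - start) / 100 * 1000:.2f} ms per training step')
```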
That 30-40% is consistent with the 32% increase in FP32 throughput, from 11.3 TFLOPS on the 1080Ti to 15 TFLOPS on the Titan V. The additional speedup can be explained by the higher memory bandwidth of HBM2 and the mixed-precision fused multiply-adds provided by the Tensor Cores.
Thus, given the quoted 13 TFLOPS figure for the 2080Ti, I would expect it to offer something more like a 15-20% speedup over the 1080Ti. So the 2080Ti is less bang for your buck, but benchmarking is the only way to tell what's better on a FLOPS/$ basis.
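As a back-of-the-envelope version of that FLOPS/$ argument (the prices below are assumed launch MSRPs, not figures from the article, and street prices will differ):

```python
# Back-of-the-envelope FLOPS-per-dollar comparison.  TFLOPS are the FP32
# numbers quoted in this thread; prices are assumed launch MSRPs.
cards = {
    '1080 Ti': {'tflops': 11.3, 'price': 699},
    '2080 Ti': {'tflops': 13.0, 'price': 1199},   # Founders Edition launch price
    'Titan V': {'tflops': 15.0, 'price': 2999},
}

base = cards['1080 Ti']
for name, c in cards.items():
    speedup = c['tflops'] / base['tflops'] - 1
    gflops_per_dollar = c['tflops'] / c['price'] * 1000
    print(f"{name}: {speedup:+.0%} vs 1080 Ti, {gflops_per_dollar:.1f} GFLOPS/$")
```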
If you put both of those benchmarks together, my conclusion is quite reasonable. But I can see how you would reach your conclusion from your benchmarks. It is really a question of which benchmarks are less biased, and that is too difficult to evaluate.
I guess we have to wait for real data, but thanks for putting your data out there to get a discussion going.
This is a great article and I highly respect his opinions.
However, since you are probably eagerly reading this to see how fast the new RTX cards are, you should know upfront that the numbers he has so far are just estimates based on specs:
> Note that the numbers for the RTX 2080 and RTX 2080 Ti should be taken with a grain of salt since no hard performance numbers existed. I estimated performance according to a roofline model of matrix multiplication and convolution under this hardware together with Tensor Core benchmarks from the V100 and Titan V.
I'd guess that the performance could be slightly better than the 1080 scaled by cores/MHz/FLOPS, the reason being that the memory bandwidth is higher on the 2080, and that's hard to model unless you know exactly how efficient the kernel is and whether it's memory bound.
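To make the memory-bound point concrete, here's a tiny roofline-style sketch. The peak TFLOPS are the numbers quoted upthread for the 1080Ti and 2080Ti; the bandwidth figures (~484 and ~616 GB/s) are spec-sheet assumptions, and the kernel's arithmetic intensity is exactly the part you don't know without profiling:

```python
# Minimal roofline sketch: attainable FLOP/s = min(peak compute,
# arithmetic intensity * memory bandwidth).
def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    memory_bound_tflops = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, memory_bound_tflops)

# Assumed specs: 1080 Ti ~11.3 TFLOPS / 484 GB/s, 2080 Ti ~13 TFLOPS / 616 GB/s.
for intensity in (4, 16, 64):  # FLOPs per byte moved; kernel-dependent
    a = attainable_tflops(11.3, 484, intensity)
    b = attainable_tflops(13.0, 616, intensity)
    print(f"intensity {intensity:>2}: 1080 Ti {a:.1f} TFLOPS, 2080 Ti {b:.1f} TFLOPS")
```

Memory-bound kernels (low intensity) scale with the ~27% bandwidth bump, compute-bound ones with the ~15% FLOPS bump.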
Plus the architecture improvements. Do we know how many cores per SM? They’ve decoupled the int and FP execution units, which could give larger gains for certain kernels (and although FP-heavy deep learning kernels aren't likely to benefit as much, they will still get address-calculation benefits).
But the article isn't hiding the fact that the numbers are estimates. People are curious how the new cards will stack up, and this article provides the best evaluation of that given the information they have available.
The clock rates, number of CUDA cores, memory size/type etc in the new cards aren't really "imaginary marketing numbers". NVidia could have changed their hardware so they could put bigger numbers on paper without corresponding real world performance gains, but that's a big assumption for you to seemingly take as fact.
No one minds some products being compared as guesses/estimates/extrapolations while others have real performance figures, so long as it's clear which products have which type of figure.
The biggest advance here is that Nvidia has produced a consumer card with all the high-end deep-learning features. That was missing in both the Pascal and Volta generations, even though Pascal had full-speed fp32. I think the TPU scared them, and that's a good thing.
Hacker News hug of death? Anyone here have any experience using AMD cards with something like PlaidML? I have a 1050Ti SSC, but I'm starting to feel the limitation as my complexity grows, and a 1080 is a bit out of my budget right now. I'm tempted to get the recently released Vega 56.
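I haven't pushed it hard myself, but getting Keras onto an AMD card via PlaidML is only a few lines; a sketch, assuming you've already run `plaidml-setup` once to pick the OpenCL device:

```python
# Sketch: Keras on an AMD GPU via the PlaidML backend.
# Assumes `pip install plaidml-keras` and a one-time `plaidml-setup`.
import numpy as np
import plaidml.keras
plaidml.keras.install_backend()   # must run before importing keras

import keras
from keras import layers

model = keras.models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

# PlaidML logs the OpenCL device it opens on first execution.
model.predict(np.zeros((1, 784), dtype='float32'))
```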
An open question for me is the performance of two 2080tis using NVLink as one virtual GPU. I imagine it’ll be close to linear, but I’ll be interested to know for sure.
It won't be linear for memory-bound applications. The v100 was able to make it close to linear with large enough transfer sizes, but it has 50% more memory bandwidth than these.
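Once the cards are in hand, it's at least worth confirming that peer-to-peer access is actually enabled before expecting any NVLink benefit; a quick PyTorch check (this says nothing about pooling memory into one virtual GPU):

```python
# Quick sanity check: is peer-to-peer (NVLink / PCIe P2P) access enabled
# between GPU 0 and GPU 1?
import torch

if torch.cuda.device_count() >= 2:
    print('0 -> 1:', torch.cuda.can_device_access_peer(0, 1))
    print('1 -> 0:', torch.cuda.can_device_access_peer(1, 0))
else:
    print('fewer than two CUDA devices visible')
```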
Good article, but as a new learner I'm interested in your experience of how long it takes to train a model for a common task. If it's 1 min vs 2 mins, I'll probably get the cheaper GPU, but if it's 5h vs 10h, or 1 day vs 2 days, I'd rather spend more for one with better performance.