Somebody who has almost no money isn't going to be able to equip a desktop with a GTX 1050 Ti ($175), fast disk ($50), and RAM ($50) on an entry level cpu/motherboard/power supply/case/monitor/peripherals ($300) and pay for the electricity used during training. Colab can be accessed from a free public computer or a cheap Chromebook ($200).
I would imagine no more than it owns the files you upload to Google Drive. The disk on the Colab instance is ephemeral, so you will need external storage for your dataset anyway.
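For what it's worth, the usual workaround is to keep the dataset on Drive and mount it inside the notebook; a minimal sketch (the dataset path is just a placeholder):

```python
# Minimal Colab sketch: mount Google Drive so the dataset outlives the
# ephemeral VM, then stage a copy on the instance's local disk for speed.
from google.colab import drive
import shutil

drive.mount('/content/drive')  # prompts for authorization on first run

# "my_dataset" is a placeholder path -- point it at wherever your data lives.
shutil.copytree('/content/drive/My Drive/my_dataset', '/content/my_dataset')
```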
If you have the programming skills necessary to develop deep learning applications, it should be assumed that you can also easily get a well-paying job so this isn't really even relevant.
Maybe you want to learn. Maybe you want to keep your current job because you like it, even if it pays less than you could earn elsewhere, and you still want to develop your other skills. Or maybe you want to stay close to your family, or in a place where people speak your mother tongue. Or maybe you just want to spend your money carefully even if you have plenty.
The 2080Ti numbers are likely going to be a lot lower than that.
We’ve benched the 1080Ti vs the Titan V, and the Titan V is nowhere near 2x faster at training than the 1080Ti as suggested in that graph. We observed a 30% to 40% speedup during our benchmarking.
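For context, the timing loop behind numbers like that is nothing exotic; here's a rough PyTorch sketch (the model, batch size, and iteration counts are arbitrary placeholders, not our actual workload):

```python
# Rough GPU training micro-benchmark (PyTorch).  The model, batch size,
# and iteration counts below are arbitrary placeholders.
import time
import torch
import torch.nn as nn

device = torch.device('cuda')
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 1000)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 4096, device=device)
y = torch.randint(0, 1000, (256,), device=device)

def step():
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

for _ in range(10):       # warm-up: CUDA context, cuDNN autotune, etc.
    step()
torch.cuda.synchronize()

start = time.time()
for _ in range(100):
    step()
torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
print(f'{(time.time() - start) / 100 * 1000:.2f} ms per training step')
```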
That 30-40% is consistent with the 32% increase in FP32 throughput, from 11.3 TFLOPS on the 1080Ti to 15 TFLOPS on the Titan V. The additional speedup can be explained by the higher memory bandwidth of HBM2 and the mixed-precision fused multiply-adds provided by the Tensor Cores.
Thus, given the quoted 13 TFLOPS figure for the 2080Ti, I would expect it to offer something more like a 15-20% speedup over the 1080Ti. So the 2080Ti is less bang for your buck, but benchmarking is the only way to tell what's better on a FLOPS/$ basis.
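As a back-of-the-envelope version of that FLOPS/$ argument (the prices below are assumed launch MSRPs, not figures from the article, and street prices will differ):

```python
# Back-of-the-envelope FLOPS-per-dollar comparison.  TFLOPS are the FP32
# numbers quoted in this thread; prices are assumed launch MSRPs.
cards = {
    '1080 Ti': {'tflops': 11.3, 'price': 699},
    '2080 Ti': {'tflops': 13.0, 'price': 1199},   # Founders Edition launch price
    'Titan V': {'tflops': 15.0, 'price': 2999},
}

base = cards['1080 Ti']
for name, c in cards.items():
    speedup = c['tflops'] / base['tflops'] - 1
    gflops_per_dollar = c['tflops'] / c['price'] * 1000
    print(f"{name}: {speedup:+.0%} vs 1080 Ti, {gflops_per_dollar:.1f} GFLOPS/$")
```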
If you put both of those benchmarks together, my conclusion is quite reasonable. But I can see how you would reach your conclusion from your benchmarks. It is really a question of which benchmarks are less biased, and that is too difficult to evaluate.
I guess we have to wait for real data, but thanks for putting your data out there to get a discussion going.
This is a great article and I highly respect his opinions.
However, since you are probably eagerly reading this to see how fast the new RTX cards are, you should know upfront that the numbers he has so far are just estimates based on specs:
> Note that the numbers for the RTX 2080 and RTX 2080 Ti should be taken with a grain of salt since no hard performance numbers existed. I estimated performance according to a roofline model of matrix multiplication and convolution under this hardware together with Tensor Core benchmarks from the V100 and Titan V.
I'd guess that the performance could be slightly better than the 1080 scaled by cores/MHz/FLOPS, the reason being that the memory bandwidth is higher on the 2080, and that's hard to model unless you know exactly how efficient the kernel is and whether it's memory bound.
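To make the memory-bound point concrete, here's a tiny roofline-style sketch. The peak TFLOPS are the numbers quoted upthread for the 1080Ti and 2080Ti; the bandwidth figures (~484 and ~616 GB/s) are spec-sheet assumptions, and the kernel's arithmetic intensity is exactly the part you don't know without profiling:

```python
# Minimal roofline sketch: attainable FLOP/s = min(peak compute,
# arithmetic intensity * memory bandwidth).
def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    memory_bound_tflops = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, memory_bound_tflops)

# Assumed specs: 1080 Ti ~11.3 TFLOPS / 484 GB/s, 2080 Ti ~13 TFLOPS / 616 GB/s.
for intensity in (4, 16, 64):  # FLOPs per byte moved; kernel-dependent
    a = attainable_tflops(11.3, 484, intensity)
    b = attainable_tflops(13.0, 616, intensity)
    print(f"intensity {intensity:>2}: 1080 Ti {a:.1f} TFLOPS, 2080 Ti {b:.1f} TFLOPS")
```

Memory-bound kernels (low intensity) scale with the ~27% bandwidth bump, compute-bound ones with the ~15% FLOPS bump.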
Plus the architecture improvements. Do we know how many cores per SM? They’ve decoupled the int and FP execution units, which could give larger gains for certain kernels (and although FP-heavy deep learning kernels aren't likely to benefit as much, they will still get address-calculation benefits).
But the article isn't hiding the fact that the numbers are estimates. People are curious how the new cards will stack up, and this article provides the best evaluation of that given the information they have available.
The clock rates, number of CUDA cores, memory size/type etc in the new cards aren't really "imaginary marketing numbers". NVidia could have changed their hardware so they could put bigger numbers on paper without corresponding real world performance gains, but that's a big assumption for you to seemingly take as fact.
No one minds some products being compared as guesses/estimates/extrapolations while others have real performance figures, so long as it's clear which products have which type of figure.
The biggest advance here is that Nvidia has produced a consumer card with all the high-end deep-learning features. That was missing in both the Pascal and Volta generations, even though Pascal had full-speed fp32. I think the TPU scared them, and that's a good thing.
Hacker News hug of death? Anyone here have any experience using AMD cards with something like PlaidML? I have a 1050Ti SSC, but I'm starting to feel the limitation as my complexity grows, and a 1080 is a bit out of my budget right now. I'm tempted to get the recently released Vega 56.
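I haven't pushed it hard myself, but getting Keras onto an AMD card via PlaidML is only a few lines; a sketch, assuming you've already run `plaidml-setup` once to pick the OpenCL device:

```python
# Sketch: Keras on an AMD GPU via the PlaidML backend.
# Assumes `pip install plaidml-keras` and a one-time `plaidml-setup`.
import numpy as np
import plaidml.keras
plaidml.keras.install_backend()   # must run before importing keras

import keras
from keras import layers

model = keras.models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

# PlaidML logs the OpenCL device it opens on first execution.
model.predict(np.zeros((1, 784), dtype='float32'))
```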
An open question for me is the performance of two 2080tis using NVLink as one virtual GPU. I imagine it’ll be close to linear, but I’ll be interested to know for sure.
It won't be linear for memory-bound applications. The v100 was able to make it close to linear with large enough transfer sizes, but it has 50% more memory bandwidth than these.
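Once the cards are in hand, it's at least worth confirming that peer-to-peer access is actually enabled before expecting any NVLink benefit; a quick PyTorch check (this says nothing about pooling memory into one virtual GPU):

```python
# Quick sanity check: is peer-to-peer (NVLink / PCIe P2P) access enabled
# between GPU 0 and GPU 1?
import torch

if torch.cuda.device_count() >= 2:
    print('0 -> 1:', torch.cuda.can_device_access_peer(0, 1))
    print('1 -> 0:', torch.cuda.can_device_access_peer(1, 0))
else:
    print('fewer than two CUDA devices visible')
```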
Good article, but as a new learner I'm interested in your experience of how long it takes to train a model for a common task. If it's 1 min vs 2 mins, I'll probably get the cheaper GPU, but if it's 5h vs 10h, or 1 day vs 2 days, I'd rather spend more for one with better performance.