Hacker News

16,000 cores sounds impressive until you realize it's just five to ten modern GPUs. For Google, it's easier to just run a 1,000-machine job than to requisition some GPUs.

See: http://www.nvidia.com/object/tesla-servers.html (4.5 teraflops in one card)

Reminder: GPUs will destroy the world.



What a GPU calls a "core" doesn't at all correspond to what a CPU calls a "core". Going by the CPU definition (something like "a unit that can schedule memory accesses"), a high end GPU will only have 60 or so cores. And going by the GPU definition (an execution unit), a high end CPU will tend to have 30-something cores.

GPUs do have fundamentally more execution resources, but that comes at a price, and not every algorithm will be capable of running faster on a GPU than on a CPU. If neural networks just involve multiplying lots of matrices together with little branching, they might be well suited to GPUs, but most AI code isn't like that.
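To make the "mostly matrix multiplies with little branching" point concrete, here's a minimal sketch (not from the thread) of a two-layer neural-network forward pass in pure Python. The layer sizes and weights are made up for illustration; real code would hand the matmuls to a BLAS or GPU library, which is exactly where the GPU-friendly work lives.

```python
def matmul(A, B):
    """Multiply an m*k matrix by a k*n matrix (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def relu(M):
    # Elementwise max(0, x): the only "branching" in the whole pass.
    return [[max(0.0, x) for x in row] for row in M]

def forward(x, W1, W2):
    # Two dense layers: almost all the arithmetic is in the matmuls,
    # which is the pattern that maps well onto a GPU.
    return matmul(relu(matmul(x, W1)), W2)

x  = [[1.0, 2.0]]                    # toy input
W1 = [[1.0, -1.0], [0.5, 0.5]]       # toy weights, layer 1
W2 = [[2.0], [1.0]]                  # toy weights, layer 2
print(forward(x, W1, W2))            # -> [[4.0]]
```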


"What a GPU calls a "core" doesn't at all correspond to what a CPU calls a "core"."

They aren't as different as you imagine. They're general purpose programmable arithmetic units with processing rates on the order of 20-30% of CPUs, with the limitation that they're all doing roughly the same thing.

For most machine learning tasks, that's exactly what you're doing anyway. Oh no, your neural network engine has to be parallel? C'est dommage!


A CPU core isn't a general purpose programmable arithmetic unit, though. In fact, what you call a "core" when you're talking about CPUs is composed of multiple such general purpose programmable units, plus less general purpose memory load/store units that can still be used for basic arithmetic, and an instruction fetch and scheduling system. So a core in your Intel iFOO processor is structurally equivalent to what NVidia calls an SM. Now, an NVidia SM has 48 execution units to Intel's six, but it operates at a lower frequency and doesn't have the bypass network, branch predictor, memory prefetcher, etc. that you'd find in an Intel core. So there are some tasks where the Intel core will be much faster than the NVidia SM, and some tasks where the NVidia SM will be much faster. And the case here does seem like one where the GPU has an advantage. But saying that the NVidia GPU has 1536 "cores" is just dishonest.
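As a back-of-the-envelope sketch of the units-versus-clock tradeoff described above: peak issue rate is roughly execution units times clock. The clock figures below are illustrative round numbers, not exact specs for any particular part.

```python
# Illustrative numbers only: an SM trades clock speed and out-of-order
# machinery (bypass network, branch predictor, prefetcher) for unit count.
cpu_core = {"units": 6,  "clock_ghz": 3.4}   # roughly an Intel core's issue ports
gpu_sm   = {"units": 48, "clock_ghz": 1.0}   # roughly a Fermi-era NVidia SM

def peak_gops(chip):
    # Peak billion issue slots per second = units * clock (GHz).
    return chip["units"] * chip["clock_ghz"]

print(peak_gops(cpu_core))  # ~20.4
print(peak_gops(gpu_sm))    # 48.0
```

By this crude measure the SM has a little over twice the raw throughput of the core, which is why the per-task winner depends on how well the workload keeps those 48 units busy.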


"In fact, what you call a "core" when you're talking about CPUs is composed of multiple such general purpose programmable units, plus less general purpose memory load/store units"

So are GPU cores.

"But saying that the NVidia GPU has 1536 "cores" is just dishonest."

No, it isn't. You can run 1536 things in parallel at speeds that would have qualified as full CPU speeds several years prior.

Something isn't any less a core merely because it does less juggling magic, and that juggling magic is actually undesirable for a heavily parallelized task.

"So there are some tasks where the Intel core will be much faster than the NVidia SM, and some tasks where the NVidia SM will be much faster."

This conversation already has a context. Arguments which ignore that context completely miss the point.

If you don't understand how I achieved the amount of processing I did, that's fine. Playing games with the semantics of a "core" somehow magically requiring all the features of current Intel-strategy chips, though, is not going to convince me.

There are more things in Heaven and Earth, Horatio, than are dreamt of in Intel's philosophy. This sort of No True Scotsman attitude towards what constitutes "a real core" is why ARM is in the process of eating Intel alive, and why Tilera stands a decent chance of doing the same thing to ARM.

This is merely extreme RISC. I realize it's sort of a tradition for the modern VLIW movement to suggest that if you can't double-backflip through a flaming hoop made out of predictive NAND gates it somehow doesn't count.

But, if you actually look, the share of modern supercomputing done on video cards is rising dramatically.

So obviously they count as cores to somebody.

You also seem to have missed the point. It's not the core scale that we're discussing here. It's the dataset scale. The number of cores you throw at a problem is not terribly important; 20 years ago it would have been breathtaking to throw 32 cores at a problem, and now that's two CPUs.

What makes an experiment cutting edge is the nature of the experiment, not the volume of hardware that you throw at it. I was talking about the /data/ and the /problem/ . Predicting movie ratings is a hell of a lot harder than feature detection.


OpenCV has rewritten several of its algorithms for GPUs. http://www.opencv.org.cn/opencvdoc/2.3.2/html/modules/gpu/do... In general, the GPU versions are faster, but you need to be cognizant of data transfer times between memory and the GPU. Relative speed also depends on which CPUs and GPUs you have access to and the quality of the GPU vs. CPU algorithm implementations. For example, my 2012 Macbook CPU is faster than my 2011 Macbook GPU for certain OpenCV algorithms.
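The transfer-time caveat can be sketched as a toy cost model. All the constants here (per-pixel costs, bus bandwidth, launch overhead) are assumed round numbers for illustration, not benchmarks of any real CPU/GPU pair: the point is only that the GPU wins when its compute saving outweighs the fixed launch cost plus host-to-device copies.

```python
def cpu_time(n_pixels, ns_per_pixel=10):
    # Process everything in place on the CPU: no transfers.
    return n_pixels * ns_per_pixel * 1e-9

def gpu_time(n_pixels, ns_per_pixel=1, transfer_gbps=8,
             bytes_per_pixel=4, launch_s=200e-6):
    # Pay to copy the image up and the result back, plus a fixed
    # kernel-launch overhead, then compute 10x faster per pixel.
    transfer = 2 * n_pixels * bytes_per_pixel / (transfer_gbps * 1e9)
    return launch_s + n_pixels * ns_per_pixel * 1e-9 + transfer

for n in (10_000, 1_000_000, 100_000_000):
    winner = "GPU" if gpu_time(n) < cpu_time(n) else "CPU"
    print(f"{n:>11} pixels -> {winner}")
```

Under these assumed constants a small image stays faster on the CPU and a large one flips to the GPU, which matches the "my CPU beats my GPU for certain algorithms" experience above.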


The problem is not the theoretical peak teraflops. The problem is actually achieving those teraflops with useful work. Due to architectural differences, that is easier on a CPU than on a GPU, so you can't directly compare teraflops and conclude that GPUs are superior. Getting something to run fast on a GPU is very difficult.

And actually the thing that does 4.5 teraflops in single precision does only 95 gigaflops in double precision per GPU. A good x86 CPU does ~100 gigaflops in double precision as well, and you're much more likely to actually achieve that number on an x86. Although another one on the page you linked to theoretically does 665 gigaflops double precision.
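Putting the figures quoted above side by side as ratios (the numbers come from the comment, not independent measurement):

```python
single_tflops     = 4.5    # per card, single precision
double_gflops     = 95.0   # same card, double precision
cpu_double_gflops = 100.0  # a good x86 CPU, double precision

# Roughly a 47x single/double gap on that GPU...
print(single_tflops * 1000 / double_gflops)
# ...and in double precision it doesn't even match the CPU's peak.
print(double_gflops / cpu_double_gflops)
```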


Single precision is probably fine for a neural network. Neural networks are somewhat insensitive to noise and failure and single precision adds very little noise.
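A quick stdlib-only check of how small that single-precision noise actually is: round-tripping a value through IEEE-754 float32 perturbs it by a relative error on the order of 1e-8 to 1e-7 (machine epsilon for float32 is about 1.2e-7, vs. about 2.2e-16 for float64), which is tiny next to the noise a neural network already tolerates.

```python
import struct

def to_float32(x):
    # Round-trip a Python float (float64) through IEEE-754 single precision.
    return struct.unpack("f", struct.pack("f", x))[0]

x = 0.1
rel_err = abs(to_float32(x) - x) / x
print(rel_err)  # on the order of 1e-8
```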


I don't think that one CPU core is exactly comparable to one GPU core.



