> "The growing body of Big-data, HPC, and especially machine learning applications don’t need Windows and don’t perform on X86. So 2017 is the year Nvidia slips its leash and breaks free to become a genuinely viable competitive alternative to x86 based enterprise computing in valuable new markets that are unsuited to x86 based solutions."
Google's TPU paper [0] showed the CPUs were relatively competitive in the machine learning space (within 2x of a K80). It's not true that x86 doesn't perform on these workloads.
The existence of the TPU itself threatens Nvidia's dominance in the ML processor space. Google built an ASIC in a short time period that more than rivals a GPU on these tasks. The TPU performance improvements (section 7) make it look very straightforward to get even better performance with a few more years of development effort. With developers moving to higher level libraries, migration between GPU/CPU/TPU becomes painless, so they'll just go with whatever has the lowest TCO. (Google hosted TPUs?)
Aside from machine learning tasks, the author seems to be advocating for the cpu/gpu combinations that AMD is already selling to game console manufacturers. Granted, Nvidia has a piece of this via the Switch. If Microsoft/Qualcomm goes full-on with their ARM-based x86 emulation, then perhaps a future ARM-based Xbox is in the cards driven by an Nvidia chip? /speculation
I always thought that keeping dominance via video games and DirectX was Microsoft's main play to keep Linux out. I didn't realize DirectX was also their move to fight Intel.
The unintentional effect of this was that GPU manufacturers flourished, which is something I didn't know either.
The two big monopolies fighting each other is super interesting. It also highlights the fact that we do need some diversity and competition in these segments.
Rooting for AMD on the GPU front to give CUDA a good challenge, and on the CPU front to challenge Intel.
> Intel keeps the PCIe bus slow and limits the number of IO lanes that an Intel CPU supports thus ensuring that GPU’s are always dependent on an Intel CPU to serve their workload
I was looking into building a multi-GPU machine for ML and was very confused as to why the latest Intel CPU has fewer PCIe lanes than the previous one; it didn't make any sense to me. Those sneaky Intel bastards...
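To make the lane crunch concrete, here's a rough back-of-the-envelope budget. The CPU lane counts below are illustrative placeholders, not quotes from any spec sheet:

```python
# Rough PCIe lane budget for a multi-GPU ML box. The CPU lane counts are
# hypothetical examples for illustration -- check the actual spec sheet.
def lanes_needed(num_gpus, lanes_per_gpu=16, nvme_drives=1, lanes_per_nvme=4):
    return num_gpus * lanes_per_gpu + nvme_drives * lanes_per_nvme

cpu_lane_examples = {"HEDT part A": 44, "HEDT part B": 28, "mainstream desktop": 16}

for name, available in cpu_lane_examples.items():
    for n_gpus in (2, 4):
        wanted = lanes_needed(n_gpus)
        verdict = "fits at x16" if available >= wanted else "GPUs get throttled to x8/x4"
        print(f"{name}: {n_gpus} GPUs want {wanted} lanes, CPU offers {available} -> {verdict}")
```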
GPUs aren't going to _replace_ CPUs any time soon because GPUs are neither capable of nor designed for general-purpose processing. It's a fundamentally REALLY HARD problem to use GPUs to speed up arbitrary computations (solve it and you'll be a shoo-in for the Turing Award).
Businesses aren't going to be "moving" to GPUs. The "age of NVIDIA" is primarily predicated on its role in accelerating training the machine learning algorithms hyped up as "deep learning."
And AMD still very much has an uphill battle on both the CPU and the GPU fronts.
But yes, the excitement around Intel that used to be there is now gone. You can probably blame the "death of the PC" -- we teach kids coding in the Bay Area, and every single kid has an iPad tablet, but not a laptop. Sure professional engineers like us still have x86 laptops, but the average person does not.
It's not just about accelerating ML, specifically deep learning. There are many other enterprise technologies that can benefit from GPUs. One example: OLAP-focused databases (such as MapD - https://www.mapd.com/). For some benchmarks, check out this blog: http://tech.marksblogg.com/benchmarks.html.
The DL "training" use-case is well-known at this point, but there are many others which are emerging.
A GPU database isn't that useful, because the arithmetic intensity (ops/byte) is relatively low. Cross-sectional memory bandwidth is what really matters; you can get similar effects with a cluster of CPU machines provisioned appropriately, with a shard or a replica of the database on each CPU machine. I say this as someone who has written a GPU in-memory database of sorts that is used at Facebook (Faiss). What is interesting is if you can tie it to something that has higher arithmetic intensity before or after the database lookup on the GPU.
GPUs are only really being used for machine learning due to the sequential dependence of SGD and the relatively high arithmetic intensity (flops/byte) of convolutions or certain GEMMs. The faster you can take a gradient descent step, the faster the wall-clock time to converge, and you would lose by limiting memory reuse (for conv/GEMM) or on communication overhead or latency if you attempted to split a single computation between multiple nodes. The Volta "tensor cores" (fp16 units) make the GPU less arithmetic-bound for operations such as convolution that require a GEMM-like operation, but the fact that the memory bandwidth did not increase by a similar factor means that Volta is fairly unbalanced.
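A quick back-of-the-envelope sketch of the arithmetic-intensity argument above; the peak flops and memory bandwidth are made-up round numbers for a hypothetical accelerator, not the specs of any particular card:

```python
# Arithmetic intensity (flops per byte) of a square GEMM vs. a plain column scan,
# compared to the "machine balance" of a hypothetical ~10 Tflop/s, ~500 GB/s device.
def gemm_intensity(n, bytes_per_elem=4):
    flops = 2 * n ** 3                         # n^3 multiply-adds
    bytes_moved = 3 * n ** 2 * bytes_per_elem  # read A and B, write C (ideal reuse)
    return flops / bytes_moved

peak_flops = 10e12   # assumed fp32 peak
mem_bw = 500e9       # assumed memory bandwidth
balance = peak_flops / mem_bw   # flops/byte needed to stop being memory-bound (~20)

for n in (64, 512, 4096):
    ai = gemm_intensity(n)
    kind = "compute-bound" if ai > balance else "memory-bound"
    print(f"GEMM n={n:5d}: ~{ai:6.1f} flop/byte (balance ~{balance:.0f}) -> {kind}")

scan_ai = 1 / 4   # summing a float32 column: ~1 flop per 4 bytes
print(f"column scan: ~{scan_ai:.2f} flop/byte -> firmly memory-bound, as in the database case above")
```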
The point about Intel not increasing their headline performance by as much as GPUs is also misleading. Intel CPUs are very good at branchy codes and are latency optimized, not throughput optimized (as far as a general purpose computer can be). Not everything we want to do, even in deep learning, will necessarily run well on a throughput-optimized machine.
Actually, in columnar databases the ops/byte intensity is significantly greater, and the GPU helps here.
If you think about how a database CAN be built, instead of how databases have been built until now, you will find that there are very interesting ideas that can and do make use of the GPU.
The research into these has been around since 2006, with a lot of interesting papers published around 2008-2010.
There are also at least 5 different GPU databases around, each with their own aspects and suitable use-cases [1]...
I think they actually mean CPU = Intel, GPU = NVIDIA because strictly speaking Intel is also a GPU manufacturer, albeit low end GPUs. NVIDIA will play a major role in AI and simulation, that is clear.
I've always found Alex St. John to be a blowhard, having read his articles dating back to Boot (Maximum PC). Not to mention the fact that he creates malware like WildTangent.
In the DirectX comments, I'm struck by how interesting the strategic development of Intel and Microsoft is.
I'd never thought about it before, but there aren't many industries that could instantly (2-4 years) be killed solely by technical development.
If Intel had developed an OS-independent abstraction layer for devices, if Microsoft had pushed harder on ISA-independent programs, etc, how would today look?
Oh, Intel was certainly a contributor to Linux and Microsoft flirted with having Windows run on other hardware. But backwards compatibility prevented either from commoditizing their complement on the desktop.
There is a reason this partnership/cartel has been called "Wintel." Both knew it was in their best interest not to screw each other over by adopting other platforms more aggressively.
I think they've been increasingly more dissatisfied with that arrangement since Intel was trying to get into iPads/iPhones, and then Microsoft built Windows RT and Windows Phone for ARM-only.
Similarly, Google has been pushing Intel chips in Chromebooks (even though Chrome OS is architecture-agnostic, as you were hoping Microsoft to do with Windows), because they get those sweet discounts and early-access to Intel server chips. Plus, I think Intel is still on Google's board.
Well, if Google is pushing "Intel" in Chromebooks, it still doesn't explain why the Intel Chromebooks tend to have far more RAM/disk space/etc. than the ARM ones. Side by side, the ARM Chromebooks seem like phones with big screens, while the Intel devices appear to be inexpensive ultrabooks.
One of the things Nvidia excels at, and has done very well, is supplying the right ecosystem for writing performant GPU code.
This really pushed the adoption of GPUs forward.
Writing good, fast, high performance code for many x86 nodes is still quite difficult. Nvidia's CUDA stack including Thrust, CUB, cuBLAS and other libraries really made it easy to write parallel code, without thinking too much about the lower level operations.
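As a hedged illustration of how much that library stack absorbs, the sketch below uses CuPy, a NumPy-like Python layer that (to my understanding) dispatches to cuBLAS for the matrix multiply and a Thrust-backed sort under the hood, so none of the device code is written by hand. It assumes CuPy is installed and a CUDA device is present:

```python
import cupy as cp  # assumes CuPy + a CUDA-capable GPU

a = cp.random.rand(4096, 4096, dtype=cp.float32)
b = cp.random.rand(4096, 4096, dtype=cp.float32)

c = a @ b                  # matrix multiply, dispatched to cuBLAS
top = cp.sort(c[0])[-10:]  # device-side sort (Thrust-backed in CuPy)

print(cp.asnumpy(top))     # copy only the small result back to the host
```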
This post doesn't mention the strategic dilemma Intel has with GPUs. GPUs are fast because they have fast local memory, the core count is secondary, and some Intel chips have 22x16 vector processing elements clocked 2x higher than NVIDIA (around 3k CUDA core equivalent).
Why doesn't Intel make a fast-local-memory, GPU-style device? Well, they did (Xeon Phi), but if they gave it a reasonable price point it would cannibalize their existing x86 market, which is bigger than the GPU market.
His main idea is that x86 will decline in GPU host roles because Intel is deliberately limiting PCIe bus width, but then he shows AMD (another manufacturer of x86 chips) which has no such interest, delivering what he's asking for basically right now.
He also seems to think that NVIDIA will be the sole winner, but if you think about it, AMD seems much better positioned. They manufacture chips that implement the most popular server ISA (x86), and on the same process they manufacture GPUs that are objectively more suitable for the tasks driving adoption (and that are starting to be adopted, now that the software is catching up).
So really the situation is not "NVIDIA is killing x86"; it's "Discrete GPU vendors are selling more GPUs than fit on Intel's x86 implementation, which will drive the adoption of systems with lots of bus width."
It is, but I think he ends with the possibility that, with an ARM core integrated into the GPU, a new kind of system becomes possible, one that doesn't include Intel at all. With Linux and Windows both ARM-ready, the kinds of system builds that become possible could be very interesting: for one, having as many PCIe lanes as you want (or something completely different). It would be great to see a platform that is friendlier to extracting as much out of the hardware as possible.
Wouldn't that be basically exactly the same as integrating the GPU into the CPU? This is something that AMD already does. The question is: does it have a considerable positive impact on compute workloads? It doesn't seem like anyone is clamoring for it, so I doubt it matters much. More peripheral bus bandwidth and connectors is probably all that anyone cares about.
> Because this is the year that the first generation of self-hosting GPU’s are widely available on the market
Really? Then what's inside of 99% of smartphones and tablets today already? It's not like ARM cores + beefy GPUs is a brand new concept. In fact even the first Raspberry Pi featured such a combination.
Intel already plays no role in the smartphone and tablet business, and anyone who doesn't care about x86 compatibility has been free to use whatever non-Intel accelerator they like in the datacenter / HPC space for years now. I'm not quite sure what the dramatic change is that the author is implying.
It's a fun article, but it actually drives me to a question: is it possible (by which I mean remotely feasible) to design a user experience from the ground up around a massively parallel system?
Let's say you had some low-speed hardware to support a beefy GPU in a case; could a manageable operating system be built that wouldn't spend all of its time in IO-wait? I like the idea and the challenge in designing this way, but I don't see a way forward that doesn't just fall back on the architecture of a core computational unit with specialized auxiliary units quite rapidly.
99.9% of code written will not be specific to ARM, nor will it be CUDA code.
Most programmers will still be busy assembling software from prefabricated low-level parts (which may have ARM optimizations or run on CUDA). If (for example) you use Tensorflow, you will not need to deal with CUDA, even though Tensorflow runs on top of it.
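For instance, a minimal TensorFlow 1.x-era sketch (just an illustrative toy; the shapes and variable names are made up) where the model code never mentions CUDA, and the same graph runs on CPU or GPU depending on what the installed build finds:

```python
import tensorflow as tf  # TF 1.x-style API, matching the era of this thread

x = tf.placeholder(tf.float32, shape=[None, 784])
w = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, w) + b   # runs as a GPU kernel or a CPU kernel, same code

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(logits, feed_dict={x: [[0.0] * 784]})
    print(out.shape)  # (1, 10)
```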
GPU programming is not for the faint of heart, debugging is much more tedious and you need to "think" parallel. You can't be a clown programming GPUs like you can with Javascript. There aren't a lot of jobs to go around either, even though they may be well-paid.
Most developers are not working with code that requires top-notch computing power; I don't think the Nvidia/Intel rivalry will affect the market for people who know JS, CSS and HTML to a significant degree.
But generally speaking, learning Caffe, Tensorflow and math in general sounds like a much more important investment than low-level libraries for specific hardware.
I think this is a good point. One other thing I would like to emphasize, after having worked only a little while professionally, is that it's not as important which specific architecture you choose while learning, except that maybe you will write code faster and already know many of the nuances. There are so many resources available to learn any architecture now that the best advice would be to pick a reasonable one and just learn it; with that experience, one can then move with confidence between architectures.
Knowing the ARM architecture is a perk, but the software is going to be all CUDA / OpenCL / Vulkan Compute; the CPU side is just going to be a shim layer that gets kernels executing on the graphics card, which won't take much development and will hopefully just be the same open standard everyone uses.
I agree with you, except when they aren't hidden...
For some large percentage of the problems we fix I agree with you, but every once in a while you need to peel back a layer or two of abstraction and ask "Why did it do that?" Most abstractions leak, despite whatever heavy lifting they do for us.
Having familiarity with those underlying implementations can make that troubleshooting much easier. So I guess it is up to each developer to understand how likely they are to peel back different abstractions. I suspect a JS dev will rarely want to look at the assembly, while a C/C++ dev might want to every week.
What struck me most was the part about another giant corporation abusing its monopoly position to screw a competitor, at the expense of the customer.
An intelligent market will always fall victim to a tragedy of the commons scenario. Consumers will take the quick buck and let someone else worry about the collective future impact.
I had wondered whether Microsoft and Intel were more a fraternity or an uneasy alliance. This article gives hard details about how they have always been forging sabers to cut each other's throats whenever the time was ripe.
It also woke me up to Intel's hardware throttling, starving the graphics unit of enough buses. I had been in awe of PCIe's bandwidth, but now I know it could be so much more. It reminds me that if you focus on microspecs doubling every few years, you think there's breathtaking progress. But if you take a step back and look at the overall result, computing is moving much more slowly. For example, it seems like the average laptop has always had 6-9 hours of battery life.
For a moment they made me feel like it was 2007 again and I was watching an Apple keynote. I didn't expect to have that feeling again from any company.
One disagreement I have with this otherwise great piece is that the future may very well end up being MIMD. MIMD is a more flexible programming model than the lockstep SIMT of GPGPU, and examples like the Rex Neo, Adapteva Epiphany and OpenPiton show that they can often beat GPUs on efficiency.
SIMT is inherently more power efficient than MIMD: less control-flow logic per flop. Even then, it makes sense to devote dedicated logic to specific algorithms. Even NVIDIA GPUs (Volta) are going to have special matrix-multiply hardware (tensor cores) to increase power efficiency and performance.
The future lies not in a flexible programming model but in dedicated hardware/IP. Look at the crypto block, the ISP, h264/h265 encoding/decoding, and now tensor cores. It's been mentioned in what seems like every architecture paper in the last ten years: dark silicon is driving the need to differentiate compute into smaller blocks. We can pack more and more transistors into a chip, but we can only power a smaller and smaller fraction of them at any given time. It only makes sense to make whatever can be powered on as efficient as possible.
In theory yes, but clean restarts like the Adapteva Epiphany and the Rex Neo can get better efficiency than GPUs because they don't suffer from legacy issues while still running legacy OpenCL code.
As for the matrix multiplier ASICs like the tpu and Volta, I consider them to be incredibly uncreative and an insult of sorts to computer architecture to call that a "deep learning processor". What happens when tomorrow SPNs or graph ConvNets dominate? A proper application specific processor will be able to adapt and still maintain efficiency.
Obviously I have some bias and hubris here, but our simulations consistently show better efficiency than the TPU on the same workloads, while still retaining the ability to adapt to other computational graphs that TensorFlow may choose to run.
The article's thesis, "Well it appears that the GPU era of computing is finally here! Intel is in deep trouble," has an implicit assumption that Intel's future mostly depends on processing power. Is that really the case?
Even with NVDA's recent rally, INTC is still very close to double NVDA's size ($168bn vs $88bn market cap). The point being, the author's statement that the x86 party is over seems a bit far off at the moment...
" 2017, the year GPU’s finally begin to permanently displace the venerated x86 based CPU"
he never specifies what he means by "displace" exactly, and follows up with vacuous statements like "x86 party ends in 2017". The entire thing sounds like trying to manufacture a momentous event out of a gradual shift of market profits and priorities over time. The best you could hope for is to state that Intel has failed to take a significant share of the high-end throughput computing market (i.e., desktop and HPC GPUs), but it's doing quite well with integrated GPUs.
It's very wordy and takes a lot of reading, but he does have a pretty solid point. First, he is talking about enterprise computing, and makes that clear with:
"
Up until now Intel has held a dominant monopoly over Enterprise computing for many years, successfully fending off all challengers to their supremacy in the Enterprise computing space. This dominance is ending this year and the market sees it coming.
"
So integrated graphics like you mention is irrelevant.
Then, at the end, he lists why he thinks that, with links:
Softbank bought ARM and funded NVIDIA, who announced an ARM & NVIDIA integrated enterprise computing product. IBM is supporting NVIDIA with a POWER and NVIDIA integrated enterprise computing product, and AMD is supporting NVIDIA in Ryzen by providing lots of PCIe bandwidth to the graphics card to support compute tasks.
Yes, at least for gaming. (Don't know about DNN.) Single-precision is the only kind that GPUs supported until CUDA happened.
Around 2011, I got my feet wet in CUDA and tried to calculate quantum waveforms (using a method that is mostly matrix multiplications and FFTs). I eventually went back to doing stuff on the CPU because GPU memory was too small in the systems that I had access to (256 MB), which restricted me to one job at a time, whereas the CPU (a contemporary i7) had enough cores and memory to do 4 jobs in parallel. And I needed double precision, which the GPU could only execute at a tenth the speed of a single-precision job. Also, with the GPU, I was restricted to running jobs during the night, since those systems were desktops that were also used for classes. Whenever one of my calculations ran, it would occupy the GPU completely, thus rendering the graphical login unusable.
I reckon that the situation would look much more favorable for the GPU today, especially because of the larger memory sizes and because double-precision speed has caught up. But yeah, the most common uses need only single precision.
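If you want to check how big that single- vs double-precision gap still is on a given card, a rough micro-benchmark like the sketch below works (assumes CuPy and a CUDA device; the ratio you see depends heavily on the hardware, since consumer parts typically run fp64 at a small fraction of the fp32 rate while compute-oriented parts are much closer):

```python
import time
import cupy as cp  # assumes CuPy + a CUDA-capable GPU

def time_gemm(dtype, n=4096, reps=5):
    a = cp.random.rand(n, n).astype(dtype)
    b = cp.random.rand(n, n).astype(dtype)
    cp.cuda.Device().synchronize()   # make sure setup is done before timing
    start = time.perf_counter()
    for _ in range(reps):
        a @ b
    cp.cuda.Device().synchronize()   # wait for the async kernels to finish
    return (time.perf_counter() - start) / reps

t32 = time_gemm(cp.float32)
t64 = time_gemm(cp.float64)
print(f"fp64 GEMM is ~{t64 / t32:.1f}x slower than fp32 on this device")
```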
GeForce 5xx series came out in 2010 (https://en.wikipedia.org/wiki/GeForce_500_series) and NONE of them had less than 1GB of memory. Idk what GPU you used, but it was old technology at that point.
Probably. Whoever bought those machines probably didn't realize that GPU performance was quickly becoming a relevant metric for scientific computation.
Actual translation: SAP and NVIDIA partner to milk the fad for all it's worth -- it won't amount to much in the grand scheme of things, as it's an inconsequential part of enterprise computing.
If you are trying to compare it to the year of Linux, it would be wise to look at how the market changed in general rather than how the market changed for Microsoft. Linux did not decimate Windows, but it has certainly hurt other platforms and it has likely hurt Microsoft's reach.
Likewise, I doubt that GPGPU applications will decimate Intel. They will likely hurt Intel's reach. This is particularly true if AMD, ARM, and IBM are facilitating the use of the GPU as a high-performance coprocessor.
Even gradual events have recognizable tipping points or points of no return. I think that is what he is calling 2017 for the gradual transition from CPU dominance to the future of GPU/FPGA dominance. Or just CPU irrelevance.
Don't know how that article you link has any relevance to the facts and analysis he presents here.
Are the slides from the kotaku post really that bad? I mean the one on girlfriends/wives is pretty nasty (although has a nugget of truth), but the rest of it seems reasonably valid. It doesn't seem as "exploitative" as kotaku is trying to color it, it's just a rational take on hiring practices that the vast majority of the world ignores in exchange for hiring degrees and certificates.
They look pretty bad to me, and I skipped most of Kotaku's commentary and analysis. FWIW, I've done 10 years in games and gone through 5 game crunches that were longer than 6 months each.
"Coding is NOT WORK. People who think it is aren't real software engineers."
That's some highly macho driven bullshit that is untrue and somewhat meaningless to boot. People who think coding is not real work should be able to easily hire a AAA title producing team without paying them anything. Yeah, good luck.
Same slide "High expectations, long hours, new challenges and a customer/market driven mission are motivating for real engineers."
There's no reason that "long hours" needs to be in that sentence, and no evidence that it's true. These sentences on this slide are working to rationalize uncompensated hours of work.
"The Young the Old and the Useless" is downright inflammatory, even if there are elements of truth. It's negative stereotyping to suggest good engineers have no social skills, can't make eye contact, and will marry the first girl they can. The last sentence suggest's their value is high because they will keep a job regardless of the working conditions. If that's true, it's immoral to exploit.
There are some people who aren't good communicators and who won't quit a bad job unless it's really extremely bad. But personally, I don't believe that the best hires are the antisocial engineers that won't stick up for themselves. The best hires are the people that have both good engineering and good communication skills. The best hires are the ones that are hard to keep because they have good options and aren't afraid to take them.
The only reason to prefer people who don't change jobs and don't stick up for themselves is because it's easier than being a good manager. The stuff on these slides is perpetuating the problems that are causing crunch times in the games industry. These ideas aren't forward looking, they aren't setting goals for being better at making games or building healthy companies. These slides are making excuses for taking the cheap way out by exploiting engineers. This isn't just bad in the unethical way, it's also deeply lazy.
If anything, it helps to reinforce his credibility on the GPU issue. Sure, he's a cretin for taking advantage of the young, inexperienced, and under-socialized, but there is a lot of personally verifiable truth in those insights.
GPU computing is economically feasible only because the PC and console gaming markets subsidize it. FPGAs are powerful resources for certain I/O-intensive tasks, but -- at least until VR takes over Real Soon Now -- they have no mainstream appeal to game developers.
So we won't see the kind of organic growth in FPGA computing that we've seen from GPUs over the last 10-15 years. If Intel is counting on that, they (and their stockholders) are in for disappointment.
Ugh, his slides from that Kotaku article are especially cringeworthy. The perception of that type of attitude from management is what causes unions to form. He's right out front with a playbook how to exploit your employees.
[0] https://arxiv.org/abs/1704.04760