Does not GPU compute (alexvoica.com)
76 points by alexvoica on June 2, 2016 | 33 comments



For actual mobile devices, yes, there is no need. FP64 only has a use in scientific research, maybe finance, and a few other fields. Even there you would do a lot of mixed-precision stuff.

The reason the support was there was probably that they wanted to design a single chip and either remove or disable cores for truly mobile or general-purpose boards, while keeping the logic available for customers that would actually want it.


I once needed FP64 in a GPU for physics calculations. One reason that impulse/constraint won out over spring/damper is that spring/damper has a total loss of precision problem with 32-bit floats.
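Here's a rough sketch of the failure mode (toy numbers, made-up kernel, not the actual solver): once the position sits far from the origin, the per-step spring/damper increment falls below what FP32 can represent at that magnitude and is silently dropped.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void spring_step(float *out32, double *out64) {
        const float dt = 0.001f, k = 100.0f, c = 0.5f;
        // Object resting 50 km from the world origin with a small velocity.
        float  x32 = 50000.0f, v32 = 0.001f;
        double x64 = 50000.0,  v64 = 0.001;
        for (int i = 0; i < 1000; ++i) {
            v32 += (-k * (x32 - 50000.0f) - c * v32) * dt;
            x32 += v32 * dt;   // ~1e-6 per step: below FP32 resolution at 5e4, lost
            v64 += (-k * (x64 - 50000.0) - c * v64) * dt;
            x64 += v64 * dt;   // FP64 keeps the motion
        }
        *out32 = x32; *out64 = x64;
    }

    int main() {
        float *f; double *d;
        cudaMallocManaged(&f, sizeof *f);
        cudaMallocManaged(&d, sizeof *d);
        spring_step<<<1, 1>>>(f, d);
        cudaDeviceSynchronize();
        printf("FP32: %.6f   FP64: %.6f\n", *f, *d);  // FP32 never moves off 50000
        return 0;
    }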


I don't remember there ever being a time when FP64 was considered a big deal for mobile GPUs. The article is trying to debunk a claim that was never popular to begin with.


Allow me to refresh your memory then. How does the following sound to you?

"We also support Full Profile and 64-bit natively, in hardware. After years of evangelising the benefits of such an approach it is nice to see other players in the industry join down this avenue." https://community.arm.com/groups/arm-mali-graphics/blog/2013...

"Mali-T622 was specifically tailored for this job. Mali-T622 also supports OpenCL Full Profile and includes double-precision FP64 and full IEEE-754-2008 floating-point support which are essential features in order to enhance the user experience" https://community.arm.com/groups/arm-mali-graphics/blog/2013...

I could go on with the examples but I think there's no need to spam the thread with tens of blog articles that say FP64 and "native 64-bit" (whatever that means) are essential to the mobile experience.


Is there any point in actually damning FP64 this hard anymore? There is no reason, imo, for a modern GPU to get worse than 1/3rd of its FP32 performance on FP64.

Side note: GeForce cards (as opposed to the "professional" Quadro/Tesla lines) have FP64 performance deliberately limited, and AMD has started doing the same with its GCN-era chips.


My focus was on mobile, not desktop GPUs. However, I've noticed that energy and area efficiency are starting to become more relevant in desktop too.

At the end of the day, I guess it's all about finding the right balance for your target application.


Even if you're doing fintech simulations, FP16 could well be plenty of precision for a first pass, and then you'd get all those extra cores and ops/watt.

FP64 seems like a very small use case for most of the parallelized workflows I can imagine.
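Something like this sketch is what I mean by a first pass (made-up kernel name, and it assumes a toolkit/GPU with FP16 support): keep the data in FP16 to halve bandwidth, but accumulate in FP32 so the result doesn't drift too badly.

    #include <cuda_fp16.h>

    // Hypothetical first-pass kernel: FP16 storage, FP32 accumulation.
    __global__ void dot_mixed(const __half *a, const __half *b, float *out, int n) {
        float acc = 0.0f;                                    // FP32 accumulator
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            acc += __half2float(a[i]) * __half2float(b[i]);  // FP16 inputs
        atomicAdd(out, acc);                                 // combine per-thread sums
    }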


For scientific workloads, double precision is a must-have. The ~7 significant digits of FP32 are not enough. In my lab, we haven't updated our Kepler-based GPU since 2013 for this reason.
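A trivial example of how fast those digits run out (plain host code here, but the same thing happens inside a kernel): a naive FP32 accumulator stops changing once the running sum reaches 2^24.

    #include <cstdio>

    int main() {
        float  s32 = 0.0f;
        double s64 = 0.0;
        for (long i = 0; i < 100000000; ++i) {  // 1e8 additions of 1.0
            s32 += 1.0f;
            s64 += 1.0;
        }
        printf("FP32: %.1f\n", s32);  // 16777216.0, stuck at 2^24
        printf("FP64: %.1f\n", s64);  // 100000000.0
        return 0;
    }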


I agree but the article has an answer to this.

> Since I don’t plan to run any DNA analysis or fintech simulations on my smartphone anytime soon, I am very satisfied having FP32/FP16 precision in mobile right now. And so should you.


You are right, I thought vessenes meant: "FP64 seems like a very small use case for most of the parallelized workflows I can imagine" for any platform.


That makes sense. I'm assuming that the majority of GPUs are not running scientific workloads, but rather deep learning and matrix ops for finance.


It's not just mobile either. The original Titan card released in early 2013 has much higher FP64 performance than any later Nvidia card.


It may be that mobile graphics gets by fine with FP32 or less, but I worry that if FP64 gets sidelined then none of the effort going into this hardware today will benefit applications that need real precision like science, weather prediction, GPS, etc.


Not really a problem. NVIDIA sells cards based around 32-bit (and now increasingly, 16-bit) ALUs for desktop usage, while offering more expensive ones with more 64-bit-focused ALUs for workstations and compute. Compute is important enough to their bottom line to justify it.

The real problem is that NVIDIA has compute locked down with CUDA. Mobile chipset vendors can't expand into compute if they're barred from entry at the API level.


Vulkan/SPIR-V looks promising; it just needs chip vendors (ARM, Qualcomm, AMD, Intel) to come together and invest in cuDNN equivalents.

Although I reckon deep learning on mobile (at least for some use cases, like cameras) will use dedicated silicon from Movidius etc. and ultimately be embedded in the camera chips directly.


Given the quality of OpenCL and its cross-platform nature, it's amazing that everything is still written directly for CUDA...

There were no CUDA -> OpenCL toolchains last time I checked, which is even more frustrating.


There is CU2CL: http://chrec.cs.vt.edu/cu2cl/

The cross-platform nature is actually part of the problem: the whole point of doing GPGPU work is that you're playing to the hardware's strengths, which can be difficult when the hardware can be nearly anything from a CPU to a GPU to an FPGA.

It doesn't help that until recently, AMD hadn't tried to push OpenCL nearly as hard as nVIDIA pushes CUDA.


Modern AMD and NVIDIA GPUs are fairly similar hardware-wise, and it is not hard to write OpenCL code that executes efficiently on both. I agree that it is pretty hopeless to write performance-portable OpenCL across entirely different architectures, however.


Sure, but if you go with nVIDIA, you also get access to all the other goodies they distribute (Thrust, cuFFT, cuDNN, etc.) and all the CUDA-compatible stuff other people have written, like Theano and TensorFlow.
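For a sense of what those goodies buy you, here's a rough sketch of a device-wide reduction with Thrust (it ships with the CUDA toolkit), which comes down to a couple of lines:

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main() {
        thrust::device_vector<float> v(1 << 20, 1.0f);        // 1M ones on the GPU
        float sum = thrust::reduce(v.begin(), v.end(), 0.0f); // runs on the device
        printf("sum = %.0f\n", sum);                          // 1048576
        return 0;
    }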

It does seem like people have gotten a little more interested in OpenCL lately, but it still lags pretty far behind. As dharma1 says below, AMD seems weirdly uninterested in catching up. If I were in charge of AMD, I'd be throwing money and programmers at this: "Want to port your library to OpenCL? Here, have a GPU! We'll help."


AMD management has completely missed the memo on deep learning. No mention of deep learning or FP16 perf yesterday when Polaris was announced - it was all around VR.

They are just not turning up to the party, and as a company they are running out of time if Polaris and Zen don't sell.


> Given the quality of OpenCL and its cross platform nature

I'm sorry, WHAT? OpenCL is absolute shit: a cumbersome API definition, lack of low-level control, stringly typed programs (all programs are provided as strings and kernels are identified with those too), which means nearly no compile-time feedback and makes it hard to embed GPU kernels into a single binary. The API is woefully lacking in flexibility (no dynamic launch). OpenCL 2.0 is better (EDIT: apparently AMD supports it now; I'd have to check whether Intel/NVidia have also added support), but almost no one supports it, so it's also irrelevant.
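To be concrete about the dynamic launch point: since CUDA 5 on sm_35+ (compiled with -rdc=true) a kernel can launch further kernels from the device, something OpenCL only got in 2.0. Toy example, names made up:

    __global__ void child(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2;
    }

    __global__ void parent(int *data, int n) {
        if (threadIdx.x == 0)
            child<<<(n + 255) / 256, 256>>>(data, n);  // launched from the device
    }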

Not only that, AMD hardware is terrible. Atomics on NVidia's Maxwell are orders of magnitude faster than on AMD (to the point of being comparable to non-atomic operations with low contention).
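The kind of pattern where that difference shows up, e.g. a histogram built straight out of global atomics (a sketch, not a benchmark):

    __global__ void histogram(const unsigned char *in, int n, unsigned int *bins) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(&bins[in[i]], 1u);  // heavily contended global atomic
    }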

CUDA's environment provides: better documentation, better feature support, saner development and debugging, the possibility to ship both generic & specialised binary kernels, JITtable kernels in intermediate representation, better compile-time sanity checking, the ability to generate your own IR/CUDA assembler from non-CUDA languages...

The reason everyone does CUDA and uses NVidia is because there's zero real competition. AMD is the only company that cares about OpenCL, Intel and NVidia just implement the bare minimum to have AMD's OpenCL code be portable to them. Intel has OpenMP and TBB for the Phi, NVidia has CUDA.

To me it's crazy that anyone keeps mentioning OpenCL as a serious alternative. In theory I agree that an open standard would be nice, but over here in reality where I have to actually write code there is no realistic alternative to CUDA if you want to stay sane.


You write OpenCL if you want to target anything other than AMD/NVIDIA/Intel. If you're writing code for an embedded application (with some heterogeneous core), or for a mobile application, you absolutely have to write OpenCL code, as there's no alternative. OpenCL is shit, but it's cross-platform shit.

If your aim is to get 100% performance in a GPU-heavy cluster, then sure, you're going to need to write CUDA code and buy some NVIDIA GPUs; however, there are a lot of applications that run in entirely different environments which _only_ support OpenCL.


> for a mobile application

Not really, OpenCL doesn't have any real foothold on mobile.

Android uses its own RenderScript dialect instead of OpenCL, and iOS is moving away from OpenCL to Metal Compute.

And the dying WP uses C++ AMP.


Yes, but Vulkan will change that since it is an API designed for graphics and compute.


It's of course not OpenCL, but AMD recently released their HIP tool, which makes it easier to target HSA using existing CUDA code.

https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP


> Given the quality of OpenCL...

What quality?

CUDA has had support for Fortran, C++, and any language with a compiler that could target PTX since the early days.

Also, the graphical debugging tools are quite good. Debugging GPUs feels just like debugging any other application.

Meanwhile, in OpenCL land, it's plain C and printf debugging.

Only with OpenCL 2.0 has Khronos finally started to address these issues, so what quality?!


Does anyone actually implement OpenCL 2.0 yet? Last I checked not even AMD supported it, and they're the only company that has a reason to care about advancing OpenCL.



A better approach might be to look at what you are trying to accomplish and figure out the scale that works best for what a local view appears to be.

For example, if I imagine an open game world where there are natural limits to useful render distance, then it is possible to define absolute maximum scale sizes.

My new to this problems space view is that even if the world is larger than those sizes, there is probably still some limited observer and scale that makes sense. By building some spare room and padding into the scale, it can then be transformed to center on different points. As movement towards one of those points happens, the new centering for each object could be pre-computed in spare cycles (or at least spread out so it isn't a single noticeable hit).
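Roughly what I have in mind, as a sketch (names invented): keep authoritative positions in FP64 on the host, and hand the GPU only small FP32 offsets relative to whatever the current observer-centred origin is.

    struct WorldPos { double x, y, z; };  // authoritative, host-side
    struct LocalPos { float  x, y, z; };  // what the GPU actually sees

    // Re-centre a world position on the current observer origin; the origin is
    // moved (and positions re-computed in spare cycles) as the observer travels.
    LocalPos rebase(const WorldPos &p, const WorldPos &origin) {
        LocalPos out;
        out.x = (float)(p.x - origin.x);  // small relative values, FP32 is fine
        out.y = (float)(p.y - origin.y);
        out.z = (float)(p.z - origin.z);
        return out;
    }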


> My new to this problems space view

I'm going to be downvoted because this isn't particularly on-topic, but nonetheless I'd like to suggest you try hyphenating phrases like this to make them easier to read, so you don't construct garden-path sentences where the correct parse exists but isn't obvious.

Thanks!


Being off-topic isn't necessarily cause for downvotes, and correcting English usage is often welcomed by people with a different native language.

Saying you're going to be downvoted for something, on the other hand, is more likely to get people to downvote you.

In fact, some people will always downvote posts that have words to that effect.


Garden path sentences make pleasant experiences in parsing that accentuate the means to the end resulting communication. Smell the roses.


I like garden path sentences in my Joyce, but keep them off HN, please.



