
This is launched in response to the MI300X, and it should still fall short of matching AMD's product. It arrives two quarters after the MI300X, but the B100 should arrive before AMD's MI400 generation.


AMD always launches hardware with impressive specs. But they are way behind in software, which matters more than hardware.


StableHLO[1] and IREE[2] are interesting projects that might help AMD here, from [1]:

> Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE).

From there, AMD's goal would most likely be to work with the XLA/OpenXLA teams on XLA[3] and IREE[2] to make ROCm a better backend.

[1] https://github.com/openxla/stablehlo

[2] https://github.com/openxla/iree

[3] https://www.tensorflow.org/xla
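
To make the interop concrete, here is a minimal sketch (assuming a recent JAX install, where jit-compiled functions lower to the StableHLO MLIR dialect; older versions emitted MHLO instead) of turning framework code into the portable IR that a compiler like IREE can then target at different hardware:

    import jax
    import jax.numpy as jnp

    def model(x, w):
        # Toy network layer: matmul + nonlinearity.
        return jnp.tanh(x @ w)

    # Lower without executing; shapes/dtypes come from the example args.
    lowered = jax.jit(model).lower(
        jnp.zeros((4, 8), jnp.float32),
        jnp.zeros((8, 2), jnp.float32),
    )
    print(lowered.as_text())  # MLIR in the StableHLO dialect

Once a framework can emit this IR, making ROCm a good target becomes a compiler-backend problem rather than per-framework integration work.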


If AMD launches hardware that is clearly faster, the software will move towards it.


That's exactly what the CUDA monopoly is meant to prevent, and as a fervent supporter of OpenCL (with two commercial apps), this is exactly the case I always make: even if some GPU came out tomorrow costing $0 and with infinite performance, all these people who paint themselves into a corner are hosed.

Not that anyone cares, and everyone keeps using CUDA while simultaneously complaining about Nvidia GPU prices, as if those two things have nothing to do with each other...


Have you had good experience with this for portability though? On what classes of hardware and OS?

I did a bit of work in OpenCL almost 10 years ago and found it decently portable across a range of NVIDIA GPUs as well as Intel iGPUs. On the high end I used something like the Titan X, while on the low end I used the typical GPUs found in business-class laptops.

But my limited exposure to AMD was terrible by comparison. Even though I am away from that work now, I still tend to run "clpeak" and one of my simpler image-processing scripts on each new system. And while I liked a Ryzen laptop for general use and even games, OpenCL seemed useless there; my best option was to ignore the GPU and use Intel's x86_64 SIMD OpenCL runtime.
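
For reference, the sanity check I mean starts with just enumerating which OpenCL runtimes a system actually exposes; a minimal sketch using the third-party pyopencl package (clpeak then measures bandwidth/FLOPS on top of this):

    import pyopencl as cl  # pip install pyopencl

    # Each "platform" is a vendor runtime; each may expose several devices.
    # On the Ryzen laptop described above, the GPU may simply not show up,
    # leaving only a CPU runtime in this list.
    for platform in cl.get_platforms():
        print(f"Platform: {platform.name} ({platform.version})")
        for device in platform.get_devices():
            kind = cl.device_type.to_string(device.type)
            mib = device.global_mem_size // (1024 ** 2)
            print(f"  {kind}: {device.name}, {mib} MiB global memory")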


Yes, even ~2012 OpenCL code works incredibly well today for spectral path tracing: https://indigorenderer.com/indigobench

My fractal software also includes OpenCL multi-GPU / mixed-platform rendering: https://chaoticafractals.com/

Both work on [ Nvidia, AMD, Intel, Apple ] x [ CPU, GPU ].

Some of the shared code here: https://github.com/glaretechnologies/glare-core

Don't let anyone tell you OpenCL is dead! Keep writing OpenCL software!!


AIUI, your current best bet for a good OpenCL implementation on less-than-cutting-edge AMD hardware is the Mesa project's Rusticl work.


My own understanding is that OpenCL is semi-obsolete at the moment (although newer standards revisions are still coming out, so this may change in the future) with forward-looking projects mostly targeting Vulkan Compute or SYCL.

(There are some annoying differences in the low-level implementations of OpenCL vs. Vulkan Compute, due to their being based on SPIR-V compute "kernels" vs. "shaders" respectively, that make it hard for them to interop cleanly. So that's why the choice can be significant.)


OpenCL tooling has always been bad compared to CUDA's, and that isn't NVidia's fault; the blame lies with Intel and AMD.

Not only were C, C++, and Fortran never taken seriously enough, other language stacks were never considered at all.

Thus everyone who enjoyed programming in anything other than C, with great libraries and graphical debuggers, flocked to CUDA. It remains to be seen whether SYCL and SPIR-V will ever matter enough to win some of those folks back.


The probability of a processor emerging that is so much cheaper that it warrants moving over, and whose API also happens to be the OpenCL you already target, is very low. Aim for what is probable, not what is theoretically possible.


Most people who use this hardware aren't programming GPU kernels directly; they're using abstraction layers like PyTorch, TensorFlow, etc. For the developers of those kinds of frameworks, CUDA itself offers a lot of libraries, like cuBLAS.

There are relatively few people capable of implementing these frameworks without a solid CUDA-like foundation, and those who do exist would need a very strong incentive to do it.
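
To illustrate how thick that abstraction layer is, here is a minimal PyTorch sketch; as I understand it, ROCm builds of PyTorch reuse the "cuda" device name for AMD GPUs, so the same script runs on either vendor:

    import torch

    # Pick whatever accelerator the installed PyTorch build supports.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(1024, 1024, device=device)
    w = torch.randn(1024, 1024, device=device)

    # This matmul dispatches to cuBLAS on NVIDIA (or rocBLAS on AMD);
    # the user never calls either library directly.
    y = x @ w
    print(y.device, y.shape)

The hard part is everything beneath this: someone has to make that matmul, and hundreds of other ops, fast on the target hardware.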


If they were allowed to get significantly ahead, that status quo would likely be disrupted pretty fast.


I thought CUDA was NVIDIA’s moat. Is that no longer the case, or did AMD come up with a good alternative?


The vast majority of work in ML isn't people working with CUDA directly - people use open source frameworks like PyTorch and TensorFlow to define a network and train it, and all the frameworks support CUDA as a backend.

Other backends are also available, such as CPU-only training. And you can export networks in reasonably-standard formats.
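
ONNX is the usual example of such a format; a minimal sketch of exporting a small PyTorch model so another runtime or backend can load it (the toy network here is just a stand-in for a trained model):

    import torch
    import torch.nn as nn

    # Hypothetical toy network standing in for a real trained model.
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    model.eval()

    # Trace with a dummy input and write a standard ONNX file.
    dummy = torch.randn(1, 8)
    torch.onnx.export(model, dummy, "model.onnx")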

nvidia's moat is framework support that is much more mature than AMD's; widespread popularity due to that good framework support, ensuring everyone develops on nvidia and thus maintaining their support lead; much faster performance than CPU-only training; and a price that, though high, is a lot less than an ML developer's salary.

If you need 24GB of vram and nvidia offers that for $1600 while AMD offers it for $1300, how many compatibility problems do you want to deal with to save a single day's wages?

But nvidia's moat is far from guaranteed. Huge users like OpenAI and Facebook might find improving AMD support pays for itself.


> Huge users like OpenAI and Facebook might find improving AMD support pays for itself.

At that scale they may actually develop their own hardware a la Google TPU.

If you want to just focus on the AI problem and not on infrastructure, just use NVidia. If you want control and efficiency, design your own. AMD kind of falls in a weird middle ground with respect to the massive companies.


Google's focus on TPUs has caused them many issues, though. Other giant players might see spending FTE-equivalents on PyTorch etc. development as a better investment choice.


CUDA code can be forward-ported to AMD's HIP, which can be used with the ROCm stack. For a more standards-focused alternative there's also SYCL, which has implementations targeting a variety of hardware backends (including HIP) and may also target Vulkan Compute in the future.


> CUDA code can be forward-ported to AMD's HIP, which can be used with the ROCm stack.

Maybe in some cases, but that doesn't even really matter since hardware support is poor.





