No direct Metal target.

almostgotcaught · on Dec 14, 2024

This is an LLVM project... if you want this to work on Metal, ask Apple to add a Metal backend to LLVM.

https://github.com/llvm/llvm-project/tree/main/llvm/lib/Targ...

rbanffy · on Dec 14, 2024

I am surprised there isn't.

JonChesterfield · on Dec 14, 2024

No Intel either. The port would be easy: gpuintrin.h abstracts over the intrinsics, so you provide an implementation for those and, if you want to run the test suite, write a loader in terms of OpenCL or something similar.

The protocol needs ordered load/store on shared memory but nothing else. I wrote a paper trying to make clear that load/store on shared memory is sufficient, which doesn't seem to have been considered persuasive. It's specifically designed to tolerate architectures doing sloppy things with cache invalidation. It could run much faster with fetch_or / fetch_and instructions (available on APUs, but not over PCIe). It could also hang off DMA, but that isn't implemented (I want to have the GPU push packets over the network without involving the x64 CPU at all).
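To make the "ordered load/store is enough" point concrete, here is a minimal sketch of a single-slot mailbox in which each side only ever stores to its own flag and loads the other side's, so no fetch_or / fetch_and or compare-and-swap is needed. This is not the LLVM implementation; `Mailbox`, `client_send`, `client_recv`, `server_serve_one` and `run_demo` are hypothetical names. One Python thread stands in for the GPU and another for the host, and CPython's GIL stands in for the acquire/release ordering that real hardware would need.

```python
import threading
import time


def _spin():
    # Yield so the other thread can run while we poll (busy-wait).
    time.sleep(0)


class Mailbox:
    """Hypothetical single-slot mailbox shared by a client (GPU stand-in)
    and a server (host stand-in). Ownership of `buffer` is decided purely
    by comparing two flags, each written by exactly one side."""

    def __init__(self):
        self.client_flag = 0  # stored to only by the client
        self.server_flag = 0  # stored to only by the server
        self.buffer = None    # payload, owned by whichever side holds the turn

    # The client owns the buffer while the two flags are equal.
    def client_send(self, value):
        while self.server_flag != self.client_flag:  # load the other flag
            _spin()
        self.buffer = value    # write the payload first...
        self.client_flag ^= 1  # ...then publish it with a plain store
        # (the read half of ^= only touches our own flag, so it is not a
        # shared read-modify-write)

    def client_recv(self):
        while self.server_flag != self.client_flag:  # wait for the reply
            _spin()
        return self.buffer

    # The server owns the buffer while the two flags differ.
    def server_serve_one(self, handler):
        while self.client_flag == self.server_flag:
            _spin()
        self.buffer = handler(self.buffer)  # write the reply first...
        self.server_flag ^= 1               # ...then hand the slot back


def run_demo(n):
    """Send 0..n-1 through the mailbox; the server doubles each value."""
    box = Mailbox()

    def server():
        for _ in range(n):
            box.server_serve_one(lambda x: x * 2)

    t = threading.Thread(target=server)
    t.start()
    results = []
    for i in range(n):
        box.client_send(i)
        results.append(box.client_recv())
    t.join()
    return results
```

For example, `run_demo(5)` returns `[0, 2, 4, 6, 8]`. The design choice is that the pair of flags, not any atomic read-modify-write, encodes whose turn it is. On a real GPU/host pair the flag stores would need release semantics and the flag loads acquire semantics (for example via C11 atomics), which is the "ordered load/store" requirement above.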