
[I work on SCALE]

Mapping inline PTX to AMD machine code would indeed suck. Converting it to LLVM IR right at the start of compilation (when the initial IR is being generated) is much simpler, since it then gets "compiled forward" with the rest of the code. It's as if you wrote C++/intrinsics/whatever instead.
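A hypothetical sketch of what that means in practice (the specific instruction and function names here are illustrative, not taken from SCALE):

```cuda
// An inline-PTX read of the lane ID, as it might appear in CUDA code:
__device__ unsigned lane_id_ptx() {
    unsigned lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// After frontend translation, the effect is "as if" you had written
// ordinary CUDA instead, which then compiles forward normally:
__device__ unsigned lane_id_plain() {
    return threadIdx.x % warpSize;  // assumes 1-D thread blocks
}
```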

Note that nvcc accepts a different dialect of C++ from clang (and hence hipcc), so there is in fact more that separates CUDA from HIP (at the language level) than just find/replace. We discuss this a little in [the manual](https://docs.scale-lang.com/manual/dialects/)

Handling differences between the atomic models is, indeed, "fun". But since CUDA is a programming language with documented semantics for its memory consistency (and so is PTX) it is entirely possible to arrange for the compiler to "play by NVIDIA's rules".
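As a sketch of why that's tractable: PTX atomics spell out their scope and ordering explicitly, and CUDA C++ exposes the same documented model through libcu++, so there is a well-defined target to compile against. (This example is illustrative; it assumes CUDA 11.x+ for `cuda::atomic_ref`.)

```cuda
#include <cuda/atomic>

// An inline-PTX atomic with explicit semantics (.relaxed) and
// scope (.gpu), exactly as the PTX memory model documents it:
__device__ unsigned add_ptx(unsigned* p, unsigned v) {
    unsigned old;
    asm volatile("atom.relaxed.gpu.global.add.u32 %0, [%1], %2;"
                 : "=r"(old) : "l"(p), "r"(v) : "memory");
    return old;
}

// The CUDA-level equivalent with the same documented semantics,
// which a compiler can "play by NVIDIA's rules" against:
__device__ unsigned add_cxx(unsigned* p, unsigned v) {
    cuda::atomic_ref<unsigned, cuda::thread_scope_device> a(*p);
    return a.fetch_add(v, cuda::memory_order_relaxed);
}
```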



Huh. Inline assembly is strongly associated in my mind with writing things that can't be represented in LLVM IR, but in the specific case of PTX - you can only write things that ptxas understands, and that probably rules out wide classes of horrendous behaviour. Raw bytes being used for instructions and for data, ad hoc self modifying code and so forth.

I believe nvcc is roughly an antique clang build hacked out of all recognition. I remember it rejecting templates with 'I' as the type name and working when changing to 'T', nonsense like that. The HIP language probably corresponds pretty closely to clang's cuda implementation in terms of semantics (a lot of the control flow in clang treats them identically), but I don't believe an exact match to nvcc was considered particularly necessary for the clang -x cuda work.

The PTX-to-LLVM-IR approach is clever. I think upstream would be game for that, feel free to tag me on reviews if you want to get that divergence out of your local codebase.


I certainly would not attempt this feat with x86 `asm` blocks :D. PTX is indeed very pedestrian: it's more like IR than machine code, really. All the usual "machine-level craziness" that would otherwise make this impossible is just unrepresentable in PTX (though you do run into cases of "oopsie, AMD don't have hardware for this so we have to do something insane").


It's a beautiful answer to a deeply annoying language feature. I absolutely love it. Yes, inline asm containing PTX definitely should be burned off at the compiler front end, regardless of whether it ultimately codegens as PTX or something else.

I've spawned a thread on the LLVM Discourse asking if anyone else wants that as a feature upstream: https://discourse.llvm.org/t/fexpand-inline-ptx-as-a-feature... That doesn't feel great - you've done something clever in a proprietary compiler and I'm suggesting upstream reimplement it - so I hope that doesn't cause you any distress. AMD is relatively unlikely to greenlight me writing it so it's probably just more marketing unless other people are keen to parse asm in string literals.


nvcc is nowhere near that bad these days, it supports most C++ code directly (for example, I've written kernels that include headers like <span> or <algorithm> and they work just fine).
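A minimal sketch of that kind of kernel (assuming a C++20-capable nvcc, e.g. `-std=c++20`; the kernel name is made up for illustration):

```cuda
#include <algorithm>
#include <span>

// Standard headers used directly in device code, as described above:
__global__ void clamp_all(int* data, std::size_t n) {
    std::span<int> s(data, n);
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < s.size())
        s[i] = std::clamp(s[i], 0, 255);  // constexpr, callable on device
}
```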


NVCC is doing much better than before in terms of "broken C++". There was indeed a time when lots of modern C++ just didn't work.

Nowadays the issues are more subtle and nasty. Subtle differences in overload resolution. Subtle differences in lambda handling. Enough to break code in "spicy" ways when you try to port it over.
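A hedged illustration of the overload-resolution class of divergence (this example is constructed, not taken from a specific bug report): clang treats `__host__`/`__device__`-ness as a late tiebreaker in overload resolution, while nvcc's rules differ, so the same call can resolve differently or fail on one compiler.

```cuda
__host__ int f(int)    { return 1; }
__device__ int f(long) { return 2; }

__device__ int g() {
    // Which f? clang deprioritizes the "wrong-side" __host__ overload
    // and picks f(long); nvcc's resolution rules can disagree here.
    return f(0);
}
```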


What do you think the source of this is? My understanding was that Nvidia is basically adopting the clang frontend wholesale now so I'm curious where it differs.


The LLVM manual touches on some of the basics of why: https://llvm.org/docs/CompileCudaWithLLVM.html#dialect-diffe...
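One concrete item from that page, sketched: clang lets device code call plain `constexpr` (host) functions by default, whereas nvcc requires `--expt-relaxed-constexpr` for the same call.

```cuda
constexpr int square(int x) { return x * x; }

__global__ void k(int* out) {
    *out = square(7);  // accepted by clang by default;
                       // nvcc needs --expt-relaxed-constexpr
}
```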



