Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors. Each VLIW instruction bundle specifies the behavior of one or more functional units,
which begin executing a new instruction at the same time. The processor pipeline does not include stall logic to enforce data dependencies, and instructions will continue
executing in order regardless of other instructions in the pipeline. As a result, the compiler is able to schedule machine instructions which access the same register in ways that potentially overlap.
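To make "no stall logic" concrete, here's a minimal sketch in plain C++ (not AI Engine code; the 2-cycle load latency is an invented number) of an exposed pipeline, where a load's result lands in its destination register only some cycles after issue, and an instruction scheduled too early reads the stale value:

```cpp
#include <array>
#include <cstdio>
#include <vector>

// Toy exposed-pipeline model: a "load" writes its destination register
// only LOAD_LATENCY cycles after issue. There is no hardware interlock,
// so an instruction that reads the register earlier sees the OLD value.
// The latency is hypothetical; real AI Engine latencies differ.
constexpr int LOAD_LATENCY = 2;

struct PendingWrite { int ready_cycle; int reg; int value; };

int main() {
  std::array<int, 4> regs{};          // r0..r3, all start at 0
  std::vector<PendingWrite> inflight;

  auto tick = [&](int cycle) {        // retire any writes due this cycle
    for (auto &w : inflight)
      if (w.ready_cycle == cycle) regs[w.reg] = w.value;
  };

  // cycle 0: LOAD r1 <- 42; the result lands LOAD_LATENCY cycles later
  inflight.push_back({0 + LOAD_LATENCY, 1, 42});
  tick(0);
  // cycle 1: ADD r2 <- r1 + 1, scheduled too early: reads stale r1 == 0
  tick(1);
  regs[2] = regs[1] + 1;
  // cycle 2: the load finally writes r1
  tick(2);

  std::printf("r1 = %d, r2 = %d\n", regs[1], regs[2]); // r1 = 42, r2 = 1
  return 0;
}
```

On a conventional CPU the hardware would stall the ADD until the load completed; here nothing does, so correctness depends entirely on the compiler's schedule.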
So a software scheduler? I had no idea it works like this. Very interesting.
If you've taken a compilers course, it's pretty similar to register allocation: the goal is to keep as much of the processor busy as possible.
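To make that concrete, here's a rough sketch of the greedy bundling idea behind list scheduling (hypothetical op names and registers, nothing AI-Engine-specific): independent ops share a VLIW bundle, and a dependence forces a new one.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Toy list scheduler: ops with read/write register sets get packed into
// VLIW bundles greedily. An op joins the current bundle only if it does
// not read or write anything the bundle already writes. Real schedulers
// also model functional-unit slots and instruction latencies.
struct Op {
  std::string name;
  std::set<std::string> reads, writes;
};

int main() {
  std::vector<Op> ops = {
      {"load v0", {}, {"v0"}},
      {"load v1", {}, {"v1"}},
      {"mul  v2", {"v0", "v1"}, {"v2"}}, // depends on both loads
      {"add  v3", {"v2"}, {"v3"}},       // depends on the mul
  };

  std::vector<std::vector<Op>> bundles;
  std::set<std::string> bundle_writes;
  bundles.emplace_back();

  for (const auto &op : ops) {
    bool conflict = false;
    for (const auto &r : op.reads)
      if (bundle_writes.count(r)) conflict = true;
    for (const auto &w : op.writes)
      if (bundle_writes.count(w)) conflict = true;
    if (conflict) {                // dependence: start a fresh bundle
      bundles.emplace_back();
      bundle_writes.clear();
    }
    bundles.back().push_back(op);
    bundle_writes.insert(op.writes.begin(), op.writes.end());
  }

  for (size_t i = 0; i < bundles.size(); ++i) {
    std::printf("bundle %zu:", i);
    for (const auto &op : bundles[i]) std::printf("  [%s]", op.name.c_str());
    std::printf("\n");
  }
  return 0;
}
```

The two loads land in one bundle; the mul and add each force their own, which is exactly the "fill the width of the machine" problem the scheduler is solving.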
I'm so interested to see what non-ML pipelines end up being possible & useful. Can we cobble these things into audio DSP processors? Or image convolvers? I'm not sure what architectural constraints would limit things beyond what LLVM can compile.
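For what it's worth, the classic DSP kernels are exactly the shape of loop a VLIW vector engine is built for. Here's a plain-C++ FIR filter as an example (nothing vendor-specific; the taps are arbitrary) of the multiply-accumulate pattern you'd hope the backend can vectorize and software-pipeline:

```cpp
#include <cstdio>
#include <vector>

// Plain FIR filter: y[n] = sum_k h[k] * x[n + k]. Multiply-accumulate
// loops like this map naturally onto wide MAC units, whether for audio
// DSP or the inner loop of an image convolution.
std::vector<float> fir(const std::vector<float> &x,
                       const std::vector<float> &h) {
  std::vector<float> y(x.size() - h.size() + 1, 0.0f);
  for (size_t n = 0; n < y.size(); ++n)
    for (size_t k = 0; k < h.size(); ++k)
      y[n] += h[k] * x[n + k];
  return y;
}

int main() {
  std::vector<float> x = {1, 2, 3, 4, 5, 6};
  std::vector<float> h = {0.25f, 0.5f, 0.25f}; // simple smoothing taps
  for (float v : fir(x, h)) std::printf("%g ", v);
  std::printf("\n"); // prints: 2 3 4 5
  return 0;
}
```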
> Note that these accelerators include an array of processors, while the LLVM backend only supports a single processor. Support for devices as a whole is available in open source tools based on MLIR
Would be neat to see what interop with something like IREE looks like. Sure seems like MLIR is leading the rest-of-industry's (non-NV) efforts.
It might be possible to make video encoders run on these NPUs. If I remember correctly, the 8700G had 32 tiles. Even if the vectorization performance is weak, that should still be enough to encode a single 1080p stream with no load on the main CPU cores.
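As a sanity check, here's the raw pixel-rate arithmetic (the 30 fps figure is my assumption, and a real encoder touches each pixel many times across motion search, transform, and entropy stages, so treat this as a lower bound, not a performance claim):

```cpp
#include <cstdio>

// Back-of-envelope: pixels per second each tile would need to touch to
// encode one 1080p stream, if the work split evenly across the 32 tiles
// mentioned above. 30 fps is assumed, not a spec.
int main() {
  const long long width = 1920, height = 1080;
  const long long fps = 30;    // assumed frame rate
  const long long tiles = 32;  // per the comment above (8700G)
  long long pixels_per_sec = width * height * fps;
  std::printf("total:    %lld pixels/s\n", pixels_per_sec);         // ~62.2M
  std::printf("per tile: %lld pixels/s\n", pixels_per_sec / tiles); // ~1.94M
  return 0;
}
```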
The AI Engines have floating-point support; the AIE-ML variant emulates it. I don't know which kind these are based on. The DSP58 blocks of Versal (which also has the AI Engines) support floating point too.
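If it's the AIE-ML variant, that generation is usually described as natively supporting bfloat16 and emulating fp32 on top of it (going from memory here). The relevant building block is the standard fp32 -> bf16 conversion with round-to-nearest-even; a sketch of that conversion, not AMD's actual emulation code:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// fp32 -> bfloat16: keep the top 16 bits of the IEEE-754 float, rounding
// to nearest-even. Shown only to illustrate the format; NaN handling is
// omitted for brevity.
uint16_t fp32_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // rounding bias: 0x7FFF plus the lsb of the surviving mantissa
  uint32_t rounded = bits + 0x7FFF + ((bits >> 16) & 1);
  return static_cast<uint16_t>(rounded >> 16);
}

float bf16_to_fp32(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16; // zero-fill low mantissa
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

int main() {
  float x = 3.14159265f;
  uint16_t h = fp32_to_bf16(x);
  std::printf("fp32 %.7f -> bf16 0x%04x -> %.7f\n", x, h, bf16_to_fp32(h));
  return 0;
}
```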
They deserve big credit for this move. These NPUs are otherwise black boxes you can access only through high-level APIs like DirectML on Windows or NNAPI on Android. Qualcomm's NPU (Hexagon) is relatively open, since its compiler is also LLVM-based and the architecture is upstreamed to a certain degree, but it is inherently limited because it demands code signing. Other CPU vendors are worse.
According to the article and the Xilinx engineer, it's a backend, not a fork.
> "Peano" is the apparent name for their new LLVM compiler back-end supporting the Ryzen AI SoCs and other AMD/Xilinx AI engines. Stephen Neuendorffer of AMD/Xilinx and part of the "Peano team"
> On behalf of AMD, I’m pleased to announce the open sourcing of an LLVM backend for AMD/Xilinx AI Engine processors.