Generally speaking, AI Engine processors are in-order, exposed-pipeline VLIW processors. Each VLIW instruction bundle specifies the behavior of one or more functional units,
which begin executing a new instruction at the same time. The processor pipeline does not include stall logic to enforce data dependencies, and instructions will continue
executing in order regardless of other instructions in the pipeline. As a result, the compiler is able to schedule machine instructions which access the same register in ways that potentially overlap.
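To make "no stall logic" concrete, here's a minimal sketch in plain C++ (not AI Engine code; the 2-cycle load latency is an invented number) of an exposed pipeline, where a load's result lands in its destination register only some cycles after issue, and an instruction scheduled too early reads the stale value:

```cpp
#include <array>
#include <cstdio>
#include <vector>

// Toy exposed-pipeline model: a "load" writes its destination register
// only LOAD_LATENCY cycles after issue. There is no hardware interlock,
// so an instruction that reads the register earlier sees the OLD value.
// The latency is hypothetical; real AI Engine latencies differ.
constexpr int LOAD_LATENCY = 2;

struct PendingWrite { int ready_cycle; int reg; int value; };

int main() {
  std::array<int, 4> regs{};          // r0..r3, all start at 0
  std::vector<PendingWrite> inflight;

  auto tick = [&](int cycle) {        // retire any writes due this cycle
    for (auto &w : inflight)
      if (w.ready_cycle == cycle) regs[w.reg] = w.value;
  };

  // cycle 0: LOAD r1 <- 42; the result lands LOAD_LATENCY cycles later
  inflight.push_back({0 + LOAD_LATENCY, 1, 42});
  tick(0);
  // cycle 1: ADD r2 <- r1 + 1, scheduled too early: reads stale r1 == 0
  tick(1);
  regs[2] = regs[1] + 1;
  // cycle 2: the load finally writes r1
  tick(2);

  std::printf("r1 = %d, r2 = %d\n", regs[1], regs[2]); // r1 = 42, r2 = 1
  return 0;
}
```

On a conventional CPU the hardware would stall the ADD until the load completed; here nothing does, so correctness depends entirely on the compiler's schedule.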
So a software scheduler? I had no idea it works like this. Very interesting.
If you've taken a compilers course, it's pretty similar to register allocation: the goal is to keep as much of the processor busy as possible.
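To make that concrete, here's a rough sketch of the greedy bundling idea behind list scheduling (hypothetical op names and registers, nothing AI-Engine-specific): independent ops share a VLIW bundle, and a dependence forces a new one.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Toy list scheduler: ops with read/write register sets get packed into
// VLIW bundles greedily. An op joins the current bundle only if it does
// not read or write anything the bundle already writes. Real schedulers
// also model functional-unit slots and instruction latencies.
struct Op {
  std::string name;
  std::set<std::string> reads, writes;
};

int main() {
  std::vector<Op> ops = {
      {"load v0", {}, {"v0"}},
      {"load v1", {}, {"v1"}},
      {"mul  v2", {"v0", "v1"}, {"v2"}}, // depends on both loads
      {"add  v3", {"v2"}, {"v3"}},       // depends on the mul
  };

  std::vector<std::vector<Op>> bundles;
  std::set<std::string> bundle_writes;
  bundles.emplace_back();

  for (const auto &op : ops) {
    bool conflict = false;
    for (const auto &r : op.reads)
      if (bundle_writes.count(r)) conflict = true;
    for (const auto &w : op.writes)
      if (bundle_writes.count(w)) conflict = true;
    if (conflict) {                // dependence: start a fresh bundle
      bundles.emplace_back();
      bundle_writes.clear();
    }
    bundles.back().push_back(op);
    bundle_writes.insert(op.writes.begin(), op.writes.end());
  }

  for (size_t i = 0; i < bundles.size(); ++i) {
    std::printf("bundle %zu:", i);
    for (const auto &op : bundles[i]) std::printf("  [%s]", op.name.c_str());
    std::printf("\n");
  }
  return 0;
}
```

The two loads land in one bundle; the mul and add each force their own, which is exactly the "fill the width of the machine" problem the scheduler is solving.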
I'm so interested to see what non-ML pipelines end up being possible & useful. Can we cobble these things into audio DSP processors? Or image convolvers? I'm not sure what architectural constraints would limit things beyond what LLVM can compile.
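For what it's worth, the classic DSP kernels are exactly the shape of loop a VLIW vector engine is built for. Here's a plain-C++ FIR filter as an example (nothing vendor-specific; the taps are arbitrary) of the multiply-accumulate pattern you'd hope the backend can vectorize and software-pipeline:

```cpp
#include <cstdio>
#include <vector>

// Plain FIR filter: y[n] = sum_k h[k] * x[n + k]. Multiply-accumulate
// loops like this map naturally onto wide MAC units, whether for audio
// DSP or the inner loop of an image convolution.
std::vector<float> fir(const std::vector<float> &x,
                       const std::vector<float> &h) {
  std::vector<float> y(x.size() - h.size() + 1, 0.0f);
  for (size_t n = 0; n < y.size(); ++n)
    for (size_t k = 0; k < h.size(); ++k)
      y[n] += h[k] * x[n + k];
  return y;
}

int main() {
  std::vector<float> x = {1, 2, 3, 4, 5, 6};
  std::vector<float> h = {0.25f, 0.5f, 0.25f}; // simple smoothing taps
  for (float v : fir(x, h)) std::printf("%g ", v);
  std::printf("\n"); // prints: 2 3 4 5
  return 0;
}
```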
> Note that these accelerators include an array of processors, while the LLVM backend only supports a single processor. Support for devices as a whole is available in open source tools based on MLIR
Would be neat to see what interop with something like IREE looks like. Sure seems like MLIR is leading the rest-of-industry's (non-NV) efforts.
It might be possible to make video encoders run on these NPUs. If I remember correctly, the 8700G had 32 tiles. Even if the vectorization performance is weak, that should still be enough to encode a single 1080p stream with no load on the main CPU cores.
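As a sanity check, here's the raw pixel-rate arithmetic (the 30 fps figure is my assumption, and a real encoder touches each pixel many times across motion search, transform, and entropy stages, so treat this as a lower bound, not a performance claim):

```cpp
#include <cstdio>

// Back-of-envelope: pixels per second each tile would need to touch to
// encode one 1080p stream, if the work split evenly across the 32 tiles
// mentioned above. 30 fps is assumed, not a spec.
int main() {
  const long long width = 1920, height = 1080;
  const long long fps = 30;    // assumed frame rate
  const long long tiles = 32;  // per the comment above (8700G)
  long long pixels_per_sec = width * height * fps;
  std::printf("total:    %lld pixels/s\n", pixels_per_sec);         // ~62.2M
  std::printf("per tile: %lld pixels/s\n", pixels_per_sec / tiles); // ~1.94M
  return 0;
}
```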
The AI Engines have floating-point support; the AIE-ML variant emulates it. I don't know which kind these are based on. The DSP58 blocks of Versal (which also has the AI Engines) support floating point too.
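If it's the AIE-ML variant, that generation is usually described as natively supporting bfloat16 and emulating fp32 on top of it (going from memory here). The relevant building block is the standard fp32 -> bf16 conversion with round-to-nearest-even; a sketch of that conversion, not AMD's actual emulation code:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// fp32 -> bfloat16: keep the top 16 bits of the IEEE-754 float, rounding
// to nearest-even. Shown only to illustrate the format; NaN handling is
// omitted for brevity.
uint16_t fp32_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // rounding bias: 0x7FFF plus the lsb of the surviving mantissa
  uint32_t rounded = bits + 0x7FFF + ((bits >> 16) & 1);
  return static_cast<uint16_t>(rounded >> 16);
}

float bf16_to_fp32(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16; // zero-fill low mantissa
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

int main() {
  float x = 3.14159265f;
  uint16_t h = fp32_to_bf16(x);
  std::printf("fp32 %.7f -> bf16 0x%04x -> %.7f\n", x, h, bf16_to_fp32(h));
  return 0;
}
```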
They deserve big credit for this move. These NPUs are otherwise black boxes you can access only through high-level APIs like DirectML on Windows or NNAPI on Android. Qualcomm's NPU (Hexagon) is relatively open, since its compiler is also LLVM-based and the architecture is upstreamed to a certain degree, but it is inherently limited because it demands code signing. Other CPU vendors are worse.
According to the article and the Xilinx engineer, it's a backend, not a fork.
> "Peano" is the apparent name for their new LLVM compiler back-end supporting the Ryzen AI SoCs and other AMD/Xilinx AI engines. Stephen Neuendorffer of AMD/Xilinx and part of the "Peano team"
> On behalf of AMD, I’m pleased to announce the open sourcing of an LLVM backend for AMD/Xilinx AI Engine processors.