>The ANE and tensor cores are not comparable though

They're both built to do the most common computation in AI (both training and inference): matrix multiply-accumulate, A * B + C. The ANE is far more limited because Apple decided to spend a lot less silicon on it, focusing on low-power inference of quantized models. It is fantastically useful for plenty of on-device things, like many of the photo features (e.g. subject detection, text extraction).
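For anyone who hasn't seen it spelled out, here is a minimal sketch of that multiply-accumulate in plain Python. The function name `matmul_accumulate` is made up for illustration, and this only shows the arithmetic, not how either piece of hardware implements it.

```python
def matmul_accumulate(a, b, c):
    """Return a * b + c for matrices given as lists of rows (illustrative helper)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Shapes must line up: a is rows x inner, b is inner x cols, c is rows x cols.
    assert all(len(row) == inner for row in a)
    assert len(c) == rows and all(len(row) == cols for row in c)

    out = [row[:] for row in c]  # start from C so the products accumulate onto it
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate step
            out[i][j] = acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
d = matmul_accumulate(a, b, c)
print(d)  # [[20, 23], [44, 51]]
```

The inner `acc += a[i][k] * b[k][j]` line is the operation both units are built to do in bulk; the hardware differences are in how many of those run per cycle and on which data types.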

And yes, you need to use CoreML to access it because it's so limited. Right now you submit a job to CoreML and for many models it will opt to use the CPU/GPU instead, or a combination thereof. In the future Apple will absolutely, with 100% certainty, make an ANE that is as flexible and powerful as tensor cores, and they force you through CoreML because it will then automatically switch to using it. It's an elegant, forward-thinking implementation. Their AI performance and credibility will greatly improve when they do.

>you really need to compare against the GPU

From a raw performance perspective, the ANE is capable of more matrix multiply-accumulates than the GPU on Apple Silicon. It's just limited to data types and contexts that make it unsuitable for training, or even for many inference tasks.