Just as going from scalar to vector instructions provides a speedup, so does going from vector to matrix instructions. If you've already got big vectors, the extra parallelism exposed per additional hardware execution resource isn't huge, but the reduction in register file read port usage is pretty significant.
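Rough back-of-the-envelope sketch of that register-port point (in Python), assuming a hypothetical outer-product-style matrix instruction; the counts are illustrative and not tied to any real ISA:

```
# Back-of-the-envelope: math ops per register-file read, vector FMA vs. a
# hypothetical outer-product matrix instruction. Purely illustrative numbers.

def vector_fma_ops_per_read(n):
    # c[i] += a[i] * b[i]: reads 3 vector registers (a, b, c), does n MACs.
    return n / 3

def outer_product_ops_per_read(n):
    # acc[i][j] += a[i] * b[j]: reads 2 vector registers, does n*n MACs
    # (assuming the accumulator tile lives in dedicated state, not the
    # general register file).
    return (n * n) / 2

for n in (4, 16, 64):
    print(n, vector_fma_ops_per_read(n), outer_product_ops_per_read(n))
```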
Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
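To make that concrete, here's a toy Python sketch of int8 inference arithmetic: quantize float32 weights and activations with a naive max-abs scale (my assumption, purely for illustration), multiply int8 data with a wide accumulator, and rescale at the end:

```
import numpy as np

# Toy int8 inference arithmetic: quantize with a naive max-abs scale,
# multiply in int8 with a wide (int32) accumulator, rescale at the end.

def quantize(x):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)  # "weights"
a = rng.standard_normal(256).astype(np.float32)  # "activations"

qw, sw = quantize(w)
qa, sa = quantize(a)

ref = float(np.dot(w, a))                                    # float32 reference
acc = int(np.dot(qw.astype(np.int32), qa.astype(np.int32)))  # int8 data, int32 math
approx = acc * sw * sa                                       # rescale back to float

print(ref, approx)  # close, which is good enough for most inference
```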
Why are graphics workloads float32? 32-bit "true color" (about 16.7 million colors plus alpha), which is higher color resolution than most eyes can distinguish, is just 3 8-bit ints plus (sometimes) an 8-bit alpha channel.
Because before you can see a color, you have to compute it. For example, you need to calculate what color would result from the interaction of a light source of a given color / intensity and a surface of a given color / reflectivity / glossiness etc. There's no way to reasonably compute that using just 8-bit ints without getting terrible banding / quantization artifacts.
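Here's a toy Python sketch of that banding problem, with made-up light/gain numbers: shade a smooth gradient with a dim light, then brighten it later in the pipeline. Float32 intermediates preserve the gradient; forcing an 8-bit intermediate collapses it into a handful of bands:

```
import numpy as np

# Toy illustration of banding: dim a smooth gradient with a weak light,
# then tone-map (brighten) the result later. With float32 intermediates the
# gradient survives; with an 8-bit intermediate the dark values collapse
# into a few levels that get stretched into visible bands.

gradient = np.linspace(0.0, 1.0, 1024, dtype=np.float32)  # surface "albedo" ramp
light = np.float32(0.05)                                   # a dim light source
gain = np.float32(12.0)                                    # later exposure boost

# Float pipeline: keep everything in float32, quantize once at the very end.
lit_f = gradient * light
out_f = np.clip(lit_f * gain, 0.0, 1.0)
levels_f = len(np.unique(np.round(out_f * 255).astype(np.uint8)))

# "8-bit everywhere" pipeline: the lit value is stored as an 8-bit int
# before tone mapping, which leaves only ~a dozen distinct values.
lit_i = np.round(gradient * light * 255).astype(np.uint8)
out_i = np.clip(lit_i.astype(np.float32) / 255 * gain, 0.0, 1.0)
levels_i = len(np.unique(np.round(out_i * 255).astype(np.uint8)))

print(levels_f, "output levels with float intermediates")
print(levels_i, "output levels with an 8-bit intermediate (banding)")
```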
I am 100% positive that they considered that before starting to work on this ASIC. Some possible reasons: GPUs are insufficiently specialized, or use too much energy on the subset of work that Apple wants to enable with this chip. GPUs are too large. GPUs are busy doing other things, etc.
They considered it, sure - but why did they conclude they should go with an ASIC? That's what grandparent asked, and it was a reasonable question. "They considered that" isn't a suitable answer.
I think when you add "why try to reinvent the wheel?" to the end, it is less of a question and more of a statement. Similar to saying, "Why would you do that?" after someone does something silly. You aren't actually asking them why they'd do the thing. You're saying they ought not have.
The rest of the reply seems to be an answer (as far as you can get with Apple)
I don't think it's wild speculation to say Apple is looking for efficiencies they may not have been able to get with GPUs, especially performance-per-watt, since so many of their devices are mobile-focused.
To be fair to nsxwolf, I did not originally explain. I tend to gradually expand my comments. The first iteration was just lashing out at this trend I see on this site which I highly disapprove of: facile reactions to any work that the commenter does not understand. I really detest this reaction that boils down to, "I once heard about a tensor, so clearly I have a better idea of whether this chip should be invented than the experts working at Apple."
Despite GPUs being fairly "general purpose" these days, there is still a lot of circuitry built for graphics-pipeline-like workloads. If you just want to do linear algebra, all you need is a high-bandwidth interface to memory and lots of math units.
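As a sketch of how little that workload actually involves, here's a naive reference matmul in Python; the hot loop is nothing but loads and multiply-accumulates, with none of the rasterization/texturing/blending machinery a graphics pipeline carries:

```
# The entire "workload" a matmul engine has to run: loads and
# multiply-accumulates. Naive reference version, just to show there's no
# graphics-specific work (rasterization, texture sampling, blending) involved.

def matmul(a, b):
    n, k = len(a), len(a[0])
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one load from each operand, one MAC
            c[i][j] = acc
    return c

# For an n x n x n matmul there are ~2*n**3 math ops over ~3*n**2 values
# touched, so the math-to-bandwidth ratio grows with n -- hence
# "memory interface + math units" is basically the whole design.
```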