The Cortex-A53 and A57 were the first cores to implement it, and a vanilla Mali + Cortex-A53 SoC was the first implementation.
Then AppliedMicro had a hardware version to demo.
Apple simply had the first-to-market consumer product with one. Their buying up a good chunk of the fab space needed to produce them at the time probably contributed to that.