I don't think people are running 1B+ models on the Neural Engine these days. The high-performance models I've seen all rely on Metal Performance Shaders, which scale with how powerful your GPU is. It's not terribly slow on iPhone, but I think some people get the wrong idea and assume an ambient processor like the Neural Engine is what's running LLMs.
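For what it's worth, here's a minimal sketch of what "running on the GPU via Metal" typically looks like from a framework's perspective, using PyTorch's MPS backend as a stand-in (the layer size and tensor shapes are just illustrative, not from any particular model):

```python
# Sketch: detect Metal (MPS) GPU support and run a compute-heavy op on it.
# On devices without Metal support this falls back to CPU.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Running on: {device}")

# A single 4096x4096 linear layer stands in for one transformer block;
# a real LLM is hundreds of these, so weights dominate memory, not compute.
layer = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)
with torch.no_grad():
    y = layer(x)
print(y.shape)
```

The point is that this path scales with GPU throughput, whereas the Neural Engine is a separate, more constrained accelerator that most of the fast on-device LLM stacks don't target.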
The bigger bottleneck seems like memory, to me. iPhones have traditionally skimped on RAM more so than even cheap and midrange Android counterparts. I can imagine running an LLM in the background on my S10 - it's a bit harder to envision iOS swapping everything smoothly on a similarly-aged iPhone.
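Some rough back-of-envelope math on why RAM is the pinch point (purely illustrative figures, not measurements from any specific phone or model):

```python
# Approximate weight memory for an on-device LLM at different quantization levels.
# Ignores KV cache, activations, and OS overhead, so real usage is higher.
def weight_footprint_gib(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

for params in (1, 3, 7):
    for bits in (16, 8, 4):
        print(f"{params}B params @ {bits}-bit ~= "
              f"{weight_footprint_gib(params, bits):.1f} GiB")
```

Even a 3B model at 4-bit is on the order of 1.5 GiB of weights alone, which is a big ask to keep resident on a phone with 4-6 GB of RAM shared with the OS and foreground apps.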