Perhaps instead of posting erroneous assertions to HN you could wander over to your LLM of choice and ask it something along the lines of: What are some examples of edge AI applications that achieve good performance on a CPU where memory bandwidth is severely limited compared to a GPU? Please link to publicly available models where possible.
I run AI applications all the time in exactly those situations. The models range from 2 GB (vector models) to 30 GB (small LLMs) to 100 GB (medium LLMs).
None of those fit in 4 MB of cache (the per-core cache on this part), or even 1 GB (the aggregate cache).
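To make the bandwidth point concrete: when weights don't fit in cache, LLM decoding is bound by how fast the weights stream from RAM, since every generated token touches the full model. A rough sketch, using illustrative numbers I'm assuming for the example rather than specs of any particular part:

```python
# Decode speed of a memory-bandwidth-bound LLM is roughly:
#   tokens/s ~= memory bandwidth / model size
# because each token requires streaming all weights from RAM.
# The 100 GB/s figure and model sizes below are illustrative
# assumptions, not measurements of a specific CPU.
def tokens_per_second(mem_bw_gb_s, model_size_gb):
    return mem_bw_gb_s / model_size_gb

for size_gb in (2, 30, 100):
    print(f"{size_gb} GB model: ~{tokens_per_second(100, size_gb):.1f} tok/s")
```

This is why a 100 GB model on a 100 GB/s memory system tops out around 1 token/s no matter how many cores you throw at it.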
What AI models are you actually talking about? Do you mean old-school ML stuff, like decision trees or high-dimensional indexes? No one I know calls those "AI", which is generally reserved for big-ish neural networks.
"Exactly those situations," you say, while describing an entirely different sort of model. Your first clue that you were missing knowledge should have been the part where what the well-financed experts were doing didn't make sense to you. Your second clue should have been the part where what I was saying didn't seem to match up with your experience.
I let you know that you were uninformed and even suggested a very low-effort way that you might look into the matter. So why didn't you do that?
A couple of fairly arbitrary examples: a high-performance zero-shot TTS model can weigh in at well under 150 MiB, and you can solve MNIST (i.e. classify handwritten digits) to better than 99% accuracy with a sub-100 KiB model. Your LLM of choice will be able to provide you with plenty of others.
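For the MNIST claim, a back-of-envelope parameter count shows how little capacity is needed. The two-conv-layer architecture below is a hypothetical example I'm assuming for illustration (not a specific published model); small CNNs of roughly this shape are commonly reported at around 99% on MNIST:

```python
# Parameter budget of a small MNIST CNN (hypothetical architecture,
# chosen only to show the size arithmetic; training code omitted).
def conv_params(cin, cout, k):
    return cin * cout * k * k + cout  # weights + biases

def fc_params(nin, nout):
    return nin * nout + nout

params = (
    conv_params(1, 8, 3)       # 28x28 input -> 2x2 pool -> 14x14
    + conv_params(8, 16, 3)    # 14x14 -> 2x2 pool -> 7x7
    + fc_params(16 * 7 * 7, 10)  # flatten -> 10 digit classes
)
size_kib = params * 4 / 1024  # fp32 weights
print(f"{params} params, {size_kib:.1f} KiB")  # well under 100 KiB
```

Even at full fp32 precision that's around 36 KiB, and int8 quantization would cut it by another 4x.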
So, I wonder if this is going to be any faster than the previous generation for edge AI.