Are the two devices running the same model? The article claims the DSP has higher confidence, but I don't see why that would be the case. I suppose one of them could be working at a higher precision, but that wouldn't make sense if they're comparing performance.
I've talked a little bit with some engineers at Qualcomm who worked on projects like this. My impression was that they make a lot of compromises when they optimize a computer vision algorithm for their hardware; the results are slightly altered, but it can run extremely fast with comparable accuracy. It's likely they're doing something similar here, which might explain the difference in confidence scores, but I highly doubt the DSP version objectively classifies images better than the one running on the CPU. If anything, the apparent improvement is an illusion created by the model on the DSP simply reacting more quickly.
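To illustrate what I mean (this is just a toy sketch, not what Qualcomm actually does; I'm assuming the usual compromise is something like 8-bit fixed-point math): quantizing values shifts the softmax confidence a little without changing which class wins, so you can get a different confidence number with no real accuracy difference.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Made-up logits standing in for a full-precision (CPU, float32) model output.
logits_fp32 = np.array([2.10, 1.95, 0.40, -1.20])

# Simulate a reduced-precision (DSP-style) path by quantizing to 8 bits and
# dequantizing. Real pipelines quantize weights/activations throughout the
# network, so the accumulated shift is usually larger than this.
lo, hi = logits_fp32.min(), logits_fp32.max()
scale = (hi - lo) / 255.0
logits_q = np.round((logits_fp32 - lo) / scale)
logits_deq = logits_q * scale + lo

p_fp32 = softmax(logits_fp32)
p_quant = softmax(logits_deq)

# Same predicted class, slightly different confidence score.
print(p_fp32.argmax(), p_fp32.max())
print(p_quant.argmax(), p_quant.max())
```

In this sketch the top-1 class is identical and only the reported confidence moves by a fraction of a percent, which is the kind of difference I'd expect between the two code paths rather than one genuinely classifying better than the other.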