Sometimes it's a DSP or FPGA with blocks/instructions designed to implement common matrix transformations.
In this particular case a vendor on that site claims:
> 3. There are 5.9MB SRAM can be used for convolutional neural network acceleration, so, it is possible to run small model like tiny-yolo v2,MobileNet, as you see in face detection routine video.
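
As a rough sanity check on that claim, here is a minimal sketch of the arithmetic involved: parameter count times bytes per weight, compared against the quoted SRAM size. The function name `weights_fit` is made up for illustration, the 4.2 million parameter count is an approximate figure for MobileNet v1 that I am assuming, and reading "5.9MB" as MiB is also an assumption. It only counts raw weights and ignores activations and any working buffers, so treat a "fits" result as optimistic.

```python
# Vendor's quoted figure, interpreted here as MiB (an assumption).
SRAM_BYTES = 5.9 * 1024 * 1024


def weights_fit(num_params, bytes_per_param, sram_bytes=SRAM_BYTES):
    """Return True if the raw weights alone fit in the given SRAM budget.

    Hypothetical helper: ignores activations, buffers, and any overhead.
    """
    return num_params * bytes_per_param <= sram_bytes


# Approximate parameter count for a MobileNet-v1-sized model (assumed value).
params = 4_200_000

print(weights_fit(params, 1))  # 8-bit quantized weights
print(weights_fit(params, 4))  # 32-bit float weights
```

With these assumed numbers, the weights only fit when quantized to 8 bits, which would be consistent with the vendor restricting the claim to small models.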