In the case of Apple AVD, it's a multi-stage system with a bunch of special primitives, orchestrated by a Cortex-M3 with firmware. Codec-specific frontends emit IR which a less specialized backend can execute.
This really heavily depends on the device, though. There are all sorts of "hardware" video decoders ranging from fairly generic vector coprocessors running firmware to "pure" HDL/VLSI level implementations. Usually on more modern or advanced hardware you'll see more and more become more general purpose, since a lot of the later stages can be shared across codecs, saving area vs. a pure hardware implementation.