It's mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. This has the handy side-effect of making timing very predictable, which makes the latency a lot more deterministic.