Huh - I do ML and a lot of bit twiddling with trained feature detector outputs - intake goes into a stage, feature detectors light up, and feature combos get fed to subsequent input via lots of bit banging - can't afford for it to be slow in the middle of that pipeline!