I need to profile it more closely; I don't actually remember the exact FPS, but I don't expect that to be the limiting factor.
The inference/ML is expensive (which I did profile initially...), and I suspect it's not really optimized on this backend. It appears to be faster with WebGL in the browser.
I sorta stopped worrying about it once it was "good enough" to show up to a few meetings with, but with all the attention I'll probably take another look.
It does look like someone has ported BodyPix to Python; I'll probably try that next.
I guess I must be completely miscalibrated wrt. performance of newer technologies, because I'd imagine it's the opposite. In particular, I'd be surprised to get a Python+Node loop passing large amounts of data around like that to run 30+ FPS, unless everything Python-side is carefully written to do everything on C side. At the same time, I'd assume the inference/ML part is the fastest one, because, as far as I understand how NNs work, they're supposed to be blazingly fast once trained (it's just lots of parallelizable linear algebra). Is the inference part in your solution doing anything more complicated than that in real-time?
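To put a rough number on "large amounts of data": a raw 720p BGR stream at 30 FPS (illustrative figures, not taken from the post) works out to roughly 83 MB/s. A pipe or socket moves that comfortably at C speed, but any per-pixel work done in pure Python at that rate will not keep up:

```python
# Back-of-the-envelope for the frame traffic a Python<->Node loop has
# to move. Resolution, channel count, and frame rate are assumptions
# for illustration, not figures from the original setup.
width, height, channels = 1280, 720, 3        # one raw BGR 720p frame
fps = 30

bytes_per_frame = width * height * channels   # 2,764,800 bytes (~2.6 MB)
throughput_mb_s = bytes_per_frame * fps / 1e6 # ~82.9 MB/s sustained

# An OS pipe or loopback socket handles ~83 MB/s easily; a pure-Python
# loop touching each of the ~28M pixel values per second does not.
```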
A modern laptop will run BodyPix at about 30 fps. There could be additional bottlenecks, but deep (and wide) NNs are usually not super fast; they're just fast for the wondrous things they do.
You can usually alter performance (with BodyPix that's an accuracy/speed tradeoff) or do something silly like downscale, run, and upscale the mask. I'd like to try this.
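The downscale/run/upscale idea can be sketched roughly like this (numpy only; `segment` is a stand-in for whatever model actually produces the mask, and the factor-of-2 nearest-neighbour scaling is purely illustrative):

```python
import numpy as np

def masked_at_low_res(frame: np.ndarray, segment, factor: int = 2) -> np.ndarray:
    """Run segmentation on a downscaled frame, then scale the mask back
    up so it lines up with the full-resolution frame.

    `segment` is any callable taking an (H, W, C) image and returning an
    (H, W) boolean mask -- a placeholder for the real model.
    """
    # Cheap nearest-neighbour downscale by striding (no interpolation).
    small = frame[::factor, ::factor]
    mask_small = segment(small)                      # (H/f, W/f) bool mask
    # Nearest-neighbour upscale: repeat each mask cell factor x factor.
    mask = np.repeat(np.repeat(mask_small, factor, axis=0), factor, axis=1)
    # Trim any overshoot from odd dimensions.
    return mask[: frame.shape[0], : frame.shape[1]]
```

The mask is blockier than a full-resolution pass, but the model only sees a quarter of the pixels at factor 2, which is where the speedup comes from.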
BodyPix does downsample before masking out of the box; the article uses the 'medium' (50%) setting (though for this script we ought to move that over to the Python side). Even so, it's not 30 fps without egregiously sacrificing quality, at least on my (fairly powerful) machine, unless I've missed something.
Amusingly, I did some hacking on this, and the current bottleneck is actually reading from the webcam, which is capped at under 10 fps before anything else runs. Switching the capture to MJPG helps.
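For reference, here's a sketch of forcing MJPG with OpenCV (an assumption — the script's actual capture code may differ). A common cause of the sub-10 fps cap is that many UVC webcams default to raw YUYV, which at 720p exceeds what USB 2.0 can sustain at 30 fps; MJPG trades some CPU decode time for far less bus traffic:

```python
def fourcc(code: str) -> int:
    """Pack a 4-character code into the little-endian FOURCC integer
    that V4L2/OpenCV expect (same layout as cv2.VideoWriter_fourcc)."""
    a, b, c, d = code
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

def open_mjpg_capture(index: int = 0, width: int = 1280, height: int = 720):
    """Open a webcam with MJPG negotiated before resolution and FPS.
    Setting the pixel format first matters on some drivers."""
    import cv2  # imported here so the fourcc helper stays dependency-free

    cap = cv2.VideoCapture(index)
    cap.set(cv2.CAP_PROP_FOURCC, fourcc("MJPG"))  # pixel format first
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    cap.set(cv2.CAP_PROP_FPS, 30)
    return cap
```

You can verify what was actually negotiated afterwards with `cap.get(cv2.CAP_PROP_FPS)`, since drivers silently fall back when a mode isn't supported.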
It seems like moving all that data back and forth between Python and Node might be a bottleneck, no?