I need to profile it more closely; I don't actually remember the exact FPS, but I don't expect that to be the limiting factor.
The inference/ML is expensive (which I did profile initially...), and I suspect it's not really optimized on this backend. It appears to be faster with WebGL in the browser.
I sorta stopped worrying about it once it was "good enough" to show up to a few meetings with, but with all the attention I'll probably take another look.
It does look like someone has ported BodyPix to Python; I'll probably try that next.
I guess I must be completely miscalibrated wrt. performance of newer technologies, because I'd imagine it's the opposite. In particular, I'd be surprised to get a Python+Node loop passing large amounts of data around like that to run 30+ FPS, unless everything Python-side is carefully written to do everything on C side. At the same time, I'd assume the inference/ML part is the fastest one, because, as far as I understand how NNs work, they're supposed to be blazingly fast once trained (it's just lots of parallelizable linear algebra). Is the inference part in your solution doing anything more complicated than that in real-time?
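To put a rough number on "large amounts of data": a raw 720p BGR stream at 30 FPS (illustrative figures, not taken from the post) works out to roughly 83 MB/s. A pipe or socket moves that comfortably at C speed, but any per-pixel work done in pure Python at that rate will not keep up:

```python
# Back-of-the-envelope for the frame traffic a Python<->Node loop has
# to move. Resolution, channel count, and frame rate are assumptions
# for illustration, not figures from the original setup.
width, height, channels = 1280, 720, 3        # one raw BGR 720p frame
fps = 30

bytes_per_frame = width * height * channels   # 2,764,800 bytes (~2.6 MB)
throughput_mb_s = bytes_per_frame * fps / 1e6 # ~82.9 MB/s sustained

# An OS pipe or loopback socket handles ~83 MB/s easily; a pure-Python
# loop touching each of the ~28M pixel values per second does not.
```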
A modern laptop will run BodyPix at about 30 fps. There could be additional bottlenecks, but deep (and wide) NNs are usually not super fast; they're just fast for the wondrous things they do.
You can usually alter performance (with BodyPix that's an accuracy/speed tradeoff) or do something silly like downscale, run, and upscale the mask. I'd like to try this.
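The downscale/run/upscale idea can be sketched roughly like this (numpy only; `segment` is a stand-in for whatever model actually produces the mask, and the factor-of-2 nearest-neighbour scaling is purely illustrative):

```python
import numpy as np

def masked_at_low_res(frame: np.ndarray, segment, factor: int = 2) -> np.ndarray:
    """Run segmentation on a downscaled frame, then scale the mask back
    up so it lines up with the full-resolution frame.

    `segment` is any callable taking an (H, W, C) image and returning an
    (H, W) boolean mask -- a placeholder for the real model.
    """
    # Cheap nearest-neighbour downscale by striding (no interpolation).
    small = frame[::factor, ::factor]
    mask_small = segment(small)                      # (H/f, W/f) bool mask
    # Nearest-neighbour upscale: repeat each mask cell factor x factor.
    mask = np.repeat(np.repeat(mask_small, factor, axis=0), factor, axis=1)
    # Trim any overshoot from odd dimensions.
    return mask[: frame.shape[0], : frame.shape[1]]
```

The mask is blockier than a full-resolution pass, but the model only sees a quarter of the pixels at factor 2, which is where the speedup comes from.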
BodyPix does downsample before masking out of the box; the article uses the 'medium' (50%) setting (though for this script we ought to move that over to the Python side). Even so, it's not 30 fps without egregiously sacrificing quality, at least on my (fairly powerful) machine, unless I've missed something.
Amusingly, I did some hacking on this, and the current bottleneck is actually reading from the webcam, which is capped at under 10 fps before anything else runs. Switching the capture to MJPG helps.
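For reference, here's a sketch of forcing MJPG with OpenCV (an assumption — the script's actual capture code may differ). A common cause of the sub-10 fps cap is that many UVC webcams default to raw YUYV, which at 720p exceeds what USB 2.0 can sustain at 30 fps; MJPG trades some CPU decode time for far less bus traffic:

```python
def fourcc(code: str) -> int:
    """Pack a 4-character code into the little-endian FOURCC integer
    that V4L2/OpenCV expect (same layout as cv2.VideoWriter_fourcc)."""
    a, b, c, d = code
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

def open_mjpg_capture(index: int = 0, width: int = 1280, height: int = 720):
    """Open a webcam with MJPG negotiated before resolution and FPS.
    Setting the pixel format first matters on some drivers."""
    import cv2  # imported here so the fourcc helper stays dependency-free

    cap = cv2.VideoCapture(index)
    cap.set(cv2.CAP_PROP_FOURCC, fourcc("MJPG"))  # pixel format first
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    cap.set(cv2.CAP_PROP_FPS, 30)
    return cap
```

You can verify what was actually negotiated afterwards with `cap.get(cv2.CAP_PROP_FPS)`, since drivers silently fall back when a mode isn't supported.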
It seems like moving all that data back and forth between Python and Node might be a bottleneck, no?