Open-Source Virtual Background (elder.dev)
527 points by jcastro on April 10, 2020 | 81 comments


I'd love to see something like this as a plugin for OBS. I've been using OBS lately for all the video conferencing we're all growing to love, since it's got basic color correction/manual controls for my webcam feed.

It's got the option for "real" chromakey, but like the author, I don't have a green screen, Amazon isn't scheduling any deliveries for another month, and I don't feel like a trip to the fabric store would count as "essential" travel (especially if it's only so I can screw around with stupid backgrounds).

Tried a few different sheets/blankets I had at home, but none are a suitable color or uniform/matte enough to work well, even with proper lighting. I admit this is such a non-issue and only something I want to play around with, but it would be fun nonetheless.


I agree, a plugin like this for OBS would be amazing.

I, too, have been playing with the chromakey in OBS, and like you tried a bunch of different sheets and blankets. The one that worked the best I discovered by accident; my wife was pulling out a puzzle, and with it she pulled out a green felt puzzle mat (really just a large rectangle of green felt). It was perfect. But I had to get it from her so I could use it on my meeting before she started her puzzle. After some brief, tense negotiating, I was able to snag it. All I have to do is the dishes for another month!

It works really well, but this would be way better and easier to set up.


I mean...in fairness I could probably figure out how to get hold of a sheet (seriously, of all the ugly colored sheets and blankets they sell at the local Target, there isn't a bright green or even blue??)

Likewise there are loads of cheap ones on Amazon or I could even get some bright paint and do a wall (or maybe dye my current non-chromakey white sheet background). It's just a matter of suspended/delayed deliveries and state orders to avoid unnecessary travel--including shopping for things like home furnishings.

It makes sense. There are plenty of important reasons why I should stay home and Amazon should focus on delivering essentials. I guess it's the definition of being spoiled/privileged that this is what I'm musing about.


Been struggling with this myself the last few days! I don’t NEED to go out or order all of this fancy video conferencing equipment for my weekend virtual happy hour but it’s way more fun being locked inside if I do.

I was fortunate enough that the previous owners left behind a god-awful teal paint that works amazingly as a ‘blue’ chromakey.

For a huge upgrade in camera I’m using the OBS Camera app on iOS which wirelessly beams an NDI stream to my desktop.

For lighting I’ve just played around with various lamps from my house.

The one thing I did order was a USB microphone because I couldn’t stand the thought of wearing a headset or adding latency with wireless earbuds.


My lighting is currently a mix of a clamp-on utility light with parchment paper diffuser (key), a window (fill), and a cheap old desk gooseneck lamp pointed at my sheet (I had used it as cheapo outdoor movie screen and now it acts as a clean background for current VTC needs).

Cam is the basic 1280x720 Microsoft LifeCam USB webcam I've had for years, but it's still much better than any laptop cam due to placement flexibility and better optics. OBS lets me color correct and exposes the manual controls for focus, exposure, etc.

I did grab a USB headset from work before work-from-home started, but it turned out to have a frayed wire and isn't any good. Instead I'm using my (wired) cell phone headphones with their integrated mic.

I dug out an old USB audio interface I had packed away and tried it with a Shure vocal mic, but that led me down another rabbit hole of messing with Voicemeeter to tweak EQ and noise reduction. The mic is really meant to be held right up to your mouth (for singing, etc.), so it picks up background noise if I turn the levels up high enough to use it as a desk/stand mic. For the time being I'm sticking with the headphone mic for simplicity's sake.

None of this is what I'd use if I had been handed a couple hundred bucks to put together a VTC setup but it's all stuff I had laying around and looks/sounds so much better than everyone who's just using built-in laptop cam/mic/speakers.


> Amazon isn't scheduling any deliveries for another month

You can still try. Three days ago I ordered some ink (which Amazon sadly didn't deem essential, though in my case it really wasn't; it could have waited a month) and I just got it today. I saw yesterday that the estimate had been updated to Tuesday, but it arrived much earlier, most likely because of the Easter holidays. They clearly give priority to essential items, since it still took longer than usual, but they don't seem to have a huge backlog of essentials either.


You could try ordering directly from a supplier. Amazon.de gave me 4 weeks delivery time, but a company which sells theatre level fabric (Molton green) was able to send several meters within ~4 days.


Good article, thanks. BTW, this is the method Jitsi-meet uses. They also use BodyPix. https://github.com/jitsi/jitsi-meet/blob/master/react/featur...


Sadly BodyPix is very slow. You only get 8-15 fps, so any motion is very sluggish. Additionally, the detection is in many cases quite inaccurate: too much or too little gets classified as body, so you can really only use a blur filter, because a virtual background looks very strange if parts of your shoulders/hair or half an ear are missing.

But I am really impressed by the detection in Microsoft Teams; I would love to have that quality and speed available in a browser.


Thanks for the share @knotty66. Curious if you can discuss the technicals of this a bit? In the case of Jitsi-meet mobile apps, it seems advantageous that BodyPix models are mobile-first, but did that make sense for the desktop version as well?

I've always been disappointed that BodyPix and some similar models are mobile-first and mobile-only on TensorFlow (https://blog.tensorflow.org/2019/11/updated-bodypix-2.html) -- are these models just not used much on server-side settings? There seems to be very little documentation on doing this server side.


Where is the virtual background option in Jitsi? I couldn't find it.


On Jitsi Meet, in the browser, there is a menu on the bottom right. Click it and there are loads of options there including blur background.


I'm confused. Parent was asking about virtual background. I've checked with meet.jit.si using Chrome and there is no virtual background option in the menu.


It seems that Jitsi uses the tensorflow NN to detect the background, but only for the purpose of blurring it, not replacing it with a virtual background.


It's probably an easier solution to create a processed virtual video device.


Thanks for sharing -- with very similar settings too! (Not far from the defaults, unsurprisingly.)

Injecting this into a web client seems like the sweet spot effort wise.


Thank you for the link! Do you know if browser extensions can access the webcam too?


Found something similar yesterday, which is basically deepfake for avatars - https://github.com/alievk/avatarify


Nice. How could I get this to run on Windows?


I'm guessing you can use something like [1] to create a virtual device on Windows

[1] https://webcamoid.github.io


I've found the VirtualCam plugin for OBS works better, but YMMV


Since it looks like your webcam is mounted to a stationary PC and you're only really writing this for yourself, wouldn't it be a lot easier to just subtract out a static picture of your background from the feed?


This breaks down in daylight (brightness shifts due to sun moving/clouds) or whenever anything shifts ever so slightly. Even shadows may make things appear where they shouldn't.

Chroma keying is the stable version of this idea: the single background color of a separately and well lit background removes these issues.
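For what it's worth, the naive version is only a few lines with OpenCV, which is exactly why it's tempting -- a rough, untested sketch ('background.jpg' and the threshold are placeholders):

    import cv2

    background = cv2.imread('background.jpg')  # static photo of the empty room

    def naive_mask(frame, threshold=30):
        # Per-pixel difference against the stored background photo
        diff = cv2.cvtColor(cv2.absdiff(frame, background), cv2.COLOR_BGR2GRAY)
        # Anything that changed "enough" counts as foreground; shadows and
        # shifting daylight change "enough" too, which is the problem
        _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
        return mask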


This

I did actually tinker with this and other approaches a little.

Also, the things in the background get moved around pretty frequently, e.g. what is on the couch. I don't have a dedicated office room.



I'm in a 1-bedroom apartment with my fiance, who is also working from home now; there isn't really space to put a physical green screen behind me.

If you have the space and good lighting, it's a much simpler approach.

It's also less fun though, and I could do this with what I had on hand pretty quickly.


This is exactly how it worked in Photo Booth on OS X over 10 years ago. It was good enough for short clips, but not reliable for long sessions.

Random example clip: https://youtu.be/T5uqB1Kqukw


I'm working on something very similar right now, so it's great to see that this is actually possible with currently available open-source resources. I'm trying to re-create Skype's background blur feature, but in a form I can use in OBS (obsproject.com), so I can apply it just to my webcam without messing up the other stream elements for online lectures. I've been using the DeepLab model from the TensorFlow research repo to do segmentation, but BodyPix seems even better for the job. If it performs any better, this might be the break I was looking for...
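The compositing step is the easy part once you have a person mask from the model -- a minimal sketch, assuming a float mask in [0, 1] at frame resolution:

    import cv2
    import numpy as np

    def blur_background(frame, mask, ksize=51):
        # mask: float32 in [0, 1], 1 = person, same height/width as frame
        blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
        mask3 = np.dstack([mask] * 3)
        # Keep the person from the original frame, everything else from the blurred copy
        return (frame * mask3 + blurred * (1.0 - mask3)).astype(np.uint8)

The hard part is getting a mask that's fast and accurate enough, which is where BodyPix might help.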


I am eagerly anticipating this. I use OBS and chromakey to have some fun on my MS Teams meetings, but it's a hassle to set up the green screen every time I want to be a bit cheeky.

Do you have a github/gitlab/something else link?


I set up a GitLab repo, but I haven't pushed any code yet. I think you can set it up to notify you when there's a release, which I will try to get out in the following week. https://gitlab.com/franga2000/obs-background-blur-filter


This is pretty awesome work, but I just wanted to point out that Zoom doesn't actually require a green screen. If you uncheck the "I have a green screen" button, choosing your own virtual background still looks really good, although of course you're not going to get any crazy effects like this script adds.


Depends on your hardware. I have a Win machine, but need a green screen because I don't have the horsepower.


And your software. I have a dual-booting machine, the green screen is required in Linux but not in Windows.


Ah ok, didn't know that.


The article links to [1] and comments on this near the beginning -- as others have mentioned this is not always the case. The Linux client does not offer this functionality, only green screen.

I'm also using this with other apps for fun though (duo, hangouts, etc.)

[1]: https://support.zoom.us/hc/en-us/articles/210707503-Virtual-...


that's true for the Mac app, but the Linux client doesn't support the non-green-screen option (according to the post)


Using video loopback opens up some great creative possibilities for fun with video conferencing.

There's one thing I'd love to achieve though, which doesn't seem possible on the Linux desktop (specifically Kubuntu)...

I want to be able to use the loopback as the source for screen share instead of webcam; i.e. to use the loopback as the conference presentation.

Has anyone got any ideas how to achieve this? Most conference solutions on Linux don't seem to support 'share this window' or 'share this screen region'; it seems to be the whole desktop or nothing.


> Has anyone got any ideas how to achieve this?

Yes. Thanks for giving me a reason to write this up.

1. Download and install OBS. OBS will be your video processor; among other things it's super easy to make it capture the whole screen or individual windows.

2. Install the v4l2 loopback kernel module[1]. This makes it possible to have a virtual webcam. On Ubuntu 19.10, this was as easy as apt install v4l2loopback-dkms and then modprobe v4l2loopback.

3. Install the OBS plugin obs-v4l2sink[5]. This exports the OBS output to the new virtual webcam device. I just installed the deb file provided by the project[2]. In OBS, under Tools, select v4l2sink and Start.

That's all I had to do. Surprisingly straightforward. At least Chrome and Firefox[3] will now pick up a "Dummy Video Device" webcam that streams the window, or whatever scene I set up in OBS.

In my case, the primary advantage was that this virtual webcam is streamed in Jitsi Meet at a higher quality/framerate than the regular desktop share feature. It's also much lower latency than both Twitch and Youtube Live streaming (Jitsi Meet/WebRTC: <1s, Twitch: 5s, Youtube: 15s[4]; YMMV).

You also get to enjoy the rich feature set OBS provides for Twitch streams; for one thing, you can include the real webcam video.

Bonus: Desktop audio "just worked" in Firefox, which offers the pulseaudio monitor (loopback) device as an input. Chrome doesn't -- probably the intended behaviour. I'm sure there's a workaround.

[1] https://github.com/umlaeute/v4l2loopback

[2] https://github.com/CatxFish/obs-v4l2sink/releases

[3] For some reason, Gnome's Cheese won't pick it up

[4] Microsoft's Mixer allegedly has super-low-latency streaming (FTL protocol), but new accounts are cleared manually and I haven't had the chance to try it

[5] For Windows, you can use OBS Virtualcam https://obsproject.com/forum/resources/obs-virtualcam.539/
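A quick sanity check that frames are actually flowing -- untested sketch, assuming the loopback device ended up at /dev/video2 (check with v4l2-ctl --list-devices) and that v4l2sink has been started in OBS:

    import cv2

    # Open the v4l2loopback device by index (/dev/video2 -> index 2)
    cap = cv2.VideoCapture(2)
    ok, frame = cap.read()
    print("got a frame of shape", frame.shape if ok else None)
    cap.release()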


Thanks for the info, but I already have all of this part working.

The problem is that v4l2loopback only provides a virtual _webcam_ (video source), not a virtual _screen_ - the two are different and are handled differently both by browsers (webrtc) and desktop conference apps (Slack, Teams, etc).

I guess the other issue is that conference apps treat webcam and screen capture differently; usually if someone is sharing a screen, that feed takes over the full view for all participants so that you can actually read the content.

I don't want my screen recording only to show up in my _webcam_ view, which is usually just a tiny thumbnail.


OBS will handle the screen grabbing. At least on Ubuntu 18.04 I can select individual windows or even the whole screen.


The parent is pointing out functionality that only pops up for explicit screensharing -- changing your camera to a screen grab won't trigger these.

In particular, one I've found very useful with Zoom is being able to zoom in to a small region and scroll around. I also suspect Zoom prioritized resolution (for content clarity) over frame rate for screen sharing, which probably doesn't apply when it's just a "webcam" in the eyes of the client. I'm guessing your window capture would get decimated in terms of quality.


The most naive solution would be to run a nested framebuffer with just the meeting and the windows you would like to share.


You mention the ~10FPS performance.

It seems like moving all that data backwards and forwards between Python and Node might be a bottleneck, no?


I need to profile it more closely; I actually don't remember the exact FPS, etc., but I don't expect that to be the limiting factor.

The inference / ML is expensive (which I did profile initially...), and I suspect it's not really optimized on this backend. It appears to be faster with WebGL in the browser.

I sorta stopped worrying about it once it was "good enough" to show up to a few meetings with, but with all the attention I'll probably take another look.

It does look like someone ported bodypix to python, I'll probably try that next.

https://github.com/ajaichemmanam/simple_bodypix_python


I had success by replacing the get_mask function with:

    import cv2  # already imported in the original script; included so the snippet stands alone
    from keras.models import load_model

    # Portrait-segmentation model (see link below), loaded once at startup
    model = load_model('models/transpose_seg/deconv_bnoptimized_munet.h5', compile=False)


    def get_mask(frame):
        # Preprocess: model expects a 1x128x128x3 RGB batch scaled to [0, 1]
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        simg = cv2.resize(frame, (128, 128), interpolation=cv2.INTER_AREA)
        simg = simg.reshape((1, 128, 128, 3)) / 255.0

        # Predict
        out = model.predict(simg)

        # Postprocess: scale the 128x128 mask back up to the frame size
        msk = out.reshape((128, 128, 1))
        mask = cv2.resize(msk, (frame.shape[1], frame.shape[0]))

        return mask

The model file I got from: https://github.com/anilsathyan7/Portrait-Segmentation


Thanks! That helped me in my attempt at this: https://github.com/wasmitnetzen/dynjandi


I guess I must be completely miscalibrated wrt. performance of newer technologies, because I'd imagine it's the opposite. In particular, I'd be surprised to get a Python+Node loop passing large amounts of data around like that to run 30+ FPS, unless everything Python-side is carefully written to do everything on C side. At the same time, I'd assume the inference/ML part is the fastest one, because, as far as I understand how NNs work, they're supposed to be blazingly fast once trained (it's just lots of parallelizable linear algebra). Is the inference part in your solution doing anything more complicated than that in real-time?


A modern laptop will run BodyPix at about 30 fps. There could be additional bottlenecks, but deep (and wide) NNs are usually not super fast; they're just fast for the wondrous things they do.

You can usually alter performance (with Bodypix that's an accuracy/speed tradeoff) or do something silly like downscale, run, and upscale the mask. I'd like to try this.


BodyPix does downsample before masking out of the box; the article is using 'medium' (50%) (though for this script we ought to move that over to the Python side). It's still not 30fps without egregiously sacrificing quality, at least on my (fairly powerful) machine, unless I've missed something.

Amusingly, I did some hacking on this and the current bottleneck is actually reading from the webcam, which is capped at <10fps without doing anything else. Switching the capture to MJPG helps.
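Concretely, with OpenCV the MJPG switch is roughly this (sketch; not every camera/driver honors the request):

    import cv2

    cap = cv2.VideoCapture(0)
    # Request MJPG instead of the default (often uncompressed YUYV),
    # which is typically what caps 720p webcams at ~10fps
    cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    cap.set(cv2.CAP_PROP_FPS, 30)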


Okay, now please use your skills around AI and virtual webcams to create a script that just generates a picture of me that nods at the right moments during a Zoom call ;)


Really good write-up of a flexible approach.

If you just want an easy green screen, https://obsproject.com/ has a very good chromakey filter and a v4l2loopback plugin: https://github.com/CatxFish/obs-v4l2sink


I love the added hologram effect! That is a very creative addition.


Pytorch has Deeplab available on their hub (I'm sure TF has something similar).

It's a couple of lines to use: https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/
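Roughly like this (untested sketch based on that hub page; the normalization constants are the standard ImageNet ones and the person class index comes from the PASCAL VOC label set):

    import torch
    from torchvision import transforms

    model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True).eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def person_mask(rgb_image):
        # rgb_image: PIL Image (or HxWx3 uint8 array)
        batch = preprocess(rgb_image).unsqueeze(0)
        with torch.no_grad():
            out = model(batch)['out'][0]
        # Class 15 is 'person' in the PASCAL VOC labels DeepLab is trained on
        return (out.argmax(0) == 15).numpy()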


Fascinating write-up, Ben. Who would have known that you were a genius with image processing as well as running containers :-) Love the gory details, and I didn't know about pyfakewebcam either.

Do you have a live video recorded showing how quickly it can process a stream?


The demo at the end of the page is a video (webm), but there's not a ton of motion to reference besides the blinking.

IIRC it's something like 10FPS currently, which has been sufficient for meetings so far (about 1/3 of what you might get with sufficient bandwidth in most video conference tools).

There's definitely room to improve it.


Amusingly the current bottleneck is actually reading from the webcam with the suboptimal ~default capture config. Without doing anything else that's ≤ 10fps. Low hanging fruit still :-)


Great read! I am curious, then, why the Linux client doesn't support this, if all it takes is sending out our webcam stream data to be processed server-side?

P.S. What happens when they do e2ee on the webcam stream?


Why use expensive server side processing when you can do it for free on the client? Doing it on the client could theoretically save bandwidth, if the background was a changing scene.

Obviously Zoom isn't end-to-end encrypted; it's client-server encrypted.


I think server side processing is really not a good idea for this. Zoom is really seeing a lot of use right now. It would not be sustainable for them to not take advantage of all the computing power of the clients.


I am thinking the same now. The OP solution requires a server to do the processing, but I am not sure this is the case for the Zoom client.

The Zoom client also has minimal hardware requirements for the processor, IIRC.


Brilliant!

Now, I know that there are many companies that force people to be on with a live video feed, and that many don't really like it.

How about recording a 3-min clip and playing that in an infinite loop - creating a fake feed (remember Keanu Reeves' Speed?) - so that people can avoid actually being on camera, but still get things done? A mask on the face is a simple addition to avoid detection. As the saying goes, modern problems require modern solutions!


You could actually pipe in a video clip with just ffmpeg and v4l2loopback pretty easily without writing any code I think :-)

The virtualvideo readme has an example of looping a single frame. https://github.com/Flashs/virtualvideo#errorhandling
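If you'd rather stay in Python, the pyfakewebcam approach from the post works for a pre-recorded clip too -- untested sketch, with 'nodding.mp4' and the device path as placeholders:

    import time
    import cv2
    import pyfakewebcam

    clip = cv2.VideoCapture('nodding.mp4')
    width = int(clip.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(clip.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = clip.get(cv2.CAP_PROP_FPS) or 30
    fake = pyfakewebcam.FakeWebcam('/dev/video20', width, height)

    while True:
        ok, frame = clip.read()
        if not ok:
            # Rewind to the first frame when the clip ends, for an infinite loop
            clip.set(cv2.CAP_PROP_POS_FRAMES, 0)
            continue
        # pyfakewebcam expects RGB; OpenCV reads BGR
        fake.schedule_frame(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        time.sleep(1.0 / fps)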


Is there something Python can't do? Can't wait for a React/Vue-like framework for Python.


I've always wondered why Skulpt didn't get more traction: https://skulpt.org/


You mean for developing desktop applications?

Because for a scripting/backend-focused language, React/Vue is not even something you want to aspire towards.


It can't run in the browser, which makes this a non-starter.

It also can't make efficient use of resources, which explains its lack of prevalence in embedded/back-end apps.


Thanks for sharing! I have been working on the exact same project with Tensorflow & BodyPix, really helpful to compare notes & see the pyfakewebcam approach!


:-) Great minds think alike!

I'm iterating on it again tonight, and tentatively virtualvideo [1][2] gives better results by just piping frames into ffmpeg vs pyfakewebcam

[1]: https://github.com/Flashs/virtualvideo/

[2]: https://pypi.org/project/virtualvideo/

I'll have to update the post or do a follow-up at some point.


Very interesting.

The magic is pyfakewebcam and v4l2loopback. I was looking for a way to turn myself into a potato on Teams; the bit I was missing was how to create a virtual webcam.


Awesome stuff Ben! Could this be deployed via kind by passing the necessary mounts into a node, then mounting those mounts via hostPath in the pods?


Pretty cool write up of how this can be replicated with opencv and out of the box libraries like bodypix. I imagine Zoom is using something like this too.


Did anyone find a good loopback video driver ("virtual webcam") that works on recent macOS?


I haven't tried yet but there's some work related to this going on in OBS https://github.com/obsproject/rfcs/pull/15#issuecomment-6054...


https://webcamoid.github.io/ has one that "works" for about 30 frames at a time for me. But maybe you'll have more luck?


I really like it. But why not keep TensorFlow in Python? What is the reason for using Node?


FTA: "BodyPix is currently only available in TensorFlow.js form, so the easiest way to use it is from the body-pix-node library."


Beautiful hack. What about accessories à la Instagram or Facebook?


Pretty insane to use Docker & web-requests for high frame rate video chat with something which runs just fine even on CPU.


Docker made it easier to package up the dependencies (especially aligning cuda etc.), and containers are my dayjob :-)

The web requests are just an easy mode of IPC to pass around some bags of bytes, "high frame rate" is at most 30 qps ... that part isn't really interesting performance wise and this isn't a production tool :-)

I'm not sure I'd be so confident about tensorflow.js being so fast on the CPU ... you can see a marked difference between the backends: https://www.tensorflow.org/js/guide/platform_environment


So this is super cool and I would love to use it, but in your write-up you mention it _requires_ Nvidia hardware, which sadly I don't have :(

Any follow-up on making this more generic, e.g. with AMD/Intel setups?


Really cool.



