I'd love to see something like this as a plugin for OBS. I've been using it lately due to all the video conferencing we're all growing to love since it's got basic color correction/manual controls for my webcam feed.
It's got the option for "real" chromakey, but like the author, I don't have a green screen, Amazon isn't scheduling any deliveries for another month, and I don't feel like a trip to the fabric store would count as "essential" travel (especially if it's only so I can screw around with stupid backgrounds).
Tried a few different sheets/blankets I had at home, but none are a suitable color or uniform/matte enough to work well, even with proper lighting. I admit this is such a non-issue and only something I want to play around with, but it would be fun nonetheless.
I agree, a plugin like this for OBS would be amazing.
I, too, have been playing with the chromakey in OBS, and like you tried a bunch of different sheets and blankets. The one that worked the best I discovered by accident; my wife was pulling out a puzzle, and with it she pulled out a green felt puzzle mat (really just a large rectangle of green felt). It was perfect. But I had to get it from her so I could use it on my meeting before she started her puzzle. After some brief, tense negotiating, I was able to snag it. All I have to do is the dishes for another month!
It works really well, but this would be way better and easier to set up.
I mean...in fairness I could probably figure out how to get hold of a sheet (seriously, of all the ugly colored sheets and blankets they sell at the local Target, there isn't a bright green or even blue??)
Likewise there are loads of cheap ones on Amazon or I could even get some bright paint and do a wall (or maybe dye my current non-chromakey white sheet background). It's just a matter of suspended/delayed deliveries and state orders to avoid unnecessary travel--including shopping for things like home furnishings.
It makes sense. There are plenty of important reasons why I should stay home and Amazon should focus on delivering essentials. I guess it's the definition of being spoiled/privileged that this is what I'm musing about.
Been struggling with this myself the last few days! I don’t NEED to go out or order all of this fancy video conferencing equipment for my weekend virtual happy hour but it’s way more fun being locked inside if I do.
I was fortunate enough that the previous owners left behind a god-awful teal paint that works amazingly as a ‘blue’ chromakey.
For a huge upgrade in camera I’m using the OBS Camera app on iOS which wirelessly beams an NDI stream to my desktop.
For lighting I’ve just played around with various lamps from my house.
The one thing I did order was a USB microphone because I couldn’t stand the thought of wearing a headset or adding latency with wireless earbuds.
My lighting is currently a mix of a clamp-on utility light with a parchment paper diffuser (key), a window (fill), and a cheap old gooseneck desk lamp pointed at my sheet (I had used the sheet as a cheapo outdoor movie screen, and now it acts as a clean background for current VTC needs).
Cam is the basic 1280x720 Microsoft Lifecam USB webcam I've had for years, but it's still much better than any laptop cam due to placement flexibility and better optics. OBS lets me color correct and exposes the manual controls for focus, exposure, etc.
I did grab a USB headset from work before work-from-home started, but I found it had a frayed wire and isn't any good. Instead I'm using my wired cell-phone headphones with their integrated mic.
I dug out an old USB audio interface I had packed away and tried it with a Shure vocal mic, but that led me down another hole of messing with Voicemeeter to tweak EQ and noise reduction because it's really meant to be held right up to your mouth (for singing, etc) and picks up background noise if I have the levels up high enough to use as a desk/stand mic. For the time being I am sticking with the headphone/mic for simplicity's sake.
None of this is what I'd use if I had been handed a couple hundred bucks to put together a VTC setup but it's all stuff I had laying around and looks/sounds so much better than everyone who's just using built-in laptop cam/mic/speakers.
> Amazon isn't scheduling any deliveries for another month
You can still try. I ordered some ink three days ago (sadly not deemed essential by Amazon, though in my case it wasn't that essential and could wait a month) and I got it today (yesterday the estimate was updated to Tuesday, but it arrived much earlier, most probably because of the Easter holidays). They clearly give priority to essential stuff, because it still took longer than usual, but they don't seem to have a huge backlog of essentials either.
You could try ordering directly from a supplier. Amazon.de gave me 4 weeks delivery time, but a company which sells theatre level fabric (Molton green) was able to send several meters within ~4 days.
Sadly BodyPix is very slow. You only get 8-15 fps, so any motion is very sluggish. Additionally, the detection is in many cases quite inaccurate: too much or too little gets attributed to the body, so you can really only use a blur filter, because a virtual background looks very strange if parts of your shoulders/hair or half an ear are missing.
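For what it's worth, the blur fallback itself is only a few lines of OpenCV once you have a mask; this is just a sketch assuming a single-channel person mask (255 = person) from BodyPix or any other model:

    import cv2
    import numpy as np

    def blur_background(frame, mask, ksize=(61, 61)):
        # frame: BGR uint8 image; mask: single-channel uint8, 255 where the person is
        blurred = cv2.GaussianBlur(frame, ksize, 0)
        # Feather the mask so misdetected hair/shoulder edges are less jarring
        soft = cv2.GaussianBlur(mask, (21, 21), 0).astype(np.float32) / 255.0
        soft = soft[..., np.newaxis]
        return (frame * soft + blurred * (1.0 - soft)).astype(np.uint8)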
But I am really impressed by the detection in Microsoft Teams; I would love to have that quality and speed available in a browser.
Thanks for the share @knotty66. Curious if you can discuss the technicals of this a bit? In the case of Jitsi-meet mobile apps, it seems advantageous that BodyPix models are mobile-first, but did that make sense for the desktop version as well?
I've always been disappointed that BodyPix and some similar models are mobile-first and mobile-only on TensorFlow (https://blog.tensorflow.org/2019/11/updated-bodypix-2.html) -- are these models just not used much on server-side settings? There seems to be very little documentation on doing this server side.
I'm confused. Parent was asking about virtual background. I've checked with meet.jit.si using Chrome and there is no virtual background option in the menu.
It seems that Jitsi uses the tensorflow NN to detect the background, but only for the purpose of blurring it, not replacing it with a virtual background.
Since it looks like your webcam is mounted to a stationary PC and you're only really writing this for yourself, wouldn't it be a lot easier to just subtract out a static picture of your background from the feed?
This breaks down in daylight (brightness shifts due to sun moving/clouds) or whenever anything shifts ever so slightly. Even shadows may make things appear where they shouldn't.
Chroma keying is the stable version of this idea: the single background color of a separately and well lit background removes these issues.
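To make the contrast concrete, a minimal OpenCV green-screen key looks something like this (the HSV bounds are illustrative guesses; the point is that a uniformly lit single colour gives you thresholds that don't drift the way a static reference photo of the room does):

    import cv2
    import numpy as np

    def chroma_key_mask(frame_bgr, lower_hsv=(35, 60, 60), upper_hsv=(85, 255, 255)):
        # Key out everything that falls inside the green HSV range
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        background = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
        return cv2.bitwise_not(background)  # 255 where the person is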
I'm working on something very similar right now, so it's great to see that this is actually possible with currently available open-source resources. I'm trying to re-create Skype's background blur feature, but that I can use in OBS (obsproject.com) so I can apply it just to my webcam without messing up the other stream elements for online lectures.
I've been using the DeepLab model from the tensorflow research repo to do segmentation, but BodyPix seems even better for the job. If it performs any better, this might be the break I was looking for...
I am eagerly anticipating this. I use OBS and chromakey to have some fun on my MS Teams meetings, but it's a hassle to set up the green screen every time I want to be a bit cheeky.
I set up a GitLab repo, but I haven't pushed any code yet. I think you can set it up to notify you when there's a release, which I will try to get out in the following week. https://gitlab.com/franga2000/obs-background-blur-filter
This is pretty awesome work, but I just wanted to point out that Zoom doesn't actually require a green screen. If you uncheck the "I have a green screen" button, choosing your own virtual background still looks really good, although of course you're not going to get any crazy effects like this script adds.
The article links to [1] and comments on this near the beginning -- as others have mentioned this is not always the case. The Linux client does not offer this functionality, only green screen.
I'm also using this with other apps for fun though (duo, hangouts, etc.)
Using video loopback opens up some great creative possibilities for fun with video conferencing.
There's one thing I'd love to achieve though, which seems not possible on Linux desktop (specifically kubuntu)....
I want to be able to use the loopback as the source for screen share instead of webcam; i.e. to use the loopback as the conference presentation.
Has anyone got any ideas how to achieve this, given that most conference solutions on Linux do not seem to support either 'share this window' or 'share this screen region'? It seems to be the whole desktop or nothing.
Yes. Thanks for giving me a reason to write this up.
1. Download and install OBS. OBS will be your video processor; among other things it's super easy to make it capture the whole screen or individual windows.
2. Install the v4l2 loopback kernel module[1]. This makes it possible to have a virtual webcam. On Ubuntu 19.10, this was as easy as apt install v4l2loopback-dkms and then modprobe v4l2loopback.
3. Install the OBS plugin obs-v4l2sink[5]. This exports the OBS output to the new virtual webcam device. I just installed the deb file provided by the project[2]. In OBS, under Tools, select v4l2sink and Start.
That's all I had to do. Surprisingly straightforward. At least Chrome and Firefox[3] will now pick up a "Dummy Video Device" webcam that streams the window, or whatever scene I set up in OBS.
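If you want a quick sanity check outside the browser, you can also open the loopback device from Python (assuming it registered as /dev/video2; the device number varies per machine):

    import cv2

    cap = cv2.VideoCapture("/dev/video2", cv2.CAP_V4L2)
    ok, frame = cap.read()
    print("virtual webcam delivering frames:", ok, frame.shape if ok else None)
    cap.release()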
In my case, the primary advantage was that this virtual webcam is streamed in Jitsi Meet at a higher quality/framerate than the regular desktop share feature. It's also much lower latency than both Twitch and Youtube Live streaming (Jitsi Meet/WebRTC: <1s, Twitch: 5s, Youtube: 15s[4]; YMMV).
You also get to enjoy the rich feature set OBS provides for Twitch streams; for one thing, you can include the real webcam video.
Bonus: Desktop audio "just worked" in Firefox, which offers the pulseaudio monitor (loopback) device as an input. Chrome doesn't -- probably the intended behaviour. I'm sure there's a workaround.
[4] Microsoft's Mixer allegedly has super-low-latency streaming (FTL protocol), but new accounts are cleared manually and I haven't had the chance to try it.
Thanks for the info, but I already have all of this part working.
The problem is that v4l2loopback only provides a virtual _webcam_ (video source), not a virtual _screen_ - the two are different and are handled differently both by browsers (webrtc) and desktop conference apps (Slack, Teams, etc).
I guess the other issue is that conference apps treat webcam and screen capture differently; usually if someone is sharing a screen, that feed takes over the full view for all participants so that you can actually read the content.
I don't want my screen recording only to show up in my _webcam_ view, which is usually just a tiny thumbnail.
The parent is pointing out functionality that only pops up for explicit screensharing -- changing your camera to a screen grab won't trigger these.
In particular, one I've found very useful with Zoom is being able to zoom in to a small region and scroll around. I also suspect Zoom prioritized resolution (for content clarity) over frame rate for screen sharing, which probably doesn't apply when it's just a "webcam" in the eyes of the client. I'm guessing your window capture would get decimated in terms of quality.
I need to profile it more closely, I actually don't remember the exact FPS etc. but I don't expect that to be the limiting factor.
The inference / ml is expensive (which I did profile initially...), and I suspect not really optimized on this backend. It appears to be faster with webGL in the browser.
I sorta stopped worrying about it once it was "good enough" to show up to a few meetings with, but with all the attention I'll probably take another look.
It does look like someone ported bodypix to python, I'll probably try that next.
I guess I must be completely miscalibrated wrt. performance of newer technologies, because I'd imagine it's the opposite. In particular, I'd be surprised to get a Python+Node loop passing large amounts of data around like that to run 30+ FPS, unless everything Python-side is carefully written to do everything on C side. At the same time, I'd assume the inference/ML part is the fastest one, because, as far as I understand how NNs work, they're supposed to be blazingly fast once trained (it's just lots of parallelizable linear algebra). Is the inference part in your solution doing anything more complicated than that in real-time?
A modern laptop will run BodyPix at about 30 fps. There could be additional bottlenecks, but deep (and wide) NNs are usually not super fast; they're just fast for the wondrous things they do.
You can usually alter performance (with BodyPix that's an accuracy/speed tradeoff) or do something silly like downscale, run, and upscale the mask. I'd like to try this.
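A rough sketch of that downscale/run/upscale trick, where segment_fn is a placeholder for whatever model call returns a uint8 mask:

    import cv2

    def mask_at_low_res(frame, segment_fn, scale=0.5):
        # Run segmentation on a shrunken copy of the frame
        h, w = frame.shape[:2]
        small = cv2.resize(frame, (int(w * scale), int(h * scale)))
        mask = segment_fn(small)
        # Blow the mask back up to the original frame size
        return cv2.resize(mask, (w, h), interpolation=cv2.INTER_LINEAR)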
BodyPix does downsample before masking out of the box; the article is using 'medium' (50%) (though for this script we ought to move that over to the Python side). It's still not 30 fps without egregiously sacrificing quality, at least on my (fairly powerful) machine, unless I've missed something.
Amusingly I did some hacking on this and the current bottleneck is actually reading from the webcam which is capped at <10fps without doing anything else. Switching the capture to MJPG helps.
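For reference, forcing MJPG (plus an explicit resolution and FPS) in OpenCV looks roughly like this; the device index and 720p numbers are placeholders for whatever your camera supports:

    import cv2

    cap = cv2.VideoCapture(0, cv2.CAP_V4L2)
    # Request MJPG instead of raw YUYV; many webcams only reach 30 fps at 720p
    # when compressed frames are sent over USB 2.0.
    cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    cap.set(cv2.CAP_PROP_FPS, 30)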
Okay, now please use your skills around AI and virtual webcams to create a script that just generates a picture of me that nods at the right moments during a Zoom call ;)
Fascinating write-up Ben, who would have known that you were a genius with image processing as well as running containers :-) Love the gory details and I didn't know about pyfakewebcam either.
Do you have a live video recorded showing how quickly it can process a stream?
The demo at the end of the page is a video (webm), but there's not a ton of motion to reference besides the blinking.
IIRC it's something like 10 FPS currently, which has been sufficient for meetings so far (about a third of what you might get with sufficient bandwidth in most video conference tools).
Amusingly the current bottleneck is actually reading from the webcam with the suboptimal ~default capture config. Without doing anything else that's ≤ 10fps. Low hanging fruit still :-)
Great read!
I am curious then why the Linux client doesn't support this, if all it takes is sending our webcam stream out to be processed server-side?
P.S.
What happens when they do e2ee on the webcam stream?
Why use expensive server side processing when you can do it for free on the client? Doing it on the client could theoretically even save bandwidth: if the real background is a changing scene, replacing it with a static image leaves less motion for the encoder to transmit.
Obviously Zoom isn't end-to-end encrypted; it's client-server encrypted.
I think server side processing is really not a good idea for this. Zoom is really seeing a lot of use right now. It would not be sustainable for them to not take advantage of all the computing power of the clients.
Now, I know that there are many companies that force people to be on camera with a live video feed, and that many people don't really like it.
How about recording a 3-minute clip and playing it in an infinite loop to create a fake feed (remember Keanu Reeves' Speed?), so that people still appear to be present on camera but can actually get things done? A mask on the face is a simple addition to avoid detection. As the saying goes, modern problems require modern solutions!
Thanks for sharing! I have been working on the exact same project with Tensorflow & BodyPix, really helpful to compare notes & see the pyfakewebcam approach!
The magic is pyfakewebcam and v4l2loopback. I was looking for a way to turn myself into a potato on Teams, and the bit I was missing was how to create a virtual webcam.
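For anyone else missing the same piece, the core loop is tiny once pyfakewebcam is installed; this sketch assumes the real camera is /dev/video0 and the v4l2loopback device is /dev/video2 (both device numbers will vary on your machine):

    import cv2
    import pyfakewebcam

    WIDTH, HEIGHT = 1280, 720
    real = cv2.VideoCapture("/dev/video0", cv2.CAP_V4L2)          # physical webcam (assumed path)
    fake = pyfakewebcam.FakeWebcam("/dev/video2", WIDTH, HEIGHT)  # v4l2loopback device (assumed path)

    while True:
        ok, frame = real.read()
        if not ok:
            break
        frame = cv2.resize(frame, (WIDTH, HEIGHT))
        # ...apply your potato filter / background replacement here...
        fake.schedule_frame(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))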
Pretty cool write up of how this can be replicated with opencv and out of the box libraries like bodypix. I imagine Zoom is using something like this too.
Docker made it easier to package up the dependencies (especially aligning cuda etc.), and containers are my dayjob :-)
The web requests are just an easy mode of IPC to pass around some bags of bytes, "high frame rate" is at most 30 qps ... that part isn't really interesting performance wise and this isn't a production tool :-)
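Roughly, the Python side just posts the raw frame bytes and gets a mask back; the endpoint and header names in this sketch are assumptions for illustration rather than the exact interface the script uses:

    import numpy as np
    import requests

    def request_mask(frame, url="http://127.0.0.1:9000"):
        # Ship the raw frame bytes to the Node/BodyPix sidecar and read back
        # a single-channel mask of the same width/height.
        h, w = frame.shape[:2]
        resp = requests.post(url, data=frame.tobytes(),
                             headers={"width": str(w), "height": str(h)})
        return np.frombuffer(resp.content, dtype=np.uint8).reshape((h, w))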
I'm not sure I'd be so confident about tensorflow.js being that fast on the CPU ... you can see a marked difference between the backends: https://www.tensorflow.org/js/guide/platform_environment