He is also merging images together, so he'd need to run the raw (unflipped) feed through a video processing program, send the flipped result back out to his video switcher, and then capture that stream again. The switcher is how he adds video or PowerPoint slides to the talk, and it's really the switcher that keeps this whole thing a minimal post-production affair. So instead of adding another computer to the mix, he solves the problem with a $25 mirror that isn't very difficult to troubleshoot.
This could all be done very easily in software like Max/MSP. The only extra hardware would be capture cards, and even those aren't needed if the video cameras already stream over USB.
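For anyone curious what the software route amounts to, here is a rough sketch in plain Python rather than Max/MSP. It assumes a frame is just a list of rows of grayscale pixel values; the names `flip_horizontal` and `blend` are made up for illustration and don't come from any particular video library. A real setup would do the same two operations on live frames coming off the capture card.

```python
def flip_horizontal(frame):
    """Return a mirrored copy of a frame (a list of pixel rows)."""
    # Reversing each row swaps left and right, which is what the mirror does.
    return [row[::-1] for row in frame]


def blend(frame_a, frame_b, alpha=0.5):
    """Merge two same-sized grayscale frames; alpha is the weight of frame_b."""
    return [
        [round((1 - alpha) * a + alpha * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


# Hypothetical 2x3 frames standing in for the camera feed and the slides.
camera = [[1, 2, 3], [4, 5, 6]]
slides = [[9, 9, 9], [9, 9, 9]]

mirrored = flip_horizontal(camera)
merged = blend(mirrored, slides, alpha=0.5)
```

The flip itself is trivial either way. The real cost of the software approach is the extra machine and the round trip through it, which is what the mirror avoids.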