3D Video Capture with Three Kinects (doc-ok.org)
188 points by phoboslab on May 13, 2014 | 40 comments



Apparently you can shake a Kinect so that its field of view can overlap with other Kinects' without interference. More Kinects would make this already mind-blowing demo even more mind-blowing, and allow more people in the room (and perhaps open up cross-calibration).

http://www.precisionmicrodrives.com/tech-blog/2012/08/28/usi...


Now that is what I call a clever hack.


That's for the generation 1 Kinects though, which project a dot pattern and calculate distances from the dot offsets. Since the camera and projector are in the same housing, they will shake synchronously, and only the dot pattern that is projected by that Kinect will show up clearly in its camera image.

As far as I know the new Kinects use time-of-flight technology though (which is the reason they bought PrimeSense), which sends out short pulses of light and times when they arrive back. Since there is no dot pattern to blur, the shaking technique won't work, to my knowledge.
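
To make the contrast concrete, here's a minimal sketch of the time-of-flight principle in Python (an illustration only; the function and numbers are mine, and the Kinect 2 actually uses multi-frequency modulation rather than single pulses):

    # Depth from the round-trip travel time of an emitted light pulse.
    SPEED_OF_LIGHT = 299_792_458.0  # m/s

    def depth_from_round_trip(t_seconds: float) -> float:
        """Depth in metres: the pulse travels to the target and back."""
        return SPEED_OF_LIGHT * t_seconds / 2.0

    # A surface 2 m away returns the pulse after ~13.3 nanoseconds.
    print(depth_from_round_trip(13.34e-9))  # ~2.0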


>which is the reason they bought PrimeSense

Nope. MS licensed the first-gen Kinect technology from PrimeSense; Apple is the company that acquired it.

The ToF Kinect was developed fully in house. Here's the published paper on the depth sensor in ISSCC2013:

[A 512×424 CMOS 3D Time-of-Flight image sensor with multi-frequency photo-demodulation up to 130MHz and 2GS/s ADC]

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=675737...


Oops, thanks for correcting me on that point. I only remembered that there was some kind of Kinect-related acquisition (I think there was an article a couple of months back), but it was late at night yesterday and I was way too tired to look up the article.


Now that's a sweet hack indeed, thanks for the link :)


I'm struck by the vignette involving the table leg, in which Oliver describes it as feeling "unnatural" to pass his Kinect-sensed leg through the virtual table leg, even though there was no obstruction in "real life." I believe this is a demonstration of fully convincing immersion, and Oliver is sure to point out its implications for the uncanny valley. This is so exciting from the perspective of social/cognitive science, too. I'm floored.


Looks good, kudos :) Although it painfully reminds me that we did something similar some years ago[1]; it was even a fight to convince the project's official contractor (aka my client, although after 8 years on-site full-time I guess I'm more like the second-longest employee by now, lol) that we could do such a proof of concept, and furthermore to use the Kinect for it, given that it was started back when only the OpenNI driver without audio was available and there was no official SDK from Microsoft.

Sadly we never got the funding to go further and do multi-camera capture, and I had to move on to other urgent things, so I'm glad to see others might get to solve it: imho there are many applications, even simple things like better video conferencing using 3D capture viewed in the Oculus :)

One suggestion however: the "fat points" point-cloud rendering of potree[2] might improve the appearance of the generated model compared to using meshes; could be worth a try.

-----------

[1] http://ivn.net/demo.html (you can skip the cheesy first minute of the video)

[2] http://potree.org


Too bad they don't make Kinects with different IR wavelengths. It would solve a lot of the problems with colliding data, and allow you to use more Kinects. I don't imagine it is simple though, because if I understand how diffraction gratings work (that's what produces the IR dot pattern), they're designed to work for specific wavelengths. If you send another wavelength through it, you're not going to get the original pattern. And since the depth-sensing algorithm is hard-coded in the hardware, you wouldn't be able to use this new pattern to detect depth.
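
To illustrate the wavelength dependence, here's a small sketch of the grating equation d·sin(θ) = m·λ in Python (the groove spacing is an arbitrary illustrative value; the real projector uses a more complex diffractive optical element):

    import math

    def order_angle_deg(wavelength_m: float, groove_spacing_m: float, order: int = 1) -> float:
        """Angle of the m-th diffraction order, from d * sin(theta) = m * lambda."""
        return math.degrees(math.asin(order * wavelength_m / groove_spacing_m))

    d = 5e-6  # illustrative 5 micrometre groove spacing
    for lam in (830e-9, 900e-9):  # two near-IR wavelengths
        print(lam, order_angle_deg(lam, d))
    # The first-order angle shifts from ~9.6 to ~10.4 degrees, so the whole
    # projected pattern moves and the hard-coded correlation no longer matches.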


The new ones shouldn't have this problem since the time-of-flight method isn't as vulnerable to interference... I think.


It is; it sends modulated IR at you, so two Kinect 2s will interfere.


It's very directional though, isn't it? I'd think there won't be nearly as much scatter as with dot tracking like the Kinect 1.


There really is something amazing about this setup that neatly bypasses the uncanny valley.

Toward the last quarter of the film, at one point he clips through the table, and it was shocking to see, but even after seeing it, when he moved back out, it still felt like a "person" more than a "CGI Ghost".


Right. An excellent way to avoid the uncanny valley is to not even attempt to cross it.


This is cool, but I have to say I'm basically holding off on getting excited about Kinect stuff until the Kinect 2 gets out there to these same researchers and hackers. They are going to have a goddamn field day. The Kinect 2 is a straight-up future toy. It's going to make these fabulous Kinect experiments look like 64K scene demos. I can't wait.


Except for the fact that Kinect 2 for Windows only supports Windows 8... I don't know of any good hackers or developers who use Windows 8 yet.


Good hackers or developers use whatever tools they need to get the job done. There are plenty using Windows 8.


My recollection is that part of the uptake of the Kinect for this sort of hacking corresponded with the development of an open source driver [0]. I imagine something similar will happen for the Kinect 2, rendering the Win8 support issue moot.

[0] https://en.wikipedia.org/wiki/Kinect#Open_source_drivers



Good hackers and developers usually have virtual machines.


Most graphics programmers are on Windows, and I guess it won't be long before almost everybody is on some kind of Win 8.x... 8.1 is quite decent after all.


They do exist though.


One way of improving the original Kinect would be swapping the visible-light camera module for something that does full HD; there should be plenty of space inside the Kinect to do that mod.

There is enough 3D data in the Kinect stream, but 640x480 video is just pathetic.


So I'm thinking 5 Kinect-IIs, a common wooden table, some hardware, and you could have a team room/meeting room in virtual 3-D with folks from all over the world?

Once you made something like this, then you'd start writing apps for it -- I would imagine you'd start off with virtual "pictures" for the walls that could have a web browser, spreadsheet, etc. built in. Then you could work up to truly interactive 3-D tools, but I'm not sure users could easily grasp moving to holographic toolsets right off the bat. It's an interesting marketing question.


I've been wondering whether voxels or light fields will win the 3d video war. This is the first cheap voxel capture I've seen working well.

Voxels are nice because they are well understood by most 3d developers, and have the same spatial resolution characteristics as we are used to on 2D formats.

Light fields on the other hand have easier capture going for them, don't change transmission formats (a light field can be transmitted in a 2d video or image), and don't suffer from interference problems.

I'm excited to see what happens.


A bit offtopic:

Does anyone know of a DIY lidar project or a cheap lidar?

(Lidars are usually very expensive; e.g. the lidar that Google uses for its autonomous cars costs 78k dollars.)



Someone said the Kinect 2 uses time of flight. I don't know if it's light or lasers or sound, but I think maybe some kind of light.


Try looking up the Neato XV-11 - it's a vacuum cleaner that uses lidar and can be had for ~$400.


Alright Google, listen up. I think it is time to attach a bunch of Kinects or Kinect-like devices to drones and have them 3D-map the country, or at least major cities. Then let me attach a VR headset that isn't the Oculus Rift to my computer and take virtual tours of cities, navigating with a game controller. Call it "World View" or "Street View++".


This may just be my thought pattern, but I could see this being rather useful to the porn/webcam industry.


This needs multiplayer so badly. Even locally so you can stand next to someone and interact with them.


Check out his earlier vid on collaborative visualization:

http://www.youtube.com/watch?v=B4g9J-aSF-c

Lots of interesting UX ideas in there.


I wonder how much bandwidth is needed for even one user broadcasting this setup across the Internet.


Surprisingly, not a lot more than 1080p video!

There's no reason the color and depth data produced by the Kinects can't be encoded and compressed in a "regular" video stream.

A very naive approach (where you simply stitch the images side by side in a larger video stream) would require you to transmit 6 separate 640x480 video streams.

640 x 480 = 307,200 pixels

A 1080p video stream, which can now easily be streamed to any decent household internet connection, has 1920 x 1080 pixels.

1920 x 1080 = 2,073,600

2,073,600 / 307,200 = 6.75

So the 6 separate video streams would fit just right into it!

I'm not taking into account some factors, like:

* The effect of existing video compression algorithms on depth maps (might cause some severe artifacts, since they're tuned for color vision perception)

* The fact that the depth map has a single color channel, and can probably be represented more efficiently than a full color RGB 640x480 image.

* Framerate (60 fps is probably needed for a more immersive feel)

But I do think it's very feasible to stream this type of 3D video in real time with current Internet speeds.
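
A quick back-of-the-envelope check of the tiling argument (a sketch under the same assumptions as above: three Kinects, each contributing one 640x480 colour stream and one 640x480 depth stream):

    # Naive side-by-side packing of 6 VGA streams into one 1080p frame.
    KINECTS = 3
    STREAMS_PER_KINECT = 2            # colour + depth
    W, H = 640, 480                   # per-stream resolution
    HD_W, HD_H = 1920, 1080           # 1080p container frame

    pixels_needed = KINECTS * STREAMS_PER_KINECT * W * H   # 1,843,200
    pixels_available = HD_W * HD_H                         # 2,073,600

    print(pixels_available / (W * H))          # 6.75 VGA tiles per 1080p frame
    print(pixels_needed <= pixels_available)   # True: 6 tiles fit with room to spare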


Wouldn't it be more similar to the kinds of streams used in 3D online multiplayer games?


Not in this case, because the raw scanner data of the person has to be compressed and transmitted, while 3D multiplayer games only have to transmit the positions and actions of models that all the computers already have.


This blew my mind! Amazing!


Glad to see the Matrix is coming online smoothly.


More like cyberspace as cyberpunk predicted. Now we only need cybernetic limbs and 80s fashion to return.



