Microsoft has poured a lot of money into basic research over the years, which did not seem to pay off much in terms of popular products. Apple, Nintendo, Google, etc. seemed to come out with more innovative products despite spending nowhere near what Microsoft spends on R&D.
This looks like a big exception to that. Johnny Lee, as a recently minted PhD, certainly knows the state of the art in this field, so if he says it is way beyond what other researchers are doing I believe him.
If they can deliver the kind of sensing, voice and face recognition in the demo, along with higher end graphics than the Wii, they should easily own the next generation of consoles.
This isn't about the next generation of consoles, this is about the next generation of input technology.
Apple won multi-touch, so Microsoft is going to push the conversation ahead to gesture-based ubiquitous computing devices. Once they get the cameras small enough, they can just stick these things on Accelerando-style glasses or a wifi-enabled necklace akin to the SixthSense TED demo; that's when we have true ubiquitous computing.
I think the key word there is probably "product"; there have certainly been some great demos that have come out of MSR. A thought that comes to mind is that this could be the ideal product for those demonstrators: the concept has been proven successful by Nintendo, and they can apply their research to take it to the next level.
Holy crap. I hadn't bothered to look at the video about this project yet (I thought only Nintendo cared about the way I want to game, since the Xbox and PS3 distinctly lack all the stuff that makes gaming fun for me on the Wii). But that is simply astonishing. I knew the Wiimote wasn't anywhere near the peak of technology in this direction, but this is dramatically cooler than anything I expected in the next couple of years.
So, when can I buy it? I'd buy an Xbox to play games that are controlled this way.
Maybe I'm paranoid, but isn't he basically saying that the demos don't reflect what's possible today in a shipping device, i.e. the proverbial "vapourware"?
Other comments, here and elsewhere, reflect everyone loving this, yet when I watch the demos I think a) that's not actually possible, and far more importantly b) even if it were possible, it doesn't look like fun.
I was watching the conference live, and I had the same reaction: "There's no way this actually works." Then they brought out a lady who did a live demo with it... I can't find that video right now, but it was very similar to this: http://www.youtube.com/watch?v=L3vWzzoLHrM
Notice how you can clearly see the on-screen body mapping to the person's body. Between that and the fact that I doubt they would make such a big announcement if it didn't already work fairly well, I'm now on the "this probably works" side.
Well, I don't know that it really matters. If Microsoft knows what they're doing, the only way they're going to release this thing is either bundled with a game that's expected to be popular, or as a part of the next Xbox.
Console peripherals released in other ways have historically not been successful, no matter how good they are.
Given this, I'd bank on them shipping the thing with the next Xbox. That should give them a few years to work the bugs out.
Want to clarify and expand on (a) for us? "Impossible" is a big word.
As to (b): the voice recog Buzz looked pretty amazing to me. Some of the minigames, not so much... The face tracking stuff was cool, though.
(I think MS might be setting themselves up for a fall, though. I agree this comment makes it look like this tech won't work as well as the ad makes it out to be.)
This is phenomenal, but the devil is in the details in terms of sensitivity. They have to be using a seriously good camera and processor to get any real level of precision.
"The CPU, named Xenon at Microsoft and "Waternoose" at IBM, is a custom triple-core PowerPC-based design by IBM. The CPU emphasizes high floating point performance through multiple FPU and SIMD vector processing units in each core. It has a theoretical peak performance of 115.2 gigaflops and is capable of 9.6 billion dot products per second. Each core of the CPU is simultaneous multithreading capable and clocked at 3.2GHz. However, to reduce CPU die size, complexity, cost, and power demands, the processor uses in-order execution in contrast to the Intel Coppermine128-based Pentium III used in Xbox which used more advanced out-of-order execution.... A 21.6 GB/s front side bus, aggregated 10.8 GB/s upstream and downstream, connected Xenon with the graphics processor/northbridge. Xenon was equipped with a 1 MB Level 2 cache on-die running at half CPU clock speed. This cache is shared amongst the three CPU cores."
I don't think the processor will be the problem. The camera could be.
Agreed, the devil in this design will be the camera. From the video it looks like they're using two. My guess is one high-resolution camera to produce reasonable pictures and live video (as they show), and a lower-resolution companion to triangulate from. I'd also guess that the companion sensor will be placed further from the main sensor in the final product.
Who knows, they could have a secondary sensor that you place on the side of the TV, away from the main sensor module. Distance between sensors is usually the key to reducing processing requirements.
It's their project vision, so who knows what they'll have to do to get the job done.
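For anyone curious why the distance between sensors matters, here's a minimal sketch of rectified stereo triangulation in Python; the focal length, baseline, and depths are made-up placeholder numbers, not anything Microsoft has disclosed:

    # Minimal rectified-stereo sketch; all numbers are hypothetical.
    # For a rectified camera pair, depth Z = f * B / d, where f is the focal
    # length in pixels, B is the baseline between the cameras, and d is the
    # disparity (horizontal pixel shift of a feature between the two images).

    def depth_from_disparity(disparity_px, focal_px=600.0, baseline_m=0.06):
        """Depth in meters for a given pixel disparity."""
        if disparity_px <= 0:
            return float("inf")  # no shift: feature is effectively at infinity
        return focal_px * baseline_m / disparity_px

    # Why a wider baseline helps: at 2 m, a 6 cm baseline yields 18 px of
    # disparity while a 20 cm baseline yields 60 px, so a 1 px matching error
    # costs far less depth accuracy with the wider spacing.
    for b in (0.06, 0.20):
        d = 600.0 * b / 2.0  # disparity produced by a target at 2 m
        err = abs(2.0 - depth_from_disparity(d + 1.0, baseline_m=b))
        print(f"baseline {b:.2f} m: {d:.0f} px disparity, "
              f"~{err * 100:.1f} cm error per 1 px mismatch")

The same relationship is why parking the companion sensor on the side of the TV, as suggested above, would buy accuracy (or cheaper matching) essentially for free.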
Where is the second camera? To glean three-dimensional data from a single two-dimensional image is hard, and requires intense algorithms. Two cameras make it easier, but it's still hard.
I would love nothing more than for them to publish this and/or open it up to community development.
3DV, the company whose 3D camera technology MS bought a few months ago, has published some papers on their technology. I haven't read them, so I cannot vouch for their depth (no pun intended).
While you're correct, two cameras aren't strictly mandatory for decent depth perception. It would, however, require a new sensor type.
Humans are capable of sensing depth with only one eye; this is due to the shape of our eye. Cameras work by receiving a 3D image on a 2D sensor. Our eyes work by receiving a 3D image on a 3D sensor. Our focal point is akin to a 2D sensor, but our peripheral vision wraps around a huge portion of our eye, which gives us the ability to triangulate.
While I doubt they'll have come up with a 3D sensor, I wanted to make the point that depth perception with a single sensor is presently difficult for a computer. It would be entirely possible to design a more complicated camera to compensate in hardware for what the algorithms would otherwise have to do.
From the video, I'd guess they're using two similar-resolution cameras. Or they might be cheaping out: a 12MP camera would give you amazing picture resolution, but for 3D recognition in a project like this you could probably get away with something under 1MP as the companion. This is essentially how one of our eyes works: we have incredible resolution at the focal point, while our peripheral vision is rather poor, and our brain superimposes the scanned images (quite literally, our eye scans the environment using the focal point) and fakes everything.
I'm sure using a small, cheap companion sensor would provide 3D extremely well for movement recognition.
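If the companion sensor really is far below 1MP, the fusion step could be as simple as registering the coarse depth map onto the high-resolution color grid. A toy nearest-neighbor version of that idea (the resolutions are invented for illustration):

    import numpy as np

    # Hypothetical resolutions: coarse depth companion vs. high-res main camera.
    DEPTH_RES = (120, 160)
    COLOR_RES = (1080, 1440)

    def upsample_depth(depth, color_shape):
        """Nearest-neighbor upsample of a coarse depth map onto the color grid."""
        h, w = depth.shape
        H, W = color_shape
        rows = np.arange(H) * h // H   # map each color row to a depth row
        cols = np.arange(W) * w // W   # map each color column to a depth column
        return depth[np.ix_(rows, cols)]

    coarse = np.random.uniform(0.5, 4.0, DEPTH_RES)  # fake depths in meters
    dense = upsample_depth(coarse, COLOR_RES)
    assert dense.shape == COLOR_RES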
I was at the event yesterday and they showed a bar with two cameras in the middle, which you can see in the video.
The nice thing about the Xbox 360 and PS3 for these kinds of input devices is that they're multicore, so the budgeting is pretty straightforward if they simply consume one core at all times.
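A sketch of that one-core budgeting idea, in Python purely for illustration (nothing here reflects the actual Xbox toolchain; estimate_pose is a placeholder, and sched_setaffinity is Linux-only):

    import os
    from multiprocessing import Process, Queue

    def estimate_pose(frame):
        """Stand-in for the real per-frame skeletal tracking."""
        return f"skeleton for {frame}"

    def tracking_worker(frames: Queue, skeletons: Queue) -> None:
        # Pin this process to a single core so the game simulation and
        # rendering keep the remaining cores entirely to themselves.
        os.sched_setaffinity(0, {2})
        while (frame := frames.get()) is not None:
            skeletons.put(estimate_pose(frame))

    if __name__ == "__main__":
        frames, skeletons = Queue(), Queue()
        Process(target=tracking_worker, args=(frames, skeletons),
                daemon=True).start()
        frames.put("camera frame 0")
        print(skeletons.get())
        frames.put(None)  # shut the worker down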
I was cringing when they had the live avatar using the motion capture, though. It was gutsy and impressive, but seriously, guys, probabilistic kinematic constraints. People don't usually have their wrist twisted all the way around when they're standing in place, even if it's physically possible.
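One cheap version of such a constraint: give each joint an anatomically plausible range, then either clamp the tracker's output to it or penalize pose hypotheses by how far they stray. A toy sketch (the ranges are rough guesses on my part, not biomechanics data):

    # Illustrative joint limits in degrees; real values would come from
    # anthropometric data, not these guesses.
    JOINT_LIMITS = {
        "wrist_twist": (-90.0, 90.0),
        "elbow_flex": (0.0, 150.0),
    }

    def clamp_pose(pose):
        """Snap each joint angle into its plausible range."""
        clamped = {}
        for joint, angle in pose.items():
            lo, hi = JOINT_LIMITS[joint]
            clamped[joint] = max(lo, min(hi, angle))
        return clamped

    def pose_penalty(pose, sigma=20.0):
        """Quadratic penalty that grows once a joint leaves its range, so an
        optimizer prefers anatomically likely interpretations of the data."""
        cost = 0.0
        for joint, angle in pose.items():
            lo, hi = JOINT_LIMITS[joint]
            overshoot = max(lo - angle, angle - hi, 0.0)
            cost += (overshoot / sigma) ** 2
        return cost

    glitch = {"wrist_twist": 175.0, "elbow_flex": 30.0}  # tracker misfire
    print(clamp_pose(glitch), pose_penalty(glitch))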
I'm very pleased with Microsoft's vision, though. I think it's a great step in the right direction for the industry. I do tend to wonder, though, if there is a fundamental difficulty in an input device that would seem to be inherently more fuzzy than the Wii controller.
I'm not understanding the optimism. I'm sure Natal will be another major step forward, like the Wii was. What it will not be is a quantum leap. The video was clearly ridiculous.