This is a great example of how we (collectively) are using virtual worlds as "dreams" for Deep learning.
In the Deep Vision world, we as a group are trying to segment, classify, and reinforce our NN training on labeled real-world data. The challenge is that labeling data - images in particular - is a very manual process. The more we can do inside the computer, for example automatically labeling pixels inside an image, without having to acquire and label real-world data (or by making it easier to do with real-world data), the easier and faster training becomes for real-world use cases.
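To make the "automatic labeling" point concrete: a renderer already knows which object every pixel came from, so a per-pixel segmentation mask comes for free. This is a minimal sketch, assuming a hypothetical object-ID buffer (most engines expose something like an "ID pass") and a made-up scene inventory:

```python
import numpy as np

# Hypothetical example: a renderer's object-ID buffer gives per-pixel
# labels for free -- no manual annotation needed. The 4x4 buffer and
# the ID-to-class mapping below are invented for illustration.
id_buffer = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 2, 2, 3],
    [3, 3, 3, 3],
])

# Assumed scene inventory: which semantic class each object ID belongs to.
id_to_class = {0: "wall", 1: "window", 2: "sofa", 3: "floor"}

def auto_label(ids, mapping):
    """Turn an object-ID buffer into a per-pixel class-label mask."""
    return np.vectorize(mapping.get)(ids)

mask = auto_label(id_buffer, id_to_class)
print(mask[1, 1])  # pixel rendered from object 2 -> sofa
```

With real-world footage you'd have to draw and label that mask by hand; inside the computer it's a lookup.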
The trick is making the virtual world match the real world as closely as possible so that the nets we make are accurate representations of real world scenarios.
In order to transfer learnings from virtual worlds to the real world more efficiently, I imagine a module that translates real-world footage into the same abstract representation used for rendered footage.
In other words, you would train a NN on footage of a walkthrough of a real building to output rendered footage from the same path (the 3D data from architects already exists). The middle layer of this quasi-autoencoder would then be the basis for training fully simulated tasks, e.g. autonomous vehicles. In a way, it would be similar to colorizing b/w footage. Would that scale training data?
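Structurally, the idea could be sketched like this. This is only a shape-level sketch, not a trained model: the single linear layer per stage, the frame resolution, and the latent size are all placeholder assumptions.

```python
import numpy as np

# Shape-level sketch of the "quasi autoencoder" idea: encode real
# footage into an abstract middle representation, decode that
# representation into the rendered-footage domain. Weights are random
# placeholders; training is deliberately omitted.
rng = np.random.default_rng(0)

FRAME = 64 * 64 * 3   # flattened input frame (assumed resolution)
LATENT = 128          # assumed size of the abstract middle layer

W_enc = rng.normal(scale=0.01, size=(LATENT, FRAME))  # real -> latent
W_dec = rng.normal(scale=0.01, size=(FRAME, LATENT))  # latent -> rendered

def encode(real_frame):
    return np.tanh(W_enc @ real_frame)

def decode(latent):
    return W_dec @ latent

real_frame = rng.random(FRAME)      # stand-in for one real video frame
latent = encode(real_frame)         # the shared abstract representation
rendered_like = decode(latent)      # should approximate the rendered frame

# Training (omitted) would minimize the difference between
# rendered_like and the true rendered frame along the same camera
# path; downstream simulated tasks would then train on `latent` alone.
```

The key point is that the middle layer, not the pixels, becomes the common currency between real and rendered worlds.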
Yeah, that is basically what we do with our home furnishings app - except we have to use SfM (structure from motion) to build the models.
The challenge is labeling - or autolabeling pixels.
One thing we are trying to work out is how you label and build nets on volumes rather than just pixels. I'm thinking it's going to be an order of magnitude harder.
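A back-of-the-envelope sketch of why volumes are so much harder: the number of elements to label grows with the extra spatial dimension. The resolution here is an arbitrary illustrative choice:

```python
import numpy as np

# Illustrative only: compare how many elements need a label per sample
# when you go from a 2D image to a 3D volume at the same resolution.
res = 256
pixels = res ** 2   # elements in one labeled image
voxels = res ** 3   # elements in one labeled volume

print(voxels // pixels)  # -> 256, i.e. 256x more elements per sample

# A labeled voxel grid is just the 2D label mask with one more axis:
labels_2d = np.zeros((res, res), dtype=np.uint8)       # per-pixel classes
labels_3d = np.zeros((res, res, res), dtype=np.uint8)  # per-voxel classes
```

And that's before accounting for the fact that most annotation tooling and most pretrained nets are built around 2D images in the first place.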
I was surprised, since I didn't think it would be realistic enough for this purpose, but Grand Theft Auto V is being used in this manner to train self-driving cars.