+1 that the meta-learning spin on this approach is the really interesting part. The normal approach would be as follows:
"you want to stack 6 blocks on one another? great, let me collect 1,000 examples of doing that in VR, and I'll train my policy on this and see how that works"
instead, we change the question:
"you want to stack 6 blocks on one another? great, that's one possible thing out of thousands you might want to do. so let's create a dataset of 1,000 tuples: one 'query' demonstration, and a second demonstration as the target behavior to train the network on when it sees the query. The training data is now 1,000 tuples of (query_demo, target_demo), and we train on it with supervised learning again."
Once this is trained, we can sub in (in theory) any arbitrary desired demonstration, and the network will have learned how to "extract" what is intended, using the demonstration as a crutch to imitate. It's a bit of a change of mindset, but a very powerful one, a much more general one, and a much more exciting one.
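To make the data layout concrete, here's a rough sketch of how those (query_demo, target_demo) tuples might be assembled. It only shows the dataset construction, not the network or the training loop. Everything in it is made up for illustration: `build_pairs`, `toy_demos`, and the dict fields are hypothetical names, and it assumes that both demonstrations in a tuple show the same task.

```python
import random


def build_pairs(demos_by_task, num_pairs, seed=0):
    """Sample (query_demo, target_demo) tuples for supervised training.

    Assumption for this sketch: both demos in a tuple come from the same task,
    so the query tells the network which behavior the target shows.
    """
    rng = random.Random(seed)
    # Only tasks with at least two demonstrations can supply a distinct pair.
    tasks = [t for t, demos in demos_by_task.items() if len(demos) >= 2]
    if not tasks:
        raise ValueError("need at least one task with two or more demonstrations")
    pairs = []
    for _ in range(num_pairs):
        task = rng.choice(tasks)
        # Two different demonstrations of the same task.
        query_demo, target_demo = rng.sample(demos_by_task[task], 2)
        pairs.append((query_demo, target_demo))
    return pairs


# Hypothetical toy data: each task has a few recorded demonstrations.
toy_demos = {
    "stack_ab": [{"task": "stack_ab", "demo_id": i} for i in range(3)],
    "stack_ba": [{"task": "stack_ba", "demo_id": i} for i in range(3)],
}

pairs = build_pairs(toy_demos, num_pairs=5)
for query_demo, target_demo in pairs:
    print(query_demo, "->", target_demo)
```

The point of the layout is that the task never appears as a label: the only way for the network to know what to do is to read it off the query demonstration.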
karpathy, I see you are at Stanford for Deep learning and NLP... I'm working on a project for audio/sound classification and have been sniffing around for some folks who may have encountered a similar set of feature points for audio data in deep learning. Would you be open to connecting? If so, let me know an email or other way to contact you and I'll reach out.
For some constructive feedback, this is a really awkward way of getting in touch with someone. Assume they're busy and professional. You're sending them a message to ask them to send you a message to tell you how to send them a message...
If you want to contact someone, check their public profile on their website and see if they've said there's a preferred way (some people want everything to a certain email address, call them directly, never call them, flat out tell you not to contact them or more commonly just say email them and get to the point). Follow whatever they suggest.
Write something simple and clear, and be upfront about what you're asking for. Make it as easy as possible for the person to help you (this applies to both reading and answering the question). I'm far more likely to reply if I can open an email, type a sentence or two, and then move on.
With your message, I don't know if you're just after datasets, help with a particular problem, a mentor, a business partner or what. I also don't know which area of audio/sound classification you mean, so even if I were in that area I wouldn't know right now whether I could help (whereas if you'd said human voices, bird chirps, etc. I'd have a better idea).
Essentially, assume most people are pleasant and helpful but also extremely busy.