As a follow-up to your idea, we should explore two paths: first, build the most powerful AI we can; second, build subsystems designed to be interpretable. The powerful method could then be used to train the interpretable one. In effect, we need an interpreter that translates from machine AI to human understanding, and interpretable systems provide that middle ground.
I think training one function to approximate another wouldn't help much; we'd inevitably lose the subtleties of the more powerful function and any insights that come with them. If we could train a decision tree to do what a CNN does and then interpret the result, why not use decision trees in the first place?
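To make the distillation idea concrete, here is a minimal sketch of what "training the interpretable method from the powerful one" would look like: fit a shallow decision tree on a CNN's own predictions. The teacher model `teacher_cnn` and the input array `X_train` are assumed to exist already and are placeholders, not a specific library's API.

```python
import torch
from sklearn.tree import DecisionTreeClassifier

def distill_to_tree(teacher_cnn, X_train, max_depth=6):
    """Fit a shallow decision tree to mimic a trained CNN's predictions."""
    teacher_cnn.eval()
    with torch.no_grad():
        logits = teacher_cnn(torch.as_tensor(X_train, dtype=torch.float32))
        teacher_labels = logits.argmax(dim=1).numpy()  # the teacher's hard labels

    # The tree learns the teacher's outputs, not the ground truth; any nuance
    # the logits carried beyond the argmax is already gone at this point.
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X_train.reshape(len(X_train), -1), teacher_labels)
    return tree
```

Measuring how often the tree and the CNN agree on held-out data would at least quantify how much of the CNN's behaviour survives the approximation, which is exactly the loss I'm worried about.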
I think the answer must lie in figuring out how to decompose the black box of a CNN - it is, after all, just a stack of simple algebraic operations, and we should be able to get something out of inspecting them.
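As a rough sketch of what "inspection" could mean in practice, assuming a trained PyTorch CNN: register hooks on every leaf layer and record what each simple operation produces, so the intermediate tensors are available for study. The model and input are assumptions, not part of any specific proposal here.

```python
import torch
import torch.nn as nn

def capture_activations(cnn: nn.Module, x: torch.Tensor):
    """Run one input through the CNN and record every leaf layer's output."""
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Hook only the leaf modules (conv, ReLU, pooling, linear): each one is
    # a plain algebraic operation whose output we can look at directly.
    handles = [module.register_forward_hook(make_hook(name))
               for name, module in cnn.named_modules()
               if len(list(module.children())) == 0]
    with torch.no_grad():
        cnn(x)
    for handle in handles:
        handle.remove()
    return activations
```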
I have to imagine Hinton et al. have done work in this regard, but this is far afield for me, so if it exists I don't know it.
A machine that gives you feedback in the middle of the game could perhaps be used to describe where a decision tree is weak and in which situations the method is good. It could detect the situations in which decision trees do well, use the tree to understand what is happening there, and with that new understanding devise a new method for the middle ground. We could also train a decision tree using the powerful model's estimates of the value of the game at mid-game positions - information we have never had before.
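A hypothetical sketch of that last step, under the assumption that a strong evaluator and a feature encoding exist (here called `strong_value_fn` and `encode_position`, both placeholders): label mid-game positions with the strong model's value estimate, fit a small regression tree on those labels, and use the residuals to find the situations where the simple method breaks down.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_from_midgame_values(positions, strong_value_fn, encode_position,
                             max_depth=4):
    """Fit a small regression tree to a strong model's mid-game evaluations."""
    X = np.array([encode_position(p) for p in positions])  # hand-chosen features
    y = np.array([strong_value_fn(p) for p in positions])  # e.g. value in [-1, 1]

    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, y)

    # Large residuals mark the positions where the simple method fails:
    # exactly the situations worth studying for a method in the middle.
    residuals = np.abs(tree.predict(X) - y)
    return tree, residuals
```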