I’m imagining that by having conditional computation you’re basically partitioning the state space up, with each part of the space getting its own smaller network. this might allow you the generality of a larger network, but with the ability to hone in and analyze the smaller networks for certain parts of your state space.