Interesting, I've long wondered why parametric nonlinearities are not used more often. They add very little to the overall parameter count (the number of units is most often dwarfed by the number of connections), but they should increase expressiveness a lot (e.g. adaptively interpolating between soft and hard nonlinearities). Taken to its extreme, I've been toying with the idea of combining DNNs and Genetic Programming - using a large number of arbitrary symbolic expressions with high connectivity and many layers.
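As a minimal sketch of what such a parametric nonlinearity could look like (assuming a PyTorch-style module; the name `SoftHardNonlinearity` and the particular soft/hard blend are just illustrative, not something from the discussion above):

```python
import torch
import torch.nn as nn

class SoftHardNonlinearity(nn.Module):
    """Illustrative parametric nonlinearity: each unit learns to blend a
    soft saturation (tanh) with a hard clipping (clamp to [-1, 1])."""
    def __init__(self, num_units):
        super().__init__()
        # One extra scalar per unit -- negligible next to the weight matrices.
        self.mix_logit = nn.Parameter(torch.zeros(num_units))

    def forward(self, x):
        # Assumes features on the last dimension, e.g. x of shape (batch, num_units).
        alpha = torch.sigmoid(self.mix_logit)       # learned blend in (0, 1), per unit
        soft = torch.tanh(x)                        # smooth saturation
        hard = torch.clamp(x, -1.0, 1.0)            # hard saturation
        return alpha * soft + (1.0 - alpha) * hard  # adaptively soft or hard
```

Since each unit contributes only one extra scalar, the added parameter count stays negligible compared to the connections feeding into those units.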
Genetic programming is an inefficient search method, and will require many evaluations of the cost function to optimize anything. In the case of DCNNs, evaluating an architecture can keep a modern GPU busy for days, so genetic algorithms are pretty much out of the question.
I think an easy way to improve our models in the short term is to make more of the parameters we use learnable: the parameters of the non-linearities are a good place to start, and another would be the parameters of the data augmentation transformations.
One could consider that learned data augmentation schemes implement a form of guided visual attention.
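To make the learnable-augmentation idea concrete, here is a sketch under assumptions of my own (PyTorch-style code; the module name `LearnableRotation` and the choice of a rotation transform are illustrative): a random-rotation augmentation whose maximum angle is itself a trainable parameter, so gradients can flow into the augmentation strength.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableRotation(nn.Module):
    """Illustrative sketch: random rotation whose maximum angle is learnable."""
    def __init__(self):
        super().__init__()
        self.max_angle = nn.Parameter(torch.tensor(0.1))  # radians

    def forward(self, x):  # x: (N, C, H, W)
        if not self.training:
            return x
        n = x.size(0)
        # Per-image angle, scaled by the learnable maximum (reparameterized,
        # so gradients reach max_angle through cos/sin below).
        angle = self.max_angle * (2 * torch.rand(n, device=x.device) - 1)
        cos, sin = torch.cos(angle), torch.sin(angle)
        zeros = torch.zeros_like(cos)
        theta = torch.stack([torch.stack([cos, -sin, zeros], dim=1),
                             torch.stack([sin,  cos, zeros], dim=1)], dim=1)  # (N, 2, 3)
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

One caveat with this naive sketch: optimizing augmentation strength directly on the training loss tends to drive it toward zero (weaker augmentation makes the training task easier), so such parameters would likely need to be trained against a held-out objective or regularized in some way.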