Hacker News

So you don't think some of these details will be automated away in the near future, so that it no longer requires a specialist to operate a neural network?



Already, it's not nearly as hard as this demo makes it look. There's one recent advance in particular that isn't in this demo: Batch Normalization.

If you've played around with it a bit, I'm sure you have seen that deeper layers are hard to train... You see the dashed lines representing signal in the network become weaker and weaker as the network gets deeper. BatchNorm works wonders here. It takes statistics from the minibatch of training examples and normalizes them, so that the next layer gets input more similar to what it expects, even if the previous layer has changed. In practice you get a much better signal, so the network can learn a lot more efficiently.
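The normalization step is simple enough to sketch in a few lines. This is a minimal NumPy sketch of the training-time forward pass only (a real implementation also keeps running averages for inference, omitted here; `gamma` and `beta` are the learned per-feature scale and shift):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the minibatch, then rescale.

    x: (batch, features) activations from the previous layer.
    gamma, beta: learned scale and shift, one per feature.
    """
    mu = x.mean(axis=0)                    # per-feature minibatch mean
    var = x.var(axis=0)                    # per-feature minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

# However the previous layer's output scale drifts during training,
# the next layer sees inputs with stable statistics.
x = np.random.randn(64, 10) * 5.0 + 3.0   # badly scaled activations
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True
```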

Without BatchNorm, more than two hidden layers is tedious and error-prone to train. With it, you can train 10-12 layers easily. (With another recent advance, residual nets, you can train hundreds!)
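The residual trick is worth a sketch too: each block computes y = x + F(x), so the input always has a direct path through, and stacking many blocks can't kill the signal the way plain deep stacks do. A toy version (the two-layer F(x) here is my own illustrative choice, not the exact form from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x): the skip connection gives the signal (and
    gradients) a direct path past the learned transformation."""
    return x + w2 @ relu(w1 @ x)

# Even with all-zero weights the block passes its input through
# unchanged -- a stack of hundreds of these still carries signal.
x = np.ones(4)
w = np.zeros((4, 4))
print(residual_block(x, w, w))  # [1. 1. 1. 1.]
```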

Such advances push the limit of what you can train easily versus what still requires GSD ("graduate student descent": figuring out just the right parameters through intuition, trial and error). You still have to watch out for overfitting, but the nice thing there is that more training data helps.


I think most of those things will remain important:

+ Designing the network architecture is a means to instill your knowledge of the problem into the network. For example, using convolutions over images encodes some translational invariance into the network, which makes up for a lack of data. I don't think data augmentation alone is enough, either: if you use a "stupid" architecture with heaps of data, the computation becomes too expensive or too slow.

- The systems engineering part will probably get automated. I bet there are Amazon engineers crying at their desks while working on AWS Elastic Tensorshift right now. So unless you're specifically interested in that side of things, maybe this isn't the best area to focus on.

+ There are always going to be problems, so knowing how to debug is a useful skill.

+ ML/stats fundamentals aren't going away. You need to know what you're trying to do before you can do it.
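To make the first point concrete: a convolution slides one small set of weights across the whole input, so a pattern is detected wherever it appears, with no extra parameters and no need for training examples at every position. A toy 1D sketch (the `conv1d_valid` helper and the edge-detector kernel are my own illustration):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide one small kernel across the input (weight sharing)."""
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

kernel = np.array([1.0, -1.0])              # a tiny edge detector
x = np.array([0, 0, 1, 1, 0, 0, 0], float)
x_shift = np.roll(x, 2)                     # same pattern, moved right

# Shifting the input just shifts the output: the same weights fire on
# the edge wherever it appears.
print(conv1d_valid(x, kernel))       # [ 0. -1.  0.  1.  0.  0.]
print(conv1d_valid(x_shift, kernel)) # [ 0.  0.  0. -1.  0.  1.]
```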




