
This sounds important and interesting, but isn't "wide" the key word here?

They talk about shallow NNs and deep fully connected NNs but that would seem to leave out a lot.

I mean, the article puts forward a distinct language/model to express neural nets in, which is cool, but are they talking about all or most of the NNs you see today? If so, that's huge, but still.

Fo-get-a-bout-it, see SiempreViernes' comment: "I couldn't find any mention about a trained NN, this is strictly about the initial state." (emphasis added)

https://news.ycombinator.com/item?id=21653516




Hi, the author here. Thanks for your interest! Let me try answering some of your questions.

> This sounds important and interesting, but isn't "wide" the key word here?

Yes, width is very important for this result. Given the size of modern deep neural networks, I (and by now most people in the deep learning theory community) believe the large-width regime is the appropriate one in which to study neural networks.
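
Here is a toy numerical sketch of that claim (my own illustration, not code from the paper): draw a one-hidden-layer ReLU net at random initialization and look at its output at a fixed input. As the width grows, the output distribution over random draws becomes Gaussian, which is the infinite-width GP picture at initialization. The widths, input dimension, and sample counts below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10)   # a fixed input of dimension 10

    def random_net_output(width):
        # weights are standard normal; activations scaled by 1/sqrt(fan-in)
        W1 = rng.standard_normal((width, x.size))
        w2 = rng.standard_normal(width)
        h = np.maximum(W1 @ x / np.sqrt(x.size), 0.0)   # ReLU hidden layer
        return w2 @ h / np.sqrt(width)

    for width in (10, 100, 10000):
        samples = np.array([random_net_output(width) for _ in range(2000)])
        # excess kurtosis tends toward 0 as the output distribution becomes Gaussian
        z = (samples - samples.mean()) / samples.std()
        print(width, "excess kurtosis:", round(float(np.mean(z**4) - 3), 3))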

> I mean, the article puts forward a distinct language/model to express neural nets in, which is cool, but are they talking about all or most of the NNs you see today?

Throw me an architecture and watch if I can't throw you back a GP :)
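
To make that concrete for the simplest case, here is the classic kernel recursion for a deep fully connected ReLU net at initialization (the Neal '96 / Lee et al. '18 style NNGP kernel). This is a hand-rolled sketch rather than anything from the paper, and the depth and variance parameters are made up.

    import numpy as np

    def nngp_relu_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
        """X: (n, d) inputs. Returns the (n, n) GP covariance after `depth` ReLU layers."""
        K = sigma_w2 * (X @ X.T) / X.shape[1] + sigma_b2   # input-layer covariance
        for _ in range(depth):
            diag = np.sqrt(np.diag(K))
            cos = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
            theta = np.arccos(cos)
            # closed-form E[relu(u) relu(v)] for a zero-mean bivariate Gaussian
            # (the arccosine kernel)
            E = np.outer(diag, diag) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
            K = sigma_w2 * E + sigma_b2
        return K

    X = np.random.default_rng(1).standard_normal((5, 8))
    print(nngp_relu_kernel(X).round(3))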

> Fo-get-a-bout-it, see SiempreViernes' comment: "I couldn't find any mention about a trained NN, this is strictly about the initial state." (emphasis added)

Yes. I will have things to say about training, but that requires building up some theory. This paper is the first step in laying it out. Stay tuned! :)


> Throw me an architecture and watch if I can't throw you back a GP :)

On the pragmatic side, would that GP train faster than the NN? In my limited experimentation with GPs, I found them awfully slow. However, maybe what I tried (it was a black box to me) used some brute-force approach, and there are more finely tuned algorithms out there. Since you are an expert in the area, what's your take?


I think in general, "training" a GP, i.e. doing GP inference (or kernel regression), is not done for speed reasons but because GPs are sample-efficient. More concretely, the practical folk wisdom is that when there is not much data, GP inference with a well-chosen kernel can give you much more bang for the buck than a neural network. However, when there is a lot of data (especially in perceptual domains like vision and language), neural networks typically train faster and generalize better.
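
For reference, exact GP inference is just a linear solve against the kernel matrix, which is where the slowness comes from: the Cholesky factorization costs O(n^3) in the number of training points. A bare-bones sketch with an RBF kernel (my own toy code, with made-up data and hyperparameters):

    import numpy as np

    def rbf(A, B, lengthscale=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2):
        K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
        L = np.linalg.cholesky(K)                     # the O(n^3) step
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
        return rbf(X_test, X_train) @ alpha

    rng = np.random.default_rng(2)
    X = rng.uniform(-3, 3, size=(20, 1))              # tiny training set
    y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(20)
    Xs = np.linspace(-3, 3, 5)[:, None]
    print(gp_posterior_mean(X, y, Xs).round(2))       # compare against np.sin(Xs[:, 0])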

I wouldn't say I'm an expert at using GPs, so actual GP practitioners feel free to correct me if I'm wrong :)


Looking forward to your further results.



