This keeps popping up, but while it's technically true, it's essentially nonsense: normally when people talk about kernel machines, the kernel doesn't depend on the data, or at least not very much. We might use a Gaussian kernel and tune the radius, or even the covariance.
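
For contrast, here is a minimal sketch of a conventional kernel method (kernel ridge regression with a Gaussian/RBF kernel; the toy data, radius, and regularizer are made-up illustration values):

    import numpy as np

    def rbf_kernel(A, B, radius=1.0):
        # Gaussian (RBF) kernel: depends only on the inputs and a tuned
        # radius, not on any training procedure.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * radius ** 2))

    # Toy 1-D regression data (illustration values).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(20, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

    lam = 1e-3  # ridge regularizer
    alpha = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(len(X)), y)

    # Predicting at a new point only needs its kernel interactions with
    # the training set; the kernel itself never changes.
    x_new = np.array([[0.5]])
    y_new = rbf_kernel(x_new, X) @ alpha

Note that the kernel is fixed up front; only the coefficients alpha depend on the data.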

This construction has a kernel that depends on the entire training trajectory of the neural network! So it's completely unclear what's happening; all of the interesting parts may have just moved into the kernel. Basically, this tells us nothing. We can't just add a new data point as in a kernel method, incorporating it simply through its kernel interactions with the existing points: every new data point changes the whole training trajectory, and so could completely change the resulting kernel.
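
To make that concrete: the kernel in this construction is (roughly, ignoring the loss-derivative weighting in the exact statement) the integral along the gradient-descent path of tangent-kernel inner products. A rough numpy sketch with a tiny one-hidden-layer net and made-up data, using a discrete sum over steps in place of the integral:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=8)   # toy training inputs
    y = np.sin(X)                    # toy targets
    h, lr, steps = 16, 0.05, 200     # width, step size, GD steps

    W = rng.normal(size=h)
    v = rng.normal(size=h) / np.sqrt(h)

    def grad_f(x, W, v):
        # Gradient of f(x) = v . tanh(W x) w.r.t. all parameters, flattened.
        a = np.tanh(W * x)
        return np.concatenate([v * (1 - a**2) * x, a])  # [dW, dv]

    def path_kernel(x1, x2, W, v):
        # Discrete approximation of K(x1, x2) = integral over the GD
        # trajectory of grad_w f(x1) . grad_w f(x2) dt.
        W, v, K = W.copy(), v.copy(), 0.0
        for _ in range(steps):
            K += grad_f(x1, W, v) @ grad_f(x2, W, v)
            # One gradient-descent step on squared loss over the training set.
            resid = np.array([v @ np.tanh(W * xi) for xi in X]) - y
            dW = sum(r * v * (1 - np.tanh(W * xi)**2) * xi
                     for r, xi in zip(resid, X))
            dv = sum(r * np.tanh(W * xi) for r, xi in zip(resid, X))
            W -= lr * dW
            v -= lr * dv
        return K

    # The kernel value is tied to this particular trajectory: change the
    # training set (or the init, or the step size) and K changes too.
    print(path_kernel(0.5, 1.0, W, v))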




You may enjoy some works on the connections between Gaussian processes, neural networks, and linear (in RKHS) models (a closed-form example of the infinite-width kernel follows the links):

https://papers.nips.cc/paper_files/paper/2019/hash/39d929972...

https://papers.nips.cc/paper_files/paper/1996/hash/ae5e3ce40...

https://arxiv.org/abs/1711.00165
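
If it helps, the infinite-width kernel those papers study has a closed form for a one-hidden-layer ReLU net with standard-normal weights (the degree-1 arc-cosine kernel). A small numpy sketch with made-up probe points, checked against a wide finite layer:

    import numpy as np

    def relu_nngp_kernel(x1, x2):
        # Closed-form covariance of an infinite-width one-hidden-layer ReLU
        # network with standard-normal weights (degree-1 arc-cosine kernel):
        # E_w[relu(w.x1) relu(w.x2)] for w ~ N(0, I).
        n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
        theta = np.arccos(np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0))
        return n1 * n2 * (np.sin(theta)
                          + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

    # Monte Carlo check with a wide-but-finite layer (illustration values):
    rng = np.random.default_rng(0)
    x1, x2 = np.array([1.0, 0.0]), np.array([0.6, 0.8])
    W = rng.normal(size=(1_000_000, 2))
    mc = np.mean(np.maximum(W @ x1, 0) * np.maximum(W @ x2, 0))
    print(relu_nngp_kernel(x1, x2), mc)  # these should agree closely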


I'm familiar with the NN-as-GP papers. In practice, we don't train infinite-width neural networks!



