This keeps popping up, but while it's technically true, it's essentially nonsense: normally when people talk about kernel machines, the kernel doesn't depend on the data, or at least not very much. We might use a Gaussian kernel and tune the radius, or even the covariance.
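
For contrast, here is a minimal sketch of a conventional kernel method (kernel ridge regression with a Gaussian/RBF kernel; the toy data, radius, and regularizer are made-up illustration values):

    import numpy as np

    def rbf_kernel(A, B, radius=1.0):
        # Gaussian (RBF) kernel: depends only on the inputs and a tuned
        # radius, not on any training procedure.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * radius ** 2))

    # Toy 1-D regression data (illustration values).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(20, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

    lam = 1e-3  # ridge regularizer
    alpha = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(len(X)), y)

    # Predicting at a new point only needs its kernel interactions with
    # the training set; the kernel itself never changes.
    x_new = np.array([[0.5]])
    y_new = rbf_kernel(x_new, X) @ alpha

Note that the kernel is fixed up front; only the coefficients alpha depend on the data.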

This construction has a kernel that depends on the entire training trajectory of the neural network! So it's completely unclear what's happening; all of the interesting parts may have just moved into the kernel. Basically, this tells us nothing. We can't just add a new data point as in a kernel method, incorporating it simply through its kernel interactions with the existing points: every new data point changes the whole training trajectory, and so could completely change the resulting kernel.
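
To make that concrete: the kernel in this construction is (roughly, ignoring the loss-derivative weighting in the exact statement) the integral along the gradient-descent path of tangent-kernel inner products. A rough numpy sketch with a tiny one-hidden-layer net and made-up data, using a discrete sum over steps in place of the integral:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=8)   # toy training inputs
    y = np.sin(X)                    # toy targets
    h, lr, steps = 16, 0.05, 200     # width, step size, GD steps

    W = rng.normal(size=h)
    v = rng.normal(size=h) / np.sqrt(h)

    def grad_f(x, W, v):
        # Gradient of f(x) = v . tanh(W x) w.r.t. all parameters, flattened.
        a = np.tanh(W * x)
        return np.concatenate([v * (1 - a**2) * x, a])  # [dW, dv]

    def path_kernel(x1, x2, W, v):
        # Discrete approximation of K(x1, x2) = integral over the GD
        # trajectory of grad_w f(x1) . grad_w f(x2) dt.
        W, v, K = W.copy(), v.copy(), 0.0
        for _ in range(steps):
            K += grad_f(x1, W, v) @ grad_f(x2, W, v)
            # One gradient-descent step on squared loss over the training set.
            resid = np.array([v @ np.tanh(W * xi) for xi in X]) - y
            dW = sum(r * v * (1 - np.tanh(W * xi)**2) * xi
                     for r, xi in zip(resid, X))
            dv = sum(r * np.tanh(W * xi) for r, xi in zip(resid, X))
            W -= lr * dW
            v -= lr * dv
        return K

    # The kernel value is tied to this particular trajectory: change the
    # training set (or the init, or the step size) and K changes too.
    print(path_kernel(0.5, 1.0, W, v))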




You may enjoy some works on the connections between Gaussian processes, neural networks, and linear (in RKHS) models (a closed-form example of the infinite-width kernel follows the links):

https://papers.nips.cc/paper_files/paper/2019/hash/39d929972...

https://papers.nips.cc/paper_files/paper/1996/hash/ae5e3ce40...

https://arxiv.org/abs/1711.00165
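
If it helps, the infinite-width kernel those papers study has a closed form for a one-hidden-layer ReLU net with standard-normal weights (the degree-1 arc-cosine kernel). A small numpy sketch with made-up probe points, checked against a wide finite layer:

    import numpy as np

    def relu_nngp_kernel(x1, x2):
        # Closed-form covariance of an infinite-width one-hidden-layer ReLU
        # network with standard-normal weights (degree-1 arc-cosine kernel):
        # E_w[relu(w.x1) relu(w.x2)] for w ~ N(0, I).
        n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
        theta = np.arccos(np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0))
        return n1 * n2 * (np.sin(theta)
                          + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

    # Monte Carlo check with a wide-but-finite layer (illustration values):
    rng = np.random.default_rng(0)
    x1, x2 = np.array([1.0, 0.0]), np.array([0.6, 0.8])
    W = rng.normal(size=(1_000_000, 2))
    mc = np.mean(np.maximum(W @ x1, 0) * np.maximum(W @ x2, 0))
    print(relu_nngp_kernel(x1, x2), mc)  # these should agree closely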


I'm familiar with the NN-as-GP papers. In practice, we don't train infinite-width neural networks!



