This keeps popping up, but while it's technically true, it's essentially nonsense. Normally when people talk about kernel machines, the kernel doesn't depend on the data, or at least not very much: we might use a Gaussian kernel and tune the radius, or even the full covariance.
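To be concrete about what "data-independent" means here, a minimal sketch of a plain Gaussian (RBF) kernel, where the only thing you tune is the radius (lengthscale), might look like this (illustrative Python, not from the paper):

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Standard RBF kernel: a fixed function of its two inputs.

    The only free parameter is the lengthscale (more generally you could
    tune a full covariance matrix); the training data never enters the
    definition of the kernel itself.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * lengthscale ** 2))
```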
This construction has a kernel that depends on the entire training trajectory of the neural network! So it's completely unclear what's happening; all of the interesting parts may have just moved into the kernel. Basically this tells us nothing. We can't just add a new data point as in a kernel method, incorporating it by adding its interaction term: every new data point changes the whole training trajectory, so it could completely change the resulting kernel.
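To spell out the contrast, here is a rough sketch (illustrative Python, function names are mine) of ordinary kernel-machine prediction, where adding a point only appends one more interaction term, assuming the usual f(x) = sum_i a_i k(x, x_i) + b form:

```python
def kernel_predict(x, support_points, coeffs, kernel, bias=0.0):
    """Ordinary kernel-machine prediction: f(x) = sum_i a_i * k(x, x_i) + b."""
    return bias + sum(a * kernel(x, xi) for a, xi in zip(coeffs, support_points))

# In an ordinary kernel method, adding a new training point just adds one more
# interaction term (the coefficients get refit, but k itself never changes):
#   support_points.append(x_new)
#
# In the construction under discussion, k is built from gradients along the
# network's whole training trajectory, so adding x_new changes that trajectory
# and hence potentially every value k(x, x_i) - you can't just bolt on a term.
```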