Agree. (D)NNs have a powerful but somewhat loose inductive bias. They're great at capturing surface-level complexity but often miss the deeper compositional structure. This looseness, in my opinion, stems from a few things: architectures that aren't well matched to the task at hand, compute limits that keep us from exploring more expressive models, and training processes that don't fully exploit the available information or impose the right constraints on the fit.
The ML research community generally agrees that the key to generalization is finding the shortest "program" that explains the data (Occam's Razor / MDL principle). But directly searching for these minimal programs (over architecture space, feature space, training space, etc.) is exceptionally difficult, so we end up approximating the search with something like GPR or circuit search guided by backprop.
This shortest program idea is related to Kolmogorov complexity (from algorithmic information theory) - i.e. the length of the most concise program that generates a given string (because if you're not operating on the shortest program, there is looseness or overfit!). In ML, the training data is the string, and the learned model is the program. We want the most compact model that still captures the underlying patterns.
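To make the MDL framing concrete, here's a toy sketch (the encoding scheme below is a crude illustration I'm assuming for the example, not a rigorous MDL code): fit polynomials of increasing degree and pick the one whose two-part description length - bits to describe the coefficients plus bits to encode the residuals - is smallest. The data is generated from a quadratic, and the two-part cost tends to recover that rather than the higher-degree overfit.

```python
# Minimal MDL / "shortest program" sketch: choose the polynomial degree with
# the smallest two-part description length (model bits + residual bits).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2.0 * x**2 - x + rng.normal(scale=0.1, size=x.shape)  # true "program" is quadratic

BITS_PER_PARAM = 32  # assume each coefficient costs a fixed 32 bits to describe

def description_length(degree):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    # Model cost: number of parameters at a fixed precision.
    model_bits = BITS_PER_PARAM * (degree + 1)
    # Data cost: Gaussian negative log-likelihood of the residuals, in bits
    # (a stand-in for the cost of encoding the data given the model).
    sigma2 = max(np.mean(residuals**2), 1e-12)
    data_bits = 0.5 * len(x) * np.log2(2 * np.pi * np.e * sigma2)
    return model_bits + data_bits

lengths = {d: description_length(d) for d in range(8)}
best = min(lengths, key=lengths.get)
print(f"chosen degree: {best}")  # tends to recover degree 2, not the degree-7 overfit
```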
While (D)NNs have been super successful, their reliance on approximations suggests there's plenty of room for improvement in terms of inductive bias and more program-like representations. I think approaches that combine the flexibility of neural nets with the structured nature of symbolic representations will lead to more efficient and performant learning systems. It seems like a rich area to just "try stuff" in.
Leslie Valiant touches on some of the same ideas in his book "Probably Approximately Correct", which tries to nail down some of the computational phenomena associated with the emergent properties of reality (it's heady stuff).
Gaussian Process Regression (the surrogate model typically used in Bayesian Optimisation to home in on the right "answer"/parameter region with fewer evaluations) - explained in some context here...
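For a feel of how GPR acts as the surrogate in that loop, here's a bare-bones sketch (the RBF kernel hyperparameters and the toy objective are arbitrary assumptions for illustration): condition a GP on a few "expensive" evaluations, then use the posterior mean and uncertainty to decide where to sample next.

```python
# Bare-bones Gaussian Process Regression with a fixed RBF kernel, plus a
# simple upper-confidence-bound rule for picking the next point to evaluate.
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-3):
    # Standard GP regression: condition the prior on the observed data.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.sqrt(np.clip(np.diag(cov), 0, None))

# Toy objective we pretend is expensive to evaluate.
f = lambda x: np.sin(3 * x) + 0.5 * x
x_train = np.array([-1.0, -0.3, 0.4, 1.2])
y_train = f(x_train)
x_test = np.linspace(-1.5, 1.5, 200)

mean, std = gp_posterior(x_train, y_train, x_test)
# Sample next where predicted value plus uncertainty is highest (UCB).
next_x = x_test[np.argmax(mean + 2.0 * std)]
print(f"next point to evaluate: {next_x:.3f}")
```

Real Bayesian optimisation libraries also fit the kernel hyperparameters and use fancier acquisition functions, but the structure is the same: the GP gives you a cheap, uncertainty-aware stand-in for the expensive search.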