I am myself trying to get closer to a clear formulation of this problem, which is why I'm writing here. Here's what I have so far:
ML systems (e.g., neural networks) remember averages (compressions) of historical data. They are useful, with respect to the problem, iff (1) the problem's target function exists; (2) the data is relevant, unambiguous, and well-carved; and (3) these properties hold regardless of likely permutations to the problem's framing.
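To make the "remembers averages" point concrete, here's a minimal sketch: a nearest-centroid classifier literally stores one average per class and nothing else. The toy data and class names are made up purely for illustration; it's the caricature of the claim, not a real system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "classes" of 2-D points drawn around different means (toy data).
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2))

# "Training" is just compression: keep only the per-class average.
centroids = {"a": class_a.mean(axis=0), "b": class_b.mean(axis=0)}

def predict(x):
    # Classify by distance to the stored averages.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

print(predict(np.array([0.2, -0.1])))  # lands near class a's centroid
print(predict(np.array([2.9, 3.2])))   # lands near class b's centroid
```

Everything the model "knows" is in `centroids`; whether that compressed summary is useful depends entirely on the three properties above holding.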
Systems are given data with these properties through significant amounts of experimental design, work, and effort by people. Absent these properties, the data is useless.
Producing data with these properties requires intelligence, and no machine systems exist which can do it.
My issue with research into ML on the whole is that it *assumes* these properties and then explains how the systems work. I understand why this is interesting from a formal perspective... but it fails to note that this situation is almost never how ML is used.
There is no function from Image -> Animal; i.e., biologists aren't just idiots who could have simply looked at some pixel patterns. Pixel patterns are radically ambiguous with respect to `Animal`, and so even an infinite sampling of (Image, Animal) pairs is not enough for ML.
... so what on earth are ML systems doing?
This is the bigger research question: to characterise how ML performs when this assumed setup fails. And you know, that research almost doesn't exist. This is an industry led by partisans to its success.
What do you think would happen if research actually talked about the dynamics of ML systems' performance when (1) the target doesn't exist; (2) the data isn't relevant and unambiguous; and (3) the problem framing permutes most times it's deployed?
Suddenly we'd have an explanation of why 2016 wasn't the year self-driving cars were delivered. And likewise, of why even 2036 won't be.
One good lens I know is that a neural network is just good at approximating stuff. Trained properly, you can have it approximate a distribution: a conditional distribution like p(animal_species=dog | image=what I am seeing) (a discriminative model, e.g. a classifier), or even a joint one, p(animal, image) (a generative model: Autoencoder/GAN/VAE/Diffusion).
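A hedged sketch of the discriminative view: a classifier is just a function mapping an input to a conditional distribution p(label | input). Here a tiny linear layer plus softmax stands in for a trained network; the weights are random and the label set is hypothetical, purely to show the shape of the object, not a real model.

```python
import numpy as np

rng = np.random.default_rng(1)
labels = ["dog", "cat", "fox"]   # hypothetical label set

W = rng.normal(size=(3, 4))      # 3 classes, 4 "pixel" features (untrained)
b = np.zeros(3)

def p_label_given_image(x):
    # Logits -> probabilities via a numerically stable softmax.
    logits = W @ x + b
    exp = np.exp(logits - logits.max())
    return dict(zip(labels, exp / exp.sum()))

dist = p_label_given_image(rng.normal(size=4))
print(dist)                      # probabilities over labels, summing to 1
```

Training would adjust `W` and `b` so these probabilities track the empirical conditional distribution in the data; the generative case instead models the joint p(label, image).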
There is also an information theoretic lens about compression, which is probably very close to what you are thinking about, but I haven't studied it yet.
Regarding Image -> Animal: an image of an animal is a projection of the animal onto a 2D plane, plus lots of noise. So there is some dependence between an image of an animal and the animal. Biologists can get a lot of information from looking at a photo of an animal. In some sense, they are always looking at two images from their eyes.
But the problem you are talking about is indeed serious, and far from solved. You can't understand the real world from 2D images with the current approaches. Ideally we want neural networks to build a 3D (or even 4D, with time?) model of reality. Instead we find them trying to guess labels based on patterns. My favourite example is the tiger-dog [1]. Still, there is evidence NNs are doing some clever things [2]. My guess is that we just haven't found a way to formulate the task for the solution we want. In the current formulation it's easiest for the model to minimize the loss by sticking to patterns, so why would it do anything else?
There is a lot of research on more applied ML that asks the questions you are asking. It's just that this paper is another attempt at a theoretical explanation.
I agree on the self-driving cars. We can't have truly self-driving cars until a model can generalize, which none can at the moment. The core question is: if we had a "dumb" model that does clever averaging and successfully covers 99% of cases, such that the car is dumb in special cases but smarter than most human drivers in usual cases, would that justify deploying the cars? If so, dumb ML might be enough for self-driving. It's definitely enough for self-driving in walled-garden conditions, so there is some evidence that with enough data we can brute-force our way to a tolerable solution.
Yes, we do require models parameterised by both space and time -- but really, we require implementations of these models. The implementation I'm thinking of is called a body.
Why? Well, consider the best sort of such models: physics. What is "the mass of the sun"? What is "the sun"? There is nothing in all of physics which says anything exists, nor what its boundaries are. Least of all what "the sun" is.
Physics, all of science, is counterfactual: if something exists, then...
You're never going to get to "what a table is" just by table(x, t) -- because there is something in the background which asserts "tables exist" and that is the concernful actions of animals which care to partition reality this way.
Reality, measured in every possible way, is in the end still ambiguous. It still leaves open how one actually refers to any of it: where one places a boundary, what the unit is going to be in our descriptions.
There is no way around starting from the other direction: not with facts already provided, but with no facts at all. You have to build a system which cares, which then induces a partitioning, which can then change its caring; and so on.
This is an extremely practical concern. A car cannot drive itself, in the relevant sense, if it doesn't care about anything; and more severely, if it doesn't care like we do.
The car isn't going to be able to modify its concepts in response to being challenged -- by other people, by the environment, etc. -- because it has no reason to. There is nothing which is important to it. And hence, when confronted by the need to adapt, the car will kill people.