I see this as the MIT school of thought. This group was saying back in the 60s that we'd have strong AI. I believe that a brute-force math approach that lacks a larger theory of how neural networks give us intelligence is going to be the slow path forward.
I was proposing how to make progress in understanding some of the data manipulations being done now by much of ML/AI. That is, get the data manipulations from the consequences of some theorems. For an example, apparently regression is important in current ML/AI: Well, going way back, 50+ years, we have a lot of solid math for regression.
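For instance, ordinary least squares has a closed form from the normal equations. A minimal sketch (the data below is made up for illustration; the math is the standard textbook math):

```python
# Minimal sketch of classic regression math: ordinary least squares,
# i.e., solving the normal equations beta = (X^T X)^{-1} X^T y.
# The math is 50+ years old; the data here is just illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # intercept + one feature
y = 3.0 + 2.0 * X[:, 1] + rng.normal(0, 0.5, n)           # true line plus noise

# lstsq solves the least-squares problem in a numerically safe way.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # should be close to [3.0, 2.0]
```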
Then I mentioned that from Euclid through calculus to wave equations, there's more math from theorems and proofs we can and sometimes do apply. And we can stir up still more applicable math.
My goal was just to suggest how to do better with real, valuable applications to important real-world problems. We have such examples from US national security -- the A-bomb, the H-bomb, GPS, stealth, phased-array radar, adaptive beamforming sonar, etc.
By analogy, I was suggesting a better socket wrench or numerically controlled milling machine, not a self-driving car. I was not suggesting anything we could regard as intelligent, not even as intelligent as a field mouse.
IMHO, powerful math with valuable applications is now very doable. We can schedule equipment maintenance, airline crews, airline fleets, workers, trucks, etc. We can have computers do the data manipulations specified by the math and have the Internet move the data.
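As one toy sketch of such scheduling math: assigning crews to flights at minimum total cost is the classic assignment problem, solvable with the Hungarian algorithm. The cost matrix here is made up, and real airline crew scheduling adds many constraints (legality, rest, etc.):

```python
# Toy sketch of a scheduling computation: assign crews to flights at
# minimum total cost via the Hungarian algorithm. Costs are made up.
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j]: cost of giving crew i flight j (hours, dollars, etc.)
cost = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

crews, flights = linear_sum_assignment(cost)
for c, f in zip(crews, flights):
    print(f"crew {c} -> flight {f} (cost {cost[c, f]})")
print("total cost:", cost[crews, flights].sum())
```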
But also having systems that are as intelligent as a field mouse, kitten, puppy, octopus, or songbird, or that even walk as well as a cockroach, is harder.
At one time I worked in some AI based on some MIT AI work; I wrote software, worked with GM Research, gave a paper at the Stanford AAAI IAAI conference, and, as sole author or co-author, published a number of papers. Here I was not suggesting anything that has anything to do with any of the MIT AI work I've known about.
For people at MIT who have done work more like what I have in mind, I can think of D. Bertsekas and M. Athans.
Athans was in deterministic optimal control, and the OP mentioned that that field did "back propagation" long ago. Athans is, or was, a good applied mathematician. When I was at FedEx, I chatted with him in his office about how best to climb, cruise, and descend airplanes. He told me a cute story about how an F-4 could get minimum time to climb, say, IIRC, to 100,000 feet: Climb to just 5,000 feet or so, go into a dive, get supersonic, where actually the drag was less, and then go nearly vertical, all supersonic, directly to 100,000 feet.
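A rough way to see why the dive helps is the classic energy-state view: track specific energy E = h + v^2/(2g) and fly so as to gain energy fastest. A toy comparison (my numbers are illustrative, not F-4 data):

```python
# Toy energy-state comparison for the zoom-climb story.
# Specific energy E = h + v^2 / (2 g): altitude plus kinetic energy
# expressed as equivalent altitude. Numbers are illustrative only.
G = 9.81  # m/s^2

def specific_energy(h_m, v_ms):
    return h_m + v_ms**2 / (2 * G)

subsonic   = specific_energy(5000.0, 250.0)  # roughly Mach 0.8 at 5,000 m
supersonic = specific_energy(3000.0, 500.0)  # roughly Mach 1.6 after the dive

print(f"subsonic   E: {subsonic:8.0f} m")
print(f"supersonic E: {supersonic:8.0f} m")
# The supersonic state holds more total energy even at lower altitude,
# and past the transonic drag rise the aircraft can add energy faster,
# so trading altitude for speed can shorten the total time to climb.
```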
For neural networks, IIRC there is some nice math, the universal approximation theorems, that shows how general those can be for representing functions, and for some parts of stochastic optimal control Bertsekas proposed such a use for neural networks. There he, as usual, was being mathematical.
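A quick toy illustration of that representational generality (a sketch, not the theorem itself): fix a random tanh hidden layer and fit only a linear readout by least squares, which already approximates a smooth function well:

```python
# Toy illustration of one-hidden-layer representational power:
# random, fixed tanh features plus a linear least-squares readout,
# used to approximate sin(x) on [-pi, pi]. A sketch, not a proof.
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 200

x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(x).ravel()

# Random hidden layer: these weights and biases are never trained.
W = rng.normal(scale=2.0, size=(1, n_hidden))
b = rng.uniform(-np.pi, np.pi, size=n_hidden)
H = np.tanh(x @ W + b)                 # hidden activations, shape (400, n_hidden)

# Only the output weights are fit, by ordinary least squares.
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
err = np.max(np.abs(H @ coef - y))
print(f"max approximation error: {err:.4f}")  # typically quite small
```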
> Stochastic processes and control theory, as in Çinlar or Bertsekas, will definitely contribute towards strong AI, if we ever achieve that.
My view is that for high-end applications of computing now, drawing from and building on Çinlar or Bertsekas is about the best we can do and, due to current computing and the Internet, is suddenly terrific.
But my view is that something real in AI, say, as good as a kitty cat, ..., a human, will much more directly use very different approaches; and that if there is some math in the basic core programming, it will be darned simple.
So, my current view is that the good approaches to such AI will make direct use of little or no pure or applied math. Instead, my guess is that animal ... human intelligence is just some darned clever programming, rediscovered and re-refined many times so far here on earth. From the many times, my guess is that there is basically one quite simple way to do it.
My guess is that the sensory inputs first feed data that becomes, in the brain, essentially nouns: floor, rock, water, etc. Early on, the data on the nouns is quite crude, but later, with more experience, it gets refined. E.g., a kitty cat quickly learns that floors are solid to stand and run on, and that some are shaky and might result in a fall. Then, with more input and experience, some verbs are combined with some of the nouns. The strength of the combining comes mostly just from experience; yes, we could write out some simple strength-updating algebra, as in the sketch below. But to be cautious, the learning is deliberately slow: e.g., not everything round on the floor is good to eat.
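A minimal sketch of what such strength-updating algebra might look like (my guess at one simple form; the constants and names are made up):

```python
# Minimal sketch of "strength updating" for noun-verb associations.
# Each (noun, verb) pair keeps a strength in [0, 1]; every experience
# nudges it toward the observed outcome. ALPHA is kept small so that
# learning is deliberately slow and cautious.
ALPHA = 0.05  # one good berry doesn't prove all round things are food

strength = {}  # (noun, verb) -> current strength

def observe(noun, verb, outcome):
    """outcome: 1.0 if the action worked out, 0.0 if it did not."""
    s = strength.get((noun, verb), 0.5)  # start uncommitted
    strength[(noun, verb)] = s + ALPHA * (outcome - s)

# A kitten's early experiences:
for _ in range(20):
    observe("floor", "stand_on", 1.0)    # floors keep being solid
observe("round_thing", "eat", 1.0)       # one berry was good ...
observe("round_thing", "eat", 0.0)       # ... but one pebble was not

print(strength[("floor", "stand_on")])   # drifts toward 1.0
print(strength[("round_thing", "eat")])  # stays near 0.5: undecided
```

The small learning rate is the caution: one good or bad experience barely moves the strength, but twenty consistent experiences do.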
There is a continual process to simplify this data, i.e., a form of data compression, into causality, e.g., learning about gravity. The learning is good enough to identify gravity as the cause that makes things fall and to reject irrelevant data, like just what things are falling from: a table, a window seat, the top of a BBQ pit, a tree limb, the second-floor landing, etc. It also rejects night, day, hot, cold, and other irrelevant variables -- that's smarter than current multivariate curve fitting, which has a tough time appraising which variables are likely irrelevant.
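One toy way to program that kind of variable rejection (my own sketch, not an established method): accept a variable as causally relevant only if its association with the outcome is strong, and of the same sign, across several different contexts:

```python
# Toy sketch of rejecting irrelevant variables by demanding that an
# association be consistent across contexts, rather than by curve fitting.
# Entirely illustrative; the data and the threshold are made up.
import numpy as np

rng = np.random.default_rng(1)

def falls_hard(height_m, is_night, temp_c):
    # Ground truth the learner does not know: only height matters.
    return (height_m > 1.0).astype(float)

contexts = []
for _ in range(5):  # five different "days" / environments
    height = rng.uniform(0.0, 3.0, 200)
    night = rng.integers(0, 2, 200).astype(float)
    temp = rng.uniform(-5.0, 35.0, 200)
    y = falls_hard(height, night, temp)
    contexts.append({"height": height, "night": night, "temp": temp, "y": y})

for var in ["height", "night", "temp"]:
    # Correlation with the outcome, recomputed separately in each context.
    corrs = [np.corrcoef(c[var], c["y"])[0, 1] for c in contexts]
    consistent = (all(abs(r) > 0.3 for r in corrs)
                  and len({np.sign(r) for r in corrs}) == 1)
    print(f"{var}: relevant={consistent}")  # only height survives
```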
If I were going to program AI, that would be the framework I would use. I regard the learning as close to a bootstrap operation -- the first learning is very simple and crude but permits gathering more data, refining that learning, and doing more learning. To get some guesses on the details, watch various baby animals and humans as they learn.
I see no real role here for math or for anything I've heard of in current ML/AI, and I don't think it's much like the rules in expert systems. And my guess is that the amount of memory needed is shockingly small and the basic processing surprisingly simple.