Wolfram explains the basic concepts of neural networks rather well, I think. He trains and runs a perceptron at the beginning and then a simple network. Then he delves into replacing the continuous functions they are built from with discrete binary ones, and ends up with cellular automata that he thinks emulate neural networks and their training process.

While this surely looks interesting, the only "insight" he obtains into the original question of how exactly networks learn is that trained networks do not seem to come up with a simple model they use to produce the output we observe; rather, they land on one combination of parameters, in a seemingly random region of the state space, that is able to reproduce the target function. There are multiple possible solutions that work equally well, so perhaps the notion of networks generalizing from their training data is not quite accurate (?). Wolfram links this to "his concept" of "computational irreducibility" (which I believe is just a consequence of Turing-completeness) but does not give any novel strategies for understanding trained models, or for doing machine learning any better with discrete systems. Wolfram presents a fun but at times confusing exercise in discrete automata and unfortunately does not apply the mathematical rigor needed to draw deep conclusions about his subject.
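To make the continuous-versus-discrete contrast concrete, here is a minimal sketch (a toy setup of my own, not Wolfram's cellular-automaton construction): a single sigmoid neuron trained by ordinary gradient descent on a simple rule, followed by a fully discrete variant of the same neuron with sign-quantized weights and a hard threshold in place of the sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: label is 1 when x1 + x2 > 1, else 0
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on the logistic loss of a single neuron
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# Discrete variant: weights collapsed to +/-1, sigmoid replaced by a step
w_bin = np.sign(w)
scale = np.abs(w).mean()          # crude rescaling so the bias stays comparable
pred_cont = sigmoid(X @ w + b) > 0.5
pred_bin = (X @ w_bin + b / scale) > 0.0

truth = y > 0.5
print("continuous neuron accuracy:", np.mean(pred_cont == truth))
print("binarized neuron accuracy :", np.mean(pred_bin == truth))
```

For a single neuron the discrete version recovers essentially the same decision rule; the interesting (and much harder) question, which Wolfram gestures at, is what happens when whole networks are built from such discrete units.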
We know there are infinitely many solutions; it is just hard to find a specific configuration of parameters. The question is whether all of those configurations generalize or not.
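One way to poke at that question on a toy scale (a hypothetical setup using scikit-learn's MLPRegressor, nothing from Wolfram's piece): fit the same small network to the same handful of training points from several random seeds and compare the fits on a held-out grid. Every seed can drive the training error to essentially zero while the held-out error varies from run to run.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# A few training points from a simple target, plus a denser held-out grid
x_train = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(3.0 * x_train).ravel()
x_test = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y_test = np.sin(3.0 * x_test).ravel()

for seed in range(5):
    # Same architecture and data every time; only the random init changes
    net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                       solver="lbfgs", max_iter=5000, random_state=seed)
    net.fit(x_train, y_train)
    train_mse = np.mean((net.predict(x_train) - y_train) ** 2)
    test_mse = np.mean((net.predict(x_test) - y_test) ** 2)
    print(f"seed {seed}: train MSE {train_mse:.2e}, held-out MSE {test_mse:.2e}")
```

Whether those held-out errors cluster tightly or scatter widely across seeds is, roughly, the empirical version of the question above: do the many parameter configurations that fit the data equally well also agree away from it?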