This is a really good explanation for a fully connected network. Most models have more complex architectures, so they are even more complicated than this.
I haven't been great about keeping up with advances in the field, but to my understanding most if not all architectures in effect merely enforce symmetries upon the network. That is, they can be represented by a fully connected network, but in that representation not all weights are free: some are fixed (to 0 or 1) and some are dependent on each other (A(1,1) is forced to also equal C(1,1)).
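Just to make that concrete, here's a toy sketch (not any particular architecture; the shapes and numbers are made up): think of the "dense" weight matrix as generated from a small set of shared free parameters, with the remaining entries pinned to fixed values.

```python
import numpy as np

# Toy illustration: a 3x3 "fully connected" weight matrix that is not free.
# Each entry either reads one of 2 shared parameters or is fixed to 0.
free = np.array([0.7, -0.3])            # the only trainable weights

# index map: entry (i, j) points at a shared parameter, or -1 if fixed
index = np.array([[ 0,  1, -1],
                  [-1,  0,  1],
                  [-1, -1,  0]])

W = np.where(index >= 0, free[np.maximum(index, 0)], 0.0)
print(W)
# [[ 0.7 -0.3  0. ]
#  [ 0.   0.7 -0.3]
#  [ 0.   0.   0.7]]
# W[0, 0] and W[1, 1] are "dependent" in the sense above: both read free[0],
# so any update to that parameter changes both entries at once.
```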
I don't blame you, things are pretty diversified, so I'm white-knuckling it through my own subfield.
But as a simple example, convolution would be a little tedious to describe in the general notation you wrote above without getting into image dimensions, stride, padding, etc., not to mention the residual layers or norm layers that are commonly used. Then there are things like stop gradient, dropout, or even other training targets, which are used during training but not at inference.
Convolution can actually be translated pretty elegantly into multiplication by a matrix with symmetries. It's been a few years, but that was an example we had to work out by hand in my Deep Learning course at university.
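If it helps anyone following along, here's roughly what that exercise looks like in the 1-D case (just a sketch with a made-up signal and kernel): a "valid" convolution is the same as multiplying by a Toeplitz matrix, where every row is the same kernel shifted over by one position, so most entries are forced to 0 and the rest are tied copies of the same few weights.

```python
import numpy as np

# 1-D "valid" convolution written out as a matrix-vector product.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input signal
k = np.array([0.5, -1.0, 0.25])           # kernel: the only 3 free weights

n, m = len(x), len(k)
W = np.zeros((n - m + 1, n))              # the equivalent dense weight matrix
for i in range(n - m + 1):
    W[i, i:i + m] = k[::-1]               # same (flipped) kernel in every row

print(W @ x)                              # matrix-vector product
print(np.convolve(x, k, mode='valid'))    # same result from ordinary convolution
```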
It's pretty much the quintessential example of an enforced symmetry, in that it builds translation equivariance into the network: shift the input and the output shifts the same way.
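For anyone who wants to see that symmetry directly, here's a quick check on a made-up signal (using circular, wrap-around convolution so the equivariance is exact at the edges):

```python
import numpy as np

def circ_conv(x, k):
    # Circular (wrap-around) convolution, so translation equivariance holds exactly.
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
k = np.array([0.5, -1.0, 0.25])

y = circ_conv(x, k)
y_of_shifted = circ_conv(np.roll(x, 2), k)       # convolve a shifted copy of the input
print(np.allclose(np.roll(y, 2), y_of_shifted))  # True: shifted input -> shifted output
```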