
Has anyone else, when first introduced to neural networks some decades ago, noticed the connected layers and wondered whether a more generalized topology, such as an arbitrary directed graph, could be applied instead? And then realized they must not have been the first to notice, concluded that the layered topology must have some mathematical superiority over the generalized form, but never found a concrete answer as to why?



There's been some work in this area. In practice, finding a good topology is hard, and graph or sparse-matrix operations require a pretty significant level of sparsity before they're more efficient than just including every weight in a dense matrix and setting some parameters to zero.

Any DAG you choose is equivalent to a network with some number of dense layers with some of the weights zeroed, so you aren't losing any modeling capability by sticking with dense networks. The current trend of massively overparameterizing networks and training them for a long time in the zero-training-error regime (the so-called "double descent" phenomenon) exploits this idea a little. With sufficient regularization, you bias the network toward a _simple_ explanation -- one where most weights are near zero, which is effectively the same as having chosen an optimal topology from the beginning.
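A minimal numpy sketch of that equivalence (the sizes and the mask are made up for illustration): masking a dense weight matrix reproduces any given connection pattern between two layers, so picking a topology amounts to deciding which entries are allowed to be nonzero.

    import numpy as np

    rng = np.random.default_rng(0)

    # Dense layer: 4 inputs -> 3 outputs.
    W = rng.normal(size=(3, 4))

    # A hand-picked "topology": 1 means the edge exists, 0 means it doesn't.
    mask = np.array([[1, 0, 0, 1],
                     [0, 1, 0, 0],
                     [1, 1, 1, 0]])

    x = rng.normal(size=4)

    # The "sparse DAG" forward pass is just the dense pass with masked weights.
    y_sparse = (W * mask) @ x

    # Equivalent dense network where the missing edges are exactly zero.
    W_dense = W.copy()
    W_dense[mask == 0] = 0.0
    y_dense = W_dense @ x

    assert np.allclose(y_sparse, y_dense)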

If you're talking about cyclic directed graphs, those are implemented in places too, but they're extremely finicky to get right. You start having to worry about a time component to any signal propagation, about feedback loops and unbounded signals, and they're harder to get to converge during training, and so on. Afaik there isn't a solid theoretical reason why you would want to add cycles, since the layered approach can already handle arbitrary problems. That's not to say we shouldn't keep investigating -- some people know more about the topic than I do, and there's no definitive proof that cycles are always worse, so it may well be worth exploring from a practical point of view too.
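A rough sketch of the time-component point (the weights and the squashing nonlinearity here are just one illustrative choice): a self-connection only makes sense once you unroll it over discrete steps, which is exactly what recurrent networks do, and without something bounding the activation the signal can grow without limit.

    import math

    # One hidden unit with a self-loop (a cycle), unrolled over discrete time.
    w_in, w_rec = 0.8, 1.5   # recurrent weight > 1 invites unbounded growth
    h = 0.0
    inputs = [1.0, 0.0, 0.0, 0.0, 0.0]

    for t, x in enumerate(inputs):
        # Without the tanh, h would grow roughly like w_rec**t and blow up.
        h = math.tanh(w_in * x + w_rec * h)
        print(t, h)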


> therefore the layered topology must have some mathematical superiority

Isn't it just that backpropagation on the layered topology is relatively straightforward?

That's not to say you can't write a backpropagation on an arbitrary digraph, but as you get to more and more complex digraphs, things will get harder.

I could be wrong on this.
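For what it's worth, backprop over an arbitrary acyclic digraph is mechanical once you have a topological order: run the forward pass in that order, then push gradients back in reverse. A toy sketch (the graph, the operations, and the values are all made up):

    import math

    # node: (operation, list of input node names), listed in topological order.
    graph = {
        "x": ("input", []),
        "y": ("input", []),
        "a": ("mul", ["x", "y"]),
        "b": ("tanh", ["a"]),
        "out": ("add", ["b", "x"]),   # a "skip" edge straight from x
    }
    values = {"x": 2.0, "y": 3.0}

    # Forward pass in topological order.
    for name, (op, ins) in graph.items():
        if op == "mul":
            values[name] = values[ins[0]] * values[ins[1]]
        elif op == "tanh":
            values[name] = math.tanh(values[ins[0]])
        elif op == "add":
            values[name] = values[ins[0]] + values[ins[1]]

    # Backward pass in reverse topological order.
    grads = {name: 0.0 for name in graph}
    grads["out"] = 1.0
    for name, (op, ins) in reversed(list(graph.items())):
        g = grads[name]
        if op == "mul":
            grads[ins[0]] += g * values[ins[1]]
            grads[ins[1]] += g * values[ins[0]]
        elif op == "tanh":
            grads[ins[0]] += g * (1.0 - values[name] ** 2)
        elif op == "add":
            grads[ins[0]] += g
            grads[ins[1]] += g

    print(grads["x"], grads["y"])  # d(out)/dx, d(out)/dy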


> Isn't it just that backpropagation on the layered topology is relatively straightforward? That's not to say you can't write a backpropagation on an arbitrary digraph...

Moreover, any acyclic digraph can be expressed as a layered topology (possibly with a lot of zero weights). Since there's no fundamental difference, you might as well work with whatever's easiest to compute with.
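One way to see that concretely: assign each node of the acyclic digraph to a layer equal to its longest path from an input, and every edge then runs between layers, with skipped layers handled by zero weights. A toy sketch with a made-up graph:

    from collections import defaultdict

    # Toy DAG: node -> list of nodes it feeds into.
    edges = {"a": ["c", "d"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}

    # Invert the edges so each node knows its inputs.
    parents = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            parents[dst].append(src)

    # Layer of a node = 0 for inputs, else 1 + deepest layer among its parents.
    depth = {}
    def layer(node):
        if node not in depth:
            depth[node] = 0 if not parents[node] else 1 + max(layer(p) for p in parents[node])
        return depth[node]

    for node in edges:
        layer(node)

    layers = defaultdict(list)
    for node, d in depth.items():
        layers[d].append(node)
    print(dict(layers))  # {0: ['a', 'b'], 1: ['c', 'd'], 2: ['e']}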


I was, and still am, intrigued by the idea of Boltzmann machines, which form a complete graph over a set of "neurons". With the right weights they could be shaped into any architecture: a fancy recurrent network, a layered feed-forward network, or anything in between. Indeed, with the right training algorithm the computer could learn how to structure the "neurons" it is given. I don't think we know any such training algorithm though.

They're also not very efficient, because for every "feed forward" step you would have to use a matrix that is N×N (where N is the number of neurons), which is the worst-case scenario. Maybe with sparse matrices it could be reasonably efficient if most weights were zero. I don't think sparse matrices are used much in machine learning currently.
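A minimal sketch of that N×N cost, assuming the standard symmetric-weight, binary-unit Boltzmann machine update (the size and weight scale here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    N = 8  # every unit is connected to every other unit

    # Symmetric weights with no self-connections, as in a Boltzmann machine.
    W = rng.normal(scale=0.1, size=(N, N))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)

    s = rng.integers(0, 2, size=N)  # binary unit states

    # One sweep of stochastic updates: each unit looks at *all* others,
    # so a full sweep touches every entry of the N x N matrix.
    for i in range(N):
        activation = W[i] @ s            # O(N) per unit, O(N^2) per sweep
        p_on = 1.0 / (1.0 + np.exp(-activation))
        s[i] = rng.random() < p_on
    print(s)

The full sweep reads every entry of W, which is why the complete graph is the worst case; a sparse W only helps if most entries really are zero.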

These are my thoughts as a machine learning novice.


Spiking neural nets are a lot like these "sparse DAGs".


The vanishing gradient problem makes this hard. ResNets and LSTMs use additional connections to help with it (and, ironically, that makes them more graph-like).
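A toy 1-D illustration of why the extra connections help (all numbers made up): gradients in a plain stack are a product of per-layer factors, while a residual connection y = x + f(x) turns each factor into 1 + f'(x), so an identity path always survives.

    # 50 layers, each with a small local derivative f'(x) = 0.1.
    depth, f_prime = 50, 0.1

    plain = f_prime ** depth              # shrinks geometrically
    residual = (1.0 + f_prime) ** depth   # identity path keeps it from collapsing

    print(f"plain:    {plain:.1e}")       # ~1e-50, effectively vanished
    print(f"residual: {residual:.1e}")    # ~1.2e+02, still a usable signal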




