
Has anyone else, when first introduced to neural networks some decades ago, noticed the connected layers and wondered whether a more generalized topology, such as an arbitrary directed graph, could be applied instead? And then realized they must not have been the first to notice, concluded that the layered topology must have some mathematical superiority over the generalized form, but never found a concrete answer as to why?



There's been some work in this area. In practice, finding a good topology is hard, and graph or sparse-matrix operations require a pretty significant level of sparsity before they're more efficient than just including every weight in a dense matrix and setting some parameters to zero.

Any DAG you choose is equivalent to a network with some number of dense layers with some of the weights zeroed, so you aren't losing any modeling capability by sticking with dense networks. The current trend of massively overparameterizing networks and training them for a long time in the zero-training-error regime (the so-called "double descent" phenomenon) exploits this idea a little. With sufficient regularization, you bias the network toward a _simple_ explanation -- one where most weights are near zero, which is effectively the same as having chosen an optimal topology from the beginning.
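A minimal numpy sketch of that equivalence (the sizes and the mask are made up for illustration): masking a dense weight matrix reproduces any given connection pattern between two layers, so picking a topology amounts to deciding which entries are allowed to be nonzero.

    import numpy as np

    rng = np.random.default_rng(0)

    # Dense layer: 4 inputs -> 3 outputs.
    W = rng.normal(size=(3, 4))

    # A hand-picked "topology": 1 means the edge exists, 0 means it doesn't.
    mask = np.array([[1, 0, 0, 1],
                     [0, 1, 0, 0],
                     [1, 1, 1, 0]])

    x = rng.normal(size=4)

    # The "sparse DAG" forward pass is just the dense pass with masked weights.
    y_sparse = (W * mask) @ x

    # Equivalent dense network where the missing edges are exactly zero.
    W_dense = W.copy()
    W_dense[mask == 0] = 0.0
    y_dense = W_dense @ x

    assert np.allclose(y_sparse, y_dense)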

If you're talking about cyclic directed graphs, those are implemented in places too, but they're extremely finicky to get right. You start having to worry about a time component to any signal propagation, about feedback loops and unbounded signals, and they're harder to get to converge during training, and so on. Afaik there isn't a solid theoretical reason why you would want to add cycles, since the layered approach can already handle arbitrary problems. That's not to say we shouldn't keep investigating -- some people know more about the topic than I do, and there's no definitive proof that cycles are always worse, so it may well be worth exploring from a practical point of view too.
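A rough sketch of the time-component point (the weights and the squashing nonlinearity here are just one illustrative choice): a self-connection only makes sense once you unroll it over discrete steps, which is exactly what recurrent networks do, and without something bounding the activation the signal can grow without limit.

    import math

    # One hidden unit with a self-loop (a cycle), unrolled over discrete time.
    w_in, w_rec = 0.8, 1.5   # recurrent weight > 1 invites unbounded growth
    h = 0.0
    inputs = [1.0, 0.0, 0.0, 0.0, 0.0]

    for t, x in enumerate(inputs):
        # Without the tanh, h would grow roughly like w_rec**t and blow up.
        h = math.tanh(w_in * x + w_rec * h)
        print(t, h)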


> therefore the layered topology must have some mathematical superiority

Isn't it just that backpropagation on the layered topology is relatively straightforward?

That's not to say you can't write a backpropagation on an arbitrary digraph, but as you get to more and more complex digraphs, things will get harder.

I could be wrong on this.
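For what it's worth, backprop over an arbitrary acyclic digraph is mechanical once you have a topological order: run the forward pass in that order, then push gradients back in reverse. A toy sketch (the graph, the operations, and the values are all made up):

    import math

    # node: (operation, list of input node names), listed in topological order.
    graph = {
        "x": ("input", []),
        "y": ("input", []),
        "a": ("mul", ["x", "y"]),
        "b": ("tanh", ["a"]),
        "out": ("add", ["b", "x"]),   # a "skip" edge straight from x
    }
    values = {"x": 2.0, "y": 3.0}

    # Forward pass in topological order.
    for name, (op, ins) in graph.items():
        if op == "mul":
            values[name] = values[ins[0]] * values[ins[1]]
        elif op == "tanh":
            values[name] = math.tanh(values[ins[0]])
        elif op == "add":
            values[name] = values[ins[0]] + values[ins[1]]

    # Backward pass in reverse topological order.
    grads = {name: 0.0 for name in graph}
    grads["out"] = 1.0
    for name, (op, ins) in reversed(list(graph.items())):
        g = grads[name]
        if op == "mul":
            grads[ins[0]] += g * values[ins[1]]
            grads[ins[1]] += g * values[ins[0]]
        elif op == "tanh":
            grads[ins[0]] += g * (1.0 - values[name] ** 2)
        elif op == "add":
            grads[ins[0]] += g
            grads[ins[1]] += g

    print(grads["x"], grads["y"])  # d(out)/dx, d(out)/dy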


> Isn't it just that backpropagation on the layered topology is relatively straightforward? That's not to say you can't write a backpropagation on an arbitrary digraph...

Moreover, any acyclic digraph can be expressed as a layered topology (possibly with a lot of zero weights). Since there's no fundamental difference, you might as well work with whatever's easiest to compute with.
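One way to see that concretely: assign each node of the acyclic digraph to a layer equal to its longest path from an input, and every edge then runs between layers, with skipped layers handled by zero weights. A toy sketch with a made-up graph:

    from collections import defaultdict

    # Toy DAG: node -> list of nodes it feeds into.
    edges = {"a": ["c", "d"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}

    # Invert the edges so each node knows its inputs.
    parents = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            parents[dst].append(src)

    # Layer of a node = 0 for inputs, else 1 + deepest layer among its parents.
    depth = {}
    def layer(node):
        if node not in depth:
            depth[node] = 0 if not parents[node] else 1 + max(layer(p) for p in parents[node])
        return depth[node]

    for node in edges:
        layer(node)

    layers = defaultdict(list)
    for node, d in depth.items():
        layers[d].append(node)
    print(dict(layers))  # {0: ['a', 'b'], 1: ['c', 'd'], 2: ['e']}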


I was, and still am, intrigued by the idea of Boltzmann machines, which form a complete graph over a set of "neurons". With the right weights they could be shaped into any architecture: a fancy recurrent network, a layered feed-forward network, or anything in between. Indeed, with the right training algorithm the computer could learn how to structure the "neurons" it is given. I don't think we know any such training algorithm though.

They're also not very efficient, because for every "feed forward" step you would have to use a matrix that is N×N (where N is the number of neurons), which is the worst-case scenario. Maybe with sparse matrices it could be reasonably efficient if most weights were zero. I don't think sparse matrices are used much in machine learning currently.
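A minimal sketch of that N×N cost, assuming the standard symmetric-weight, binary-unit Boltzmann machine update (the size and weight scale here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    N = 8  # every unit is connected to every other unit

    # Symmetric weights with no self-connections, as in a Boltzmann machine.
    W = rng.normal(scale=0.1, size=(N, N))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)

    s = rng.integers(0, 2, size=N)  # binary unit states

    # One sweep of stochastic updates: each unit looks at *all* others,
    # so a full sweep touches every entry of the N x N matrix.
    for i in range(N):
        activation = W[i] @ s            # O(N) per unit, O(N^2) per sweep
        p_on = 1.0 / (1.0 + np.exp(-activation))
        s[i] = rng.random() < p_on
    print(s)

The full sweep reads every entry of W, which is why the complete graph is the worst case; a sparse W only helps if most entries really are zero.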

These are my thoughts as a machine learning novice.


Spiking neural nets are a lot like these "sparse DAGs".


The vanishing gradient problem makes this hard. ResNets and LSTMs use additional connections to help with it (and, ironically, that makes them more graph-like).
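A toy 1-D illustration of why the extra connections help (all numbers made up): gradients in a plain stack are a product of per-layer factors, while a residual connection y = x + f(x) turns each factor into 1 + f'(x), so an identity path always survives.

    # 50 layers, each with a small local derivative f'(x) = 0.1.
    depth, f_prime = 50, 0.1

    plain = f_prime ** depth              # shrinks geometrically
    residual = (1.0 + f_prime) ** depth   # identity path keeps it from collapsing

    print(f"plain:    {plain:.1e}")       # ~1e-50, effectively vanished
    print(f"residual: {residual:.1e}")    # ~1.2e+02, still a usable signal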




