Reservoir computing (wikipedia.org)
99 points by gyre007 on Oct 18, 2018 | 20 comments



"Reservoir computing" taken very literally: wave interference in a bucket of water computed a simple speech recognition task (differentiate "zero" and "one") [1].

[1] https://link.springer.com/chapter/10.1007%2F978-3-540-39432-...


PDF: https://pdfs.semanticscholar.org/af34/2af4d0e674aef3bced5fd9...

Also the Liquid State Machine paper they cited, "Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations": http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.8...

Fascinating idea. If I understand correctly, it's like an extreme version of data pre-processing. If you're trying to figure out whether an audio clip is saying "zero" or "one", analysing the raw amplitude data is pretty tough going. Instead you could run it through a Fourier transform in the hope that the clip's frequency content would be easier to analyse. If that doesn't help, maybe a wavelet transform, or something more fun like, uh, the "inverse Fourier transform of the logarithm of the squared magnitude of the Fourier transform".
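For what it's worth, that last transform is only a few lines of numpy. A quick sketch on a made-up test signal (the two-tone "clip" here just stands in for real audio):

    import numpy as np

    # A made-up test signal standing in for an audio clip: two tones.
    rate = 8000
    t = np.arange(0, 0.5, 1.0 / rate)
    clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

    # The transform quoted above, step by step: inverse FFT of the log of the
    # squared magnitude of the FFT (a small epsilon avoids log(0)).
    spectrum = np.fft.fft(clip)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    features = np.fft.ifft(log_power).real

    print(features[:5])  # first few coefficients of the transformed clip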

In a sense, it doesn't really matter what pre-processing you do, provided that the differences between "zero" and "one" are more distinct in the output than the input. This is the "separation property" that the papers mention: important differences get magnified at the expense of unimportant ones. If that's true, your final analysis will have a lot less work to do.

What's cool about this is that "anything that magnifies important differences" is a pretty open-ended requirement, leaving you free to choose pre-processing that's easy to implement in hardware. In this case, fluid dynamics has the desired properties, and the laws of our universe make it very easy to implement a fluid simulation using actual fluid in an actual bucket.

Perhaps there are other systems with similar separation properties that are even easier to implement in hardware. Maybe something with electromagnetic waves, like in time-domain reflectometry? Even if such a system's behaviour is uninterpretable to us, it might still provide useful pre-processing to a machine learning algorithm.


thanks, this is very interesting


This is a very accessible introduction by Quanta magazine: https://www.quantamagazine.org/machine-learnings-amazing-abi...


There are a few interesting resources about this[1]. The concept of "microelectromechanical neural network application" [2] is super interesting. We've been hearing about memristors for a while, but has anyone seen them being widely deployed anywhere?

[1] http://www.physnews.com/nano-physics-news/cluster1837307157/

[2] https://aip.scitation.org/doi/full/10.1063/1.5038038


After writing the following, I realized it might come off as more critical than I meant it. I'm genuinely curious, I just wasn't able to understand the motivation for this idea from the Wikipedia page.

First, let me attempt a summary in language that makes sense to me. You want to interpolate a function f: R^n -> R^m. So you pick a "random non-linear dynamical system" and compose f with the output of this. To be concrete, let's say you pick a polynomial map F: R^n -> R^n of degree at most k for some fixed k, with random Gaussian coefficients. Let g(x) = h(1), where h' = F(h) and h(0) = x. Now, given training data (x_i, y_i), let z_i = g(x_i), and interpolate a function f* on the training data (z_i, y_i) using whatever method you like (least squares or a neural net or whatever). Given test data x, guess f(x) = f*(g(x)).
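For concreteness, here is a rough numpy/scipy sketch of that recipe; everything in it is my own toy illustration. I swap the random polynomial for a tanh map so the ODE can't blow up on [0, 1], keep the reservoir dimension equal to the input dimension as in the summary above (real reservoirs are usually much higher-dimensional), and use plain least squares for the readout:

    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(0)
    n = 4  # input (and reservoir) dimension, made up for the example

    # Random "reservoir" dynamics h' = tanh(A h + c); tanh instead of a raw
    # random polynomial so the ODE stays bounded.
    A = rng.normal(0.0, 1.0, (n, n))
    c = rng.normal(0.0, 1.0, n)

    def g(x):
        # Run the random dynamical system from h(0) = x to t = 1, return h(1).
        sol = solve_ivp(lambda t, h: np.tanh(A @ h + c), (0.0, 1.0), x)
        return sol.y[:, -1]

    # Toy training data (x_i, y_i) with a made-up nonlinear target.
    X = rng.normal(0.0, 1.0, (200, n))
    y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

    # z_i = g(x_i), then fit the readout f* by least squares (with a bias column).
    Z = np.array([g(x) for x in X])
    Z1 = np.hstack([Z, np.ones((len(Z), 1))])
    w, *_ = np.linalg.lstsq(Z1, y, rcond=None)

    def predict(x):
        return np.append(g(x), 1.0) @ w

    print(predict(rng.normal(0.0, 1.0, n)))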

Is this correct? Why is this plausibly a good idea? Why have the function g be the output of a dynamical system, and not a random matrix? Or if you want g to be nonlinear for some reason (what reason?), the output of a random polynomial or a random Fourier series? What happens if two clusters in the initial data which are well separated and map to different well-separated outputs in the codomain get mixed after composing with the dynamical system map g? Or what if g is not even one-to-one?


What I don't understand is if you can train a final layer on the reservoir's random representation, why is this better than just training the final layer on your data directly? I assume the answer has something to do with dimensionality reduction?


The reservoir can compute non-linear functions of the input over time. So, with a reservoir, you can train a linear projection of the reservoir state that responds non-linearly to the input. Without the reservoir, the output projection can only be linearly related to the input.
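A toy illustration of that point (my own made-up example, using a standard echo-state-style reservoir and a ridge readout): a linear readout of the reservoir state can learn y_t = XOR(u_t, u_{t-1}), a target that is nonlinear in the input history and needs memory, so a linear readout of the raw input u_t alone stays at chance.

    import numpy as np

    rng = np.random.default_rng(1)
    T, N = 2000, 100          # sequence length, reservoir size (both made up)

    u = rng.integers(0, 2, T).astype(float)
    y = (u != np.roll(u, 1)).astype(float)   # target: XOR of current and previous bit

    # Fixed random reservoir, rescaled to spectral radius < 1 ("echo state").
    W = rng.normal(0.0, 1.0, (N, N))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.normal(0.0, 1.0, N)

    # Drive the reservoir with the input and record its states.
    X = np.zeros((T, N))
    x = np.zeros(N)
    for t in range(T):
        x = np.tanh(W @ x + w_in * u[t])
        X[t] = x

    # Train a *linear* readout of the reservoir state (ridge regression),
    # discarding an initial washout period.
    washout = 50
    A, b = X[washout:], y[washout:]
    w_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(N), A.T @ b)

    acc = ((A @ w_out > 0.5) == b).mean()
    print("accuracy:", acc)   # well above the ~50% a linear readout of u_t gets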


If I were to guess, I would say that the reservoir would amplify differences in data that may not be easily separated in the source data.

Sort of acting like a hash function - small differences in the source become large differences in the output.


FYI, if you analytically determine the recurrent connections, you get way better results than setting the connections randomly. http://compneuro.uwaterloo.ca/publications/voelker2018.html


You should be aware that echo-state networks are considered special cases of recurrent networks. One variant of recurrent networks, LSTMs, dominates the state of the art on a wide variety of problems.

They're expensive to train, in computing time and in data requirements, but they're the closest thing we have to trained programs.


As far as I know, reservoir computing (liquid state machines) works with spiking neural networks, whose units are different from "classic" artificial neurons.


eli5?


I am simplifying it to a level where it becomes borderline useless/wrong, but it should give the gist of the idea: it turns out that frequently you do not need to train anything but the last layer of a deep neural network, if all the other layers are sufficiently big, arbitrary and weird. In reservoir computing you replace all but the last layer of the network with a "dynamical system", i.e. a large and meaningless map.
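In (made-up) code, the non-recurrent version of that idea is just a big fixed random layer followed by a trained least-squares readout; the data and sizes here are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy data: classify 2-D points by a nonlinear rule (made up for the example).
    X = rng.normal(0.0, 1.0, (500, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)

    # The "sufficiently big, arbitrary and weird" part: a fixed random layer
    # that is never trained.
    W = rng.normal(0.0, 1.0, (2, 300))
    b = rng.normal(0.0, 1.0, 300)
    H = np.tanh(X @ W + b)

    # Train only the last layer: least squares on the frozen random features.
    H1 = np.hstack([H, np.ones((len(H), 1))])
    w_out, *_ = np.linalg.lstsq(H1, y, rcond=None)

    acc = ((H1 @ w_out > 0.5) == y).mean()
    print("train accuracy:", acc)   # well above chance despite the random layer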


I may be misunderstanding something here, but with a large enough set of random intermediate layers, wouldn't you expect that some of the connections would accidentally be really good signal carriers, and that by training the last layer, you're basically training it to find the good signal carriers among all the noise?


So treat it like a basis expansion?


If I understand the question correctly, i.e. is a reservoir computing approach simply projecting an input vector into a higher-dimensional space, like a wonky support vector machine, then I think the answer is: unclear.

Reservoir computing approaches usually have the intermediate layers be recurrent, i.e. they implement a dynamical system. Theoretically, this is actually Turing complete, although good luck programming it. In any case, the range of behaviors (transformations of the data) that a dynamical system can implement is much more powerful than just a basis expansion. However, whether this is what is happening in actual practice is really, really unclear to me. A lot of recurrent neural networks aren't doing anything more than an equivalent feedforward network, and the same thing may be true here: reservoir approaches might, for the most part, really just be performing a nonlinear projection to a higher-dimensional space, with the output layer then being trained to classify those patterns.


Thanks, that makes sense.


Thanks for the eli5! Helpful


https://www.nature.com/articles/ncomms1476?WT.ec_id=NCOMMS-2...

Seems like a decent introduction (not versed in the topic, but this is a more informative article than the wikipedia page).

Check out Figure 1 for a model like what krastanov discusses.



