Event-based backpropagation for exact gradients in spiking neural networks (arxiv.org)
119 points by berndi on June 2, 2021 | 37 comments



This is about achieving deep learning on neuromorphic hardware. Large research teams have been working on it for decades, and billions of dollars/euros/pounds must have been poured into it. Still, their devices and algorithms get blown out of the water by an off-the-shelf GPU plus TensorFlow, PyTorch, what have you.

Hats off to the authors for this achievement; it is no small feat and something that has been attempted for years. But IMHO it's time the field moved on from chasing matrix accelerators and focused on the real advantages of event-based computing: asynchronous, low-latency, event-based signal processing.


It's not quite correct to say this is only for achieving deep learning. Gradient-based parameter optimisation is still a useful tool, even for small shallow networks that would be ideal for event-based signal processing.

Even for small-network tasks, training spiking networks has been non-trivial. This paper provides a way to get exact gradients, which likely means faster optimisation than with surrogate gradients or other approximation methods for SNNs.
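
For context, "surrogate gradients" here refers to replacing the non-differentiable spike threshold with a smooth stand-in derivative during backpropagation. A minimal, purely illustrative PyTorch-style sketch (not from the paper; the class name and the particular surrogate function are assumptions):

    import torch

    class SurrogateSpike(torch.autograd.Function):
        # Hard threshold in the forward pass, smooth surrogate in the backward pass.

        @staticmethod
        def forward(ctx, v):
            ctx.save_for_backward(v)
            return (v > 0).float()              # Heaviside: spike if the membrane potential crosses threshold

        @staticmethod
        def backward(ctx, grad_output):
            (v,) = ctx.saved_tensors
            surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2   # smooth stand-in for the delta derivative
            return grad_output * surrogate

The exact-gradient approach in the paper avoids this approximation entirely.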


You are totally right. The algorithm itself is a potential game-changer. I guess I was carried away by the pitch in the abstract that starts off with deep learning.

Personally I think that way too many resources were wasted on trying to make better deep networks with spikes. In my opinion it is much more promising to apply spiking networks on problems that are inherently event-based.

Having a functional backpropagation algorithm such as the one provided can help with that, obviously.


Based on reading just the abstract so far, the event-based application of this algorithm makes absolute sense to me. Temporal importance can be effectively characterized in memristors. At the risk of making a comparison similar to the one Andrew Ng made a decade ago, I think this approach paired with something like a ReRAM crossbar is quite an effective rough analogue to the voltage potentials across a group of neurons in the brain.

I applaud this team's efforts. A real breakthrough.


Interesting. Where would one start reading about all this?


You could start with Intel's Loihi press release: https://www.intel.com/content/www/us/en/research/neuromorphi...

There you get the full dose of hype for neuromorphic computing, but without any critical reflection (naturally, since it’s a press release advertising a product).

Unfortunately I am not aware of literature that provides a critical review of neuromorphic computing. You have to read between the lines of the research papers to find out that the field has failed to live up to the promise of lower-energy deep learning (which was a misguided promise from the outset, IMHO).


Could you elaborate on why you think low energy deep learning was a misguided promise for SNNs? Just came across them for the first time last week and the low energy promise seemed like their most interesting aspect!


Deep learning is fundamentally linear algebra. Spiking networks are fundamentally event-based processors. The two concepts don’t play well together.

Many researchers have been trying hard to shoe-horn deep ANNs into spiking networks for the last 10 years. But this doesn’t change the fact that linear algebra is best accelerated by linear algebra accelerators (i.e. GPUs/TPUs).

Generally, spiking networks will likely have an edge when the signals they are processing are events in time, for example when processing signal streams from event-based sensors like silicon retinas. There's also evidence that event-based controllers have advantages over their periodically sampling equivalents.


If you bring activation sparsity into the mix, the advantage of SNN processors over GPUs/TPUs becomes clearer. Loss-gradient-based optimisation approaches are great because they give you a tool to include e.g. a sparsity regularisation term in the loss. Encouraging sparse activity makes plain linear algebra a poor fit for the network activations, and SNN processors a much better fit.
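
As a concrete, purely illustrative example of such a regularisation term, here is a PyTorch-style sketch (the function name and tensor shapes are assumptions; the actual formulation depends on the training setup):

    import torch

    def loss_with_sparsity(task_loss, spikes, reg_strength=1e-4):
        # spikes: binary tensor of shape (batch, time, neurons)
        spikes_per_sample = spikes.sum(dim=(1, 2))      # total spike count per sample
        sparsity_penalty = spikes_per_sample.mean()     # average activity over the batch
        return task_loss + reg_strength * sparsity_penalty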


But is sparse activation sufficient to motivate the use of SNNs? In my opinion one needs a temporal component as well.

Sparse activations without a time component (i.e. activations that are sparse in space but are not events in time) can be implemented very well without events.

Granted, SNN processors can handle sparse activations better than matrix accelerators. But then again, SNN accelerators might carry lots of SNN overhead that is not required for sparse activations alone.

Edit: A good example of a non-spiking sparse-activation accelerator is the NullHop architecture [1].

[1] https://ieeexplore.ieee.org/abstract/document/8421093


I agree. The use case needs to justify having state, otherwise the ideal architecture is something like NullHop. Temporal signal processing / vision processing tasks are ideal for SNNs, especially if the inputs can also be sparse.


I agree with these points; however, the main advantage of the method presented in the paper is precisely that both the forward propagation and the backward propagation can be seen as being performed by a network operating on temporally sparse events. We absolutely had event-based sensors and control in mind as motivation. The fact that you can write down the connectivity of the neurons in terms of a weight matrix does not mean that it can't be sparse. Since you are actually processing one spike at a time (potentially asynchronously), you don't need to implement any matrix multiplication. Current neuromorphic hardware either achieves at least some degree of sparsity in its synaptic crossbars (BrainScaleS-2, SpiNNaker) or largely eliminates them, as Loihi does.
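
To make the "one spike at a time" point concrete, here is a rough NumPy sketch (assumed names, not the paper's simulator): each event touches only the outgoing weight row of the neuron that spiked, so no dense matrix multiplication is ever performed.

    import numpy as np

    def propagate_spikes(weights, spike_events, tau=1e-2):
        # weights: (n_pre, n_post) array; spike_events: list of (time, pre_index)
        current = np.zeros(weights.shape[1])
        last_t = 0.0
        for t, pre in sorted(spike_events):
            current *= np.exp(-(t - last_t) / tau)   # decay only between events
            current += weights[pre]                  # one weight row per spike, no matmul
            last_t = t
        return current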


Ultra-low-power neuromorphic processors such as DynapSE [1] have been crossbar-free for several years now, making them a perfect fit for sparse networks (both weight and activity sparsity). [1] https://arxiv.org/abs/1708.04198


Yes, that would be another example.


Yes, the algorithm you proposed is impressive and has the potential to become a game-changer.

However, I think MNIST and the Yin-Yang dataset, using latency coding, are not the ideal examples to demonstrate its performance.

These datasets are useful for demonstrating nonlinear classification, and it's certainly great to see that the spiking network performs competitively. However, the transformation into a latency code costs time, both in terms of computation and in terms of representation, before even one item is classified. Perceptron-based ANNs with continuous outputs don't require this step and will always have an edge over spiking networks in such scenarios.
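
For concreteness, a toy time-to-first-spike ("latency") encoding might look like the NumPy sketch below (assumed names and time scale, not the paper's exact scheme): brighter pixels spike earlier, and the network has to wait for those spikes to arrive before it can classify anything.

    import numpy as np

    def latency_encode(image, t_max=20e-3):
        # map intensities in [0, 1] to spike times in [0, t_max]; silent pixels never spike
        x = np.clip(np.asarray(image, dtype=float).ravel(), 0.0, 1.0)
        times = (1.0 - x) * t_max          # high intensity -> early spike
        times[x == 0.0] = np.inf           # zero-intensity pixels emit no spike
        return times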

I think what the field is really lacking is an ML problem that can leverage spiking networks directly, that does not require costly conversion of data into a representation that is suitable for spiking networks.


I agree that the choice of task is not ideal. It is something I struggled with quite a bit, since coming up with a good task can be a lot of work. Unfortunately, even some of the "neuromorphic" datasets in use can be solved by massive temporal averaging, or result in reduced network performance relative to "analog" temporal input (e.g. on Google Speech Commands). I'm collaborating with a group that is interested in event-based vision and control, so hopefully this will result in more practical/impressive demonstrations in the future.


I have always wondered whether results on the MNIST digits are generic. One might think it would work with other digits such as 一, 二, 三, 四, but would they cluster the same way with t-SNE?


Does that include rain.ai?


I don't know too much about their technology, and the website doesn't give away much detail. It doesn't look like they are using spiking networks, so no event-based neuromorphic tech, but perhaps good old linear algebra/ANN ML. They're using analog computation, which is attractive power-wise but has always suffered from variability due to device mismatch. Unless they have some really revolutionary process or algorithm that magically makes the downsides of mismatch disappear, they'll have a hard time going beyond what has been tried in analog computing before (which had its heyday in the 70s).


Looks like a different approach. Intel's chip is based on digital circuits; they are trying an analog approach.


Here is a paper from the same group which includes actual results of an algorithm running on the neuromorphic chip: https://arxiv.org/abs/1912.11443


The EU BrainScaleS project built a wafer-scale system that runs 10,000 times faster than real time:

https://electronicvisions.github.io/hbp-sp9-guidebook/pm/pm_...


One of the authors here, happy to answer any questions you might have.


Could you comment on this general SNN critique? https://news.ycombinator.com/item?id=27366059


As the author of that "general SNN critique", I'd like to add that it was not a general critique but a specific reply to the question of why I think low-energy deep learning is a misguided promise for SNNs.

Personally I think SNNs are a very exciting research field, both from a neuroscience and from a computer science angle. The work we are discussing here is deeply impressive for its rigour, and it addresses an important problem in spiking network research.

Whether spiking networks will provide lower-energy deep learning is a totally different question.


I actually think I largely agree with the points you made in the comment thread above. The name of the game can't be to just map a deep neural network onto an SNN; I find it far more interesting to identify more natural mechanisms by which SNNs can perform information processing on sparse, asynchronous, event-based data. Also, thank you for the nice words :).


Maybe "critique" was the wrong choice of word. Your point is now much more clear to me, thank you for your insights.


Absolutely right


I'm kind of an amateur, but incredibly curious, willing to learn and do hard work.

I have many ideas and questions regarding your paper:

- How do you adjust weights between different spikes?

- Do you use or implement a kind of wavelet for wave propagation, for example for spike interference?

- What neuromorphic hardware can I buy to run your code/ the SNN?

=)


- Weights are adjusted at the end of the backward-in-time (adjoint) integration, according to the weight gradients accumulated at pre-spike times (a rough schematic follows after this list).

- We only consider one kind of model system in this paper but this method would work for any kind of hybrid dynamical system, so also other physical substrates (a lot of exciting work to do there).

- We used to sell a neuromorphic hardware system, Spikey, for ~3000 Euro (basically at cost), and we've recently completed a similar project. We also provide access to remote users via the EBRAINS Collaboratory (https://ebrains.eu/service/collaboratory/). There are a number of commercial offerings in the works (SynSense, Innatera). You can also buy SpiNNaker boards or access them via EBRAINS. Loihi and TrueNorth either don't sell or are pretty expensive, but they have "research agreements" in place.
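
A rough schematic of the bookkeeping described in the first point, with placeholder adjoint dynamics (the actual adjoint equations, loss terms, and jumps at post-synaptic spike times are given in the paper; every name here is an illustrative assumption, not the paper's code):

    import numpy as np

    def adjoint_sweep_sketch(spike_events, n_pre, n_post, t_end, dt=1e-4, tau=1e-2):
        # spike_events: list of (time, pre_index) recorded during the forward pass
        adjoint = np.zeros(n_post)          # placeholder adjoint state, one per post-synaptic neuron
        grad = np.zeros((n_pre, n_post))    # one gradient accumulator per weight
        t = t_end
        for t_spike, pre in sorted(spike_events, reverse=True):   # sweep backwards in time
            while t > t_spike:
                adjoint += dt * (-adjoint / tau)   # stand-in for the real adjoint ODE
                t -= dt
            grad[pre, :] += adjoint                # accumulate gradient at the pre-spike time
        return grad                                # weights are adjusted from this at the end of the sweep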


> What neuromorphic hardware can I buy to run your code/ the SNN?

Current neuromorphic hardware is not easily accessible, but you can simulate spiking neural networks in software. Check out e.g. Brian2 (https://brian2.readthedocs.io/en/stable/) or Nengo (nengo.ai).
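
For example, a minimal Brian2 script simulating a small leaky integrate-and-fire population looks roughly like this (a toy model, unrelated to the paper):

    from brian2 import NeuronGroup, SpikeMonitor, run, ms

    eqs = 'dv/dt = (1.1 - v) / (10*ms) : 1'   # leaky integration towards 1.1 (dimensionless)
    group = NeuronGroup(5, eqs, threshold='v > 1', reset='v = 0', method='exact')
    monitor = SpikeMonitor(group)

    run(100*ms)
    print(monitor.t, monitor.i)               # spike times and the neurons that fired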


In Table 3 you compare your results with other publications. Reference 44 reports higher accuracy than yours with fewer hidden neurons. What is the difference between your method and theirs?

Also, will you release your method as code?


That number is what they reported in their publication, but it turns out they actually used recurrent neurons (rather than feedforward, as stated) and 512 instead of 100 hidden neurons (see https://zenkelab.org/publications/errata_zenke_vogels_2021/). We will adjust those numbers in the final publication.

My aim is to release the method as part of Norse (https://github.com/norse/norse). There is some subtlety involved in implementing it for a given integration scheme, though. The event-based simulator underlying the paper will also be released in due time.


Other author here: as was pointed out, this was a typo in the cited publication. We will definitely be releasing the source code in due time.


I'm unclear on what \tau_{syn} and \tau_{mem} mean. I assume that syn stands for synapse, and mem stands for memory, but I'm not sure what the \tau is about? Time? I imagine that this would be clear to someone in the field, who would be the target audience for the paper, so this isn't really a criticism.


\tau is a common symbol for a time constant, which, briefly put, determines how fast something decays over time. "syn" refers to the synaptic current and "mem" to the membrane voltage. We've written some documentation around our neuron equations in Python that explains this: https://norse.github.io/norse/auto_api/norse.torch.functiona...
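
For reference, a common textbook form of the leaky integrate-and-fire dynamics with these two time constants (which may differ in detail from the exact equations used in the paper) is

    \tau_{mem} \frac{dV}{dt} = -(V - V_{leak}) + I_{syn}
    \tau_{syn} \frac{dI_{syn}}{dt} = -I_{syn} + \sum_i w_i \sum_k \delta(t - t_i^k)

where V is the membrane voltage, I_{syn} the synaptic current, and the second sum runs over the spike times t_i^k of the pre-synaptic neurons.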

See also our tutorial on neuron parameter optimization to understand how it's useful for machine learning: https://github.com/norse/notebooks#level-intermediate

There's also a great book on the topic by Gerstner available online: https://neuronaldynamics.epfl.ch/

Disclaimer: I'm a co-author of the library Norse

Regarding the target audience, it's actually not entirely clear to me. This work lies at the intersection of computational neuroscience and deep learning, which isn't a huge set of people. So I think your question is highly relevant, and we (as researchers) have a lot of work in front of us to explain why this is interesting and important.


Thanks! I will take a look at those



