Boltzmann Encoded Adversarial Machines

cs702 · on April 26, 2018

Very interesting.

At a high level (ignoring many details) the main idea is to replace generator networks in GANs with Restricted Boltzmann Machines, or RBMs, which are easier to train (more stable). The authors call this kind of architecture "Boltzmann Encoded Adversarial Machines," or BEAM for short.

The experiments provide persuasive evidence that BEAMs outperform GANs. Figure 3, in particular, I find very persuasive -- it compares the ability of different architectures to learn to generate low-dimensional mixtures of Gaussians, with BEAMs very clearly outperforming GANs. The results in higher-dimensional applications such as image generation also suggest that BEAMs outperform GANs, but the improvement is somewhat more subjective due to the nature of high-dimensional data. Obviously, these results need to be replicated by others.

It looks promising to me. That said, it's been years since I've touched an RBM -- I only have a vague recollection of how they work and how they're trained, layer by layer, as proposed by Hinton in 2006 or so. Time to re-read old papers!

drams · on April 26, 2018

To clarify: in the case of a BEAM, both the generator and all but the top layer of the discriminator are replaced with an RBM. The adversary in this case operates on features encoded by the RBM, not on raw data samples. Second, the RBM is trained with a combined loss involving the log-likelihood and the adversarial term.
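One way to picture that combined loss, as a sketch rather than the paper's exact objective: the mixing weight \gamma, the critic T, and the choice of \mathbb{E}_\theta[h \mid v] as the "encoded features" are all assumptions made here for illustration, and the actual form and weighting should be taken from the paper.

```latex
% Sketch only: \gamma, T, and the exact weighting are assumptions, not taken from the paper.
\mathcal{C}(\theta)
  = -(1-\gamma)\,\mathbb{E}_{v \sim p_{\mathrm{data}}}\big[\log p_\theta(v)\big]
    \;-\; \gamma\,\mathbb{E}_{v \sim p_\theta}\big[T\big(\mathbb{E}_\theta[h \mid v]\big)\big],
\qquad 0 \le \gamma \le 1
```

The first term is the usual negative log-likelihood of the data under the RBM p_\theta; the second rewards RBM samples whose hidden-unit features the critic scores as data-like. Setting \gamma = 0 would recover plain maximum-likelihood training.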

cs702 · on April 26, 2018

Yes. For simplicity's and brevity's sake, I ignored many important details in my summary.

drams · on April 26, 2018

No worries! :)

cs702 · on April 26, 2018

Thanks. Have you made any code available online?

drams · on April 26, 2018

Yes. A recent review article (https://arxiv.org/abs/1803.08823) provides code samples that use an open-source version of our software called 'paysage' (https://github.com/drckf/paysage). The repository has not been updated recently, but we expect to put out a new update quite soon. The update will clean up the code, docs, and features, but might not yet contain the BEAM training code. The latter is pending some decisions about IP, etc.

cs702 · on April 26, 2018

Thank you. I'll take a look!

cycrutchfield · on April 27, 2018

I am not entirely convinced. In particular, the results shown in Fig 7 remind me of the BEGAN paper which was similarly hyped. But I'll defer further judgment until I read through it more and maybe run some experiments.

DanWaterworth · on April 27, 2018

There's a good reason that the pictures look similar. Both architectures produce somewhat blurry images.

The problem, in BEGAN's case, is that when your idea of similarity is based on mean squared error, high-frequency details are just not important. [1] You can see this by doing PCA on natural image patches. BEGAN uses an autoencoder trained on MSE.
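A minimal sketch of the PCA point, using a synthetic stand-in rather than real image patches. It assumes the correlation between two pixels in a row decays as rho**distance (rho = 0.9 and the patch length of 8 are made-up values), and it finds the leading principal components by power iteration. All function names here are hypothetical.

```python
import math
import random

def patch_covariance(n, rho):
    """Covariance of an n-pixel row whose correlation decays as rho**distance (assumed model)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def leading_components(cov, k, iters=2000, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(cov)
    work = [row[:] for row in cov]  # copy, so the caller's matrix is untouched
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def sign_changes(v):
    """Number of sign flips along the vector, a crude proxy for spatial frequency."""
    signs = [x > 0 for x in v if abs(x) > 1e-9]
    return sum(a != b for a, b in zip(signs, signs[1:]))

if __name__ == "__main__":
    n = 8
    for lam, vec in leading_components(patch_covariance(n, 0.9), 4):
        # Total variance equals the trace, which is n for this covariance.
        print(f"variance share {lam / n:.2f}, sign changes {sign_changes(vec)}")
```

Under that assumed correlation model, the first component is a flat pattern with no sign changes and carries most of the variance, while each later component oscillates more and carries less. An MSE-trained model can therefore drop the oscillatory components at little cost, which shows up as blur.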

RBMs produce blurry images because the architecture is not good at representing multiplicative interactions. You just get splodges of colour.

[1] http://danielwaterworth.com/posts/what's-wrong-with-autoenco...

w-m · on April 26, 2018

It's amazing to see deep learning blast through all the benchmarks, for example in computer vision, over the last couple of years. At the same time something starts to feel off about having all these single-use asymmetric feedforward networks solving their own little task. Being trained in one direction, then used in the other, then thrown away. Maybe being chained together for a more complex task, but that seems to be about it for the average (real-world application) use case of deep learning nets.

I'm sure there's plenty of interesting work being done in ML to improve on this situation and come up with new architectures. Yet I was moderately surprised when I rediscovered Boltzmann machines recently, and found not much work seemed to be going on there at all (very little at NIPS 2017 for example?).

This BEAM seems intriguing; here's hoping it opens the door to a better understanding and modeling of our world.

nafizh · on April 26, 2018

RBMs went out of fashion after 2010-2011, as other architectures worked better than them in almost all of the tasks in vision.

rm_-rf_slash · on April 26, 2018

I have had a similar thought recently. The office I work at has a large e-recycling bin for old computers. I have recovered quite a few desktops, laptops, and monitors, as well as a bunch of tidbits like adapters and RAM.

A lot of the RAM, for instance, is DDR2 and usually a measly 1 GB apiece. Those sticks take up the exact same amount of space as RAM with 4 GB apiece or more. I don’t entirely know why I still have them. Now that I’m doing physical computing/IoT development, I’m seeing how pointless it is to have a bunch of desktops/laptops when I can get much more done (conveniently, I might add) with a teeny tiny RedBear microcontroller.

I think an inherent feature of technology is having to get used to the idea that things age and die much faster than other products. Whether that’s physical hardware or trained neural networks, there comes a point when we just have to let go.

radarsat1 · on April 27, 2018

I found this one pretty interesting in that regard. The basic gist is that learning the function that projects onto the boundary of the dataset is useful for a variety of (linear inverse) problems.

One Network to Solve Them All --- Solving Linear Inverse Problems using Deep Projection Models https://arxiv.org/abs/1703.09912

rememberlenny · on April 26, 2018

Can someone explain the basic implications of this against current GANs and also provide a practical ML application?

drams · on April 26, 2018

I can try. (I am a coauthor of this paper.) First off, Unlearn.ai is a startup working to build new tools that make precision medicine a reality. We needed generative models that let us:
1. model multimodal data easily (consider medical datasets with categorical, binary, and continuous data, with various bounds etc., all mixed together);
2. answer counterfactual questions about data (for example, if I downregulate a gene, how does this affect the rest of the gene expression?);
3. handle time-series data (give me a likely progression of this person's cognitive scores given their current scores and other indicators).

RBMs are natural candidates for models which handle these kinds of issues quite well:
1. Although people have done work trying to get GANs to work well with multimodal data, it's pretty kludgy.
2. GANs do not provide a means of inference (in contrast to VAEs, which can satisfy this demand).
3. We have built a solid extension of RBMs to temporal models that work quite well.

However, as explained in this paper, stock RBMs have significant training issues. This paper attempts to improve the situation.

tlarkworthy · on April 26, 2018

RBMs have a native probabilistic output (the output is a distribution you can slice), but vanilla neural networks don't (the output is a vector). Is that right?

drams · on April 26, 2018

It's best to say that an RBM is an undirected NN which models a probability distribution over some variables. You can sample from the distribution (which is a stochastic process). There are other models which use feed-forward NNs to do something similar, such as GANs, VAEs, and others. There, the generation process is also stochastic, but the difference is that you sample from a noise distribution and then feed that sample through the NN. In all cases the generated samples are still vectors.
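A minimal sketch of that contrast, assuming a small Bernoulli RBM with made-up weights and a one-layer feed-forward generator. The function names and parameters are hypothetical and not taken from paysage or the paper. The RBM sample comes out of a Markov chain that alternates between hidden and visible units; the feed-forward sample comes from drawing noise once and pushing it through a deterministic map.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_sample_rbm(W, b_vis, b_hid, steps, rng):
    """Block Gibbs sampling on a Bernoulli RBM; returns the final visible vector."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    v = [rng.randint(0, 1) for _ in range(n_vis)]  # random starting state
    for _ in range(steps):
        # Sample every hidden unit given the current visible units.
        h = [int(rng.random() < sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(n_vis))))
             for j in range(n_hid)]
        # Sample every visible unit given the hidden units just drawn.
        v = [int(rng.random() < sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid))))
             for i in range(n_vis)]
    return v

def feedforward_sample(W, b_out, rng):
    """Directed generator: draw noise z once, then apply a deterministic map."""
    n_out, n_z = len(b_out), len(W[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(n_z)]
    return [sigmoid(b_out[i] + sum(W[i][j] * z[j] for j in range(n_z)))
            for i in range(n_out)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up 4x3 weight matrix: 4 visible/output units, 3 hidden/noise units.
    W = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6], [-0.1, 0.3, 0.2], [0.7, -0.2, 0.0]]
    print(gibbs_sample_rbm(W, [0.0] * 4, [0.0] * 3, steps=50, rng=rng))
    print(feedforward_sample(W, [0.0] * 4, rng))
```

Either way, what comes out is a plain vector; the distribution lives in the sampling procedure, not in the output type.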

MarkMMullin · on April 26, 2018

I'm wondering if the work on adversarial systems, this one being quite interesting, can help us with our giant bugaboo of "OMG, it's overfitted :-(". Right now we model, train, test, fail, and start all over again, and usually fiddle with the hyperparameters to boot. What would happen if we turned training into a two-phase approach, with a BEAM/GAN/whatnot used on each cycle to measure how 'brittle' the backprop is? The idea is to round down the spikes in the learned model by penalizing the backprop when it is too narrow. Training would take longer, but we'd throw away fewer sets, I'd think.

babak_ap · on April 26, 2018

Is there a reference open-source implementation available (on GitHub or similar)?

TheAnig · on April 26, 2018

I too was interested in this

drams · on April 26, 2018

See the comment above.

bra-ket · on April 26, 2018

Can this be applied to sequence learning?

johnfactorial · on April 26, 2018

Just what the AI/ML crowd needs in the midst of burgeoning fear of AI: a new technology with "Adversarial Machines" right there in the name.