A Probabilistic Theory of Deep Learning (arxiv.org)
74 points by Anon84 on April 3, 2015 | 20 comments



I am well versed in the usual theories of learning (PAC, SQ-learning, learning with membership and equivalence queries, etc.).

Can anyone comment on how this relates to those standard models? There does not appear to be any mention in the paper of the standard learning models, and as a result I'm inclined to think this paper is not worth reading.


On a veeeery quick skim I think it's a Bayesian generative model for deep learning architectures. I thought Zoubin Ghahramani's group had already done some similar work, but :shrug: it's not my field.


To clarify, it has been studied in Zoubin Ghahramani's group [1] (and also more recently in Ryan Adams's group [2]), and it's most widely known through Radford Neal [3], who's won a lot of competitions using the Bayesian approach to NNs.

[1] http://mlg.eng.cam.ac.uk

[2] http://hips.seas.harvard.edu

[3] http://www.cs.utoronto.ca/~radford/res-neural.html


56 pages! They should really reorganize this into 10-16 pages to get the basic ideas and results across.


Arbitrary page limits cause far more harm than good in academic writing.

Dissertations and journal articles are some of the most readable and useful academic publications in computer science precisely because this attitude of "16 pages of bits ought to be enough for anyone" isn't enforced.

Without artificial page limits, it's possible to explain an idea from the ground up without taking shortcuts. Enforcing a prohibitive page limit is far more likely to force clever writing than good communication.


What a strange attitude! There's plenty of zero-cost bits available to allow for articles of all sorts of lengths. For example, the "Foundations and Trends" journals (e.g. Foundations and Trends in Machine Learning, or Foundations and Trends in Optimization) publish articles that are usually at least 100 pages long. These articles tend to be well-respected and highly cited.

Longer articles have many advantages in allowing for a more in-depth explanation, and it is certainly not the case that every reader wants papers shoe-horned into an artificial page limit.


> There's plenty of zero-cost bits available to allow for articles of all sorts of lengths.

Straw man - I wasn't talking about cost, hard drive space, or anything close to what you're referring to.

> ...articles that are usually at least 100 pages long. These articles tend to be well-respected and highly cited.

Another straw man - why does length (short or long) correlate with quality again?

> Longer articles have many advantages in allowing for a more in-depth explanation

Ah the real comment. Ok. I definitely agree - longer usually means more space to explain.

> and it is certainly not the case that every reader wants papers shoe-horned into an artificial page limit.

So be it. There's usually an appendix or supplementary materials that can offer expanded derivations. Often the authors trim a lot of the fat for the published paper and put a longer version in a book/thesis.

> What a strange attitude!

As a writer of publications I want more space and agree, but as a reader of publications (way more than I write) there's just too much out there to spend my time going through 50+ pages. I can put in the time for 10-20 pages, and if I still want more I'll check out other publications, the appendix, supplementary material, a thesis...whatever. It's an important and necessary skill for academics to be able to concisely present their work - not just for publications, but for grant applications, presentations, etc.


If you don't like to read long articles, don't read them. I like to read them; they tend to be much easier to understand than artificially shortened articles. It seems to me that complaining that somebody wrote a long article because it takes too long for you to read is exactly analogous to saying that since you only like to read short stories, nobody should write a long novel because it would take too much time for you to read.


Academic writing is dense. When you shorten the page limit, you force authors to make it denser, to explain less and rely more on the readers' previous knowledge. As a writer, I'm often surprised how easy it is to fill pages and pages, and as a reader, I'd rather read clear expository prose than wizardly hand-waving.


There's a place for brevity, but not in this paper. You are assuming somehow that there is redundancy in the paper. There doesn't seem to be, at first glance. The scope of the paper is large enough to merit the length.


I'm not assuming redundancy; I'm assuming what every academic does - that the author(s) should be able to concisely and accurately present their work. There are plenty of examples where complex ideas and derivations fit within a journal page limit (which does vary, but I think 16 pages is good); there's no reason this one can't as well.


I read it, started to doze off, forced myself to dig in, and was left with the feeling they just redefined the Constant of Integration as part of their own mystery lingo. It shouldn't be like this. Sheesh.


Maybe you should stick to twitter.


Does anyone know if there is any code available for this paper? It would be extremely interesting to see comparisons of the EM algorithm mentioned with regular SGD in terms of speed and precision.
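
To make the question concrete, here's a toy numpy sketch of the sort of comparison I mean - EM versus plain (full-batch) gradient ascent on a two-component Gaussian mixture with known variances. This is just an illustration of the trade-off, not the paper's Deep Rendering Model or its EM updates; the data, step size, and iteration counts are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    # synthetic data: two clusters, equal weights, unit variance
    x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

    def resp(x, mu):
        # posterior responsibility of component 0 (equal weights, unit variance)
        d0 = np.exp(-0.5 * (x - mu[0]) ** 2)
        d1 = np.exp(-0.5 * (x - mu[1]) ** 2)
        return d0 / (d0 + d1)

    # EM: closed-form M-step, converges in a handful of iterations
    mu_em = np.array([0.0, 1.0])
    for _ in range(20):
        r = resp(x, mu_em)
        mu_em = np.array([np.sum(r * x) / np.sum(r),
                          np.sum((1 - r) * x) / np.sum(1 - r)])

    # gradient ascent on the same log-likelihood, fixed step size
    mu_gd = np.array([0.0, 1.0])
    for _ in range(2000):
        r = resp(x, mu_gd)
        grad = np.array([np.sum(r * (x - mu_gd[0])),
                         np.sum((1 - r) * (x - mu_gd[1]))])
        mu_gd = mu_gd + 0.001 * grad

    print("EM estimate:      ", np.sort(mu_em))   # ~[-2, 3], no step size to tune
    print("gradient estimate:", np.sort(mu_gd))   # similar, but sensitive to the step size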


What is the hypothesis? How was the hypothesis tested?


From my reading of the first twenty or so pages, it appears to be a theory of how neural networks model images (the running example). The authors claim that popular neural network architectures can be reduced to particular cases of their model.

If this model can explain the small perturbations that throw off NN performance on images (from karpathy.github.io, posted to HN earlier today), then that would be rocking.

Nonetheless, it does not appear to be an experimental paper; rather, it provides a mathematical theory of some particular classification problems.


> If this model can explain the small perturbations that throw off NN performance on images (from karpathy.github.io, posted to HN earlier today), then that would be rocking

This paper [0] does a pretty good job of giving an explanation for that.

[0] http://arxiv.org/abs/1412.6572


Honestly, that abstract makes me more upset. If these are due to NNs' nature as linear classifiers, then we are all in trouble, given that almost everything useful is based on linear models. Given the title of the paper though, I should probably be more hopeful :)


The adversarial noise issue is not so hard to understand; it's just costly to correct.

That link provides the explanation: if your classifier is not very regularized, then the classification regions are going to be close together and irregular, such that a small vector may lead you from one to another. It's more of a geometrical fact if you think of classification regions in those high-dimensional spaces.

Guaranteeing a large minimum distance is hard (essentially why error-correcting codes are pretty hard to encode/decode).
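
To make the "small vector" point concrete, here's a toy numpy sketch of the linearity argument from that paper, with a random linear scorer standing in for a trained network (the dimensions and epsilon are made up). A per-coordinate nudge of size eps aligned with sign(w) shifts the score by eps times the L1 norm of w, which grows with the input dimension:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 224 * 224 * 3                 # input dimension of a typical image model
    w = rng.normal(0, 1, d)           # weights of a made-up linear scorer
    x = rng.normal(0, 1, d)           # an input, classified by sign(w @ x)

    eps = 0.01                        # tiny per-coordinate perturbation
    # push every coordinate by eps against the current prediction (FGSM-style)
    x_adv = x - eps * np.sign(w) * np.sign(w @ x)

    print("clean score:    ", w @ x)      # typically a few hundred in magnitude
    print("perturbed score:", w @ x_adv)  # shifted by ~eps * sum(|w|), roughly 1200
    print("label flipped:  ", np.sign(w @ x) != np.sign(w @ x_adv))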


the fact that random noise can be classified as strongly belonging to some class, and the fact that classification results can be unstable, is simply a result of the fact the input space is very high dimensional, and the output space is very small (say a few isolated points). That is, if you are discriminatively training a mapping from images in R^(224 x 224 x 3) to 1000 points (class labels), there is going to be a tremendous amount of instability in the inverse direction.
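
As a quick toy illustration of the dimensionality point (a random, untrained linear map standing in for a network; all sizes are made up): even on pure noise, a softmax over a handful of classes computed from an image-sized input tends to put essentially all of its mass on one class, simply because the logits have standard deviation on the order of sqrt(d):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 224 * 224 * 3, 10            # image-sized input, a few class labels
    W = rng.normal(0, 1, (k, d))        # random (untrained) linear map
    x = rng.normal(0, 1, d)             # pure noise input

    logits = W @ x                      # each logit ~ N(0, d), std ~ 388
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    print("max class probability:", p.max())   # essentially 1.0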



