At the risk of embarrassing the rest of the thread, please note that the paper was released with an R package [0] that implements the polynomial regression algorithm, and that an entire chapter of the paper is dedicated to running this package against two other R packages that implement neural networks, with competitive results.

[0] https://github.com/matloff/polyreg




I don't know if it's at all embarrassing (though there are some HNers who definitely did not so much as skim the paper, which is a whole other thing).

The examples they gave in the paper were all tiny: MNIST was the biggest, with 784 features (Crossfit has <40, Forest has <60, Taxi has ~11), so this is an absurd comparison. You can get better results than all of the models presented, in much less time, using random kitchen sinks (also known as 'random features') [0]; a minimal sketch follows.
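
For anyone who wants to try this, here is a minimal sketch of the random-features idea from [0], not the authors' code: draw a fixed random linear map, apply a cosine nonlinearity, then solve ordinary regularized least squares on the resulting features. It's written in Python/NumPy; the toy data, bandwidth, and feature count are all illustrative.

    # Random Fourier features ("random kitchen sinks"), after Rahimi & Recht [0].
    import numpy as np

    rng = np.random.default_rng(0)

    def random_fourier_features(X, D, gamma, rng):
        """Map X (n, d) to sqrt(2/D) * cos(X W + b), approximating an RBF kernel."""
        d = X.shape[1]
        W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # fixed random weights
        b = rng.uniform(0, 2 * np.pi, size=D)                  # fixed random phases
        return np.sqrt(2.0 / D) * np.cos(X @ W + b), (W, b)

    # Toy data: y depends nonlinearly on the first coordinate.
    X = rng.normal(size=(2000, 10))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)

    Z, (W, b) = random_fourier_features(X, D=500, gamma=0.5, rng=rng)

    # Ridge-regularized least squares on the random features.
    lam = 1e-3
    theta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    print("train RMSE:", np.sqrt(np.mean((Z @ theta - y) ** 2)))

    # At test time, reuse the SAME W and b.
    X_test = rng.normal(size=(5, 10))
    Z_test = np.sqrt(2.0 / 500) * np.cos(X_test @ W + b)
    print("predictions:", Z_test @ theta)

The whole point is that W and b are never trained: the only optimization is a linear least squares solve, which is why this is so fast.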

Plus, it's a little weird to me that their polynomial regression "runs out of memory": they're solving a least squares problem, for which we have a ton of good algorithms. More specifically, there's no need to generate the data all at once (which is why they worry that the "polynomial matrix would be huge"), much less run a QR factorization on that super dense matrix. Rather, you would do this in a streaming fashion (sketched below): generate the points one by one and proceed via SGD, or a nicely preconditioned version of it, since simple streaming dimensionality reduction gives a good approximation of the Hessian.
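
Concretely, here is what I mean by "streaming" (a sketch, not anyone's actual code): expand each sample into its polynomial features on the fly and take one SGD step per sample, so the full design matrix never exists in memory. The degree, step size, and toy data stream are all placeholders.

    # Streaming polynomial least squares via SGD; the design matrix is never built.
    import numpy as np
    from itertools import combinations_with_replacement

    def poly_features(x, degree):
        """Expand one sample x (d,) into a bias plus all monomials up to `degree`."""
        feats = [1.0]
        for k in range(1, degree + 1):
            for idx in combinations_with_replacement(range(len(x)), k):
                feats.append(np.prod(x[list(idx)]))
        return np.array(feats)

    rng = np.random.default_rng(0)
    d, degree, lr = 10, 2, 1e-3
    theta = np.zeros(len(poly_features(np.zeros(d), degree)))

    # Simulated stream; in practice each row would come off disk one at a time.
    for t in range(100_000):
        x = rng.normal(size=d)
        y = x[0] * x[1] + 0.5 * x[2] + 0.1 * rng.normal()  # toy target
        z = poly_features(x, degree)
        theta -= lr * (z @ theta - y) * z                  # SGD step on squared loss

(A preconditioned variant would additionally maintain a low-rank sketch of Z^T Z to approximate the Hessian, but plain SGD already avoids the memory blow-up.)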

At the risk of sounding a bit harsh, this paper reads like someone recently found out about least squares and tried a few bases until they realized that simple things do pretty well on small problems compared to fancy models like NNs (which is something almost every researcher I work with already knows, though I don't work directly in ML).

---

[0] https://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.n... ; a shorter, much simpler exposition can be found in section 11.3 of https://web.stanford.edu/~boyd/vmls/ under "Random Features."


Indeed, there is the unfortunate fact that directly pointing out that a particular person has not read the paper is against HN's rules. Let us not violate the Prime Directive further.

If I understand the origin of this paper correctly, it is mainly the idea and work of an undergraduate student. From that perspective, several of its problems become more understandable. The limited computation in support of the paper may have been run on a setup as meager as a lone student's laptop, and the naïve view of machine learning as a field could be explained by undergraduate eagerness; indeed, perhaps they did "recently [find] out about least squares".

I agree with your thoughts on SGD. The authors did use PCA for some of their tests, but this entire setup should be buildable in a more efficient, incremental way (see the sketch below).
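
To make that concrete, a hedged sketch with off-the-shelf scikit-learn pieces (my choice of tools, not the authors'): IncrementalPCA and SGDRegressor both accept mini-batches via partial_fit, so neither the raw data nor the expanded polynomial features ever need to be in memory at once. The toy stream_batches generator below stands in for reading from disk.

    # Incremental PCA + out-of-core SGD; everything is processed in mini-batches.
    import numpy as np
    from sklearn.decomposition import IncrementalPCA
    from sklearn.linear_model import SGDRegressor
    from sklearn.preprocessing import PolynomialFeatures

    def stream_batches(n_batches=50, batch=200, d=30, seed=0):
        """Toy stand-in for streaming mini-batches from disk."""
        rng = np.random.default_rng(seed)
        for _ in range(n_batches):
            X = rng.normal(size=(batch, d))
            y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=batch)
            yield X, y

    ipca = IncrementalPCA(n_components=20)
    poly = PolynomialFeatures(degree=2)
    model = SGDRegressor(loss="squared_error")

    for X_batch, _ in stream_batches():          # pass 1: fit the PCA incrementally
        ipca.partial_fit(X_batch)

    for X_batch, y_batch in stream_batches():    # pass 2: project, expand, update
        Z = poly.fit_transform(ipca.transform(X_batch))
        model.partial_fit(Z, y_batch)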

I think, however, that the reason this paper attracted its three other co-authors, despite its controversial nature, is that the core of the original idea is interesting and has borne related fruit like UMAP [0] in the past. We should be more comfortable exploring it.

I like your book reference. It's informative and direct, without mumbo-jumbo. To save the next reader the search, start at p. 279.

[0] https://github.com/lmcinnes/umap


Can they get a polynomial regression to play Mario Kart from screenshots and button-press data?

https://www.kevinhughes.ca/blog/tensor-kart



