I haven't read the paper, but how can that be true when the polynomials produced by polynomial regression always diverge at the "edges" (as x goes far enough to the left or right) to positive or negative infinity (unless the polynomial is a constant), while DNNs can be built with a sigmoid (or Gaussian) function at the output, which doesn't have that problem since it constrains the output to a bounded range?

Polynomials (without "activation functions" at the end) make for poor classifiers.

(This is a practical problem I've actually had when trying to replace DNNs with polynomial regression.)
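
To make this concrete, here's a rough toy sketch of my own (not from the paper): fit an ordinary least-squares polynomial to data drawn from a logistic curve, then evaluate it outside the training interval. The degree (9) and interval ([-3, 3]) are arbitrary choices; the point is just that the polynomial blows up where the sigmoid saturates.

    # Toy sketch (my own example): a polynomial fit to sigmoid-shaped data
    # diverges outside the training range, while the sigmoid stays in (0, 1).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Training data: a noiseless logistic curve on [-3, 3]
    x_train = np.linspace(-3, 3, 200)
    y_train = sigmoid(2.0 * x_train)

    # Degree-9 least-squares polynomial fit (degree chosen arbitrarily)
    poly = np.poly1d(np.polyfit(x_train, y_train, deg=9))

    # Evaluate inside and far outside the training interval
    for x in (0.0, 3.0, 10.0, 100.0):
        print(f"x={x:6.1f}  sigmoid={sigmoid(2.0 * x):.4f}  poly={poly(x):.4g}")
    # The sigmoid saturates near 1; the polynomial output grows without bound.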


It's not true. They make their main claim as follows: "For general activation functions and implementations, we can at least say that the function is at close to a polynomial, by appealing to the famous Stone-Weierstrass Theorem [Katznelson and Rudin(1961)], which states that any continuous function on a compact set can be approximated uniformly by polynomials.5 In any event, for practical purposes here, we see that most activation functions can be approximated by a polynomial. Then apply the same argument as above, which implies"

Note that this holds only on a compact set, i.e. a closed and bounded set, so the approximation is not valid out to infinity, only on some bounded, closed region.
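
To see what that restriction means in practice, here's a quick numpy sketch of my own (the interval and degree are arbitrary): a Chebyshev least-squares fit to tanh on [-4, 4] approximates it closely there, but the error grows without bound once you leave that interval.

    # Sketch (my own example): Stone-Weierstrass only helps on the compact
    # set you fit on; the polynomial approximation falls apart outside it.
    import numpy as np
    from numpy.polynomial import Chebyshev

    x_fit = np.linspace(-4, 4, 500)
    cheb = Chebyshev.fit(x_fit, np.tanh(x_fit), deg=15)

    inside = np.linspace(-4, 4, 1000)
    outside = np.linspace(6, 12, 1000)
    print("max error on [-4, 4]:", np.max(np.abs(cheb(inside) - np.tanh(inside))))
    print("max error on [6, 12]:", np.max(np.abs(cheb(outside) - np.tanh(outside))))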


OK, but then that makes this a purely theoretical paper about something that's already pretty well known. Almost all the interesting problems DNNs are solving nowadays are some kind of classification problem (or have finite, discrete outputs), which polynomials are really unsuited for (unless they have infinitely many terms).
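
As a toy illustration of that point (my own example; the degree and setup are arbitrary): fit a finite-degree polynomial directly to hard 0/1 labels of a one-dimensional threshold problem. The fit oscillates around the step and leaves [0, 1], which is exactly what a sigmoid output avoids.

    # Sketch (my own example): a finite-degree polynomial fit to hard 0/1
    # labels overshoots the step and produces values outside [0, 1].
    import numpy as np
    from numpy.polynomial import Polynomial

    x = np.linspace(-1, 1, 400)
    labels = (x > 0).astype(float)   # step-function targets

    poly = Polynomial.fit(x, labels, deg=15)
    pred = poly(x)

    print("min poly output:", pred.min())   # dips below 0
    print("max poly output:", pred.max())   # overshoots above 1
    print("fraction outside [0, 1]:", np.mean((pred < 0) | (pred > 1)))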
