> I'm wondering what these "solid reasons" could be?

Under the classic assumptions for regression, a linear function exists and our data have additive errors with mean zero, constant variance, and a Gaussian distribution.

Then we can get an F ratio test on the regression, t-tests on the coefficients, and confidence intervals on the predicted values. This is just the standard stuff in the classic derivations in several of the books I listed.
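As a sketch of what those tests look like in practice, here is ordinary least squares with a t statistic on the slope, using only the Python standard library. The data are made-up toy numbers, not from any real problem:

```python
import math

# Toy data; the numbers are hypothetical and assumed to follow
# y = a + b*x + Gaussian noise with mean zero and constant variance.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Ordinary least squares estimates of slope b and intercept a.
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b = sxy / sxx
a = ybar - b * xbar

# Residual variance estimate with n - 2 degrees of freedom.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s2 = sum(r * r for r in residuals) / (n - 2)

# Standard error and t statistic for the slope; under the assumptions
# above, t follows a Student t distribution with n - 2 degrees of freedom.
se_b = math.sqrt(s2 / sxx)
t_b = b / se_b
print(f"slope = {b:.3f}, se = {se_b:.3f}, t = {t_b:.1f}")
```

A large t here lets us reject the hypothesis of zero slope; a statistics package would report the same numbers along with the F ratio and confidence intervals.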

And with some weaker assumptions, there are still some such results we can get.

This is just for regression.

There's a lot more in applied statistics, that is, where we make some assumptions that seem accurate enough in practice, get some theorems, and benefit from the conclusions of the theorems.

For an example, also in this thread I wrote about estimating how long some submarines would last. The assumptions were explicit and, on a nice day, somewhat credible. If you swallow the assumptions, then you have to take the conclusions seriously.

I've done other problems in applied statistics -- have a problem, collect some data, see what assumptions we can make; from the assumptions we have some theorems, and from the theorems some conclusions powerful for the problem. If you swallow the assumptions, trust the software, ..., then you have to take the conclusions seriously. So, we get to check the software and argue about the assumptions.

For machine learning as in the Bloomberg course, we have some training data and some test/validation data. We fit to the training data and check with the test data. Assume that the real-world situation doesn't change, apply the model, and smile on the way to the bank.
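That fit/check workflow can be sketched in a few lines. The data, the relationship y = 3x + noise, and the one-parameter model are all hypothetical, chosen only to make the split-fit-check pattern concrete:

```python
import random

random.seed(0)

# Hypothetical data from an assumed relationship y = 3x + Gaussian noise.
xs = [i / 10 for i in range(100)]
data = [(x, 3.0 * x + random.gauss(0.0, 0.5)) for x in xs]

# Split into a training set and a held-out test set.
random.shuffle(data)
train, test = data[:70], data[70:]

# Fit a one-parameter model y = b*x by least squares on the training set only.
b = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Check on the test set; if this error looks acceptable and the real-world
# situation doesn't change, apply the model.
mse = sum((y - b * x) ** 2 for x, y in test) / len(test)
print(f"fitted b = {b:.2f}, test MSE = {mse:.3f}")
```

The test error, not the training error, is what we report -- and the whole scheme rests on the assumption that future data look like the data we held out.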

I guess that's okay when it works. But:

(A) How many valuable real applications are there? I.e., regression has been around with solid software for decades, and I've yet to see the yachts of the regression practitioners -- maybe there are some now, but what is new might mostly be hype. Or was the problem that the SAS founders were not very good at sales? Same for SPSS (now pushed by IBM), Matlab, R, etc.?

(B) Wouldn't we also want confidence intervals on the predictions? Okay, maybe there are some resampling/bootstrap ways to get those.

(C) There's more to applied statistics than empirical curve fitting, and commonly there we can have "solid reasons". For more on applied statistics, what I've done is only a drop in the ocean, but the research libraries are awash in examples.
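On point (B), the resampling idea can be sketched as a naive pairs bootstrap giving a percentile interval for a predicted value. Everything here is hypothetical -- the data, the no-intercept model, and the choice of percentile interval over fancier bootstrap variants:

```python
import random

random.seed(1)

# Hypothetical sample: y is roughly 2x plus Gaussian noise.
data = [(x, 2.0 * x + random.gauss(0.0, 1.0)) for x in range(1, 21)]

def fit_slope(pairs):
    """Least-squares slope for the no-intercept model y = b*x."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, y in pairs)

# Bootstrap: resample the (x, y) pairs with replacement, refit,
# and collect the predicted value at x_new each time.
x_new = 10.0
preds = []
for _ in range(2000):
    boot = [random.choice(data) for _ in data]
    preds.append(fit_slope(boot) * x_new)

# Naive 95% percentile interval for the prediction at x_new.
preds.sort()
lo, hi = preds[int(0.025 * len(preds))], preds[int(0.975 * len(preds))]
print(f"bootstrap 95% interval at x = {x_new}: ({lo:.2f}, {hi:.2f})")
```

Unlike the classical intervals from the Gaussian-error theory, this buys its generality by giving up the distributional theorems -- which is rather the point of the comparison above.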



