I am under the impression that to learn statistics one must first have a working knowledge of probability theory, which in turn rests on grad-level mathematical analysis. Can machine learning be studied without any of that?
A lot of this is done in discrete math. You know, the actual probability is defined by this integral, but there is no closed-form solution to the integral, so we do sums to find an approximate answer. Anyone can understand sums. And, since they're probabilities, the sums must equal one. Not that hard, right ;)
It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.
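To make the "sums instead of integrals" idea concrete, here's a minimal sketch: approximate a continuous density on a discrete grid, then normalize so the probabilities sum to one. (The Gaussian and the grid bounds are just illustrative choices on my part.)

    # Discretize a density, normalize, and compute expectations as plain sums.
    import numpy as np

    xs = np.linspace(-5, 5, 1001)              # discrete grid over the support
    unnormalized = np.exp(-xs**2 / 2)          # Gaussian density, up to a constant
    probs = unnormalized / unnormalized.sum()  # normalize: probs.sum() == 1

    # Expectations become plain sums over the grid.
    mean = (xs * probs).sum()                  # ~0 for a standard Gaussian
    second_moment = (xs**2 * probs).sum()      # ~1
    print(mean, second_moment)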
You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math-wise. All the heavy lifting is left to R.
I don't really know what you're looking for. If it's a replacement for that Coursera ML class in Python, then I don't think there really is one. The basic tenets of ML aren't going to change depending on your language, though.
Thanks a lot for this! I didn't realize probabilities would be so important. I've been working with conditional expectations (not sure if they're relevant in machine learning), so this was an eye-opener.
Another great introduction is the pair of descriptive and inferential statistics courses on Udacity!
Conditional expectations are an important part of regression, and they come up in other scenarios where you might want to adjust a parameter estimate ("for every unit increase in x we get this much of a difference in y") for confounders. Generally, in machine learning, parameter estimates are not the (exclusive) basis for prediction; instead, you put data in and a prediction comes out, and what's in between is somewhat of a black box.
Well, technically we do know what's in the black box of course, it's just that for many methods it's not easy to summarize because there's so much happening under the hood. Leo Breiman (who invented random forests) gives some examples of how to do it, though: https://projecteuclid.org/euclid.ss/1009213726
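If it helps, here's a minimal sketch of the "unit increase in x" reading: a linear regression coefficient is exactly the estimated slope of the conditional expectation E[y | x]. (The synthetic data and scikit-learn are my choices, purely for illustration.)

    # The fitted coefficient estimates the change in y per unit increase in x.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=(500, 1))
    y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1, size=500)  # true slope: 3

    model = LinearRegression().fit(x, y)
    print(model.coef_[0])  # ~3.0: estimated change in y per unit increase in x

Adding more columns to x is the "adjusting for confounders" version: each coefficient is then the change in y per unit of that feature, holding the others fixed.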
Like most fields, it depends on your definition of "studied." If you want to push the envelope in theoretical non-applied research, you're going to want to learn analysis & measure theoretic probability theory. If you want to apply existing techniques, read (well-written) papers and code up the algorithms you find there, you can get away with undergraduate-level linear algebra & probability knowledge - Bayes' rule, expectations, independence, the general ability to think about random variables (and matrices thereof) as values that can be transformed and combined. And of course, you can fire up a classifier in SciPy without knowing any of this at all. But that's stretching the definition of "studied" quite a bit!
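For instance, "firing up a classifier" can be this short. (The parent says SciPy; in practice this usually means scikit-learn, which is what I'm sketching here - the dataset and model are arbitrary choices.)

    # Train and score a stock classifier in a few lines.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # accuracy on held-out data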
I personally went into a graduate-level probabilistic machine learning course with probability knowledge consisting of an undergraduate course that followed Ross http://www.amazon.com/Introduction-Probability-Models-Tenth-... - so there's certainly no need to have been a math major. But if you've never dealt with random variables whatsoever, you'll hit a wall following research from the last 20 years.
There is applied machine learning (using machine learning to solve business problems) and theoretical machine learning (optimization bounds, proofs, algorithm design).
With applied machine learning it is certainly possible to quickly get a working knowledge without too much reliance on statistics or difficult theory. You can compare it to using a sorting function without knowing exactly how it works (but you know how fast it is and when to use it).
If you have an engineering background, take a look at the wide array of high-quality ML code and tools. Study trendy and powerful tools like XGBoost.
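As a taste of how little theory you need to get started, here's a minimal XGBoost sketch via its scikit-learn-style wrapper. (The synthetic data and hyperparameters are illustrative choices, not recommendations.)

    # Fit a gradient-boosted tree classifier and score it on held-out data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))  # held-out accuracy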
What do you mean by grad-level math analysis? Much of probability theory can be learned with basic multivariate calculus. (Perhaps there's a terminology misunderstanding here - when I see "grad level" I think "grad school," i.e., masters/PhD.) Certainly basic probability theory is a plus.
I agree with many of the responses here that mathematical analysis (epsilon-delta proofs, continuity, etc.) is not strictly necessary for statistics. But...it certainly will help.
The problem with dumping the measure-theoretic probability is that you won't really know what a random variable is. It has a definition (a measurable function into the reals), and without that, you will have a tendency to think of it as "a box that produces something random when you look into it". This will limit your ability to understand papers, and will make you insecure in talking to people.
Besides "random variable", other common notions will also be hard to understand without measure-theoretic probability, like "almost surely", convergence concepts, the difference between the SLLN and WLLN, etc.
The problem with dumping analysis is that you will not know some basic things, like what a continuous function is. What does "everywhere continuous" mean? What is a C1 function? And again, you will have a hard time reading and speaking.
For what it's worth, I found analysis to be not that fun, but measure-theoretic probability to be really a fun, tight, theory. It was enjoyable to learn.
Measure theory being necessary to statistics is rather contentious; a better discussion is on Andrew Gelman's blog [1].
My school's PhD stats program does require real analysis before the prelims, but for most intents and purposes, 'multi' and 'linal' (as the cool kids say) should be sufficient for machine learning from a comp sci perspective.
I haven't fully worked through ESLR (Hastie and Tibshirani's advanced version of ISLR, posted above), but the majority of the math there is linear algebra with some differential equations and calculus thrown in. I've heard Harvard Stat 210 and Berkeley Stat 205A/B cited as good examples of mathematical stat classes - if you're seriously interested, maybe take a look at those syllabi.
Elementary (i.e., math-speak for "undergrad-level") probability theory is quite accessible to someone with only a computer scientist's math classes. You really don't need real analysis until you start reading research papers on probability and they drop down into measure theory for this-and-that.
I would say you can certainly get away with applying machine learning techniques without knowledge of probability theory, but if you want to do stuff like compare models, compare results, determine the accuracy of your model, etc., you are going to quickly have to dive into basic statistics (Bayesian + frequentist).
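The "compare models" step in its simplest form might look like this - cross-validated scores give you a distribution to reason about, not just one number. (Models and synthetic data here are arbitrary choices of mine.)

    # Compare two classifiers via 5-fold cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)

    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(random_state=0)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, scores.mean(), scores.std())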
Some nice courses there; also check out Daniel Cremers' lectures on variational methods for computer vision if you're interested in that sort of thing. There's also a nice series on computer vision for special effects.
For linear algebra, I like the "No BS guide to linear algebra" (https://gumroad.com/l/noBSLA) which also includes a high school math refresher for people who need it (I did).
For probability, "Probability Demystified" is a good basic intro.
For statistics, I would really recommend Allen Downey's Think Stats (http://greenteapress.com/thinkstats2/index.html), especially if you're coming from a programming background. Most introductions to statistics focus heavily on the mathematics needed to enable certain analytical approximations to difficult probabilistic calculations (e.g. the t-test), whereas Think Stats just bites that bullet and focuses on simulation / brute force so you can spend more time on the actual fundamental theory behind statistics.
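To illustrate that simulation-first style: a permutation test can stand in for a t-test with no analytical machinery at all. (This is my own sketch of the approach, not code from the book; the data is synthetic.)

    # Shuffle the group labels many times and ask how often the difference in
    # means is at least as large as the one observed.
    import numpy as np

    rng = np.random.default_rng(0)
    group_a = rng.normal(0.0, 1.0, size=100)
    group_b = rng.normal(0.3, 1.0, size=100)

    observed = abs(group_a.mean() - group_b.mean())
    pooled = np.concatenate([group_a, group_b])

    count = 0
    n_trials = 10_000
    for _ in range(n_trials):
        rng.shuffle(pooled)
        diff = abs(pooled[:100].mean() - pooled[100:].mean())
        if diff >= observed:
            count += 1

    print(count / n_trials)  # simulated p-value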
Depends on how basic you're imagining. Khan Academy [0] is a fairly well-regarded free resource for high-school and undergraduate level mathematics video lectures. They have probability and statistics as well as linear algebra courses.
If you prefer textbooks, I have heard good things about "Linear Algebra Done Right," [1] but I would not recommend it unless you are "math literate" at an undergraduate level already.
For linear algebra, check out Prof. Gilbert Strang's course on MIT OCW. He's great at explaining the material and the course resources are comprehensive.
The videos on computervisiontalks.com are exactly the same as the videos on YouTube, because the site pulls them from YouTube. The post points to the spring 2015 lectures; you are pointing to earlier lectures from 2013 and 2014.
I like his teaching style, but it seems some of the lecture videos (1.3, for example) are cut off - very frustrating! For anyone watching nonetheless, I recommend going into YouTube and changing the speed to 1.5x.
The lectures on computervisiontalks are taken directly from YouTube (with tags, navigation, bookmarking, and in-video search added). Lecture 1.3 (for the spring 2015 class) is exactly the same length. However, the lectures for the 2013 machine learning class (also by Alex Smola) are available on YouTube, and those are a different length.
I like Smola's ML book, and it's great to see a full-depth ML course online; I'll certainly watch some videos. My only complaint is that the audio quality could be better.
I took the course (Alex Smola's 10-701, Spring '15) in the classroom. Personally I don't like his lecture style -- too vague, too many assumptions, too much reliance on jargon he hasn't already explained. YMMV.