A lot of this is done in discrete math. You know, the actual probability is defined by some integral, but there's no closed-form solution to the integral, so we do sums to get an approximate answer. Anyone can understand sums. And since they're probabilities, the sums must equal one. Not that hard, right ;)
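To make that concrete, here's a rough Python sketch (the Gaussian density and the grid are just made up for illustration): approximate the integral with a sum over a grid, then normalize so the probabilities add up to one.

    import numpy as np

    # Pretend we can't integrate this density in closed form:
    # a standard Gaussian, evaluated on a discrete grid.
    xs = np.linspace(-5.0, 5.0, 1001)   # grid over (most of) the support
    dx = xs[1] - xs[0]                  # grid spacing

    density = np.exp(-xs**2 / 2.0)      # unnormalized density values

    # Riemann-sum approximation of the integral.
    total = np.sum(density) * dx        # should be close to sqrt(2*pi)

    # Turn the grid values into probabilities that sum to one.
    probs = density * dx / total
    print(probs.sum())                  # -> 1.0 (up to float error)
    print(total, np.sqrt(2 * np.pi))    # sum vs. the exact integral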
It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.
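For a taste of the edge cases a naive implementation misses, compare hand-rolled normalization against scipy's logsumexp (a standard log-space trick, nothing exotic):

    import numpy as np
    from scipy.special import logsumexp

    # Unnormalized log-probabilities with very negative values.
    log_w = np.array([-1000.0, -1001.0, -1002.0])

    # Naive: exponentiate first, then normalize. Every term
    # underflows to 0, so we compute 0/0 and get NaNs.
    w = np.exp(log_w)
    naive = w / w.sum()                 # -> [nan nan nan] plus a warning

    # Stable: normalize in log space, exponentiate last.
    stable = np.exp(log_w - logsumexp(log_w))
    print(naive, stable, stable.sum())  # stable sums to 1.0

This sort of thing comes up constantly (mixture models, naive Bayes, HMMs), which is exactly why you want someone else's battle-tested code.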
You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math-wise. All the heavy lifting is left to R.
I don't really know what you're looking for. If it's a replacement for that Coursera ML class in Python, then I don't think there really is one. The basic tenets of ML aren't going to change depending on your language, though.
Thanks a lot for this! I didn't realize probabilities would be so important. I've been working with conditional expectations (not sure if they're relevant in machine learning), but this was an eye-opener.
Another great introduction is the pair of descriptive and inferential statistics courses on Udacity!
Conditional expectations are an important part of regression, and they come up in other scenarios where you might want to adjust a parameter estimate ("for every unit increase in x we get this much of a difference in y") for confounders. Generally, in machine learning, parameter estimates are not the (exclusive) basis for prediction; instead you put data in and a prediction comes out, and what's in between is somewhat of a black box.
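To make the "per unit increase" reading concrete, here's a toy sketch with made-up data: ordinary least squares is an estimate of the conditional expectation E[y | x, confounder], and the fitted coefficient on x is the per-unit effect adjusted for the confounder.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    confounder = rng.normal(size=n)
    x = 0.5 * confounder + rng.normal(size=n)
    y = 2.0 * x + 3.0 * confounder + rng.normal(size=n)

    # Model E[y | x, confounder] as b0 + b1*x + b2*confounder.
    X = np.column_stack([np.ones(n), x, confounder])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)  # ~ [0, 2, 3]: b1 is the change in y per unit x,
                 # holding the confounder fixed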
Well, technically we do know what's in the black box of course, it's just that for many methods it's not easy to summarize because there's so much happening under the hood. Leo Breiman (who invented random forests) gives some examples of how to do it, though: https://projecteuclid.org/euclid.ss/1009213726
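One summary in that vein is variable importance: a single number per feature saying how much the forest leans on it. scikit-learn ships an impurity-based version for its forests, so a quick peek under the hood might look like this (synthetic data, parameters chosen arbitrarily):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic data where only 3 of the 8 features actually matter.
    X, y = make_regression(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)

    forest = RandomForestRegressor(n_estimators=200,
                                   random_state=0).fit(X, y)

    # Impurity-based variable importance: a crude but useful
    # summary of what the black box is actually using.
    for i, imp in enumerate(forest.feature_importances_):
        print(f"feature {i}: {imp:.3f}")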
Jump in, the water is fine!
[1] http://web.stanford.edu/~hastie/pub.htm