Charles Manski, a professor formerly at Wisconsin, developed partial identification, which uses tons of these decompositions.
The idea: say you have a binary survey question where 80% respond and 90% of respondents answer "yes". What can we say about the population "yes" rate (assume the sample size is huge, for simplicity)?
By the law of total probability, P(Yes) = P(Yes | response)·0.8 + P(Yes | no response)·0.2 = 0.72 + 0.2·P(Yes | no response). Since 0 <= P(Yes | no response) <= 1, we get 0.72 <= P(Yes) <= 0.92. This example is somewhat trivial, but it's a useful technique for showing exactly how your assumptions map to your inferences.
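A few lines of code make the worst-case bounding explicit (variable names here are made up for illustration):

```python
# Worst-case bounds for the survey example, via the law of total probability.
p_respond = 0.8            # 80% answer the question
p_yes_given_respond = 0.9  # 90% of respondents say "yes"

# P(Yes | no response) is unknown, so let it range over [0, 1].
lower = p_yes_given_respond * p_respond + 0.0 * (1 - p_respond)
upper = p_yes_given_respond * p_respond + 1.0 * (1 - p_respond)
print(lower, upper)  # 0.72 0.92
```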
It's confusing to me how Var[Y] can be computed only from terms that are conditional on X... surely its marginal variance has components arising from variation outside its conditional distribution given X? (Or is that a typo?)
You can consider the tower rule for intuition on why this can be true:
E[X] = E[E[X|Y]]
Essentially, what's happening is that in E[X|Y] we first "hold" Y fixed and consider the expected value of X for each possible value of Y. Once this is determined, we take the expectation over all values of Y, which integrates out its effect, leaving the unconditional expected value of X.
The law of total variance is similar and arises from the tower rule above: Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]. The first term is the average variance within each conditional distribution; the second is the variance of the conditional means themselves, which is exactly the "outside" component you're asking about. Both terms are built from conditional quantities, yet together they recover the marginal variance.
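A quick Monte Carlo check makes this concrete. Here's a minimal sketch using a made-up hierarchical model (X ~ Normal(0, 1), then Y | X ~ Normal(X, 2^2)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)  # X ~ Normal(0, 1)
y = rng.normal(x, 2.0)                    # Y | X ~ Normal(X, 2^2)

# For this model: Var[Y|X] = 4 for every X, so E[Var[Y|X]] = 4,
# and E[Y|X] = X, so Var[E[Y|X]] = Var[X] = 1.
print(y.var())  # ~5.0, matching E[Var[Y|X]] + Var[E[Y|X]] = 4 + 1
```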
Another set of notes that I refer to often are from ECE 830, also at UW Madison[0]. It was a great class that really represented the culmination of all the probability theory and signals classes I had taken over the years.
Agree; I typed up something similar (but less detailed) as a reference during my undergrad stats major. I'd expect a cookbook to have worked-out examples of applications of these topics. But it still looks very useful as a reference.
This is a nice collection of definitions and key results, but it's not a cookbook. I think of a cookbook as a collection of useful, focused examples, demonstrating best practices, and listing caveats.
If you're interested in probability and statistics it's well worth your time to get more comfortable with the math.
There's a common mistake in thinking among programmers: that there's a one-to-one mapping between math and code, and that mathematical notation is just an annoyingly terse shorthand.
As someone who spends a lot of time implementing mathematical ideas in code, I can tell you this is not remotely true. Mathematics operates at a level of abstraction and thinking that is fundamentally distinct from the computational implementation of those ideas.
A clear example of this is the Gamma function, which appears all over those notes. It's an essential function for working deeply with statistics; you'll find it shows up just about everywhere if you look carefully enough. You can manipulate it mathematically to solve a range of problems.
However, if you want to implement it from scratch in code, that is, to understand how to compute the Gamma function, you're going to have to spend a lot of time studying numerical methods if you want to do more than robotically copy it from Numerical Recipes.
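To illustrate the gap, here's a minimal sketch of the Lanczos approximation (g = 7, with the coefficients commonly tabulated, e.g., on Wikipedia). It's illustrative, not production-grade; real libraries do far more work on edge cases and precision:

```python
import math

# Standard Lanczos coefficients for g = 7, n = 9.
_LANCZOS = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
]

def gamma(x: float) -> float:
    if x < 0.5:
        # Reflection formula extends the approximation to x < 0.5.
        return math.pi / (math.sin(math.pi * x) * gamma(1.0 - x))
    x -= 1.0
    a = _LANCZOS[0] + sum(c / (x + i) for i, c in enumerate(_LANCZOS[1:], 1))
    t = x + 7.5  # x + g + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (x + 0.5) * math.exp(-t) * a

print(gamma(5.0))  # ~24.0 = 4!
print(gamma(0.5))  # ~1.7724539 = sqrt(pi)
```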
Similarly, many of the integrals used in statistics can be quite difficult to compute, but that difficulty doesn't impact their ease of use in a mathematical context. This is a common theme in applied math: you can do quite a lot of mathematical work on problems you don't yet know how to compute. Once you solve your problem mathematically, you can then move on to figuring out how to actually compute the answer.
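A small example of that workflow, assuming SciPy is available: the Gaussian normalizing constant is trivial to use symbolically (it's just sqrt(2*pi)), while a machine can only ever confirm it numerically:

```python
import math
from scipy.integrate import quad

# integral of exp(-x^2 / 2) over the whole real line, done numerically
numeric, _err = quad(lambda x: math.exp(-x * x / 2.0), -math.inf, math.inf)
print(numeric, math.sqrt(2.0 * math.pi))  # both ~2.5066
```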
To add to this comment, it's hard to make any useful program if you don't have at least a clear conceptual understanding of what you're trying to do.
For example, perhaps you are trying to calculate a variance. But do you have a set of raw data from which you will estimate the variance? Or some summary statistics? Or do you already have a probability distribution from which you will compute the variance? How is it represented? Is it the probability distribution over the particular variable you want the variance for, or is it over a related variable that needs to be transformed first somehow?
You don't necessarily need to know how to handle all the math by hand, but there's no avoiding the need for at least a clear idea of what you're doing and what the sticking points might be.
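As a toy sketch of those different situations (all names here are made up for illustration):

```python
import numpy as np
from scipy import stats

# 1. Raw data: estimate the variance from a sample
#    (ddof=1 gives the unbiased sample variance).
data = np.array([2.1, 1.9, 3.4, 2.8, 2.2])
print(data.var(ddof=1))

# 2. A known distribution: the variance is a property of the
#    distribution itself; no data involved.
print(stats.gamma(a=3.0, scale=2.0).var())  # shape * scale^2 = 12.0

# 3. A transformed variable: Var[2X + 1] = 4 Var[X], so the distribution
#    you hold may need transforming before it gives the variance you want.
```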
> it's hard to make any useful program if you don't have at least a clear conceptual understanding of what you're trying to do.
Completely - I've run into the exact issues you were talking about. Trying to really understand ML without some foundation in stats doesn't work well.
I was wondering about code examples, not as copy/paste solutions à la Stack Overflow, but to use as a Rosetta stone to aid in learning. I've seen something similar for physics equations and chunks of the quantum wave equation, and they were pretty enlightening: "Oh, that's what that means..."
I can follow the logic of a chunk of C pretty well, especially with decent variable names, but without some sort of crutch I would probably need a long time to catch up on all the math, from both a syntax and a conceptual perspective.
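Something like the following, perhaps; a made-up example pairing the notation E[X] = Σ x·p(x) with the loop it denotes:

```python
# Discrete expectation, E[X] = sum over x of x * p(x), for a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

expectation = sum(x * p for x, p in zip(outcomes, probs))
print(expectation)  # 3.5
```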
I learned this the hard way while optimizing signal-processing code involving integrals over infinite spaces (what a great example). It was game over for my level of skill: I couldn't even optimize the code correctly without first writing the simplest optimized examples possible and showing them to a physicist. It was as if I understood, or felt, how the math could be simplified but lacked the skill to actually do it. The physicist quickly saw how to rewrite the math in a simpler form, and the final result was code 30x faster in the worst cases.
But the most recent version (0.2.7) is dated 2021, and is available at https://github.com/mavam/stat-cookbook/releases/download/0.2...
There are "release notes" pages with the differences between versions (at https://github.com/mavam/stat-cookbook/releases).