Probability and Statistics Cookbook (2011) [pdf] (wisc.edu)
241 points by cpp_frog on June 6, 2022 | hide | past | favorite | 22 comments



Note that the linked PDF is the 2011 version (as the title makes explicit), but the most recent version (0.2.7) is dated 2021 and is available at https://github.com/mavam/stat-cookbook/releases/download/0.2...

There are "release notes" pages listing the differences at https://github.com/mavam/stat-cookbook/releases


Nice, the original LaTeX code is there too.


Actually quite good.

I've TA'd this class but it's surprising how many of these little facts can be helpful if you recall them at the right time. I was just reminded of:

  Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]
and it unstuck me from a little puzzle.
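For anyone who wants to see it in action, here's a quick simulation check of the decomposition, under a toy model of my own choosing (X ~ Bernoulli(0.3), and Y|X normal with mean 2X and standard deviation 1+X):

```python
import numpy as np

# Check Var[Y] = E[Var[Y|X]] + Var[E[Y|X]] numerically.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.binomial(1, 0.3, size=n)
y = rng.normal(2 * x, 1 + x)

# Left-hand side: the marginal variance of Y, from simulation.
lhs = y.var()

# Right-hand side, computed analytically from the model:
#   E[Var[Y|X]] = 0.7 * 1^2 + 0.3 * 2^2 = 1.9
#   Var[E[Y|X]] = Var(2X) = 4 * 0.3 * 0.7 = 0.84
rhs = 0.7 * 1**2 + 0.3 * 2**2 + 4 * 0.3 * 0.7

print(lhs, rhs)  # both close to 2.74
```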

More recent version at: http://statistics.zone


There is a professor who was at Wisconsin, Charles Manski, who developed partial identification, which uses tons of these decompositions.

The idea: say you have a binary survey question where 80% of people respond and 90% of the respondents answer "yes". What can we say about the population "yes" rate (assume the sample size is huge, for simplicity)?

  P(Yes) = P(Yes | response) * P(response) + P(Yes | no response) * P(no response)
         = 0.9 * 0.8 + P(Yes | no response) * 0.2
         = 0.72 + P(Yes | no response) * 0.2

Then 0 <= P(Yes | no response) <= 1, so 0.72 <= P(Yes) <= 0.92. This example is somewhat trivial but it's a useful technique for showing exactly how your assumptions map to inferences.
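In code, the bound is just a couple of lines (using the numbers from the example above):

```python
# Bound P(Yes) by assuming non-responders are all "no" or all "yes".
p_r = 0.8   # response rate
p_y = 0.9   # "yes" rate among responders
lower = p_y * p_r + 0.0 * (1 - p_r)  # P(Yes | no response) = 0
upper = p_y * p_r + 1.0 * (1 - p_r)  # P(Yes | no response) = 1
print(lower, upper)  # ≈ 0.72, ≈ 0.92
```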


For those who are wondering why this equation is true:

https://en.wikipedia.org/wiki/Law_of_total_variance


It's confusing to me how Var[Y] can be computed only from terms that are conditional on X ... surely its marginal variance has components arising from variation outside the distribution conditional on X? (Or is that a typo?)


You can consider the tower rule for intuition on why this can be true:

    E[X] = E[E[X|Y]]
Essentially, what's happening is that, in E[X|Y], we first "hold" the randomness of Y, then consider the expected value of X for every possible value of Y. Once this is determined, we then take the expectation over all values of Y, which integrates out its effects, leaving us with the expected value of X.

The law of total variance is similar, and can be derived from the tower rule above.
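A quick simulation makes the "integrating out" step concrete (a toy setup of my own: Y uniform on {0, 1, 2}, and X|Y ~ Poisson(Y+1), so E[X|Y] = Y + 1):

```python
import numpy as np

# Check the tower rule E[X] = E[E[X|Y]] by simulation.
rng = np.random.default_rng(1)
n = 1_000_000
y = rng.integers(0, 3, size=n)
x = rng.poisson(y + 1)

# Direct expectation of X.
direct = x.mean()

# Inner expectation first: E[X|Y=y] = y + 1 for each y, then
# average over the draws of Y, which integrates out Y's effect.
inner = np.array([1.0, 2.0, 3.0])  # E[X|Y] for y = 0, 1, 2
tower = inner[y].mean()

print(direct, tower)  # both close to 2.0
```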


thanks! that kind of does make sense!


Similar to the law of total expectation; it's easier for me to think of it as partitioning a weighted mean :)


Another set of notes that I refer to often are from ECE 830, also at UW Madison[0]. It was a great class that really represented the culmination of all the probability theory and signals classes I had taken over the years.

[0] https://nowak.ece.wisc.edu/ece830/index.html


I would call this a cheat-sheet rather than a cookbook.


Agree. I typed up something similar (but less detailed) as a reference during my undergrad stats major. I'd expect a cookbook to have worked-out examples of applications of these topics, but it still looks very useful as a reference.


There is a book titled “All of Statistics” if you’d like a whirlwind tour.



This is a nice collection of definitions and key results, but it's not a cookbook. I think of a cookbook as a collection of useful, focused examples, demonstrating best practices, and listing caveats.


I don't know math at all. I'd love a programmer version of this, with all the algorithms in code. Probably already exists in NumPy or something.


If you're interested in probability and statistics it's well worth your time to get more comfortable with the math.

There's a common mistake in thinking among programmers that there's a one-to-one mapping between math and code and that mathematical notation is just annoyingly terse shorthand.

As someone who spends a lot of time implementing mathematical ideas in code, I can tell you this is not remotely true. Mathematics deals with a level of abstraction and thinking that is fundamentally distinct from the computational implementation of those ideas.

A clear example of this is the Gamma function, which appears all over those notes. It's an essential function for working deeply with statistics; you'll find it shows up just about everywhere if you look carefully enough. You can manipulate it mathematically to solve a range of problems.

However, if you want to implement it from scratch in code, that is, to understand how to compute the Gamma function, you're going to have to spend a lot of time studying numerical methods if you want to do more than robotically copy it from Numerical Recipes.
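To give a flavor of what's involved, here's a sketch of the Lanczos approximation (the method Numerical Recipes popularized), using a widely published coefficient set for g = 7. This is illustrative, not production-grade:

```python
import math

# Lanczos approximation coefficients for g = 7, n = 9.
LANCZOS_G = 7
LANCZOS_COEF = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
]

def gamma(z: float) -> float:
    if z < 0.5:
        # Reflection formula extends the approximation to z < 0.5.
        return math.pi / (math.sin(math.pi * z) * gamma(1 - z))
    z -= 1
    a = LANCZOS_COEF[0]
    t = z + LANCZOS_G + 0.5
    for i in range(1, len(LANCZOS_COEF)):
        a += LANCZOS_COEF[i] / (z + i)
    return math.sqrt(2 * math.pi) * t ** (z + 0.5) * math.exp(-t) * a

print(gamma(5.0))     # ≈ 24, since Gamma(n) = (n-1)!
print(gamma(0.5)**2)  # ≈ pi
```

Even this "easy" route hides real numerical analysis: where the coefficients come from, the error bounds, and when to switch to log-space to avoid overflow.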

Similarly many of the integrals used in statistics can end up quite difficult to compute, but that difficulty doesn't impact their ease of use in a mathematical context. This is a common theme when working with applied math: you can do quite a lot of mathematical work on problems that you don't necessarily know how to compute yet. Once you solve your problem mathematically, then you can go on to solving how to actually compute the answer.


To add to this comment, it's hard to make any useful program if you don't have at least a clear conceptual understanding of what you're trying to do.

For example, perhaps you are trying to calculate a variance. But do you have a set of raw data from which you will estimate the variance? Or some summary statistics? Or do you already have a probability distribution from which you will compute the variance? How is it represented? Is that the probability distribution over the particular variable you want the variance for, or is it a related variable that needs to be transformed first somehow?

You don't necessarily need to know how to handle all the math by hand, but there's no avoiding the need for at least a clear idea of what you're doing and what the sticking points might be.
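A toy illustration of that distinction in NumPy (the numbers are made up): the same word, "variance," maps to different computations depending on what you actually have in hand.

```python
import numpy as np

# Case 1: raw data -> *estimate* the variance. Note ddof=1 gives the
# unbiased sample estimator; ddof=0 gives the population formula.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
sample_var = data.var(ddof=1)

# Case 2: a known discrete distribution -> *compute* the variance exactly.
values = np.array([0.0, 1.0, 2.0])
probs = np.array([0.5, 0.3, 0.2])
mean = (values * probs).sum()
dist_var = ((values - mean) ** 2 * probs).sum()

print(sample_var, dist_var)
```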


> it's hard to make any useful program if you don't have at least a clear conceptual understanding of what you're trying to do.

Completely - I've run into the exact issues you were talking about. Trying to really understand ML without some foundation in stats doesn't work well.

I was wondering about code examples, not as a copy/paste solution à la Stack Overflow, but to use as a Rosetta stone to aid in learning. I've seen something similar for physics equations and chunks of the quantum wave equation, and they were pretty enlightening. "Oh, that's what that means..."

I can follow the logic of a chunk of C pretty well, especially with decent variable names, but without some sort of crutch I would probably need a long time to catch up on all the math, from both a syntax and a conceptual perspective.


I realized this the hard way while optimizing signal-processing code involving integrals over infinite spaces (a great example of this). It was game over for my level of skill: I couldn't even optimize the code correctly without first writing the simplest optimized examples possible and showing them to a physicist. It was as if I understood, or felt, the way to simplify the math, but lacked the skill to actually do it. The physicist quickly saw how to rewrite the math in a simpler form, and the final result was code up to 30x faster in the worst cases.


I saw a page once that was something like "7 Statistics Rules of Thumb for Programmers" that included some neat stats like

* To get N trading cards which each have 1/N probability, buy something like either 2.7N or 3N cards

Can never find it though.
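That sounds like the coupon-collector problem: the expected number of purchases to see all N equally likely cards is N*H_N (where H_N = 1 + 1/2 + ... + 1/N), which works out to about 2.9N for N = 10, so the "roughly 3N" rule of thumb fits for deck sizes in that range. A quick sanity-check simulation:

```python
import random

def draws_to_collect(n: int, rng: random.Random) -> int:
    """Count draws until all n distinct cards have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(42)
n = 10
trials = [draws_to_collect(n, rng) for _ in range(20000)]
avg = sum(trials) / len(trials)
expected = n * sum(1 / k for k in range(1, n + 1))  # N * H_N ≈ 29.29
print(avg, expected)
```

(For much larger N the N*H_N ≈ N ln N growth pulls away from any constant multiple, so the rule of thumb is deck-size dependent.)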


This actually seems pretty good and has great coverage!



