That is precisely the kind of content I expected a cheat sheet on Probability to contain. No offence to OP, but if you need a cheat sheet for basic concepts like the ones included in the original post, then you'll find it really hard to grasp any "useful" Probability concepts.
>No offence to OP, but if you need a cheat sheet for basic concepts like the ones included in the original post, then you'll find it really hard to grasp any "useful" Probability concepts.
When I took stats classes at UC Berkeley, we covered a lot of ground quickly. The courses (STAT 200A and 200B) were breadth-first in the sense that you got only a short amount of exposure to each concept. But since many concepts build on top of each other, I needed a way to make sure I fully understood the previous step. I started by writing down the most basic axioms that most of the rest would be founded on. (Not really in a fundamental sense, e.g., I did not go into sigma algebras and measure theory in more detail, but in a practical sense.) The sheer number of identities and distribution relationships was initially hard to remember, so I jotted those down next. Then more advanced concepts built on the simpler ones, and the process repeated itself.
Holy shit, that visual display of Bayes' Theorem makes so much sense. Writing it as P(A intersect B)/P(B) and having the Venn diagram just made it all click in my head.
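That Venn-diagram ratio is easy to play with in code. A minimal sketch (the sample space and events below are a toy example of my own, not from the cheat sheet): treat events as sets of outcomes, and P(A|B) is just the measure of the overlap divided by the measure of B.

```python
from fractions import Fraction

# Toy sample space: outcomes 1..12, each equally likely (hypothetical example).
omega = set(range(1, 13))
A = {n for n in omega if n % 2 == 0}   # even outcomes
B = {n for n in omega if n % 3 == 0}   # multiples of 3

def prob(event):
    """P(E) = |E| / |Omega| under the uniform measure."""
    return Fraction(len(event), len(omega))

# Conditional probability, exactly the Venn-diagram picture:
# P(A|B) = P(A intersect B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes' theorem then recovers P(B|A) from P(A|B):
# P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * prob(B) / prob(A)

print(p_a_given_b)  # 1/2: of B = {3,6,9,12}, the even ones are {6,12}
print(p_b_given_a)  # 1/3: of the 6 even outcomes, 2 are multiples of 3
```

Using exact fractions rather than floats keeps the "count the overlap" intuition visible in the output.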
I follow the arithmetic behind the birthday paradox, but that doesn't help my intuition about similar problems: can someone point me to heuristics for dealing with very large numbers of events in a very, very large universe of possibilities? E.g., sha1 collisions[0] on blobs in Github?
Suppose I had N initially non-communicating instances of github. Would a merge of all those repositories be more likely to have a sha1 collision if each used the full 160 bits for their blobs, or if each repository assigned a random log(N)+e bit prefix to itself, using only 160-(log(N)+e) bits for its own blobs, but incurring a possibility of collision within the log(N)+e bit prefixes? And, of course, one wants to know the increased likelihood of internal collisions now that we're only using 160-(log(N)+e) bits for the local identifiers (which of course depends on the number of internal distinct blobs).
[0] A collision is two distinct blobs with the same identifier; two blobs containing the same bits having the same identifier is a feature, not a collision.
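One standard heuristic for this regime is the birthday-bound approximation: for k items drawn uniformly from n = 2^bits possibilities, P(at least one collision) ≈ 1 − exp(−k(k−1)/2n), so collisions only become likely around k ≈ sqrt(n), i.e. around 2^(bits/2) blobs. A hedged sketch (the blob counts below are illustrative, not real GitHub statistics):

```python
import math

def collision_prob(k, bits):
    """Birthday-bound approximation for k uniform draws from 2**bits values:
    P(collision) ~= 1 - exp(-k*(k-1) / (2 * 2**bits))."""
    n = 2 ** bits
    return -math.expm1(-k * (k - 1) / (2 * n))

# Illustrative numbers only (not actual GitHub blob counts):
k = 10 ** 12                    # a trillion distinct blobs
print(collision_prob(k, 160))   # ~3.4e-25: negligible with full 160-bit SHA-1
print(collision_prob(k, 128))   # ~1.5e-15: still tiny after giving up 32 bits
print(collision_prob(k, 80))    # ~0.34: near the 2^40-ish birthday threshold
```

The same formula speaks to the prefix question: carving off log(N)+e bits per repository moves each repository's internal collision threshold down to roughly 2^((160−log(N)−e)/2) blobs, which the approximation lets you compare directly against the merged, full-160-bit case.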
> These examples remind me of a paper I came across a few years ago using probability to show why the author would never have a girlfriend. It's a fun read and can be found at https://logological.org/girlfriend if interested.
It actually refers to the article on the front page. Nice coincidence!
This looks awfully like set theory and SQL. I wonder what the SQL equivalent for each concept would look like. For example (picking the easy one!) marginal probability for long hair would be:
SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0)
FROM B
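Following that thought with a runnable sketch (the table contents and the Female column are made up to fit the comment's example): the marginal is an aggregate over the whole table, and a conditional like P(LongHair | Female) is the same aggregate restricted by a WHERE clause, mirroring P(A|B) = P(A intersect B)/P(B).

```python
import sqlite3

# Hypothetical table B matching the comment's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE B (LongHair INTEGER, Female INTEGER)")
con.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (0, 1)],
)

# Marginal probability P(LongHair) -- the comment's query verbatim:
(p_long,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) FROM B"
).fetchone()

# Conditional probability P(LongHair | Female): same aggregate, but the
# WHERE clause shrinks the "sample space" to the Female rows only.
(p_long_given_f,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) "
    "FROM B WHERE Female=1"
).fetchone()

print(p_long)          # 3 long-haired rows out of 8 -> 0.375
print(p_long_given_f)  # 2 of the 4 Female rows have LongHair=1 -> 0.5
```

The SUM(1.0) trick from the original query also avoids integer division, which is why it divides cleanly here.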
Grimmett is probably a better bet to start with since it doesn't expect quite as much prior math knowledge and covers a lot more topics. The Williams book is shorter and denser and doesn't cover much in the way of applications, but gives a really good theoretical underpinning of how probability theorists think about probability (i.e., in terms of Lebesgue measures and sigma algebras).
edit: There is also a companion book to Probability and Random Processes called One Thousand Exercises in Probability which contains an interesting selection of problems and solutions that will let you apply the theory taught in the main book.
Thanks, the first book (along with the exercises) seems like a great approach. Just peeked at the contents online, and it's pretty close to what I'm looking for.
Williams is probably better for mathematicians coming at probability with an already solid mathematical understanding rather than practitioners who want to try to understand the underlying theory.