That is precisely the kind of content I expected a cheat sheet on Probability to contain. No offence to OP, but if you need a cheat sheet for basic concepts like the ones included in the original post, then you'll find it really hard to grasp any "useful" Probability concepts.
>No offence to OP, but if you need a cheat sheet for basic concepts like the ones included in the original post, then you'll find it really hard to grasp any "useful" Probability concepts.
When I took stats classes at UC Berkeley, we covered a lot of ground quickly. The courses (STAT 200A and 200B) were breadth-first in the sense that you got only a short amount of exposure to each concept. But since many concepts build on top of each other, I needed a way to make sure I fully understood the previous step. I started by writing down the most basic axioms that most of the rest would be founded on. (Not really in a fundamental sense, e.g., I did not go into sigma algebras and measure theory in more detail, but in a practical sense.) The sheer number of identities and distribution relationships was initially hard to remember, so I jotted those down next. Then more advanced concepts built on the simpler ones, and the process repeated itself.
Holy shit, that visual display of Bayes' Theorem makes so much sense. Writing it as P(A intersect B)/P(B) and having the Venn diagram just made it all click in my head.
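That Venn-diagram ratio is easy to play with in code. A minimal sketch (the sample space and events below are a toy example of my own, not from the cheat sheet): treat events as sets of outcomes, and P(A|B) is just the measure of the overlap divided by the measure of B.

```python
from fractions import Fraction

# Toy sample space: outcomes 1..12, each equally likely (hypothetical example).
omega = set(range(1, 13))
A = {n for n in omega if n % 2 == 0}   # even outcomes
B = {n for n in omega if n % 3 == 0}   # multiples of 3

def prob(event):
    """P(E) = |E| / |Omega| under the uniform measure."""
    return Fraction(len(event), len(omega))

# Conditional probability, exactly the Venn-diagram picture:
# P(A|B) = P(A intersect B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes' theorem then recovers P(B|A) from P(A|B):
# P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * prob(B) / prob(A)

print(p_a_given_b)  # 1/2: of B = {3,6,9,12}, the even ones are {6,12}
print(p_b_given_a)  # 1/3: of the 6 even outcomes, 2 are multiples of 3
```

Using exact fractions rather than floats keeps the "count the overlap" intuition visible in the output.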
I follow the arithmetic behind the birthday paradox, but that doesn't help my intuition about similar problems: can someone point me to heuristics for dealing with very large numbers of events in a very, very large universe of possibilities? E.g., sha1 collisions[0] on blobs in Github?
Suppose I had N initially non-communicating instances of github. Would a merge of all those repositories be more likely to have a sha1 collision if each used the full 160 bits for their blobs, or if each repository assigned a random log(N)+e bit prefix to itself, using only 160-(log(N)+e) bits for its own blobs, but incurring a possibility of collision within the log(N)+e bit prefixes? And, of course, one wants to know the increased likelihood of internal collisions now that we're only using 160-(log(N)+e) bits for the local identifiers (which of course depends on the number of internal distinct blobs).
[0] A collision is two distinct blobs with the same identifier; two blobs containing the same bits having the same identifier is a feature, not a collision.
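One standard heuristic for this regime is the birthday-bound approximation: for k items drawn uniformly from n = 2^bits possibilities, P(at least one collision) ≈ 1 − exp(−k(k−1)/2n), so collisions only become likely around k ≈ sqrt(n), i.e. around 2^(bits/2) blobs. A hedged sketch (the blob counts below are illustrative, not real GitHub statistics):

```python
import math

def collision_prob(k, bits):
    """Birthday-bound approximation for k uniform draws from 2**bits values:
    P(collision) ~= 1 - exp(-k*(k-1) / (2 * 2**bits))."""
    n = 2 ** bits
    return -math.expm1(-k * (k - 1) / (2 * n))

# Illustrative numbers only (not actual GitHub blob counts):
k = 10 ** 12                    # a trillion distinct blobs
print(collision_prob(k, 160))   # ~3.4e-25: negligible with full 160-bit SHA-1
print(collision_prob(k, 128))   # ~1.5e-15: still tiny after giving up 32 bits
print(collision_prob(k, 80))    # ~0.34: near the 2^40-ish birthday threshold
```

The same formula speaks to the prefix question: carving off log(N)+e bits per repository moves each repository's internal collision threshold down to roughly 2^((160−log(N)−e)/2) blobs, which the approximation lets you compare directly against the merged, full-160-bit case.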
> These examples remind me of a paper I came across a few years ago using probability to show why the author would never have a girlfriend. It's a fun read and can be found at https://logological.org/girlfriend if interested.
It actually refers to the article on the front page. Nice coincidence!
This looks awfully like set theory and SQL. I wonder what the SQL equivalent for each concept would look like. For example (picking the easy one!) marginal probability for long hair would be:
SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0)
FROM B
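Following that thought with a runnable sketch (the table contents and the Female column are made up to fit the comment's example): the marginal is an aggregate over the whole table, and a conditional like P(LongHair | Female) is the same aggregate restricted by a WHERE clause, mirroring P(A|B) = P(A intersect B)/P(B).

```python
import sqlite3

# Hypothetical table B matching the comment's schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE B (LongHair INTEGER, Female INTEGER)")
con.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (0, 1)],
)

# Marginal probability P(LongHair) -- the comment's query verbatim:
(p_long,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) FROM B"
).fetchone()

# Conditional probability P(LongHair | Female): same aggregate, but the
# WHERE clause shrinks the "sample space" to the Female rows only.
(p_long_given_f,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) "
    "FROM B WHERE Female=1"
).fetchone()

print(p_long)          # 3 long-haired rows out of 8 -> 0.375
print(p_long_given_f)  # 2 of the 4 Female rows have LongHair=1 -> 0.5
```

The SUM(1.0) trick from the original query also avoids integer division, which is why it divides cleanly here.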
Grimmett is probably a better bet to start with since it doesn't expect quite as much prior math knowledge and covers a lot more topics. The Williams book is shorter and denser and doesn't cover much in the way of applications, but gives a really good theoretical underpinning of how probability theorists think about probability (i.e., in terms of Lebesgue measures and sigma algebras).
edit: There is also a companion book to Probability and Random Processes called One Thousand Exercises in Probability which contains an interesting selection of problems and solutions that will let you apply the theory taught in the main book.
Thanks, the first book (along with the exercises) seems like a great approach. Just peeked at the contents online, and it's pretty close to what I'm looking for.
Williams is probably better for mathematicians coming at probability with an already solid mathematical understanding rather than practitioners who want to try to understand the underlying theory.