Hacker News
Anscombe's Quartet (wikipedia.org)
89 points by tosh on Nov 6, 2022 | 10 comments



Also check the “Datasaurus Dozen”, with animations:

https://www.autodesk.com/research/publications/same-stats-di...


They're awesome; I'd never heard of those. Going to add them at the end of my next stats-oriented slide deck to sow fear and loathing in mere mortals.


They're the single best example of why bar charts need to die that I think I've ever encountered. Particularly in biology, where you get two bars with maybe two error bars between them and the inevitable "*" – at best, something that conveys four independent numbers, badly; at worst, actively misleading. I'm very much with Tufte on the "data-ink ratio" being a good metric for a plot: violin plots have a lot of data and surprisingly little ink.


This really depends on what you're trying to convey. I think bar charts with (mostly non-overlapping) error bars in the appropriate context can be quite useful.


Also remember good old Pearson and his spurious correlations.

If you have x/z and y/z and you plot 'em against each other, do not be surprised if there is a correlation.

https://en.wikipedia.org/wiki/Spurious_correlation_of_ratios

For an example, see chocolate consumption and Nobel prizes ;o)
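Pearson's point is easy to reproduce with a quick simulation. This is a minimal sketch, assuming x, y, and z are independent draws from a uniform distribution on [1, 2] (an arbitrary choice for illustration). The raw x and y come out essentially uncorrelated, yet x/z and y/z are clearly correlated because they share the denominator z. When all three variables have similar coefficients of variation, Pearson's approximation puts that spurious correlation at roughly 0.5.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def ratio_demo(n=10_000, seed=0):
    """Correlate independent x and y, then correlate x/z and y/z."""
    rng = random.Random(seed)
    # Assumption: x, y, z are independent U(1, 2) draws, so no real relationship exists.
    x = [rng.uniform(1, 2) for _ in range(n)]
    y = [rng.uniform(1, 2) for _ in range(n)]
    z = [rng.uniform(1, 2) for _ in range(n)]
    raw = pearson(x, y)
    # Dividing both by the same z induces correlation through the shared denominator.
    ratios = pearson([a / c for a, c in zip(x, z)], [b / c for b, c in zip(y, z)])
    return raw, ratios


raw, ratios = ratio_demo()
print(f"corr(x, y)     = {raw:.3f}")
print(f"corr(x/z, y/z) = {ratios:.3f}")
```

Swapping in other distributions changes the size of the effect but not its existence: the more z varies relative to x and y, the stronger the spurious correlation.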


One of the funniest assignments I got in university was a recent one from a finance course: The prof had us generate a bunch of random company characteristics and yearly returns, then fish for correlations and come up with fake newspaper-style stories about why these "effects" we found made sense.
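The exercise is a neat illustration of the multiple-comparisons problem. Below is a minimal sketch of that kind of assignment, with hypothetical parameters (50 firms, 200 random "characteristics", Gaussian noise for everything). It flags any trait whose correlation with returns exceeds a rough two-sided 5% cutoff of about 1.96/sqrt(n), which is an approximation rather than an exact test.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def fish_for_effects(n_firms=50, n_traits=200, seed=42):
    """Return the cutoff and every random trait that 'predicts' random returns."""
    rng = random.Random(seed)
    # Hypothetical setup: returns and all traits are independent noise.
    returns = [rng.gauss(0, 1) for _ in range(n_firms)]
    # Rough two-sided 5% cutoff for |r| when there is no true effect.
    cutoff = 1.96 / sqrt(n_firms)
    hits = []
    for i in range(n_traits):
        trait = [rng.gauss(0, 1) for _ in range(n_firms)]
        r = pearson(trait, returns)
        if abs(r) > cutoff:
            hits.append((f"trait_{i}", r))
    return cutoff, hits


cutoff, hits = fish_for_effects()
print(f"{len(hits)} of 200 random traits 'predict' returns at |r| > {cutoff:.3f}")
# The strongest "effects" are the ones that would make the best fake headlines.
for name, r in sorted(hits, key=lambda h: -abs(h[1]))[:3]:
    print(f"  {name}: r = {r:+.3f}")
```

With a 5% cutoff you should expect roughly one trait in twenty, around ten here, to look "significant" purely by chance; writing the newspaper story is the easy part.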


Mean, variance, and linear regression lines are all nearly identical for datasets that look very different when graphed.

There's probably a great statistics book that covers this, but it’s also in books that teach machine learning. For example, this book covers this and many other fundamentals:

https://probml.github.io/pml-book/book1.html

See section 2.2.6: Limitations of summary statistics
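The "nearly identical" claim is easy to check numerically. This minimal sketch uses the quartet's values as tabulated in the Wikipedia article linked above and computes the summary statistics by hand with the standard library. All four datasets should come out with mean x = 9, sample variance of x = 11, mean y ≈ 7.50, sample variance of y between about 4.12 and 4.13, correlation ≈ 0.816, and a fitted line of roughly y = 3.00 + 0.500x.

```python
from math import sqrt

# Anscombe's quartet; datasets I, II, and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line y = intercept + slope*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": sxx / (n - 1),
        "mean_y": my,
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```

The numbers agree to two or three decimals across all four datasets, which is exactly why the quartet only reveals itself once you plot it.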


British comedy even shows up in their serious and influential statistics work.


Data scientists: what are your favorite methods for detecting this behavior in high-dimensional data (that can't be adequately displayed graphically)?


I learned about this while sneaking a science stats subject into my arts degree. Fascinating.



