
"Missing confidence intervals" is my (IMO) legit pet peeve.

People fit frighteningly complicated models with millions-to-billions of parameters. They throw in hairy regularization schemes justified by fancy math.

When it comes time to evaluate the output, however: "Our model is better because this one number is bigger than their number [once, on this one test set]."
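For concreteness, a minimal sketch of the kind of check that's usually missing. Everything here is made up: preds_a and preds_b are per-example 0/1 correctness indicators for two hypothetical models on the same test set, and a paired bootstrap gives a 95% interval on the accuracy gap instead of one number.

    # Hypothetical per-example outcomes for two models on one test set.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    preds_a = rng.binomial(1, 0.81, size=n)  # model A: ~81% accuracy
    preds_b = rng.binomial(1, 0.80, size=n)  # model B: ~80% accuracy

    def bootstrap_ci(deltas, n_boot=10_000, alpha=0.05):
        """Percentile bootstrap CI for the mean of paired differences."""
        means = np.empty(n_boot)
        for i in range(n_boot):
            sample = rng.choice(deltas, size=len(deltas), replace=True)
            means[i] = sample.mean()
        return np.quantile(means, [alpha / 2, 1 - alpha / 2])

    delta = preds_a - preds_b               # paired per-example difference
    lo, hi = bootstrap_ci(delta)
    print(f"accuracy gap: {delta.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    # If the interval straddles 0, "our number is bigger" isn't saying much.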




As an undergrad, my circle had this drinking game called "Big Number." It came about because two of the biggest wastoids were the last people awake in the wee hours of Sunday, and one of them said to the other, "I'm too drunk to deal. Let's just roll dice, and the one with the biggest number drinks."

Of course, over the years, the game developed dozens of other rules.


Cool story. Any details on more of the rules?


Confidence intervals aren't even that informative. It's like using boxplots when you could inform the viewer so much more with sina plots [1]. Why not show me the whole posterior probability distribution (perhaps helpfully marking the 95% highest density region [2])? Or, if you don't have a distribution, show me the 95%, 97.5%, and 99.5% intervals (see the sketch after the links).

[1] https://clauswilke.com/dataviz/boxplots-violins.html

[2] https://www.sciencedirect.com/topics/mathematics/highest-den...
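A small sketch of that suggestion, with made-up Beta draws standing in for a real posterior over something like accuracy: it reports the 95%, 97.5%, and 99.5% central intervals plus the 95% highest-density interval.

    # Hypothetical posterior samples; in practice these would come from
    # your actual model's posterior, not a Beta placeholder.
    import numpy as np

    rng = np.random.default_rng(1)
    samples = rng.beta(820, 200, size=50_000)

    def hdi(samples, mass=0.95):
        """Narrowest interval containing `mass` of the samples (unimodal case)."""
        s = np.sort(samples)
        k = int(np.ceil(mass * len(s)))
        widths = s[k - 1:] - s[:len(s) - k + 1]
        i = np.argmin(widths)
        return s[i], s[i + k - 1]

    for level in (0.95, 0.975, 0.995):
        lo, hi = np.quantile(samples, [(1 - level) / 2, 1 - (1 - level) / 2])
        print(f"{level:.1%} central interval: [{lo:.3f}, {hi:.3f}]")
    print("95% HDI: [%.3f, %.3f]" % hdi(samples))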


Sure, there are better ways to actually do it; I was just riffing off the bit in the article.

It is super weird that a field devoted to doing inference somehow just...doesn't when it comes to evaluating their/our own work.



