You're in the right direction but I think you don't have the right words to desc...

dxbydt · 2024-02-08T14:14:43 1707401683

> It's useful because if you have both and they are far apart you know there's a lot of variance

ummm. I would refrain from using nonparametric skew to make a comment about the magnitude of variance.

Essentially, the gap between mean and median is always bounded by 1 sigma. The ratio abs(mean-median)/sigma is the (absolute) nonparametric skew. It is at most 1 for any distribution with finite variance (hence nonparametric: no distributional assumption required).
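To make the bound concrete, here is a minimal sketch using only the Python standard library. The function name `nonparametric_skew` and the three simulated samples are illustrative assumptions, not anything from the thread. It returns the signed ratio, so the bound applies to its absolute value.

```python
import random
import statistics


def nonparametric_skew(xs):
    """Signed nonparametric skew: (mean - median) / population stdev.

    Raises ZeroDivisionError if every value in xs is identical.
    """
    mu = statistics.fmean(xs)
    med = statistics.median(xs)
    sigma = statistics.pstdev(xs)
    return (mu - med) / sigma


random.seed(0)  # fixed seed so the illustration is repeatable
samples = {
    "symmetric (normal)": [random.gauss(0, 1) for _ in range(10_000)],
    "right-skewed (exponential)": [random.expovariate(1.0) for _ in range(10_000)],
    "heavy right tail (lognormal)": [random.lognormvariate(0, 1.5) for _ in range(10_000)],
}

for name, xs in samples.items():
    # The absolute value never exceeds 1, whatever the shape of the sample.
    print(f"{name}: {nonparametric_skew(xs):+.3f}")
```

The bound holds exactly for each sample here because an empirical distribution is itself a distribution, so its mean, median and population standard deviation obey the same inequality.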

For symmetric distributions this ratio is 0. As the gap between the mean and median grows relative to sigma, the ratio captures the resulting nonsymmetry. But you are using the value of this bound to make a comment about s^2. Which is very clever, but inaccurate. Say you center the rv and you have a nonsymmetric dist. Then mean 0, say median 100. Since the gap can be at most 1 sigma, the stdev must be at least 100, so the variance must be at least 10000. Which looks like “a lot of variance”. But is it really? Variance has a scaling problem, which is precisely why we take the square root, so the stdev remains in the scale of the mean. So at best one can say the stdev is at least as big as the gap between mean and median. But that’s not very informative, because the gap is in the units of the data and says nothing about location: if the mean is -50 and median is +50, we are left with the same absolute gap of 100, so the same statement applies to the stdev even now.
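The scaling point can be sketched with a toy sample. The helper `gap_and_stdev` and the data are made up for illustration. Shifting the data leaves the gap and the stdev unchanged. Changing units (multiplying by 10) inflates both by the same factor, so the raw gap on its own does not tell you whether the variance is "a lot".

```python
import statistics


def gap_and_stdev(xs):
    """Return (|mean - median|, population stdev) for a sample."""
    mu = statistics.fmean(xs)
    med = statistics.median(xs)
    return abs(mu - med), statistics.pstdev(xs)


base = [0, 0, 0, 0, 10]            # small right-skewed toy sample
shifted = [x - 50 for x in base]   # same shape, different location
scaled = [x * 10 for x in base]    # same shape, different units

for name, xs in [("base", base), ("shifted", shifted), ("scaled", scaled)]:
    gap, sd = gap_and_stdev(xs)
    # The stdev is never smaller than the gap; the ratio is unit-free.
    print(f"{name}: gap={gap:.1f} stdev={sd:.1f} ratio={gap / sd:.2f}")
```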

I guess if I had to compare the variance of some sample X to another sample Y to make some claim that variance of X is much larger than Y, I would use a standard F test. Cooking up a test based on the gap between mean and median in a single sample seems somewhat shaky. It is very creative though, I grant you that.
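For reference, the statistic behind a two-sample F test is just the ratio of the sample variances. A rough sketch follows. The name `variance_ratio` and the simulated samples are assumptions for illustration. The p-value step is left out because the standard library has no F distribution; with SciPy installed, `scipy.stats.f.sf` could supply it. Note that the test assumes both samples come from normal populations and is sensitive to departures from that.

```python
import random
import statistics


def variance_ratio(xs, ys):
    """Return (F, df_num, df_den) where F = var(xs) / var(ys).

    Uses sample variances (n - 1 denominator), as the F test does.
    """
    f_stat = statistics.variance(xs) / statistics.variance(ys)
    return f_stat, len(xs) - 1, len(ys) - 1


random.seed(1)  # fixed seed so the illustration is repeatable
x = [random.gauss(0, 3) for _ in range(50)]  # true stdev 3
y = [random.gauss(0, 1) for _ in range(50)]  # true stdev 1

f_stat, dfn, dfd = variance_ratio(x, y)
print(f"F = {f_stat:.2f} on ({dfn}, {dfd}) degrees of freedom")
# One-sided p-value, if SciPy is available: scipy.stats.f.sf(f_stat, dfn, dfd)
```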

godelski · 2024-02-08T20:58:14 1707425894

Perhaps I gave the statement too much strength, far more than intended. But I don't view any metric as anything more than a guide. The reason I use nonparametric skew in this way is explicitly for a quick and dirty interpretation of the data. Essentially, I'm trying to understand whether I should take someone's data at face value or not. It's about being a flag. The reason is that when going about the world in an everyday fashion I am generally not going to have access to other data like the variance (if we did, we wouldn't need this hack) and can't really do an F-test on the fly. Usually you're presented with the mean, and while it can still be hard to find a median, it is usually more obtainable than the variance or any other information. So I get your concern, and I think you are right to bring it up because how I stated things could clearly be mistaken (I'll admit to that), but I wanted to assure you that no strong decisions were being made using this. I only use it as a sniff test. I do think it helps to give people a bunch of different sniff tests because it is hard for us to navigate data, and if you're this well versed I'm sure you have a similar frustration with how difficult it can be to make informed decisions. So what tools do we have that can set off red flags and help us not be deceived by those who wish to just throw numbers at us and say that this is the answer?

littlestymaar · 2024-02-08T07:31:13 1707377473

> why you should be suspicious when anyone discusses averages

I like to say that the average human being has one testicle and half a vagina, which is not very representative of anyone around.

> On that note, the median and average are always within one standard deviation of another

Oh really? That's cool.

godelski · 2024-02-08T08:40:35 1707381635

Haha yeah that is accurate. The right language is situational though haha. People are generally overconfident in their ability to mathematically describe things. There's a cliché, "all models are wrong", and like all clichés it is something everyone can repeat but not internalize lol

agos · 2024-02-08T09:36:16 1707384976

you should not forget the second part: all models are wrong, some are useful

littlestymaar · 2024-02-08T11:21:27 1707391287

Yes, but unfortunately the second part is usually employed by people who want to sweep under the rug the fact that their model is dubious.

All models are wrong, only some of them are useful, and only when handled with care.

godelski · 2024-02-08T18:05:29 1707415529

The second part is the obvious part that often doesn't need restating. Models can be incredibly powerful tools.