
From the article:

> NB: Correlated does not mean linearly correlated

> For simplicity, I have used linear correlations in all the example R code. In real life, however, the pattern of correlation/association/mutual information we should expect depends entirely on the functional form of the causal relationships involved.




The standard mathematical definition of correlation means linear correlation. If you are talking about non-independence, it would be better to use that language. This early mistake made me think the author is not really an expert.
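For what it's worth, the distinction is easy to demonstrate (a minimal R sketch with made-up variables): a variable can be completely determined by another and still have zero Pearson correlation with it.

    # Y is a deterministic function of X, so they are maximally dependent,
    # yet the linear (Pearson) correlation is ~0 because X is symmetric about 0.
    set.seed(1)
    x <- rnorm(1e5)
    y <- x^2
    cor(x, y)  # ~0 despite full dependence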


That seems a bit harsh. People can become experts independently without being familiar with the terminology existing experts use. Further, if the piece is intended for a non-expert audience, it may even be deliberate to loosen the experts' definitions and restore precision with a note instead, which is apparently exactly what this author did.


It's much better to use vocabulary consistently with the rest of the field. Then you don't need to add footnotes correcting yourself. And if you are not familiar with what everyone else means by correlation, you're very unlikely to be an expert. This is not like that Indian mathematician who reinvented huge chunks of mathematics.


> It's much better to use vocabulary consistently with the rest of the field.

Fine, but...

> And if you are not familiar with what everyone else means by correlation, you're very unlikely to be an expert.

Perhaps, but this is not relevant. If there's a problem with this work, then that problem can be criticized directly. There is no need, and it is not useful, to infer "expertise" by indirect means.


What is an appropriate measure of (in)dependence, though, if not Pearson correlation? That is, a measure you can feed the points of a scatter plot into, such that if it returns 0, the variables are independent.


It's a tough problem.

There are various schemes for estimating mutual information from samples. If you do that and the estimate comes out very close to zero, then I suppose you can claim the two random variables are independent. But these estimators are pretty noisy and often computationally frustrating (the ones I'm familiar with require a bunch of nearest-neighbor searches among all the points).
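For concreteness, here is a minimal sketch of the simpler binning-based route in R, using the infotheo package (the kNN estimators mentioned above, e.g. Kraskov's, avoid the binning step but are more involved; the package choice and sample data here are my own):

    # install.packages("infotheo")
    library(infotheo)

    set.seed(1)
    x <- rnorm(1e4)
    y <- x^2 + rnorm(1e4, sd = 0.1)  # dependent, but Pearson cor ~ 0

    # Plug-in estimate of mutual information after equal-frequency binning.
    # Noisy for small samples; a value near 0 suggests, not proves, independence.
    mutinformation(discretize(x), discretize(y))
    cor(x, y)  # near 0, so linear correlation misses the dependence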

I agree with the OP that it's better to say "non-independence" and avoid the confusion. At the same time, I disagree that linear correlation is actually the standard definition: in many fields, especially those where nobody ever expects linear relationships, everybody uses "correlated" to mean "not independent".


Yeah. It would be simpler to talk about causal graphs if the nodes represented only events rather than arbitrary random variables, because independence between events is much simpler to determine: X and Y are independent iff P(X) * P(Y) = P(X and Y). For events there also exists a clean measure of dependence, the so-called odds ratio, which, unlike the Pearson correlation (called the "phi coefficient" for events) or pointwise mutual information, is not influenced by the marginal probabilities. Of course, in practice, reducing variables to events is usually not a viable simplification.
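A quick base-R sketch of that invariance claim (the counts are invented for illustration): multiplying a row of the 2x2 contingency table by a constant, i.e. changing a marginal, moves the phi coefficient but leaves the odds ratio unchanged.

    # 2x2 contingency table of two events: rows = X / not X, cols = Y / not Y
    odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
    phi <- function(tab) {
      (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
        sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
    }

    t1 <- matrix(c(30, 10,
                   20, 40), nrow = 2, byrow = TRUE)
    t2 <- t1
    t2[1, ] <- t2[1, ] * 5  # inflate P(X) without touching the association

    c(odds_ratio(t1), odds_ratio(t2))  # both 6: invariant to the marginal
    c(phi(t1), phi(t2))                # ~0.41 vs ~0.37: phi shifts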



