
From the article:

> NB: Correlated does not mean linearly correlated

> For simplicity, I have used linear correlations in all the example R code. In real life, however, the pattern of correlation/association/mutual information we should expect depends entirely on the functional form of the causal relationships involved.




The standard mathematical definition of correlation means linear correlation. If you are talking about non-independence, it would be better to use that language. This early mistake made me think the author is not really an expert.
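For what it's worth, the distinction is easy to demonstrate (a minimal R sketch with made-up variables): a variable can be completely determined by another and still have zero Pearson correlation with it.

    # Y is a deterministic function of X, so they are maximally dependent,
    # yet the linear (Pearson) correlation is ~0 because X is symmetric about 0.
    set.seed(1)
    x <- rnorm(1e5)
    y <- x^2
    cor(x, y)  # ~0 despite full dependence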


That seems a bit harsh. People can become experts independently without being familiar with the terminology existing experts use. Further, if the piece is intended for a non-expert audience, it may even be deliberate to loosen the experts' definitions and restore precision with a note instead, which is apparently exactly what this author did.


It's much better to use vocabulary consistently with the rest of the field. Then you don't need to add footnotes correcting yourself. And if you are not familiar with what everyone else means by correlation, you're very unlikely to be an expert. This is not like that Indian mathematician who reinvented huge chunks of mathematics.


> It's much better to use vocabulary consistently with the rest of the field.

Fine, but...

> And if you are not familiar with what everyone else means by correlation, you're very unlikely to be an expert.

Perhaps, but this is not relevant. If there's a problem with this work, then that problem can be criticized directly. There is no need, and it is not useful, to infer "expertise" by indirect means.


What is an appropriate measure of (in)dependence, though, if not Pearson correlation? That is, a measure you can feed the points of a scatter plot into, such that if it returns 0, the variables are independent.


It's a tough problem.

There are various schemes for estimating mutual information from samples. If you do that and the estimate comes out very close to zero, then I suppose you can claim the two random variables are independent. But these estimators are pretty noisy and often computationally frustrating (the ones I'm familiar with require a bunch of nearest-neighbor searches among all the points).
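For concreteness, here is a minimal sketch of the simpler binning-based route in R, using the infotheo package (the kNN estimators mentioned above, e.g. Kraskov's, avoid the binning step but are more involved; the package choice and sample data here are my own):

    # install.packages("infotheo")
    library(infotheo)

    set.seed(1)
    x <- rnorm(1e4)
    y <- x^2 + rnorm(1e4, sd = 0.1)  # dependent, but Pearson cor ~ 0

    # Plug-in estimate of mutual information after equal-frequency binning.
    # Noisy for small samples; a value near 0 suggests, not proves, independence.
    mutinformation(discretize(x), discretize(y))
    cor(x, y)  # near 0, so linear correlation misses the dependence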

I agree with the OP that it's better to say "non-independence" and avoid the confusion. At the same time, I disagree that linear correlation is actually the standard definition: in many fields, especially those where nobody ever expects linear relationships, everybody uses "correlated" to mean "not independent".


Yeah. It would be simpler to talk about causal graphs if the nodes represented only events rather than arbitrary random variables, because independence between events is much simpler to determine: X and Y are independent iff P(X) * P(Y) = P(X and Y). For events there also exists a clean measure of dependence, the so-called odds ratio, which, unlike the Pearson correlation (called the "phi coefficient" for events) or pointwise mutual information, is not influenced by the marginal probabilities. Of course, in practice, reducing variables to events is usually not a viable simplification.
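A quick base-R sketch of that invariance claim (the counts are invented for illustration): multiplying a row of the 2x2 contingency table by a constant, i.e. changing a marginal, moves the phi coefficient but leaves the odds ratio unchanged.

    # 2x2 contingency table of two events: rows = X / not X, cols = Y / not Y
    odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
    phi <- function(tab) {
      (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
        sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
    }

    t1 <- matrix(c(30, 10,
                   20, 40), nrow = 2, byrow = TRUE)
    t2 <- t1
    t2[1, ] <- t2[1, ] * 5  # inflate P(X) without touching the association

    c(odds_ratio(t1), odds_ratio(t2))  # both 6: invariant to the marginal
    c(phi(t1), phi(t2))                # ~0.41 vs ~0.37: phi shifts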



