> For all practical purposes, you can write causality like
> p(x|y) = 1 and time(y) < time(x)
This isn't true at all. For a counterexample, x and y may both have a common cause.
Pearl's work addresses exactly this: it extends the language to p(y | do(x)), meaning you talk about what happens to y when you take some hypothetical intervention to set x. Causation framed in terms of intervention asks "what if it had been this instead of that?" and is probably the most common model of causation.
For more info, look up the Rubin causal model, the potential outcomes framework, and Pearl's "do" notation.
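To make the do() distinction concrete, here's a minimal simulation sketch; the setup, variable names, and probabilities are invented for illustration, not taken from Pearl. A hidden z drives both x and y, so observing x = 1 makes y = 1 certain, while intervening to force x = 1 does nothing to y:

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy world where a hidden z causes both x and y."""
    z = random.random() < 0.5            # common cause
    x = z if do_x is None else do_x      # do(x) overrides x's natural mechanism
    y = z                                # y listens only to z, never to x
    return x, y

N = 100_000

# Observational: conditioning on x=1 selects exactly the z=1 worlds.
obs = [sample() for _ in range(N)]
p_y_given_x = sum(y for x, y in obs if x) / sum(x for x, _ in obs)

# Interventional: forcing x=1 leaves z (and therefore y) untouched.
intv = [sample(do_x=True) for _ in range(N)]
p_y_do_x = sum(y for _, y in intv) / N

print(f"p(y=1 | x=1)     ~ {p_y_given_x:.2f}")  # ~1.00: x predicts y perfectly
print(f"p(y=1 | do(x=1)) ~ {p_y_do_x:.2f}")     # ~0.50: x has no effect on y
```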
This is called Granger causality (and work on it led to a Nobel prize, so it's important and useful). It's stronger than just correlation and way easier to determine than true causation, but it's possible that z causes both x and y, and z's effect on x is just more delayed than its effect on y.
But it at least rules out x causing y, which is something.
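A minimal sketch of the Granger idea (toy code of my own, not the textbook F-test machinery or any library routine): forecast x from its own past, then from its own past plus y's past, and see whether y's lags shrink the forecast error:

```python
import numpy as np

rng = np.random.default_rng(0)

def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return float(resid @ resid)

def granger_rss(x, y, lags=2):
    """Forecast x[t] from its own lags (restricted) vs. its own lags plus
    y's lags (unrestricted); a big drop in RSS is the Granger signal."""
    n = len(x)
    target = x[lags:]
    own   = np.column_stack([x[lags - k : n - k] for k in range(1, lags + 1)])
    other = np.column_stack([y[lags - k : n - k] for k in range(1, lags + 1)])
    ones = np.ones((n - lags, 1))
    return (rss(np.hstack([ones, own]), target),
            rss(np.hstack([ones, own, other]), target))

# Toy series where y really does drive x with a one-step delay.
T = 2000
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

restricted, unrestricted = granger_rss(x, y)
print(f"RSS, x's own lags only: {restricted:.1f}")    # large
print(f"RSS, plus y's lags:     {unrestricted:.1f}")  # far smaller: y Granger-causes x
```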
> but it's possible that z causes both x and y, and z's effect on x is just more delayed than its effect on y.
This is in fact the case with the barometer falling before a storm. Both the falling barometer and the subsequent rain and wind of a storm are consequences of an uneven distribution of heat and moisture in the atmosphere approaching equilibrium under the constraints of Earth's gravity and the Coriolis force.
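A toy version of exactly that, with delays and numbers invented purely for illustration: a hidden pressure system z moves the barometer after one step and delivers the storm after three, so the barometer "predicts" the storm perfectly while causing nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden cause z: a pressure anomaly that appears at random times.
T = 10_000
z = rng.random(T) < 0.1
barometer_falls = np.roll(z, 1)  # z reaches the barometer after 1 step
storm_arrives   = np.roll(z, 3)  # z delivers the storm after 3 steps

# The barometer falling at t "predicts" the storm at t+2 with certainty,
# and it happens first, but only because both inherit the timing of z.
storms_after_fall = storm_arrives[2:][barometer_falls[:-2]]
print(f"p(storm at t+2 | barometer fell at t) = {storms_after_fall.mean():.2f}")  # 1.00
```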
Still doesn't work. Suppose I flip a coin and write the result in two places: I write it on sheet y, then sheet x. We have that x == y always, so p(x|y) = 1, p(x|!y) = 0, and time(y) < time(x), but neither causes the other. I can write more later if you have interest, but I gotta run.
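Quick sketch before I go, though: that counterexample as a simulation (toy code; the sheet names are just the ones above):

```python
import random

random.seed(0)

# One coin flip per trial; the result is written on sheet y, then on sheet x.
# Both sheets record the coin; neither sheet records the other sheet.
flips = [random.random() < 0.5 for _ in range(100_000)]
sheet_y = list(flips)  # written first  (time 1)
sheet_x = list(flips)  # written second (time 2), copied from the coin, not from y

# x matches y in every trial, so p(x|y) = 1 and p(x|!y) = 0:
agreement = sum(a == b for a, b in zip(sheet_x, sheet_y)) / len(flips)
print(f"p(sheets agree) = {agreement:.2f}")  # 1.00, yet neither sheet causes the other
```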
For all practical purposes, you can write causality like
p(x|y) = 1 and time(y) < time(x)
I.e. causality is just when one event always happens after another event. Any additional requirements for causality are basically philosophy.
But typical ML systems don't construct networks of causal relations, which is basically what he's getting at, from my reading of it.