I should say up front that this is not my corner of the world: in social science, theory typically precedes estimation, and there is a strong disciplinary norm against saying "I have no idea what causes what". So I don't actually use this stuff. That said, I have played with a few of the packages and read a few pieces on causal discovery.
Jonas Peters et al. - Elements of Causal Inference is a textbook that covers a little of what they call "learning cause-effect models". For algorithms, check SGS (Spirtes-Glymour-Scheines) and PC (named for Peter Spirtes and Clark Glymour). I believe both algorithms are implemented in R in the package `pcalg`. There's another R package on Bioconductor that implements them too, but I'm far enough afield from biostats that I don't remember the name and can't find any notes on it.
Some recent cites of note: Peters and Bühlmann - "Identifiability of Gaussian structural equation models" (2014), which led to Ghoshal and Honorio - "Learning linear structural equation models in polynomial time" (2018), which generalizes the Peters/Bühlmann claim.
Other authors to Google: Dominik Janzing; Joris Mooij; Patrik Hoyer. All of them co-author papers with the people above, so you should be able to map out the network.
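What all these pieces have in common is that they're trying to establish empirical differences in the joint distribution of X and Y between the scenario where X -> Y and the one where Y -> X. This is only possible in some cases, typically when you are willing to assume something about the functional form or the noise distribution.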
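To make that asymmetry concrete, here is a toy sketch in Python (my choice of language, not taken from any of the papers above). It assumes a linear model Y = X + noise and compares the two directions by regressing one variable on the other and checking whether the residuals look independent of the regressor. The `dependence_score` helper is a crude stand-in I made up for illustration (correlation of squares); the actual literature uses proper independence tests. With uniform noise the wrong direction leaves a visible dependence, while with Gaussian noise both directions look the same, which is one of the cases where the direction cannot be recovered from the joint distribution.

```python
import numpy as np


def dependence_score(regressor, target):
    """Crude dependence measure between a regressor and the OLS residuals.

    Fits target ~ regressor by least squares, then returns the absolute
    correlation between the squared (centered) regressor and the squared
    residuals. This is an illustrative stand-in, not a real independence test.
    """
    slope, intercept = np.polyfit(regressor, target, 1)
    resid = target - (slope * regressor + intercept)
    centered = regressor - regressor.mean()
    return abs(np.corrcoef(centered ** 2, resid ** 2)[0, 1])


def simulate(draw, n):
    """Generate data from the assumed true model X -> Y, with Y = X + noise."""
    x = draw(n)
    y = x + draw(n)
    return x, y


rng = np.random.default_rng(0)
n = 5000

# Non-Gaussian (uniform) case: the two directions should look different.
x_u, y_u = simulate(lambda size: rng.uniform(-1.0, 1.0, size), n)
forward_u = dependence_score(x_u, y_u)   # fit Y on X (true direction)
backward_u = dependence_score(y_u, x_u)  # fit X on Y (wrong direction)

# Gaussian case: the two directions should be indistinguishable.
x_g, y_g = simulate(lambda size: rng.standard_normal(size), n)
forward_g = dependence_score(x_g, y_g)
backward_g = dependence_score(y_g, x_g)

print(f"uniform  noise: forward={forward_u:.3f}  backward={backward_u:.3f}")
print(f"gaussian noise: forward={forward_g:.3f}  backward={backward_g:.3f}")
```

A low score in one direction and a clearly higher score in the other is the kind of empirical difference these methods exploit. In the Gaussian run both scores should sit near zero, so nothing in the data favors either direction.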
Hope this helps.