Graphical models are just a way to encode relationships between different variables in a probabilistic model. Directed acyclic graphs (DAGs) allow you to specify (most of) the conditional independence structures that you can have between things like parameters and random variables.
This is really useful because it helps you identify what information is actually relevant for estimating certain parameters (i.e., sufficient statistics), and it helps you crystallize your understanding of the implications of the model you've created. In other words, it shows you the ways in which your model says different aspects of your data should influence each other.
This creates testable implications of the model. If your model says that two variables should be conditionally independent given a third, but they’re not, you have an avenue for refinement. You can also clearly identify your assumptions or the implications of your assumptions.
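To make "testable implication" concrete, here's a toy numpy sketch (the chain A -> B -> C and all of the coefficients are made up purely for illustration): the graph says A and C should be independent given B, and you can check that directly in data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical chain A -> B -> C: the graph claims A and C are
# independent once you condition on B.
a = rng.normal(size=n)
b = 2.0 * a + rng.normal(size=n)
c = -1.5 * b + rng.normal(size=n)

# Marginally, A and C look strongly related...
print(np.corrcoef(a, c)[0, 1])

# ...but after regressing B out of both, the leftover correlation is ~0.
# That is the testable implication "A independent of C given B".
resid_a = a - np.polyval(np.polyfit(b, a, 1), b)
resid_c = c - np.polyval(np.polyfit(b, c, 1), b)
print(np.corrcoef(resid_a, resid_c)[0, 1])
```

If that second number came out far from zero on real data, the model's independence claim would be wrong, and that's exactly the avenue for refinement.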
Another great thing about them is that the structure tells you what kind of inference you can realistically do: exact inference is known to be computationally infeasible for most structures, but there are a lot of inference schemes that give you approximations with various drawbacks/advantages, heuristics that sort of work, or even ways of drawing samples from the true distribution if you can identify the right structure. See belief propagation, loopy belief propagation, sequential Monte Carlo, and Markov chain Monte Carlo methods.
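For a flavour of the sampling-based end of that list, here's a minimal random-walk Metropolis sketch (a made-up toy posterior over a single mean, not tied to any particular library):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target: posterior of an unknown mean mu with a N(0, 5^2) prior
# and N(mu, 1) observations.
data = rng.normal(loc=3.0, scale=1.0, size=50)

def log_post(mu):
    log_prior = -0.5 * (mu / 5.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik

mu, samples = 0.0, []
for _ in range(20_000):
    prop = mu + rng.normal(scale=0.5)              # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
        mu = prop                                   # accept, else keep current
    samples.append(mu)

print(np.mean(samples[5_000:]))   # posterior mean, should land near 3
```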
On top of this, it helps you see everything in one general framework. A lot of the fundamental pieces of ML models are really just slight tweaks of each other. For instance, SVMs are linear models on a kernel feature space with a specific structural prior, and splines are the same idea with a different basis. All of this helps you see the pieces of different methods that are actually identical, which helps you make connections and learn more effectively, in my opinion.
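Here's a rough sketch of that "same linear machinery, different basis" point (toy data and arbitrary hyperparameters, just for illustration): the exact same ridge fit is applied to a polynomial basis and to an RBF/kernel-style basis.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

def ridge_fit(Phi, y, lam=1e-3):
    # The same linear-model machinery, whatever basis Phi came from.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Basis 1: low-order polynomial features (piecewise/spline bases work the same way).
Phi_poly = np.vander(x, 6)

# Basis 2: RBF features on a grid of centres (a kernel-style feature space).
centres = np.linspace(0, 1, 20)
Phi_rbf = np.exp(-((x[:, None] - centres[None, :]) ** 2) / 0.01)

for Phi in (Phi_poly, Phi_rbf):
    w = ridge_fit(Phi, y)
    print(np.mean((Phi @ w - y) ** 2))   # identical fitting code, different basis
```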
Absolutely. Very strongly agree with the last paragraph, and that's how I aspire to learn in general. Can you point me to some resources (books or otherwise) that go over all these relationships in a general framework?
Unfortunately, it's just something you start to notice once you become more familiar with the fundamental math underlying all of this.
The book Machine Learning: A Probabilistic Perspective by Kevin Murphy (the original book everyone in this thread is talking about) is probably the closest thing I can think of. Its goal is to frame everything around graphical models and probability. It's quite a tome, but even with that breadth it can't possibly cover everything.