You could probably call a Kalman filter an ML model if you wish. One major difference is that you usually have to prescribe the prediction model. It doesn't learn to predict the next value the way an LLM does; instead it finds an optimal weighted sum of its own prediction and the measured values, with the weights (the Kalman gain) computed from the assumed uncertainties. So it's a prediction-correction loop that requires an interpretable model, which has the side effect of letting you estimate system states. This is quite different from having an arbitrary hidden state with learned structure.
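To make the "weighted sum" concrete, here is a minimal scalar sketch. It assumes the simplest possible prescribed model (the state stays constant between steps) and made-up noise variances `q` and `r`; the function name `kalman_1d` and all parameters are just for illustration, not from any particular library.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Scalar Kalman filter with an assumed random-walk model x_k = x_{k-1}.

    x0, p0: initial state estimate and its variance
    q: process noise variance (how much we distrust the model)
    r: measurement noise variance (how much we distrust the sensor)
    """
    x, p = x0, p0
    estimates, gains = [], []
    for z in measurements:
        # Predict: the prescribed model says the state does not change,
        # but our uncertainty about it grows by q.
        x_pred = x
        p_pred = p + q

        # Correct: the gain k lies in [0, 1] and sets the weighting
        # between the prediction and the measurement.
        k = p_pred / (p_pred + r)
        x = (1 - k) * x_pred + k * z  # the weighted sum
        p = (1 - k) * p_pred          # uncertainty shrinks after the update

        estimates.append(x)
        gains.append(k)
    return estimates, gains
```

Note that nothing here is fit to data: the gain falls out of `q`, `r`, and the running variance `p`. A large `r` pushes `k` toward 0 (trust the model), a large `q` pushes it toward 1 (trust the measurement).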

In other words, it performs well for certain applications specifically because it lets you bring in domain knowledge in the form of the process model and known uncertainties, whereas deep learning models try to generalize the model and learn implicit structure from data.
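As a rough sketch of what "bringing in domain knowledge" looks like, assume you know the object moves at roughly constant velocity and you only measure position. The process model, the time step `dt`, and the noise variances `q` and `r` below are all assumptions picked for illustration (including the simplification of adding the same `q` to both diagonal terms), and `kalman_cv` is a hypothetical name. The velocity is never measured, yet the filter estimates it because the model ties it to position.

```python
def kalman_cv(positions, dt=1.0, q=1e-4, r=1.0, p0=100.0):
    """Constant-velocity Kalman filter in plain Python.

    State is (pos, vel); only pos is measured.
    Assumed model: pos' = pos + dt * vel, vel' = vel.
    """
    pos, vel = 0.0, 0.0
    # Symmetric 2x2 covariance stored as a = P[0][0], b = P[0][1], d = P[1][1].
    a, b, d = p0, 0.0, p0
    history = []
    for z in positions:
        # Predict with the prescribed physics: x = F x, P = F P F^T + Q.
        pos_p = pos + dt * vel
        vel_p = vel
        p00 = a + 2.0 * dt * b + dt * dt * d + q
        p01 = b + dt * d
        p11 = d + q

        # Correct using the position measurement (H = [1, 0]).
        s = p00 + r                 # innovation variance
        k0, k1 = p00 / s, p01 / s   # Kalman gain for pos and vel
        y = z - pos_p               # innovation (measurement residual)
        pos = pos_p + k0 * y
        vel = vel_p + k1 * y        # velocity is corrected via its correlation with pos

        # Covariance update: P = (I - K H) P_pred.
        a = (1.0 - k0) * p00
        b = (1.0 - k0) * p01
        d = p11 - k1 * p01

        history.append((pos, vel))
    return history
```

For example, feeding it noise-free positions from an object moving at 2 units per step, `kalman_cv([2.0 * k for k in range(1, 51)])`, should leave the final velocity estimate close to 2 even though no velocity was ever observed. A learned hidden state has no such guaranteed physical meaning.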