There is a class of probabilistic models with exactly this property: exponential family models. While pedagogical examples of such models tend to be very simple, they need not be. A huge class of graphical models falls in this family and can be very flexible. The underlying statistical model of a Kalman filter is in this class.
These models have what are called sufficient statistics, which can be computed from data: s = f(D), where s is the sufficient statistic and D is the past data. The clincher is a very helpful algebraic property:
s = f(D ∪ d) = g(s', d), where s' = f(D), D is the past data, d is the new data, and D ∪ d is the full, complete data.
This is very useful because you do not have to carry the old data D around; the statistic s' summarizes everything the model needs from it.
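As a minimal sketch of the update s = f(D ∪ d) = g(s', d), consider a univariate Gaussian, whose sufficient statistics are the count, the sum, and the sum of squares. The function names f and g below simply mirror the notation above; they are illustrative, not from any particular library.

```python
def f(D):
    """Compute the sufficient statistics s = (n, sum x, sum x^2) from data D."""
    return (len(D), sum(D), sum(x * x for x in D))

def g(s_prime, d):
    """Fold one new observation d into existing statistics s' = f(D)."""
    n, sx, sxx = s_prime
    return (n + 1, sx + d, sxx + d * d)

D = [1.0, 2.0, 3.0]
d = 4.0
# The algebraic property: f(D ∪ d) == g(f(D), d), so D can be discarded.
assert g(f(D), d) == f(D + [d])
```

The point of the sketch is that after calling g you can throw D away entirely and keep only the three numbers in s.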
This machinery is particularly useful when
(i) s is in some sense smaller than D, for example when s lives in some small, fixed finite dimension;
(ii) the functions f and g are easy to compute; and
(iii) the relation between s and the parameters, or equivalently the weights θ, of the model is easy to compute.
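To illustrate condition (iii), here is a minimal sketch, again assuming a univariate Gaussian with sufficient statistics s = (n, Σx, Σx²): recovering the parameters θ = (mean, variance) from s costs only a couple of divisions. The helper name is hypothetical.

```python
def params_from_stats(s):
    """Map sufficient statistics s = (n, sum x, sum x^2) to (mean, variance)."""
    n, sx, sxx = s
    mean = sx / n
    var = sxx / n - mean ** 2  # maximum-likelihood (biased) variance estimate
    return mean, var

# Statistics for the data [1, 2, 3, 4]: n = 4, sum = 10, sum of squares = 30.
mean, var = params_from_stats((4, 10.0, 30.0))
print(mean, var)  # 2.5 1.25
```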
Even when a model does not possess this property, as long as it is differentiable one can do a local, approximate update using the score, the gradient of the log-likelihood with respect to the parameters, evaluated at the new data:
θ_new = θ_old + η ∇_θ log p(d; θ_old)
(η being a small step size.)
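The approximate gradient update above can be sketched for a Gaussian with unknown mean and known unit variance: the score with respect to the mean is ∇_μ log p(d; μ) = d − μ, so each new observation nudges the estimate toward itself. The step size eta here is an assumed hyperparameter, not something prescribed by the model.

```python
def score_update(mu_old, d, eta=0.1):
    """One approximate online update: theta_new = theta_old + eta * score."""
    score = d - mu_old            # gradient of log-likelihood w.r.t. the mean
    return mu_old + eta * score

mu = 0.0
for d in [1.0, 1.0, 1.0]:         # three identical observations at 1.0
    mu = score_update(mu, d)
print(mu)                          # the estimate drifts from 0.0 toward 1.0
```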
With exponential family models the updates can be exact rather than approximate.
This machinery applies to Bayesian as well as to more classical statistical models.
There are also nuances: you can discount some of the effects of old data under the presumption that they no longer represent the changed model.
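One common way to discount old data, sketched here under the same assumed Gaussian statistics s = (n, Σx, Σx²), is to scale the old statistics by a forgetting factor gamma before folding in each new point, so stale observations fade out of the running estimate. Both the factor and the helper name are illustrative assumptions.

```python
def discounted_update(s_prime, d, gamma=0.99):
    """Fold in observation d after decaying old statistics by gamma."""
    n, sx, sxx = s_prime
    return (gamma * n + 1, gamma * sx + d, gamma * sxx + d * d)

s = (0.0, 0.0, 0.0)
for d in [5.0, 5.0, 100.0]:   # the model shifts at the last observation
    s = discounted_update(s, d)
n, sx, _ = s
print(sx / n)  # effective mean, weighted toward the most recent data
```

With gamma < 1 the effective mean sits above the plain sample mean of the three points, because the late observation at 100 carries the most weight.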