Gaussian Processes from Scratch (2019) (peterroelants.github.io)
171 points by softwaredoug on July 11, 2021 | 15 comments



A very intuitive entry into Gaussian processes comes from Chapter 12 of Statistical Rethinking by Richard McElreath.

He comes at it from the regression side and explains that GPs basically arise when you have continuous variables in your regression problem, like ages or income, instead of discrete units like countries or chimpanzee subjects. Here is a paragraph that sort of explains it:

> But what about continuous dimensions of variation like age or income or stature? Individuals of the same age share some of the same exposures. They listened to some of the same music, heard about the same politicians, and experienced the same weather events. And individuals of similar ages also experienced some of these same exposures, but to a lesser extent than individuals of the same age. The covariation falls off as any two individuals become increasingly dissimilar in age or income or stature or any other dimension that indexes background similarity. It doesn’t make sense to estimate a unique varying intercept for all individuals of the same age, ignoring the fact that individuals of similar ages should have more similar intercepts.

The beauty of the author's explanation is that mixed slope and intercept models are very intuitive, and so are GPs, which are just their extension to continuous variables, modeling their covariances.

(BTW, the author explains "regression" of the kind used in controlled experiments, as in the social sciences or botany, and not as an optimization problem in ML to reduce error; the coefficients are interpreted as effect sizes.)


I'm not statistically trained but am trying to study by myself. The idea that GPs are an extension of mixed linear models is very interesting. However, I still struggle to understand the relationship here, i.e. in mixed models the different slopes reflect the different factors (categories, groups, etc.), but how do they extend to the GP space?

Also, it's easier to imagine the effects (or surrogates) of variables such as age, but a lot of the time in molecular biology there are independent variables that are not explainable (e.g. gene expression level). Can they become intuitive with GPs?


Here is an example: say you are modeling cancer rates at the county level to discover anomalies, and prima facie you don't have any hypothesis that counties interact in a special way for this kind of cancer (there may be inherent unknown interactions). All you have are data on counties (discrete). You can model the cancer rate T_i using a Poisson distribution and a log link function:

  `T_i ~ Poisson(y_i)`
and the link function:

   `log(y_i) = a_{unit[i]} + b_{unit[i]} * x_i`
In a mixed model without GPs you would assume a (simple) normal prior on each of the slopes and intercepts. Then you would say something like `a_{unit[i]} ~ N(c, d)`, and similarly for b. You could also put a multivariate normal prior on all the a's and b's, which can help you pool across groups that are "similar", but you still don't know what those interactions are.

But say we change the hypothesis: suppose there is reason to believe that certain counties interact strongly for cancer rates through a continuous variable like the distance between them. Then you could model the covariance of the prior as a function of the distance matrix between the counties:

`b ~ MVN(c, K)` where `K_{i,j} = gamma * exp(-D_{i,j}^2)`

where D is the distance matrix between the counties. That is what a GP helps with: more model structure. You essentially have a "distance" measure of some sort in this case. And this can be distance in any space: for modeling housing prices, `b` could be a "locality" coefficient modeled on how similar a house is to houses in a 10-block radius or something.
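
Here's a minimal NumPy sketch of that prior, assuming made-up county coordinates and illustrative (not fitted) kernel hyperparameters:

  import numpy as np

  rng = np.random.default_rng(0)

  # Hypothetical county centroids (10 counties, 2D coordinates)
  coords = rng.uniform(0, 100, size=(10, 2))
  # Pairwise distance matrix D between counties
  D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

  gamma, length = 1.0, 30.0                # assumed kernel hyperparameters
  K = gamma * np.exp(-(D / length) ** 2)   # covariance falls off with distance
  K += 1e-6 * np.eye(len(K))               # jitter for numerical stability

  c = np.zeros(len(K))                     # prior mean for the county slopes
  b = rng.multivariate_normal(c, K)        # one draw of correlated slopes b

Nearby counties get highly correlated slopes; distant ones are nearly independent, which is exactly the pooling structure the prior is meant to encode.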

~~~~~~

This is the technical part. The intuitive part is essentially about seeing whether your problem has information that provides more structure. Problems benefiting from GPs are ones where you have lots of auxiliary data your model can gain from, like demographics or census data, which is general and not collected as part of the problem you are solving, but you have a hypothesis upfront that needs to incorporate that data and structure.

The other piece of intuition is that similar populations behave more similarly than those that are very distinct. Ages 15-25 will be more similar amongst themselves, as will the group 40-50, but not when you compare across groups. There is an exponential falloff of some similarity measure between these groups, but it's continuous, not discrete.
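
As a toy sketch of that falloff (the 5-year length scale here is just an assumption):

  import numpy as np

  def similarity(age_a, age_b, length=5.0):
      # Squared-exponential falloff: near 1 for similar ages, ~0 far apart
      return np.exp(-((age_a - age_b) ** 2) / (2 * length ** 2))

  print(similarity(18, 22))  # within the 15-25 group: high covariance
  print(similarity(20, 45))  # across groups: effectively zero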


Thanks very much for the detailed example. This particular description is very helpful:

> There is an exponential falloff of some similarity measure between these groups, but it's continuous, not discrete.

That's a lot of food for thought for me to digest; however, I still struggle to imagine the more refined structure that GPs can capture via the form of the covariance function. More reading I guess. Thanks again :-D


Just want to say, the website itself is totally gorgeous. Love how it looks on mobile, love the math rendering, looks awesome. (And thanks for making the code available too.)

EDIT: Turns out, there’s more info here about how to set up a site like this: https://peterroelants.github.io/posts/about-this-blog/


This is great; using Brownian motion to introduce stochastic processes is a neat way to motivate the idea of priors over a function space.

My small contribution to this area is an interactive notebook where you can add data points and see how different GP kernels behave: https://observablehq.com/@herbps10/gaussian-processes


I'm very familiar with regression and really enjoy descriptions that include code. For my personal learning needs, this is probably the best demonstration of GPs that I have seen.


I briefly studied stochastic processes as part of querying theory once. What other applications are these concepts used for?


Kriging, aka Gaussian process regression, is used extensively in modeling to generate surrogate models.

https://en.wikipedia.org/wiki/Kriging
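
For a feel of the mechanics, here is a minimal GP-regression/kriging sketch in plain NumPy (toy data; the RBF length scale is an assumption, and real kriging would also fit the hyperparameters and model noise):

  import numpy as np

  def rbf(A, B, length=1.0):
      # RBF kernel: covariance decays with squared distance between inputs
      return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * length ** 2))

  X = np.array([0.0, 1.0, 3.0, 4.0])         # observed inputs
  y = np.sin(X)                              # observed outputs (toy function)
  Xs = np.linspace(0.0, 4.0, 9)              # points to predict at

  K = rbf(X, X) + 1e-8 * np.eye(len(X))      # training covariance (+ jitter)
  mean = rbf(Xs, X) @ np.linalg.solve(K, y)  # posterior mean = the surrogate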


Gaussian Processes in particular can be very useful in regression problems when you don't want to make strong assumptions about the functional form of the relationship between variables. (You can still introduce more general assumptions, like that the relationship is "smooth" to some degree or is periodic, by your choice of GP kernel function.)
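
A toy sketch of two such kernel choices and the assumptions they encode (hyperparameters here are arbitrary):

  import numpy as np

  def rbf(x1, x2, length=1.0):
      # "Smooth" assumption: nearby inputs have similar outputs
      return np.exp(-((x1 - x2) ** 2) / (2 * length ** 2))

  def periodic(x1, x2, period=1.0, length=1.0):
      # "Periodic" assumption: inputs a whole period apart correlate strongly
      s = np.sin(np.pi * np.abs(x1 - x2) / period)
      return np.exp(-2 * s ** 2 / length ** 2)

  print(rbf(0.0, 3.0))       # far apart: ~0 correlation under the smooth kernel
  print(periodic(0.0, 3.0))  # three periods apart: correlation back near 1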


In academia, people study stochastic versions of PDEs in order to try to answer regularity and existence questions. Think, for example, of the famous Millennium Prize problem on Navier-Stokes. Sometimes the stochastic viewpoint can even give more results about the non-stochastic setting.


Unless you assume that everything is constant, stochastic processes are everywhere. They happen to be used a lot in finance and physics, since real-world phenomena can be intuitively modeled as random processes, in contrast with deterministic models. If you're interested, try googling "stochastic processes in [insert field]".


We use GPs for our own Bayesian optimization of search engine parameters for relevance. I suspect they could also be a backbone for active learning, to learn about the blind spots in search or recommendation click data. (Basically also Bayesian optimization, but with a stronger explore bias.)
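
Not our actual setup, but a hedged sketch of the general pattern with scikit-learn (the metric function and parameter range are stand-ins):

  import numpy as np
  from scipy.stats import norm
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import Matern

  def metric(x):
      # Stand-in for an expensive relevance evaluation (e.g. an offline metric)
      return -(x - 0.6) ** 2 + 0.05 * np.random.randn()

  X = np.array([[0.1], [0.9]])                  # two initial parameter settings
  y = np.array([metric(v[0]) for v in X])
  grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

  for _ in range(10):
      gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
      gp.fit(X, y)                              # GP surrogate of the metric
      mu, sigma = gp.predict(grid, return_std=True)
      # Expected improvement over the best observation so far
      z = (mu - y.max()) / np.maximum(sigma, 1e-9)
      ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)
      X = np.vstack([X, grid[np.argmax(ei)]])   # evaluate the most promising point
      y = np.append(y, metric(X[-1, 0]))

  print("best parameter so far:", X[np.argmax(y), 0])

Upweighting the sigma term in the acquisition function is one way to get the stronger explore bias mentioned above.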


Did you mean queueing theory?


Do people use these for stock markets?



