An Introduction to Probabilistic Programming (arxiv.org)
269 points by lainon on Oct 1, 2018 | 14 comments



I was introduced to probabilistic programming by Frank Wood and Brooks Paige at DARPA’s PPAML program, and they are excellent teachers. Highly recommend this resource! Anglican is a great language embedded in Clojure, so this is particularly good for those on the JVM.


How does this compare to probmods[1] and webppl[2]?

[1] https://probmods.org/

[2] http://dippl.org/



I always feel like I'm missing something deeper when I hear or read about "probabilistic programming" - am I wrong in thinking that it's a well-thought-out way of expressing Bayesian models through the language of OOP? I always feel like the marketing side of this is hinting at something greater, but I haven't seen any deeper connections beyond a good design choice.


Having a way to express Bayesian models is a requirement for probabilistic programming, but that’s not the “point”.

Representing a Bayesian model in some language is not useful by itself. Being able to do efficient inference without requiring the programmer to implement an inference algorithm for each model they want to specify is the point of probabilistic programming.


I see that; so I should think of the phrase "probabilistic programming" as more analogous to "nonlinear programming" than to "functional programming".


Or maybe more analogous to “differentiable programming” or “logic programming”: a style of programming that supports a broader range of operations than “running.”

Also, it’s not quite that you are using a DSL to encode Bayesian models traditionally represented in other ways; you are using a full-featured programming language to express a set of models larger than the set that is easy to express using existing formalisms. (That said, there are also some models that are easier to express using those other formalisms; you can think of them as being DSLs in a Turing-complete probabilistic programming language, which make it easier to express certain limited classes of programs.)


Can someone give a quick tl;dr of probabilistic programming? Is it a convenient way to use Bayes (update priors), or is there more to it?


"Updating priors" is quite complex, and arguably describes every interesting task you'd want to do as a learner, reasoner, analyst, etc. One view of probabilistic programming languages is that they allow you to express your priors—i.e., uncertain beliefs about the world—as programs, and help you to answer all sorts of interesting questions in the contexts of those programs.

As a concrete example, you might write a program expressing your beliefs about how the scenes you see come to be: some random objects are placed at some random positions, lighting sources are placed, and then a rendering algorithm converts that scene description into the light intensities perceived by your eye. If you write this up in a probabilistic programming language, you can then ask questions like: supposing I see this image, what are some likely scenes (positions and shapes of objects) that might have produced it? How does that distribution over scenes change if I know that the light source was here, or that there was a small chair there, or that my vision isn't that great and I should trust the image's pixels a little less? Section 1.3 of the introduction linked above lists many more applications for which probabilistic programming turns out to be useful: breaking CAPTCHAs, constrained procedural graphics, automatic program induction/synthesis, recursive multi-agent reasoning... and, as the authors write at the end of this section, "the list goes on."
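To make the scene example concrete, here is a minimal sketch of what such a generative program could look like in plain Python. Everything in it (the scene representation, the render argument) is a hypothetical stand-in for illustration, not any particular library's API:

    import random

    def scene_prior():
        # Hypothetical prior over scenes: a few random objects plus a light.
        n_objects = random.randint(1, 5)
        objects = [{"shape": random.choice(["cube", "sphere", "chair"]),
                    "x": random.uniform(-1, 1),
                    "y": random.uniform(-1, 1)}
                   for _ in range(n_objects)]
        light = {"x": random.uniform(-1, 1), "y": random.uniform(1, 2)}
        return objects, light

    def generate_image(render):
        # render is a hypothetical deterministic renderer: scene -> pixels.
        objects, light = scene_prior()
        return render(objects, light)

Conditioning this program on an observed image then amounts to asking: which draws from scene_prior() would make render()'s output match the pixels I saw?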

Of course, solving the inference task in full generality, for any program and query that the user might write, is very difficult; you can design pathological cases where it boils down to solving the halting problem, and is therefore uncomputable. Different languages take different approaches to addressing this difficulty. They might limit you to writing certain kinds of programs for which a limited class of inference algorithms can always be made efficient; they might try to plan a fancy inference algorithm based on the specifics of your model; they might just expect you to learn which programs will be tractable and which won't; or they might provide abstractions that make it possible for you to safely experiment with writing new inference algorithms (or combinations of existing ones) specialized for your use case. Probabilistic programming is still a young field and this is an area of active research.


> supposing I see this image, what are some likely scenes (positions and shapes of objects) that might have produced it?

Sounds like there exists a device where I can write some program, run it over some inputs, pass the program and the outputs to the device, and have it magically infer the possible inputs. This is basically logic programming. Then we add a probabilistic layer on top of that.

Maybe the probabilistic part hides some significant simplifications. Even relatively simple programs are nontrivial to express logically. See, for example, the pure treatment of arithmetic in http://okmij.org/ftp/Prolog/Arithm/arithm.pdf.

What kinds of programs are practically feasible with probabilistic programming?


Great connection. Here's one way of looking at it: when you add stochasticity to your program, you make it easier to search the space of possible solutions (and you also relax the criteria for what it means to be "a solution").

In (deterministic) logic programming, the default search strategy is backtracking. We search the space of "inputs that could have produced this output" exhaustively. Each set of inputs we try either works or doesn't, and doesn't tell us anything about whether similar inputs will behave similarly.

In probabilistic programming, we can use the information we have about probability densities to tell whether we're on the right track; as we search, we can tell if we're getting "warmer" or "colder." Suppose I have the simple probabilistic program:

    import random

    def f():
        a = random.gauss(0, 1)        # mean 0, stddev 1
        b = random.gauss(0, 1)
        c = random.gauss(a + b, 0.1)  # c lands very close to a + b
        return c

In this generative model, I choose two random numbers, add them together, and then pick a third number very close to their sum. This could be thought of as a probabilistic version of the deterministic program f(a, b) = a+b. If I want to "invert" it, and condition on c=2, say, I can walk around the space of a priori likely (a, b) values, "scoring" them based on how likely "c=2" would have been given the particular (a, b) under consideration. The tight bell curve around the value c=2 can be seen as an implicit "loss function," telling us how good or bad the current solution is. This is very different from a logic programming approach, which would be able to exactly invert the + operation; but as you pointed out, it's nontrivial to formulate `+` correctly as a logic program, whereas here, we are free to use an imperative implementation of `+` (or much more complex deterministic operations, like a 3D renderer).
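To illustrate that "warmer/colder" search on the program above, here is one of the simplest possible inference strategies, self-normalized importance sampling (a sketch, not necessarily what a production PPL would do): draw (a, b) from the prior and weight each draw by how probable the observation c = 2 would have been.

    import math, random

    def normal_logpdf(x, mu, sigma):
        return (-0.5 * ((x - mu) / sigma) ** 2
                - math.log(sigma * math.sqrt(2 * math.pi)))

    def infer_sum(c_obs=2.0, n=100_000):
        # Draw (a, b) from the prior; weight by the likelihood of c_obs.
        samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
        weights = [math.exp(normal_logpdf(c_obs, a + b, 0.1)) for a, b in samples]
        total = sum(weights)
        # Weighted posterior mean of a + b; should come out very close to c_obs.
        return sum(w * (a + b) for (a, b), w in zip(samples, weights)) / total

The tight bell curve around c = 2 acts as the implicit loss function: prior draws whose sum is near 2 receive almost all of the weight.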

There is research in probabilistic logic programming languages, which can be seen as combining some of the strengths of both paradigms, though that's not covered by the introduction linked in the OP.


Computational models are ubiquitous, and using them mainly involves two things: fitting/inferring parameters, and making predictions using the model plus parameters. Probabilistic programming languages (PPLs) aim to provide a rich set of tools that make both of these easy, given a model. To exploit this to the fullest, the framework also needs to handle uncertain parameters (rather than just "best fit" values), which goes over into the regime of "hierarchical" models, i.e., probabilistic models for the parameters of a probabilistic model.

If we were to delve into details a little further, being able to make predictions from a generative stochastic model would call for efficient samplers, and being able to infer parameters might call for good variational optimization schemes (and various other methods on the landscape of Bayesian inference).
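As a toy illustration of that hierarchy (all distributions and constants here are arbitrary choices for the sketch), the generative side might look like:

    import random

    def hierarchical_model(n_groups=3, n_obs=5):
        mu = random.gauss(0, 10)           # hyperprior over the group-level mean
        groups = []
        for _ in range(n_groups):
            theta = random.gauss(mu, 1)    # per-group parameter, itself random
            groups.append([random.gauss(theta, 0.5) for _ in range(n_obs)])
        return mu, groups

A PPL's job is then to run this story "backwards": given the observations, infer plausible values of mu and each theta, with uncertainty attached.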


The tl;dr is that it sort of lets you write a program that describes sampling from your data-generating model, perhaps setting priors over certain unknown parameters.

By allowing full programming-language constructs, you can define very rich and interesting data-generating distributions.

One can then execute the program to generate samples. More importantly, given observed data, one can efficiently "use Bayes" to update the priors.

This is usually computationally nontrivial due to the denominator in Bayes' rule, a.k.a. the "partition function", but there are various good algorithms to do it approximately. One of the most popular (used in Stan) is known as Hamiltonian Monte Carlo.
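HMC itself needs gradients of the log density and is beyond a quick sketch, but its simpler ancestor, random-walk Metropolis, already shows why the intractable denominator can be sidestepped: the acceptance ratio compares two unnormalized posterior densities, so the partition function cancels. A toy sketch for a normal prior and normal likelihood (constants are illustrative):

    import math, random

    def log_unnorm_post(theta, data):
        # log prior + log likelihood; no normalizing constant needed
        lp = -0.5 * theta ** 2                            # N(0, 1) prior
        lp += sum(-0.5 * (x - theta) ** 2 for x in data)  # N(theta, 1) likelihood
        return lp

    def metropolis(data, steps=10_000, step=0.5):
        theta, chain = 0.0, []
        for _ in range(steps):
            proposal = theta + random.gauss(0, step)
            # The partition function cancels in this difference of logs.
            if math.log(random.random()) < (log_unnorm_post(proposal, data)
                                            - log_unnorm_post(theta, data)):
                theta = proposal
            chain.append(theta)
        return chain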


I'd add that sampling is still very useful in contexts where the partition function is known or efficient to calculate.

What's often interesting is the overall character of the posterior -- its modes, the shape of the variance around those modes, etc. High-quality sampling helps with this, whereas being able to calculate the PDF exactly at particular points sometimes does not.



