
"Measure theory a la Bourbaki" is probably not what you meant if you did probability theory and statistics. Bourbaki famously sidestepped classical measure theory with sigma algebras by constructing Radon measures as functionals via functional analysis. This is sufficient for many purposes but not for probability theory. First and foremost, sigma algebras in probability theory are not just a technical device to avoid paradoxes with unmeasurable sets. Sigma algebras play a starring role in conditional expectation, stochastic processes and martingale theory.

The best way to appreciate the information-theoretic role of sigma algebras is to look at them in the simplest case, where you have a discrete-time, finite-valued process. Then a sigma algebra is equivalent to a partition of the state space and it represents the information that can be gained from an observation; it's like a random variable without specific values, just the discriminating information from different outcomes. To say that a random variable is measurable with respect to the sigma algebra is to say that its value may only depend on information that can be gained from an observation. A filtration of sigma algebras corresponds to a causal series of observations where the observer learns more information over time.
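
To make that concrete, here's a tiny Python sketch (just my own toy example, nothing standard): the state space is two coin flips, and the partition is what you learn by observing only the first flip. A random variable is measurable with respect to that partition exactly when it is constant on each cell.

    # Toy illustration (my own sketch, not library code): two coin flips,
    # observed only through the first flip.
    outcomes = ["HH", "HT", "TH", "TT"]            # the state space
    first_flip = [{"HH", "HT"}, {"TH", "TT"}]      # partition = information from flip 1

    def is_measurable(rv, partition):
        # measurable w.r.t. a partition <=> constant on every cell of the partition
        return all(len({rv[w] for w in cell}) == 1 for cell in partition)

    num_heads = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}       # needs both flips
    first_is_heads = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}  # needs only flip 1

    print(is_measurable(num_heads, first_flip))       # False
    print(is_measurable(first_is_heads, first_flip))  # True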

The conditional expectation of a random variable with respect to a sigma algebra (or partition or other random variable) is another random variable that tells you the expectation over the states consistent with a given observation; this new random variable is measurable with respect to the sigma algebra you conditioned on, which as mentioned earlier means it only depends on the information gained from an observation. The conditional expectation is the best least-squares estimator given the information from an observation in the same way that the usual expectation is the best least-squares estimator given no information.
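
Continuing that toy sketch (again, my own illustration): conditioning on the first flip just means averaging over each partition cell, and the result is a new random variable that is constant on each cell, i.e. measurable with respect to the observation.

    # Conditional expectation over a finite partition (same toy setup as above).
    outcomes = ["HH", "HT", "TH", "TT"]
    first_flip = [{"HH", "HT"}, {"TH", "TT"}]
    num_heads = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}
    prob = {w: 0.25 for w in outcomes}             # fair, independent flips

    def conditional_expectation(rv, partition, prob):
        # E[rv | partition]: replace rv on each cell by its probability-weighted average
        cond = {}
        for cell in partition:
            mass = sum(prob[w] for w in cell)
            avg = sum(prob[w] * rv[w] for w in cell) / mass
            for w in cell:
                cond[w] = avg
        return cond

    ce = conditional_expectation(num_heads, first_flip, prob)
    print([ce[w] for w in outcomes])   # [1.5, 1.5, 0.5, 0.5] -- constant on each cell

And among all random variables that are measurable with respect to first_flip, this one minimizes the mean squared error against num_heads, which is the least-squares point above.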



As someone who moved from pure mathematics -> applied mathematics -> machine learning, perhaps I can offer a perspective similar to GP's that explains why the measure theory stuff might not seem so useful to some people. Basically, in applications you never need to worry about anything but the simplest case - that of a discrete-time, finite-valued process. All of the subtleties of measure theory, which occupy most of what mathematical measure theorists/analysts work on, are introduced when you assume you've got an infinity or a continuum somewhere in a model, and that assumption is never borne out in practice. Unless you really want to do theoretical math, you're better off learning only the aspects that relate to finite functions, i.e. almost none of what a real mathematician would consider to be "math". And that's not just out of laziness or disinterest - once you've restricted yourself to finite functions for empirical applications, you encounter a whole new realm of engineering difficulties that theoretical mathematics doesn't touch. You've got to learn how to solve a different class of problems that aren't as purely logical, but are every bit as challenging.

To put it another way - think about all the pathologies you may encounter working with even a "nice" function space like L^2(R). You'll never deal with those pathologies in reality, because every empirical function is much better behaved - finite domain, finite range, and even if you assume a continuous domain, you can choose a model that only has a finite number of discontinuities, is Lipschitz continuous in between them, has finite total variation, etc. And that's why the hand-wavey, "intuitive" approach works so well if you're not a theorist, at least in my opinion.


You're right. Classical measure theory lives in a model divorced from physical reality. You can show that all of the fancy counterexamples which necessitate the complicated constructions of measure theory are artificial (e.g., the characteristic function of a non-measurable set is uncomputable).

There are better approaches to measure theory which live in different "foundations". For example, you can build measure and probability theory based on the locale of valuations on a locale instead of a sigma-algebra on a topological space. You can do even better by starting in a constructive metatheory and adding some anti-classical assumptions which are modeled by all computable functions.

The reason we are teaching classical measure theory as the foundation of probability theory is historical and because there are no good expositions available for most alternative approaches. It is really not the most straightforward approach.

---

Before you accuse me of being overly negative: classical measure theory offers a consistent approach to probability theory which is well understood and for which carefully written textbooks are available. If you really need to go back to the definitions to derive something then you need to know at least one consistent set of definitions. So it is useful to teach measure theory, even if it is more complicated than it has to be...


>Classical measure theory lives in a model divorced from physical reality.

Everything in mathematics is divorced from reality. Unbounded integers are divorced from reality. (Once you move beyond naive realism, bounded integers are divorced from reality, but that's a deeper philosophical debate.) The only question is whether these models are more or less effective for their various theoretical and applied purposes.

> There are better approaches to measure theory which live in different "foundations". For example, you can build measure and probability theory based on the locale of valuations on a locale instead of a sigma-algebra on a topological space.

Better by what definition? According to the practical needs of students, pure and applied mathematicians, etc? I've studied some topos theory and know a little bit about locales from the Topology Via Logic book, but it's hard for me to see that as anything more than a fun curiosity when considering the practical needs of mathematics as a whole. In my mind that kind of thing is much closer to navel-gazing than something like measure theory.

> It is really not the most straightforward approach.

The onus is on critics to do better. Dieudonne/Bourbaki made a valiant and elegant attempt even if they intentionally snubbed the needs of probability theory. And "better" will obviously be judged by the broader community.


> Everything in mathematics is divorced from reality.

Mathematics is an abstraction, but it is still useful for talking about concrete problems. Your mathematical assumptions can be either close or far away from your problem domain. Sometimes we introduce idealized objects, such as unbounded integers, in order to abstract further and simplify our reasoning.

These ideal objects can then either be "compiled away" in specific instances, or they can genuinely ignore corner cases which might invalidate your results.

For an example of the former, you can assume that there is an algebraically closed field containing a specific field, give an argument in terms of this closure and then translate this argument to one which does not construct the closure explicitly. The translation is mechanical and does not represent additional assumptions you made.

The second kind of ideal object is something like the real numbers applied to physics. We can think of a real number as an arbitrarily good approximate result. In practice we can only ever work with finite approximations. At the scales we are operating on the difference is usually not relevant, but there might, for example, be unstable equilibria in your solutions which are not physically realizable.
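
A tiny sketch of that last point (my own made-up example, nothing deep): the ODE dx/dt = x - x^3 has an unstable equilibrium at x = 0 over the reals, but in any finite-precision simulation the smallest perturbation pushes you toward the stable points at +/-1, so the equilibrium is not something you'd ever realize physically.

    # Hypothetical illustration: forward Euler on dx/dt = x - x**3.
    # x = 0 is an exact but unstable equilibrium; +/-1 are the stable ones.
    def simulate(x0, dt=0.01, steps=5000):
        x = x0
        for _ in range(steps):
            x += dt * (x - x**3)
        return x

    print(simulate(0.0))    # 0.0  -- the idealized equilibrium persists only if it's exact
    print(simulate(1e-12))  # ~1.0 -- any tiny perturbation ends up at a stable point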

> Better by what definition?

Informally, better because it is "simpler". There are fewer corner cases to consider, theorems are more inclusive, constructions are more direct.

Formally, the theory has more models and is therefore more widely applicable. Theorems have fewer assumptions (but talk about a different and incompatible type of object).

> The onus is on critics to do better. Dieudonne/Bourbaki made a valiant and elegant attempt even if they intentionally snubbed the needs of probability theory. And "better" will obviously be judged by the broader community.

Oh, sure, but that's not what I want to argue about.

I can tell you with certainty that classical measure theory is complicated by the interplay of excluded middle and the axiom of choice. This is a technical result. You can see this yourself in textbooks every time the author presents an intuitive "proof idea" which then has to be refined because of problems with the definitions. In alternative models, or in a metatheory with alternative assumptions, the simple proof idea usually works out fine.


> Everything in mathematics is divorced from reality. Unbounded integers are divorced from reality. (Once you move beyond naive realism, bounded integers are divorced from reality, but that's a deeper philosophical debate.) The only question is whether these models are more or less effective for their various theoretical and applied purposes.

It's a matter of degree. You can't point to a particular large integer as being "too large" to matter in real life, but there are plenty of objects in measure theory (like unmeasurable sets) that blatantly violate physical intuitions and don't seem to exist in any sense in real life.


> All of the subtleties of measure theory, which occupy most of what mathematical measure theorists/analysts work on, are introduced when you assume you've got an infinity or a continuum somewhere in a model, and that assumption is never borne out in practice.

That's not entirely fair. People doing controls work with continuous-time models often enough, so there are some practical benefits.

I agree with the rest of your comment though.


Oh yeah, agreed, and in other applications as well; I tried to speak to that at the end of my comment. Even with continuous time/space models, you're never going to encounter the truly pathological functions that require, e.g., the careful definitions that go into defining L^p spaces. Everything physical is much better behaved.


For what it's worth, my professional background is entirely in game development and systems programming. I'm not a professional academic. All the professional work I've done with mathematics is applied.

I think you're overstating some things, but I mostly agree. My main disagreement is with your implicit premise that the best practical theory should exist at the same level of abstraction as practical applications. The real numbers have been a really successful practical theory. Physicists and applied mathematicians know they don't "really" exist, but more "realistic" alternatives are awkward and messy. The same applies to more extravagant theoretical constructions like Hilbert spaces. They're an extremely nice mathematical setting for applications (e.g. optimal control, approximation theory, finite element methods, quantum mechanics). No-one should be losing much sleep over their ubiquity in applications. If your point is that we shouldn't belabor some of their technical details when teaching them to practitioners, sure, but that's already the case.


Yeah, my point is exactly what you say at the end - for practical applications, those theoretical details (which are enormously challenging) generally aren't necessary, and the problems they address aren't the ones application practitioners will encounter. I definitely understand the motivation for theoretical structures within math itself, and I have no issue with, e.g., the axiom of choice for people doing pure mathematics.


For folks developing or analyzing new applied math techniques (for solving differential equations or function approximation or whatever), it is helpful to make formal proofs about their behavior and bounding their error etc., and from the papers I have looked at those are often (usually?) done on top of measure-theoretic models.

It might be possible to develop alternative proofs on purely finite/approximate mathematics, but for a working applied mathematician who already went through standard math grad school curriculum that is probably more trouble than it’s worth.

The users of those mathematical tools (whether software implementors or people just calling some software library) usually don’t need to care about the details of the proofs.

This is similar for other kinds of science/engineering.


> The users of those mathematical tools (whether software implementors or people just calling some software library) usually don’t need to care about the details of the proofs

Oh, for sure, and perhaps this is just a confusion of terms, but I think that's what the thread parent meant by "applied statistics". In academia, "applied math/statistics" can mean "I'm doing theoretical math with an eye towards applications but it still requires heavy mathematical machinery", but it can also mean "I'm using mathematical tools to solve empirical problems, and I'm never going to need to worry about Lebesgue measures".


Correct on your first point. I do not mean that I learned the specific version published by Bourbaki, but rather that I learned it in the modern mathematical style (hence the "a la", as opposed to the 19th-century hand-waving style that nearly all undergraduate first courses use).

And by the modern style, I meant starting out with analysis, defining the axioms of what measures are, demonstrating the existence of nonmeasurable sets with the axiom of choice, etc. IIRC, the course did rely on Borel algebras for its buildup, but did not openly build up from sigma algebra machinery.


I've only studied measure theory in a probability context. Can you talk a little bit about why the Radon measure isn't sufficient there, and what it is useful for outside of probability?


It's been forever since I looked at this stuff, sorry. But I think Bourbaki's shortcut to integration theory via continuous linear functionals only works in locally compact spaces. It doesn't let you construct the Wiener measure on a path space corresponding to Brownian motion or other continuous-time stochastic processes. And the example I gave of the conceptual, information-theoretic role played by sigma algebras in stochastic processes shows that sidestepping them is the wrong move for probability theory, even if they weren't used for some of the advanced technical constructions like Wiener measure.
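
If it helps to see what that looks like in practice (a toy sketch of my own, not Bourbaki's or anyone else's construction): you only ever simulate Brownian motion on a finite time grid, as a Gaussian random walk; the Wiener measure is the object that makes the continuum limit of such paths precise, and that path space is where the locally compact shortcut gives out.

    # Toy sketch: a finite-dimensional stand-in for Brownian motion on [0, 1].
    # Practice only ever touches finitely many time points like these; Wiener
    # measure is what makes the continuum limit of such paths precise.
    import random

    def brownian_path(n_steps=1000, seed=0):
        rng = random.Random(seed)
        dt = 1.0 / n_steps
        path = [0.0]
        for _ in range(n_steps):
            path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))  # N(0, dt) increments
        return path

    path = brownian_path()
    print(len(path), round(path[-1], 3))  # 1001 points; endpoint distributed ~ N(0, 1)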

Bourbaki's shortcut to Radon measures is very elegant, but it's noteworthy that, unlike many other Bourbaki innovations, it doesn't seem to have been picked up by other textbook authors. Already at that point there was a mathematical consensus that measure theory was a valuable part of the foundations of modern mathematics and shouldn't be eliminated or minimized.

Outside probability theory, measure theory is primarily used as a foundation for integration ("expectation"). There are also more specialist subjects like geometric measure theory; there's an excellent introductory textbook called Measure Theory and Fine Properties of Functions, and if you look at its table of contents you can get an idea of the breadth of topics.


It's as if I was trying to get from Camden to Islington and was given directions via Edinburgh. Suffice it to say, if I had a vague idea in the first place it's gone now.



