Assuming that π is normal and infinite (the "infinite" is redundant, but included for clarity), then yes, it's true -- it contains every finite string of digits, every textual sequence under a suitable encoding, every secret, every book ever written. Scientific secrets we may never discover. Cures for nonexistent diseases. Everything!
But this is as meaningless as it is true, because the problem is not that those secrets are present, the problem lies in locating them.
I had this conversation years ago on Usenet -- the same question, the same answer. My correspondent seemed to think it was remarkable that all those riches lay within π. He didn't seem to realize that the same could be said of any normal, infinite sequence of digits, of which there may be an infinite number (not proven for any familiar constant, but it's possible that e, √2, indeed the square root of any non-perfect-square integer, all fall into this category).
If the grains of sand on a beach could be decoded as binary bits (or as digits in any base) by some accepted convention, and making the same assumptions about normality and infinity, then a beach also contains every secret, every book that has ever been written or will ever be written. But it's the same problem -- those secrets cannot be located, or distinguished from sequences we might label meaningless simply because they're answers to questions we're not asking.
A quote: "While a general proof can be given that almost all numbers are normal (in the sense that the set of exceptions has Lebesgue measure zero), this proof is not constructive and only very few specific numbers have been shown to be normal. For example, it is widely believed that the numbers √2, π, and e are normal, but a proof remains elusive."
I don't have a strong mathematics background, but would I be right in thinking that the probability of a normal, infinite number containing a given arbitrary string of digits is "almost sure" rather than "sure"? Or can it actually be proven mathematically that it contains such a string?
> ... but would I be right in thinking that the probability of a normal, infinite number containing a given arbitrary string of digits is "almost sure" rather than "sure"?
No. Given the assumptions (a normal number, therefore infinite in extent), the probability that a given finite sequence appears somewhere in it is 1.0.
> Or can it actually be proven mathematically that it contains such a string?
Here's one way to think about it. Say you flip a perfectly fair coin, where heads counts as 1 and tails as 0, and each flip appends a bit to a sequence. Say further that some finite bit sequence is chosen as a target -- for example, the text of "War and Peace" by Leo Tolstoy (a famously big book, 1,225 pages) under some fixed encoding.
We can agree that, as the bit sequence grows in length, the probability of encountering "War and Peace" somewhere within it strictly increases with the number of bits in the sequence.
Now the clincher -- what is true for a finite sequence (that the probability of encountering a given subsequence increases but never reaches certainty) is not true for an infinite sequence. For an infinite sequence, the appearance of a given subsequence becomes certain, i.e., probability 1.0. The only requirement is that the subsequence be finite.
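For the curious, here's a minimal simulation sketch in Python (the short target pattern is a hypothetical stand-in for "War and Peace", since simulating an encoded book is impractical) showing the hit probability climbing toward 1.0 as the sequence grows:

    import random

    def contains_target(n, target, rng):
        # draw n random bits and test whether the target pattern appears
        bits = format(rng.getrandbits(n), '0{}b'.format(n))
        return target in bits

    rng = random.Random(0)
    target = '1101001'   # hypothetical stand-in for the encoded book
    trials = 1000
    for n in (100, 1000, 10000, 100000):
        hits = sum(contains_target(n, target, rng) for _ in range(trials))
        print(n, hits / trials)   # estimate rises toward 1.0 as n grows

For a 7-bit target, the probability at length n is roughly 1 - (1 - 2^-7)^(n-6), which is indistinguishable from 1.0 well before n = 100,000. For a target the size of an actual book the same limit holds; it just takes astronomically longer to approach.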
Thanks for the explanation. While I have no doubt you're right (because I am completely outside my field of expertise), I have difficulty intuitively accepting the answer. By way of comparison, the Wikipedia entry on the infinite monkey theorem [1] refers to the probability of producing the Shakespeare corpus as "almost sure", a situation that seems broadly analogous to the one you describe.
I suspect the difference is that the monkeys are required to produce an output which is merely random rather than evenly distributed.
> While I have no doubt you're right (because I am completely outside my field of expertise), I have difficulty intuitively accepting the answer.
When intuition fails while contemplating something involving infinity, I find it comforting to recall something John von Neumann said: "Young man, in mathematics you don't understand things. You just get used to them."
> I suspect the difference is that the monkeys are required to produce an output which is merely random rather than evenly distributed.
Perhaps, or the fact that probability theorists regularly describe a probability of 1.0 as "almost certain" -- a wording that, to me, contradicts common sense.
Of course it does. It's similarly unreasonable to call a Möbius strip with a disc attached along its boundary circle a "plane" [1], to call the surface of Cartesian (x, y) coordinates a "line" [2], and to call numbers having counterintuitive special features "normal".
Any technical field has to define more concepts than casual vocabulary can distinctly name using plain English meanings, so some words will be jargon.
Don't pick on "almost sure". We also have group, ring, field, ideal, prime, countable, closed, ....
"almost sure", in probability, means an event with probability 1.0. I realize this is somewhat counter-intuitive from an English point of view, but it makes sense from a Math point of view.
A quote: "In probability theory, one says that an event happens almost surely (sometimes abbreviated as a.s.) if it happens with probability one."
This is like saying a dollar is approximately 100 pennies, or every woman is approximately pregnant (or the reverse).
Another quote: "Equivalently, we can say an event E happens almost surely if the probability of E not occurring is zero."
With probability theory being guided by notions like this, no wonder insurance is so expensive. To me, it is to probability theory what post-modernism is to literature -- it reduces everything to a matter of opinion.
In analysis, we have the notion of a limit, the idea that we can say something about a quantity that is only approached, never reached. Example:
lim x -> 0⁺, 1/x = ∞
Technically this says "as x approaches zero from the right, 1/x increases without bound." But I can't tell you how many times I have heard this stated as "when x equals zero, 1/x equals infinity." That's not so -- division by zero is undefined, so the expression can never be described as anything but a limit, an approach.
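A quick numerical sketch (Python) of the distinction:

    # 1/x grows without bound as x approaches 0 from the right,
    # but 1/0 itself is undefined (Python raises ZeroDivisionError).
    for x in (1.0, 0.1, 0.001, 0.00001):
        print(x, 1 / x)
    # print(1 / 0)  # error: the limit describes an approach, not a value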
The reverse logic applies to a probability of 1.0 -- 1.0 is not approximately anything, it's 1.0. It's not "almost one", it's "one". And if you subtract one from it, the result is not almost zero, it's zero.
> but it makes sense from a Math point of view.
Only if probability theory transcends the axioms of logic, and with the assumption that logic is the foundation on which mathematics is built.
You're confused about the meaning of almost surely, and your comparison with dollars and pregnancy is misleading.
The "almost" does not apply to the probability being one, but to the meaning of P(S) = 1 for S a member of your sigma-algebra. So when one says that a real number is almost surely normal, it means P({a ∈ ℝ : a is normal}) = 1, but it does not mean that every real is normal. Indeed, there are infinitely many non-normal numbers (for example, every integer is obviously not normal), even though the probability that a number is not normal is 0.
In the case of finite probability spaces, this usually does not happen, because for two finite sets A and B, #A != #B usually implies P(A) != P(B) [1], which is why your comparison with pennies/pregnancy is misleading.
[1] it is certainly possible to build probability spaces on finite sets where P(A) == P(B) and #A != #B.
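A minimal sketch of such a space (Python, with weights chosen arbitrarily for illustration):

    # A non-uniform probability space on a finite set: two subsets of
    # different sizes can still have equal probability.
    weights = {'a': 0.5, 'b': 0.25, 'c': 0.25}

    def P(subset):
        return sum(weights[x] for x in subset)

    A = {'a'}            # one element
    B = {'b', 'c'}       # two elements
    print(P(A), P(B))    # 0.5 0.5 -- #A != #B, yet P(A) == P(B)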
> You're confused about the meaning of almost surely ...
Yes, but I'm not the only one. I think most people who read the justification for describing p = 1.0 as "almost surely" will have the same reaction, regardless of their training. Not that I'm arguing the terminology must accept responsibility for our confusion.
> your comparison with dollars and pregnancy is misleading.
As to dollars, yes (because it's finite and trivial), but as to the parturient state of a set of women, it depends on the size of the set. For an infinite set of women, describing them all as almost surely pregnant (or not) is a reasonable corollary to describing p = 1.0 as "almost surely" in the general case.
> Indeed, there are infinitely many non normal numbers (for example, every integer is obviously not normal), even though the probability that a number is not normal is 0.
This compares two infinite sets and draws a conclusion based on the outcome (the set of integers is infinite in size, and the set of reals is also infinite in size). I'm not sure this is comparable to making a statement about the mapping of a single well-defined target (for example the appearance of War and Peace) into a single infinite sequence, like the digits of Pi.
Which leads to an obvious question -- does the "almost surely" qualifier apply to all p = 1.0 probabilities, or only a subset? In other words, does the qualifier depend on the construction of the probability, or does it always apply?
pi is a single fixed number, so it doesn't help to talk about probabilities. Either every finite length sequence appears in pi, or at least one doesn't.
The technical term "almost surely" applies to all p=1 probabilities, even those where it's really certain that the event happens. This does not match the common understanding of "almost surely", because we'd not usually make a statement like "the decimal expansion of pi almost surely contains the digit 4", but mathematically it's a correct statement.
It's important to keep in mind that this is technical terminology, so intuition does not always apply. It also means that it does not make sense to talk about women being almost surely pregnant without first defining a probability distribution. If we define each woman to be pregnant with a probability of 1%, independently of the pregnancy status of the other women, and we have an infinite number of women, then almost surely at least one of them is pregnant. That is, the probability that at least one of them is pregnant is 1. But it is not certain that at least one of them is pregnant; it could happen that none of them is.

Similarly, if we choose a number uniformly in the interval [0,1], then the probability of choosing 1/3 is 0, but it is still possible that we chose 1/3. We cannot dismiss that as an impossibility either, otherwise after choosing any number x we could always claim "it's impossible that we chose exactly x!"
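A short sketch (Python) of the pregnancy computation under the stated assumptions (independent pregnancies, p = 0.01 each):

    # P(at least one of n women is pregnant) = 1 - (1 - p)^n,
    # which tends to 1.0 as n grows without bound.
    p = 0.01
    for n in (100, 1000, 10000):
        print(n, 1 - (1 - p)**n)

The limit is exactly 1, yet the all-not-pregnant outcome is never logically excluded -- that is precisely the gap "almost surely" is meant to mark.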
In probability theory, the statement "event S occurs almost surely" means by definition that the probability of S occurring is 1. This is indeed confusing, but it's completely justified, given the constraints.
When non-mathematicians talk about probability, the situation they usually have in mind is that there are n possible events, each equally likely -- for instance, a fair die throw. This leads to assigning each of the events the same probability; in the case of a die it's 1/6. We also want to be able to assign a probability to some subset of events occurring; for instance, the event that the number on the die is even corresponds to the subset {2 was thrown, 4 was thrown, 6 was thrown}. By common sense, this should be 1/2, and generally the probability should be (number of events in the subset / number of all events). This approach is how probability is usually taught in high school.
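In code (Python), the high-school model is just counting:

    # Finite, equally likely outcomes: P(subset) = |subset| / |all outcomes|.
    outcomes = {1, 2, 3, 4, 5, 6}
    even = {2, 4, 6}
    print(len(even) / len(outcomes))   # 0.5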
The problem is that this approach, while useful, is totally inadequate for lots and lots of things that people care about. For instance, say we want to investigate the probability of an earthquake occurring. We want to be able to answer the question "what's the probability that we will not have an earthquake in the next t seconds?" Assuming that earthquakes are totally unpredictable, we should expect that after waiting s quake-free seconds, the answer to this question is still the same; in probability terms, P(the next earthquake will not occur in the next t+s seconds, given that it will not occur in the next s seconds) = P(the next earthquake will not occur in the next t seconds). It's really not obvious how to apply the finite-set-of-equally-likely-events method here, if it's possible at all.
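As an aside (a standard fact, not claimed in the comment above): the exponential distribution satisfies exactly this "memoryless" equation, which is one way such waiting times get modeled. A quick numerical sketch (Python, with the rate chosen arbitrarily):

    import math

    lam, t, s = 0.3, 2.0, 5.0
    surv = lambda x: math.exp(-lam * x)   # P(T > x) for an exponential T
    print(surv(t + s) / surv(s))          # P(T > t+s | T > s)
    print(surv(t))                        # P(T > t) -- the same number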
That's why most mathematicians use a similar but much more general approach. Just like before, we consider a set of possible elementary events that can occur, but this time we don't assume it's finite, and we don't assign each one equal probability. Instead, we assign probabilities to subsets of events, subject to a few simple rules but otherwise in a completely arbitrary way. The rules are: the probability that any event at all occurs is 1, and if A_1, A_2, ... are pairwise disjoint sets of events, we assign to their union the sum of their probabilities.
For instance, to model an archer shooting at a circular target, we let the elementary events be the points on the target (there are infinitely many of them), and we let the probability of a subset of events be its area (assuming the whole target has area one). Now it makes sense to ask, say, what's the probability of the archer landing an arrow within half a radius of the center of the target? (It's 1/4.) But, since every point has area 0, the probability of the archer hitting any particular point is 0, thus the probability of the archer _not_ hitting that point is 1, meaning that the archer will _almost surely_ not hit that point. Almost, because while for every point the probability of the archer hitting it is 0, the archer will hit _some_ point. That's why mathematicians say almost surely, and not surely, when talking about events with probability 1.
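A Monte Carlo sketch (Python) of the archer example, estimating the within-half-a-radius probability:

    import random

    rng = random.Random(42)
    trials, hits = 100_000, 0
    for _ in range(trials):
        # rejection-sample a uniform point on the unit disc
        while True:
            x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
            if x*x + y*y <= 1:
                break
        if x*x + y*y <= 0.25:   # within half the radius of the center
            hits += 1
    print(hits / trials)        # close to 1/4

And any single pre-chosen point will, with probability 1, never be hit exactly -- yet every run hits some point.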
It's not proven, it's the definition. If there is a subsequence that isn't contained in a non-repeating infinitely long sequence, then it isn't a non-repeating infinitely long sequence.
This is not true. For instance, the sequence 00 does not occur in the sequence
010110111011110111110111111...
which can hardly be described as a "repeating" (or periodic) sequence.
The point here is normality. Fix an alphabet A with b letters. We say that a sequence S of elements of A is normal if, for every word w, we have
lim n -> ∞, C(S, w, n)/n = b^(-k)
where C(S, w, n) is the number of (possibly overlapping) occurrences of w in the first n letters of S, and k is the length of w. More intuitively, S is normal if every word of a given length occurs in S with the same asymptotic frequency.
So, if a sequence does not contain some word as a substring, it cannot be normal by the above definition.
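To make the definition concrete, a small sketch (Python) computing the empirical frequency C(S, w, n)/n for the counterexample sequence above:

    def freq(S, w, n):
        # count (possibly overlapping) occurrences of w in the first n letters
        k = len(w)
        count = sum(1 for i in range(n - k + 1) if S[i:i+k] == w)
        return count / n

    # the sequence 0 1 0 11 0 111 0 1111 ... from the comment above
    S = ''.join('0' + '1' * i for i in range(1, 500))
    print(freq(S, '00', len(S)))   # 0.0, but normality would require 1/4
    print(freq(S, '1', len(S)))    # near 1.0, but normality would require 1/2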
You're conflating two things. A normal sequence contains every finite string as a substring (in fact, with the expected asymptotic frequency). That implies that it's infinite and non-repeating.
However, you can have a string that is infinite and non-repeating but not normal.
Pi is infinite and non-repeating, but it is not known to be normal.
Yes, because among other things it's clearly implied by "While a general proof can be given that almost all numbers are normal ... " If true, then the set is infinite.
You see, it means that there's the complete description of our universe in there, position of every atom, at every nanosecond. Somewhere in pi there's you living your life.
Still meaningless, because of the insoluble problem of access. Within Pi there's a cure for cancer, written entirely in iambic pentameter, but that fact has no practical meaning because we can't find it.
Emphasis on insoluble: finding an item of interest would, on average, take longer than the history of the Universe to date -- longer than human civilization, or any encoding scheme, is likely to remain in use....