Assuming that π is normal and infinite (the "infinite" is redundant, but included for clarity), then yes, it's true -- it contains every finite string of digits, every textual sequence under a suitable encoding, every secret, every book ever written. Scientific secrets we may never discover. Cures for nonexistent diseases. Everything!
But this is as meaningless as it is true, because the problem is not that those secrets are present, the problem lies in locating them.
I had this conversation years ago on Usenet -- the same question, the same answer. My correspondent seemed to think it was remarkable that all those riches lay within π. He didn't seem to realize that the same could be said of any normal, infinite sequence of digits, of which there may be an infinite number (not proven for any familiar constant, but it's possible that e, √2, indeed the square root of any non-perfect-square integer, all fall into this category).
If the grains of sand on a beach could be decoded as binary bits (or as digits in any base) by some accepted convention, and making the same assumptions about normality and infinity, then a beach also contains every secret, every book that has ever been written or will ever be written. But it's the same problem -- those secrets cannot be located, or distinguished from sequences we might label meaningless simply because they're answers to questions we're not asking.
A quote: "While a general proof can be given that almost all numbers are normal (in the sense that the set of exceptions has Lebesgue measure zero), this proof is not constructive and only very few specific numbers have been shown to be normal. For example, it is widely believed that the numbers √2, π, and e are normal, but a proof remains elusive."
I don't have a strong mathematics background, but would I be right in thinking that the probability of a normal, infinite number containing a given arbitrary string of digits is "almost sure" rather than "sure"? Or can it actually be proven mathematically that it contains such a string?
> ... but would I be right in thinking that the probability of a normal, infinite number containing a given arbitrary string of digits is "almost sure" rather than "sure"?
No. Given the assumptions (a normal number, therefore infinite in extent), the probability that a given finite sequence appears somewhere in it is 1.0.
> Or can it actually be proven mathematically that it contains such a string?
Here's one way to think about it. Say you flip a perfectly fair coin, where heads counts as 1 and tails as 0, and each flip appends a bit to a sequence. Say further that some finite bit sequence is chosen as a target -- for example, the text of "War and Peace" by Leo Tolstoy (a famously big book, 1,225 pages) under some fixed encoding.
We can agree that, as the bit sequence grows in length, the probability of encountering "War and Peace" somewhere within it strictly increases with the number of bits in the sequence.
Now the clincher -- what is true for a finite sequence (that the probability of encountering a given subsequence increases but never reaches certainty) is not true for an infinite sequence. For an infinite sequence, the appearance of a given subsequence becomes certain, i.e., probability 1.0. The only requirement is that the subsequence be finite.
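For the curious, here's a minimal simulation sketch in Python (the short target pattern is a hypothetical stand-in for "War and Peace", since simulating an encoded book is impractical) showing the hit probability climbing toward 1.0 as the sequence grows:

    import random

    def contains_target(n, target, rng):
        # draw n random bits and test whether the target pattern appears
        bits = format(rng.getrandbits(n), '0{}b'.format(n))
        return target in bits

    rng = random.Random(0)
    target = '1101001'   # hypothetical stand-in for the encoded book
    trials = 1000
    for n in (100, 1000, 10000, 100000):
        hits = sum(contains_target(n, target, rng) for _ in range(trials))
        print(n, hits / trials)   # estimate rises toward 1.0 as n grows

For a 7-bit target, the probability at length n is roughly 1 - (1 - 2^-7)^(n-6), which is indistinguishable from 1.0 well before n = 100,000. For a target the size of an actual book the same limit holds; it just takes astronomically longer to approach.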
Thanks for the explanation. While I have no doubt you're right (because I am completely outside my field of expertise), I have difficulty intuitively accepting the answer. By way of comparison, the Wikipedia entry on the infinite monkey theorem [1] refers to the probability of producing the Shakespeare corpus as "almost sure", a situation that seems broadly analogous to the one you describe.
I suspect the difference is that the monkeys are required to produce an output which is merely random rather than evenly distributed.
> While I have no doubt you're right (because I am completely outside my field of expertise), I have difficulty intuitively accepting the answer.
When intuition fails while contemplating something involving infinity, I find it comforting to recall something John von Neumann said: "Young man, in mathematics you don't understand things. You just get used to them."
> I suspect the difference is that the monkeys are required to produce an output which is merely random rather than evenly distributed.
Perhaps, or the fact that probability theorists regularly describe a probability of 1.0 as "almost certain" -- a wording that, to me, contradicts common sense.
Of course it does. It's similarly unreasonable to call a Möbius strip with a disc attached along its boundary circle a "plane" [1], to call the surface of Cartesian (x, y) coordinates a "line" [2], and to call numbers having counterintuitive special features "normal".
Any technical field has to define more concepts than casual vocabulary can distinctly name using plain English meanings, so some words will be jargon.
Don't pick on "almost sure". We also have group, ring, field, ideal, prime, countable, closed, ....
"almost sure", in probability, means an event with probability 1.0. I realize this is somewhat counter-intuitive from an English point of view, but it makes sense from a Math point of view.
A quote: "In probability theory, one says that an event happens almost surely (sometimes abbreviated as a.s.) if it happens with probability one."
This is like saying a dollar is approximately 100 pennies, or every woman is approximately pregnant (or the reverse).
Another quote: "Equivalently, we can say an event E happens almost surely if the probability of E not occurring is zero."
With probability theory being guided by notions like this, no wonder insurance is so expensive. To me, it is to probability theory what post-modernism is to literature -- it reduces everything to a matter of opinion.
In analysis, we have the notion of a limit, the idea that we can say something about a quantity that is only approached, never reached. Example:
lim x -> 0⁺, 1/x = ∞
Technically this says "as x approaches zero from the right, 1/x increases without bound." But I can't tell you how many times I have heard this stated as "when x equals zero, 1/x equals infinity." That's not so -- division by zero is undefined, so the expression can never be described as anything but a limit, an approach.
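A quick numerical sketch (Python) of the distinction:

    # 1/x grows without bound as x approaches 0 from the right,
    # but 1/0 itself is undefined (Python raises ZeroDivisionError).
    for x in (1.0, 0.1, 0.001, 0.00001):
        print(x, 1 / x)
    # print(1 / 0)  # error: the limit describes an approach, not a value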
The reverse logic applies to a probability of 1.0 -- 1.0 is not approximately anything, it's 1.0. It's not "almost one", it's "one". And if you subtract one from it, the result is not almost zero, it's zero.
> but it makes sense from a Math point of view.
Only if probability theory transcends the axioms of logic, and with the assumption that logic is the foundation on which mathematics is built.
You're confused about the meaning of almost surely, and your comparison with dollars and pregnancy is misleading.
The "almost" does not apply to the probability being one, but to the meaning of P(S) = 1 for S a member of your sigma-algebra. So when one says that a real number is almost surely normal, it means P({a ∈ ℝ : a is normal}) = 1, but it does not mean that every real is normal. Indeed, there are infinitely many non-normal numbers (for example, every integer is obviously not normal), even though the probability that a number is not normal is 0.
In the case of finite probability spaces, this usually does not happen, because for two finite sets A and B, #A != #B usually implies P(A) != P(B) [1], which is why your comparison with pennies/pregnancy is misleading.
[1] it is certainly possible to build probability spaces on finite sets where P(A) == P(B) and #A != #B.
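A minimal sketch of such a space (Python, with weights chosen arbitrarily for illustration):

    # A non-uniform probability space on a finite set: two subsets of
    # different sizes can still have equal probability.
    weights = {'a': 0.5, 'b': 0.25, 'c': 0.25}

    def P(subset):
        return sum(weights[x] for x in subset)

    A = {'a'}            # one element
    B = {'b', 'c'}       # two elements
    print(P(A), P(B))    # 0.5 0.5 -- #A != #B, yet P(A) == P(B)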
> You're confused about the meaning of almost surely ...
Yes, but I'm not the only one. I think most people who read the justification for describing p = 1.0 as "almost surely" will have the same reaction, regardless of their training. Not that I'm arguing the terminology must accept responsibility for our confusion.
> your comparison with dollars and pregnancy is misleading.
As to dollars, yes (because it's finite and trivial), but as to the parturient state of a set of women, it depends on the size of the set. For an infinite set of women, describing them all as almost surely pregnant (or not) is a reasonable corollary to describing p = 1.0 as "almost surely" in the general case.
> Indeed, there are infinitely many non normal numbers (for example, every integer is obviously not normal), even though the probability that a number is not normal is 0.
This compares two infinite sets and draws a conclusion based on the outcome (the set of integers is infinite in size, and the set of reals is also infinite in size). I'm not sure this is comparable to making a statement about the mapping of a single well-defined target (for example the appearance of War and Peace) into a single infinite sequence, like the digits of Pi.
Which leads to an obvious question -- does the "almost surely" qualifier apply to all p = 1.0 probabilities, or only a subset? In other words, does the qualifier depend on the construction of the probability, or does it always apply?
pi is a single fixed number, so it doesn't help to talk about probabilities. Either every finite length sequence appears in pi, or at least one doesn't.
The technical term "almost surely" applies to all p=1 probabilities, even those where it's really certain that the event happens. This does not match the common understanding of "almost surely", because we'd not usually make a statement like "the decimal expansion of pi almost surely contains the digit 4", but mathematically it's a correct statement.
It's important to keep in mind that this is technical terminology, so intuition does not always apply. It also means that it does not make sense to talk about women being almost surely pregnant without first defining a probability distribution. If we define each woman to be pregnant with a probability of 1%, independently of the pregnancy status of the other women, and we have an infinite number of women, then almost surely at least one of them is pregnant. That is, the probability that at least one of them is pregnant is 1. But it is not certain that at least one of them is pregnant; it could happen that none of them is.

Similarly, if we choose a number uniformly in the interval [0,1], then the probability of choosing 1/3 is 0, but it is still possible that we chose 1/3. We cannot dismiss that as an impossibility either, otherwise after choosing any number x we could always claim "it's impossible that we chose exactly x!"
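A short sketch (Python) of the pregnancy computation under the stated assumptions (independent pregnancies, p = 0.01 each):

    # P(at least one of n women is pregnant) = 1 - (1 - p)^n,
    # which tends to 1.0 as n grows without bound.
    p = 0.01
    for n in (100, 1000, 10000):
        print(n, 1 - (1 - p)**n)

The limit is exactly 1, yet the all-not-pregnant outcome is never logically excluded -- that is precisely the gap "almost surely" is meant to mark.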
In probability theory, the statement "event S occurs almost surely" means by definition that the probability of S occurring is 1. This is indeed confusing, but it's completely justified, given the constraints.
When non-mathematicians talk about probability, the situation they usually have in mind is that there are n possible events, each equally likely -- for instance, a fair die throw. This leads to assigning each of the events the same probability; in the case of a die it's 1/6. We also want to be able to assign a probability to some subset of events occurring; for instance, the event that the number on the die is even corresponds to the subset {2 was thrown, 4 was thrown, 6 was thrown}. By common sense, this should be 1/2, and generally the probability should be (number of events in the subset / number of all events). This approach is how probability is usually taught in high school.
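In code (Python), the high-school model is just counting:

    # Finite, equally likely outcomes: P(subset) = |subset| / |all outcomes|.
    outcomes = {1, 2, 3, 4, 5, 6}
    even = {2, 4, 6}
    print(len(even) / len(outcomes))   # 0.5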
The problem is that this approach, while useful, is totally inadequate for lots and lots of things that people care about. For instance, say we want to investigate the probability of an earthquake occurring. We want to be able to answer the question "what's the probability that we will not have an earthquake in the next t seconds?" Assuming that earthquakes are totally unpredictable, we should expect that after waiting s quake-free seconds, the answer to this question is still the same; in probability terms, P(the next earthquake will not occur in the next t+s seconds, given that it will not occur in the next s seconds) = P(the next earthquake will not occur in the next t seconds). It's really not obvious how to apply the finite-set-of-equally-likely-events method here, if it's possible at all.
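As an aside (a standard fact, not claimed in the comment above): the exponential distribution satisfies exactly this "memoryless" equation, which is one way such waiting times get modeled. A quick numerical sketch (Python, with the rate chosen arbitrarily):

    import math

    lam, t, s = 0.3, 2.0, 5.0
    surv = lambda x: math.exp(-lam * x)   # P(T > x) for an exponential T
    print(surv(t + s) / surv(s))          # P(T > t+s | T > s)
    print(surv(t))                        # P(T > t) -- the same number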
That's why most mathematicians use a similar but much more general approach. Just like before, we consider a set of possible elementary events that can occur, but this time we don't assume it's finite, and we don't assign each one equal probability. Instead, we assign probabilities to subsets of events, subject to a few simple rules but otherwise in a completely arbitrary way. The rules are: the probability that any event at all occurs is 1, and if A_1, A_2, ... are pairwise disjoint sets of events, we assign to their union the sum of their probabilities.
For instance, to model an archer shooting at a circular target, we let the elementary events be the points on the target (there are infinitely many of them), and we let the probability of a subset of events be its area (assuming the whole target has area one). Now it makes sense to ask, say, what's the probability of the archer landing an arrow within half a radius of the center of the target? (It's 1/4.) But, since every point has area 0, the probability of the archer hitting any particular point is 0, thus the probability of the archer _not_ hitting that point is 1, meaning that the archer will _almost surely_ not hit that point. Almost, because while for every point the probability of the archer hitting it is 0, the archer will hit _some_ point. That's why mathematicians say almost surely, and not surely, when talking about events with probability 1.
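A Monte Carlo sketch (Python) of the archer example, estimating the within-half-a-radius probability:

    import random

    rng = random.Random(42)
    trials, hits = 100_000, 0
    for _ in range(trials):
        # rejection-sample a uniform point on the unit disc
        while True:
            x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
            if x*x + y*y <= 1:
                break
        if x*x + y*y <= 0.25:   # within half the radius of the center
            hits += 1
    print(hits / trials)        # close to 1/4

And any single pre-chosen point will, with probability 1, never be hit exactly -- yet every run hits some point.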
It's not proven, it's the definition. If there is a subsequence that isn't contained in a non-repeating infinitely long sequence, then it isn't a non-repeating infinitely long sequence.
This is not true. For instance, the sequence 00 does not occur in the sequence
010110111011110111110111111...
which can hardly be described as a "repeating" (or periodic) sequence.
The point here is normality. Fix an alphabet A with b letters. We say that a sequence S of elements of A is normal if, for every word w, we have
lim n -> ∞, C(S, w, n)/n = b^(-k)
where C(S, w, n) is the number of (possibly overlapping) occurrences of w in the first n letters of S, and k is the length of w. More intuitively, S is normal if every word of a given length occurs in S with the same asymptotic frequency.
So, if a sequence does not contain some word as a substring, it cannot be normal by the above definition.
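To make the definition concrete, a small sketch (Python) computing the empirical frequency C(S, w, n)/n for the counterexample sequence above:

    def freq(S, w, n):
        # count (possibly overlapping) occurrences of w in the first n letters
        k = len(w)
        count = sum(1 for i in range(n - k + 1) if S[i:i+k] == w)
        return count / n

    # the sequence 0 1 0 11 0 111 0 1111 ... from the comment above
    S = ''.join('0' + '1' * i for i in range(1, 500))
    print(freq(S, '00', len(S)))   # 0.0, but normality would require 1/4
    print(freq(S, '1', len(S)))    # near 1.0, but normality would require 1/2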
You're conflating two things. A normal sequence contains every finite string as a substring (in fact, with the expected asymptotic frequency). That implies that it's infinite and non-repeating.
However, you can have a string that is infinite and non-repeating but not normal.
Pi is infinite and non-repeating, but it is not known to be normal.
Yes, because among other things it's clearly implied by "While a general proof can be given that almost all numbers are normal ... " If true, then the set is infinite.
You see, it means that there's the complete description of our universe in there, position of every atom, at every nanosecond. Somewhere in pi there's you living your life.
Still meaningless, because of the insoluble problem of access. Within Pi there's a cure for cancer, written entirely in iambic pentameter, but that fact has no practical meaning because we can't find it.
Emphasis on insoluble: finding an item of interest would, on average, take longer than the history of the Universe to date -- longer than human civilization, or any encoding scheme, is likely to remain in use....