If FLT hadn't already been proven then Feynman's argument would explain away the substantial numerical evidence collected in it's favor. In other words, it would suggest that FLT is just a statistical accident and true for no particular reason.
Of course, we now know that FLT is related to deep ideas in number theory.
I mean Feynman's argument really seems to show that it should hold for sufficiently large N. I think one could reasonably argue that it's a statistical accident that FLT holds for N >= 3 as opposed to say, N >= 100. I think even Wiles' proof only works for N sufficiently large (at least N >= 5). Small N of course were handled by earlier results.
Very cute argument, and very much in his style--he was famous, as the author notes, for heuristic arguments that weren't very formalizable but had a lot of beauty.
One story I heard is that a computer scientist tried to explain the P=NP? problem to him; Feymnan couldn't understand why this was a problem. It was obviously true that P != NP, what even needed proving?
Per "Surely You're Joking" and the article, one of the cognitive techniques Feynman used was to keep a physical example, a demo, in his mind that conformed to the math being explained. I wonder if he used that for this, and what his model was.
NP is most certainly not "trivially definitionally not equivalent (equal) to P". (Do you think everyone puzzling over the question of whether P = NP is a moron? You are aware there is a million dollar prize for a proof one way or the other, yes?). What definition are you using for NP?
It's true I did not understand you were joking, but I didn't personally malign you either. If you are referring to the word "moron", I didn't call you a moron; I was shocked that you thought P vs. NP was so trivial, and asked whether you therefore thought everyone puzzling over it was a moron.
As for whether "the physical model of P != NP is definitionally trivial", I suppose it depends on what "the physical model" is, but I don't see any physical reason why it should trivially be impossible to decide the Boolean satisfiability problem in polynomial time... Why shouldn't there be some algorithm to do so?
I didn't malign others, either, as I didn't call anyone a moron... Yes, it's true I implied that you might think that others who puzzled over P vs. NP were morons. I was, after all, laboring under the belief that you thought P vs. NP was trivial. Is that a malignment?
Anyway. Nevermind that. Feynman joked that P was trivially not equal to NP? Where can I find that joke given by Feynman?
I presume you mean the idea of physical examples in general comes up in "Surely You're Joking"; I can't find any discussion of P vs. NP (or Fermat's Last Theorem, for that matter) within it.
As for why Feynman didn't publish a full(er) proof: a full(er) proof of what? He didn't have anything near a proof of P != NP or of Fermat's Last Theorem... There was nothing of the sort to publish.
Oh, my question was more in the vein of "Obviously the standard notation doesn't make sense here. I assume you are not an idiot, so you must have meant something else."
Alas, jokes and sarcasm don't travel well over text. One of the reasons they are frowned upon in the HN guidelines.
Wasn't the idea that he integrated on n for 100 to infinity, to arrive at that probability? So the probably should be interpreted as the probability of an eventual counter-example.
If I understand it right, it's a number with which he could have estimated if it's worth using a computer to perform a brute force search for a possible solution.
Once he had 1e-33 for all n > 100 that could mean that even trying with 1M computers where each tries 10G solutions per second (1e6*1e10) some millions of years could pass without the positive result. Then it's exactly reasonable to say "for my money Fermat’s theorem is true" as in, really not worth trying blindly.
I can't believe that the right interpretation of what he said. He must have known what the actual FLT meant. Furthermore, it would go against the mathematical tradition of what it means to "think that a conjecture is true". Compare with RH, for example: when people say it's probably true they absolutely do not mean it's true for all small numbers, in part because there are built-up areas of mathematics that depend on it being exactly true. Besides, brute-force search is a pretty terrible algorithm (in general), so finding out that it fails on a particular problem isn't that interesting.
Imagine somebody came to you at these "early" times (Wiles proved the theorem in 1995) with a "grand project" to use a lot of computers to search for a possible "solution," being able to try 10 billion Ns in one second. What would be your argument against such a project? Would you try some similar derivation to get an estimate of success?
I know, brute forcing all integers is impossible, but you can even "imagine" a "superquantum (and now not existing) computer" which can do a lot of small integers in parallel.
And yes, I know, probabilities aren't proofs, especially not for integers. One counterexample is enough:
3987^{12} + 4365^{12} = 4472^{12}
Also interesting to see the accidental(?) order of magnitude of the numbers involved.
I am also not 100% sure Feynman wrote something like this. However the 2014 Fields Medal was awarded to Manjul Bhargava for studying random elliptic curves.
There's a similar argument for why Goldbach's conjecture* might just be a statistical fluke. I remember realizing this in high school and deriving the probability – alas it's not a novel idea and Wikipedia has a great summary: https://en.wikipedia.org/wiki/Goldbach%27s_conjecture#Heuris...
It's very possible that it's an unprovable true conjecture, which is kind of interesting in itself.
* every even number can be written as a sum of two primes
What's interesting about this argument is, if its true, it's an example of a simple conjecture that is true but unprovable. But not like Godel sentences, which can be proved under different axioms, or proved to be unprovable. This would be completely unprovable. And not because of some logical paradox, but simply because there is no mathematical reason for it to be false, it's just a coincidence.
There are also probably many conjectures like this. Freeman Dyson constructed one, that no digits of powers of two, reversed, make a power of 5. The reasoning being that the digits grow big quickly, and the base 10 representation is basically random and uncorelared with it.
I think other conjectures about digits of numbers are similar. E.g. Mathematicians have found it impossible to prove that any numbers are normal, or even if pi contains infinite 5's, and many similar properties. These things may only be probabilistically true.
I guess every false conjecture can be made to pass it. The trick is to make the set of items searched in large enough.
For example, to show that no elephants exist, start with the (infinite) set of all possible chromosome sets. The proportion of them that produces an elephant is zero. QED.
Examples from mathematics:
The number 42 does not exist (logic: pick an integer. The probability that it equals 42 is zero. QED)
Whoa, hang on. Statistical arguments for unproven conjectures are bad, but this counterargument is as bad or worse, especially when you start talking about infinity. Just to address your first example:
> The number 42 does not exist (logic: pick an integer. The probability that it equals 42 is zero. QED)
I object! What is your probability distribution function over the integers? Your phrasing sort of implies a uniform distribution, but there is no such thing as a uniform distribution on an infinite set, and as soon as you pick a plausible pdf the argument stops working.
It is easy to address your objection. On a uniform distribution on [n] := {0, 1, 2, ...n}, P(X=42)->0 as n->infty. This is similar to Feynmann's argument.
Those are two separate events. I'm talking about the event n = 42, which makes sense for all [n]. You're talking about the event n >= 42, which is a very different event.
No, I'm not talking about integrating the probability that n >= 42.
What's the chance that a random number in [n] is 42? It's 1/n, if n is at least 42. Sum that over all x in [n] and you get 1, i.e. the probability that there is a number in [n] that is 42.
Richard Feynmann was trying to argue probabilistically about the possible existence of a counterexample to FLT. To do this, he more or less tries to put a probability value on the event "a counterexample exists". The problem is that his kind of reasoning is similar to the probability of an event such as "x = 42". Counterexamples to FLT may, a priori, be as rare as the number 42 is within the integers, i.e. with probability tending to zero as you consider larger and larger sets of integers. But a very small probability of a counterexample, even a probability that tends to zero, does not mean a counterexample doesn't exist anymore than it says that 42 doesn't exist.
See I can't see the analogy from your argument to Feynman's for the following reason: you are computing the probability that x=42 for an individual x and saying that it is going to 0 as x->infinity. Feynman first computes the probability that N is x^n + y^n but then integrates that over all N and n in order to get the probability that there is a solution anywhere. His argument takes into account the fact that there could be just one single lone solution in a infinite sea of non-solutions, since his probability estimate is for "there is a solution anywhere" not "this particular number is a solution". And I think that when you add that part to your estimate, you do get a probability of 1 that 42 exists somewhere, even if individually you get a low probability of it being at any given large x.
This detail doesn't matter. He's still computing a tiny nonzero probability that is only zero in the limit. We came up with other examples of tiny nonzero probabilities that are zero in the limit that can lead to the wrong conclusion, such as saying that 42 doesn't exist.
Feynmann's argument is clumsy and wrong. This kind of approximation is ok for a physicist but it doesn't work for mathematics.
Naw most of these wouldn't work when actually written out as Feynman did. Example: you can easily give an upper bound to the "chance that N is a prime" that goes to zero as N increases. But you would also need to show that it's sum from 0 to infinity over all N also goes to zero.
In fact, there's the classic result that this probability is about 1/log(N) [1], which diverges towards infinity. Hence you would probabilistically expect infinite primes and would be correct.
That's because you start with a set (the integers) that contains disproportionally many primes. Just pick a larger set of numbers to compare things with. If you look at the reals, the probability of finding a rational already is zero, and there aren't primes that aren't rational numbers.
Mathematicians of course, would not do that. They do use the technique of estimating the solution size to get a feeling for the difficulty of a problem, but always try to pick a reference set that is such that the exercise teaches them anything, and starting with all reals doesn't (how do they know that? Intuition, or they may do it anyway, but realize half-way through that it is silly, or even publish it, and, eventually, get corrected)
Feynman's solution has more good math, but still makes the fatal mistake of stating that "measure zero implies does not occur" (although he probably knew, since he states it was good enough for him)
One can 'prove' the non-existence of any countable infinite set of numbers this way.
Your arguments are just fundamentally different from Feynman's. We're trying to estimate whether something exists. If you are using a counting measure on the integers, say, then whether or not something exists is whether or not the measure of that set is zero. If you're dealing with, say, a Lebesgue measure on the reals then the measure being zero tells you nothing about whether the set is empty. Put the counting measure on the reals and then you can work, but then the answers you get won't be zero.
> to show that no elephants exist, start with the (infinite) set of all possible chromosome sets. The proportion of them that produces an elephant is zero
This is not obvious. If you're going to postulate an infinite number of possible chromosome sets, you're also going to have to admit that an infinite subset of them all produce elephants.
For example, if you show genome A which does not produce an elephant, perhaps I could show genome A', consisting of (1) genome A; (2) a reference elephant genome; (3) some chromosomes that have the effect of disabling genome A.
Several of these don't work, as it's assumed you tackle smaller values in another way, before your -> infinity method kicks in, so most of these would be easily knocked off.
"the integers don't exist, because the percent of real numbers that are integers is effectively 0" is what the parent post is implying, which "works", in the sense that using this proof methodology "works"
But it's not the proof "methodology" given by Feynman. Fenyman's result also gives a moderate-sized probability for small N, then eliminates those through known mathematics.
By this probabilistic reasoning, we shouldn't bother searching for "meaningful" polynomial time algorithms because they are practically non-existent in the sea of all exponential time algorithms. If anything, this argument should be a motivator to work on a problem because if an underlying structure exists, it would be rare and beautiful.
Very nicely written article, showing us a glimpse of the inner workings of a great mind.
"the probability that N is a perfect n^nth power..."
Can someone explain what probability means here in relation to N? From my understanding, it depends on what N is for you. If it's a constant, that probability is obviously 0 or 1. So that can't be it. Then N must be some kind of random variable. But with what distribution? And in what kind of system can the probability of event(random_variable) involve random_variable itself?
The meaning here is: pick a positive integer N, what is the chance (aka probability) of it being an n-th power of another positive integer. And you are correct, this probability depends on N.
I don't get it. The probability must depend on how likely I am to select any specific integer, i.e. probability mass function of N. The probability cannot depend on the value of the random variable itself but could involve any parameters that define its distribution.
I would consider something like the following a valid question:
"Let N be a random integer between 0 and M-1 with uniform distribution. What is the probability that N is even?"
Then an answer could be "the probability that N is even is 1/2 if M is even and (M+1)/(2M) if M is odd". See this does not involve N but does involve M which is a parameter for the distribution of N.
Your explanation seems to invoke some "common sense" which I am not able to unify with my understanding of probability theory.
One meaning is that summing the probabilities for N=1,2,3, ... and checking how many numbers actually meet the criteria yield the same result in the limit. More precisely, their ratio goes to 1 as N goes to infinity.
Edit: define f(N) as the number of numbers below N that match. Consider the function f(N)/N. We call a function of N the probability of N matching if it asymptomatically approaches f(N)/N.
You are thinking "given N, what is the probability that N is a perfect nth power?". Think about it in the reverse. "given N, what is the probability that the nth root of N is an integer". For a whole bunch of different Ns, we will get some real number between 0 and 1.
To match up to your example. "For a number M, m is the remainder of M: m = M - floor(M), what is the probability that m is zero given M"
Intuitively, you can think of N as a rough measure of the "ballpark" around where you are in the number line. As you move up to higher numbers, perfect powers get more and more scarce, so the probability of finding one by picking a number at random becomes lower.
Imagine you don't have an easy way to solve the equation x^n = N where x is an integer. Yet, for a given N you want to find a probability that such x (solution) exists. There is nothing here about how you select N.
Apologies to rain on the parade, but this sort of argument is very well known and commonly used in number theory. Since these techniques don't lead to proofs, they may not get as much attention in standard math education, but they are still well-known. Many number theory papers use this technique (often with more refinements) to estimate the number of expected solutions. I would be stunned if it wasn't used on Fermat's last theorem long before Feynman did.
Using way simpler math, the probability that an even number is prime is zero. One cannot conclude from that that there are no even primes (in fact, many mathematicians think the smallest prime is even (https://cs.uwaterloo.ca/journals/JIS/VOL15/Caldwell2/cald6.h... ))
Mathematicians do use this kind of back of the envelope calculations to get a feeling for whether a statement may be true, but they can never prove something.
Second, your first statement can substitute "an even number is prime" with "a multiple of n is prime" for all prime n, leading us to conclude that the odds of any multiple of 3, 5, 7, 11, 13, et al being prime are zero thus seemingly statistically disproving the existence of any prime number which is absurd.
1 could be a prime, too, as for example Legendre, Lesbegue, Cayley (in the Encyclopædia Britannica), Kronecker, Hardy and Sagan stated at least once (see the URL I linked to earlier)
One isn't typically called a prime for the same reason as mathematicians typically say 0^0 equals 1; it makes many theorems and proofs look better.
I don't understand why the distance between root(n, N) and root(n, N+1) is the probability that N is a perfect nth power. What if we choose to compute the distance between root(n, N) and root(n, N+0.5) ? By the same logic, if N is a perfect nth power, there exists one integer in the interval [root(n, N),root(n, N+0.5)], and since the distance between all consecutive integers is one, the probability is the ratio etc ... And what about 0.2? 0.1? We can keep going until the probability becomes zero ! And this can't be !
That's not exactly true. Density arguments similar to this are fairly common in number theory (even if the arguments aren't rigorous). Granted, one has to be careful with arguments like this, but they can often be useful.
There's hardly any denying that Feynman was quite a character; he (sometimes spectacularly) failed to live up to our expectations as to how such a brilliant intellect ought to behave.
That is probably a significant reason for his enduring appeal; a streetwise, bongo-playing physicist with a talent for quips.
finding that the probability is extremely small doesn't really get you any closer to proving it. for such a hard to prove theorem, it makes sense the probability is small. that's why it was interesting in the first place.
I find it hard to believe Feynman concluded from this that the theorem is probably correct. it only takes 1 case among an infinity to make the theorem false.
I think the idea here is that, in general, simple coherent properties of number tend to show themselves on smaller numbers more readily than on larger ones.
It would be quite unusual to have a problem stated in such simple terms and require a solution so far away. Thus this probability is used to infer just how large such a number must be, and eventually you're going to hit a point where there is more information encoded in the number than there is necessary to solve the problem, as which point no larger number could be a solution.
This is basically how induction works anyway: you produce a base case and an algorithm, and infer that the information contained therein meets the needs of the problem. Then any number which would encode more information is irrelevant (and thus sufficient) and you have an inductive proof.
My math is passably college level, so most of these proofs go over my head. I've always thought of FLT as stating "You cannot make n-3 cubes of side z by stacking/tiling n-3 cubes of side x and y", at which point it becomes something of a tessellation problem in 3 dimensions, which would seem to me to be the intuitive way to go about proving this.
I like your approach. It's funny, I never thought about it that way, I never even considered visualizing it. I always looked at it from the perspective of pure number theory.
My bad - I meant a^(n-3) many different cubes of side a plus b^(n-3) many different cubes of side b needing to cover the total volume occupied by c^(n-3) many different cubes of side c. In other words it is a way to try and find a proof by thinking in terms of shapes and using some kind of tesselation method.You could do the same in squares too I suppose.
Or just in lines... a^n many lines of length 1 + b^n many lines of length 1 to cover the total length of c^n many lines of length 1.
Or just in n-dimensional cubes from the start: 1 n-dimensional cube of side-length a + 1 n-dimensional cube of side-length b to cover the volume of 1 n-dimensional cube of side-length c.
Of course, both of these don't amount to much different than just saying a^n + b^n = c^n directly. :)
I don't suspect that decomposing it into a^(n - k) a^k + b^(n - k) b^k = c^(n - k) c^k for arbitrary k is particularly helpful, and, for reasons as illustrated above, I suspect the aid to intuition from thinking in terms of volume and tessellations geometrically isn't very great, but, I'm not an expert on Fermat's Last Theorem. (For all I know, something like this does get used in the proof...)
I don't understand what "the probability of" means here. Does the probability of 10^-31 mean that if we construct the integers in a different way, it might result in the theorem being false? Or is it that if you change the equation there is only at 10^-31 probability that you arrive at a theorem which is false/true? Or something else altogether.
That description of what Sophie Germain did is slightly glib and inaccurate, I believe. I will give an abbreviated, but accurate, to the best of my knowledge, explanation of what happened:
Note that, if you have proven Fermat's Little Theorem for some exponent, then you have also proven it for every multiple of that exponent. Thus, to prove it for all exponents (> 2) up to some limit, it suffices to prove it for all odd prime exponents up to that limit, as well as for exponent 4. Fermat himself gave an argument which worked for exponent 4, so afterwards, one could consider only odd prime exponents.
Note also that, if p is prime, and we have a solution to x^p + y^p = z^p, then out of {x, y, z}, precisely 0, 1, or all 3 are divisible by p (i.e., if any two were divisible by p, then so would be the third). If indeed all 3 are divisible by p, we may accordingly divide all through by p to obtain a smaller solution; thus, if there is any solution, there is a minimal solution where precisely 0 or 1 of {x, y, z} are divisible by p.
So to prove Fermat's Little Theorem for prime exponent p, it suffices to prove both of the following claims for all solutions to x^p + y^p = z^p:
(A) It cannot be the case that precisely 0 of {x, y, z} are divisible by p
(B) It cannot be the case that precisely 1 of {x, y, z} is divisible by p
If this can be done for every odd prime p up to some limit, FLT is established for all exponents up to that limit.
Germain did not do this. She did not have a strategy for proving (B). This prevented her from establishing FLT in full for any exponent.
Germain did manage, however, to discover a strategy for establishing (A) for various p. Specifically, she discovered a sufficient condition for (A) was the existence of another prime t in a certain decidable relation to p. Germain then manually searched for and discovered such t for each prime p < 100.
Why'd she stop there? Because it seemed like a nice place to stop. Nothing special about 100 except the human factor. One could keep going, and indeed, Legendre extended Germain's results to each prime < 197. Germain and Legendre were both aided by certain techniques they had developed in order to quickly identify certain candidate t that could work as the auxiliary primes for particular exponents p; however, these techniques broke down at 197, and while it is possible to find by brute force an auxiliary prime t in the appropriate relation to p = 197, the smallest such t is very large and the scale of the calculation was beyond feasible at the time.
A little more detail, since I bothered to go learn it:
Lemma: If x and y are coprime, then the gcd of x + y and (x^n + y^n)/(x + y) divides n.
Proof: Expand out the polynomial division (note that x + y does indeed divide x^n + y^n), and then divide the result by (x + y) again, observing a remainder of ny^(n - 1). Thus, the gcd of interest is the same as gcd(x + y, ny^(n - 1)). As y^(n - 1) is coprime to x + y (by coprimeness of x and y), this is furthermore the same as gcd(x + y, n), completing the proof.
Lemma: If x^p + y^p + z^p = 0, for odd prime p, with z indivisible by p, then (x + y) and (x^p + y^p)/(x + y) are coprime p-th powers.
Proof: As z is indivisible by p, so is -z^p = x^p + y^p, and thus so is its factor x + y. At this point, invoking the above lemma and the primeness of p, we find that (x + y) and (x^p + y^p)/(x + y) are coprime; as their product is a p-th power (-z^p), we conclude they are furthermore each p-th powers.
Sophie Germain's theorem: Suppose p is an odd prime, t is a prime, and the mod-t exponent-p case of FLT is true (in the sense that there is no solution to x^p + y^p + z^p = 0 (mod t) where x, y, and z are nonzero (mod t)), but the non-modular exponent-p (A) case of FLT fails (in the sense that there IS some solution to x^p + y^p + z^p = 0 in integers, all indivisible by p, which we can assume minimal so that x, y, and z are pairwise coprime). Then p is a p-th power modulo t.
Proof: Invoking the above lemma, we have some a, b, c such that a^p = y + z, b^p = x + z, and c^p = x + y.
Furthermore, since mod-t FLT is true, we have that (precisely) one of x, y, or z is zero mod t; WLOG, let this be x. But as 2x = b^p + c^p - a^p, we can invoke mod-t FLT again to conclude that one of b, c, or a is zero mod t. It cannot be b or c, as y and z, respectively, are nonzero mod t; thus, a is zero mod t. Thus, y + z = 0 (mod t), and therefore, by expanding out the polynomial (y^p + z^p)/(y + z), we see that its integer value is equal to py^(p - 1) modulo t. As x is zero mod t, we must have furthermore that (x^p + y^p)/(x + y) must be y^(p - 1) modulo t (note that this is nonzero modulo t). As the former and latter are both p-th powers by the above lemma, so is their ratio in modulo t arithmetic, which is p, completing the proof.
We can now rephrase Sophie Germain's theorem like so: To establish the (A) case of FLT for exponent p, it suffices to find some prime t such that BOTH the mod-t exponent-p case of FLT is true AND p is not a p-th power modulo t.
Such a t is the auxiliary prime for p discussed above. Note that the truth or falsehood of this condition on t relative to p is decidable by finite search.
A couple more comments:
As a consequence of the multiplicative group modulo a prime being cyclic, we have that, for primes p and t, EVERY value is a p-th power modulo t unless t is 1 mod p. Thus, an auxiliary prime t for odd prime p must be of the form kp + 1. Furthermore, as t cannot be 2, it must be an odd prime, and therefore k must be even. Finally, we can also rule out k divisible by 3 by again invoking the cyclicity of the multiplicative group modulo t (were k divisible by 3, we would have some primitive cube root of 1 modulo t which was furthermore a p-th power; the three powers of this value would sum to zero and thus provide a counterexample to the mod-t exponent-p case of FLT). So when searching for auxiliary t to prime p, we are looking for primes of the form kp + 1 where k is an even value not divisible by 3; for all such k up through 16, Germain and Legendre managed to prove that conversely, whenever kp + 1 is prime, it satisfies all the conditions to be an auxiliary prime for p (with a slight exception for the two cases where p = 3 and k is 14 or 16), which provides auxiliary primes for each p < 197; however, at p = 197, one must go beyond this, to a minimal k of 38.
I do not at the moment know if there are auxiliary primes for each p (and I suspect no one does?), nor how one goes about showing that, for various k, kp + 1 is automatically an auxiliary prime whenever prime (or whether this generalizes beyond k = 16). Oh well. More to learn, always.
The immediate problem with this result, upon first inspection, is that it's not unique to FLT. One can derive other integrals and inequalities that also yield tiny probabilities.
This comment isn't meant to be dismissive. I think it's interesting to explore arguments like this. Math progresses by exploring thoughts. But...
Couldn't we argue in the same way that there are unlikely to be any solutions to the simpler equation x^n = z^n?
After all, taking p(x) = x^(1/n - 1)/n to be the probability that x is an n-th power, as Feynman does, and then integrating p(x^n) dx from x = x_0 to infinity to find the expected number of n-th powers of the form x^n for x > x_0, as Feynman does with p(x^n + y^n), we find for n > 2 that this comes out to 1/(x_0^(n - 2) * n * (n - 2)), which is quite small for sizable x_0 and n.
This yields, for example, that the expected number of solutions (and thus an upper-bound for the probability of the existence of any solutions) to the equation x^100 = z^100 with x > 10 should be 1 out of 98 googol. This is about as certain as certain gets that there are no solutions... and yet solutions are as ubiquitous as ubiquitous gets!
Well, Feynman is assuming that x^n + y^n is as likely as any other number to be an nth power. He shows that under this assumption, FLT is very likely true. So if FLT is false, it probably wouldn't be due to some coincidental counterexample. There would have to be a mathematical structure forcing x^n + y^n to be an nth power in some cases. In your example, the mathematical structure forcing x^n to be an nth power is obvious.
The downside of Feynman's approach is that it can't get you any intuition about the structure, the things that aren't statistical randomness. And whether FLT was true or false depended exactly on whether the structure pointed one way or the other. So I have no idea why he was so confident in this analysis.
Fair enough on your first paragraph. (Though what distinguishes mathematical facts which are "coincidence" from mathematical facts forced true by mathematical structure?)
Actually, I'd say it is odd to find this sort of analysis to give great confidence about the results, not just because it ignores the possibility of structure, but also because it ignores the possibility of coincidence!
After all, there are some things which we probably want to call mathematical coincidences which heuristic argument would tell us are bogglingly unlikely and yet which nonetheless happen. For example, another probabilistic argument: Consider the basic arithmetic 2-ary operations +, -, * , and ^. There are 20!/(10! * 11!) full binary trees with 11 leaf nodes, 10^4 ways to label their internal nodes with one of these 4 operations, and 11! ways to assign the values 0 through 10 to those leaf nodes in some permutation. Thus, there are at most 20!/10! * 10^4 (overall, less than 10^16) values which can be generated using these operators and the natural numbers up through 10 once each (and this is an overestimate, ignoring the structure of, e.g., commutativity of + and * which causes less distinct values to be produced).
We would expect, therefore, that the closest we could get one of those values to line up with a particular unrelated mathematical constant, even focusing only on the fractional part, is no more than about 16 or so decimal digits. If there are thousands of mathematical constants we might compare it to, perhaps we'd get a match to about 20 digits on one.
And yet! And yet (1 + 9^(0 - 4^(6 * 7)))^(3^(2^(10 * 8 + 5))) lines up with e to 18457734525360901453873570 digits.
Now, what's happening here is that we have this other nice lining up of 9^(4^(6 * 7)) = 3^(2^(10 * 8 + 5)), which we can then plug into e ~= (1 + 1/n)^n with huge n. And this, in turn, is because 9 = 3^2 and 4 = 2^2 and 1 + 2 * 6 * 7 = 10 * 8 + 5. There's a bit of an explanation. And yet... surely if anything is a coincidence, this is?
So... I guess what I'm saying is, it is odd to use probabilistic arguments to rule out coincidence. The whole thing that makes a coincidence remarkable is that it is the sort of thing we would consider unlikely, and yet such things do occur, even in mathematics.
I don't consider the example you gave to be a coincidence, because if you know the definition of e then you know why one of those numbers could line up really, really well with e.
I'm basically saying that Feynman's argument rules out seemingly random counterexamples like 27^5 + 84^5 + 110^5 + 133^5 = 144^5, which disproved the sum of powers conjecture.
It seems to Feynman's argument works via the opposite of ruling out coincidences of that sort. It shows "If nothing surprising happens, then we expect very nearly zero counterexamples to Fermat's Last Theorem"... But the "surprising" in "If nothing surprising happens" encompasses both surprising structure (things which are true for some deep, clean reason) AND surprising numeric coincidences (things which are true for no good reason, but just happen, surprisingly, to line up in a nice, unexpected way).
Of course, we now know that FLT is related to deep ideas in number theory.