Two-hundred-terabyte math proof is largest ever

CJefferson · on May 27, 2016

Darn, I had no idea one could get into the media with this kind of stuff.

I had a much larger "proof", where we didn't bother storing all the details, in which we enumerated 718,981,858,383,872 semigroups, towards counting the semigroups of size 10.

Uncompressed, it would have been about 63,000 terabytes just for the semigroups, and about a thousand times that to store the "proof", which is just the tree of the search.

Of course, it would have compressed extremely well, but also I'm not sure it would have had any value, you could rebuild the search tree much faster than you could read it from a disc, and if anyone re-verified our calculation I would prefer they did it by a slightly different search, which would give us much better guarantees of correctness.

nsimplexx · on May 27, 2016

I have a question about computer-generated proofs, as I am doing something similar.

When you encode the search tree as a proof, is there a technical name for that technique?

In my situation I encoded the proof as a sequence which is just the depth-first linear labelling of the tree. Then I proved why the sequence represents a proof. But if there is a most standard terminology/literature on this step, it would make it easier for me to talk about.

CJefferson · on May 27, 2016

I've had a similar problem, there is nothing that useful I'm afraid. Probably best to look into general discussions of trees.

One thing worth trying to do (if you can) is separate the generation of the tree from the tree itself -- hopefully you can imagine the tree existing as a complete object (even though you never store it all), and formulate proofs based on that.

clarus · on May 27, 2016

In the community of the theorem prover Coq, a proof obtained by running a proven correct decision procedure is called a "proof by reflection".

There is also the "proof by certificate" approach. You generate a potentially large proof certificate by an unverified program. Then you run a proven correct (small) checker on the certificate to concude that the theorem is correct.

tel · on May 27, 2016

Proof by exhaustion?

tluyben2 · on May 27, 2016

> Darn, I had no idea one could get into the media with this kind of stuff.

With the right spin anything can get into the media. It helps if no-one used that before; your 63000 are far more impressive but won't get much traction after this. However it is summer now so the coming months it'll be quite easy to get mostly anything in the news(paper).

ifdefdebug · on May 27, 2016

I don't understand your objection. Couldn't you have published just the algorithm of your proof - an deterministic algorithm to reproduce those terabytes of data is arguably a compressed version of the data itself, and thus of the proof.

This problem has much more than just a search tree. They had to check 10 to 2300 possibilities to proof their counter example, so a simple search tree was no option and they had to boil it down to "just under 1 trillion". So lots of maths involved here.

jerf · on May 27, 2016

I don't understand why you think CJefferson could have published their algorithm but the people in the news somehow couldn't. I guarantee you they did not write a 20-terabyte algorithm.

paulddraper · on May 28, 2016

Maybe they had a lot of typewriters and then visited a zoo

CJefferson · on May 27, 2016

Hey, we did lots of maths too! (Well not me personally, but lots was done).

There is actually 12,418,001,077,381,302,684 semigroups of size 10. We only had to find 718,981,858,383,872 by brute force search (which still involved a whole bunch of maths and case splitting), which is 0.006% of the total. There is a whole PhD in the maths involved.

Of course, we could (and have) just published the search which has to be redone, and the maths which is required to prove the piece of search to be done.

John_Baez · on May 27, 2016

He didn't get into the news because he didn't solve a problem for which a famous mathematician (Ronald Graham) had offered a prize. Also, it's somewhat more exciting to use a huge calculation to answer a yes-or-no question that anyone can understand, rather than to count a huge number of fairly obscure gadgets: semigroups of size 10.

(Personally I love semigroups, and I think it's cool that you counted all the ones of size 10. But any story about them in the popular press would focus on the sheer size of the calculation rather than what it actually achieved.)

ifdefdebug · on May 27, 2016

OK, sounds interesting enough. So maybe your publication didn't make it into the news because the problem you solved didn't have Pythagoras in it's name :)

jacobolus · on May 27, 2016

Once you have a full explicit list of all the semigroups of a particular size, is there anything else interesting you can do with it?

CJefferson · on May 27, 2016

They are useful for exploratory maths. If someone has an idea for a theorem "All semigroups with properties X and Y are of type Z", while an automated proof might be major work, a simple check of all semigroups up to a certain size is trivial, and a counter-example doesn't require a complex proof.

However, for numbers of the size I was looking at, storing them is impossible. We were just interested in how many there were (which wasn't known, and we don't know how to calculate without enumerating them through search).

curiousgal · on May 27, 2016

Where the f did you store all of that?!

CJefferson · on May 27, 2016

Oh, nowhere, just dumped it as we generated it. In my experience storing these kinds of things often isn't that useful anyway, when proving correctness of the answer it's as easy to start from scratch as go through a trace of what I did previously.

SilasX · on May 27, 2016

Ah, the benefits of having your problem in PSPACE...

hatsunearu · on May 27, 2016

If you can reduce a proof of a theorem to a finite number of operations, then I'd argue (and you'd agree) that the scheme + program (or just the scheme) is sufficient proof.

Aardwolf · on May 27, 2016

Probably the code used to generate the 200TB was not 200TB itself, right?

So in a sense, the code is the proof, and the 200TB is just something it generates as a side effect.

Houshalter · on May 27, 2016

In fact you can encode any proof to a few hundred lines. Just write a program that brute forces through all possible proofs. If that counts, then no proof need be longer than it's conjecture plus a constant.

timlyo · on May 27, 2016

And then if we get enough monkeys/children in a room tapping away, eventually they'll write enough programs to solve every problem. That's it, maths is solved.

firethief · on May 27, 2016

I don't know, they tried that with Windows Phone apps...

edko · on May 27, 2016

I'm not sure about that. Those "all possible proofs" will be an infinite numerable set. To pick a specific proof from the set, you will need an arbitrarily large number as the index, so your few hundred lines will not be enough (assuming line sizes are finite).

rspeer · on May 27, 2016

I think the "plus a constant" here refers to a bound on the size of the proof. So you don't need the exact index, just a really rough upper bound. (This would, of course, be unreasonably inconvenient to check.)

poppingtonic · on May 27, 2016

Hooray for proof search techniques. How much data would the Godel-numbering process generate, though? I'm assuming such a search would use this, unless I'm mistaken about my mathematical logic.

ifdefdebug · on May 27, 2016

I'm not sure: doesn't the halting problem apply to this? In other words, can you write such a brute force program that is guaranteed to halt after a finite number of steps, providing a yes or no? I guess not.

hyperpape · on May 27, 2016

It doesn't, and it's interesting why. You can enumerate all possible proofs. If there is a proof of a given statement, you will eventually find it. But if there is no proof, you will never know, because it always could be the next proof you haven't looked at.

For the same reason, for any program that halts, you can run it long enough, and see that it halts. But (unless you are lucky enough to detect a loop), if it doesn't halt, you will never witness that, you will only be able to say "it hasn't halted yet".

It definitely can take a while until that kind of asymmetry becomes intuitive.

ifdefdebug · on May 27, 2016

> But if there is no proof, you will never know, because it always could be the next proof you haven't looked at.

But that's exactly what I meant. So the brute force program proposed in the post I replied to doesn't exist, right?

Let's say, you have proposition A. You write a program to brute force proofs for A. If it finds a proof, it will halt and output Yes, if it finds a disproof it will halt and output No, otherwise it will check the next proof.

So if you say that your brute force program can proof/disproof proposition A (and A1, A2, A3, ... for that matter), you will have to proof that your program will halt for sure after a finite number of steps. Now that's proposition B.

If you find a clever proof for proposition B, then you can claim that you wrote a program that can proof or disproof proposition A - maybe in a zillion years, but it can.

If you can't find such a proof for proposition B, you can always try to write a brute force program to proof/disproof proposition B... then C, D, E, etc. :)

And here comes the halting problem: you may be able to proof that a specific program will halt on a specific input, but it can be very hard to proof. But you can't proof that a specific program will halt on every input. This is directly related to Godel's incompleteness of formal systems.

Certhas · on May 27, 2016

Every conjecture that is provable can be proved by the proposed program of length (conjecture + constant) in finite time.

That was the original statement:

> you can encode any proof to a few hundred lines.

This type of encoding is funny of course. You can not determine from the encoding of the proof whether the encoding is the encoding of a proof (as you point out). But if you have a proof, and thus know that a proof exists you can "encode" it very succinctly in this program for whatever that is worth.

ifdefdebug · on May 27, 2016

> you can encode any proof to a few hundred lines.

Yeah, I misread that. The sentence implies the existence of a proof. So I was clearly talking about something different. Thanks.

jerf · on May 27, 2016

There's also the problem of identifying a proof when you have it. Hooray, after running a super-exponential number of steps in the real universe somehow it turns out Turing Machine #38,384,768,209,040,100 halts after fifteen googleplex and change steps. What does that mean?

(This is not criticism, this is elaboration. The approach is useless in practice, but deeply understanding all the reasons why is a good exercise in computer science.)

sp332 · on May 27, 2016

This was an issue in another recent story. https://news.ycombinator.com/item?id=11676918 The incompleteness theorem bites you before the halting problem does.

Ar-Curunir · on May 27, 2016

You just look for all proofs of a certain length. This is equivalent to checking if a Turing machine halts within a bounded number of steps. This is a NP problem.

bweitzman · on May 27, 2016

You are not trying to prove the thing, just trying to encode it, The assumption is that the proof already exists.

tel · on May 27, 2016

There's a lot of fine distinction, and a hell of a lot of study, of these kinds of questions. One thing you might ask of your proofs is that they are immediate—which is to say, they can be (a) grasped and (b) believed, at least in principle, after only a finite amount of work. Further, you should be able to prove that your proof has these properties.

Without these properties, it's hard to call something a proof since it might simply go on forever and never resolve the question in the way you wish it would. This is absolutely the realm of things like the Halting Problem.

So 200TB of data plus a short proof that the search completed could at least in principle be immediately checked in finite work. On the other hand, the mere program which generates this data could not be unless it also comes with a proof of its own termination. We have no general process to generate these, so you must create a specific one for your generation program.

If it's cheaper to just spit out 200TB than communicate why you're sure that your program won't spin, then the better proof is 200TB long.

---

As a side note, this is sort of odd sounding because we're implicitly saying that "small is better" in proofs. In an ultimate sense this is probably true---if I had before me all of the proofs of something I would want to order them and pick the smallest.

But in practice, we cannot enumerate proofs and must expend energy to achieve new ones. Thus, there are at least two kinds of energy optimization which prevail. The first is how much effort is required to achieve a proof and the second is how much effort is required to communicate/transmit the proof to someone else.

This 200TB proof might be well-optimizing the search question and it might also be failing on the communication question.

Dylan16807 · on May 27, 2016

Once you know the answer, you say "Take this program and run it with n=7825". That is your compact finite proof, not the untested program. You'd don't have to meta-prove halting, you just give it the required CPU hours.

tel · on May 28, 2016

While I think what your saying is true, it's frowned upon. It's saying "I've discovered a truly marvelous proof and its at the bottom of this cave". We've dodged the infinity risk, but not the resources involved in communication risk.

So it's kind of just crap UI at this point. I think that given that proofs are essentially social objects, this is a pretty big negative on that style of proof even if it doesn't totally sink the ship.

Dylan16807 · on May 28, 2016

The communication is easy. You show someone your nice compact program, and they don't have to run it to understand it.

A 200TB proof is literally impossible to communicate to a human. Where is the benefit to that formulation?

The huge version is not better for a machine either. 200TB is a lot of storage, and you could rerun 30000 CPU-hours of calculations in under a month on a single semi-high-end 4U server.

tel · on May 30, 2016

But if I don't trust you then your program and limit can only be verified by my execution of it. This might be incredibly or even in feasibly expensive and unless you've also got a resource bound proof then I might be sunk.

On the other hand, code which verified a fully expanded proof may be very simple and easy to check/trust/believe.

prof_hobart · on May 27, 2016

Depends on how you look at it. The 200TB seems to be the data required to prove that none of the possible combinations works.

If it had been temporary data to get to a simple proof of something, then I'd agree that this data was just a side-effect.

But if I'm understanding it correctly, as it stands even if you're told that it stops at 7,825, you'd still need all of this data to prove that it's true.

mafribe · on May 27, 2016

There is an increasing appreciation that large proofs need "proof engineering" that is similar to software engineering. See e.g.

- D. Aspinall, C. Kaliszyk, Towards Formal Proof Metrics.

- G. Klein, Proof Engineering Considered Essential.

- G. Gonthier, Engineering Mathematics: The Odd Order Theorem Proof.

- T. Bourke, M. Daum, G. Klein, R. Kolanski, Challenges and Experiences in Managing Large-Scale Proofs.

JadeNB · on June 2, 2016

Links, unfortunately non-free:

Towards formal proof metrics: http://link.springer.com/chapter/10.1007/978-3-662-49665-7_1...

Proof engineering considered essential: http://link.springer.com/chapter/10.1007/978-3-319-06410-9_2

Engineering mathematics: http://dl.acm.org/citation.cfm?id=2429071

Challenges and experiences …: http://link.springer.com/chapter/10.1007/978-3-642-31374-5_3

infruset · on May 27, 2016

A relevant article on large proofs: http://gallium.inria.fr/blog/large-proofs/

cantagi · on May 27, 2016

"but when you reach 7,825, it is impossible for every Pythagorean triple to be multicoloured"

So they didn't know in advance how long it was going to take or the size of the output?

ColinWright · on May 27, 2016

Correct. They didn't even know if there was a finite number where colouring became impossible, but it was strongly believed there was such a limit. Privately, people had their own ideas - sort of Fermi estimates. I don't know anyone who made such estimates public.

dmichulke · on May 27, 2016

Exactly, the program would have never halted if it were possible to colour all triples.

Worse even, no one would know why it wouldn't halt...

In other words, you have to be lucky to find a counterexample to the given assumption otherwise you're stuck publishing an increased lower bound

userbinator · on May 27, 2016

I think "brute force math" would be a good name for this technique --- that diagram reminds me of finding hash collisions:

http://www.links.org/?p=6

dylanz · on May 27, 2016

Is the data compressed and piped into a known compression algorithm, or is there some other technique used? I'd love to know how the actual program is run. Side note... the visual output is really interesting to see, and I'll admit I crossed my eyes at it to see if I could see some sort of pattern (read: there was no sailboat).

nrfc · on May 27, 2016

The authors used this tool:

https://www.cs.utexas.edu/~marijn/drat-trim/

There are publications and examples on the website that explain how the compression the authors used works.

vanderZwan · on May 27, 2016

> Although the computer solution has cracked the Boolean Pythagorean triples problem, it hasn’t provided an underlying reason why the colouring is impossible, or explored whether the number 7,825 is meaningful, says Kullmann.

Kind of reminds me of why we were always taught to avoid proof by induction (the mathematical kind[0]) when possible.

Also, funny coincidence, but the shortest math paper published in a serious journal[1] was also a "no insight into why" kind of proof.

[0] https://en.wikipedia.org/wiki/Mathematical_induction

[1] http://www.openculture.com/2015/04/shortest-known-paper-in-a...

j2kun · on May 27, 2016

That's the problem with counterexample proofs, though. If something is only true for a large-but-finite range of numbers, there might not be a satisfying reason why.

gsam · on May 27, 2016

If you verified the code performed the correct function, ensured it ran correctly and computed the result, then the code itself serves as most of the proof (not the 200TB). Really what you want to be peer-reviewed is the code, not the swathes of data.

nrfc · on May 27, 2016

> If you verified the code performed the correct function, ensured it ran correctly and computed the result, then the code itself serves as most of the proof

But then you have to trust their program doesn't have any bugs, and checking the proof requires a lot of computational power.

Instead, the authors produced a very large input file to a very small program. This way, instead of trusting the program they used to produce the input file, you only need to trust:

1) the input file corresponds to the theorem statement in their paper; and

2) the certificate checking program is correct

The benefit of this approach over the one you suggest are numerous:

* The effort of part 2 amortizes and isn't specific to any particular problem.

* The computational cost of re-checking their proof is considerably smaller than the cost of finding the proof certificate, allowing for the proof to be re-checked by anyone with a bit of spare computing power (as opposed to requiring a super computer in order to replicate the result).

umanwizard · on May 27, 2016

> But then you have to trust their program doesn't have any bugs

This is no different from a typical math paper. Proofs can have "bugs", too; and sometimes they're fatal. That's -- in theory -- one of the points of peer review.

nrfc · on May 27, 2016

> This is no different from a typical math paper

It's substantially different from a typical math paper.

First because programs are typically much larger than even large mathematical proofs (more on this below).

Second, because even after you trust the program, you then still have have to run the program to make sure the paper's result is correct. Far better to run the program once and then produce something you can check quickly than to require every peer reviewer to buy hundreds/thousands of dollars of compute time. Think P vs. NP: why force the reviewer to solve an NP problem when you can just as easily hand them a P problem.

> Proofs can have "bugs", too

But the reviewer's bug checking obligation is typically isolated to the paper at hand; i.e., the reviewer doesn't have to worry about the correctness of cited papers -- it's assumed those results are correct. By contrast, software implementations -- even for purely mathematical code -- often contain tens of thousands of lines of implicitly trusted code [1].

Reviewers therefore (rightly!) say of ad hoc programs submitted as a portion of a proof: "sorry, I can't possibly know whether there are any important bugs in these tens of thousands of lines of code upon when you depend".

As a result, the SAT and theorem proving communities have developed highly trustworthy proof checkers so that the portion of code the reviewer has to trust can be meaningfully peer reviewed (and, more-over, peer reviewed apart from any particular theorem).

[1] This is true even if you don't include things like the underlying operating system. Standard libraries, mathematics packages, compiler implementations, solver implementations, specialized software packages for the application domain, etc. etc. Some of this software a reasonable person will trust, but some of it really isn't interrogated well enough for the purpose of mathematical proof... and even just sorting out what's trustworthy and what's not can take a significant amount of time.

alimw · on May 28, 2016

Surely the value of this is that it provides a counterexample to a long-standing conjecture. The proof that it is a counterexample may not be too enlightening, but that's hardly the point.

jbmorgado · on May 27, 2016

This isn't mathematics, this is data analysis.

A mathematician doesn't come up with a research where he says: "Ei look, we are now extremely confident that this theorem applies in every case... we don't have a full mathematical proof for it but I can show you all these cases where it holds true."

No, a mathematician proposes a result and then goes on to prove or disprove mathematically that it holds true under specific conditions in absolutely every case where those conditions are met.

auggierose · on May 27, 2016

Of course this is mathematics. Obviously, it would be nicer to have a real understanding of what's going on, but now that people know what to prove, such a proof might come along soon.

mathetic · on May 27, 2016

You are either completely oblivious to computer assisted/generated proofs or you have a niche definition of mathematics for yourself.

Also they are not saying "all these cases where it holds true," they are saying "in all the cases, it holds true," which is the definition for validity.

BugsBunnySan · on May 27, 2016

IANAMathematician, but, a brute force proof is still a proof though, isn't it?

If I have the theory, that all positive integers smaller than 5 are also smaller than 6, I would assume I'm mathematically allowed to 'brute force' the proof by enumerating all the numbers > 0 and < 5 and then testing for each that it's also < 6.

If I can prove I enumerated all the positive integers smaller than 5 and didn't find any exceptions, I guess I proved my theory...

It's a silly way to prove my silly theory, but it is a proof. (I have also a truly remarkable proof, which this comment is too small to contain. ;) )

dagw · on May 27, 2016

Computer assisted proofs are historically frowned upon because you eventually end up in a black box you have to blindly trust. Can you verify that your code is correct according to the language spec? Can you verify that your compiler is bug free and follows the spec? Can you verify that your CPU is bug free and runs the output from your compiler correctly? Can you verify that your storage medium is correct and hasn't corrupted any of your intermediate results etc. etc.

Practically people will eventually say that if your test has been run N times using M different compilers and P different computers and they all give the same answer then you're probably correct, but having a proof that humans can reason through from beginning to end makes people feel more comfortable.

That being said computer assisted proofs are becoming more and more common and mathematicians are getting more and more comfortable with them as the verification tools surrounding them keep getting better. And even if computers aren't used for the final proof they are often used for intermediate tests. For example if you're making a statement about all primes it makes sense to quickly test the first few hundred million primes to make sure there aren't any obvious counter-examples.

thfuran · on May 27, 2016

>it makes sense to quickly test the first few hundred million primes to make sure there aren't any obvious counter-examples

It's kind of absurd the sorts of endeavors which computers have rendered not only possible, but relatively trivial.

throwawaykeno · on May 27, 2016

FWIW the answer to all of the questions you ask in the first paragraph is "yes, but come on, that level of skepticism is a waste of time and money". It's like demanding proof that the reviewer wasn't hallucinating when reviewing the paper.

> but having a proof that humans can reason through from beginning to end makes people feel more comfortable.

Saying that humans can't reason about computer-checked proofs is like saying humans can't reason about source code; which is to say, it's obviously false in the general case.

j2kun · on May 27, 2016

> IANAMathematician, but, a brute force proof is still a proof though, isn't it?

IAAMathematician and yes, it is still a proof.

gone35 · on May 27, 2016

Conjectures are mathematics too.

[1] https://en.wikipedia.org/wiki/Conjecture

infinidim · on May 27, 2016

I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain.