Hacker News new | past | comments | ask | show | jobs | submit login
Proving false in Coq using an implementation bug (github.com/clarus)
125 points by clarus on March 24, 2015 | hide | past | favorite | 61 comments



Well, I saw the title and was all prepared to link to Falso, but alas, I am beaten to the punch. However I am even more impressed by the fact that since the last time I looked at the Falso page, they now have a list of available implementations, which has already been updated now to include Coq!

Meanwhile, my three year old is now looking over my shoulder asking "What's so funny, dad? What's so funny?" and I have to admit... I have no idea how to answer that.


I'm 35. Does that help? Can you try to explain it to me then? :)

(I looked at the Readme, looked at the All.v file and .. I don't get it.)


Coq is a proof assistant, where "proof" is in the mathematical sense of the term. Thanks to Curry-Howard isomorphism [1], it happens that there's a mapping between proofs and programs, meaning that the "proof assistant" can do things like "prove a sorting algorithm correctly sorts all inputs" in a relatively straightforward manner. I mean, still fairly complicated and not something that you'll encounter in an undergrad computer science education in general, but feasible. For instance, a similar tool was recently used to prove that the sorting algorithm in Python and Java had a bug [2], which was a bit surprising given how well-exercised they were.

Something that a computer science program should cover is that a logical implication is always true when the antecedent is false. X -> Y is always true when X is false. Thus, if you can convince your system to prove a false term, you can set that as the X and write anything you want for the Y, and prove anything. (That statement is actually glossing over a bit of discussion on how you can legally introduce statements into your system, and there's variance between the logics on that point anyhow, but that's what it comes out to in most if not all systems.) Therefore, if you can prove a false statement, you can literally prove anything.

This turns out to be an unhelpful characteristic for a proof system to exhibit. In the case of Coq, it means you can now convince it to take any program as satisfying any property. Any program, satisfying any property. You want to prove that "return []" is a proper sorter for any input? This can do that. Heck, you want to prove that "return []" is a valid fully HTTP5 rendering engine? We can do that too!

In this particular case, it exploits a bug in Coq to do that, but the fear mathematicians have developed over the course of the 20th century is that while the bug may be obvious, deeper, more subtle problems in Coq could conceivably be used to prove "False" oven the course of a proof, accidentally and buried in potentially hundreds or thousands of statements, subtly rendering all such proofs flawed. Such are the things that keep mathematicians up at night.

As long as I'm explaining things, let's also explain the joke in Falso. There are, as the page says, a number of different axiom systems. Some of them are simply wildly different; for instance, Euclid's axioms create Euclidean geometry, whereas axioms about vertices and edges create graph theory. Some are more related... the 20th century resulted in several "set theories", which are ultimately characterized by different axiom sets that lead to different places. Axioms are neither right nor wrong, they are the definition of a branch of math, so you can choose them as you like, then see where they take you. Per the previous paragraphs, you prefer not to have an axiom system that allows you to prove false because it reveals your system doesn't actually say anything because it just says "Yes!" no matter what you ask. In some sense this isn't "invalid" or "wrong" either, it's just useless. (And in some sense, there's only one axiom set that just says "return true", there's just more and less complicated ways of spelling it, so there's not much value in trying to "study" one because it's identical to "return true" anyhow.)

The joke in Falso is that it simply takes as an axiom "For all P, the proof of P is the null statement", which, for the purposes of this post, is essentially "Assume false is true." From this, Falso can therefore prove... anything! In one statement! At this point the "comparison table" ought to start making sense. But the page is full of other little mathematician jokes... for instance:

    Let us prove that P = NP by contradiction. Assume that P ≠ NP.
    Then, by the axiom of the Falso system, we have a contradiction.
    Therefore, P = NP.
The joke here being that under the Falso system, "proof by contradiction" isn't actually necessary; you want "P = NP"? Just set it as the "P" in the axiom and read it off in one step, "The empty statement proves P = NP". Proof by contradiction is unnecessary frippery providing the thinnest, thinnest of covers over the triviality of the proof, which is itself the joke. As opposed to a huge proof of dozens of steps, which could be funny in its own way, it is also mathematician-funny to provide the smallest possible obfuscation. (The remainder of the joke is a slightly disguised allusion to the fact that "P ≠ NP" is equally trivially provable under Falso. Estatis Inc. need not wait anxiously by their mailbox for their million dollars from the Clay institute.)

Mathematicians have a strange sense of humor. Strange, but wonderful.

As I said, the last time I encountered this page [3], which HN says was 2 years ago, it didn't have any implementations, it was just an extended joke. The fact that it is now actually "available" is also sort of funny.

Oh, and finally, I can't read or use Coq either. Plus, even if you could, this is ultimately a bug exploit, not a "real proof", which would make it even harder to understand how what is being done relates to the final result; it's like seeing some random-looking assembler that turns out to produce arbitrary code execution due to some quirky combination of bugs, there isn't any apparently relationship between what the assembler looks like it's doing and the result it actually produces. Haskell's close enough for me for now. I'm also willing to wait and see if homotopy type theory follows through on its promises to make proofs simpler... at the moment, as I am ultimately a software engineer and not a mathematician, I can't help but see current proof technology as too much work for too little gain. I wish all those working to change that assessment all the best, though.

[1]: http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspond...

[2]: http://envisage-project.eu/proving-android-java-and-python-s...

[3]: https://news.ycombinator.com/item?id=5677646


The way the exploit works isn't actually that bad. It just has to do with exploiting Coq's assumption that its bytecode VM evaluator is equivalent to normal CBV reduction. Turns out that's false---though that was certainly always a potential weakness in that assumption.


Yes, if I did not make it clear, what was actually demonstrated here was not the sort of scarybad mathematical weakness that can only be extracted from a long proof that I was talking about, it's basically just a bug in Coq, and in the long term has little consequence other than providing a bit of fun.


When proving there are choices of "proof systems" to use. Each have different properties. One really vital property is non-contradiction. "Proving False" (note the capital 'F') is terminology for showing contradiction. From False you can (trivially) prove anything regardless of whether it is true.

Falso is a joke proof system which assumes False. This makes proofs in the system very short since they are all of the form "If False, [conclusion] is true; False is true; Therefore [conclusion] is true; QED."

Practically, proving False undermines the value of a proof system. Thus Coq, like all real proof systems, would like to ensure that does not contain contradiction.

The first joke is that this bug calls itself an implementation of Falso in Coq---e.g., a demonstration that Coq has a bug sufficiently dangerous to admit contradiction.

The second joke is that Falso's site---which is tongue-in-cheek hyperbolic on the supposed "benefits" of Falso as a proof system---lists "implementations" of Falso. In other words, that section is a "wall of shame" for proof systems which have flaws sufficient to "prove False".

---

To explain the exploit it has to do with a weakness in the `vm_compute` instruction under large sum types.

Essentially, Coq is based on the idea that "generalized lambda terms" are useful as proofs in a proof system. Part of this usefulness arises from the fact that their "proof system" meaning is invariant under evaluation. Normally evaluation of large lambda terms is slow so `vm_compute` compiles them to bytecode in a VM [0] and executes them there.

Apparently the VM chokes on sums larger than 256 variants. Since Coq assumes that "VM computation" is equivalent to "standard lambda term reduction". The VM bug breaks that equivalence and allows us to state that `true = false` which is contradiction.

---

Impact-wise this is pretty minor. It ought to be fixed since accelerating verification is important for UX purposes, but it doesn't show a weakness in the core calculus---the weakness arises from the assumption that VM evaluation is substitutable for standard, theory-driven lambda term reduction. Clearly this is false.

A much worse weakness, for example, was Girard's Paradox which showed that theories of a certain simplifying form were contradictory [1].

[0] https://coq.inria.fr/distrib/current/refman/Reference-Manual...

[1] https://coq.inria.fr/library/Coq.Logic.Hurkens.html


Bahaha... Available implementations is genius.


This incorrectly passes the compiler, which is certainly a bug, but it does not pass the validator.

  $ ./configure.sh
  $ make
  coqdep -c -slash -R . Falso "All.v" > "All.v.d" || ( RV=$?; rm -f "All.v.d"; exit ${RV} )
  coqc  -q  -R . Falso   All
  $ make validate
  coqchk -silent -o -R . Falso All
  Type error
  Makefile:133: recipe for target 'validate' failed
  make: *** [validate] Error 1


What is the validator doing that the compiler isn't?


My understanding is that it's just smaller. By focusing entirely on validation it opens up less room for bugs.


Oh my! That `||` is gorgeous. I hadn't seen that before.


It's like the opposite of && (continue until one alternative is false), for if you want to continue with alternatives in sequence until one alternative becomes true: a || b || c || ...

This usage is commonly found in other places than the shell, like JavaScript, Perl, and PHP.


Thanks, I know `||` in general--I meant I hadn't seen the specific construction GP had used. The error handling to delete the output file in case the command failed is really elegant.


It's actually a little dangerous; what happens when you hit "^C" and the process dies before getting a chance to clean up? It's safer to do:

  some_cmd >foo.tmp && mv foo.tmp foo
The mv will do an atomic rename(), so that you either get the fully formed contents, or nothing at all.


Only if you interrupt the actual `rm`, no? If you interrupt the main command, you won't get a 0 exit code so the `||` branch still executes.

Your mechanism is clearly safer, but the user is required to clean up `foo.tmp` themselves in case of failure, so it's not really comparable.


There's a problem with dependently typed languages, neither of them have a small enough kernel to be verified formally.

As far as I know, the only system which achieves this goal is HOL.


The Calculus of Constructions is dependently typed, and it has a reasonably sized kernel, compared to HOL and ZFC:

Freek Wiedijk, "Is ZF a hack?", http://www.cs.ru.nl/~freek/zfc-etc/zfc-etc.pdf

OTOH, Coq is based on the Calculus of Inductive Constructions, which is both (a) more complex that the calculus of constructions, and (b) has an implementation in Coq that changes rapidly and is not formalized in any published paper, as I understand it.

Finally, do you know if this was a bug in the kernel, or if some other component (the compiler or caml virtual machine) can be blamed? It apparently was a problem with `vm_compute`, which I don't think I've ever used.


>The Calculus of Constructions is dependently typed, and it has a reasonably sized kernel, compared to HOL and ZFC:

However, Calculus of Constructions doesn't have inductive constructions, without which it's very hard to formalize interesting mathematics in it.

>Finally, do you know if this was a bug in the kernel, or if some other component (the compiler or caml virtual machine) can be blamed? It apparently was a problem with `vm_compute`, which I don't think I've ever used.

Actually, I am aware of another, much more serious problem with CoIC formalism which was found when Coq was tried to be used to formalize homotopy type theory. It was a bug in CoIC, not a software bug.


> Actually, I am aware of another, much more serious problem with CoIC formalism which was found when Coq was tried to be used to formalize homotopy type theory. It was a bug in CoIC, not a software bug.

Do you have any references? Intrigued :)


The notion of guardedness (for structural recursion) was a bit too permissive. This lead to inconsistency if you assume e.g. (False->False = True) (which is implied by propositional extensionality, and by the univalence axiom). The same issue affected Agda also. It was discussed a lot on the Coq and Agda mailing lists, e.g. https://sympa.inria.fr/sympa/arc/coq-club/2013-12/msg00119.h... http://agda.chalmers.narkive.com/E2UeRTOx/re-coq-club-propos...


> However, Calculus of Constructions doesn't have inductive constructions, without which it's very hard to formalize interesting mathematics in it.

Inductive types is something I've wondered about for a couple of weeks.

I know it's possible to simulate inductive types in some formalisms, then add them as a conservative extension. I think this is how HOL-Light works - inductive types aren't in the logical kernel, but they can be added without any new axioms.

To what extent is this possible in the Calculus of Constructions without inductive types?

Wiedijk has pointed out that it's impossible to prove "0 =/= 1" in plain CoC. Is this related?

One of the original CiC papers can be read here:

Frank Pfenning, Christine Paulin-Mohring, "Inductively Defined Types in the Calculus of Constructions", http://repository.cmu.edu/cgi/viewcontent.cgi?article=2907&c...

It makes some claims about what can't be proven in plain CoC that I don't know the justifications for.


Actually, AFAIU, you can define simple inductive types via church literals. However, you can't define induction principle for them. However, without induction principles, you can't prove anything useful.


You can define induction, but you can't prove it, you need to introduce it as an assumption. That's inconvenient.


You can define induction operator, but it won't have required type.


For example, let's see how to define induction principle for Nat.

We should define two terms with the following types:

NatRec : (R : Set) -> R -> (R -> R) -> Nat -> R

NatInd : (C : Nat -> Set) -> (C zero) -> ((n : Nat) -> C n -> C (suc n)) -> (n : Nat) -> (C n)

For the first we can represent zero and suc as functions of the first type (it will be church literals). However, we can't do the same for the second. We can create a term which has that type but it won't typecheck.


Can you explain further?


See my comment above.


Ok. But if you just introduce an symbol with type NatInd, everything works fine?


Although this proof, it can be found in [1], is subject to the constraints that Goedel's 2nd theorem enforces: the proof uses a stronger logic that HOL itself. In this case it uses a large cardinal axiom.

[1] J. Harrison, Towards self-verification of HOL Light. https://www.cl.cam.ac.uk/~jrh13/papers/holhol.html


I've always thought that Coq type checker was small enough to be formally verified.


What must be trusted in Coq in the kernel, which contains 30,000 lines of code for the latest development version.

It cannot be completely verified by Coq itself due to the Gödel theorem (well, now with this bug this is possible). Still, some part of it can be verified, like the byte-code interpreter. There is actually an ongoing project to make a certified JIT compiler for Coq: http://www.maximedenes.fr/download/coqonut.pdf


> It cannot be completely verified by Coq itself due to the Gödel theorem

What can't be proved in Coq is that the proof system is consistent.

However, we can in theory verify that the implementation of the proof system satisfies its specification (edit: maybe that's what you were saying!).


Doesn't the Gödel theorem say that such proof cannot be found for all conceivable systems; but that for any particular system, a proof could in principle be found?


I think it's something like this: you can't prove the consistency of a formal system using that same formal system, unless it is in fact inconsistent, but you can model a formal system from inside a larger formal system and then prove things like consistency about the smaller system from inside the larger one. But then the consistency of the larger system is unproven and you're stuck with an endless regress into ever larger FS's as you pursue the chimera.


Only for sufficiently powerful systems, I believe.

Things like first order logic are too simple to introduce the kinds of statements Godel used to prove the incompleteness theorem about arithmetic.


Weirdly enough, as Gentzen showed, you can use a theory that is "sideways" to first-order logic, neither strictly more powerful nor weaker nor equivalent in power, to demonstrate the consistency of first-order logic. Strong normalization theorems are used for typed lambda calculi like the Calculus of Inductive Constructions to show consistency, as well.


It's for _any_ such system. Edit : it says that if there is such a proof, then the system in inconsistant.


Is it possible to prove most of the system so that the remaining unprovable parts are minimized?


Unfortunately, it is sometimes the case that the remaining unprovable part is the most epistemologically dubious part.

See, for instance, "Coq in Coq", by Bruno Barras and Benjamin Werner, http://www.lix.polytechnique.fr/~barras/publi/coqincoq.pdf , in which the Calculus of Inductive Constructions is used to prove the consistency of the Calculus of Constructions.

See also John Harrison's "Towards self-verification of HOL Light", http://www.cl.cam.ac.uk/~jrh13/papers/holhol.pdf , and Gentzen's consistency proof, https://en.wikipedia.org/wiki/Gentzen%27s_consistency_proof , which use similar strategies (by assuming the existence of an inaccessible and assuming recursion up to epsilon_0).

Sometimes one can get relative consistency results. For instance, I believe it is provable in ZF that if ZF is consistent, so if ZF+AC+(V=L)+GCH.


Yep.

And there are proof systems that use this approach (i.e. build a series of provers that each prove the next (more complex) one).


It's not small at all. It's about 30KLOC of code.


Anyone care to ELI5?


Coq is a proof assistant. It helps you to prove things, or at least check that your proofs are correct. For instance, people use it to prove mathematical theorems, or correctness of critical software.

But apparently, there's a bug in the system that allows you to prove any properties, even invalid ones.

So suppose that you proved Fermat last theorem in Coq, now you're not sure your proof is correct anymore since the deduction system is flawed!

In theory, it invalidates all the proofs written in Coq. In practice, I don't think it's a big deal because it's unlikely that those proofs rely on this bug. But it's pretty funny to see that a system designed to be the ultimate verification tool is bugged.


Logic systems can have bugs in them that make them inconsistent, which means that you can prove false (or bottom or whatever), which means that you can prove anything in this logic. And that, in turn, makes the logic system useless since for any given formula, it is provable in that logic. And being able to prove everything is not fun.

Then you have to patch your logic and make sure that none of your proofs relied on that inconsistency. Or something like that.


You can re-run your proofs once the bug is patched, so its not like this makes all those proofs useless. Actually, that is the nice thing about Coq.


How do you know there isn't another, yet undiscovered, bug?


You don't, but so what? Even proofs done well by hand can have obscure mistakes; most of the time they don't invalidate the proof. Take a proof as strong evidence of correctness and it all works out from a epistemological point of view.


Can you explain what the program itself is doing to do that? That is, can you explain the Coq program itself (and how it triggers the bug)?


The program defines a type `t` with 257 constructors (`C_0` through to `C_256`). As an analogy, the type `boolean` has two constructors (`true` and `false`).

Each of these constructors takes a natural number as an argument.

It then defines a function `is_256` which checks whether it's argument has the form `C_256 n` for some `n`, returning a boolean.

It then defines the value `falso` of type `False` (which shouldn't be possible). It does this by defining two values: one has type `is_256 (C_256 0) = true`, the other has type `is_256 (C_256 0) = false`. This is clearly a contradiction, which is used to construct the value `falso`.

The reason one value gives `true` and the other `false` is that the former is getting calculated by the regular Coq implementation, whilst the latter is getting calculated by the buggy `vm_compute` alternative. vm_compute should give identical results, but faster; but since there's a bug, we can derive a contradiction by mixing correctly-calculated values with incorrectly-calculated values.


How does proving something false as true gives the logic system the ability to prove anything ? Can you explain it with an example


Any statement is either true or false. If we can prove that false is true, then any statement is true.

EDIT: For example: 1+1=3. That's false. We've learnt that false=true, therefore 1+1=3 is true.


I don't think it's that easy. Not every statement is either true or false in Coq, you'd need to accept the law of excluded middle for that.


How much does that invalidates Coq results ? like the 4 colors theorem.


AFAIK the four color theorem was not first proved with Coq, and people consider it correct regardless of whether Coq has bugs.


Yes, the four color theorem was proven by Appel and Haken on an IBM 370-168 at the University of Illinois in June 1976. Coq was released in May 1989.


I'd be surprised Appel and Haken did a formal proof on their machine -- I thought they made a conventional proof with a machine-assisted enumeration of special-cases.


My apologies for the lack of clarity: I was following the thought path that was concerned with the validity of the four color theorem proof because of the Coq bug.


It shouldn't, really. It'll only invalidate results that

a) rely on this bug, and

b) haven't been formally proven any other way.

It is a valid concern when talking about theorem provers, though.


Too bad this comes too late to earn a bounty in bitcoin at Proof Market. https://proofmarket.org/problem/view/12


Obligatory refrence:

> Beware of bugs in the above code; I have only proved it correct, not tried it.

[Donald Knuth]


Fortunately, this technology allows us to prove code correct and be sure about it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: