Money quote for those checking the comments first:
"Furthermore, we argue that the C standard does not allow Turing complete implementations, and that its evaluation semantics does not preserve typing.
Finally, we claim that no strictly conforming programs exist. That is, there is no C program for which the standard can guarantee that it will not crash."
Well at least the section explaining why they think C is not Turing complete is beyond ridiculous (TLDR: memory is not infinite so there exist a state machine... WTF????? technically mathematically true but has always been completely irrelevant when talking about real implementations)
And it seems to be written with a serious tone, it is probably not even a joke!
The point isn't that memory isn't infinite, but that C's memory model isn't infinite. There are other languages that do have an infinite memory model (for instance, any language with a builtin bignum type).
I mean, if you want to judge ridiculousness, then their explanation of why there are no strictly conforming programs (specifically: even a program with an empty main function cannot be guaranteed not to overflow the stack) is also ridiculous, but it's still interesting.
I agree that it is interesting. For example, what if you can implement, in C, an interpreter for a language that has a bignum type? I am thinking along the lines of Greenspun's tenth rule (which I realize is not even close to a rigorous argument.)
I'd say the interpreter is interpreting a non-Turing complete subset of the bignum language. That said, what is and isn't "Turing complete" is hardly relevant for practical purposes. C has obviously been very useful as a language, despite whatever theoretical statements can be proven about it.
It is not clear to me why it would be a subset (I'm assuming proper subset here) or why it would have to be non- Turing-complete.
I agree that none of this has any obvious relevance to the practical usefulness of C, but all of us here in this thread have chosen to put that aside, at least for now, in order to contemplate a somewhat esoteric though less practical question.
The key distinction to note is the one between specifications and implementations.
As a small example, consider Brainfuck. The Brainfuck specification (if such a thing exists) has no upper bound on the number of memory cells, but practical implementations generally do have a limit that "useful" Brainfuck programs will rarely hit. The specification describes a TC language, while the implementation does not.
Therefore, if a language is TC according to its specification, but has an implementation in a language that is shown to not be TC, then it follows that the implementation must be of a non-TC subset of the language. This is, of course, assuming both that the specification has been proven TC, and that the implementing language has been proven non-TC.
I believe the point being made in the paper is that the C specification itself is restrictive enough that it describes a non-TC language. In other words, any implementation of C that is truly Turing complete would be violating the specification in some way. In that case, it follows that a C implementation of a TC language (such as the bignum language) must only be an implementation of a non-TC subset of the language.
(As an aside, consider that nothing running on an isolated machine will ever be truly TC, since an isolated machine has a finite amount of memory and so can't be TC. It's not that much of an inferential jump to see that this kind of TC-breaking limitation could end up in the specification of C, a language quite close to the metal.)
Their reasoning why there cannot be strictly conforming implementations of the language is also like that:
The C standard does not make any guarantees about the size of the call stack. But any program program needs to rely on this implementation-defined property (cf a pathological implementation were the initial call to main() already overflows) and thus cannot be strictly conforming.
It does come off as a bit pedantic though. What it comes down to, is that this specification was written by engineers and not mathematicians. These engineers didn't really care how (in)convenient it was to write proofs about the C semantics, they cared about the language being fast and easy to implement. In practice, any real-world C implementation will allow you to call the main function without overflowing the stack, and that much is obvious, otherwise nobody would use those implementations, so it should be taken as a a given, an axiom.
Why doesn't the C standard guarantee a minimum stack size? Because there could be tiny microcontrollers with just 64 bytes of stack on which you can implement some C subset. At the same time, the standard doesn't want to define specifics of stack frame layouts, so, not knowing how much memory is available on your target machine, and not wanting to constrain how a stack frame for some source function is laid out as part of the C standard, it's impossible to compute how deep you can go on your call stack without architectural details, it's implementation dependent. It makes a lot of sense that the standard is the way it is, from an engineering point of view.
The problem with this apology for the standard -- that nobody would interpret it so unreasonably -- is that many programmers (like me) wrote programs assuming what seemed like reasonable bounds on how far compiler writers would go in their language lawyering, which a decade or two later got flouted, making our old code start mysteriously breaking, in a hard-to-figure way, only in the latest versions of the compilers, which at the time were notorious for compiler bugs. If you trust the same compiler writers to respect today's common idea of reasonable engineering tradeoffs, well, I hope it works out better for you.
While I agree with most of what you wrote, it sounds like you are blaming the standard. Seems to me the proper blame is the unreasonableness of the language-lawyering by compiler engineers...which as you point out is a fairly recent phenomenon.
I always like to say that my first C compiler was pre-ANSI-C, so by today's language-lawyering all behavior was undefined and therefore the compiler could have done anything at all. Yet for some strange reason the code produced was more predictable than today.
Be careful: They did not say that C is not Turing complete. They say that the C Standard does not define a language which is Turing complete. There is a subtle distinction between these two statements, but an important one. The larger point that the paper is making is that the C Standard is full of holes. Thus as a programmer you cannot rely on any compiler that implements this "standard" to do anything (even trivial things like allocate enough memory on the stack). Yes, it probably will work most of the time, but will it work always? No, well, then it's not much of a standard, more of a set of loose guidelines. If you want to write secure, performant portable code: you need an actual standard. I think this is the point that the paper is making. Better still, it is highlighting the places where the standard is poorly defined so that it could be fixed.
Just because it's irrelevant for real implementations doesn't mean it's not theoretically irrelevant. All of our computers can be modeled with sequential logic. That doesn't mean it's worthless to study more abstract properties.
Next, they proved that the C standard didn't define a programming language but a variety of rutabaga, and died of starvation by trying to eat source code listings.
Seriously. There's a basic concept involved here called "Quality Of Implementation" which effectively guarantees that the programs produced by a compiler will work as long as you stay within reasonable bounds.
Might those bounds be a bit constrained on embedded systems? Sure. Them's the breaks. However, having a stack size of zero is not worth worrying about.
When you say "C" and "Turing machine" in the same sentence, you should immediately think of how you would go about explaining a proof of, say, the fact that a one tape TM can simulate a k tape TM using C. C is just one of many languages in which you can express the state transition diagram of a TM.
No, any undergraduate CS major at a top 5 university should be able to do this. It's possible to do this for any programming language. If you need help then it's on p. 161 Theorem 7.2 of Hopcroft and Ullman (1979).
There are so many things wrong with this paper, and nearly all of them have to do with their "proof" that C is not Turing complete.
First off, they say that C isn't complete because its functions for reading/writing files are "bounded". Since most other programming languages are really just implemented in C, wouldn't that mean that either ALL programming languages aren't Turing complete, or that C is?
Secondly, they mention that programming languages that have "bignum" types get around the "bounded" memory problem (because you can access "infinite" memory). But you can implement a bignum in C...
> Since most other programming languages are really just implemented in C, wouldn't that mean that either ALL programming languages aren't Turing complete, or that C is?
No: it means that the imperfect emulation of those programming languages, as implemented in C, will fail to be Turing complete. Put another way, there are programs that are valid Haskell programs that can be proved by the rules of Haskell to halt on certain inputs and generate particular results, which will not be able to execute as the C emulator will run out of state in which to perform the simulation. The interesting point to make here is that it isn't just that C is being limited by the real world constraints of hardware, but that C is itself limited as if it were hardware. That said, this was always an obvious result to anyone who ever bothered asking the question and understands the formalisms, so I am not certain of the contribution of this paper ;P.
What authors point out is not trivial observation that programs are running on
finite-memory machines and thus cannot be Turing complete. What they do point
out is that even if C programs have been running on infinite-memory machines
the memory accessible to C programs is still bounded.
In general I would agree with authors on this point as long as we exclude
external storage. They also hint that, while it would be trivial to provide
interface to infinite memory tape, the standardized part of file I/O does not
seem to do that. They claim that it is prohibited by bounded return value of
ftell. I would disagree, because ftell may return an error value in this
scenario as well. Moreover to access infinite tape you don't even need ftell,
relative movement provided fseek suffices.
As the author points out, the standard does not mention stack overflow at all. From this, the author concludes an implementation that always overflows stack is conforming, therefore no strictly conforming programs exist, therefore all implementations are conforming. More reasonable conclusion is that under the current standard, implementations are not allowed to overflow stack, therefore no bounded storage conforming implementation is possible, and all current implementations are buggy. Is anyone surprised that all current implementations are buggy?
It would have been more interesting if the author discussed Microsoft’s solution to this problem, namely __try/__except and EXCEPTION_STACK_OVERFLOW, which is actually used by programmers to recover from stack overflow.
tl;dr; Some interesting points about bits of C that are undefined. Some points valid, the other ones appear to be self-panicking unless you're heavily in the theorem-proving
correctness camp.
I remain unconvinced that memcmp of pointer values is at all interesting (viz the pointer normalization dancing acts of the early x86 memory models, ugh).
From a working programmer's point of view, it's useful to know some of the points in the paper; just read the middle third and skim the rest.
Interesting article that exposes some problems of C, that some take for granted or ignore.
However the arguing that C is not Turing complete after you take away the IO and limit yourself on an non-abstract machine is silly. C can simulate any Turing machine limited by the same previously mentioned rules and the same holds if you remove those rules for both systems, not just for one.
"Furthermore, we argue that the C standard does not allow Turing complete implementations, and that its evaluation semantics does not preserve typing. Finally, we claim that no strictly conforming programs exist. That is, there is no C program for which the standard can guarantee that it will not crash."