Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> parsing C++ is mostly equivalent to becoming a C++ compiler.

It reallt isn't. Parsing a languagr just means validating its correctness wrt a grammar and in the process extract some information. Parsing something is just the first stage and a one of many stages required to map C++ source code to valid binaries.



The presence of template specialisations and constexpr functions means that the GP is right here; you cannot decide whether an arbitrary piece of C++ is syntactically valid without instantiating templates and/or interpreting constexpr functions. Consider

    template <int>
    struct foo {
        template <int>
        static int bar(int);
    };

    template <>
    struct foo<8> {
        static const int bar = 99;
    };

    constexpr int some_function() { return (int) sizeof(void*); };
Now given the snippet

    foo<some_function()>::bar<0>(1);
then if some_function() returns something other than 8, we use the primary template and foo<N>::bar<0>(1) is a call to a static template member function.

But if some_function() does return 8, we use the specialisation and the foo<8>::bar is an int with value 99; so we ask is 99 less than the expression 0>(1) (aka "false", promoted to the int 0).

That is, there are two entirely different but valid parses depending on whether we are compiling on a 32- or 64-bit system.

Parsing C++ is hard.

EDIT: Godbolt example: https://godbolt.org/z/yR3YHW


You only need to parse the "module <module name>" and "import <module name>" statements. No need to parse all of C++ for that. You could probably even do that with a regex.


It also has to do all the preprocessing to see which import statements get hit. I don't think templates could control at compile time which module to import, at least I hope not.


Parsing C++ is literally undecidable. You can encode arbitrary programs which will emit a syntax error depending on their result.

Here's an example of a program which compiles only if the constant N is prime, and otherwise emits a syntax error: https://stackoverflow.com/questions/14589346/is-c-context-fr....


You are misrepresenting the concept of undecidable. If the compiler can say if the program compiles or not, then it is most certainly decidable. What you want to say is that it cannot be determined without full parsing, so no preprocessing is possible.


No, it's actually undecidable. C++ templated have been determined to be turing complete, which means that template instantiations can encode the halting problem. Determining whether a program compiles or not therefore requires solving the halting problem.

In practice, compilers work around this by limiting template instantiation depth.


I gave an example of a template program to show the general method. Obviously, primality is decidable, but there exist candidate C++ programs whose parse tree is undecidable. The trick would be to encode your parser in a template, run it on the undecidable program (i.e., itself), and create a contrary result. Does this have any effect on practical C++ builds? I honestly have no idea.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: