
>1. There is no real reason that trigraphs should be expanded inside a comment.

Trigraphs are 100% substitutes for unrepresentable characters. They absolutely, positively, ALWAYS should be replaced by the character they stand for. Pretend the replacement takes place before the character even arrives inside the comment, because it does.
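
A minimal sketch of why that ordering is observable (my example, assuming a compiler that still implements trigraphs, e.g. `gcc -trigraphs`; C23 and C++17 dropped them):

    #include <stdio.h>

    int main(void) {
        int enabled = 1;
        // is this safe??/
        enabled = 0;   /* swallowed: the ??/ above became a backslash */
        printf("%d\n", enabled);   /* prints 1, not 0 */
        return 0;
    }

That is exactly the phase ordering described above: trigraph replacement, then line splicing, then comment removal, so the assignment is gone before comments are even considered.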

It's very much like the #define/#include C-preprocessor step: it happens first, and that's what keeps it clean, understandable, and manageable. (Sure, you can have more complex macro systems, but they are... complex, and they can get very ugly.)

If you know how a Unix shell command line is processed, you know that there are layers to it. Trigraphs are just like that. If you don't know how a Unix shell command line is processed, learn it; it's worth knowing.




I'm talking about why trigraphs had to behave in such a way, not how. C and C++ have a concept of a source character set and an execution character set, which can diverge. Let's say trigraphs are indeed for unrepresentable characters; then in which character set are they unrepresentable? If the answer is the source character set, an alternative spelling (digraphs) is sufficient, and comments should ideally be unaffected or users will be confused. If the answer is the execution character set, why do other characters have no equivalent?
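
To make the "alternative spelling" point concrete, here is a sketch of the difference (my illustration; assumes something like `gcc -std=c99`, where both forms are still supported). Digraphs are alternative spellings of tokens, so they are left alone inside comments and string literals, while trigraphs are replaced everywhere:

    #include <stdio.h>
    %:include <stdlib.h>    /* digraph %: spells the # token */

    int main(void)
    <%                      /* digraphs <% and %> spell { and } */
        /* prints the digraph literally: "<% is just text here" */
        puts("<% is just text here");
        /* prints "[ was a trigraph here": ??( is replaced in phase 1,
           even inside a string literal */
        puts("??( was a trigraph here");
        return EXIT_SUCCESS;
    %>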

Also, you should be aware that macro expansion in C/C++ is not a literal string replacement. `#define FOO bar` doesn't turn `BAREFOOT` into `BAREbarT`, or `"OH FOO'S SAKE"` into `"OH bar'S SAKE"`. (Some extremely old preprocessors did do so, by the way.) `#define FOO(x) FOO(x)` doesn't make `FOO(bar)` an infinite recursion, because `FOO` is prevented from expanding while `FOO` itself is already being expanded. There are certainly some layers, but they are not what you seem to think.
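
A small illustration of those rules (hypothetical names, any C compiler):

    #include <stdio.h>

    #define FOO bar

    /* A function-like macro can share a name with a function without
       recursing: inside its own expansion, SQUARE is not re-expanded. */
    static int SQUARE(int x) { return x * x; }
    #define SQUARE(x) SQUARE(x)

    int main(void) {
        int bar = 1;
        int BAREFOOT = 2;
        printf("%d\n", FOO);        /* the token FOO expands to bar: 1 */
        printf("%d\n", BAREFOOT);   /* a distinct token, untouched: 2 */
        printf("%s\n", "OH FOO");   /* string literals never expand */
        printf("%d\n", SQUARE(3));  /* SQUARE(3) -> SQUARE(3): calls the
                                       function once, prints 9 */
        return 0;
    }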


You want to be able to convert source code from one system to another and back again, you want the rules to be simple enough that everybody who writes such a converter gets them right, and you also don't want to think about a zillion edge cases. If trigraphs exist on the wrong side of the conversion, flag them; otherwise, it's a very simple process.
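
For what it's worth, a sketch of how small that per-character pass is (hypothetical helper names, nobody's actual converter); a flagging variant would just report instead of replace:

    #include <stdio.h>

    /* Return the replacement for the trigraph "??c", or 0 if none. */
    static char trigraph_char(char c) {
        switch (c) {
        case '=': return '#';   case '/': return '\\';  case '\'': return '^';
        case '(': return '[';   case ')': return ']';   case '!':  return '|';
        case '<': return '{';   case '>': return '}';   case '-':  return '~';
        default:  return 0;
        }
    }

    /* Copy src to dst, expanding trigraphs; dst must be at least as large. */
    static void expand_trigraphs(const char *src, char *dst) {
        while (*src) {
            char r;
            if (src[0] == '?' && src[1] == '?' && (r = trigraph_char(src[2])) != 0) {
                *dst++ = r;
                src += 3;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }

    int main(void) {
        char out[64];
        /* build this demo itself without trigraph support, or the
           compiler will rewrite the ??-sequences in the literal first */
        expand_trigraphs("a??(i??) = x ??! y;", out);
        puts(out);    /* prints: a[i] = x | y; */
        return 0;
    }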

I was not talking about how the preprocessor is implemented; I was talking about the layering. You keep wanting to mix layers because you think you know better; thar be dragons.


Layering is only valuable when it serves its goals well, and I don't see any reason to have an additional layer in the language here. If you are thinking about a strict separation between the preprocessor and the parser, that has been known to be suboptimal for compilation performance for decades. (As a related example, the traditional Unixy separation of archiving and compression is also known to be inefficient; a combined compressing archiver is a better design.)


I disagree with the downvotes here. C language "layers" are tricky to get right, a source of footguns and backdoor potential (especially the trigraphs that started this comment chain), and overall a band-aid invented when there were no better solutions (like modules, or Unicode). Trigraphs are a weird, archaic quirk of the C language (and of no other modern language), and I'm glad to see them gone.

And since we're thinking about layers, character encoding hacks should be entirely outside a programming language's responsibility. Now that would be proper layering.



