> Yes, I know. That's irrelevant here for two reasons: firstly, the pointers might compare equal (which is all that matters for this argument), and secondly, once again, the comparison isn't being made between pointers, the comparison is being made between integers.
You know what, I think you're right, at least on the second point. I guess the code isn't UB after all then. Somehow I kept reading it as if it's comparing pointers.
Regarding the other point about following something in memory:
> I don't consider those guys a reputable source [...] they claim that making equality comparisons is undefined behaviour but their only basis for that is a quote from the section on relational operators which is obviously only talking about relational operators.
There are multiple entities here, in agreement with each other. The author links to bug #61502 in GCC, where you'll see the compiler devs agree with the author of that page: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c15
You might disagree with their stances for whatever reasons you might have, but firing reputational shots at them when there are clearly other experts disagreeing with them isn't really helping make your position more convincing.
> Read the actual standard quotations in context and there is no confusion.
There was no "confusion" on this point to begin with (on my end anyway). Again, as I explained: they might not be intuitive if you're assuming a linear address space, but they're quite fine to interpret. You just need to avoid bringing your own extra assumptions into the table. The notion of "following" in memory hinges on your assumption that the address space is linear. This is pretty similar to how to be able to say your house "follows" another house, you need to first make nontrivial assumptions (e.g., that they're on the same street). To my knowledge C does not make such assumptions globally; all it does is lets you satisfy that criterion by putting them in the same aggregate structure ("street") to begin with, but if you can't do that, you can't satisfy that assumption.
> There are multiple entities here, in agreement with each other. The author links to bug #61502 in GCC, where you'll see the compiler devs agree with the author of that page: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c15
No they don't. The previous comment from that compiler dev explicitly says that this comparison is not undefined behaviour. There is nothing in either the standard or that bug report that supports the claim that you originally posted ("Comparing pointers for equality is defined if both pointers are derived from the same (multidimensional) array object. Thus, if two pointers point to different array objects, then these array objects must be subaggregates of the same multidimensional array object in order to compare them. Otherwise this leads to undefined behavior."). Equality comparisons (not relational comparisons) of pointers to (unrelated) arrays are not and never have been undefined behaviour, and anyone claiming they are is at best confused.
Interesting. So I went back to look at the standard. Like you said, for the relational operators <, <=, >=, and >, they do explicitly say comparing unrelated objects is undefined behavior. But as you suggest, for == and != equality operators, they don't say anything about undefined behavior for comparing pointers to unrelated objects; in fact they do say that unrelated pointers can point to adjacent memory locations if the implementation "chooses" to place unrelated objects in that manner.
However, they also throw in this wrinkle: "The == and != operators are analogous to the relational operators except for their lower precedence."
Does the analogy extend to the UB-ness for different objects? (If not, wouldn't that seem to contradict this sentence?)
On the one hand, it would seem not, since they do say "if and only if" in that clause (which lacks any mention of UB).
On the other hand, I'm not sure how else to interpret that sentence... that sentence would seem superfluous if I just ignored it, since everything else seems to be spelled out already.
So I'm kind of at a loss now. Maybe you're right then, I don't know... at least you seem to be right about the fact that it is confusing. Possibly later standards clarify this? I haven't checked.
"Analogous" generally means similar but not the same; if it's a normative description it's underspecified, whereas the "if and only if" description of the semantics of == and != for pointers is definitely normative. I'm sure the standard could be clearer but I do think there's only one defensible interpretation.
It's underspecified, but it is normative. While it's not mathematically inconsistent for the sentence to be redundant/superfluous, it would be rather silly and misleading. It's reasonable to assume it's there for a reason (i.e. it makes a difference whether it's there or not). Given they spell out pretty much everything else anyway, it would seem this point is where it would make a difference.
Moreover, I have a hard time justifying this on first principles. It seems rather arbitrary to make relational operators UB but make equality operators well-defined for unrelated objects. After all, if you can tell if two pointers to unrelated objects are the same, that means you're willing and able to canonicalize them to a common-denominator representation, in which case you already have an ordering based on that same canonicalization too... why forbid that? It would make sense to prevent comparing pointers derived from unrelated objects (due to future performance & extensibility considerations), and it would certainly also make sense to allow it (more convenient/intuitive/etc.)... but to allow one but not the other just doesn't make sense as far as I can see it. You end up with something that's both unintuitive for the user and restrictive for the implementation. So it's a little tough to swallow that that's really what they intended.
> After all, if you can tell if two pointers to unrelated objects are the same, that means you're willing and able to canonicalize them to a common-denominator representation, in which case you already have an ordering based on that same canonicalization too... why forbid that?
UB wasn't meant to forbid anything, it was meant to allow implementers to offer different behaviour. Implementers who want to represent pointers internally as integers and have comparisons just compare the underlying integers can do that. Implementers who have something more complicated e.g. segmented architectures are required to support equality comparison (which they can do easily by just having pointers to different segments always compare nonequal) but are not required to define an ordering between different segments.
To me it seems like the kind of compromise the standard makes all the time (which is not to say I agree with it). Requiring everyone to define an ordering on all kinds of pointers would be too burdensome for implementers (or maybe some implementers wanted to trap on unrelated pointer comparison, because if you're doing pointer arithmetic with unrelated pointers you've probably got a bug). Whereas making equality comparisons UB would be way too burdensome for users; it would mean you couldn't e.g. have a list of pointers and check whether a given pointer was in the list by searching through its elements.
> UB wasn't meant to forbid anything, it was meant to allow implementers to offer different behaviour.
That's unspecified behavior, not undefined behavior. Unspecified behavior can literally do anything or nothing at all, including aborting the program nondeterministically. That effectively forbids invocation of UB for the programmer since you can't reason about the program after it's invoked, unless you've verified your implementation has actually defined the behavior for you (despite not being required to). That's quite different from unspecified behavior where the implementation is required to pick some sane behavior (often among a set of acceptable behaviors) and stick with it in a self-consistent manner.
> Requiring everyone to define an ordering on all kinds of pointers would be too burdensome for implementers
I don't see how it'd be more burdensome, and I don't think burden is the issue here anyway. Implementers already have to implement canonicalization for equality comparisons, and they already have to implement casting to uintptr_t too. Just put the two together you get comparisons. One reason to not want to do this is that it might be too expensive to canonicalize to an integer under the hood (maybe it's complicated arithmetic or whatever), but if they're already willing and able to jump through all the hoops just for the sake of equality comparisons, ordering is hardly any different.
> (or maybe some implementers wanted to trap on unrelated pointer comparison, because if you're doing pointer arithmetic with unrelated pointers you've probably got a bug).
Not at all. You want to linear-search in a list and want equality to work for that? Well I want to binary search in an array/BST and need comparisons to work for that. This was such an obvious thing to want to do that C++ made it work out-of-the-box for std::less and whatnot.
> That's unspecified behavior, not undefined behavior. Unspecified behavior can literally do anything or nothing at all, including aborting the program nondeterministically. That effectively forbids invocation of UB for the programmer since you can't reason about the program after it's invoked, unless you've verified your implementation has actually defined the behavior for you (despite not being required to). That's quite different from unspecified behavior where the implementation is required to pick some sane behavior (often among a set of acceptable behaviors) and stick with it in a self-consistent manner.
It's very hard to make trapping an acceptable implementation for unspecified behaviour while allowing the implementation to do the usual kind of reordering, so the standard doesn't try. That's the original intention behind e.g. null pointer dereference being undefined behaviour - implementations should be permitted to make null pointer dereference trap, but should also be permitted to reorder or optimize out pointer dereferences.
> Implementers already have to implement canonicalization for equality comparisons
No they don't - they can just make comparison return false for pointers of different types or from different segments. Canonicalisation is not the only way to do equality-comparison!
> and they already have to implement casting to uintptr_t too
But the result of that isn't required to have the same comparison semantics as pointers. E.g. if some of your pointers are aligned and you internally represent those without trailing zeroes (like the JVM does) and use that as the integer cast, then some different types of pointer end up casting to the same int, which is fine. But the standard requires those pointers to not compare equal as pointers, so you can't implement pointer comparison like that. (Well, you can implement >= and <= like that, because those comparisons are undefined behaviour. But you'll have cases where a <= b and b <= a but a != b).
> You want to linear-search in a list and want equality to work for that? Well I want to binary search in an array/BST and need comparisons to work for that.
Look, I'm not saying I agree with the standards committee here, I'm saying that it's a plausible compromise position for them to have taken. Not being able to do equality comparisons would be way more limiting for users than not being able to do relational comparisons. Having to implement relational comparisons would have been a bit more work for implementers than only having to implement equality comparisons.
You know what, I think you're right, at least on the second point. I guess the code isn't UB after all then. Somehow I kept reading it as if it's comparing pointers.
Regarding the other point about following something in memory:
> I don't consider those guys a reputable source [...] they claim that making equality comparisons is undefined behaviour but their only basis for that is a quote from the section on relational operators which is obviously only talking about relational operators.
There are multiple entities here, in agreement with each other. The author links to bug #61502 in GCC, where you'll see the compiler devs agree with the author of that page: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c15
You might disagree with their stances for whatever reasons you might have, but firing reputational shots at them when there are clearly other experts disagreeing with them isn't really helping make your position more convincing.
> Read the actual standard quotations in context and there is no confusion.
There was no "confusion" on this point to begin with (on my end anyway). Again, as I explained: they might not be intuitive if you're assuming a linear address space, but they're quite fine to interpret. You just need to avoid bringing your own extra assumptions into the table. The notion of "following" in memory hinges on your assumption that the address space is linear. This is pretty similar to how to be able to say your house "follows" another house, you need to first make nontrivial assumptions (e.g., that they're on the same street). To my knowledge C does not make such assumptions globally; all it does is lets you satisfy that criterion by putting them in the same aggregate structure ("street") to begin with, but if you can't do that, you can't satisfy that assumption.