
There are 2 things I've obsessed about in C & C++ land:

1) Why are we explicitly directing a loop forward or backward most of the time when it's not necessary? The compiler should direct the loop unless we explicitly need a particular order for the body of the loop. My thinking is that there may be performance reasons to reverse a loop or even split it into separate threads. (unlikely)

2) In C++, I've always wondered how template types get instantiated for PODs that have the same bit-level representation. Does the compiler duplicate the templated code, or is it smart enough to say "this std::vector<int> is the same as that std::vector<intalias>"? Rust prides itself on zero-cost abstractions, and I've wondered whether it reduces binary bloat in this area. (as a naive C++ person)




1) The compiler is already allowed to reorder loop evaluation if it can prove that it can't have any observable side effects. If it can't, there could be subtle dependencies on the order of evaluation that the person writing the code and tests wasn't aware of, so you're just asking for subtle bugs that are missed by tests and could even turn into vulnerabilities in production.

2) It duplicates, and the standard even comes with asinine "guarantees" that actively sabotage deduplication, such as requiring all functions to have different addresses. Which means the compiler can reuse the same function, but only one function label can start right at the code and the others need to NOP-slide or JMP into it. Which unduly pessimizes precisely those functions that are most likely candidates for deduplication: Those that themselves only consist of 2-3 ALU instructions.


Nr (2) is not correct. Compilers have been inlining functions for decades, so the argument that "all functions have different addresses" does not hold. Compilers can even _merge_ two different but similar-looking functions into a single one. And that's just part of the code transformations compilers can do.

And this goes without even mentioning LTO, which has been explicitly designed for optimizing the code layout and size during the linking phase.

So, code deduplication is a real thing and happens regularly with your code. Templates are not much different from non-templated code written for the same purpose.


If you take the address of different functions, the standard requires those to compare unequal. However if you merely call functions and never take their address, the standard doesn't put any requirements on that, so inlining etc. is legal.

But without LTO, it's impossible for the compiler to know whether a non-inline function with external linkage will be called or have its address taken, so the compiler must ensure that it has a unique address.


There are linker-level optimizations outside of LTO that can merge functions. gold's --icf option comes to mind, with --icf=safe meant to be conforming.


All that sounds ... reasonable? What I didn't agree with is the "requiring all functions to have different addresses" argument. It's invalidated both by the compiler (e.g. inlining) and linker (e.g. code folding).


Good example of an Observer Effect in programming! If you look at fn addresses they do one thing, and if you don’t they do another. :P


If it can prove that nothing can observe the address of the function(s). Inlining renders the whole discussion moot.

The point stands. Compilers cannot merge equivalent functions in cases when it makes sense to even think about this optimization, which is when it actually has to export an externally visible symbol for the function.


And that's what happens in probably 99% of the cases. I objdump the code quite often to understand what happens under the hood, and I rarely see similar code being duplicated. There could of course be counterexamples, but I just didn't agree with the "standard ... actively sabotage[s] deduplication" sentiment, which to me reads as something universal and not an exception.


I think active sabotage is a correct assessment when a simple, obvious optimization is explicitly prohibited for no (defensible) reason and can only be applied by extensive whole-program analysis that allows the nonsensical rule to be bypassed completely. It's still sabotage, it's just mitigated by the extremely smart compilers we have nowadays that basically pick the program apart and rewrite a functionally equivalent one.


Regarding #1, I mean we shouldn't be asked to explicitly direct the iteration of the loop. We should say "loop over these things" rather than "start from 1 and stop at 10". MOST of the time we're explicitly stating these things - which the compiler can undo - so what's the point? If the compiler is doing whatever it likes, why are we expending thought on which direction a loop should iterate?

I love that Rust has eliminated the egoism around "you should know at all times when something is allocated or deallocated". We're trusting the compiler to do that better with zero cost to us. So take it to the extreme: Only specify the loop direction if you have a true requirement to do so.

This is my Ted Talk, thank you.

For #2 - LOL! I had no idea it was essentially aliasing those functions to get around a requirement for uniqueness.


1 isn't really the same thing. I don't know any compiler that would turn

    for (i = 0; i < length; i++)
into

    for (i = length; i != 0; i--)
Which is what I read 1 to be.

As for why you sometimes do one over the other: I think it is true that comparing against zero is cheaper for computers than comparing against other values (a decrement often sets the zero flag for free). That said, I'd be a little surprised to learn it makes a difference that can be measured in any meaningful program.


Clang will if `i` is not referenced anywhere else. e.g.

    int i,n;
    for (i = 0; i < length; i++)
        n = something();
    return n;
It's also free to treat `i` as a different variable:

    int i,n = 0;
    for (i = 0; i < length; i++)
        something(n++); /* Compiler will merge the n and i variables */
    return n;
Or eliminate the loop completely if there are no side effects.

    int i, n = 0;
    for (i = 0; i < length; i++)
        n = i + 1; /* A very roundabout way to say "n = length" */
    return n;


In neither of those did the sequence change direction, though?


See this is why I felt I was obsessing over something very small.

Loop inversion, unrolling, and splitting. The compiler can do so much with a loop to make it run faster on 'this machine' to optimize for speed, memory use, or binary size.


Sorry, I didn't copy the assembler output for brevity but Clang generates `decl %esi` (or `addl $-1,%ebx` depending on the version) in the first case.


Do you know of any cases where it changes the sequence of seen values? I'm assuming it can in this case, because the i is never used?


For #2, many linkers _do_ support "identical code folding", and gold has an option to only merge functions which never have their address taken.


Although doing this means debuginfo won't be available, so you won't get source lines if the merged code crashes.



