If I see `x+y` in C, I know 100% that it'll be ~0-1 instructions, O(1), and will have the lowest latency & highest throughput that a thing can have, i.e. basically completely ignorable for figuring out the perf of a piece of code, or determining what complex things it may do (additionally, it'll hint that the operands are pointers or numbers). For `f(x,y)`, none of those may hold. With operator overloading, f(x,y) and x+y have the exact same amount of instantly tellable facts, i.e. none. x+y becomes just another way to do an arbitrary thing.
In C, if I'm searching for how a certain thing may be called from a given function, I only have to look for /\w\(/ and don't have to ever think about anything else.
Honestly, operator overloading isn't really that bad (especially if an IDE can highlight which ones are), but it's still a thing that can affect how one has to go about reading code that might not even use it.
However, as a novice I found it unintuitive that on an embedded platform without hardware floats, x/y still compiles, but to a polyfill call with quite a few instructions.
That’s the only caveat. With operator overloading, the scope for what happens on a given line of code expands dramatically. Now your entire dependency graph is part of the search space. Heck, the operator might not even terminate at all!
Is the addition done by itself, so it costs 1 clock cycle? Is it merged into some complex operation so the net cost is less than 1 cycle? Is it completely optimized away at compile time, so it's infinitely faster?
Does the addition trigger some trap, that will run some distant code?
Is the addition by itself? Or are there store and load instructions that can stall for way more than 1000 cycles?
I doubt you can answer any of those questions. All you and everybody else keep repeating is that you can micro-optimize C better because that line, which you expect to take anything from 0 to 2000 cycles, is certain not to do a call-and-return pair, which takes less than 10 cycles. All while the alternative is almost certain to do exactly the same, but you would need to check.
Honestly, that argument doesn't make sense, and I keep reading it as people complaining that they want to micro-optimize a program but don't know if it's operating on native integers or 10-dimensional hypermatrices.
At the same time, every single person who is good at micro-optimization looks at the compiled binary as a first step, because C is a high-level language that has little relation to the code the compiler actually creates.
For a long time I just shrugged it away and filed those complaints under "those people don't even know the language they are using". But the complaint's universality forces me to consider that there is a reason for it, and maybe it's worthwhile to understand. Now, given that this is all the answer I get, it seems quite likely that even the ones complaining don't consciously know what the problem is... But one thing is certain here: the people repeating that execution time is well known didn't actually practice micro-optimization based on that fact.
Your argument boils down to this: because we cannot look at an operator and have a 100% iron-clad guarantee of the exact sequence of instructions the compiler will ultimately emit, we should throw it all away and just settle for every operator in the language potentially being a function call that might be O(1) or O(n) or even O(2^n). That's called throwing out the baby with the bathwater.
> every single person that is good at micro-optimizations look at the compiled binary as a first step
That isn't an option when you're writing portable code that runs on many different platforms, some of which may not even exist at the time you're writing it. Furthermore, micro-optimization isn't the only reason operator overloading is bad. The implicit flow control dramatically inflates the search space for what every single operation can do, making all code much more complicated to inspect at a glance. This carries over to debugging, where stepping through code is much more cumbersome when each operation can involve large amounts of indirection.
> Is the addition done by itself, so it costs 1 clock cycle? Is it merged into some complex operation so the net cost is less than 1 cycle? Is it completely optimized away at compile time, so it's infinitely faster?
Those are generic instruction selection/optimization questions, which are always gonna be *additional* complexity to any and all operations everywhere. So there's still benefit in cutting down the complexity elsewhere.
> Is the addition by itself? Or are there store and load instructions that can stall for way more than 1000 cycles?
...those are questions about the loads & stores, not the addition. On embedded, afaik, loads & stores will be significantly closer in latency to arithmetic too.
> At the same time, every single person that is good at micro-optimizations look at the compiled binary as a first step, because C is a high-level language that has little relation to the code the compiler actually creates.
Yes, but being able to have good intuition is still quite important, because one can think & read code much faster than compile & read assembly.
> the people repeating that execution time is well known didn't actually practice micro-optimizations based on that fact.
The question of operator overloading is mostly about reading code, not writing it. And it doesn't have to be micro-optimization either; any level of optimization is affected by a call happening where you don't expect one. Probably most important is the kind where you scan over a piece of code to figure out whether it does anything suspiciously bad (O(n^2) behavior, excessive allocations, or whatever may be expensive in the codebase in question), but where it isn't worth the effort of diving into assembly or figuring out how to get representative data for profiling that specific thing.
Or you could just be exploring a new codebase and wanting to track down where something happens, where it'd be beneficial to have to just scan through function calls and not operators.
Right, that's definitely quite a strong point against the C operator-function separation. There can be a good argument made for just not providing unavailable operations as operators. But, still, x/y won't touch any of your memory (assuming a non-broken stdlib), so you're still free to skip over it while scanning for a use-after-free or something.