Wow; I really disagree with this viewpoint. I want clever compilers that produce...

dmitshur · on March 8, 2017

You have to keep in mind that adding support for special cases makes the general case a little bit slower (because it has to always check if it's the special case or not). It's not completely free to make pow work fast on small integers.

So, it's a trade-off. I would also prefer less special cases optimized for naive use, but only if it's accompanied by documentation that says "prefer X instead of pow(x, 2), prefer Y instead of pow(-1, k), etc." It shouldn't be guesswork as to what's best to do for common cases.

lomnakkus · on March 8, 2017

That's not necessarily true. I'd wager that most cases where pow(x, 2) is used as a way to square x, the "2" is actually a constant. That's trivially statically optimizable at compile time.

dllthomas · on March 8, 2017

In fact, if you are using pow specifically to "square a number" in the sense where you could replace it with x*x, it is guaranteed to be something you can determine statically (and probably pretty easily) - or you're already doing a lot of unnecessary work.

emteycz · on March 8, 2017

Most of the time, you can opt out of these optimizations.

sunfish · on March 8, 2017

I think you're both right. The world of software is sufficiently diverse that there is no single answer to this question that works best for everyone in every situation.

galacticpony · on March 8, 2017

It really isn't. You want something for free without putting the work. I want to put in some work to get what I need.

You, too, believe in the magic compiler that doesn't exist. You don't get good code from a high-level specification. You really have to understand what the compiler actually can do for you and write your code accordingly, either way - except the predictable case is more straightforward (though maybe not as pleasing).

pcwalton · on March 8, 2017

"Putting in the work" often means not being able to use abstractions, which reduces the maintainability of code. That's what people often miss about optimizations: one of their most important uses is to enable abstractions.

As a simple example, take SROA: without that optimization, you can't make a "Point { float x, y, z; }" struct and have that be as efficient as "float x, y, z;", because x, y, and z can live in registers in the latter, while in the former they remain as memory instead of being promoted to SSA values. But being able to use a Point structure like that is extremely helpful, because then I can define useful methods on it and so forth. I shouldn't have to choose between good engineering practice and performance, and thanks to SROA, I don't have to.

galacticpony · on March 8, 2017

You've just invited the assumption that your compiler can do SROA as the basis for better performance - how is that an abstraction?

If you really care about performance to the point of register occupancy, you need to look at the context. The "abstraction" of having a Point type with methods is almost certainly far from optimal, because it doesn't fit SIMD well. It's then also not "good engineering practice" to use it.

pcwalton · on March 8, 2017

> You've just invited the assumption that your compiler can do SROA as the basis for better performance - how is that an abstraction?

"Expands to the exact same code everywhere on every imaginable compiler, even toy compilers nobody uses" is not part of the definition of "abstraction".

> If you really care about performance to the point of register occupancy, you need to look at the context. The "abstraction" of having a Point type with methods is almost certainly far from optimal, because it doesn't fit SIMD well.

What do you think http://www.agner.org/optimize/#vectorclass is then?

> It's then also not "good engineering practice" to use it.

Yes, it is! It makes your code more readable, and if you're compiling on any production-quality C compiler anywhere your Point class will have the same performance as the raw version. Lower maintenance cost, fewer bugs, same performance.

galacticpony · on March 8, 2017

> "Expands to the exact same code everywhere on every imaginable compiler, even toy compilers nobody uses" is not part of the definition of "abstraction".

MSVC compiler doesn't do SROA, as far as I know.

> What do you think http://www.agner.org/optimize/#vectorclass is then?

Have you actually looked at that thing? It's not a Point struct, I can tell you that. There's nothing abstract about it.

If you want to take advantage of SIMD fully, you need to lay out your data in a very specific way. A Point {x,y,z} struct doesn't naturally fit a SIMD register.

Now, if you're willing to make a lot of assumptions on your compiler, you can do something like this: http://www.codersnotes.com/notes/maths-lib-2016/

Still, you need to put in the work and the research. No magic.

> Yes, it is! It makes your code more readable, and if you're compiling on any production-quality C compiler anywhere your Point class will have the same performance as the raw version. Lower maintenance cost, fewer bugs, same performance.

If performance really matters then your abstract solution is almost certainly suboptimal and it's not good engineering practice to use it for the sake of readability.

pcwalton · on March 8, 2017

> MSVC compiler doesn't do SROA, as far as I know.

Yes, it has since at least 2010 (and probably earlier). See "scalar replacement": https://blogs.msdn.microsoft.com/vcblog/2009/11/02/visual-c-...

> Have you actually looked at that thing? It's not a Point struct, I can tell you that. There's nothing abstract about it.

The Vec classes can be used as Point structs.

> If you want to take advantage of SIMD fully, you need to lay out your data in a very specific way. A Point {x,y,z} struct doesn't naturally fit a SIMD register.

So pad it out to 4 fields, using homogeneous coordinates.

> Now, if you're willing to make a lot of assumptions on your compiler, you can do something like this: http://www.codersnotes.com/notes/maths-lib-2016/

That's not making a lot of assumptions about your compiler. The x87 floating point stack, for example, has been obsolete for a long time.

> If performance really matters then your abstract solution is almost certainly suboptimal and it's not good engineering practice to use it for the sake of readability.

I disagree. Let's look at actual examples. "Almost certainly" suboptimal abstractions are not what we've seen in Rust, for example, which leans on abstractions heavily.

galacticpony · on March 9, 2017

> Yes, it has since at least 2010 (and probably earlier).

See my reply to whitequark_, this seems to be loop-specific.

> The Vec classes can be used as Point structs.

Oh, sure. Which one though? How do I abstract this, again?

> So pad it out to 4 fields, using homogeneous coordinates.

In other words, "do something else than what I originally did" and "potentially leave 25% throughput on the table".

I think you're trying to pull a fast one on me.

> That's not making a lot of assumptions about your compiler. The x87 floating point stack, for example, has been obsolete for a long time.

Did you even read beyond the first paragraphs? There's five compiler-specific flags you'll have to get right for any of this to work.

Again, there's no free lunch...

> I disagree. Let's look at actual examples. "Almost certainly" suboptimal abstractions are not what we've seen in Rust, for example, which leans on abstractions heavily.

Could you just scroll up for a moment? We are talking about this because someone wrote the "abstract" version of squaring a number and some optimization didn't kick in, resulting in significantly degraded performance. And that's a trivial example! In a real codebase, you'll have to carefully audit compiler output to see if it does the right thing, potentially requiring you to rewrite code. For Rust, you also have the blessing that there is only one compiler...

whitequark_ · on March 8, 2017

> MSVC compiler doesn't do SROA, as far as I know.

In the future, consider verifying your extraordinary claims. Of course it does, and it took me about two minutes to demonstrate that: https://godbolt.org/g/9kT1NP

galacticpony · on March 9, 2017

"As far as I know" implies that I might not know, so that's not an extraordinary claim.

You haven't actually verified SROA. The scalar replacement didn't kick in for the non-inlined method, so this could be a result of loop-specific optimizations. The code is also far from optimal, but that's besides your point of course.

Also note that you're testing an RC of the latest version. There's a feature request for SROA from 2013, so it may be implemented by now: https://connect.microsoft.com/VisualStudio/feedback/details/...