I remember learning about compiler optimization at university. Sadly, all my colleagues knew (and were taught) was Java, and the Sun JVM we used at the time didn't even do most of the stuff we learned, not even tail-recursion optimization. The opinion of most of the students was therefore that the professor should stop babbling about things that have no application in real life.
This was the one time I was on the side of academia...
Thankfully, since then the HotSpot JVM has become one of the most advanced VMs in terms of JIT optimization, doing things so wild and crazy that other JITs can only dream of them.
Quibbles: the strlen() example is poorly explained. It doesn't hoist it because it's "built-in"; it hoists it because strlen() is marked "pure" (i.e. without side effects) in the declaration. Any function can get that attribute.
Actually I think this is a glibc/BSD difference. You're right that any pure function would be hoisted in the same way, but in fact strlen() is not marked pure in my OS X header (which looks to be the same as in FreeBSD). Compiling with -fno-builtin defeats the optimization, which shows that it is due to strlen being built-in on those platforms.
On my Linux machine, on the other hand, strlen() is marked pure, and there the optimization persists even with -fno-builtin.
There must be more to it than that. For example, if something in the loop body writes to s, then the result of "strlen" could change and the optimisation will be wrong.
So, it must know a bit about what strlen() does, mustn't it -- otherwise it wouldn't know that it has to ensure that \0 isn't written to s inside the loop body.
The compiler can see the loop, so it can optimize it. Hoisting constant expressions out of loops is a long-standing optimization in all compilers. Here, the expression "s" is unchanged* inside the loop, so the compiler can hoist it. If the expression were "s[0]" it could do the same. But can it do it for strlen(s)? It can, if it knows that strlen() has no side effects. So gcc extends C's syntax to add a "pure" attribute to tell the compiler exactly that.
* Pedants will point out that all the usual subtleties of C pointer aliasing rules apply here. But for the purpose of explanation, assume that "nothing assigns through s" means "s is constant".
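Roughly what that looks like in source, as a sketch (the names my_strlen and count_spaces are made up for illustration, not from the article):

#include <stddef.h>

/* Hypothetical strlen stand-in: "pure" promises the compiler the function
   only reads memory and has no side effects, so two calls with the same
   argument can be merged. */
size_t my_strlen(const char *s) __attribute__((pure));

int count_spaces(const char *s) {
    int n = 0;
    /* Nothing in the body writes through s, so the compiler may compute
       my_strlen(s) once and hoist it out of the loop condition. */
    for (size_t i = 0; i < my_strlen(s); i++)
        if (s[i] == ' ')
            n++;
    return n;
}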
GCC can statically prove that `s` isn't changed at all in the loop body, just accessed. Thus the analysis goes: `strlen()` is pure, and its argument never changes, so its result never changes, so I can hoist it.
Note that C99 has added "restrict" for such cases, which essentially promises that the restricted pointer (and any pointers derived from it) is the only pointer to that chunk of memory.
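For instance (just a sketch, not from the article):

/* restrict promises that dst and src don't alias, so the compiler doesn't
   have to assume a store to dst[i] might change what src points at. */
void scale(float *restrict dst, const float *restrict src, float k, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}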
Can GCC handle more complicated loops? For example, what if I go several functions deep in my loop with s? What if I call a function in a shared object? Does gcc do this analysis for all cases where a function is marked pure, or is strlen a specific case?
The compiler can't see inside functions in the general case (because they're opaque binary blobs in a library that hasn't been linked yet), so no. It's limited to optimizing stuff it can "see", which broadly means the current function and anything inlinable into it.
But yes, if all your functions are marked inline and visible to the compiler, there's no theoretical limit to GCC's ability to hoist constants out of the innermost parts of the loop. In practice, there are surely some performance limits to how much inlining will be expanded.
Thanks. If this helps anyone else, the realisation I was missing was that the compiler must treat any write based off a pointer as a write to the thing pointed at. Which is conservative if there is no aliasing.
Oh, I remember getting upset last time with this "test". For example:
3. Multiplication by 2 to addition - integer
Will GCC transform an integer multiplication by 2 to addition?
The expression x * 2 compiles to "shift x left by 1 bit" on almost all compilers. A shift has a lot fewer dependencies than ADD/LEA and has better reciprocal throughput. Meh.
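For example (illustrative only; which instruction GCC actually emits depends on the target and flags):

int times2(int x) {
    /* At -O1 and above GCC typically compiles this to a single shift or
       lea rather than an imul. */
    return x * 2;
}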
This surprises a lot of programmers. A barrel shifter is a big circuit for an ALU. Smaller than a multiplier obviously, but much larger than a ripple carry adder and somewhat larger even than the fancy pipelined adders used in fast CPUs.
It's the kind of thing that got skimped on in Atom. Lots of stuff runs surprisingly slow on Atom. Another one I find striking is that there are two pipes for ADD/SUB instructions, but ADC/SBB (the carry/borrow variants) are not just single-issue, but apparently unpipelined. Intel lists their throughput at 3 instead of 0.5!
> Intel lists their throughput at 3 instead of 0.5!
Agner Fog lists ADC/SBB on Atom as having latency 2 and throughput 1/2. I guess Intel could have made them run with ADD/SUB-level performance in cases where there are no intra-pair dependencies on the carry/borrow flag, but the extra interlocks for handling that were probably not worth the cost.
The comment about tail calls is wrong -- it's not an optimization, it's a guarantee about semantics of your program. Not optimizing tail calls is like making array dereference run in linear time instead of constant: it changes the programs you can write.
Irrespective of how it changes the semantics of (mostly functional) languages that rely on it for iteration, tail calling is also an optimization in languages that don't need to rely on it for this, and it is not wrong to describe it as such.
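For example, in C (a sketch; whether the call actually becomes a jump depends on the target and optimization level, since C gives no guarantee):

/* The recursive call is in tail position, so GCC at -O2 can usually turn
   it into a jump and run in constant stack space, even though the C
   standard doesn't require it to. */
unsigned long fact(unsigned long n, unsigned long acc) {
    if (n <= 1)
        return acc;
    return fact(n - 1, n * acc);
}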
There are two things here: an optimization, and a semantics guarantee. Obviously, if your language provides the guarantee, then implementing it correctly isn't just an optimization. But the original article claims that the transformation proves that proper tail calling is "just an optimization", which it doesn't prove at all. It's still an important semantic guarantee which increases the expressiveness of a language. This is true regardless of whether it is the only means of expressing iteration in a language.
What the factorial example proves is that the set of recursive functions that can run in constant space is larger than just tail-recursive functions.
You're right to say that TCO guarantees increase the expressiveness of a language. If we could provide the same guarantees for an even larger class of recursive functions, it stands to reason our expressiveness would increase even further!
In a language that requires TCO, you are right: it's not about optimization, it's about correctness. In a language (like C) that has no such semantics, it is 'just' an optimization. In Scheme, however, it is a correctness issue.
I have a little doubt about number 6. A switch() without a "default:" label has undefined behavior if x is negative or above 5, while nothing would happen in the cascading if's. Wouldn't a correct optimization of that be
void function(int x) {
    switch (x) {
    case 0: f0(); break;
    case 1: f1(); break;
    case 2: f2(); break;
    case 3: f3(); break;
    case 4: f4(); break;
    case 5: f5(); break;
    default: break;
    }
}
Why would it have undefined behavior? A switch statement with no matching case and without a default has defined behavior: the switch statement simply does nothing.