I am not sure about the others, but #3 is completely superseded by GCC's optimization passes. Besides, is this kind of low-level stuff really something a developer should be wasting their time on?
You trust GCC way too much. Certainly one should do a bit of low-level profiling before making such changes, but I have actually never seen GCC intelligently unroll a loop in all my years of C programming. It completely ignores constant propagation when doing so (even in gcc 4.4!), making the unrolling algorithm basically useless, since constant propagation is far more useful than the savings gained from taking fewer jumps.
I actually have one case where I made a function 3 times shorter code-wise and over 2x faster simply by unrolling it completely: the rules for deriving values from the loop index were so complicated that they were actually longer than the unrolled loop itself.
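A minimal sketch of the kind of case described above, not the actual function. The index rules and the names `shift_for`, `weight_for`, `mix_loop`, and `mix_unrolled` are made up for illustration. In the loop version every shift and weight is derived from `i` at run time unless the compiler both unrolls the loop and folds the results. In the hand-unrolled version they are plain constants. Whether a given compiler closes that gap on its own depends on its version and flags, so it is worth checking the generated assembly.

```c
#include <stdint.h>

/* Hypothetical per-index rules: irregular enough that the derivation
   code is longer than the four statements it feeds. */
static unsigned shift_for(unsigned i)  { return (i * 5u + 3u) % 8u; }
static uint32_t weight_for(unsigned i) { return (i & 1u) ? 3u : 7u; }

/* Loop version: shift and weight are computed from the index. */
uint32_t mix_loop(const uint8_t in[4])
{
    uint32_t acc = 0;
    for (unsigned i = 0; i < 4; i++)
        acc += ((uint32_t)in[i] << shift_for(i)) * weight_for(i);
    return acc;
}

/* Hand-unrolled version: the same rules evaluated by hand for i = 0..3.
   i=0: shift 3, weight 7    i=1: shift 0, weight 3
   i=2: shift 5, weight 7    i=3: shift 2, weight 3 */
uint32_t mix_unrolled(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 3) * 7u
         + ((uint32_t)in[1] << 0) * 3u
         + ((uint32_t)in[2] << 5) * 7u
         + ((uint32_t)in[3] << 2) * 3u;
}
```

Both functions return the same value for any input; only the unrolled one is guaranteed to have its constants resolved before the compiler ever sees it.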
The recommendation in that case seems to be to write the comparison chain yourself. How is my comparison chain better than the compiler's?