I write signal processing code in C on x86. Sure, the compiler's output is usually sufficient for my needs, but "perfectly-optimized assembly"? Not in those tight loops where you really want it. It's decent, and if you hold the compiler's hand it will get vectorized somewhat, but the generated code is still heavy.
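To give a rough idea of what "holding the compiler's hand" can look like, here is a minimal sketch. The function name `scale_accumulate` and the loop itself are made up for illustration and are not taken from any real codebase. The only hint used is `restrict`, which tells the compiler the two buffers don't overlap.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] += gain * x[i].
 * `restrict` promises the compiler that x and y never alias, so it
 * does not need a runtime overlap check or a scalar fallback path
 * before it vectorizes the loop. */
void scale_accumulate(float *restrict y, const float *restrict x,
                      float gain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        y[i] += gain * x[i];
    }
}
```

Built with something like `-O3 -march=native`, GCC and Clang will typically vectorize a loop of this shape. You can check with GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize`. Whether the result is as tight as hand-written assembly is a separate question, which is the point above.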