If and only if someone has spent the time to write optimizations for your specific platform.
GCC for AVR is absolutely abysmal. It has essentially no optimizations and almost always emits assembly that is tens of times slower than handwritten assembly.
For just a taste of the insanity, how would you walk through a byte array in assembly? You'd load a pointer into a register pair, load the value at that pointer, then increment the pointer. AVR devices can do the load and the post-increment as a single instruction (e.g. `ld r24, X+`). This is not even remotely what GCC does. GCC loads your pointer into a register, then on each iteration it adds the index to the pointer, loads the value with the most expensive instruction possible, and subtracts the index from the pointer again.
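At the C level, the two traversal styles look roughly like this. It's a hypothetical sketch (`sum_indexed` and `sum_pointer` are illustrative names, not from any library). Writing the loop in pointer style with an 8-bit length may help nudge avr-gcc toward the post-increment load, but that depends on compiler version and flags, so check the generated assembly (e.g. with `avr-objdump -d`) rather than assuming.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed walk: the style described above as compiling poorly.
 * The index is kept 8-bit; an int index would force 16-bit
 * arithmetic on an 8-bit AVR. */
uint16_t sum_indexed(const uint8_t *buf, uint8_t len)
{
    uint16_t sum = 0; /* 255 * 255 = 65025 still fits in 16 bits */
    for (uint8_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Pointer walk: read through the pointer, then advance it.
 * This maps naturally onto AVR's load-with-post-increment,
 * though the compiler is not guaranteed to emit it. */
uint16_t sum_pointer(const uint8_t *p, uint8_t len)
{
    uint16_t sum = 0;
    const uint8_t *end = p + len;
    while (p != end) {
        sum += *p++;
    }
    return sum;
}
```

Both functions compute the same result; the only difference is the shape of the loop the compiler has to work with.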
In assembly, the correct AVR method takes two cycles per iteration. The GCC method takes seven or eight.
That's for every iteration of every loop. If you use an int instead of a byte for your index, you've added another two to four cycles to each iteration (on 8-bit architectures, obviously).
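To put those per-iteration numbers in perspective, take a hypothetical 200-byte buffer and use the cycle counts quoted above at face value (they are the figures from this comment, not measurements of a particular chip or compiler version):

```latex
\begin{aligned}
\text{hand-written (post-increment load):} &\quad 200 \times 2 = 400 \text{ cycles} \\
\text{GCC, byte index:} &\quad 200 \times 7 \text{ to } 200 \times 8 = 1400 \text{ to } 1600 \text{ cycles} \\
\text{GCC, int index:} &\quad 200 \times 9 \text{ to } 200 \times 12 = 1800 \text{ to } 2400 \text{ cycles}
\end{aligned}
```

That works out to a 3.5x to 4x slowdown with a byte index and 4.5x to 6x with an int index, from a single loop.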
I've just spent the last three weeks carefully optimizing assembly for a ~40x overall improvement. I have a *lot* to say about GCC right now.