What kind of speed increase are you talking about? Your comment's logic seems to suggest that, for instance, C (a language without templates) is inherently slower than C++, which is not the case. Can you back your statements up with evidence?
Because templates are basically automated code copy-paste, the compiler sees the concrete types and the exact function being called at each use, so it can inline more easily. Unfortunately, that also translates to a lot more generated code, which means slower compile times and fatter binaries. Still, writing something like the STL in C with the same performance characteristics would be much more difficult, if not impossible.
I'm not a C++ expert, but that's my understanding of things, at least.
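To make the inlining point concrete, here is a minimal sketch comparing the C-style and template-style ways of sorting. The names compare_ints, sort_with_qsort and sort_with_std_sort are made up for illustration; whether the comparator actually gets inlined depends on the compiler and optimization level, so treat this as an illustration rather than a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort receives only a function pointer, so each
// comparison is typically an indirect call unless the optimizer can see through it.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Sorts a copy of the input through the C library interface.
std::vector<int> sort_with_qsort(std::vector<int> values) {
    std::qsort(values.data(), values.size(), sizeof(int), compare_ints);
    return values;
}

// std::sort is a template: it is instantiated for this exact iterator and
// comparator type, so the comparison body is visible at the call site.
std::vector<int> sort_with_std_sort(std::vector<int> values) {
    std::sort(values.begin(), values.end(),
              [](int lhs, int rhs) { return lhs < rhs; });
    return values;
}
```

Both functions return the same ordering; the difference is only in how much the compiler knows about the comparison when it generates the sorting loop.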
If you are worried about the performance hit of calling through a function pointer, you might want to take a look at http://sglib.sourceforge.net/.
Templates are much more powerful than C with macros.
Templates are more than copy/paste code: because each instantiation is actual code with concrete types, the compiler can make strong assumptions about the code path during the optimization phase.
It's not just a question of inlining calls but mainly a question of being able to remove branches altogether. Branches are the performance killer, remember?
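A rough sketch of what removing a branch can look like, assuming a C++17 compiler for `if constexpr`. The functions sum_runtime and sum_static are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Run-time flag: the `absolute` test sits inside the loop, and it is up to
// the optimizer whether it gets hoisted out.
long sum_runtime(const std::vector<int>& values, bool absolute) {
    long total = 0;
    for (int v : values) {
        if (absolute && v < 0) {
            total -= v;
        } else {
            total += v;
        }
    }
    return total;
}

// Compile-time flag: each instantiation contains only the path it needs,
// so the test on the flag itself does not exist in the generated code.
// (The data-dependent check `v < 0` still remains in the Absolute case.)
template <bool Absolute>
long sum_static(const std::vector<int>& values) {
    long total = 0;
    for (int v : values) {
        if constexpr (Absolute) {
            total += (v < 0) ? -static_cast<long>(v) : static_cast<long>(v);
        } else {
            total += v;
        }
    }
    return total;
}
```

Calling `sum_static<true>(values)` and `sum_static<false>(values)` produces two separate functions, each without the flag check. The price is the extra generated code mentioned earlier in the thread.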
Examples:
- with template specialization you can make "compile time" switches (horrible to reproduce with C + macros)
- you can compute values at compile time (possible only in very limited form with C + macros)
- inlining helps the compiler drop temporaries and copies entirely (copy elision as such is a C++ notion: C can return structs by value, but it has no copy constructors to skip)
- the CRTP is faster than an object with vtables because it avoids indirect calls
- dead branch removal: once the template recursion is instantiated, the compiler can determine which branches will never be taken and emits no code for them
- probably more!
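A few of the items above can be sketched in a handful of lines. This is only an illustration under my own assumptions: Op, Apply, Factorial, Shape, Square and total_area are hypothetical names, not taken from any library.

```cpp
#include <cassert>

// 1. "Compile-time switch" through template specialization.
enum class Op { Add, Mul };

template <Op> struct Apply;  // primary template left undefined on purpose

template <> struct Apply<Op::Add> {
    static constexpr int run(int a, int b) { return a + b; }
};
template <> struct Apply<Op::Mul> {
    static constexpr int run(int a, int b) { return a * b; }
};

// 2. A value computed at compile time by recursive instantiation.
template <unsigned N> struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};
template <> struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};
static_assert(Factorial<5>::value == 120, "evaluated by the compiler");

// 3. CRTP: the base class calls into the derived class through a static_cast,
// so there is no vtable and no indirect call.
template <typename Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived&>(*this).area_impl();
    }
};

struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double area_impl() const { return side * side; }
};

// Generic code written against the CRTP base; the call resolves statically.
template <typename T>
double total_area(const Shape<T>& a, const Shape<T>& b) {
    return a.area() + b.area();
}
```

For instance, `Apply<Op::Mul>::run(2, 3)` picks its implementation during compilation, and `total_area(Square(1.0), Square(3.0))` never goes through a virtual call.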