Even worse is that modern x86 (and probably other) CPUs also have instructions for 16-byte vector registers that can be used to copy or compare data much faster than 1 byte per cycle. Recent versions of glibc use some linker magic to pick the optimal code to use for strcmp, memcpy and friends based on the instruction set available to the CPU at runtime. Of course gcc and glibcxx's developers must not trust glibc and will sometimes replace calls to these functions with thier "optimized" builtin versions that use the lowest common denominator ISA. An easy way we got a 5% boost in througput in mongodb was to force them to call into glibc. https://github.com/mongodb/mongo/blob/master/pch.h#L47-56