My theory is that it's also being written to the RPi's serial port console, which runs at 115200 baud by default, so 11520 characters/second or ~7ms to write 80 characters.
If I understood the article correctly, the ~1ms cost came from having rsyslog running, not from printk. Killing rsyslog led to a ~1ms speedup, but it was removing printk() that led to the rest.
Printing to the console on the RPi could involve either or both of the serial port and the software-character-generator-based text-mode video output. Depending on screen resolution, the video output might be quite slow to print to, especially if it had to scroll a line, depending on exactly how it's implemented.
> ...except that in that case, why is it trying to output to software-based video?
I don't have much experience with rsyslog, but syslog-ng can be configured to log to all consoles. [0] I know that consoles are usually configured to blank themselves after a period of inactivity. I wonder if a blanked console can be written to faster than an non-blanked one. [1]
Edit: The Fine Article supports my assumption about how rsyslog is configured on the RPi. Search for "printk". Also remember that -when syslog is disabled- printks are written to the active console.
[0] The default configuration appears to log everything to /dev/tty12. I see no reason why you couldn't log everything to -I guess- /dev/console. (Or, failing that, /dev/tty[0-12].)
[1] I know that screen-scrolling updates that happen in a console that is not the current one (or that happen in a screen window that's not the current one) are substantially faster than on-screen screen-scrolling updates.