After trying with the css suggested here on Google inside Chrome, the problametic characters don't have the top going across too high like originally, but it gets capped on the top of the first line of the paragraph, instead of the line the character is actually at.
Doh, you're right. For some reason I had it in my head that the block itself wrapped onto multiple lines but I must have been thinking of something else.
It seems like there isn't really any CSS only solution for this without wrapping every character in its own element like jQueryIsAwesome suggested.
It's a huge hammer and a considerably larger webpage footprint just so that a type of character isn't allowed to run amok, something that's not going to happen anyway in the vast majority of cases.
We have very different definitions of "huge hammer", in any decent modern browser (Chrome, Firefox, IE9) it takes less than 10ms to apply the mentioned code in 20 texts (using the linked 10 Google results as an example).
It also puts every single special character into its own span. That may negatively affect programs that trust HTML for copy-paste, for example Open Office, Microsoft Word (I think?), some IM clients, some text editors, etc. Doesn't seem worth it in the general case.
That said, if those sorts of things bother an individual, they could run that on the page themselves, I suppose, so it's good for that :)
10ms is a huge amount to spend for such an obscure fix. How many obscure corner cases do you think Google webpages have to account for? A lot more than 100, I would bet. So rough methods like these would be horrible for performance.
You're the one who made the 10ms claim, not me. If indeed it is faster than you claimed, then maybe it would be ok. Still probably not worth engineering time for a problem of this magnitude.
It corrupts Unicode data because it splits on code points instead of grapheme clusters. When performed on 'Spin̈al Tap', it splits the base character U+006E (LATIN SMALL LETTER N) from the combining character U+0308 (COMBINING DIAERESIS) and results in the string 'Spin<span class="s_char">̈</span>al Tap', which contains the valid Unicode grapheme cluster '>̈'! If you were to split on grapheme clusters instead, the result would be 'Spi<span class="s_char">n̈</span>al Tap'. However, I still wouldn't support that solution because it could negatively affect text segmentation used by search engine indexing and natural language processing tools.