Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Oh god please no.


explain.


It's a huge hammer and a considerably larger webpage footprint just so that a type of character isn't allowed to run amok, something that's not going to happen anyway in the vast majority of cases.


We have very different definitions of "huge hammer", in any decent modern browser (Chrome, Firefox, IE9) it takes less than 10ms to apply the mentioned code in 20 texts (using the linked 10 Google results as an example).


It also puts every single special character into its own span. That may negatively affect programs that trust HTML for copy-paste, for example Open Office, Microsoft Word (I think?), some IM clients, some text editors, etc. Doesn't seem worth it in the general case.

That said, if those sorts of things bother an individual, they could run that on the page themselves, I suppose, so it's good for that :)


I don't know OpenOffice but In most programs putting spans around a letter or word does not affect pasting.


10ms is a huge amount to spend for such an obscure fix. How many obscure corner cases do you think Google webpages have to account for? A lot more than 100, I would bet. So rough methods like these would be horrible for performance.


I invite you to look at the ridiculous amounts of JS lines twitter uses in its interface.


You're the one who made the 10ms claim, not me. If indeed it is faster than you claimed, then maybe it would be ok. Still probably not worth engineering time for a problem of this magnitude.


And Twitter is horridly slow on its website. That's what we're trying to avoid


It is a huge amount of CPU and memory and wasted resources to correct a problem sometimes and only for some edge cases.


It corrupts Unicode data because it splits on code points instead of grapheme clusters. When performed on 'Spin̈al Tap', it splits the base character U+006E (LATIN SMALL LETTER N) from the combining character U+0308 (COMBINING DIAERESIS) and results in the string 'Spin<span class="s_char">̈</span>al Tap', which contains the valid Unicode grapheme cluster '>̈'! If you were to split on grapheme clusters instead, the result would be 'Spi<span class="s_char">n̈</span>al Tap'. However, I still wouldn't support that solution because it could negatively affect text segmentation used by search engine indexing and natural language processing tools.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: