
Not sure which scripts your comment was meant to cover, but this is not true in general. If I type something like किमपि (“kimapi”) and hit backspace, it turns into किमप (“kimapa”). That is, the following sequence of codepoints:

    ‎0915 DEVANAGARI LETTER KA
    ‎093F DEVANAGARI VOWEL SIGN I
    ‎092E DEVANAGARI LETTER MA
    ‎092A DEVANAGARI LETTER PA
    ‎093F DEVANAGARI VOWEL SIGN I
which is made of three grapheme clusters (containing 2, 1, and 2 codepoints respectively), turns into the following sequence after a single backspace:

    ‎0915 DEVANAGARI LETTER KA
    ‎093F DEVANAGARI VOWEL SIGN I
    ‎092E DEVANAGARI LETTER MA
    ‎092A DEVANAGARI LETTER PA
This is what I expect and find intuitive as a user, too. Similarly, अन्यच्च is made of 3 grapheme clusters, but you have to hit backspace 7 times to delete it. (There I'd slightly have preferred अन्यच्च→अन्यच्→अन्य→अन्→अ over the अन्यच्च→अन्यच्→अन्यच→अन्य→अन्→अन→अ that you actually see, but one can live with this.)
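To make the above concrete, here is a minimal Python sketch of a backspace that removes exactly one codepoint per keypress, which reproduces both sequences described. The helper names (`describe`, `backspace_steps`) are made up for illustration, and this only models the observed behaviour; it is not how any particular OS or toolkit implements deletion.

```python
import unicodedata

def describe(text):
    """Return one 'XXXX NAME' string per codepoint in text."""
    return [f"{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def backspace_steps(text):
    """Hypothetical codepoint-wise backspace: list every intermediate state."""
    steps = [text]
    while text:
        text = text[:-1]  # drop only the last codepoint
        steps.append(text)
    return steps

kimapi = "\u0915\u093F\u092E\u092A\u093F"            # किमपि
anyacca = "\u0905\u0928\u094D\u092F\u091A\u094D\u091A"  # अन्यच्च

# State after one backspace on किमपि, printed as a codepoint list
for line in describe(backspace_steps(kimapi)[1]):
    print(line)

# Number of backspaces needed to clear अन्यच्च under this policy
print(len(backspace_steps(anyacca)) - 1)
```

Under this policy the number of keypresses equals the number of codepoints, regardless of how they group into clusters.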



Looks like you're right. I don't have experience with scripts like this one. I was thinking more of things like é (e followed by U+0301), or 🇦🇧 (two regional indicator symbols that don't map to any current flag), or a snippet of Z̛̺͉̤̭͈̙A̧̦͉̗̩̞͙LG͈͎͍̺̖̹̘O̵̫, which has tons of combining marks but where each cluster is still deleted with a single backspace.


Interesting. The rules seem to differ between systems. Deleting both regional indicator symbols at once (whether they map to a flag or not) seems right in any case. Some other systems (Android included) will take accents off separately when the character is decomposed (but not for precomposed accented characters). Also note that macOS takes just the accent off for Arabic (tested on U+062F U+064D).
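A rough Python sketch of the kind of policy described here: regional indicators go in pairs, everything else goes one codepoint at a time, so a decomposed accent comes off on its own while a precomposed character disappears whole. The function names are hypothetical, and this is only a model of the behaviour reported above, not the actual Android or macOS logic.

```python
import unicodedata

def is_regional_indicator(ch):
    """True for U+1F1E6..U+1F1FF (REGIONAL INDICATOR SYMBOL LETTER A..Z)."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def backspace(text):
    """Hypothetical policy: delete a regional-indicator pair, else one codepoint."""
    if not text:
        return text
    # Count the run of regional indicators at the end of the text
    run = 0
    for ch in reversed(text):
        if not is_regional_indicator(ch):
            break
        run += 1
    # Pairs are formed from the start of the run, so only an even-length
    # run ends in a complete pair
    if run >= 2 and run % 2 == 0:
        return text[:-2]
    return text[:-1]

decomposed = "e\u0301"      # e + COMBINING ACUTE ACCENT
precomposed = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE
arabic = "\u062F\u064D"     # the sequence tested above
pair = "x\U0001F1E6\U0001F1E7"  # x followed by two regional indicators

print(repr(backspace(decomposed)))   # accent removed, base letter kept
print(repr(backspace(precomposed)))  # whole character removed
print(repr(backspace(arabic)))       # mark removed, base letter kept
print(repr(backspace(pair)))         # both indicators removed together
```

Note that the decomposed and precomposed forms are canonically equivalent (`unicodedata.normalize("NFD", precomposed) == decomposed`), so a policy like this gives different results for text that renders identically.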



