Interesting! I hadn't thought this deeply about text editing before. I disagree ...

jakear · on Oct 29, 2019

VSCode does the "correct" behavior of Bad #3, but doesn't even need the "bad" part about pushing the bytewise caret position around: it logically maintains two characters, but visually coalesces the middle position and the front position into one. I wonder why it wasn't mentioned.
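
To make the "two logical positions, one visual position" idea concrete, here is a minimal Python sketch. It is not VSCode's actual code: `extends_previous` and `visual_stop` are hypothetical names, offsets are counted in code points rather than bytes, and the clustering rule is a deliberate simplification (combining marks and skin tone modifiers only), not full Unicode grapheme segmentation.

```python
import unicodedata

def extends_previous(ch: str) -> bool:
    # Simplified assumption: combining marks and emoji skin tone
    # modifiers (U+1F3FB..U+1F3FF) attach to the preceding code point.
    return unicodedata.combining(ch) != 0 or 0x1F3FB <= ord(ch) <= 0x1F3FF

def visual_stop(text: str, offset: int) -> int:
    # Map a logical caret offset to the offset where the caret is drawn.
    # An offset in the middle of a cluster is drawn at the cluster's front.
    while 0 < offset < len(text) and extends_previous(text[offset]):
        offset -= 1
    return offset

# Thumbs up + medium skin tone modifier + "x": four logical offsets.
text = "\U0001F44D\U0001F3FDx"
stops = [visual_stop(text, i) for i in range(len(text) + 1)]
print(stops)  # [0, 0, 2, 3]
```

Logical offsets 0 and 1 both draw at position 0, so the caret can still sit between the emoji and its modifier internally without the user seeing a caret inside the glyph.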

mjevans · on Oct 29, 2019

It's probably bad because there isn't an additional kludge added: decomposition of combined character entities for editing. This would involve a concept of sub-character (code-point) 'tabs' (which ideally would be distinct UTF-8 entities).
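
A rough sketch of what "decomposition for editing" could look like, using Python's standard `unicodedata` module. The helper name `decompose_for_editing` is made up for illustration, and treating each resulting code point as its own caret stop is an assumption about the idea, not something any particular editor is known to do.

```python
import unicodedata

def decompose_for_editing(text: str) -> list:
    # NFD splits precomposed characters into base + combining marks,
    # so each code point can be addressed (and deleted) separately.
    return list(unicodedata.normalize("NFD", text))

parts = decompose_for_editing("\u00e9")  # precomposed "e with acute"
print([f"U+{ord(c):04X}" for c in parts])  # ['U+0065', 'U+0301']
```

An emoji followed by a skin tone modifier is already two code points, so NFD leaves it unchanged; the decomposition step only matters for precomposed characters.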

Mathnerd314 · on Oct 29, 2019

Another option would be that deleting the 'a' doesn't completely delete it but instead replaces it with a zero-width space or zero-width non-joiner, so it looks like Bad #1 (but is Unicode-compliant) and hitting delete again gives Bad #3.

The whole example is a bit contrived though: nobody is going to enter a skin tone modifier character by hand in daily use. They'll select an appropriately-colored emoji.
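
A minimal Python sketch of the two-step delete described above, assuming the example is a base character followed by a modifier. `delete_forward` and `is_modifier` are hypothetical names, offsets are in code points, and the modifier check is simplified to combining marks and skin tone modifiers.

```python
import unicodedata

ZWNJ = "\u200c"  # zero-width non-joiner

def is_modifier(ch: str) -> bool:
    # Simplified assumption: combining marks and skin tone modifiers.
    return unicodedata.combining(ch) != 0 or 0x1F3FB <= ord(ch) <= 0x1F3FF

def delete_forward(text: str, offset: int) -> str:
    # Delete the code point at `offset`. If a modifier follows it,
    # leave a ZWNJ placeholder instead of removing the slot outright.
    if offset >= len(text):
        return text
    following = text[offset + 1:offset + 2]
    if text[offset] != ZWNJ and following and is_modifier(following):
        return text[:offset] + ZWNJ + text[offset + 1:]
    return text[:offset] + text[offset + 1:]

text = "\U0001F44Da\U0001F3FD"       # emoji, "a", skin tone modifier
once = delete_forward(text, 1)       # "a" becomes a ZWNJ placeholder
twice = delete_forward(once, 1)      # placeholder removed, modifier now follows the emoji
```

The first delete keeps the modifier separated from whatever precedes it; only the second delete lets the two become adjacent.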

jakear · on Oct 29, 2019

IMO, having delete trigger a zero-width space insertion would be the worst option. I mention in a sibling comment that VSCode gets around this by having two separate logical caret positions combined into a single visual position. So the byte offset of the cursor changes as expected, while still maintaining "Unicode correctness", for whatever that's worth.