U+FE0E is really interesting: it forces monochrome emoji presentation on the immediately preceding emoji (acting like a skin tone modifier or any other modifier). I have previously run into the issue of the play and pause characters (U+25B6, U+23F8) being inconsistently replaced with their color versions when I was trying to use them in a UI. It looks like this is a great guarantee that that won't happen.
Also, it seems Hacker News automatically removes emoji; maybe this modifier would let HN keep them (in black-and-white form) and still maintain its polished appearance.
It won't, because U+FE0E is just a suggestion, not a mandate. If your system doesn't have an appropriate monochrome glyph, it will just fall back on the colored emoji.
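A minimal Python sketch of what this looks like at the code-point level: the selector is simply appended after the symbol, so the pair is two code points. as_text and as_emoji are hypothetical helper names, and whether the text glyph actually appears is still up to the font and renderer.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text (monochrome) presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji (color) presentation

PLAY = "\u25B6"   # BLACK RIGHT-POINTING TRIANGLE
PAUSE = "\u23F8"  # DOUBLE VERTICAL BAR


def as_text(symbol: str) -> str:
    """Hypothetical helper: ask for the text-style glyph of a single symbol."""
    return symbol + VS15


def as_emoji(symbol: str) -> str:
    """Hypothetical helper: ask for the color emoji glyph of a single symbol."""
    return symbol + VS16


for s in (as_text(PLAY), as_text(PAUSE)):
    # Each result is the base symbol followed by the selector: two code points.
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

Nothing here changes the symbol itself; the selector is only a hint that travels with it.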
I was shocked just now reading about this "variation selector". What's next, conditionals and variables??? I thought (and still think) Unicode is for text; these are more like control characters in some markup language or a transmission protocol. It seems gross. I obviously don't know, but something tells me this has little real-world support and degrades poorly?
Variation selectors were required for some languages, and were a convenient way to implement "combining" characters. Once the feature existed, its use was extended to all sorts of cases, just as flags are rendered as "ligatures".
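To illustrate the flag comparison: a flag is not a single code point but a pair of REGIONAL INDICATOR SYMBOL LETTERs, which a renderer that knows the pair draws as one glyph, ligature-style. This is a separate mechanism from variation selectors. The flag function below is a hypothetical sketch, not a library API.

```python
import unicodedata

# REGIONAL INDICATOR SYMBOL LETTER A; letters B..Z follow consecutively.
REGIONAL_A = 0x1F1E6


def flag(region_code: str) -> str:
    """Hypothetical helper: map a two-letter ASCII region code to its indicator pair.

    A renderer that recognizes the pair shows one flag; otherwise it
    typically shows the two indicator letters side by side.
    """
    code = region_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected a two-letter ASCII region code")
    return "".join(chr(REGIONAL_A + ord(c) - ord("A")) for c in code)


fr = flag("fr")
print([unicodedata.name(c) for c in fr])
```

Because the flag is just two letters in disguise, a system without the right font degrades to showing the letters rather than dropping the text.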
> Also is it seems hacker news automatically removes emoji
And some other characters, like UPPER HALF BLOCK, LOWER HALF BLOCK, FULL BLOCK, LEFT HALF BLOCK, & RIGHT HALF BLOCK, and LIGHT SHADE, MEDIUM SHADE, & DARK SHADE.
It goes to show that making up a character is a permanent change in human history. Several thousand years later, we're still stuck with the characters people invented back then! I wish the emoji committee would stop adding things that are clearly passing fads, like burritos and hot dogs. We're going to have to support this for the rest of human history, so please show a little restraint!
I personally like to call "U+200B ZERO WIDTH SPACE" the "breaking non-space", to go along with "U+00A0 NO-BREAK SPACE" (which I usually hear called "a non-breaking space").
In justified text, depending on the order you should get either a line one space short (zero-width space after no-break space) or a line indented by one space (zero-width space before no-break space).
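The contrast between the two characters shows up in their Unicode properties. The sketch below (describe is a hypothetical helper) only inspects properties with Python's unicodedata module; the actual line-breaking and justification behavior is decided by the layout engine following the Unicode line breaking rules, not by this code.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE: has visible width, forbids a line break here
ZWSP = "\u200B"  # ZERO WIDTH SPACE: has no width, allows a line break here


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (name, general category, Python's str.isspace()) for one code point."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()


for ch in (NBSP, ZWSP):
    print(f"U+{ord(ch):04X}", *describe(ch))
```

Note the irony: Python treats the no-break space as whitespace (category Zs) but the zero width space as a format character (Cf), so "a\u00A0b".split() splits while "a\u200Bb".split() does not.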
> This is because U+FEFF had become a special beacon called the byte order mark, that was placed on the beginning of some UTF-8 files.
Shouldn't this be "UTF-16"? "Byte order" doesn't make sense with UTF-8 encoding, and I've only ever seen BOMs in files created in Windows tools (where UTF-16 is fairly common).
Both. In UTF-8 it's sometimes used to signal that the file is encoded in UTF-8 (rather than to indicate byte order). Personally, I find software that hides extra, unnecessary invisible stuff in my files really annoying.
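A short Python sketch of both uses: in UTF-16 the encoded BOM's bytes reveal the byte order, while in UTF-8 the same character always encodes to the same three bytes and can only act as a signature. read_text is a hypothetical helper; the "utf-8-sig" codec is part of Python's standard library and drops one leading BOM if present.

```python
BOM = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, used as a byte order mark

# UTF-16: the two possible byte orders produce different byte sequences.
print(BOM.encode("utf-16-be"))  # b'\xfe\xff'
print(BOM.encode("utf-16-le"))  # b'\xff\xfe'

# UTF-8: one fixed sequence, so it can only say "this is UTF-8".
print(BOM.encode("utf-8"))      # b'\xef\xbb\xbf'


def read_text(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8, dropping a leading BOM if there is one."""
    return raw.decode("utf-8-sig")


raw = (BOM + "hello").encode("utf-8")
print(repr(raw.decode("utf-8")))  # plain UTF-8 keeps the invisible U+FEFF
print(repr(read_text(raw)))       # 'hello'
```

Decoding with plain "utf-8" is exactly how the invisible character ends up inside strings and file contents.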
I'm missing the two wrongly categorized Korean Hangul fillers. They are allowed in identifiers, but they shouldn't be. So languages that accept Unicode identifiers and care about security (there are only two) must reject them.
https://github.com/perl11/cperl/issues/166
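A minimal sketch of such a rejection rule, written as a hypothetical lint check in Python. Assumption: since the comment doesn't name the two code points, the check matches Hangul fillers by character name, which covers U+115F, U+1160, U+3164 and U+FFA0; all are letters (category Lo) that normally render as blank space. Python's own str.isidentifier() is used for the baseline identifier rules.

```python
import unicodedata


def is_hangul_filler(ch: str) -> bool:
    # Heuristic (assumption): match by name, e.g. "HANGUL FILLER" (U+3164)
    # and "HALFWIDTH HANGUL FILLER" (U+FFA0).
    name = unicodedata.name(ch, "")
    return "HANGUL" in name and name.endswith("FILLER")


def is_safe_identifier(name: str) -> bool:
    """Hypothetical lint rule: a valid Python identifier with no Hangul fillers."""
    return name.isidentifier() and not any(is_hangul_filler(ch) for ch in name)


for candidate in ("total", "total\u3164", "x\uFFA0"):
    print(repr(candidate), is_safe_identifier(candidate))
```

Rejecting by name rather than by a fixed code-point list is a design choice for the sketch; a real implementation would more likely consult the Unicode identifier and security data directly.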
Text shaping engines, which is probably the closest thing to what you mean by “Unicode rendering engine”, are incredibly complicated.
For example, look at how long the Microsoft text-shaping docs are for just one script, Tibetan: https://docs.microsoft.com/en-us/typography/script-developme... . Then look at the table of contents and note that there are a bunch of other sections for various other complex scripts.