Code Points

matthberg · on Dec 15, 2020

U+FE0E is really interesting: it forces monochrome rendering of the immediately preceding emoji (it works like a skin tone modifier or any other modifier). I have previously run into the issue of the play and pause characters (U+25B6, U+23F8) being inconsistently replaced with their color versions when I was trying to use them in a UI. It looks like this is a great guarantee that that won't happen.

Also, it seems Hacker News automatically removes emoji; maybe this modifier would allow them to keep them (in b/w form) and still maintain the polished appearance.

saagarjha · on Dec 15, 2020

It won't, because U+FE0E is just a suggestion, not a mandate. If your system doesn't have an appropriate monochrome replacement, it will just fall back to the colored emoji.
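
For anyone who wants to try it, here is a minimal Python sketch of appending the selector. The helper name `as_text` is made up for illustration, and whether the result actually renders in monochrome still depends on the fonts and renderer, as described above.

```python
import unicodedata

TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: requests text (monochrome) presentation
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji (color) presentation


def as_text(ch: str) -> str:
    """Append VS15 to a single character (hypothetical helper)."""
    return ch + TEXT_VS


play = as_text("\u25B6")   # the play character mentioned upthread
pause = as_text("\u23F8")  # the pause character mentioned upthread

# Each result is two code points: the base character plus the selector.
for s in (play, pause):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

The selector is only a request to the renderer; the string itself just gets one code point longer.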

saurik · on Dec 16, 2020

That's not the behavior in the sample screenshot here: https://twitter.com/ridiculous_fish/status/10894210337932369... <- Chrome rendered a replacement character rather than fall back to the emoji.

renewiltord · on Dec 16, 2020

Interesting. My Chrome on Ubuntu seems to have differing behaviour.

saagarjha · on Dec 16, 2020

Safari renders an emoji for me in the tweet right above that one.

sverhagen · on Dec 16, 2020

I was shocked just now reading about this "variation selector". What is next, conditionals and variables??? I thought (still think) Unicode is for text; these are more like control characters in some markup language or a transmission protocol. It seems gross. I obviously don't know, but something tells me this has little real-world support and degrades poorly?

pjc50 · on Dec 16, 2020

Variant selection was required for some languages, and was a convenient way to implement "combining" characters. Once the feature existed its use got extended to all sorts of cases, just like the flags are "ligatures".
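
To make the flag case concrete, here is a rough Python sketch. A flag is a pair of REGIONAL INDICATOR SYMBOL code points that a font may draw as a single glyph. The function name `flag` is my own, and it does not check that the two letters are a real country code.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(country_code: str) -> str:
    """Map a two-letter code such as 'US' to its pair of regional indicators."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


us = flag("us")
# Two separate code points; drawing them as one flag is up to the font.
print([f"U+{ord(c):04X}" for c in us])
```

If the font has no glyph for the pair, you typically just see the two letter symbols side by side.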

earthboundkid · on Dec 16, 2020

On my site, I wanted the non-emoji version of on a button, so I had to use U+FE0E to force it to not be red on some browsers.

ridiculous_fish · on Dec 16, 2020

U+FE0E may also make the previous character less wide. It's very surprising that appending a code point can make wcswidth decrease.

https://twitter.com/ridiculous_fish/status/10894161143611023...

boogies · on Dec 15, 2020

> Also, it seems Hacker News automatically removes emoji

And some other characters like UPPER HALF BLOCK, LOWER HALF BLOCK, FULL BLOCK, LEFT HALF BLOCK, & RIGHT HALF BLOCK, and LIGHT SHADE, MEDIUM SHADE, & DARK SHADE

shadowfaxRodeo · on Dec 15, 2020

There are some really great Egyptian Hieroglyphs including this derpy bird 𓅮 and one far too rude to print here.

amake · on Dec 15, 2020

More than one: 𓂸 𓂹 𓂺

earthboundkid · on Dec 16, 2020

It goes to show that making up a character is a permanent change in human history. Several thousand years later, we're stuck with their characters! I wish the emoji committee would stop adding things that are clearly passing fads like burritos and hot dogs. We're going to have to support this for the rest of human history, please show a little restraint!

xyzzy_plugh · on Dec 16, 2020

I'm pretty sure burritos and hot dogs will culturally outlive most other modern cultural fads.

Topologically, most foods are burritos or hot dogs anyways.

neatze · on Dec 16, 2020

Topologically, most things are donuts.

WantonQuantum · on Dec 16, 2020

I'd put money on hot dogs and burritos outlasting Unicode.

Cthulhu_ · on Dec 16, 2020

Or like smiling, lol.

kangalioo · on Dec 15, 2020

What's that second one?

koolba · on Dec 16, 2020

A poorly executed Prince Albert.

Groxx · on Dec 15, 2020

I personally like to call "U+200B ZERO WIDTH SPACE" the "breaking non-space", to go along with "U+00A0 NO-BREAK SPACE" (as I usually hear it called "a non-breaking space")
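
A quick way to see how differently the two are classified, sketched with Python's standard `unicodedata` module (this only shows the character properties, not how any layout engine breaks lines):

```python
import unicodedata

ZWSP = "\u200B"  # ZERO WIDTH SPACE: a break opportunity with no visible width
NBSP = "\u00A0"  # NO-BREAK SPACE: visible width, but no break allowed


def describe(ch: str) -> tuple:
    """Return (name, general category, str.isspace() result) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isspace())


for ch in (ZWSP, NBSP):
    print(f"U+{ord(ch):04X}", describe(ch))
```

In Python, the zero width space is a format character (Cf) and is not treated as whitespace, while the no-break space is a space separator (Zs) and is.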

tobr · on Dec 15, 2020

So I guess you can combine them to get a breaking space? Or would you get a non-breaking non-space?

dhosek · on Dec 16, 2020

In justified text, depending on the order you should get either a line one space short (zero-width space after no-break space) or a line indented by one space (zero-width space before no-break space).

flohofwoe · on Dec 16, 2020

> This is because U+FEFF had become a special beacon called the byte order mark, that was placed on the beginning of some UTF-8 files.

Shouldn't this be "UTF-16"? "Byte order" doesn't make sense with UTF-8 encoding, and I've only ever seen BOMs in files created in Windows tools (where UTF-16 is fairly common).

mkl · on Dec 16, 2020

Both. It's sometimes used in UTF-8 to indicate that the file is encoded in UTF-8 (rather than to signal byte order). Personally, I think that software hiding extra unnecessary invisible stuff in my files is really annoying.

Some more info here: https://en.wikipedia.org/wiki/Byte_order_mark
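
As a small illustration in Python (a sketch, using the standard `codecs` constants and the `utf-8-sig` codec): the UTF-8 BOM is the three bytes EF BB BF, and the plain `utf-8` codec keeps it as a leading U+FEFF while `utf-8-sig` strips it.

```python
import codecs

text = "hello"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")  # what some tools write

# The plain codec keeps the BOM as an invisible first character.
kept = with_bom.decode("utf-8")
# The "-sig" variant removes a leading BOM if one is present.
stripped = with_bom.decode("utf-8-sig")

print(with_bom[:3].hex())          # the raw BOM bytes
print(len(kept), len(stripped))    # the kept version is one character longer
```

This is how the invisible character sneaks into the first field of a CSV or the first key of a config file.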

pjc50 · on Dec 16, 2020

Yeah, UTF-16 or more technically the thing you get if you build a Windows C++ program with "_UNICODE" defined: https://docs.microsoft.com/en-us/windows/win32/learnwin32/wo...

boffinism · on Dec 15, 2020

The fact that the example demonstrating the box drawing characters is broken on mobile tells you all you need to know...

djxfade · on Dec 15, 2020

Works fine for me (iOS)

rurban · on Dec 16, 2020

I'm missing the two wrongly categorized Korean Hangul fillers. They are categorized as identifier characters, but they shouldn't be. So languages that accept Unicode identifiers and care about security (there are only two) must reject them. https://github.com/perl11/cperl/issues/166
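
A hedged Python sketch of the problem. I am assuming the two fillers meant here are U+3164 and U+FFA0; the comment does not say which ones. `report` is a made-up helper that shows how Python classifies them and whether it would accept each as an identifier.

```python
import unicodedata

# Assumption: these are the two fillers in question.
FILLERS = ["\u3164", "\uFFA0"]


def report(ch: str) -> tuple:
    """Return (name, general category, accepted-as-identifier) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isidentifier())


for ch in FILLERS:
    print(f"U+{ord(ch):04X}", report(ch))
```

Both are classified as letters (category Lo) even though they usually render as blank space, which is why an identifier made of them can be invisible in source code.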

Waterluvian · on Dec 16, 2020

So, is a Unicode rendering engine unbelievably tedious to implement?

umanwizard · on Dec 16, 2020

Text shaping engines, which is probably the closest thing to what you mean by “Unicode rendering engine”, are incredibly complicated.

For example, look how long the Microsoft text-shaping docs are for just one script, Tibetan: https://docs.microsoft.com/en-us/typography/script-developme... . Then look at the table of contents and note that there are a bunch of other sections for various other complex scripts.

pascalmahe · on Dec 16, 2020

Lots of game codepoints. I'm especially impressed with the mahjong set. Less so with the dominoes, they seem near-unreadable in any size below 14pt.

Zaheer · on Dec 16, 2020

You'd be surprised at how much regular folks on IG / Twitter are exposed to Unicode characters. People lͦͯoͦͯvͦͯeͦͯ styling their text.

sllabres · on Dec 15, 2020

ma͢d͟n͡e͡ss :D
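
This kind of styling is plain letters with combining marks stacked on them, so it can be undone. A minimal Python sketch, assuming you simply want to drop every mark (which will also strip legitimate accents); `strip_marks` is a name I made up, and the sample string uses combining marks I picked for illustration.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Drop every character whose general category is a mark (Mn, Mc, Me)."""
    return "".join(c for c in s if not unicodedata.category(c).startswith("M"))


# Illustrative input: "madness" with a few combining marks attached.
styled = "ma\u0362d\u035fn\u0361e\u0361ss"
print(len(styled), strip_marks(styled))
```

Precomposed letters such as U+00E9 are not marks, so they survive unless you decompose the text with NFD first.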

suyash · on Dec 15, 2020

Any ideas on how to use this, any example code/apps?

saagarjha · on Dec 15, 2020

tempodox · on Dec 16, 2020

For the non-blind, that's ‘BRAILLE PATTERN BLANK’, U+2800.
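
If you want to find out what an apparently empty string contains, something like this Python sketch works (the helper name `reveal` is hypothetical):

```python
import unicodedata

BLANK = "\u2800"  # BRAILLE PATTERN BLANK


def reveal(s: str) -> list:
    """List each character as 'U+XXXX NAME' so invisible ones show up."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


print(reveal(BLANK))
# It is a symbol (category So), not whitespace, so strip() leaves it alone.
print(unicodedata.category(BLANK), BLANK.isspace(), BLANK.strip() == BLANK)
```

That is probably why it gets past filters that trim or reject whitespace-only input.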