Hacker News
Code Points (github.com/codepoints)
141 points by Tomte on Dec 15, 2020 | 36 comments



U+FE0E is really interesting: it forces monochrome rendering of the immediately preceding emoji (it attaches to the character before it, like a skin tone modifier or any other modifier). I have previously run into the play and pause characters (U+25B6, U+23F8) being inconsistently replaced with their color versions when I was trying to use them in a UI. It looks like this is a great guarantee that that won't happen.
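
A minimal sketch of what appending the selector looks like in Python, using only the standard library. The variable names are illustrative, and whether a monochrome glyph actually shows up still depends on the font and renderer:

```python
import unicodedata

PLAY = "\u25B6"   # BLACK RIGHT-POINTING TRIANGLE
PAUSE = "\u23F8"  # DOUBLE VERTICAL BAR
VS15 = "\uFE0E"   # VARIATION SELECTOR-15: asks for text (monochrome) presentation
VS16 = "\uFE0F"   # VARIATION SELECTOR-16: asks for emoji (color) presentation

# The selector is simply appended after the character it applies to.
play_text = PLAY + VS15
play_emoji = PLAY + VS16
pause_text = PAUSE + VS15

# Print the code points of each sequence so the invisible selector is visible.
for s in (play_text, play_emoji, pause_text):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

Each string is two code points long even though it displays as one symbol, which matters for anything that counts characters.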

Also, it seems Hacker News automatically removes emoji. Maybe this modifier would allow them to keep emoji (in b/w form) and still maintain the polished appearance.


It won't, because U+FE0E is just a suggestion, not a mandate. If your system doesn't have an appropriate monochrome replacement, it will just fall back on the colored emoji.


That's not the behavior in the sample screenshot here: https://twitter.com/ridiculous_fish/status/10894210337932369... <- Chrome rendered a replacement character rather than fall back to the emoji.


Interesting. My Chrome on Ubuntu seems to have differing behaviour.


Safari renders an emoji for me in the tweet right above that one.


I was shocked just now reading about this "variation selector", what is next, conditionals and variables??? I thought (still think) Unicode is for text, these are more like control characters in some markup language or a transmission protocol. It seems gross. I obviously don't know, but something tells me this has little real world support and degrades poorly?


Variant selection was required for some languages, and was a convenient way to implement "combining" characters. Once the feature existed its use got extended to all sorts of cases, just like the flags are "ligatures".
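
To illustrate the flag case: a flag emoji is a pair of regional indicator code points that a font may draw as a single glyph. A rough Python sketch, where the `flag` helper is a hypothetical name made up for this example:

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(country_code: str) -> str:
    """Map a two-letter ASCII country code to its pair of regional indicators."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    # Each letter A..Z maps to the regional indicator at the same offset.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

us = flag("US")
# Two code points; fonts without a matching flag glyph show two boxed letters.
print(len(us), [f"U+{ord(c):04X}" for c in us])
```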


On my site, I wanted the non-emoji version of a character on a button, so I had to use U+FE0E to force it to not be red on some browsers.


U+FE0E may also make the previous character less wide. It's very surprising that appending a code point can make wcswidth decrease.

https://twitter.com/ridiculous_fish/status/10894161143611023...


> Also, it seems Hacker News automatically removes emoji

And some other characters like UPPER HALF BLOCK, LOWER HALF BLOCK, FULL BLOCK, LEFT HALF BLOCK, & RIGHT HALF BLOCK, and LIGHT SHADE, MEDIUM SHADE, & DARK SHADE


There are some really great Egyptian Hieroglyphs including this derpy bird 𓅮 and one far too rude to print here.


More than one: 𓂸 𓂹 𓂺


It goes to show that making up a character is a permanent change in human history. Several thousand years later, we're stuck with their characters! I wish the emoji committee would stop adding things that are clearly passing fads like burritos and hot dogs. We're going to have to support this for the rest of human history, please show a little restraint!


I'm pretty sure burritos and hot dogs will culturally outlive most other modern cultural fads.

Topologically, most foods are burritos or hot dogs anyways.


Topologically, most things are donuts.


I'd put money on hot dogs and burritos outlasting Unicode.


Or like smiling, lol.


What's that second one?


A poorly executed Prince Albert.


I personally like to call "U+200B ZERO WIDTH SPACE" the "breaking non-space", to go along with "U+00A0 NO-BREAK SPACE" (as I usually hear it called "a non-breaking space")
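
For anyone who wants to poke at the two characters, a small Python sketch using the standard `unicodedata` module shows that Unicode itself does not classify U+200B as a space at all:

```python
import unicodedata

ZWSP = "\u200B"  # ZERO WIDTH SPACE
NBSP = "\u00A0"  # NO-BREAK SPACE

# Print the official name and general category of each character.
for ch in (ZWSP, NBSP):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# U+200B is a format character (Cf), so Python does not treat it as whitespace;
# U+00A0 is a space separator (Zs), so it does.
print(ZWSP.isspace(), NBSP.isspace())
```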


So I guess you can combine them to get a breaking space? Or would you get a non-breaking non-space?


In justified text, depending on the order you should get either a line one space short (zero-width space after no-break space) or a line indented by one space (zero-width space before no-break space).


> This is because U+FEFF had become a special beacon called the byte order mark, that was placed on the beginning of some UTF-8 files.

Shouldn't this be "UTF-16"? "Byte order" doesn't make sense with UTF-8 encoding, and I've only ever seen BOMs in files created in Windows tools (where UTF-16 is fairly common).


Both. It's sometimes placed at the start of UTF-8 files to indicate that the file is encoded in UTF-8 (not to indicate byte order). Personally, I think that software hiding extra unnecessary invisible stuff in my files is really annoying.

Some more info here: https://en.wikipedia.org/wiki/Byte_order_mark
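
A short Python sketch of both uses, relying only on the standard `codecs` module and the built-in `utf-8-sig` codec:

```python
import codecs

text = "hi"

# "utf-8-sig" writes the three-byte UTF-8 form of U+FEFF in front of the data.
utf8_with_bom = text.encode("utf-8-sig")
print(utf8_with_bom)  # b'\xef\xbb\xbfhi'

# Decoding as plain UTF-8 keeps the invisible U+FEFF as the first character;
# decoding as "utf-8-sig" strips it.
kept = utf8_with_bom.decode("utf-8")
stripped = utf8_with_bom.decode("utf-8-sig")
print(len(kept), len(stripped))  # 3 2

# In UTF-16 the BOM really does carry byte order: the generic "utf-16" codec
# prepends it in the platform's native order.
utf16 = text.encode("utf-16")
print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```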


Yeah, UTF-16 or more technically the thing you get if you build a Windows C++ program with "_UNICODE" defined: https://docs.microsoft.com/en-us/windows/win32/learnwin32/wo...


The fact that the example demonstrating the box drawing characters is broken on mobile tells you all you need to know...


Works fine for me (iOS)


I'm missing the two wrongly categorized Korean Hangul fillers. They are categorized as valid in identifiers, but they shouldn't be. So languages that accept Unicode identifiers and care about security (there are only two) must reject them. https://github.com/perl11/cperl/issues/166
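
A hedged sketch of what such a rejection could look like in Python. The comment does not say which two fillers it means, so listing all four Hangul filler code points is an assumption here, and `is_safe_identifier` is a hypothetical helper, not something taken from cperl:

```python
# Assumption: all four Hangul filler code points are treated as suspicious.
HANGUL_FILLERS = {
    "\u115F",  # HANGUL CHOSEONG FILLER
    "\u1160",  # HANGUL JUNGSEONG FILLER
    "\u3164",  # HANGUL FILLER
    "\uFFA0",  # HALFWIDTH HANGUL FILLER
}

def is_safe_identifier(name: str) -> bool:
    """Accept only names that are valid identifiers and contain no Hangul filler."""
    return name.isidentifier() and not any(ch in HANGUL_FILLERS for ch in name)

print(is_safe_identifier("value"))        # True
print(is_safe_identifier("value\u3164"))  # False
```

Whether `str.isidentifier()` alone accepts a filler depends on the Unicode version bundled with the interpreter, which is why the explicit check is kept separate.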


So, is a Unicode rendering engine unbelievably tedious to implement?


Text shaping engines, which is probably the closest thing to what you mean by “Unicode rendering engine”, are incredibly complicated.

For example, look how long the Microsoft text-shaping docs are for just one script, Tibetan: https://docs.microsoft.com/en-us/typography/script-developme... . Then look at the table of contents and note that there are a bunch of other sections for various other complex scripts.


Lots of game codepoints. I'm especially impressed with the mahjong set. Less so with the dominoes; they seem near-unreadable at any size below 14pt.


You'd be surprised at how much regular folks on IG / Twitter are exposed to unicode characters. People lͦͯoͦͯvͦͯeͦͯ styling their text.
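
That kind of styling is just combining marks stacked on ordinary letters. A minimal Python sketch, assuming the decoration is U+0366 and U+036F after each letter, with `strip_combining` as a hypothetical helper name:

```python
import unicodedata

def strip_combining(text: str) -> str:
    """Drop every combining mark after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Rebuild the styled word: each base letter followed by two combining marks.
styled = "".join(ch + "\u0366\u036F" for ch in "love")

print(len(styled))              # 12 code points for 4 visible letters
print(strip_combining(styled))  # love
```

Note that this also strips legitimate accents (an "é" becomes "e"), so it is only a sketch of how to undo the decoration, not a general cleanup routine.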


ma͢d͟n͡e͡ss :D


Any ideas on how to use this, any example code/apps?



For the non-blind, that's ‘BRAILLE PATTERN BLANK’, U+2800.



