I find myself agreeing with the rejection. As much as I dislike the ridiculous amount of emojis in Unicode, and their increasingly widespread use, they do fit plain text communication, whereas an "external link" symbol does not. My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.
"External link" symbol tells you, "that last bit of differently-formatted text is an active element in this application, leading to an outside resource". It's not something that makes sense in plain text, because any external link in a plain text message is both visible and obviously a link.
Elsewhere in the thread someone mentioned the play/pause/stop symbols from cassette recorders/VCRs. But those have been used culturally as symbols denoting starting, pausing and stopping for decades now, so they're an idea communication tool that makes sense in plain text, and thus pass the SMS sniff test.
(Note that I'm not sure if all symbols in Unicode actually pass the SMS sniff test. I suppose the best option for those wanting "external link" symbol to be included would be to resubmit it as the DOUBLE ARROW POINTING TOP RIGHT OUT OF SQUARE symbol, or something like that.)
I don't really buy this. Google image search "external link symbol", the symbol is just generically useful.
I can buy that the name in unicode should be something other than "external link". It could be something more generic like "external reference". Boom, now it can be accepted because it's not tied to <a>.
The problem with the rejection is that "that last bit of differently-formatted text is an active element in this application, leading to an outside resource" is kind of a strawman: why be so specific just to reject it because you've decided to be so specific?
Though all of that justification is kind of unnecessary. Really, I just find it ridiculous that the Unicode group is going to suddenly become prudish about a ubiquitous symbol yet have 20 different check mark symbols. Sure, if everyone made the "but there are 20 different check mark symbols so why can't I have <pet symbol>?" argument, Unicode would be even more ridiculous. But that's an argument to reject <pet symbol>, not one of the most ubiquitous symbols on the internet.
"Unicode has a bunch of stupid symbols, so it should at least add the most useful symbols that we use on the internet" really is a good argument.
This made me stop and think because I've never really wondered about what belongs and doesn't belong in Unicode, and then I went looking at what strange corners Unicode has and wow the "Miscellaneous Technical" code block is such a strange thing.
* It has APL symbols (checking the Wiki article on APL syntax and the APL standard, it seems that APL programs can be represented entirely in Unicode - which makes sense, but is still a little surprising).
* There's a benzene code point: ⌬
* "ERASE TO THE LEFT", better known as backspace, is a thing: ⌫
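If you want to check these yourself, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` and the choice of U+2336 as a sample APL symbol are mine, picked only for illustration.

```python
import unicodedata

# Three code points from the "Miscellaneous Technical" block (U+2300..U+23FF):
# the benzene ring, erase-to-the-left, and one of the APL symbols.
samples = ["\u232c", "\u232b", "\u2336"]

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in samples:
    print(describe(ch))
```

The names come from whatever Unicode version your Python build ships with, so very recent additions may be missing on older interpreters.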
Unicode was designed to unify all existing character sets in use at the time. So there are lots of weird historical things which wouldn't necessarily be accepted in Unicode as new proposals.
If the "external link" symbol had occurred in some legacy character set, then it would have been included automatically.
But since it is a new proposal, it will be validated according to Unicode's policy for new symbols.
That's simple. The justification for most characters, including the exemplary one you mentioned, is that they were already encoded in some other repertoire. The consortium's task is to collect and unify them.
In particular there was a desire to make sure that a round trip from some legacy character set to Unicode and back to the legacy character set wouldn't lose information. So many characters that you would assume to be the same are given multiple code points to make that possible.
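A small Python sketch of that round-trip property, using Shift-JIS and the fullwidth letter "A" as the example. The helper `round_trips` is hypothetical, and the choice of Shift-JIS is just one convenient legacy encoding that Python ships a codec for.

```python
import unicodedata

def round_trips(text, codec):
    """True if text survives Unicode -> legacy codec -> Unicode unchanged."""
    return text.encode(codec).decode(codec) == text

# Shift-JIS distinguishes ASCII "A" from FULLWIDTH LATIN CAPITAL LETTER A,
# so Unicode gives the fullwidth form its own code point (U+FF21).
ascii_a, fullwidth_a = "A", "\uff21"

print(round_trips(ascii_a + fullwidth_a, "shift_jis"))
print(ascii_a.encode("shift_jis") != fullwidth_a.encode("shift_jis"))

# Compatibility normalization folds the fullwidth form back onto plain "A",
# which shows Unicode treats it as a variant kept for compatibility.
print(unicodedata.normalize("NFKC", fullwidth_a) == ascii_a)
```

If the two letters had been unified into one code point, the second line could not hold and converting a Shift-JIS file to Unicode and back would silently change it.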
That reasoning would make every conceivable icon a valid target for inclusion.
To actually get the character included, you can't just reason that it might be used in a certain way, you will have to demonstrate it with actual and authentic use in the wild. The accepted proposal for the inclusion of the power symbol is a nice example of how this works.
If you can find a bunch of printed manuals or books or websites that actually use this icon in this manner you might be able to submit a successful proposal.
The web is full of those symbols, so I'm sure there are books (or online resources) explaining them. And if not, someone should do something about that.
I'm not opposed to the backspace symbol, but by that logic you could include anything in Unicode, as anything could be a symbol you'd like to refer to.
I mean, you might want to say "External links can be identified by their □ marker" or "The Windows key is marked □ and can be found to the left of the space bar"
Unicode is intended to replace all previous codes for text, so from the outset and almost by definition it has sought to identify things like APL and encode those because if it didn't you still need an encoding for your old APL programs.
that page lists the NEXT PAGE symbol ⎘ and I wonder if that couldn't do as an "external link" symbol in a pinch, seeing as I've never seen it used as a next page symbol before.
Oh also I just wanna note that if we go by a previous convention, there is a perfectly good symbol for "this is an external link": back in the days when Geocities was relevant, it was pretty common to see external links marked by a little Earth gif after them, because it was a link to somewhere else on the World! Wide! Web.
If your browser feels like displaying Wingdings codepoints, something like this: http://egypt.urnash.com
Scripts like Egyptian Hieroglyphs are exactly the purpose of Unicode. But a bunch of other symbols are mostly there because they come from some preexisting character set which Unicode had to include in order to be a viable replacement for legacy character sets.
Yeah a lot of arguments are of the form "when Unicode can include the pile-of-poop-emoji, surely it should include <my favorite symbol>".
But the question is not if the icon is practical or silly, the question is if it makes sense as a Unicode character or should be handled at some other level like a GUI control or widget.
Should GUI icons, widgets and controls be covered by Unicode? That is a can of worms.
I understand the usefulness of the "external link" icon, but should it really be something you arbitrarily type into a text? Shouldn't it be part of the rendering of a link?
As for the pile-of-poop emoji there is a very good reason for it to be included in Unicode: it is one of the original emoji, made by Softbank in 1997.
The reason: Softbank and Docomo both used proprietary emoji sets, widely used in Japan for text messages. Not including these in Unicode would have resulted in a choice: be compatible with the rest of the world, or keep using the emojis that became part of their culture. Not satisfactory. So basically, they had to put emoji in Unicode for the Japanese to use it, and excluding Japan is not really an option if you want a universal standard.
SoftBank actually put emoji into the Private Use Area of Unicode before they were codified (just as they were in an unused area of Shift-JIS before that). The original iPhone 3GS used the PUA method.
The encoding of emoji was in order to unify Softbank/docomo/au under one set so that iPhones/Androids sold by the different carriers could send emoji among each other without relying on email translators (as had been the technical solution with feature phones until then)
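To make the Private Use Area point concrete, here is a rough Python sketch. The code point U+E001 is an arbitrary private-use value I picked for illustration, not a claim about any carrier's actual mapping, and `is_private_use` is a hypothetical helper.

```python
import unicodedata

def is_private_use(ch):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

pua_char = "\ue001"          # arbitrary BMP private-use code point (assumption)
standard_char = "\U0001F4A9"  # an emoji that has since been encoded

print(is_private_use(pua_char))
# Private-use code points have no standard name; meaning is by private agreement.
print(unicodedata.name(pua_char, "<no standard name>"))
print(is_private_use(standard_char))
```

That lack of a shared meaning is why private-use emoji needed translation between carriers, while an encoded character does not.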
It made total sense to include emojis in Unicode initially.
And a great idea to put them outside the 16-bit plane, so platforms would be forced to support characters beyond the BMP because of the public demand for emojis.
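For what "outside the 16-bit plane" means in practice, a short Python sketch. The helper `utf16_units` is mine; the character is U+1F4A9, which comes up elsewhere in this thread.

```python
poo = "\U0001F4A9"

def utf16_units(ch):
    """Return the UTF-16 code units of a string as a list of ints."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Above U+FFFF, so it does not fit in a single 16-bit unit.
print(hex(ord(poo)))
# UTF-16 has to use a surrogate pair.
print([hex(u) for u in utf16_units(poo)])
# UTF-8 needs four bytes.
print(len(poo.encode("utf-8")))
```

Software that assumed one 16-bit unit per character breaks on exactly this kind of input, which is the pressure the comment above is describing.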
But I think going forward, emojis should be handled outside of Unicode. They are really small illustrations rather than characters or symbols.
Thus the inherent contradiction: Unicode is for text, it must include all existing character sets which it aims to unify, and previous character sets weren’t only used for text.
The obvious hack is to make a custom character set called “Rejected by Unicode”.
It would probably be easier just to design a png or svg icon for the UI widgets you need. Then you don't have to lobby a consortium and wait for them to accept your icon.
Yeah, it was basically a hack to extend the character repertoire in the time of 8-bit character sets. The kind of shenanigans Unicode was designed to replace.
You still see this kind of mischief when Outlook users type a smiley, and it is rendered as a "J" with the Wingdings font. Readers of the mail where styling is stripped, or who don't have the font installed, just see a confusing "J".
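A toy Python illustration of why the "J" shows up. The HTML fragment is a made-up example of the kind of markup involved, and `strip_markup` is a deliberately crude hypothetical helper, not how any real mail client works.

```python
import re
import unicodedata

# Hypothetical fragment: the "smiley" is really the letter J in a symbol font.
mail_html = 'See you then <span style="font-family: Wingdings">J</span>'

def strip_markup(fragment):
    """Crude tag stripper, good enough for this illustration only."""
    return re.sub(r"<[^>]+>", "", fragment)

plain = strip_markup(mail_html)
print(plain)
# What is actually stored in the text:
print(unicodedata.name(plain[-1]))
# What a font-independent smiley would be instead:
print(unicodedata.name("\u263a"))
```

The meaning lived in the font choice rather than in the character, so it vanishes as soon as the styling does.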
Unicode is full of whole blocks of weird sets of lines and borders. As far as I’m concerned adding the symbol just to make it easier to style links makes total sense.
Being pedantic, those box drawing characters are neither ASCII nor ANSI. They are part of the original IBM PC character sets. Windows refers to them as the OEM character sets.
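You can see this from Python's codecs. A minimal sketch, assuming `cp437` for the original IBM PC set and `cp1252` as the usual Windows "ANSI" code page; the helper `encodable` is mine.

```python
import unicodedata

# Three bytes from IBM PC code page 437: the top edge of a double-line box.
top_edge = b"\xc9\xcd\xbb".decode("cp437")

def encodable(text, codec):
    """True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for ch in top_edge:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Present in the OEM code page, absent from ASCII and from Windows-1252.
print(encodable(top_edge, "cp437"), encodable(top_edge, "ascii"), encodable(top_edge, "cp1252"))
```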
> My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.
I accept that heuristic, and find myself agreeing with it for the purposes of the external link character. Let me try to convince you:
This is a real conversation:
A> Are you familiar with Homebrew? You can get it from brew.sh.
B> Ok. Where is brew.sh?
If we were not limited by plain text, we could use colour and underlining to distinguish that as a domain name. A lot of websites find this poor for accessibility reasons so they use the link character.
Another way to think about this is to imagine the link-character is pronounced "https://" as in:
* Are you familiar with Homebrew? You can get it from https://brew.sh.
* Are you familiar with Homebrew? You can get it from {link character} brew.sh.
Isn't that plain text? I'm not sure how to pronounce it, but there's a lot of emoji I don't know how to pronounce. Do you think this is important?
So if you get that far, how much further is this really?
* Are you familiar with Homebrew? You can get it from brew.sh {link character}.
There is no squiggle that I can not hypothesize some way of using in a text message. The question is more, is there already some existing demand for it, before you asked the question? To which the answer would seem to be "no".
That's a contingent answer. If you, say, set out to try to convince the world to use it that way, and created that demand, then by golly, that demand would exist at that point and the answer could change. But hypothesizing some possible text message that could use it isn't strong enough to argue for inclusion in the standard because every possible proposal passes that test.
That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.
Phones already do it for .com, .gov, etc, but our cutesy .sh, .dev, .rocks, .xyz, etc will take a bit to catch up.
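A rough Python sketch of that kind of allowlist-driven linkifier. Everything here is hypothetical: the TLD sets, the regex, and the Markdown-style output are stand-ins, not how any particular phone or messaging app does it.

```python
import re

# Hypothetical allowlists; a real renderer would carry a far longer list.
OLD_TLDS = {"com", "gov", "org", "net"}
NEW_TLDS = OLD_TLDS | {"sh", "dev", "rocks", "xyz"}

# Bare domain: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)

def linkify(text, tlds):
    """Wrap bare domains whose TLD is in `tlds` in a Markdown-style link."""
    def repl(match):
        domain, tld = match.group(1), match.group(2).lower()
        if tld in tlds:
            return f"[{domain}](https://{domain})"
        return match.group(0)
    return DOMAIN_RE.sub(repl, text)

message = "You can get it from brew.sh."
print(linkify(message, OLD_TLDS))  # left alone: "sh" is not on the old list
print(linkify(message, NEW_TLDS))  # recognized once the list is updated
```

Note this only changes how the receiving side renders the message; the text that was sent stays the same.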
> That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.
The request is for a code point to represent the visible external link character, not for an invisible control-code to decorate some structured data (which cannot "appear" on paper).
The use of symbols in text messages is one reason to include them in Unicode, but not the only one. Another is accessibility, particularly for blind people. The more commonly used symbols are in Unicode, the less we have to remind website authors to include alt text, or that the symbols in their custom icon font pose an accessibility problem. Also, co-opting an existing symbol, such as the degrees symbol (which I've observed some websites do for external links), is confusing when using a screen reader.
Wouldn't a screen-reader recognize a link even without the symbol? I can see the problem of the screen-reader telling the user that there's some garbage image at the end of the link, but the image really is garbage in that case, since I imagine the screen-reader would make it obvious that there's a link there.
Absolutely. But if a web author quite reasonably adds a symbol for the benefit of the sighted majority, using an unlabeled image, an icon font, or coopting an existing symbol (e.g. degrees), that leads to extra clutter or confusion for blind users. If the symbol is in Unicode, we can have a solution that works well for everyone.
Right. And I've come across websites where the alt text for their external-link image is something fairly verbose like "external link opens in new window". Now, imagine if instead, we had a standard symbol, and screen readers could map it to short sound effect that users would come to recognize.
> My heuristic is something I'll call SMS sniff test...
That's certainly a useful sniff test, and I think you've made a good case with it. I wonder, though, whether one should also consider a "documentation sniff test", asking: would it make sense to type that symbol into some documentation one is writing about how to use, for example, some software? I think the answer to that question is a clear yes: it might make sense to put this symbol onto such a page, and although on inert paper it wouldn't be an active element, it would certainly be a useful symbol in that hypothetical text.
In an RTL language, you don't even need to mirror it.
Or, to put it another way: if you think it needs to point to the right, is that because the text you read flows that way?
This then makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring.
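On the mirroring question, Unicode has a per-character Bidi_Mirrored property, which Python exposes as `unicodedata.mirrored`. A minimal sketch follows; U+2197 NORTH EAST ARROW is only a stand-in here, since the external link symbol itself has no code point.

```python
import unicodedata

def is_bidi_mirrored(ch):
    """True if renderers should mirror this character in right-to-left text."""
    return bool(unicodedata.mirrored(ch))

for ch in ["(", "\u2197"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_bidi_mirrored(ch))
```

Parentheses flip automatically in RTL runs, but arrows do not carry the property, so an arrow-based symbol would keep pointing the same way unless the author picks a different glyph.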
> makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring
I'm not a regular user of SMS, but it seems to me like adding links to SMS messages is either already available or not a bad idea. The "SMS sniff test" seems pretty weak to me. A better one would be a printed document.
You can put a URL into an SMS message, but you can't put an anchor tag (link) which is comprised of [a hidden URL and visible link text]. The latter is the only one of these that benefits from the symbol in question, because the external nature of the hidden URL is disguised. The former doesn't benefit because the external nature of the URL is obvious.
Modern SMS readers will linkify the URL, but will set the visible text equal to the URL, so again it wouldn't benefit from the symbol.
This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text; the symbol can be an image instead. If you have anchor tags, you probably also have image tags or CSS.
> This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text;
No, that's not the rejection rationale. The rejection rationale isn't about having access to image tags or CSS. It's about hypertext (click link, go to another text) vs having text (can't click link, no other text). If that was the rationale, they wouldn't allow emojis in unicode, because you could insert them as images.
Sorry, let me rephrase in the way that I did in another comment: it seems that to be a codepoint, it needs to be useful in plain text scenarios.
Emoji are useful (arguably I guess) in plain text scenarios because they exist to convey additional information about an author's emotion, and that author might use plain text. The external link symbol is not useful in plain text scenarios because it exists to convey additional information about an author's preceding hypertext, and that author isn't using hypertext.
I thought about SMS messages because they're plaintext. Printed documents are raster graphics; once the ink hits the paper, you don't have text anymore.
Unicode has a lot of old "markup" symbols in typography, like the right pointy finger ([1], example in [2]). I see the external link symbol as a widely-used markup symbol of the current age, and it could reasonably be in Unicode.
It serves the same function as footnote daggers. The Unicode consortium has odd priorities when they will happily add hundreds of emojis but more generally useful things are held in disregard.
On the other hand, I used to use both "play" and "link" symbols in past resumes, where I could have really used a proper link symbol. Ideally I wanted something universally known, but the box with an arrow that Wikipedia used wasn't as well known then, and was always a small image and not available as a character. I opted for the interlinked chain links symbol instead.
I would propose updating this SMS sniff test to a Wikipedia sniff test.
FWIW, the interlinked chain links symbol is in Unicode, and it has been the widely used symbol for a hyperlink. The box-with-arrow one is a symbol for an external link, which is a concept that matters only in a few contexts, and which was usually displayed with a globe icon (also in Unicode).
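For reference, a quick Python lookup of those two. I am assuming the chain links symbol meant here is U+1F517 and the globe is U+1F310; other globe variants exist.

```python
import unicodedata

# Assumed code points for the chain-link and globe symbols mentioned above.
symbols = {"chain links": "\U0001F517", "globe": "\U0001F310"}

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for label, ch in symbols.items():
    print(label, "->", describe(ch))
```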
I explicitly don't want it to be "Wikipedia sniff test", because that's equivalent to a "Webpage sniff test", which is equivalent to "anything goes", because icon fonts exist now and are used.
That said, other comments made me realize that Unicode is better described as "whatever was there in all the codepages around the world at the time of bootstrapping the Unicode standard" plus what's typically used in a written language, plus SMS sniff test.
I don’t think it does. On HN you can’t have links with custom text so an external link symbol would be pointless but seeing a dagger or [1] is reasonable. Similarly I don’t think seeing it in an SMS is unimaginable so by the standard of the parent comment I don’t think a dagger is the same as an external link symbol.
Plain text environments never encounter the problem that this symbol solves, which is to reveal that an anchor tag (which is not plain text, and typically has visible text that isn't the URL) leads to some other authority.
But for a text mode interface that does have full blown anchor tags -- not very mainstream -- you have a point.
In Reddit, you [link via Markdown](https://en.wikipedia.org/wiki/Markdown). The site renders this its own way, so it can add a "external link" character if needed - and that character is a part of the user interface; a side channel, not a main signal. E.g. copying from a Reddit post, you wouldn't want to find that character in the pasted text.
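A sketch of what "part of the user interface, not the text" could look like. This is a toy renderer I made up, not Reddit's actual code: the regex, the `external` class name, and the host value are all assumptions.

```python
import html
import re
from urllib.parse import urlparse

# Matches [text](url) with no nesting; enough for an illustration.
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

def render_links(markdown, site_host):
    """Render Markdown links to HTML, flagging off-site ones with a class."""
    def repl(match):
        text, url = match.group(1), match.group(2)
        # Relative URLs have no hostname and count as internal.
        external = urlparse(url).hostname not in (None, site_host)
        cls = ' class="external"' if external else ""
        return f'<a href="{html.escape(url)}"{cls}>{html.escape(text)}</a>'
    return MD_LINK.sub(repl, markdown)

post = "you [link via Markdown](https://en.wikipedia.org/wiki/Markdown)"
print(render_links(post, "www.reddit.com"))
```

The marker itself would then come from a stylesheet rule on that class, so it never enters the text content and copying the post gives you just the words.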
> It is unclear that the entity in question is actually an element of plain text, given the inevitable connection to its function in linking to other documents, and thus its coexistence with markup for links. Furthermore, the existing widespread practice of representing this sign on web pages using images (often specified via CSS styles) would be unlikely to benefit from attempting to encode a character for this image.
I don't really have a horse in this race, but... isn't the existing widespread practice of representing the sign using images attributable to the fact there is not a character available? What else do they want people to do? The point of adding the character is so that in the future people can use text instead of an image.
I think the point is that, unlike emoji, which some vendors had already integrated into end products, forcing the Unicode committee to integrate them because of the consortium's goal of backward compatibility, this icon isn't present in any vendor charset out there.
I think an issue with the arguments here is that each side is making its arguments based around quite different standards. Unicode has two standards for inclusion of symbols, one for adding new symbols and one for merging in characters from other character sets.
The rejection is based on the first standard for adding new characters.
The arguments against it seem to be based on taking the second standard as precedent. But I think as far as the Unicode committees are concerned, arguments based on the precedent of whatever random crap was grandfathered in from preexisting character sets do not apply to characters that are not from preexisting character sets. I think any argument that relies on “you already allow all this other random crap” must also argue that this symbol exists in some other character set which ought to be merged into Unicode, or that the standard for new characters should be more precedent-based/different, but I don’t see any such arguments other than some implied “common sense guess as to how one expects Unicode to work”
So the smartest way to go about this is to create a whole new character set for external link symbols? That seems like a really roundabout way to get this thing accepted.
One has to make a better argument for the external link symbol getting a codepoint assignment. TFA, for example, makes an argument based on emojis -- certainly that's strong enough to blunt the UTC's rejection rationale, but perhaps not enough to win approval outright.
Addressing all the arguments used in the rejection is important, of course. The fact that currently images are used is hardly dispositive: that's business as usual for missing Unicode assignments!!
But there are probably stronger arguments for rejection than the ones the UTC made, and it could make them the next time this comes up.
The best argument for rejection that I can think of has to do with layering. An external link character isn't very useful without the actual external link, but the link belongs a layer up: not in the text, but in the markup. Well, if the link belongs a layer up, why not also the symbol? Alternatively, more markup can move into Unicode. There has been and will continue to be some pressure to move more semantics from markup to text, but it's probably best to resist that pressure.
On the other hand, a solid argument for adoption may involve text rendering of HTML. Think of lynx/elinks and other such browsers, which can't use images. An external link character could prove useful in distinguishing the rendering of linked text from, say, underlined non-linked text.
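To illustrate the text-browser case, here is a toy HTML-to-text renderer in Python. It is nothing like how lynx or elinks actually work, and U+2197 NORTH EAST ARROW is only a stand-in marker, since no dedicated code point exists.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

MARKER = "\u2197"  # stand-in glyph (assumption); there is no real external-link character

class TextRenderer(HTMLParser):
    """Flatten HTML to text, bracketing links and marking off-site ones."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.out = []
        self._external = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            host = urlparse(dict(attrs).get("href") or "").hostname
            self._external = host not in (None, self.site_host)
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.out.append("]" + (MARKER if self._external else ""))
            self._external = False

    def handle_data(self, data):
        self.out.append(data)

def render(html_text, site_host):
    renderer = TextRenderer(site_host)
    renderer.feed(html_text)
    renderer.close()
    return "".join(renderer.out)

page = '<u>not a link</u> and <a href="https://example.org/x">a link</a>'
rendered = render(page, "example.com")
print(rendered.encode("unicode_escape").decode("ascii"))
```

With no images available, a character is the only way such a renderer can tell the reader that the bracketed text leads off-site.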
I'm surprised these arguments didn't come up. Or maybe they did -- I've not gone down the rabbit hole on this one, and probably I won't.
I think that most people find this hard to digest. The committee approves a ton of emojis with a dozen variations each, a gazillion characters that make it possible to make some weird word soup that breaks sane layout, but including one of the most commonly used symbols is somehow out of the question.
Because all of those symbols are used as part of text, while the external link symbol isn't. It is not part of the text itself. If you copied that text, you would not really expect the link symbol to be copied along with it.
I can imagine it showing up in an instructional book or article about how to indicate to your web site visitors that a hyperlink is to an external site, or perhaps in a comment about the Unicode standard body rejecting it. Neither of those are any more contrived than play/pause buttons being in an instructional manual about a television remote that contains those symbols.
I've seen a right-pointing magnifying glass printed in a book, and a left-pointing one would be equivalent for Arabic or Hebrew. I'd expect to see it in a school science book, for example.
And that's the real result of this, people will just settle on a collective commandeering of the closest symbol, making the decision irrelevant and retrospectively worse than pointless.
Annoy everyone enough, and we can redefine the Unicode codepoints as rationals and hand the denominator to another group.
could be because emoji are in the 'Miscellaneous Symbols and Pictographs' Unicode block, but the hourglass symbol is in 'Miscellaneous Technical', so it doesn't count as an emoji according to HN's filter
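If the filter really does work by block, it might look something like this Python sketch. This is pure guesswork about HN's behavior: the `FILTERED` policy and both helpers are hypothetical, and only the two block ranges come from the Unicode block list.

```python
# Block ranges as published in the Unicode block list.
BLOCKS = {
    "Miscellaneous Technical": (0x2300, 0x23FF),
    "Miscellaneous Symbols and Pictographs": (0x1F300, 0x1F5FF),
}

# Hypothetical policy: strip only the pictograph block.
FILTERED = {"Miscellaneous Symbols and Pictographs"}

def block_of(ch):
    """Return the name of the known block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

def strip_filtered(text):
    """Drop characters that fall in a filtered block."""
    return "".join(c for c in text if block_of(c) not in FILTERED)

hourglass, poo = "\u231b", "\U0001F4A9"
print(block_of(hourglass))
print(block_of(poo))
print(len(strip_filtered("a" + hourglass + poo + "b")))
```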
Alternatively, for a single-character symbol, there's either ⤤ (2924 north east arrow with hook) or ⮳ (2bb3 ribbon arrow up right). The combining square isn't exactly centered, depending on where I paste it.
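For anyone comparing the candidates, a small Python lookup. I am assuming the "combining square" above means U+20DE applied to a plain north east arrow (U+2197); the labels and the `describe` helper are mine.

```python
import unicodedata

candidates = {
    "arrow with hook": "\u2924",
    "ribbon arrow": "\u2bb3",
    "arrow + combining square": "\u2197\u20de",  # assumed sequence
}

def describe(seq):
    """Describe each code point in a sequence as 'U+XXXX NAME'."""
    return " + ".join(
        f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed in this Python build>')}"
        for c in seq
    )

for label, seq in candidates.items():
    print(f"{label}: {describe(seq)}")
```

The combining version is two code points, with the square drawn around the preceding character by the font, which is probably why its centering varies depending on where it is pasted.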
The closest alternative I’ve seen is definitely north east arrow with hook. Personally, I like its simpler appearance. Ideally you would still want to label it on hover with some alt or title attribute text.
I would expect the north east arrow with hook to be used for "go back up", e.g. after having followed a footnote. The hook implies an element of "returning" for me.
I’ve seen it used there too. That’s probably a better use for it. Personally I’ve never had to call out external links before, and if I did, I think including or referencing the domain the link is on is more useful than a generic symbol that might also mean “opens in new window or tab” sometimes.
> Furthermore, the existing widespread practice of representing this sign on web pages using images (often specified via CSS styles) would be unlikely to benefit from attempting to encode a character for this image.
Obviously enough text-mode browsers would be likely to benefit.
This made my head spin. It's like saying people already using the JIS text encoding would be unlikely to benefit from adding Japanese to Unicode. Absolutely mind blowing.
No, it is saying that Unicode is used to encode what people think of as plain text, and that UI symbols that are not part of the text content are outside the scope of it.
I agree that text mode browsers would benefit, as they can hide the external nature of a URL behind non-URL link text, and not allow an image next to it.
But it seems that to be a codepoint, it needs to be useful in plain text scenarios. Text mode browsers which render links as discussed are not plain text.
> It is unclear that the entity in question is actually an element of plain text...
As the author pointed out, emojis seem to clearly violate this excuse. What possible justification do they give for emojis? There is no way [I originally inserted the poop emoji here, but it was stripped out. That kind of reinforces my point.] is an element of plain text.
The pile of poop would not ever get accepted as a character in unicode on its own. It's there because of including an existing character encoding set (IIRC from Japanese featurephone messaging standards), which had a pile of poop character and other emoji, so for purposes of compatibility all the characters of that set must get Unicode mappings.
So there's a situation of dual standards - all the weird characters that were included in any pre-Unicode text encodings for whatever arbitrary reasons are in Unicode and are always going to be there; but all the new weird characters need appropriate justification for inclusion and are likely to be denied.
Why wouldn't it be part of plain text? People type it as part of their text messages constantly. They certainly use it as a part of plain text.
Nobody would type an external link symbol. It would be added by the UI presentation layer. It is not part of plain text, because it is not typed as part of textual content. But the poop emoji is.
Emojis are used in the same context as other text to communicate in the same fashion; they were being independently recreated in many technologies and there was direct value in standardising it across systems.
> The UTC rejected the proposals to add “external link sign”, most recently in L2/12-169. It is unclear that the entity in question is actually an element of plain text
Nor is Pile of Poo, and that's in Unicode.
If external Link was added to Unicode, I expect it would be more used than 1000s of characters that are in it.
I really really don't understand this point. We have evidence that people use pile of poo in plaintext today in instant messaging all the time. Isn't this enough evidence that pile of poo belongs to plaintext? I'm not trying to be facetious; it's really puzzling to me. Every single comment in this thread is about pile of poo, whereas it seems to me the worst possible example, since it's such a widely used emoji.
It's only possible to use it in a text message because it's a unicode character. If external link was a character, would people use it? I don't know. Do you?
Maybe, maybe not, we don't know. What I know is that even my mom and grandma use pile of poo on facebook. Will they use "external link" symbol? I would guess not, but maybe.
Exactly, my point is that pile of poo is such a bad example, since these days it's probably in the top 1% of most-used Unicode symbols in plaintext. It's similar to arguing that something like ∆ or é does not belong in Unicode.
I'd like the condition on the plain text to be relaxed. I would love to be able to use Unicode for creating basic interfaces. I would love to have basic interface elements such as a magnifying glass, "save" icons and external link to be part of the standard. Maybe they don't strictly find their use but for one people would find the use if they were there, and two, there are already (granted, grandfathered) elements that were used exactly for this purpose back in the era of plain text window interfaces.
I use the awesome versions of fonts in terminal for power line and stuff. But that's kind of the thing, it can't be ubiquitous and easily reused. If every system had its interface font it would make a lot of stuff easier. An example would be glyphs on buttons that are almost always the same.
For this stuff we have fontawesome and ttf generators from svg’s. Just grab svg icons on thenounproject and make your own font!
I happen to disagree with the unicode decision because there should be a section for basic commands so they can be rendered in diff fonts. But whatever — as I say don’t wait for standards if they don’t exist, make your own!
How many new characters are proposed each year? What’s the rejection rate? Maybe it’s a healthy process to reject when uncertain and then reconsider after popular appeals, especially if they are inundated with (mostly) questionable applications. It’s premature to judge the process without more context
After getting an emoji accepted I submitted a proposal for the external link symbol in 2018, trying to address the committee's concerns from the several earlier proposals. https://doubly.so/pub/External-Link-2018.pdf
It was summarily rejected with the nonsense statement (in full):
> Thank you for your submission. This was discussed during last week's UTC meeting. I was directed to let you know UTC feels that, as submitted, the proposal does not sufficiently demonstrate a plain text need for such a symbol. The context for usage is mark-up with links by default.
At this point I think a submission of 8 generic "arrow exiting square from top, top-right, etc." might have a better chance of being accepted as it holds a general meaning of exiting something and is not specific to hypertext as seems to be their objection.
> Its main rationale appears to be that the external link icon is not an element of plain text. I would agree that is the case. However I would like to point out that emoji and other similar useful symbols are not plain text either yet they have been accepted and continue to be accepted.
The author seems to be using a very limited definition of "plain text". Emoji are clearly used in the same way as latin-character text in communication, so they are effectively plain text.
I think a more effective argument may be that the external-link symbol should be allowed for the same reasons that the power on/off/toggle and eject symbols were allowed.
One of the most peculiar tourist attractions in the Wieliczka salt mine near Krakow is the signs with a shaft symbol that is not in Unicode. It looks like a # with a · in the middle and appears in texts instead of the word "shaft".
It is simply factually incorrect to say that a commonly used symbol would not benefit from being included in fonts. How can a committee working in this field possibly fail to understand that?
It is not incorrect, and they do understand but they don't care. They have esoteric philosophical justifications that exclude some commonly used symbols while inventing never before seen ones. This is the problem with putting a tiny number of otherwise powerless people in charge of things.
Maybe a few people on twitter? I've never seen it elsewhere and I barely use twitter so saying "everyone" is an exaggeration. Also, if everyone were annoyed by it, then it wouldn't be overused any more almost by definition.
Similar to case law in America, the committee might be avoiding potential abuse in future requests that point back to the rules being seemingly modified for one specific symbol.
Well U+1F4A9 exists because Japanese mobile operators modified their char sets to include emoji, and emoji has been part of Japanese culture for a long time.
Unicode aims to encode every character used for human communication in every culture, and Japan uses U+1F4A9.
This kind of reminded me of something I was reading this morning:
http://seancubitt.blogspot.com/2020/04/allonomy-autonomy-and...
"a poem cannot 'contain in itself the reasons why it is so and not otherwise' (Coleridge) since it must be written on top of the infrastructure of a language and orthography that the poet rarely originates"
I'm actually more annoyed that they are there in the first place. Unicode can't seem to decide whether it's about just encoding text or also doing presentation/formatting (and now piles of poo and colorful emoji, what next, animations?), so now it's a bit of both, what a mess. It's making life hard and breaking things for applications that assume text is just text, and the presentational features are not enough to avoid having to implement your own presentational features in a program that needs to do presentation.
Now you have applications where you can't find a string because it was written in Unicode bold letters instead of the letters' normal ASCII counterparts. And then you have applications that are confused about those bold letters because they are not actually surrounded by bold markup.
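To make the search problem concrete, here is a minimal Python sketch. It assumes the "bold letters" are the Mathematical Alphanumeric Symbols (U+1D400 onward), and the helper name fold_styled is made up for illustration. NFKC normalization maps those letters back to their plain counterparts, so a search that normalizes first finds the string; it also folds other compatibility characters (ligatures, fullwidth forms), which may or may not be what an application wants.

```python
import unicodedata

def fold_styled(text: str) -> str:
    """Fold compatibility variants (e.g. mathematical bold letters) to plain forms via NFKC."""
    return unicodedata.normalize("NFKC", text)

# MATHEMATICAL BOLD SMALL B, O, L, D (assumed to be the "bold letters" in question)
styled = "\U0001D41B\U0001D428\U0001D425\U0001D41D"
haystack = f"this is {styled} text"

naive_hit = "bold" in haystack                 # plain substring search misses it
folded_hit = "bold" in fold_styled(haystack)   # search after normalization finds it

print(naive_hit, folded_hit)
```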
The worst part is that you can't criticize unicode without attracting a crowd of bullies who handwave about human languages being complicated (no, that does not justify poor design) or say it has to be this way because ugh shift-jis (or whatever nasty old encoding you can come up with) is not nice.
Well designed technology makes complex things simple. People defending poorly designed technology blame the problem for being complex.
You're so right... Unicode is making things too complex, and intruding on various formatting issues. Case in point: The skin tone modifiers. Or this stuff: https://en.wikipedia.org/wiki/Zero-width_joiner
There doesn't exist any parser that does it 100% correctly. And parsing it is becoming so complex that it's causing bugs and vulnerabilities (it's not a coincidence that so many remote exploits use some kind of unicode to trigger it).
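For anyone who hasn't looked at what these sequences are made of, a small Python sketch (the describe helper is hypothetical, and only the standard unicodedata module is used). One visible "family" emoji is five code points glued together with zero-width joiners, and a skin-toned thumbs-up is a base emoji followed by a modifier, so code that counts or splits by code point sees something very different from what the user sees.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """List each code point in text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# MAN + ZWJ + WOMAN + ZWJ + GIRL: typically rendered as one family glyph
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# THUMBS UP SIGN + a Fitzpatrick skin tone modifier: typically rendered as one glyph
thumbs = "\U0001F44D\U0001F3FD"

for sample in (family, thumbs):
    print(len(sample), "code points:")
    for line in describe(sample):
        print("  ", line)
```

Splitting such a string on code point boundaries, or truncating it at a fixed length, can cut a sequence in half, which is one way the parsing complexity turns into bugs.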
I think from the consortium's point of view it boils down to whether it makes sense to consider the link symbol separately from the text that makes up the link. They apparently decided it did not, but you could disagree. Of course, it does not really matter if you disagree, because they won't change their mind (unless you repropose it in a way that makes a different argument from the original proposal, which their rejection is basically saying could be considered).
I want a tool that scans English 🗣 text for names of unicode symbols. Then I want the English language to move to that symbolic writing. You know, because it is fun to see cultures evolve.
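A rough Python sketch of such a tool, under some loud assumptions: it only matches single words that happen to be an exact Unicode character name, the function name symbolize is made up, and it keeps only characters in a symbol category so that words matching control-code aliases are left alone. Multi-word names are out of scope here.

```python
import re
import unicodedata

def symbolize(text: str) -> str:
    """Replace single words that exactly name a Unicode symbol with that symbol."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        try:
            ch = unicodedata.lookup(word.upper())
        except KeyError:
            return word  # not a character name, keep the word
        # lookup may also resolve aliases; keep only single characters
        # whose general category is a symbol (Sm, Sc, Sk, So)
        if len(ch) == 1 and unicodedata.category(ch).startswith("S"):
            return ch
        return word
    return re.sub(r"[A-Za-z]+", swap, text)

print(symbolize("look, a comet over the snowman"))
```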
Didn't stop them from making Unicode an overcomplicated clusterfuck of combination codes though... Referring to things like this: https://en.wikipedia.org/wiki/Zero-width_joiner . It's become almost impossible to build an accurate parser now.
I see it often in intranets, wikis, and PDF documentation. It's an additional context clue that you're leaving a closed website. The most egregious examples give you a separate click-through screen when leaving the website. Government web pages seem to do this the most.
It would be nice if we could all use a standard icon or some other constant UI element--like a single underline for internal references and double underline for external? I imagine unique colors would be too difficult to standardize.
Specifically in those circumstances it's more important than normal; intranet, government, and PDF documents.
Intranet: bespoke documentation for internal processes versus generic documents used for reference.
Government: external references can be hijacked (asking for personal information) or may not represent the government but still have relevant info.
PDF documents: jumping around a PDF document (from a table of contents) is different than going to an external website. Especially if you don't have Internet access at that moment.
I think all of this is significantly more important since browsers have been hiding more URLs.
Sure. Why not? If you're browsing a hotel you plan to stay at, you may want to spend some time learning about it before booking. Accidentally clicking an external link takes you away from that.
Techies will know to just open it in a new tab, but most will have to remember to browse back to the site manually. It can be a disruptive process.
If you print a web page the links are still there. The icon tells you "this here was a link" and you can go to the computer and look it up. If the link wasn't underlined and had a different colour but you printed in b&w, the icon might be the only way to know that it was a link, so I would argue that the icon is actually more useful on paper than on the live web.
Even assuming that symbols you click on can't be in unicode, you don't click on the external link symbol, you click on the text of the link, and that text will be followed by the symbol. Even if you include the symbol in the link text, it's still part of the text both logically and in practice.
> The 'symbol fallacy’ is to confuse the fact that "symbols have semantic content" with "in text, it is customary to use the symbol directly for communication". These are two different concepts. An example is traffic signs and the communication of traffic engineers about traffic signs. In their (hand-)written communication the engineers are much more likely to use the words "stop sign" when referring to a stop sign, than to draw the image. Mathematicians are more likely to draw an integral sign and its limits and integrands than to write an equation in words.
So where "stop sign" is in Unicode, it's a bit nuanced as per the manner.
"External link" symbol tells you, "that last bit of differently-formatted text is an active element in this application, leading to an outside resource". It's not something that makes sense in plain text, because any external link in a plain text message is both visible and obviously a link.
Elsewhere in the thread someone mentioned the play/pause/stop symbols from cassette recorders/VCRs. But those have been used culturally as symbols denoting starting, pausing and stopping for decades now, so they're an idea communication tool that makes sense in plain text, and thus pass the SMS sniff test.
(Note that I'm not sure if all symbols in Unicode actually pass the SMS sniff test. I suppose the best option for those wanting "external link" symbol to be included would be to resubmit it as the DOUBLE ARROW POINTING TOP RIGHT OUT OF SQUARE symbol, or something like that.)