Why isn't the external link symbol in Unicode? (2018) (dafoster.net)
167 points by networked on April 30, 2020 | 197 comments



I find myself agreeing with the rejection. As much as I dislike the ridiculous amount of emojis in Unicode, and their increasingly widespread use, they do fit plain text communication, whereas an "external link" symbol does not. My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.

"External link" symbol tells you, "that last bit of differently-formatted text is an active element in this application, leading to an outside resource". It's not something that makes sense in plain text, because any external link in a plain text message is both visible and obviously a link.

Elsewhere in the thread someone mentioned the play/pause/stop symbols from cassette recorders/VCRs. But those have been used culturally as symbols denoting starting, pausing and stopping for decades now, so they're an idea communication tool that makes sense in plain text, and thus pass the SMS sniff test.

(Note that I'm not sure if all symbols in Unicode actually pass the SMS sniff test. I suppose the best option for those wanting "external link" symbol to be included would be to resubmit it as the DOUBLE ARROW POINTING TOP RIGHT OUT OF SQUARE symbol, or something like that.)


I don't really buy this. Google image search "external link symbol", the symbol is just generically useful.

I can buy that the name in unicode should be something other than "external link". It could be something more generic like "external reference". Boom, now it can be accepted because it's not tied to <a>.

The problem with the rejection is that "that last bit of differently-formatted text is an active element in this application, leading to an outside resource" is kind of a strawman: why be so specific just to reject it because you've decided to be so specific?

Though all of that justification is kind of unnecessary. Really, I just find it ridiculous that the unicode group is going to suddenly become prude about a ubiquitous symbol yet have 20 different check mark symbols. Sure, if everyone made the "but there are 20 different check mark symbols so why can't I have <pet symbol>?" argument, unicode would be even more ridiculous. And that's an argument to reject <pet symbol>, not one of the most ubiquitous symbols on the internet.

"Unicode has a bunch of stupid symbols, so it should at least add the most useful symbols that we use on the internet" really is a good argument.


This made me stop and think because I've never really wondered about what belongs and doesn't belong in Unicode, and then I went looking at what strange corners Unicode has and wow the "Miscellaneous Technical" code block is such a strange thing.

* It has APL symbols (checking the Wiki article on APL syntax and the APL standard, it seems that APL programs can be represented entirely in Unicode - which makes sense, but is still a little surprising).

* There's a benzene code point: ⌬

* "ERASE TO THE LEFT", better known as backspace, is a thing: ⌫

...and a little more bizarro: https://en.wikipedia.org/wiki/Miscellaneous_Technical
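The code points listed above can be checked against the copy of the Unicode Character Database that ships with Python. This is a minimal sketch, assuming the standard `unicodedata` module; the block range U+2300 to U+23FF is hard-coded because the module has no block lookup, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Miscellaneous Technical occupies U+2300..U+23FF (hard-coded; the
# standard library does not expose block names).
MISC_TECHNICAL = range(0x2300, 0x2400)

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The benzene and backspace symbols mentioned in the list above.
for ch in "\u232c\u232b":
    in_block = ord(ch) in MISC_TECHNICAL
    print(describe(ch), "| in Miscellaneous Technical:", in_block)
```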


Unicode was designed to unify all existing character sets in use at the time. So there are lots of weird historical things which wouldn't necessarily be accepted in Unicode as new proposals.

If the "external link" symbol had occurred in some legacy character set, then it would have been included automatically.

But since it is a new proposal, it will be validated according to Unicode's policy for new symbols.


> what belongs and doesn't belong in Unicode

That's simple. The justification for most characters, including the exemplary one you mentioned, is that they were already encoded in some other repertoire. The consortium's task is to collect and unify them.


In particular there was a desire to make sure that a round trip from some legacy character set to Unicode and back to the legacy character set wouldn't lose information. So many characters that you would assume to be the same are given multiple code points to make that possible.
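The round-trip requirement can be illustrated with Shift-JIS, which is only one example of a legacy character set and is chosen here as an assumption. It stores a one-byte "A" and a two-byte fullwidth "A" as different characters, so Unicode keeps two code points for them. The helper name `round_trips` is hypothetical.

```python
import unicodedata

halfwidth = "A"          # U+0041 LATIN CAPITAL LETTER A
fullwidth = "\uff21"     # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A

def round_trips(text: str, codec: str) -> bool:
    """True if text -> legacy bytes -> text is lossless."""
    return text.encode(codec).decode(codec) == text

for ch in (halfwidth, fullwidth):
    # Each form maps to its own byte sequence and comes back unchanged.
    print(unicodedata.name(ch), ch.encode("shift_jis"), round_trips(ch, "shift_jis"))

# Compatibility normalization (NFKC) folds the fullwidth form onto the
# ASCII one, which shows the two are "the same" letter in the abstract.
print(unicodedata.normalize("NFKC", fullwidth) == halfwidth)
```

If Unicode had unified the two forms, decoding and re-encoding a Shift-JIS file would have silently changed its bytes.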



APL is a means of communicating mathematical ideas between people, so it shouldn't be too surprising that it's in Unicode.


The backspace symbol makes perfect sense. You may very well want to say "Press ⌫ to delete the selected object".


But you may also say "Next to the link you will see [EXTERNAL LINK] which indicates that the link leads to another website."


I guess that is a good point, but I guess that the backspace symbol gets a bit of a free ride by literally being on your keyboard.


That reasoning would make every conceivable icon a valid target for inclusion.

To actually get the character included, you can't just reason that it might be used in a certain way, you will have to demonstrate it with actual and authentic use in the wild. The accepted proposal for the inclusion of the power symbol is a nice example of how this works.

If you can find a bunch of printed manuals or books or websites that actually use this icon in this manner you might be able to submit a successful proposal.


The web is full of those symbols, so I'm sure there are books (or online resources) explaining them. And if not, someone should do something about that.


I'm not opposed to the backspace symbol, but by that logic you could include anything in Unicode, as anything could be a symbol you'd like to refer to.

I mean, you might want to say "External links can be identified by their □ marker" or "The Windows key is marked □ and can be found to the left of the space bar"


Unicode is intended to replace all previous codes for text, so from the outset and almost by definition it has sought to identify things like APL and encode those because if it didn't you still need an encoding for your old APL programs.


I made this thing once upon a time to browse all the symbols http://tingletech.github.io/unicodetoy/


that page lists the NEXT PAGE symbol ⎘ and I wonder if that couldn't do as an "external link" symbol in a pinch, seeing as I've never seen it used as a next page symbol before.


Let me pass a few things by your sniffer, here.

Here's some random characters from the "Math Symbols" page of OSX's character viewer.

⨐⨔⊹⊰⨶⨹⩫⫸⧯⧼

Did you know that Unicode has an entire set of dominos in it?

🀱🀲🀳🀴🀵🀶🀷🀸🀹🀺🀻🀼🀽🀾🀿🁀🁁🁂🁃🁄🁅🁆🁇🁈🁉🁊🁋🁌🁍🁎🁏🁐🁑🁒🁓🁔🁕🁖🁗🁘🁙🁚🁛🁜🁝🁞🁟🁠🁡

Twice: once horizontally, once vertically.

🁣🁤🁥🁦🁧🁨🁩🁪🁫🁬🁭🁮🁰🁱🁲🁳🁴🁵🁶🁷🁸🁹🁺🁻🁼🁽🁾🁿🂀🂁🂂🂃🂄🂅🂆🂇🂈🂉🂊🂋🂌🂍🂎🂏🂐🂑🂒🂓

I know I sure use Ogham runes in my daily text communications on a constant basis.

ᚃᚄᚓᚈᚚᚙ᚛ᚘ

And Egyptian hieroglyphs.

𓁑𓀳𓂩𓃋𓇀𓈣𓇻𓇼𓇽𓀫

Oh hey and here's a "next page" and "previous page" symbol that "external link" could comfortably sit next to.

⎘⎗

And some miscellaneous arrows.

⤥↪︎↴↶↺⥀⟳⤩⤭⤱⥹⥰⥼⥺⥽⥬⇆⇼⇟⥣↬⇴


Oh also I just wanna note that if we go by a previous convention, there is a perfectly good symbol for "this is an external link": back in the days when Geocities was relevant, it was pretty common to see external links marked by a little Earth gif after them, because it was a link to somewhere else on the World! Wide! Web.

If your browser feels like displaying Wingdings codepoints, something like this: http://egypt.urnash.com


I really like your website btw. Very beautiful.


Thanks!


Scripts like Egyptian Hieroglyphs are exactly the purpose of Unicode. But a bunch of other symbols are mostly there because they come from some preexisting character set which Unicode had to include in order to be a viable replacement for legacy character sets.


Yeah a lot of arguments are of the form "when Unicode can include the pile-of-poop-emoji, surely it should include <my favorite symbol>".

But the question is not if the icon is practical or silly, the question is if it makes sense as a Unicode character or should be handled at some other level like a GUI control or widget.

Should GUI icons, widgets and controls be covered by Unicode? That is a can of worms.

I understand the usefulness of the "external link" icon, but should it really be something you arbitrarily type into a text? Shouldn't it be part of the rendering of a link?


As for the pile-of-poop emoji there is a very good reason for it to be included in Unicode: it is one of the original emoji, made by Softbank in 1997.

The reason: Softbank and Docomo both used proprietary emoji sets, widely used in Japan for text messages. Not including these in Unicode would have resulted in a choice: be compatible with the rest of the world, or keep using the emojis that became part of their culture. Not satisfactory. So basically, they had to put emoji in Unicode for the Japanese to use it, and excluding Japan is not really an option if you want a universal standard.

It did open a Pandora's box though.


SoftBank actually put emoji into the Private Use Area of Unicode before they were codified (just as they were in an unused area of Shift-JIS before that). The original iPhone 3GS used the PUA method.

The encoding of emoji was in order to unify Softbank/docomo/au under one set so that iPhones/Androids sold by the different carriers could send emoji among each other without relying on email translators (as had been the technical solution with feature phones until then)
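The difference between a Private Use Area code point and a standardized one is visible in the character database. This is a small sketch using Python's `unicodedata`; U+E000 is simply the first BMP private-use code point and is not claimed to be one that any carrier actually used, and `is_private_use` is a hypothetical helper.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

pua_char = "\ue000"         # first code point of the BMP Private Use Area
standard = "\U0001F4A9"     # PILE OF POO, a standardized emoji

print(is_private_use(pua_char))
print(is_private_use(standard))
# Private-use code points have no standard name, so a default is supplied.
print(unicodedata.name(pua_char, "<no standard name>"))
print(unicodedata.name(standard))
```

A private-use code point means whatever sender and receiver agree on, which is why the carriers needed translation between each other until the emoji were given shared code points.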


It made total sense to include emojis in Unicode initially.

And a great idea to put them outside the 16-bit plane, so platforms would be forced to support characters beyond the BMP because of the public demand for emojis.

But I think going forward, emojis should be handled outside of Unicode. They are really small illustrations rather than characters or symbols.
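The remark about emoji sitting outside the 16-bit plane can be made concrete by looking at UTF-16 code units. A minimal sketch, with `utf16_units` as a hypothetical helper name: a BMP character fits in one unit, while an emoji beyond U+FFFF needs a surrogate pair, which is what software that assumed "one character is 16 bits" had to learn to handle.

```python
from typing import List

def utf16_units(ch: str) -> List[int]:
    """Return the 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp_char = "\u232b"        # ERASE TO THE LEFT, inside the BMP
emoji = "\U0001F4A9"       # PILE OF POO, outside the BMP

for ch in (bmp_char, emoji):
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in units])
```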


>Should GUI icons, widgets and controls be covered by Unicode? That is a can of worms

They already got all the Wingdings symbols https://en.wikipedia.org/wiki/Wingdings


Yeah because Wingdings was a preexisting character set and Unicode was designed to be a superset of all existing character sets in use.


Thus the inherent contradiction: Unicode is for text, it must include all existing character sets which it aims to unify, and previous character sets weren’t only used for text.

The obvious hack is to make a custom character set called “Rejected by Unicode”.


The goal was to eliminate "we can't switch to Unicode, it doesn't have ® symbols" objections.


It would probably be easier just to design a png or svg icon for the UI widgets you need. Then you don't have to lobby a consortium and wait for them to accept your icon.


To offload the design to the font foundry is even easier :)


Unicode Russell's paradox


Isn’t Wingdings just a font that can work with whatever character set?


Yeah, it was basically a hack to extend the character repertoire in the time of 8-bit character sets. The kind of shenanigans Unicode was designed to replace.

You still see this kind of mischief when Outlook users type a smiley, and it is rendered as a "J" with the Wingdings font. Readers of the mail where styling is stripped, or who haven't the font installed, just see a confusing "J".


Unicode is full of whole blocks of weird sets of lines and borders. As far as I’m concerned adding the symbol just to make it easier to style links makes total sense.


Those "weird" characters have been used for ASCII and ANSI based text-mode user interfaces since the dawn of UI as a concept.


Being pedantic, those box drawing characters are neither ASCII nor ANSI. They are part of the original IBM PC character sets. Windows refers to them as the OEM character sets.
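The distinction can be checked with Python's codecs. This sketch assumes that "ANSI" here means the Windows-1252 code page (the usual Windows sense of the word) and uses code page 437 as the original IBM PC character set; `encodable` is a hypothetical helper.

```python
# Bytes as they would sit in a DOS text-mode screen buffer (code page 437).
frame = bytes([0xC9, 0xCD, 0xCD, 0xBB])
text = frame.decode("cp437")
print(text)  # the top edge of a double-line box

def encodable(s: str, codec: str) -> bool:
    """True if every character of s exists in the given legacy codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Only the IBM PC code page can represent the box drawing characters.
for codec in ("ascii", "cp1252", "cp437"):
    print(codec, encodable(text, codec))
```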


Unicode’s box drawing seems to go way beyond ANSI, though, and PETscii was only added very recently (IIRC).


> My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.

I accept that heuristic, and find myself agreeing with it for the purposes of the external link character. Let me try to convince you:

This is a real conversation:

A> Are you familiar with Homebrew? You can get it from brew.sh.

B> Ok. Where is brew.sh?

If we were not limited by plain text, we could use colour and underlining to distinguish that as a domain name. A lot of websites find this poor for accessibility reasons so they use the link character.

Another way to think about this is to imagine the link-character is pronounced "https://" as in:

* Are you familiar with Homebrew? You can get it from https://brew.sh.

* Are you familiar with Homebrew? You can get it from {link character} brew.sh.

Isn't that plain text? I'm not sure how to pronounce it, but there's a lot of emoji I don't know how to pronounce. Do you think this is important?

So if you get that far, how much further is this really?

* Are you familiar with Homebrew? You can get it from brew.sh {link character}.


Historically <> were used; what happened to that convention? See <http://example.com/> or email me at <root@localhost> for details.


I think it's been conflated in online messaging in the last decade or so as people were imitating markup tags to emote. Eg:

<sarcasm>Oh yeah, that's right</sarcasm>


There is no squiggle that I can not hypothesize some way of using in a text message. The question is more, is there already some existing demand for it, before you asked the question? To which the answer would seem to be "no".

That's a contingent answer. If you, say, set out to try to convince the world to use it that way, and created that demand, then by golly, that demand would exist at that point and the answer could change. But hypothesizing some possible text message that could use it isn't strong enough to argue for inclusion in the standard because every possible proposal passes that test.


That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.

Phones already do it for .com, .gov, etc, but our cutesy .sh, .dev, .rocks, .xyz, etc will take a bit to catch up.
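A rough sketch of what such a renderer-side list could look like. The two TLD sets and the function name `find_links` are hypothetical and deliberately tiny, and no real messaging app is claimed to work this way.

```python
import re

# Hypothetical allowlists; a real renderer would ship a much longer
# list and update it over time.
OLD_TLDS = {"com", "gov", "org", "net"}
NEW_TLDS = OLD_TLDS | {"sh", "dev", "rocks", "xyz"}

# A bare host name: dot-separated labels ending in a letters-only label.
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+([a-z]{2,})\b", re.IGNORECASE)

def find_links(text: str, tlds: set) -> list:
    """Return bare domains in text whose final label is a known TLD."""
    return [m.group(0) for m in HOST_RE.finditer(text)
            if m.group(1).lower() in tlds]

message = "You can get it from brew.sh."
print(find_links(message, OLD_TLDS))   # the older list misses the domain
print(find_links(message, NEW_TLDS))   # the updated list finds it
```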


> That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.

Negative. My renderer is a piece of paper.


If your renderer is a piece of paper, how is Unicode relevant at all? You’re free to freehand anything you’d like.


... Why would an invisible meta-character for links be necessary if the output is paper? You can already specify a link by prepending 'http://', etc.


Sorry. I thought that would make it clear.

The request is for a code point to represent the visible external link character, not for an invisible control-code to decorate some structured data (which cannot "appear" on paper).


Writing out the "https://" still seems preferable, as long as http continues to exist and bare URLs try it by default.

(The hsts preload list helps a bit with this, but only for domains registered with it.)


website, or site, are the words you are looking for.

Which also lack a decent Unicode symbol despite being a word used in plain text.


The use of symbols in text messages is one reason to include them in Unicode, but not the only one. Another is accessibility, particularly for blind people. The more commonly used symbols are in Unicode, the less we have to remind website authors to include alt text, or that the symbols in their custom icon font pose an accessibility problem. Also, co-opting an existing symbol, such as the degrees symbol (which I've observed some websites do for external links), is confusing when using a screen reader.


Wouldn't a screen-reader recognize a link even without the symbol? I can see the problem of the screen-reader telling the user that there's some garbage image at the end of the link, but the image really is garbage in that case, since I imagine the screen-reader would make it obvious that there's a link there.


Absolutely. But if a web author quite reasonably adds a symbol for the benefit of the sighted majority, using an unlabeled image, an icon font, or coopting an existing symbol (e.g. degrees), that leads to extra clutter or confusion for blind users. If the symbol is in Unicode, we can have a solution that works well for everyone.


This symbol is for marking external links, distinct from internal links.


Right. And I've come across websites where the alt text for their external-link image is something fairly verbose like "external link opens in new window". Now, imagine if instead, we had a standard symbol, and screen readers could map it to short sound effect that users would come to recognize.


> My heuristic is something I'll call SMS sniff test...

That's certainly a useful sniff test, and I think you've made a good case with it. I wonder, though, whether one should also consider a "documentation sniff test", asking: would it make sense to type that symbol into some documentation one is writing about how to use, for example, some software? I think the answer to that question is a clear yes: it might make sense to put this symbol onto such a page, and although on inert paper it wouldn't also be an active element, it would certainly be a useful symbol in that hypothetical text.


I find myself disagreeing with your agreement because of the examples of symbols that are clearly not text.

One of them, BROKEN CIRCLE WITH NORTHWEST ARROW, could actually be used for the external link symbol if it were mirrored!


In an RTL language, you don't even need to mirror it.

Or, to put it another way: if you think it needs to point to the right, is that because the text you read flows that way?

This then makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring.
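Unicode already records, per character, whether it should be flipped in right-to-left text, and Python exposes that as `unicodedata.mirrored`. The sketch below only shows the existing behaviour for a parenthesis and an ordinary arrow; how a hypothetical encoded external link symbol would be treated is an open question it does not answer.

```python
import unicodedata

samples = ("(", "\u2192")   # LEFT PARENTHESIS, RIGHTWARDS ARROW

for ch in samples:
    # mirrored() returns 1 if the character is flagged as mirrored in
    # bidirectional text, 0 otherwise.
    print(unicodedata.name(ch), unicodedata.mirrored(ch))
```

Parentheses are mirrored automatically, while arrows keep their direction, so an arrow-based symbol pointing the "wrong" way in RTL text would have to be handled by the author or the font.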


> makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring

The symbol appears mirrored in RTL languages on Wikipedia, e.g. this random Arabic article: https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%83%D8%B3%D9%86...


Or does it point to the right because a new tab will open, to the right of the current one?

Come to think of it, are tabs aligned toward the right side when the browser/OS is set to an RTL language?


I'm not a regular user of SMS, but it seems to me like adding links to SMS messages is either already available or not a bad idea. The "SMS sniff test" seems pretty weak to me. A better one would be a printed document.


You can put a URL into an SMS message, but you can't put an anchor tag (link) which is comprised of [a hidden URL and visible link text]. The latter is the only one of these that benefits from the symbol in question, because the external nature of the hidden URL is disguised. The former doesn't benefit because the external nature of the URL is obvious.

Modern SMS readers will linkify the URL, but will set the visible text equal to the URL, so again it wouldn't benefit from the symbol.

This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text; the symbol can be an image instead. If you have anchor tags, you probably also have image tags or CSS.
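The "external nature of the hidden URL" is something a renderer can compute from the anchor's target. A minimal sketch, assuming that "external" means "a different host than the current page"; `is_external` and the example page URL are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

def is_external(href: str, page_url: str) -> bool:
    """True if href, resolved against page_url, points at another host."""
    target = urlsplit(urljoin(page_url, href))
    current = urlsplit(page_url)
    return target.hostname != current.hostname

page = "https://example.com/articles/unicode"   # hypothetical page URL
print(is_external("/about", page))              # relative link, same host
print(is_external("https://brew.sh/", page))    # different host
```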


> This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text;

No, that's not the rejection rationale. The rejection rationale isn't about having access to image tags or CSS. It's about hypertext (click link, go to another text) vs having text (can't click link, no other text). If that was the rationale, they wouldn't allow emojis in unicode, because you could insert them as images.


Sorry, let me rephrase in the way that I did in another comment: it seems that to be a codepoint, it needs to be useful in plain text scenarios.

Emoji are useful (arguably I guess) in plain text scenarios because they exist to convey additional information about an author's emotion, and that author might use plain text. The external link symbol is not useful in plain text scenarios because it exists to convey additional information about an author's preceding hypertext, and that author isn't using hypertext.


I thought about SMS messages because they're plaintext. Printed documents are raster graphics; once the ink hits the paper, you don't have text anymore.


Unicode has a lot of old "markup" symbols in typography, like the right pointy finger ([1], example in [2]). I see the external link symbol as a widely-used markup symbol of the current age, and it could reasonably be in Unicode.

[1] https://emojipedia.org/white-right-pointing-index/

[2] https://i.ebayimg.com/images/g/t60AAOSwAPVZDOK3/s-l300.jpg


It serves the same function as footnote daggers. The Unicode consortium has odd priorities when they will happily add hundreds of emojis but more generally useful things are held in disregard.


On the other hand, I used to use both "play" and "link" symbols in past resumes, where I could have really used a proper link symbol. Ideally I wanted something universally known, but the box with an arrow that Wikipedia used wasn't as well known then, and was always a small image and not available as a character. I opted for the interlinked chain links symbol instead.

I would propose updating this SMS sniff test to a Wikipedia sniff test.


FWIW, the interlinked chain links symbol is in Unicode. And it's been the widely used symbol for hyperlinks. The box-with-arrow one is a symbol for an external link, which is a concept that matters only in a few contexts, and which was usually displayed with a globe icon (also in Unicode).

I explicitly don't want it to be "Wikipedia sniff test", because that's equivalent to a "Webpage sniff test", which is equivalent to "anything goes", because icon fonts exist now and are used.

That said, other comments made me realize that Unicode is better described as "whatever was there in all the codepages around the world at the time of bootstrapping the Unicode standard" plus what's typically used in a written language, plus SMS sniff test.


Plausible sms:

„Plz send me the <LINK SYMBOL>.

<RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS><EMOJI MODIFIER FITZPATRICK TYPE-4>“


<LINK SYMBOL> already exists as U+1F517.

https://www.fileformat.info/info/unicode/char/1f517/index.ht...

Can't paste it on HN, as the Unicode/Emoji filter seems to be removing it.


I think you’re missing the ZWJ


But a dagger serves a similar purpose.


I don’t think it does. On HN you can’t have links with custom text so an external link symbol would be pointless but seeing a dagger or [1] is reasonable. Similarly I don’t think seeing it in an SMS is unimaginable so by the standard of the parent comment I don’t think a dagger is the same as an external link symbol.


I agree with you but including it would make it easier to express in text-only environments


Plain text environments never encounter the problem that this symbol solves, which is to reveal that an anchor tag (which is not plain text, and typically has visible text that isn't the URL) leads to some other authority.

But for a text mode interface that does have full blown anchor tags -- not very mainstream -- you have a point.


Reddit is mainstream. Other forums with linking might find the symbol handy.. Not sure why unicode has got to be about SMS..


In Reddit, you [link via Markdown](https://en.wikipedia.org/wiki/Markdown). The site renders this its own way, so it can add an "external link" character if needed - and that character is a part of the user interface; a side channel, not a main signal. E.g. copying from a Reddit post, you wouldn't want to find that character in the pasted text.
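The "side channel" idea can be sketched as follows: the renderer turns the Markdown link into an anchor and attaches the marker through a class attribute, so the marker never becomes part of the text. This is an illustration only and not how Reddit actually renders posts; `render`, `visible_text`, and the `external` class name are hypothetical, and text outside the link is passed through unescaped for brevity.

```python
import html
import re

# Matches [text](url) with no nesting; enough for this illustration.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

def render(markdown: str) -> str:
    """Turn [text](url) into an anchor; the marker is a CSS class, not text."""
    def repl(m):
        text, url = m.group(1), m.group(2)
        return (f'<a class="external" href="{html.escape(url, quote=True)}">'
                f'{html.escape(text)}</a>')
    return LINK_RE.sub(repl, markdown)

def visible_text(rendered: str) -> str:
    """Crude tag stripper standing in for copy-and-paste."""
    return html.unescape(re.sub(r"<[^>]+>", "", rendered))

rendered = render("you [link via Markdown](https://en.wikipedia.org/wiki/Markdown).")
print(rendered)
print(visible_text(rendered))
```

A stylesheet could then draw the icon for `a.external`, and the copied text would contain only the link text.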


There's text-based webbrowsers like Lynx too


Here is a different link symbol: on Apple: 􀉣 􀒠 􀒡 􀉤


> It is unclear that the entity in question is actually an element of plain text, given the inevitable connection to its function in linking to other documents, and thus its coexistence with markup for links. Furthermore, the existing widespread practice of representing this sign on web pages using images (often specified via CSS styles) would be unlikely to benefit from attempting to encode a character for this image.

I don't really have a horse in this race, but... isn't the existing widespread practice of representing the sign using images attributable to the fact there is not a character available? What else do they want people to do? The point of adding the character is so that in the future people can use text instead of an image.


I think the point is that, unlike emojis, which had been integrated into end products by some vendors, forcing the Unicode committee to integrate them due to the Unicode Consortium's goal of backward compatibility, this icon isn't present in any vendor charset out there.


I think an issue with the arguments here is that each side is making its arguments based around quite different standards. Unicode have two standards for inclusion of symbols, one for adding new symbols and one for merging in characters from other character sets.

The rejection is based on the first standard for adding new characters.

The arguments against it seem to be based on taking the second standard as precedent. But I think as far as the Unicode committees are concerned, arguments based on the precedent of whatever random crap was grandfathered in from preexisting character sets do not apply to characters that are not from preexisting character sets. I think any argument that relies on “you already allow all this other random crap” must also argue that this symbol exists in some other character set which ought to be merged into Unicode, or that the standard for new characters should be more precedent-based/different, but I don’t see any such arguments other than some implied “common sense guess as to how one expects Unicode to work”


So the smartest way to go about this is to create a whole new character set for external link symbols? That seems like a really roundabout way to get this thing accepted.


The word you're looking for is "script". And, no.

One has to make a better argument for the external link symbol getting a codepoint assignment. TFA, for example, makes an argument based on emojis -- certainly that's strong enough to blunt the UTC's rejection rationale, but perhaps not enough to win approval outright.

Addressing all the arguments used in the rejection is important, of course. The fact that currently images are used is hardly dispositive: that's business as usual for missing Unicode assignments!!

But there are probably stronger arguments for rejection than the ones the UTC made, which it could make the next time this comes up.

The best argument for rejection that I can think of has to do with layering. An external link character isn't very useful without the actual external link, but the link belongs a layer up: not in the text, but in the markup. Well, if the link belongs a layer up, why not also the symbol? Alternatively, more markup can move into Unicode. There has been and will continue to be some pressure to move more semantics from markup to text, but it's probably best to resist that pressure.

On the other hand, a solid argument for adoption may involve text rendering of HTML. Think of lynx/elinks and other such browsers, which can't use images. An external link character could prove useful in distinguishing the rendering of linked text from, say, underlined non-linked text.

I'm surprised these arguments didn't come up. Or maybe they did -- I've not gone down the rabbit hole on this one, and probably I won't.


Create a custom font with your custom character in the private use area. Use the popularity of your font to demonstrate pressing need.


I think that most people find this hard to digest. The committee approves a ton of emojis with a dozen variations each, a gazillion characters that make it possible to make some weird word soup that breaks sane layout, but including one of the most commonly used symbols is somehow out of the question.


Because all of those symbols are used as part of text, while the external link symbol isn't. It is not part of the text itself. If you copied that text, you would not really expect the link symbol to be copied along with it.


>If you copied that text, you would not really expect the link symbol to be copied along with it.

Because it's not in Unicode it can't be copied. If it were a Unicode symbol, I would expect it.


But it's not part of the text, it is a decoration. Even if it was in Unicode, I would not expect it to be part of the copied text.


Isn't this something like a stylistic editorial decision? Basically it could be either way depending on what the author/publisher/editor wants.

On a lot of pages links are hidden as plain text and only show up if someone hovers over them. (Great? Confusing? Bad UX? Sure, but still a choice.)

At the same time someone else might just use underlining, but no different color. And someone might just want to use a symbol.


The reason for rejection, for better or worse, is that this is a functional element, like a button, rather than part of running text.


Like the play/pause symbols then, which are already in there?

And since when was a smiling poop an ‘element of plain text’


> And since when was a smiling poop an ‘element of plain text’

Since the Japanese phone vendors shoved it into the character encoding and forced everyone to deal with it if they wanted to be compatible.


not that I agree with the final decision, but I can imagine "Play"/"Pause" to show up in a device manual and Smiling Poop as part of a chat message.

Neither applies to the "external link" symbol.


I can imagine it showing up in an instructional book or article about how to indicate to your web site visitors that a hyperlink is to an external site, or perhaps in a comment about the Unicode standard body rejecting it. Neither of those are any more contrived than play/pause buttons being in an instructional manual about a television remote that contains those symbols.


Why couldn’t the external link icon appear in technical documentation as it so often does?


Can you imagine it in paper documents or anywhere that is not hypertext?


Yes, I've seen it used as a way to steer the reader to open their browser for further information, with the URL written in plain text. Like a hyper footnote.


Yes, I would expect to see it in books about web design, for instance.


What's paper?


That didn't stop the Unicode Consortium from including a left-pointing magnifying glass in the standard. I use this on my blog for the search button.


I don't think that's inconsistent.

I've seen a right-pointing magnifying glass printed in a book, and a left-pointing one would be equivalent for Arabic or Hebrew. I'd expect to see it in a school science book, for example.


And as others have already mentioned, I've seen the external link printed in several manuals for various products I've bought.


Search Google Books for “external link icon” and you will find several web design books that describe and depict the icon.


And that's the real result of this, people will just settle on a collective commandeering of the closest symbol, making the decision irrelevant and retrospectively worse than pointless.

Annoy everyone enough, and we can redefine the Unicode codepoints as rationals and hand the denominator to another group.


U+1F517 https://emojipedia.org/link/ might do this job

Anyway I think we can avoid target="_blank" most of the time, you can have a https://developer.mozilla.org/en-US/docs/Web/API/WindowEvent... event listener if something needs to be saved before the page location changes


I generally see that one used as "permalink/create share link to the current page".


The closest I can think of is U+2b00 (north east white arrow) followed by U+20de (combining enclosing square).

I suspect HN will eat the character: ⬀⃞ although they sometimes pass through if there's enough other text in the comment.
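For anyone who wants to poke at that two-code-point approximation, here is a minimal Python sketch. It assumes nothing beyond the standard `unicodedata` module; the variable names are only illustrative. It shows that the pair stays two separate code points (a base symbol plus an enclosing mark), which is probably why the square's placement depends on the font and renderer.

```python
import unicodedata

# U+2B00 followed by U+20DE, as suggested in the comment above.
ARROW = "\u2b00"
ENCLOSING_SQUARE = "\u20de"
approximation = ARROW + ENCLOSING_SQUARE

# Print code point, general category, and official name for each part.
for ch in approximation:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

# It may render as a single glyph, but it is still two code points,
# and NFC has no precomposed character to fold the pair into.
print(len(approximation))
print(unicodedata.normalize("NFC", approximation) == approximation)
```

Because there is no single precomposed character, anything that counts or truncates by code point will treat this as two items.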


Hacker News lets some emoji and Unicode through, but I’m not sure how it’s chosen. Here’s some I copied from Wikipedia’s page on emoji: ℹ ⌛🀄🈚


Weird, I've never seen any before this thread, I thought they were all stripped. Bug or feature, I wonder?


I’ll ask tomorrow if I can remember and get back to you.


Could be because emoji are in the 'Miscellaneous Symbols and Pictographs' Unicode block, but the hourglass symbol is in 'Miscellaneous Technical', so it doesn't count as an emoji according to HN's filter.
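That block hypothesis can at least be checked against the characters that made it through. The sketch below is hedged: it assumes the four surviving characters quoted upthread are U+2139, U+231B, U+1F004 and U+1F21A, it hard-codes only the handful of block ranges needed (Python's standard library has no block lookup), and it says nothing about how HN's filter actually works.

```python
# Block ranges from the Unicode block list; only the ones needed here.
BLOCKS = [
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2300, 0x23FF, "Miscellaneous Technical"),
    (0x1F000, 0x1F02F, "Mahjong Tiles"),
    (0x1F200, 0x1F2FF, "Enclosed Ideographic Supplement"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, using the small table above."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "(not in this table)"


# Assumed identities of the characters that survived in the comment upthread.
survivors = ["\u2139", "\u231b", "\U0001f004", "\U0001f21a"]
# Pile of Poo, which another commenter reports was stripped.
stripped = "\U0001f4a9"

for ch in survivors + [stripped]:
    print(f"U+{ord(ch):04X}  {block_of(ch)}")
```

Under those assumptions, none of the survivors sit in 'Miscellaneous Symbols and Pictographs' while the stripped one does, which is consistent with the guess but does not prove it.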


On my blog I use " U+27AB BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW"

E.g. https://www.eiman.tv/blog/posts/lannames/index.html


Just FYI, this looks like a book on my mobile phone.


Alternatively, for a single-character symbol, there's either ⤤ (2924 north east arrow with hook) or ⮳ (2bb3 ribbon arrow up right). The combining square isn't exactly centered, depending on where I paste it.

There's a useful selection of arrows here: http://xahlee.info/comp/unicode_arrows.html


The closest alternative I’ve seen is definitely north east arrow with hook. Personally, I like its simpler appearance. Ideally you would still want to label it on hover with some alt or title attribute text.


I would expect the north east arrow with hook to be used for "go back up", e.g. after having followed a footnote. The hook implies an element of "returning" for me.


I’ve seen it used there too. That’s probably a better use for it. Personally I’ve never had to call out external links before, and if I did, I think including or referencing the domain the link is on is more useful than a generic symbol that might also mean “opens in new window or tab” sometimes.


> Furthermore, the existing widespread practice of representing this sign on web pages using images (often specified via CSS styles) would be unlikely to benefit from attempting to encode a character for this image.

Obviously enough, text-mode browsers would be likely to benefit.


This made my head spin. It's like saying people already using the JIS text encoding would be unlikely to benefit from adding Japanese to Unicode. Absolutely mind blowing.


No, it is saying that Unicode is used to encode what people think of as plain text, and that UI symbols that are not part of the text content are outside the scope of it.


I agree that text mode browsers would benefit, as they can hide the external nature of a URL behind non-URL link text, and not allow an image next to it.

But it seems that to be a codepoint, it needs to be useful in plain text scenarios. Text mode browsers which render links as discussed are not plain text.


> It is unclear that the entity in question is actually an element of plain text...

As the author pointed out, emojis seem to clearly violate this excuse. What possible justification do they give for emojis? There is no way [I originally inserted the poop emoji here, but it was stripped out. That kind of reinforces my point.] is an element of plain text.


The pile of poop would not ever get accepted as a character in unicode on its own. It's there because of including an existing character encoding set (IIRC from Japanese featurephone messaging standards), which had a pile of poop character and other emoji, so for purposes of compatibility all the characters of that set must get Unicode mappings.

So there's a situation of dual standards - all the weird characters that were included in any pre-Unicode text encodings for whatever arbitrary reasons are in Unicode and are always going to be there; but all the new weird characters need appropriate justification for inclusion and are likely to be denied.


Why wouldn't it be part of plain text? People type it as part of their text messages constantly. They certainly use it as a part of plain text.

Nobody would type an external link symbol. It would be added by the UI presentation layer. It is not part of plain text, because it is not typed as part of textual content. But the poop emoji is.


Emojis are used in the same context as other text to communicate in the same fashion; they were being independently recreated in many technologies and there was direct value in standardising it across systems.


> The UTC rejected the proposals to add “external link sign”, most recently in L2/12-169. It is unclear that the entity in question is actually an element of plain text

Nor is Pile of Poo, and that's in Unicode.

If external Link was added to Unicode, I expect it would be more used than 1000s of characters that are in it.


> Nor is Pile of Poo

I really really don't understand this point. We have evidence that people use pile of poo in plaintext today in instant messaging all the time. Isn't this enough evidence that pile of poo belongs to plaintext? I'm not trying to be facetious; it's really puzzling to me. Every single comment in this thread is about pile of poo, yet it seems to me the worst possible example, since it's such a widely used emoji.


It's only possible to use it in a text message because it's a unicode character. If external link was a character, would people use it? I don't know. Do you?


Maybe, maybe not, we don't know. What I know is that even my mom and grandma use pile of poo on facebook. Will they use "external link" symbol? I would guess not, but maybe.


> Will they use "external link" symbol? I would guess not.

In that way, it would be no different from 99% of unicode characters then.


Exactly, my point is pile of poo is such a bad example since recently it's probably in the top 1% of most-used Unicode symbols in plaintext. It's similar to arguing that something like ∆ or é does not belong in Unicode.


I personally would use it. I'm sure others would too.


Why is it not? People use it in plain text millions of times a day.


Reddit would probably do that in one day.


I'd like the plain text condition to be relaxed. I would love to be able to use Unicode for creating basic interfaces. I would love to have basic interface elements such as a magnifying glass, "save" icons and external link be part of the standard. Maybe they don't strictly have a use, but for one, people would find one if they were there, and two, there are already (granted, grandfathered) elements that were used exactly for this purpose back in the era of plain text window interfaces.


There's the PUA. And I mean, it's not like it doesn't happen. See font awesome for instance.


I use the awesome versions of fonts in terminal for power line and stuff. But that's kind of the thing, it can't be ubiquitous and easily reused. If every system had its interface font it would make a lot of stuff easier. An example would be glyphs on buttons that are almost always the same.


This would make copying links harder as you would need to avoid copying the link symbol, unless the link symbol was part of the URL.


A typical implementation would use a generated pseudo-element that doesn't get copied.


So, not a unicode character, then - a presentation element that is rendered on unicode.


No, e.g.:

  a::after {
    content: "↗";
  }


For this stuff we have fontawesome and ttf generators from SVGs. Just grab svg icons on thenounproject and make your own font!

I happen to disagree with the unicode decision because there should be a section for basic commands so they can be rendered in different fonts. But whatever: as I say, don’t wait for standards if they don’t exist, make your own!


How many new characters are proposed each year? What’s the rejection rate? Maybe it’s a healthy process to reject when uncertain and then reconsider after popular appeals, especially if they are inundated with (mostly) questionable applications. It’s premature to judge the process without more context.



After getting an emoji accepted I submitted a proposal for the external link symbol in 2018, trying to address the committee's concerns from the several earlier proposals. https://doubly.so/pub/External-Link-2018.pdf

It was summarily rejected with the nonsense statement (in full):

> Thank you for your submission. This was discussed during last week's UTC meeting. I was directed to let you know UTC feels that, as submitted, the proposal does not sufficiently demonstrate a plain text need for such a symbol. The context for usage is mark-up with links by default.


At this point I think a submission of 8 generic "arrow exiting square from top, top-right, etc." might have a better chance of being accepted as it holds a general meaning of exiting something and is not specific to hypertext as seems to be their objection.


> Its main rationale appears to be that the external link icon is not an element of plain text. I would agree that is the case. However I would like to point out that emoji and other similar useful symbols are not plain text either yet they have been accepted and continue to be accepted.

The author seems to be using a very limited definition of "plain text". Emoji are clearly used in the same way as latin-character text in communication, so they are effectively plain text.

I think a more effective argument may be that the external-link symbol should be allowed for the same reasons that the power on/off/toggle and eject symbols were allowed.


One of the most peculiar tourist attractions in the Wieliczka salt mine near Krakow is the signs with a shaft symbol that is not in Unicode. It looks like a # with a · in the middle and appears in texts instead of the word "shaft".


It is simply factually incorrect to say that a commonly used symbol would not benefit from being included in fonts. How can a committee working in this field possibly fail to understand that?


It is not incorrect, and they do understand but they don't care. They have esoteric philosophical justifications that exclude some commonly used symbols while inventing never before seen ones. This is the problem with putting a tiny number of otherwise powerless people in charge of things.

They clap become clap petty clap tyrants.


What is the significance of clap?


I'm mocking the overuse of the clapping hands emoji. Here is the first result from Bing for the search "clap emoji overuse":

https://www.reddit.com/r/justlegbeardthings/comments/6rl6mu/...

Which is something no one asked for, but Unicode gave us anyway and now everyone is annoyed by.


Maybe a few people on twitter? I've never seen it elsewhere and I barely use twitter so saying "everyone" is an exaggeration. Also, if everyone were annoyed by it, then it wouldn't be overused any more almost by definition.


Similar to case law in America, the committee might be avoiding potential abuse in future requests that point back to the rules being seemingly modified for one specific symbol.


If the rules are allowing U+1F4A9 and disallowing an actual symbol commonly used in text, they should change the rules.


Well U+1F4A9 exists because Japanese mobile operators modified their char sets to include emoji, and emoji has been part of Japanese culture for a long time.

Unicode aims to encode every character used for human communication in every culture, and Japan uses U+1F4A9.


Unicode does not determine what can or can not be included in fonts. Fonts can include symbols not defined by Unicode if they want.


This kind of reminded me of this I was reading this morning http://seancubitt.blogspot.com/2020/04/allonomy-autonomy-and... "a poem cannot 'contain in itself the reasons why it is so and not otherwise' (Coleridge) since it must be written on top of the infrastructure of a language and orthography that the poet rarely originates"


What's the code point for uppercase superscript Z?

j/k there isn't one.

https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc...

No doubt those emojis are more important and historically common than superscript uppercase C F Q S X Y and Z.

https://github.com/jakeogh/unicodehaz


I’m actually fairly annoyed that a lot of obvious superscripts, subscripts, and strikethroughs are missing.


I'm actually more annoyed that they are there in the first place. Unicode can't seem to decide whether it's about just encoding text or also doing presentation/formatting (and now piles of poo and colorful emoji, what next, animations?), so now it's a bit of both, what a mess. It's making life hard and breaking things for applications that assume text is just text, and the presentational features are not enough to avoid having to implement your own presentational features in a program that needs to do presentation.

Now you have applications where you can't find a string because it was written in Unicode bold letters instead of the letters' normal ASCII counterparts. And then you have applications that are confused about those bold letters because they are not actually surrounded by bold markup.

The worst part is that you can't criticize unicode without attracting a crowd of bullies who handwave about human languages being complicated (no, that does not justify poor design) or say it has to be this way because ugh shift-jis (or whatever nasty old encoding you can come up with) is not nice.

Well designed technology makes complex things simple. People defending poorly designed technology blame the problem for being complex.

𝔽𝕦𝕔𝕜 𝕦𝕟𝕚𝕔𝕠𝕕𝕖!

(Why don't you try searching for "fuck" in Firefox?)
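The search complaint above is the kind of thing compatibility normalization is meant to help with. As a minimal sketch (the function names `fold` and `contains` are made up for illustration, and this is not a claim about how any browser's find-in-page works), NFKC folds the mathematical double-struck letters back to their plain counterparts before comparing:

```python
import unicodedata


def fold(text: str) -> str:
    """Compatibility-normalize, then case-fold, so styled letters compare equal to plain ones."""
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Substring search on the folded forms of both strings."""
    return fold(needle) in fold(haystack)


# "unicode" written with mathematical double-struck small letters.
styled = "\U0001d566\U0001d55f\U0001d55a\U0001d554\U0001d560\U0001d555\U0001d556"

print("unicode" in styled)          # naive search on the raw string
print(contains(styled, "unicode"))  # search after NFKC folding
```

The trade-off is that NFKC is lossy: it also erases distinctions such as superscripts, so an application has to decide whether that is acceptable for its search.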


You're so right... Unicode is making things too complex, and intruding on various formatting issues. Case in point: The skin tone modifiers. Or this stuff: https://en.wikipedia.org/wiki/Zero-width_joiner

There doesn't exist any parser that does it 100% correctly. And parsing it is becoming so complex that it's causing bugs and vulnerabilities (it's not a coincidence that so many remote exploits use some kind of unicode to trigger it).
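To make the complaint about modifiers and joiners concrete, here is a small Python sketch. The two sample sequences are chosen only for illustration, and how they render depends on the font and platform. It shows that what a user may see as one symbol is several code points, so any code that counts, slices or reverses by code point can split it apart.

```python
import unicodedata

# Thumbs up followed by a skin tone modifier (two code points).
thumbs_up_medium = "\U0001F44D\U0001F3FD"
# Man, ZERO WIDTH JOINER, woman, ZERO WIDTH JOINER, girl (five code points).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for label, s in [("thumbs up + modifier", thumbs_up_medium), ("family", family)]:
    code_points = [f"U+{ord(c):04X}" for c in s]
    print(label, len(s), len(s.encode("utf-8")), code_points)

# Slicing by code point cuts the sequence in the middle.
first_part = family[:2]
print(unicodedata.name(first_part[-1]))
```

Segmenting text into user-perceived characters needs the grapheme cluster rules from the Unicode annexes, which Python's standard library does not implement.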


I think from the consortium's point of view it boils down to whether it makes sense to consider the link symbol separately from the text that makes up the link. They apparently decided it did not, but you could disagree. Of course, it does not really matter if you disagree, because they won't change their mind (unless you repropose it in a way that makes a different argument from the original proposal, which their rejection is basically saying could be considered).


Agree with the rejection.

The same link can be either external or local depending on the context, so it is the business of the UA/renderer to mark it properly.

CSS is quite adequate for that (https://davidwalsh.name/external-links-css) and the image (or whatever author/UA decided to use) can go inline in CSS itself.
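The point that the same link can be external in one context and local in another can be sketched outside CSS too. This is a hedged Python illustration, not how any particular UA decides: the function name `is_external` is made up, "external" is assumed to mean "different host than the page", and non-HTTP schemes are arbitrarily treated as not external.

```python
from urllib.parse import urljoin, urlsplit


def is_external(href: str, page_url: str) -> bool:
    """True if href, resolved against page_url, points at a different host."""
    target = urlsplit(urljoin(page_url, href))
    page = urlsplit(page_url)
    if target.scheme not in ("http", "https"):
        # Assumption: mailto:, javascript: and similar are not marked as external.
        return False
    return target.hostname != page.hostname


link = "https://example.org/docs"
print(is_external(link, "https://example.org/index.html"))  # same host as the page
print(is_external(link, "https://example.com/index.html"))  # different host
print(is_external("/about", "https://example.com/index.html"))  # relative link
```

The first two calls use the identical link and get different answers, which is why the marker arguably belongs to the renderer rather than to the text of the link.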


Talk about timing, I was just looking for this in unicode for my website yesterday. Didn't find anything that looked good so ended up going with Font Awesome: https://fontawesome.com/icons/external-link-alt?style=solid

I think it's a missed opportunity for unicode.


Try `\u2197`. You can see it used on external links here: https://chadlavi.github.io/clear/#/link#examples


Emojis are definitely plain text, as they have meaning when I write them on a piece of paper


The external link symbol also has meaning when written on paper, which is "external link".


I want a tool that scans English 🗣 text for names of unicode symbols. Then I want the English language to move to that symbolic writing. You know, because it is fun to see cultures evolve.


click h͟e͟r͟e to learn more about the decision


Glad they say "no" sometimes


Didn't stop them from making Unicode an overcomplicated clusterfuck of combination codes though... Referring to things like this: https://en.wikipedia.org/wiki/Zero-width_joiner . It's become almost impossible to build an accurate parser now.


Doesn't "unicode was intended only for print media" already drip with irony?


    a::after {
      content: url(my_external_link_symbol.gif);
    }


I usually just use `\u2197` or similar.


This article sums up my encounter with the Unicode committee pretty well. They stopped making sane decisions long ago.


Why do you need to know that a link is an external link?


I see it often in intranets, wikis, and PDF documentation. It's an additional context clue that you're leaving a closed website. The most egregious examples give you a separate click-through screen when leaving the website. Government web pages seem to do this the most.

It would be nice if we could all use a standard icon or some other constant UI element--like a single underline for internal references and double underline for external? I imagine unique colors would be too difficult to standardize.


Yeah but why?

What do people do with the information that a link is external? Do people think 'I'll follow this link - oh no wait a minute it's external I won't'?


Specifically in those circumstances it's more important than normal; intranet, government, and PDF documents.

Intranet: bespoke documentation of internal processes versus generic documents used for reference.

Government: external references can be hijacked (asking for personal information) or may not represent the government but still have relevant info.

PDF documents: jumping around a PDF document (from a table of contents) is different than going to an external website. Especially if you don't have Internet access at that moment.

I think all of this is significantly more important since browsers have been hiding more URLs.


Sure. Why not? If you're browsing a hotel you plan to stay at, you may want to spend some time learning about it before booking. Accidentally clicking an external link takes you away from that.

Techies will know to just open it in a new tab, but most will have to remember to browse back to the site manually. It can be a disruptive process.


> Do people think 'I'll follow this link - oh no wait a minute it's external I won't'?

Some do, sometimes. Or perhaps they'll think 'I won't follow this link - oh no wait a minute, it's external I will'.


...and U+1F4A9 is!


God, they spend so much time on emoji which is much the same. They are becoming a bit of a joke!


You don't expect to be able to click emoji, and they don't lose their meaning when printed out on paper.


If you print a web page the links are still there. The icon tells you "this here was a link" and you can go to the computer and look it up. If the link wasn't underlined and had a different colour but you printed in b&w it might be the only way to know that was a link, so I would argue that the icon is more useful in paper than on a live web, actually.


It's not for all links. It's for external links, meaning a different domain than what's currently in your address bar.


References can also be used in paper?

I wonder why there are actually so many people ITT who argue this new symbol would have no use and no meaning in print...


Even assuming that symbols you click on can't be in unicode, you don't click on the external link symbol, you click on the text of the link, and that text will be followed by the symbol. Even if you include the symbol in the link text, it's still part of the text both logically and in practice.


But this particular symbol just loses its meaning when printed. You don't expect to click on paper, right?


Couldn't the URL be the print representation of the link symbol? Text can be represented differently in different media.


You can do this in CSS, but it's semantically different than an external link symbol.

  @media print {
    a::after{
      content: " (" attr(href) ") ";
    }
  }


Emoji are not at all the same. People type emoji as part of their text. People do not type "external link symbol" as part of their text.


What if external link is the symbol I want to type? Would exit sign be appropriate, because I’m fairly certain there’s an emoji for that.


As per https://www.unicode.org/pending/symbol-guidelines.html

> The 'symbol fallacy’

> The 'symbol fallacy’ is to confuse the fact that "symbols have semantic content" with "in text, it is customary to use the symbol directly for communication". These are two different concepts. An example is traffic signs and the communication of traffic engineers about traffic signs. In their (hand-)written communication the engineers are much more likely to use the words "stop sign" when referring to a stop sign, than to draw the image. Mathematicians are more likely to draw an integral sign and its limits and integrands than to write an equation in words.

So while "stop sign" is in Unicode, it's a bit nuanced, depending on the manner in which the symbol is used.



