Not really related, but do we really need 4 different Unicode symbols for the same glyph? It's not even like they represent different things. What's the difference between a tricolon and a triple colon, or between a vertical ellipsis and a presentation form of vertical horizontal ellipsis (which is really a vertical ellipsis described with twice as many words)?
Unicode encodes symbols rather than glyphs. That is why there are different code points for, say, Latin o and Greek omicron even though they look the same. It also distinguishes between punctuation symbols and mathematical operators, such as dash and minus, even though they look similar.
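For anyone who wants to see this for themselves, here is a quick sketch using Python's standard unicodedata module (the describe helper is just a name made up for illustration). It prints the code point, general category and official name for two lookalike pairs:

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX <category> <name>' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"

# Lookalike pairs: Latin o / Greek omicron, hyphen-minus / minus sign.
pairs = [("o", "\u03bf"), ("-", "\u2212")]

for a, b in pairs:
    print(describe(a))
    print(describe(b))
    # Different code points never compare equal, however similar they look.
    print("  same character?", a == b)
```

The two letters share a category (Ll) but have different script-specific names, while the hyphen-minus is dash punctuation (Pd) and the minus sign is a math symbol (Sm).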
As for the "presentation forms" they are not semantically different and are mostly a compatibility thing:
A: Presentation forms are ligatures or glyph variants that are normally not encoded but are forms that show up during presentation of text, normally selected automatically by the layout software. A typical example are the positional forms for Arabic letters. These don't need to be encoded, because the layout software determines the correct form from context.
For historical reasons, a substantial number of presentation forms were encoded in Unicode as compatibility characters, because legacy software or data included them.
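A rough way to see which of the four lookalikes is "just" a compatibility character is to ask for its decomposition and its NFKC normalization. This is only a sketch with Python's unicodedata; compat_info is a hypothetical helper name, and the four code points are the ones I believe the question refers to:

```python
import unicodedata

# Tricolon, triple colon operator, vertical ellipsis, and the
# presentation form for vertical horizontal ellipsis, as escapes.
lookalikes = ["\u205d", "\u2af6", "\u22ee", "\ufe19"]

def compat_info(ch):
    """Return (name, raw decomposition field, NFKC-normalized text)."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),      # empty string if there is none
        unicodedata.normalize("NFKC", ch),  # applies compatibility mappings
    )

for ch in lookalikes:
    name, decomp, folded = compat_info(ch)
    changed = "folds to " + ascii(folded) if folded != ch else "unchanged"
    print(f"U+{ord(ch):04X} {name}: decomposition={decomp!r}, NFKC {changed}")
```

Only the presentation form carries a compatibility mapping (to the horizontal ellipsis, which NFKC then folds further into three full stops). The other three have no decomposition and are left alone, which fits the idea that they are meant as distinct symbols.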
And yet, they do not have a mathematical italic "h" letter, using the reasoning that the Planck constant (ℎ) already exists. But this character looks out of place compared to the rest of the italic letters in many fonts.
Oh thank you for pointing that out! There is also a lack of indices and exponents, including some of the most important ones (can't remember exactly, but letters like i, x or n).
You need a font that supports those, and superscript-x is a modifier letter rather than a true superscript, but:
aⁱ U+2071 Superscript Latin Small Letter I
aˣ U+02e3 Modifier Letter Small X
aⁿ U+207f Superscript Latin Small Letter N
aᵢ U+1d62 Latin Subscript Small Letter I
aₓ U+2093 Latin Subscript Small Letter X
aₙ U+2099 Latin Subscript Small Letter N
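If you'd rather not trust a hand-typed list, one way to double-check code points and official names is to ask Python's unicodedata directly. A minimal sketch (code_point_table is a made-up helper name, and the results depend on the Unicode version bundled with your Python):

```python
import unicodedata

# The six superscript/subscript letters from the list, as escapes.
samples = "\u2071\u02e3\u207f\u1d62\u2093\u2099"

def code_point_table(chars):
    """Return a list of ('U+XXXX', official name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in chars
    ]

for cp, name in code_point_table(samples):
    print(cp, name)
```

Pasting the literal characters into the samples string works just as well, which is handy when you only have the glyph and want to know what it is.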
I use DejaVu Sans Mono that is free and has some coverage, though it's missing the most inopportune ones.
But you're right that there are many missing, either from the sub- and superscript ranges or from modifiers (or both).
Edit: there's a page on wikipedia with a very complete list of unicode sub- and super-scripts _and other characters that can be used as such_:
I don't think it's that simple. ASCII and ISO 8859 didn't have luxurious code spaces, so they had less semantics than Unicode. Unicode, as its name implies, tries to unify these (and other) codesets. When unifying codesets you sometimes have to decide whether to unify two distinct codepoints for similar glyphs or not, and if not, then semantics intrude via the two codepoints' names (and, therefore, purpose). You might recall that Unicode 2.x tried to unify CJK... -- that was back when Unicode did not have a sufficiently-luxurious code space, and that unification, politically, went badly.
In many ways it might have been better to have exactly one codepoint per distinct glyph, but in practice, the ability to have simple algorithms to map from existing codesets to Unicode, politics, and the ability to distinguish script (in the cases of otherwise-equal codepoints) turned out to be more important.
You could ask the same thing about hyphen, dash and minus sign, but they are different in their meaning, and as someone who lays out text I might want to style them differently from each other.
I bet the average person on the street would tell you that these are the same and they never thought about hyphen and minus being different. For them all three look like a dash.
Since I had no clue, and was curious, I tried to find out.
"Tricolon" shares a name with a far more common use as a rhetorical technique, making it hard to find details. I tracked down the Unicode proposal for 'Epidaurean acrophonic symbol three' at http://www.unicode.org/L2/L2003/03075r2-tlg-numeric.pdf .
> This proposal contains 52 Greek Acrophonic (numerical, non-alphabetic) characters. Acrophonic numerals are found primarily in ancient Greek inscriptions in Athens and other city states. ... The proposal includes the characters needed for the encoding of the Attic acrophonic system—namely characters used in Athens and the surrounding area (Attica)—and non-Attic characters which cannot be considered as glyph variants of Attic.
It has the comment "Note: Accepted by UTC (2003-11-7) as part of extended Greek punctuation. Design to match 00B7. Example: IG IV.316". That's Ancient Greek.
I tried, but failed, to find the example from Inscriptiones Graecae IV.316.
I couldn't figure out the reason for why the triple colon operator was added, other than the comment in the Unicode database that it's used in "logic". There's a 2002 mention of it at https://unicode.org/L2/L2002/02174-tr25-5.pdf .
The article shows the use of a symbol which is visually similar to a tricolon.
The text says "We find four characters [in Unicode] that look like this: ... “Tricolon” appears in 19th century sources as a name for one type of verse structure found in the bible, so that isn't so helpful. ... So really Unicode was no help to me."
This suggests that the use of "tricolon" in Unicode is not the same as on the keyboard.
The word "tricolon" is otherwise not mentioned, and I don't know what comments you refer to.
Any idea where a triple colon operator is actually used? I can find Vertical Ellipsis in matrices to denote continuation, but nothing about using the glyph as an operator.
That last one might be "..." for vertical monospace text, like traditional Japanese. Not sure though as when I highlight it, it doesn't look wide enough.
>Our mystery symbol was clearly intended to be typed, however the person transcribing incoming telegrams could just make a new paragraph on receipt of that code, rather than typing a special character. There appears to be no reason to ever put the symbol on paper.
Err, typing the symbol to separate paragraphs instead of adding a new paragraph would save space (and also paper) though...
Didn't telegraphs use paper tape in the beginning? If so, then the only way to signal line/paragraph breaks would be symbolically. Of course, a typewriter wouldn't have needed that, but as TFA shows, it came in handy in some cases anyways.
> Didn't telegraphs use paper tape in the beginning?
Not just in the beginning. Almost until the end. The paper tape with the actual messages was cut into strips and pasted onto a telegram form before being put into an envelope and delivered.
No, that's not how it worked. The messages weren't arranged in paragraphs. They were snipped to fit inside the pre-printed box on the telegraph form.
And even very rich people didn't send multi-paragraph telegrams. You paid by the letter. Most telegrams were two sentences or less. Sometimes two words or fewer.
I always heard it was Victor Hugo and Les Miserables–indeed it's mentioned ("apocryphal tale") on Hugo's wikipedia page...with a link to this page saying it most probably never happened. Also I came across it as an Oscar Wilde story online just now.
It could also imply we don't really know most of the mundane details and intricacies of past cultures. Tricks and techniques lost to time forever. Like the difference between the many small things you experience when you are traveling somewhere compared to what's left in the pictures.
Not so much "lost to time forever" as "locked away in physical media that makes them hard to access."
We talk a lot about the seemingly infinite storage capabilities we have today, but all it takes is one terrorist attack, or one major SV company to go out of business and suddenly there's a massive hole in our cultural memory that cannot be recovered because everything we have is stored magnetically.
There's also the "never written about because every one knew" common knowledge problem. It was a boring thing everyone knew, so why would they write about it?
I always wonder how long it would take humanity to relearn certain things if some cataclysm happened. And I don’t even mean the latest inventions in the semiconductor industry. But things like metal, glass and woodworking. Crafts that took ages to evolve.
True but that was not my point. I meant in documentation and details of everyday activity of certain people. But granted such small details might also be forgotten completely.
Just speculating here, but it could also be plausible that one of Sholes' investors was a biographer, who wanted that kind of symbol on the typewriter. But given the success of the typewriter, by the second version, Sholes abandoned it, because the demand for a slash symbol far outweighed the demand for the dotted line.
In my high school typing classes, 40 years ago, I remember seeing this symbol used to indicate omitted paragraphs ... in the same way that the ... is used to indicate a pause between sentences, the vertical version indicates a pause between paragraphs.
.
.
.
Like this.
But .. I can't seem to find online examples. It's a terrifically difficult character to search for ..
Expanding on that (nice!) idea: could it be an "everything else" symbol? Since a lot of glyphs are missing from the typewriter, this could be some kind of placeholder for missing characters meant to be manually overwritten after typing?
It seems like the original blog post got it wrong there. See the comment by "godspace"[1] and the link they provided[2]. The | indicates a line break but the ︙ indicates a logo.
It's a really rather advanced use of typography (making borders) for such a simple machine that didn't even have parentheses. But yeah, it's plausible, and I like the author's theory that it was intended to be a vertical bar, but it was replaced with three dots to avoid confusing it with I and 1.
Clickbait title rescue: It’s the ⁝ - U+205D tricolon character, present on the drawing accompanying the original patent for the QWERTY-layout typewriter. Its actual use is obscure, and still unknown, although theories abound.
I don't think it's clickbait. Sure, you can tell me it's some weird character I've never heard of. But I already knew that from the title! Failing to use the name of something in a title, when basically nobody knows that name, is not a bad thing.
The title could have been “The lost ⁝⃣ Key of QWERTY”. Or even “The Lost ‘vertical dots’ Key of QWERTY”. To needlessly omit crucial information from the headline is clickbait.
From the title I thought this would be about the fact that the QWERTY keyboard has 104 keys vs 105 for the AZERTY keyboard (used in France and other countries in Europe). QWERTY has a really wide SHIFT key at the bottom left, whereas AZERTY has an extra key there.
If you're going to list facts, at least get them right. The version that Europe uses is also called QWERTY. The US uses the ANSI variant of the QWERTY layout while the "European" variant is called ISO. One of the differences between the two versions is indeed the left shift being longer or shorter to accommodate the extra key but it's not the only one.
AZERTY (a variation of the ISO QWERTY, with 105 keys, the big enter, etc) is used in France like you say but nowhere else. The rest of Europe use other layouts within the ISO QWERTY. Here you'll find Spanish, UK, Italian, Norwegian, and many other layouts that move and add the necessary symbols for their languages.
edit: corrected right to left, stupid me.
I didn't express myself correctly when talking about AZERTY. It is used in a few other places besides France. What I meant to say is that it is not Europe's standard, as the parent comment seemed to indicate.
I think you mean left shift. The Enter/Return key is also different, though looking at different layouts I have no idea why, as the extra key on that side seems to be present in different places regardless of the actual shape of those keys.
>The rest of Europe use other layouts within the ISO QWERTY.
This is also not exactly true.
e.g. in Poland ANSI QWERTY is typically used, with diacritics typed with right Alt. (there are other layouts with diacritics directly available, like Windows "Polish (214)", but nobody uses them)
Did I say no other layouts exist outside of AZERTY and QWERTY? I was pointing out that the assertion that the parent comment made about AZERTY = Europe and QWERTY = US was wrong.
QWERTZ largely maps to QWERTY with the exception of special characters and Z/Y being swapped. QWERTZ can be bought in ISO or ANSI, though ANSI QWERTZ is hard to get from any sane manufacturer since everyone here likes DIN/ISO.