The Lost Key of QWERTY (2016) (widespacer.blogspot.com)
190 points by bryanrasmussen on Nov 5, 2019 | 84 comments


Not really related, but do we really need 4 different Unicode symbols for the same glyph? It's not even like they represent different things. What's the difference between a tricolon and a triple colon, or between a vertical ellipsis and a presentation form of vertical horizontal ellipsis (which is really a vertical ellipsis described with twice as many words)?

"

    ⁝ - U+205D tricolon

    ⋮ - U+22EE vertical ellipsis 
   
    ⫶ - U+2AF6 triple colon operator  
  
    ︙- U+FE19 presentation form for vertical horizontal ellipsis    
"
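For anyone who wants to check the list above, the official names can be read from Python's standard `unicodedata` module. This is only a small sketch; `LOOKALIKES` and `describe` are illustrative names, not anything from the article.

```python
import unicodedata

# The four look-alike code points quoted above.
LOOKALIKES = ["\u205d", "\u22ee", "\u2af6", "\ufe19"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(describe(ch))
```

The names printed depend on the Unicode database bundled with your Python build, but all four code points have been assigned for many years.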


Unicode encodes symbols rather than glyphs. That is why there are different code points for, say, Latin o and Greek omicron even though they look the same. It also distinguishes between punctuation symbols and mathematical operators, such as dash and minus, even though they look similar.

As for the "presentation forms" they are not semantically different and are mostly a compatibility thing:

A: Presentation forms are ligatures or glyph variants that are normally not encoded but are forms that show up during presentation of text, normally selected automatically by the layout software. A typical example are the positional forms for Arabic letters. These don't need to be encoded, because the layout software determines the correct form from context.

For historical reasons, a substantial number of presentation forms were encoded in Unicode as compatibility characters, because legacy software or data included them.

Unicode FAQ, https://unicode.org/faq/ligature_digraph.html#Pf1
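One way to see the "compatibility" status in practice is compatibility normalization (NFKC), which folds such characters back to their ordinary equivalents. A minimal sketch using Python's standard `unicodedata` module; the helper name `compat_fold` and the choice of an Arabic sample character are my own assumptions for illustration.

```python
import unicodedata


def compat_fold(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters with plain equivalents."""
    return unicodedata.normalize("NFKC", text)


vertical_form = "\ufe19"      # presentation form for vertical horizontal ellipsis
seen_initial = "\ufeb3"       # ARABIC LETTER SEEN INITIAL FORM
vertical_ellipsis = "\u22ee"  # an ordinary character, for contrast

# The decomposition field carries a tag such as <vertical> or <initial>
# followed by the code point(s) the character maps to.
print(unicodedata.decomposition(vertical_form))
print(unicodedata.decomposition(seen_initial))

# The presentation forms fold away; the ordinary vertical ellipsis is left alone.
print(repr(compat_fold(vertical_form)))
print(compat_fold(seen_initial) == "\u0633")
print(compat_fold(vertical_ellipsis) == vertical_ellipsis)
```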


And yet, they do not have a mathematical italic "h" letter, using the reasoning that the Planck constant (ℎ) already exists. But this character looks out of place compared to the rest of the italic letters in many fonts.
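The gap is easy to confirm with Python's standard `unicodedata` module. A rough sketch: the arithmetic assumes the mathematical italic small letters run contiguously from U+1D44E, which is how the block is laid out apart from this one hole.

```python
import unicodedata

MATH_ITALIC_SMALL_A = 0x1D44E

# Where italic h would sit if the run a..z had no gap: U+1D455.
hole = chr(MATH_ITALIC_SMALL_A + (ord("h") - ord("a")))
planck = "\u210e"

# Unassigned code points report the general category 'Cn'.
print(f"U+{ord(hole):04X} category:", unicodedata.category(hole))

# The character that fills the role instead, and what it folds to under NFKC.
print(unicodedata.name(planck))
print(unicodedata.normalize("NFKC", planck))
```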


Oh, thank you for pointing that out! There is also a lack of indices and exponents, including some of the most important ones (I can't remember exactly, but letters like i, x or n).


You need a font that supports those, and superscript-x is a modifier letter rather than a true superscript, but:

  aⁱ U+2071 Superscript Latin Small Letter I
  aˣ U+02e3 Modifier Letter Small X
  aⁿ U+207f Superscript Latin Small Letter N
  aᵢ U+1d62 Latin Subscript Small Letter I
  aₓ U+2093 Latin Subscript Small Letter X
  aₙ U+2099 Latin Subscript Small Letter N
I use DejaVu Sans Mono that is free and has some coverage, though it's missing the most inopportune ones.

But you're right that there are many missing, either from the sub- and superscript ranges or from modifiers (or both).
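If in doubt about a code point or its name, a quick lookup helps. The sketch below uses Python's standard `unicodedata` module to print the official name of each of the six characters and the base letter it folds to under NFKC; the list name `SCRIPT_LETTERS` is just for illustration.

```python
import unicodedata

# Superscript/modifier i, x, n followed by subscript i, x, n.
SCRIPT_LETTERS = ["\u2071", "\u02e3", "\u207f", "\u1d62", "\u2093", "\u2099"]


def base_letter(ch: str) -> str:
    """Fold a super/subscript letter to its plain base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)


for ch in SCRIPT_LETTERS:
    print(f"a{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}  (base: {base_letter(ch)})")
```

Whether the glyphs actually render still depends on the font, as noted above.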

Edit: there's a page on wikipedia with a very complete list of unicode sub- and super-scripts _and other characters that can be used as such_:

https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc...

But, again, you need the right fonts.


I'm pretty sure I hadn't found some of these last I tried...

Thank you SO MUCH!


I don't think it's that simple. ASCII and ISO 8859 didn't have luxurious code spaces, so they had less semantics than Unicode. Unicode, as its name implies, tries to unify these (and other) codesets. When unifying codesets you sometimes have to decide whether to unify two distinct codepoints for similar glyphs or not, and if not, then semantics intrude via the two codepoints' names (and, therefore, purpose). You might recall that Unicode 2.x tried to unify CJK... -- that was back when Unicode did not have a sufficiently luxurious code space, and that unification, politically, went badly.

In many ways it might have been better to have exactly one codepoint per distinct glyph, but in practice, the ability to have simple algorithms to map from existing codesets to Unicode, politics, and the ability to distinguish script (in the cases of otherwise-equal codepoints) turned out to be more important.


You could ask the same thing about hyphen, dash and minus sign, but they differ in meaning, and as someone who lays out text I might want to style them differently from each other.


Hyphen, dash, and minus do not look the same, nor have the same use.


I bet the average person on the street would tell you that these are the same and they never thought about hyphen and minus being different. For them all three look like a dash.


Hyphen, dash, and minus do not look the same

Depends on your font. Not every font or display has the resolution available to make that distinction.


The semantics are different.


What is the semantic difference between a "tricolon" and a "triple colon operator"?


Since I had no clue, and was curious, I tried to find out.

"Tricolon" shares a name with a far more common use as a rhetorical technique, making it hard to find details. I tracked down the Unicode proposal for 'Epidaurean acrophonic symbol three' at http://www.unicode.org/L2/L2003/03075r2-tlg-numeric.pdf .

> This proposal contains 52 Greek Acrophonic (numerical, non-alphabetic) characters. Acrophonic numerals are found primarily in ancient Greek inscriptions in Athens and other city states. ... The proposal includes the characters needed for the encoding of the Attic acrophonic system—namely characters used in Athens and the surrounding area (Attica)—and non-Attic characters which cannot be considered as glyph variants of Attic.

It has the comment "Note: Accepted by UTC (2003-11-7) as part of extended Greek punctuation. Design to match 00B7. Example: IG IV.316". That's Ancient Greek.

I tried, but failed, to find the example from Inscriptiones Graecae IV.316.

I couldn't figure out why the triple colon operator was added, other than the comment in the Unicode database that it's used in "logic". There's a 2002 mention of it at https://unicode.org/L2/L2002/02174-tr25-5.pdf .


Well the article itself shows the use of tricolon punctuation in bibliographies. Some comments on the page elaborate on that point.


The article shows the use of a symbol which is visually similar to a tricolon.

The text says "We find four characters [in Unicode] that look like this: ... “Tricolon” appears in 19th century sources as a name for one type of verse structure found in the bible, so that isn't so helpful. ... So really Unicode was no help to me."

This suggests that the use of "tricolon" in Unicode is not the same as on the keyboard.

The word "tricolon" is otherwise not mentioned, and I don't know what comments you refer to.


Tricolon is punctuation (like the dash) while triple colon operator is a mathematical symbol (like a minus).
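The distinction is recorded in each character's general category, which can be read with Python's standard `unicodedata` module. A small sketch; the `SAMPLES` dictionary and `is_math_symbol` helper are illustrative names only.

```python
import unicodedata

SAMPLES = {
    "tricolon": "\u205d",
    "triple colon operator": "\u2af6",
    "en dash": "\u2013",
    "minus sign": "\u2212",
}


def is_math_symbol(ch: str) -> bool:
    """True when the general category is Sm (Symbol, math)."""
    return unicodedata.category(ch) == "Sm"


for label, ch in SAMPLES.items():
    # Categories starting with P are punctuation; Sm is a math symbol.
    print(f"{label:22} U+{ord(ch):04X} {unicodedata.category(ch)}")
```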


one is a mathematical operator, the other is not


Any idea where the triple colon operator is actually used? I can find the vertical ellipsis in matrices to denote continuation, but nothing about using the glyph as an operator.


That last one might be "..." for vertical monospace text, like traditional Japanese. Not sure though as when I highlight it, it doesn't look wide enough.


The usual reason for Unicode symbol redundancy is round-trip encoding compatibility.


I don't understand, could you elaborate briefly?


I think Unicode tries to avoid this:

encoding A,B,C visually identical character → Unicode One True Character → garbage in encoding A,B,C
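A concrete case, sketched in Python under one assumption: ISO 8859-7 stands in for "encoding A", since it contains both Latin o and the visually identical Greek omicron. The `unified_round_trip` function is hypothetical; it simulates what would happen if Unicode had merged the two characters into one.

```python
LATIN_O = "o"             # U+006F
GREEK_OMICRON = "\u03bf"  # U+03BF, looks the same in most fonts


def round_trip(data: bytes, encoding: str) -> bytes:
    """Legacy bytes -> Unicode -> legacy bytes, as Unicode actually works."""
    return data.decode(encoding).encode(encoding)


def unified_round_trip(data: bytes, encoding: str) -> bytes:
    """Hypothetical: pretend omicron had been unified with Latin o."""
    text = data.decode(encoding).replace(GREEK_OMICRON, LATIN_O)
    return text.encode(encoding)


original = (LATIN_O + GREEK_OMICRON).encode("iso8859_7")

print(original)                                              # two distinct bytes
print(round_trip(original, "iso8859_7") == original)         # distinction survives
print(unified_round_trip(original, "iso8859_7") == original) # distinction is lost
```

With separate code points the bytes come back unchanged; with the simulated unification the second byte silently turns into a Latin o.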


>Our mystery symbol was clearly intended to be typed, however the person transcribing incoming telegrams could just make a new paragraph on receipt of that code, rather than typing a special character. There appears to be no reason to ever put the symbol on paper.

Err, typing the symbol to separate paragraphs instead of adding a new paragraph would save space (and also paper) though...


Or maybe the same keyboard mechanism and layout was also intended to be used on the telegram-sending-device


Didn't telegraphs use paper tape in the beginning? If so, then the only way to signal line/paragraph breaks would be symbolically. Of course, a typewriter wouldn't have needed that, but as TFA shows, it came in handy in some cases anyways.


Didn't telegraphs use paper tape in the beginning?

Not just in the beginning. Almost until the end. The paper tape with the actual messages was cut into strips and pasted onto a telegram form before being put into an envelope and delivered.


All the more reason, then, to have a line-/paragraph-breaking glyph!


No, that's not how it worked. The messages weren't arranged in paragraphs. They were snipped to fit inside the pre-printed box on the telegraph form.

And even very rich people didn't send multi-paragraph telegrams. You paid by the letter. Most telegrams were two sentences or less. Sometimes two words or fewer.


Hence the word "telegraphic":

1. Of, relating to, or transmitted by telegraph.

2. Brief or concise: a telegraphic style of writing.

(My underlining)


Mark Twain famously sent a one-character telegram to his agent: ?

And the agent famously responded: !

The real question was "how's the new book doing?".


I always heard it was Victor Hugo and Les Miserables–indeed it's mentioned ("apocryphal tale") on Hugo's wikipedia page...with a link to this page saying it most probably never happened. Also I came across it as an Oscar Wilde story online just now.

https://quoteinvestigator.com/2014/06/14/exclamation/


Fascinating. Three different apocryphal versions of the same story.

EDIT: Searching, I only find the versions about Wilde and Hugo, so it may be just that I misremembered it.


...it doesn't just open up the Options menu?


No, there were touch gestures for that.


Can somebody please create a Babbage-style infinite scroll interface?


It amazes me how much we still know from, say, 1000 years ago and how little we sometimes know about things from just shy of 150 years ago.


It could also imply we don't really know most of the mundane details and intricacies of past cultures. Tricks and techniques lost to time forever. Like the difference between the many small things you experience when you are traveling somewhere compared to what's left in the pictures.


Not so much "lost to time forever" as "locked away in physical media that makes them hard to access."

We talk a lot about the seemingly infinite storage capabilities we have today, but all it takes is one terrorist attack, or one major SV company to go out of business and suddenly there's a massive hole in our cultural memory that cannot be recovered because everything we have is stored magnetically.


There's also the "never written about because everyone knew" common knowledge problem. It was a boring thing everyone knew, so why would they write about it?


I'm pretty sure it's rather that there are a lot of things in the "we don't know that we don't know" category.


Jon Blow has an excellent talk titled "preventing the collapse of civilization" where he explores this.


I always wonder how long it would take humanity to relearn certain things if some cataclysm happened. And I don't even mean the latest inventions in the semiconductor industry, but things like metal, glass and woodworking. Crafts that took ages to evolve.


We don't know every single detail of the everyday life from 1000 years ago...


True but that was not my point. I meant in documentation and details of everyday activity of certain people. But granted such small details might also be forgotten completely.


Just speculating here, but it could also be plausible that one of Sholes' investors was a biographer, who wanted that kind of symbol on the typewriter. But given the success of the typewriter, by the second version, Sholes abandoned it, because the demand for a slash symbol far outweighed the demand for the dotted line.


In my high school typing classes, 40 years ago, I remember seeing this symbol used to indicate omitted paragraphs ... in the same way that the ... is used to indicate a pause between sentences, the vertical version indicates a pause between paragraphs.

.

.

.

Like this.

But .. I can't seem to find online examples. It's a terrifically difficult character to search for ..


That symbol is called an ellipsis when used to indicate a pause - https://en.wikipedia.org/wiki/Ellipsis

If you search for this term you can find the Unicode characters for both the horizontal and vertical versions.


This is called the "vertical ellipsis", if that helps. It's rarely used but there's some discussion of it in the article.


Great research by the author of this article.


When it comes to keyboards, Marcin Wichary will not leave a single stone unturned.


I remember old terminals showing | as two separate segments with a space in the middle.


The "standard" (who defines the standard?) UK keyboard layout includes both ¦ and |.

Like most European keyboard layouts, there are several symbols available using AltGr, but only ¦ and € are printed on the keys. ¦ is AltGr+`.

¦ is even more useless than Shift+`, which gives ¬. At least I used ¬ when I was studying logic.

https://answers.microsoft.com/en-us/surface/forum/all/why-is...


This had its own discussion a little while ago:

https://news.ycombinator.com/item?id=20627274


Oh interesting, totally missed that one.



Checking my immediate surroundings, it's also like that (separate lines) on the HP keyboard (KU-0316) I'm using. In other words, it's pretty standard. (Amazon has a pic: https://www.amazon.com/Genuine-HP-Hewlett-Packard-KU-0316-Ke... )

But on the Macbook keyboard (US layout, circa 2015 if it matters) it's a solid line.




Spoiler: an early example of the hamburger menu


We call it the "kabob" menu. If it's sideways, it's "ants on a log"


Expanding on that (nice!) idea: could it be an "everything else" symbol? Since a lot of glyphs are missing from the typewriter, this could be some kind of placeholder for missing characters meant to be manually overwritten after typing?


Can someone explain to me why that original source needs "an alternate set of line breaks"?

Why is one set (the vertical bar) not enough?


It seems like the original blog post got it wrong there. See the comment by "godspace"[1] and the link they provided[2]. The | indicates a line break but the ︙ indicates a logo.

[1] http://widespacer.blogspot.com/2016/03/the-lost-key-of-qwert...

[2] https://www.newspapers.com/image/39396241/?fcfToken=eyJhbGci...


I like the idea it could have been used for borders.

    ..............................
    ⋮                             ⋮
    ⋮.............................⋮
Seems plausible.


It's a rather advanced use of typography (making borders) for such a simple machine that didn't even have parentheses. But yeah, it's plausible, and I like the author's theory that it was intended to be a vertical bar, but was replaced with three dots to avoid confusing it with I and 1.


The author suggested the borders idea as well. I didn't come up with it, I just illustrated it.


Clickbait title rescue: It’s the ⁝ - U+205D tricolon character, present on the drawing accompanying the original patent for the QWERTY-layout typewriter. Its actual use is obscure, and still unknown, although theories abound.


I don't think it's clickbait. Sure, you can tell me it's some weird character I've never heard of. But I already knew that from the title! Failing to use the name of something in a title, when basically nobody knows that name, is not a bad thing.


The title could have been “The lost ⁝⃣ Key of QWERTY”. Or even “The Lost ‘vertical dots’ Key of QWERTY”. To needlessly omit crucial information from the headline is clickbait.


Thank you! You saved me a click. Your comment should be at the top of this thread.


From the title I thought this would be about the fact that the QWERTY keyboard has 104 keys vs 105 for the AZERTY keyboard (used in France and other countries in Europe). QWERTY has a really wide SHIFT key at the bottom left, whereas AZERTY has an extra key there.


If you're going to list facts, at least get them right. The version that Europe uses is also called QWERTY. The US uses the ANSI variant of the QWERTY layout while the "European" variant is called ISO. One of the differences between the two versions is indeed the left shift being longer or shorter to accommodate the extra key but it's not the only one.

AZERTY (a variation of the ISO QWERTY, with 105 keys, the big enter, etc) is used in France like you say but nowhere else. The rest of Europe use other layouts within the ISO QWERTY. Here you'll find Spanish, UK, Italian, Norwegian, and many other layouts that move and add the necessary symbols for their languages.

edit: corrected right to left, stupid me. I didn't express myself correctly when talking about AZERTY. It is used in a few other places besides France. What I meant to say is that it is not Europe's standard, as the parent comment seemed to indicate.


I think you mean left shift. The Enter/Return key is also different, though looking at different layouts I have no idea why, as the extra key on that side seems to be present in different places regardless of the actual shape of those keys.


>The rest of Europe use other layouts within the ISO QWERTY.

This is also not exactly true. e.g. in Poland ANSI QWERTY is typically used, with diacritics typed with right Alt. (there are other layouts with diacritics directly available, like Windows "Polish (214)", but nobody uses them)


My keyboard is AYERTZ, but perhaps I just put the keys at the wrong place after cleaning


Did I say no other layouts exist outside of AZERTY and QWERTY? I was pointing out that the assertion that the parent comment made about AZERTY = Europe and QWERTY = US was wrong.


Doesn't Germany use QWERTZ?


QWERTZ largely maps to QWERTY with exception of special characters and Z/Y being swapped. QWERTZ can be bought in ISO or ANSI, though ANSI QWERTZ is hard to get from any sane manufacturer since everyone here likes DIN/ISO.


AZERTY is used in France like you say but nowhere else.

Interesting. I used an AZERTY keyboard when I was in Austria in the 90's. Did I end up with a French keyboard? I just assumed it was the local form.


Belgium also uses AZERTY.


You're right, thanks for pointing that out. I didn't express that thought correctly and I've added a note to my previous comment.


Thanks, it's a bit clearer what you mean with your edit.


105 key qwerty keyboards are very common



