One of my favorite Unicode oddities is in the Cyrillic block. Some books used to use the letter "ꙩ" when writing the word eye (ꙩко). The letter was (from what I can tell) never used for anything else. Because the grammatical dual is a thing, clever people then wrote ꙭчи or Ꙫчи to mean two eyes.
Both of those characters made it into Unicode, as there was some use in historic script. However, even completely absurd variations made their way into Unicode. The "many-eyed seraphim" is written as серафими многоꙮчитїи. So if you need to write something with a lot of eyes, you can use ꙮ.
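(For anyone who wants to poke at these from code: Python's unicodedata module reports the official names straight from the character database. A quick sketch, with the output written in as comments:)

    import unicodedata

    # The "eye" letters mentioned above, with their official names:
    for ch in "ꙩꙪꙭꙮ":
        print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
    # U+A669  ꙩ  CYRILLIC SMALL LETTER MONOCULAR O
    # U+A66A  Ꙫ  CYRILLIC CAPITAL LETTER BINOCULAR O
    # U+A66D  ꙭ  CYRILLIC SMALL LETTER DOUBLE MONOCULAR O
    # U+A66E  ꙮ  CYRILLIC LETTER MULTIOCULAR O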
I googled up the paperwork proposing the inclusion of ꙮ and variants a few years ago and got the impression that a) you're not supposed to include something that only has a single recorded use and b) ꙮ pretty much has a single recorded use.
Interestingly, the Phaistos disk[1] has its own little bit of Unicode, and from what I understand the only known "original" use of all of the symbols is the Phaistos disk itself. I guess there was enough discussion about the disk and its symbols to include it.
The Phaistos disk sort of makes sense to me since, single use or not, it's (probably) some sort of script. I'm not really any kind of expert in ancient scripts or Unicodery, and this was a while back, but the gist I got from the docs was something along the lines of 'no one-time/decorative uses', and the eye thing looked like exactly that. The submission was from an academic in Slavic studies of some sort; I thought about emailing them to ask but couldn't really come up with a way to phrase what amounts to some version of 'did you mess up Unicode or what?' in a non-dickish way.
I guess you could argue that a single discovered extant usage is not single use; it's quite possible that there will one day be a reconstruction of the script (maybe with some more samples being discovered).
You mean the eye thing? That's from Cyrillic, a baby of a script (as scripts go) that's still in wide use today. The o-as-eyes is a decorative flourish, a little visual pun - it's still just an o. An analogous thing would be a medieval Latin parchment of, let's say, the Lord's Prayer that opened with a bigass P with vines, a gargoyle dancing to a cat shredding on the lute, and a tiny caricature of the scribe's dad. If someone found that, we probably wouldn't end up with ʟᴀᴛɪɴ ᴅᴀᴅ ᴊᴏᴋᴇ ᴘ in Unicode.
I think wisty was saying you could make the argument that the glyphs on the Phaistos disk were also used elsewhere, but we just don't have any samples.
Oh, sure, if it's about the Phaistos stuff - it sounds reasonable to have them in Unicode to me but much more importantly, I'm oversimplifying/butchering/misremembering whatever the actual Unicode rules are. You're better off just assuming I'm wrong about their details in important ways.
See, that’s the kind of thing that’s absolutely fascinating. Some scribe thought they were being clever a few centuries ago, and the glyph will now live on forever in our modern equivalent of a collective myth known as “the Unicode standard”, with people discovering new uses for it for generations to come.
It's a nice little doodleglyph, no doubt. I never thought I'd be virtually pointing at an internet stranger and being all 'yes, that's also one of my favourite weird things in Unicode!', though.
> One of my favorite Unicode oddities is in the Cyrillic block. Some books used to use the letter "ꙩ" when writing the word eye (ꙩко).
Interesting. That "ꙩко" looks phonetically (if that's the right word, I'm not well up on linguistics (if that's the right word again, ha ha)) a bit like the Hindi word "aankh" for "eye". The "n" sound in "aankh" is emphasized less, for lack of a better term. Actually, in Hindi it's written as a dot on top of one of the other letters, to indicate that.
Also reminded by this, via George Borrow's novel Lavengro[1] (a story about gypsies), that the gypsy and Hindi words for "nose" are similar, "nak".
One theory is that the gypsies (Roma(ni)[2]) migrated from northwestern parts of India to other parts of the world, such as North Africa and Europe.
Aankh (pronounced almost like aak) and nak, get it? :)
Most modern Indo-European language words for 'eye' share a common origin, this predates the Romani and their migrations by a long stretch. Here's an eye:
> The "n" sound in "aankh" is emphasized less, for lack of a better term.
Sometimes an original nasal consonant will reduce to a vowel that remembers the original consonant only by releasing air through the nose. (Where ordinarily the air would come out of the mouth.)
This is a big thing in Portuguese and French. (At this point, the consonants are long since gone and the nasal vowel is correct Portuguese/French. But the change would have originated in people speaking something closer to Latin, which doesn't use nasal vowels, and being "careless" with their pronunciation.)
Elision of -m and nasalisation and/or lengthening of the preceding vowel were already happening in Classical Latin, including the variety spoken by the ruling elites, and are attested through poetic meter and other sources.
See W. Sidney Allen, Vox Latina, pp. 30–31, on Classical Latin.
Also, the YouTuber ScorpioMartianus has invested some time in training himself to use reconstructed pronunciation and has talked extensively about it; see for example https://youtu.be/psYM-LvBplw
Granted; I spoke much too broadly. According to the classical sources, -m fully disappears when followed by a vowel. (Though same-word intervocalic -m- does not.) Poetic meter backs this claim up robustly.
It's actually a little bit weirder than that; the vowel before -m also disappears. But it's certainly plausible for some nasalization to remain anyway.
> and/or lengthening of the preceding vowel
You're referring to -ns- / -nf-? You're also right there. As far as I'm aware, this doesn't happen for -nd- / -nt-, though.
> Also, the YouTuber ScorpioMartianus has invested some time in training himself to use reconstructed pronunciation and has talked extensively about it
While that sounds like a cool project, I don't think it necessarily has a lot to tell us about the historical pronunciation. I think you could develop a pronunciation system that matched nearly every documented feature of a dead language while failing to match a large number of undocumented features.
> While that sounds like a cool project, I don't think it necessarily has a lot to tell us about the historical pronunciation
You could say the same about pronunciation research published in linguistic journals. Let me use an analogy:
Imagine looking at the source code of a game. It's technically possible for a reader to understand what the program is doing and what the game is about: how it works, its rules and goals.
However, if you pass the source through a compiler (whose behaviour you can also understand well), what you end up with is a game you can run and experience.
Reconstructed pronunciations are a bit like that. You get to "experience" rules that are otherwise coded in an abstract language. Translating those rules into something you can experience requires a lot of effort and expertise. You can in theory become a "compiler" and learn how to do it yourself (aloud or in your head), but it's hard; what's wrong with outsourcing it?
> Imagine looking at the source code of a game. It's technically possible for a reader to understand what the program is doing
This is already well beyond what's possible for a dead language. It's not even possible for living languages, although in that case we can draw empirical conclusions.
I've been interested for a long time in the question of how we can determine how a language divides up the space of possible sounds. For example, English [θ] (the sound at the beginning of "thick") is perceived by Mandarin speakers as being the sound [s] (as in "sick"). It is perceived by Cantonese speakers as being [f] (as in "fickle").
The sounds [s] and [f] are both phonemic in both Mandarin and Cantonese. But something about the phonology of each pushes the sound [θ] into one category or the other. The choice is not arbitrary; it is quite consistent across speakers of each language.
To the best of my knowledge, we have no way to answer the question "how would language X categorize sound Y?" other than experimentation, which is impossible with a dead language. But it is a fact about the language, and in principle the question can be answered solely by looking at the pronunciation of sounds within the language -- in the ordinary course of events, a Chinese speaker would go their entire life without being exposed to the sound [θ], and yet they would largely agree with each other on what the sound was if they did hear it.
I say that this categorization question draws upon rules of pronunciation which we don't presently have a good idea of how to describe or characterize at all.
So I say reenactment of a dead language is an interesting project, but you're inevitably going to make choices that are wildly different from the language as it existed in the past. Pronunciation reconstruction is on much firmer ground -- and it gets there by not addressing most questions. But a reenactment cannot avoid addressing every possibility, and it's going to get most of them wrong.
YMMV. I once watched a short video by an accent coach teaching how to do an Irish accent, a Scottish accent, an Australian accent, etc. He talked about place of articulation and made pretty decent (although clearly not native) approximations of the pronunciations. I found his attempts at actively voicing things out quite helpful. I'm fully aware this is just an approximation, but in a way I found that teacher more effective at conveying what makes a given accent peculiar than just listening to a native speaker would be. It probably all depends on what you're interested in.
I love this. There are lots of characters in Unicode I'd love to know the history of, but searching for such things is hard, at best.
The ones I've stumbled across recently are the "Negative Squared Latin Capital Letters" (U+1F170 - U+1F189 [1]). The 26 letters are there, but "A", "B", "O", and "P" are special. Blood types and a parking symbol. Sure, but why? Why was it decided we needed all of the Latin letters in inverted squares, except those? Why aren't they their own symbol? I'm not complaining here, I just want to know the history.
I'd also love to know the history of the other Latin letter ranges. It feels super odd to me to have, say, the "Mathematical Bold Fraktur" range of characters (U+1D56C - U+1D59F [2]). What's the history of including a font in Unicode? Why did we stop with the few that are in there?
This is probably somewhere online, but I can't find it.
There's a popular belief in Japan that a person's blood type influences their personality[0]. This probably made Japanese emoji designers add them to their pre-unicode emoji sets, which then got rolled into the first set of unicode emoji in 2010[1], along with a bunch of other Japanese symbols.
Some OSes display some Unicode characters as emoji by default. These characters were selected for that for exactly the reason you surmise. There's a text presentation selector (U+FE0E) to enforce the text variation. This is also useful for making (yellow triangle with exclamation point inside) become (single color triangle outline with same-color exclamation point inside).
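A minimal sketch of that in Python, using U+26A0 WARNING SIGN as the triangle (whether a given renderer honors the request is up to the platform):

    warning = "\u26A0"           # WARNING SIGN, a text-default character
    print(warning + "\uFE0E")    # VS15: request text presentation
    print(warning + "\uFE0F")    # VS16: request emoji presentation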
The committee thinks/thought that the Fraktur letters aren’t just a different way to write letters, but symbols with a different meaning, just as ℂ, ℕ, and ℝ aren’t a different way to write C, N and R.
(And yes, they included not only 𝔸𝔹ℂ𝔻𝔼𝔽𝔾ℍ𝕀𝕁𝕂𝕃𝕄ℕ𝕆ℙℚℝ𝕊𝕋𝕌𝕍𝕎𝕏𝕐ℤ, but also 𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫 and 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 in Unicode. In iOS Safari, using the font for entering HN comments, ℂℍℙℚ renders a bit taller and bolder for me. ℝ and ℤ render taller, but not bolder. Why beats me.)
I agree with them on ℂ, ℕ, and ℝ. I also think that makes it hard to disagree with them on the Fraktur letters.
These can result in crossness amongst mathematicians: everyone uses them on the blackboard (they are called blackboard-bold after all) but half use them in printed matter and the other half insist on actual bold. Discussions on the matter lead to cold stares and the poking of fingers in the chest.
No, the positive integers are Z⁺ (Z^+ / Z<sup>+</sup>). N is natural numbers aka nonnegative integers aka positive integers along with (not "including", since it isn't positive) zero.
Most of the Latin subscript glyphs are from the Phonetic Extensions block. They only added those that were actually used in various phonetic notations.
> The committee thinks/thought that the Fraktur letters aren’t just a different way to write letters, but symbols with a different meaning, just as ℂ, ℕ, and ℝ aren’t a different way to write C, N and R.
As to the mathematical uses, this thought is correct. Niche, obviously, but Unicode has no trouble with niche characters.
It's hard to justify the full alphabet on those lines, though.
There's, imo, a reasonable expectation that if some of those characters were used, then others might be later, and it's easier to have them already in the standard than reserving space for them and then having to backfill it, I suppose.
> In iOS Safari, using the font for entering HN comments, ℂℍℙℚ renders a bit taller and bolder for me. ℝ and ℤ render taller, but not bolder. Why beats me.
Font fallback. HN specifies Verdana, Geneva, sans-serif. Geneva has the six characters mentioned. The rest will be rendered using a different font that covers more parts of Unicode, or that covers Unicode math specifically. With the built-in system fonts, that would be Cambria Math on Windows and one of the STIX fonts on macOS.
(Source: Firefox → Inspect Element → Fonts tab of the right Inspector pane. Chrome can do that too in a slightly different place, macOS Safari is too user-hostile to have this feature.)
I guessed that, but still can’t figure out what makes ℂℍℙℚ special. Looking again, I notice ℕ is taller, too, but not bolder. ℂℍℕℙℚℝℤ is much more “I can see why a font would have these, but not others”. That solves the issue enough for me.
(And, by the way, iOS doesn’t have Geneva, but it has a version of Verdana)
Weird that this comes up today. My Steam handle has an O and I used "negative squared Latin" characters -- I was investigating this yesterday (I found no answers!).
Last year my Steam name showed properly on MS Win10 and on Kubuntu. About 6 months ago Win10 started showing it with the O as red. As of last week, the red O doesn't show at all in one part of Steam but shows as white in another part.
I copied the symbols to a Unicode decoder and it showed the O name with a {blood red} modifier, something like that.
I searched briefly but couldn't find the symbol without the red colour. It only shows red in some places: when I pasted it into the search bar in Firefox (on Kubuntu), it showed as a number-square (1Fxxx), but after searching it showed as the red-O character.
In case you enjoy the technical pedantry: UTF-8 is "just" an encoding scheme to represent the characters defined in the Unicode standard. There are other schemes in the standard (like UTF-16 or UTF-32).
Your issue here is with the characters and their representation, not with the specific encoding. Hence, what you wanted to say is: "Unicode is weird" ;)
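To make the distinction concrete, here's one code point under three of the encoding schemes (a small Python sketch; the hex output is the encoded byte sequence):

    # One character (ꙮ, U+A66E), three encoding schemes:
    ch = "\uA66E"
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        print(enc, ch.encode(enc).hex())
    # utf-8      ea99ae
    # utf-16-be  a66e
    # utf-32-be  0000a66e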
That's fair, but why stop there? The example that comes to mind is "Courier 12pt is the only font ever for screenplays". It's required to convey a screenplay, to my mind, like using Fraktur is required in the math space.
Despite what it may seem like, I'm really not trying to mis-parse the reasons here, I'm honestly trying to figure out where the line is, and why it's there.
> why stop there? The example that comes to mind is "Courier 12pt is the only font ever for screenplays". It's required to convey a screenplay, to my mind, like using Fraktur is required in the math space.
The screenplay is still being written in letters. Mathematical ℝ is more accurately thought of as an ideogram than a letter. If you were to write "let r be a member of ℝ", the "ℝ" would be structurally parallel to the full word "member", not to the "r" within it.
Courier for screenplays is a choice you make at the document level; blackboard bold for mathematical entities is not. ℝ is always ℝ no matter what styles apply to your document.
I really don't know, but my guess would be that in mathematics particular symbols denote particular meanings depending on the shape of / decorations on the character. Big g is different from little g which is different from bold g which is different from italic g which is different from Fraktur g. Because Fraktur g is different in meaning from just g, Fraktur gets a spot in the Unicode specs. Courier, while it is standard for screenplays, does not affect the meaning of the text of the screenplay as opposed to the use of another font.
Historically, a few were included in the Letterlike Symbols block¹ because they were present in some pre-Unicode character set. Much later it was argued that since a few were present, they all should be present.
Also note that these selective ones were part of the basic multilingual plane, where space was always a bit at a premium. They were assigned before Unicode expanded to have 17 complete 16-bit planes and space stopped being a problem.
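The resulting "holes" in the later block are easy to see from code; here's a small Python sketch (U+1D549 is the permanently reserved slot where a second double-struck R would otherwise sit):

    import unicodedata

    print(unicodedata.name("\u211D"))      # DOUBLE-STRUCK CAPITAL R (Letterlike Symbols)
    print(unicodedata.name("\U0001D54A"))  # MATHEMATICAL DOUBLE-STRUCK CAPITAL S
    try:
        unicodedata.name("\U0001D549")     # the hole where R would be
    except ValueError:
        print("U+1D549 is permanently reserved")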
You mix the Fraktur and Roman letters in the same mathematical manuscript, the way you mix Greek and Roman letters in the same mathematical manuscript.
Fraktur is more than a font difference, as it has a few ligatures (e.g. tz) that are not found in the Roman alphabet as used in modern German (ß is the only one that made the transition). And these "ligatures" aren't really true ligatures in the sense of, say, the "fi" ligature in some fonts; they are glyphs that are close to being fully fledged letters, just as ö, which is an accented (umlauted) letter in German, is a fully fledged letter in Swedish, or as W became a freestanding letter in English.
The introduction of the Fraktur font introduces different meaning. 'R' in Fraktur would mean something different than R in another font in the same text.
I think you could argue the same for typewriter (monospace serif) fonts. Plenty of texts use them to denote the name of a variable or function in-line, much as we would use backticks to talk about `leftPad` here.
This is actually what started me down this path a while ago. I had to build a full-text search feature over some text from questionable sources, and some users had taken to using the fullwidth characters for emphasis (I think; they clearly had rules in their heads for when they'd use them, but I didn't know what the rules were). There are libraries that can handle the official Unicode normalization rules, but users don't exactly always pay attention to the official rules, so I got to find all sorts of weird little corners of Unicode.
Though, as I understand it, the fullwidth characters are there not for any modern use cases, but for historical reasons to do with round-tripping older character sets.
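For what it's worth, the compatibility normalization that folds fullwidth characters back to ASCII is a one-liner in Python (a sketch; whether you also want NFKC's many other foldings for search is a separate judgment call):

    import unicodedata

    s = "ｆｕｌｌ　ｗｉｄｔｈ"   # fullwidth letters plus an ideographic space
    print(unicodedata.normalize("NFKC", s))   # -> "full width"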
I don't think there's much call for fullwidth Latin characters for that purpose. Ordinary use means typing with whatever your input method gives you. This is generally not fullwidth characters.
A clean grid would be desirable in formal use, but formal use means trying to avoid Latin characters as much as possible. It's generally possible. Plaques and the like are much more likely to say e.g. 二〇二〇年 than to say 2020年.
Grids are not just for formal use, they're useful any time you want to have aligned text, e.g. if you want to write a markdown table mixing Latin and CJK characters.
And I doubt you'd want to eliminate all formal uses of Latin characters. E.g. a plaque about a person would likely want to use their preferred name, which might be in Latin characters.
I think the full adoption of different alphabet styles as independent Unicode glyphs is, overall, a conceptual mistake.
But, note that the identical process, much earlier, is how we got separate capital and lowercase forms. Writing systems never do that when they're developed.
Spacing was used for emphasis in blackletter, and consequently persisted in roman in German even after other means (e.g. italics) became common in other languages. https://de.wikipedia.org/wiki/Sperrsatz
> The ones I've stumbled across recently are the "Negative Squared Latin Capital Letters" (U+1F170 - U+1F189 [1]). The 26 letters are there, but "A", "B", "O", and "P" are special.
In which way are they special? As far as Unicode is concerned they are all the same. It's just that some have an emoji rendering variant, for the blood type reasons. The rendering can be picked with presentation selector characters: http://www.unicode.org/reports/tr51/#def_text_presentation_s...
These all have Emoji_Presentation=No as specified in emoji-data.txt, for precisely this reason (to avoid discrepancies in rendering), but most platforms don't respect these defaults, as it's common to get strings containing emoji from mobile devices, which usually default to emoji presentation. I talk about this a little in my recent "Text layout is a loose hierarchy of segmentation" blog post.
My recollection is for the textual characters that got repurposed as emoji, it’s implementation-defined as to whether the character defaults to text or emoji but that most renderers opt for emoji (which I believe is copying Apple’s decision here). There are variation selectors you can use to force textual or emoji presentation.
That file is new in Unicode 13. Also I think it might be based on Apple's behavior. I just tested every single emoji codepoint listed there, and it's almost entirely consistent with Apple's default rendering. And the very few exceptions (like ) actually seem to depend on whether the text renderer is styled or not, where it becomes emoji in styled fields (like Spotlight) and text in plain-text fields (like this comment box). This seems to only be done for the small handful of pre-existing text symbols that got turned into emoji and given the Emoji_Presentation property.
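If you want to check the defaults yourself, the data is machine-readable. A rough sketch, assuming you've downloaded emoji-data.txt from unicode.org into the working directory (the local path is the only assumption here):

    # Sketch: collect code points whose default is emoji presentation
    # by scanning the Emoji_Presentation lines of emoji-data.txt.
    def load_emoji_presentation(path="emoji-data.txt"):  # assumed local copy
        default_emoji = set()
        for line in open(path, encoding="utf-8"):
            line = line.split("#")[0].strip()      # drop trailing comments
            if not line:
                continue
            rng, prop = (part.strip() for part in line.split(";")[:2])
            if prop != "Emoji_Presentation":
                continue
            lo, _, hi = rng.partition("..")
            default_emoji.update(range(int(lo, 16), int(hi or lo, 16) + 1))
        return default_emoji

    emoji_default = load_emoji_presentation()
    # U+1F170 (negative squared A) defaults to text presentation:
    print(0x1F170 in emoji_default)   # False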
Another interesting character is U+2189 Vulgar Fraction Zero Thirds: ↉
IIRC, it also came from a Japanese code page and was probably used for baseball scores, although the exact origin and usage remains a bit of a mystery.
It is very likely that this symbol was added for baseball. It's not from the Japanese national standard but from a character set for Japanese TV broadcasting called the ARIB charset (1). According to Wikipedia (2), ↉ is actually used for baseball scores. It says that if a pitcher is removed before any batter is put out, he is recorded as having pitched ↉ of an inning.
> nobody could tell what they meant or how they should be pronounced
The stuff of legend in this area is definitely Korean[1]. Almost 95% (or even more) of the characters have never been used, and no one knows how to pronounce them. Even in the first rows of the block, I can find weird characters like 갅, 갌, 갍, 갎.
> Almost 95% (or even more) of the characters have never been used, and no one knows how to pronounce them.
This is absolutely false. KS X 1001, the primary character set for Hangul, contains 2,350 out of the 11,172 (modern) syllables, which is definitely much larger than 5%. And since that wasn't enough (KS X 1001 itself was heavily criticized for this), multiple secondary character sets got into Unicode; Unicode 1.1 had 6,656 arbitrarily ordered characters that were finally replaced by 11,172 neatly ordered characters by 2.0. They are real characters people use; my very old personal research [1] indicates that at least half of them have legitimately appeared in chat logs, for example.
Pronunciation-wise, multiple consonant clusters in the final position indicate conditional pronunciation. "갅", for example, is composed of ㄱ g + ㅏ ah + ㄴ n + ㅈ j, and is normally pronounced "gahn" on its own but "gahn-j" when followed by a vowel (e.g. "갅이" is pronounced "간지" gahn-ji). Phonetically, Korean has only 7 possible codas; other final consonants exist because Korean orthography is a compromise between phoneticism and ideographicism, and some words had to be adjusted to show their morphological roots.
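A side note for programmers: all 11,172 syllables are generated by one fixed formula, which is why Unicode 2.0 could lay them out neatly, and you can take any of them apart mechanically. A Python sketch:

    import unicodedata

    # Hangul syllables are composed algorithmically:
    #   code point = 0xAC00 + (lead*21 + vowel)*28 + trail
    S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
    T_COUNT, N_COUNT = 28, 21 * 28

    def jamo(syllable):
        index = ord(syllable) - S_BASE
        parts = [chr(L_BASE + index // N_COUNT),              # initial consonant
                 chr(V_BASE + (index % N_COUNT) // T_COUNT)]  # medial vowel
        if index % T_COUNT:                                   # optional final
            parts.append(chr(T_BASE + index % T_COUNT))
        return parts

    for j in jamo("갅"):               # same result as NFD decomposition
        print(f"U+{ord(j):04X} {unicodedata.name(j)}")
    # U+1100 HANGUL CHOSEONG KIYEOK        (ㄱ g)
    # U+1161 HANGUL JUNGSEONG A            (ㅏ ah)
    # U+11AC HANGUL JONGSEONG NIEUN-CIEUC  (ㄵ, the n-j cluster)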
More broadly speaking, there are some true facts cited in this post, but the basic premise is horseshit, and the whole of it should be retracted.
As Andrew West has pointed out, these characters were not "made up" as they are real characters attested in Chinese texts. From a Unicode perspective, none of them is a "ghost character" at all.
I disagree with your characterisation as "horseshit" -- although eleven of the characters were subsequently located in other texts, these texts are different from the sources that were supposedly cited when they were input into JIS C 6226 in 1978 [1]. Of the twelve characters cited as "ghost characters", nine of them had claimed sources, but on further investigation were not in the claimed sources or were data entry errors of similar characters in those sources, while three did not have any claimed sources, of which two were later found in other texts (and one of those is believed to be an error), and 彁 has never been located in any text predating its input into JIS C 6226.
The problem is that we don't, and probably never will, be able to know whether it was actually used or not. Chinese characters are very flexible and the phonosemantic construction of 彁 is fully justified; it just isn't attested (before 1978). The term "ghost characters" is merely colloquial and doesn't correctly reflect the reality of the standards.
> Following the general adoption of the JIS standards these characters all made their way into Unicode
This is the exact kind of place lost glyphs would find a home. The UNIHAN effort (which spans tons of sources) is very dedicated to cataloging such characters.
I run a few projects based around UNIHAN, which is geared toward cataloging glyphs that are variants, many of which are archaic and no longer used.
Slightly related: on French keyboards there is a key dedicated to the letter ù (u with a grave accent), which is used in only one word in the French language: "où", which means "where".
("ou" without an accent means "or", sql would be funny in French : SELECTIONNE * DE matable OÙ type='A' OU type='B' )
> For example, 妛 was an error introduced while trying to record "山 over 女". "山 over 女" occurs in the name of a particular place and was thus suitable for inclusion in the JIS standard, but because they couldn't print it as one character yet, 山 and 女 were printed separately, cut out, and pasted onto a sheet of paper, and then copied. When reading the copy, the line where the two little pieces of paper met looked like a stroke and was added to the character by mistake. The original character (𡚴) was not added to JIS or Unicode until much later and doesn't display on most sites for me.
Meanwhile, in a parallel universe where ASCII was invented in Asia and Latin characters were not available until Unicode was established:
> For example, Ä was an error introduced while trying to record A below a dotted line. When reading the copy, the dotted line was added to the character by mistake. The original character (A) was not added to Unicode until much later and doesn't display on most sites for me.
The only spectres still haunting Unicode are U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER, which are not whitespace but are valid ID_Start and ID_Continue characters.
Zero-width whitespace characters are illegal in identifiers, e.g. https://github.com/jagracey/Awesome-Unicode#user-content-var...
The unused glyphs in the post are harmless precisely because they're unused.
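This is easy to check from Python, whose identifiers follow the same UAX #31 properties (a quick sketch):

    # The fillers pass the identifier test; zero-width whitespace doesn't.
    for cp in (0x3164, 0xFFA0, 0x200B):
        print(f"U+{cp:04X} isidentifier={chr(cp).isidentifier()}")
    # U+3164 isidentifier=True    (HANGUL FILLER)
    # U+FFA0 isidentifier=True    (HALFWIDTH HANGUL FILLER)
    # U+200B isidentifier=False   (ZERO WIDTH SPACE)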
Many of the characters in the original Hong Kong extension are for the male and female sex organs as referred to in Cantonese. They were there so the courts could write down exactly what gangsters said to each other.
Another printing mistake gave English the word "dord"[0] (at least temporarily, and from a prescriptivist point of view). While this "word" didn't influence Unicode, it is somewhat analogous to the situation in Japan where there are symbols that are arguably in the language but which are never meaningfully used.
I assume that the situation is even stranger in Japanese because these mistakenly created symbols do not have either an associated definition (even a mistaken one) or a pronunciation.
I thought it was more of an accident? The story I know goes that Church used hats over variables, and then he switched over to lambda prefixes for ease of printing.
I have often thought that if I were to teach a class on functional programming, I would use JavaScript’s fat arrow notation to introduce lambda calculus. Less succinct with all the extra parentheses that occur in function calls but I think it’s easy enough to explain.
Pretty much the same idea, but in math contexts I tend to use the \mapsto arrow (↦), since mathematicians are more familiar with this than lambda calculus. (The fat arrow seems like an interesting alternative in programming contexts, but I worry that it might be confused with implication.)
The function that swaps the arguments to a two-argument function:
f ↦ ((x, y) ↦ f(y, x))
Interesting. The mathematical notation I learned in school was f:x↦2x+3. So I guess this would translate to f:(x,y)↦(y,x).
Alternatively, the f(x,y) = (y,x) notation would work (sort of; you tend to explicitly write out the basis vectors 𝑒ₖ in both notations: f(x,y) = f(x𝑒₁+y𝑒₂) = y𝑒₁+x𝑒₂).
That's close but not quite how I meant for the notation to be interpreted: f is an argument to an anonymous function. Maybe clearer is
swap = f ↦ ((x, y) ↦ f(y, x))
The rule this satisfies is
swap(f)(x, y) = f(y, x)
and the type for swap is
swap : (X × Y → Z) → (Y × X → Z)
for some types/sets X, Y, and Z.
(Also, note that tuples might not be anything like a vector space. For example, for String × [Int], you wouldn't usually write ("hi", [2,3]) as "hi"𝑒₁ + [2,3]𝑒₂ since there's not really a good commutative addition operation for strings or lists.)
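(If it helps, in Python, where lambda plays the role of the ↦ arrow, the whole thing is a two-level lambda. A sketch:)

    # swap = f ↦ ((x, y) ↦ f(y, x)), transcribed directly:
    swap = lambda f: lambda x, y: f(y, x)

    def concat(a, b):
        return a + b

    # The defining rule swap(f)(x, y) == f(y, x) holds:
    assert swap(concat)("hi", "there") == concat("there", "hi")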
HN intentionally removes “emoji-type” characters (I don’t know what the exact criteria are, it’s definitely more complicated than which plane the character is on) from posts. The only time I’ve seen this cause issues is when trying to discuss Unicode itself; presumably the moderators decided this is an acceptable tradeoff.
Automatically silently stripping characters from posts seems like a bad idea. If we want to forbid characters from posts, wouldn't it be way safer to prevent the post and tell the user to remove them?
There are swastikas facing both directions as religious symbols, but not angled Hakenkreuz like the Nazis tended to use. There also isn't a fasces or anything like that, although I feel like that's a less widely recognized symbol (and of course it is used in non-fascist contexts as well).
It's referred to officially as "FARSI SYMBOL", which apparently was a euphemism chosen because during ISO standardization of Unicode the original name "SYMBOL OF IRAN" was deemed unacceptable. As a logo, it wouldn't make it into the standard today, but nobody knows how it originally got in there: