I think this works great for apps like Slack, in user-land so to speak, but it isn't realistic for the Unicode standard, not least because these entry formats are English-only.
Modifiers and combinators aren't exclusive to emojis, but apply to all kinds of glyphs in other languages and writing systems as well. Arabic script even has some common ligatures for common expressions.
A lot of Unicode's complexity simply doesn't stem from emoji; it comes from all the writing systems that Unicode supports. Admittedly, emoji are kind of an oddball addition to Unicode, but they're far from the most complex part of it.
And even in Slack/IM apps, custom emoji codels only “work” because people aren’t often trying to 1. interoperate with external services using, or 2. parse archived logs of, arbitrary message text.
If either of these were common (e.g. slack bots that tried to parse semantic meaning from regular text rather than responding to commands; or Slack logs of OSS communities being public-access on the web) then you’d see a lot of people up-in-arms around the fact that these custom codels are used.
But since text in these group-chat systems is private, ephemeral, and mostly a closed garden, it never bubbles up into becoming an issue anyone else has to deal with.
(Though, on a personal note, I wrote my own VoIP-SMSC-to-Slack forwarder because Slack is a much better SMS client than any of the ones built into VoIP softphone apps, and I’m irritated every day that Slack auto-translates even Unicode-codepoint-encoded emoji from a source postMessage call, into its own codels in the canonical message stream. I don’t want to send my SMS contact “:thumbs_up:”, I want to send them U+1F44D!)
Think of Unicode like HTML. What’s better for interoperation and machine-readability: a custom SGML entity (like you could use up through HTML4); a custom HTML tag; or a normal HTML tag with an id/class attribute that applies custom CSS styling?
One way to encode a ‘custom emoji’ would be encoding it as a variation of some existing emoji. Use an as-yet-unused variation-selector on top of an existing emoji codepoint, and then “render” that codepoint-sequence on receipt by the client back to an image (but in a way where, if you copy-and-paste, you get the codepoint-sequence, not the image. In HTML, you’d use a <span> with inline-block styling, a background-image, and invisible content.) This is pretty much what Slack was doing with the flesh-tone variation-selectors, before Unicode standardized those. But you can do it for more than just “sub”-emoji of a “parent” emoji; you can do it to create “relatives” of an emoji too, as long as it’d be semantically fine in context to potentially discard the variation selector and just render the base emoji.
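A minimal sketch of this in Python, under loud assumptions: which selector is actually "safe" to claim is an open question (VS17, U+E0100, is normally reserved for ideographic variation sequences), and the lookup table and image name are hypothetical.

```python
# Sketch: a "custom emoji" encoded as base emoji + a privately claimed
# variation selector. The selector choice and the table are assumptions.
BASE = "\U0001F600"        # U+1F600 GRINNING FACE
CUSTOM_VS = "\U000E0100"   # privately claimed selector (assumption)
custom = BASE + CUSTOM_VS

# Hypothetical client-side table mapping the sequence back to an image:
CUSTOM_TABLE = {custom: "party-grin.png"}

def render(text: str) -> str:
    # Aware clients swap the sequence for an image; unaware clients
    # ignore the unknown selector and just draw the base emoji.
    for seq, image in CUSTOM_TABLE.items():
        text = text.replace(seq, "[img:" + image + "]")
    return text
```

The key property: copying out of the rendered span yields the two codepoints, not the image, so the codepoint sequence survives round-trips through unaware software.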
Or, if your emoji could be described as a graphical (or more graphical) depiction of an existing character codepoint, you could just use the “as an emoji” variation-selector on that codepoint.
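That selector already exists: VS16 (U+FE0F) is the standardized "emoji presentation" request. For example:

```python
# U+2764 HEAVY BLACK HEART has both a text form and an emoji form;
# appending VS16 (U+FE0F) requests the emoji presentation.
text_heart  = "\u2764"          # text presentation
emoji_heart = "\u2764\uFE0F"    # same character, emoji presentation
```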
Or, rather than a variation-selector, if you have a whole range of “things to combine with” (i.e. the possibilities are N^2), you could come up with your own private emoji combining character for use with existing base characters. The “grinning cat” emoji U+1F638 could totally have been (IMHO should have been) just a novel “face on a cat head” combining-character codepoint, tacked onto the regular “grinning face” emoji codepoint. Then you could have one such combining-character for any “head” you like! (And this would also have finally allowed clients to explicitly encode “face floating in the void” emoji vs. “face on a solid featureless sphere” emoji, where currently OSes decide this feature arbitrarily based on the design language of their emoji font.)
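As a sketch of what that could look like, here the hypothetical “face on a cat head” combiner is stood in for by a private-use codepoint (nothing like this is actually standardized, so the codepoint is a pure assumption):

```python
# Hypothetical combiner: "render the preceding face on a cat head".
# U+F0000 is a private-use codepoint chosen arbitrarily for this sketch.
CAT_HEAD = "\U000F0000"
GRINNING = "\U0001F600"     # U+1F600 GRINNING FACE

cat_grinning = GRINNING + CAT_HEAD   # would render like the grinning-cat emoji

def strip_combiner(seq: str) -> str:
    # The scheme only works if discarding the combiner is semantically
    # acceptable: an unaware renderer falls back to the plain face.
    return seq.replace(CAT_HEAD, "")
```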
And, I guess, if all else fails, you could do what Unicode did for flags (ahem, “region selectors”), and reserve some private-use space for an alphabet of combiner-characters to spell out your emoji in. That way, it’s at least clear to the program, at the text-parsing level, that all those codepoints make up one semantic glyph, and that they are “some kinda emoji.” Custom-emoji-aware programs (like your own client) could look up which one in a table of some kind; while unaware programs would just render a single unknown-glyph glyph.
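A sketch of that “spell it out” scheme, analogous to regional-indicator letters for flags; the private-use base U+F0000 and the lowercase-only alphabet are assumptions of this sketch:

```python
# Map 'a'..'z' into a private-use "combiner alphabet" so a custom emoji's
# name can be spelled out as a run of codepoints. PUA base is arbitrary.
PUA_BASE = 0xF0000

def encode_name(name: str) -> str:
    # Only handles lowercase ASCII names in this sketch.
    return "".join(chr(PUA_BASE + ord(c) - ord("a")) for c in name)

def decode_name(seq: str) -> str:
    return "".join(chr(ord(c) - PUA_BASE + ord("a")) for c in seq)
```

An aware client decodes the run and looks the name up in its table; an unaware one renders a run of unknown-glyph boxes, but at least (if the range is documented) can tell the run is one semantic unit.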
I don’t suggest this approach, though—and there’s a reason the Unicode standards body hasn’t already added it: it’d be much better to just take your set of emoji that you’re about to have millions of people using (and thus millions of archivable text documents containing!) and just send them to the Unicode standards body for inclusion as codepoints. Reserving emoji codepoints is very quick, because the Unicode folks know that the alternative is vendors getting impatient and doing their own proprietary thing. Sure, OSes won’t catch up and add your codepoint to their emoji fonts for a while—but the goal isn’t to have a default rendering for that character, the point is to encode your emoji using the “correct” codepoint, such that text-renderers 100 years from now will be able to know what it was.
So, please, just get your novel emoji registered, then polyfill your client renderer to display them until OSes catch up. Ensure your glyph is getting sent over the network, and copy-pasted into other apps, as the new Unicode codepoint. Those documents will be correct, even if the character doesn’t render as anything right now; if the OS manufacturers think the character is common (i.e. if it ever gets used in text on the web or in mobile chat apps), they’ll provide a glyph for it soon enough. And, even if the OS makers never bother, and you’re stuck polyfilling those codepoints forever, there’ll still be entries in the Unicode standard describing the registered codepoints, for any future Internet archaeologists trying to figure out what the heck the texts in your app were trying to communicate, and for any future engineers trying to build a compatible renderer. (Consider what Pidgin’s developers went through to render ICQ/AIM emoji codels. You don’t want to put engineers through that.)
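The polyfill side is simple; here's a minimal sketch, where the mapping is hypothetical (U+1FAF8 stands in for “a recently registered codepoint your local fonts don't cover yet”):

```python
# Client-side polyfill sketch: substitute images for newly registered
# codepoints until OS emoji fonts catch up. The mapping is hypothetical.
POLYFILL = {
    "\U0001FAF8": "rightwards-pushing-hand.png",
}

def render_with_polyfill(text: str) -> str:
    # Only the on-screen rendering is substituted; the canonical message
    # stream (and anything copy-pasted) keeps the real codepoint.
    return "".join(
        "[img:%s]" % POLYFILL[ch] if ch in POLYFILL else ch
        for ch in text
    )
```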
> A lot of complexity simply doesn't stem from emoji in Unicode, a lot of the complexity comes from all the writing systems that Unicode supports.
Yes, it's not that emoji are doing anything odd compared to lots of real-world languages; it's that emoji are just Latin-script writers' "first"/"only"/"most likely" interaction with that sort of stuff. The fascinating bit is that if it weren't for emoji, a lot of these problems would still go unfixed in a lot of real languages; but because emoji are fun and everyone wants to use them, we've seen a lot of Unicode fixes brought about by emoji, a rising tide that lifts the other Unicode boats.