There are a little over 256 unicode Combining Marks that have a 2-byte UTF-8 enc...

junon · 2025-09-20T11:32:56 1758367976

A demonstration as a comment on the gist would probably work! I'd love to see that

Retr0id · 2025-09-20T11:55:17 1758369317

Good point, added

junon · 2025-09-20T12:10:54 1758370254

Interesting, I actually expected it to encode a single letter with infinitely long combining marks such that 'highlighting' it was just highlighting one character.

Retr0id · 2025-09-20T12:24:09 1758371049

You can do that too, if you increase the STACK_HEIGHT constant (btw, the decoder still works the same, so changing this doesn't break compatibility)

junon · 2025-09-20T13:28:16 1758374896

Oh neat! Thanks :)

all2 · 2025-09-20T19:55:07 1758398107

Most of the characters appear as boxes on my phone.

Retr0id · 2025-09-20T19:56:11 1758398171

That's curious, because the only character is just the letter A. But I suppose if the font doesn't support a particular combining mark, it gives up on the whole grapheme?

Dylan16807 · 2025-09-20T17:19:20 1758388760

HN filters some combining characters? That's weird, compared to the symbol/emoji blocking.

Also I'm reminded that the unicode normalization annex suggests that legitimate grapheme clusters will be 31 code points or less. "The value of 30 is chosen to be significantly beyond what is required for any linguistic or technical usage."

Retr0id · 2025-09-20T17:25:45 1758389145

If I had to guess, they probably filtered the ones that could be used to break page layouts by creating very-tall glyphs.

Dylan16807 · 2025-09-20T17:29:00 1758389340

I guess that's one way to do it. Pretty far from ideal though.

RGamma · 2025-09-20T16:30:27 1758385827

Are you sure this doesn't summon The One by accident?