I hate this argument every time I see it because it's invariably used in the wro...

carapace · on March 12, 2017

> It's a completely solved problem, theoretically speaking.

lol.

Unicode was ambitious for its time, but naive. Today we know better. It "jumped the shark" when the pizza slice showed up and has only been getting stupider since. Eventually it will go the way of XML (yes, I know XML hasn't gone anywhere, shut up) and we will be using some JSON hottness (forgive the labored metaphor please!) that probably consist of a wad of per-language standards and ML/AI/NLP stuff, etc.. blah blah hand-wave.)

Unicode just sucks.

Manishearth · on March 12, 2017

Again.

Yes, "it jumped the shark when the pizza slice showed up". However, that doesn't imply that it did everything wrong. The notion of multi-codepoint characters is necessary to handle other languages. that is a solved problem, it's just that programmers mess up when dealing with it. Emoji may be a mistake, but the underlying "problems" caused by emoji existed anyway, and they're not really problems, just programmers being stupid.

We had multiple per-language encodings. It sucked.

carapace · on March 13, 2017

I don't agree that the notion of multi-codepoint characters is necessary, I don't think it was a good idea at all. I submit [1] as evidence.

Whatever this mess is, it's a whole thing that isn't a byte-stream and it isn't "characters" and it isn't human language. Burn it with fire and let's do something else.

[1] http://stackoverflow.com/documentation/unicode/6485/characte...

(In reality I am slightly less hard-core, I see some value in Unicode. And I really like Z̨͖̱̟̺̈̒̌̿̔̐̚̕͟͡a̵̭͕͔̬̞̞͚̘͗̀̋̉̋̈̓̏͟͞l̸̛̬̝͎̖̏̊̈́̆̂̓̀̚͢͡ǵ̝̠̰̰̙̘̰̪̏̋̓̉͝o̲̺̹̮̞̓̄̈́͂͑͡ T̜̤͖̖̣̽̓͋̑̕͢͢e̻̝͎̳̖͓̤̎̂͊̀͋̓̽̕͞x̴̛̝͎͔̜͇̾̅͊́̔̀̕t̸̺̥̯͇̯̄͂͆̌̀͞ it is an obvious win.). Even when it doesn't quite work... (I think I'm back to "fuck Unicode" now.)

Manishearth · on March 14, 2017

I submit Hangul and all the Indic scripts as counterevidence. Not all scripts can easily avoid multicodepoint chars.