Character encoding is in a special class of problems. Like time handling. If you...

Character encoding is in a special class of problems. Like time handling.

If you pick up a halfway non-ancient framework in a somewhat common language with a somewhat non-terrible persistence like postgres, you just don't have problems. Just don't care, and it just works.

But it's super easy to derail that fragile correctness with something like MySQLs utf8-ish handling, or some OS's path handling, or 'efficiency', or a user or frontend dev submitting data in a wrong encoding. And then it gets mangled. And then the user is unhappy.

At that point, it becomes very hard to argue why one of the two things is wrong, and the other is not. While the user argues the other way around. Because both look correct, if you look from the right angle. And the only reason why I am right is because of some standard, while the customer is right because of money.

And yes, it is very 'surprising' why our software now functions correctly for russian or greek customers.