Caterina Fake, co-founder of Flickr, famously had issues with IT systems: Tim: T...

BurningFrog · 2025-05-02T16:02:56 1746201776

People named Null are also having struggles in the modern world:

https://www.bbc.com/future/article/20160325-the-names-that-b...

MiddleEndian · 2025-05-02T17:26:26 1746206786

lol so much data gets converted into strings at some point when passed around. Definitely encountered systems where you have to check for both null and "null"
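A minimal sketch of that double check, assuming a hypothetical upstream that stringifies missing values as "null". The function name `parse_legacy_field` and the `is_name_field` flag are made up for illustration; the point is only that the string should not be folded into a real null on fields where "Null" can be legitimate data:

```python
from typing import Optional


def parse_legacy_field(raw: Optional[str], is_name_field: bool) -> Optional[str]:
    """Decode one value from a (hypothetical) feed that may send "null" as text."""
    if raw is None:
        # A genuine missing value.
        return None
    # Only treat the literal text as missing where it cannot be real data.
    if not is_name_field and raw.strip().lower() == "null":
        return None
    # On name fields, "Null" is kept as-is: it may be someone's surname.
    return raw
```

With this, `parse_legacy_field("null", is_name_field=False)` is treated as missing, while `parse_legacy_field("Null", is_name_field=True)` survives as a string.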

adolph · 2025-05-02T16:07:27 1746202047

This seems like a good spot for the link to @patio11's "Falsehoods Programmers Believe About Names"

  So, as a public service, I’m going to list assumptions your systems probably 
  make about names.  All of these assumptions are wrong.  Try to make less of 
  them next time you write a system which touches names.

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

msla · 2025-05-02T20:03:40 1746216220

I get what he's doing, but some of these are not actionable:

> People’s names are all mapped in Unicode code points.

So... what? What do I do with this? My program has to use something to represent text, and since I fail to be a large multinational consortium, I can't invent my own character set and expect it to work.

Also:

> Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.

This is pretty much true in countries with naming laws, yes.

> People have names.

People in a database will have certain records which will not be NULL. Whether you call one of those records a 'name' outside the context of that database really isn't my concern.

zzo38computer · 2025-05-02T23:14:12 1746227652

Unicode is not the only character set (or the best one); this is a falsehood programmers believe about character sets (I wrote such a list too, but I do not remember whether I published it). However, that is not the most severe issue compared with the other things mentioned, such as people not having names (or there being multiple ways to enter them, or people sometimes changing their name, or having the same name as other people, etc.).

msla · 2025-05-04T23:30:01 1746401401

> Unicode is not the only character set (or the best one); this is a falsehood programmers believe about character sets

Unicode is the best if I want to communicate with other people. I lived through the 1990s; you won't convince me that playing "guess the encoding" with dozens of subtly-incompatible standards (and non-standards, and almost-standards) was a good time, or that having to override a web browser's helpful guess was fun.
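To illustrate why "guess the encoding" was so unreliable, here is a rough sketch that tries a few candidate encodings on the same bytes. The helper name `plausible_decodings` and the candidate list are my own choices for the example, not anything a browser actually does:

```python
from typing import Dict, Tuple


def plausible_decodings(
    data: bytes,
    candidates: Tuple[str, ...] = ("utf-8", "shift_jis", "cp1252"),
) -> Dict[str, str]:
    """Return every candidate encoding under which the bytes decode without error."""
    results: Dict[str, str] = {}
    for encoding in candidates:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding; skip it.
            continue
    return results


# The same bytes are "valid" in more than one legacy encoding,
# so decoding without error proves nothing about which one was meant.
raw = "名前".encode("shift_jis")
guesses = plausible_decodings(raw)
```

Here the bytes are rejected as UTF-8 but accepted by both Shift_JIS and cp1252, and only one of those results is the intended text.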

jval43 · 2025-05-03T01:18:16 1746235096

>So... what? What do I do with this?

Try to understand these issues, or rather how they could affect your business processes and software implementations down the line, rather than dismissing them on a technical level.

You can store the Unicode representation just as you normally would. But what you don't do is assume that your Unicode representation is the only representation of the actual name.

More concretely, there are names that have multiple equally valid ways of writing them. You can probably expect that usually the same one is used, but you should absolutely not require this when building your business processes.

Even more concretely, as an example there are transliteration or simplification / shortening rules that allow people with otherwise strange or long names to buy an airline ticket. The actual, real name may not be any of the ones you have in your system. This matters e.g. when searching for someone or in customer support.
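As a rough sketch of what "not the only representation" could look like in code: store the preferred spelling plus any known alternates, and search against all of them with a loose key. `NameRecord` and `fold` are hypothetical names, and the folding rule here (drop combining marks, casefold) is just one assumption; real transliteration rules vary by language and by airline.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Set


def fold(text: str) -> str:
    """Loose search key: decompose, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


@dataclass
class NameRecord:
    preferred: str                                      # spelling the person gave us
    alternates: Set[str] = field(default_factory=set)   # e.g. ticket or passport spellings

    def matches(self, query: str) -> bool:
        # Compare the query against every known spelling, not just the preferred one.
        keys = {fold(self.preferred)} | {fold(alt) for alt in self.alternates}
        return fold(query) in keys


record = NameRecord("Müller", alternates={"Mueller"})
```

A support search for "MUELLER" or "muller" would then find this record, while the stored preferred spelling stays untouched.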

As for people without names (or unknown names), you should probably recognize that the handling might differ by country. E.g. records with "John Doe" in the US might have to be handled differently: analogous to "NULL != NULL" in SQL, John Doe != John Doe. Or maybe even "Jane Doe == John Doe" in some cases. See also "FNU LNU" (First Name Unknown, Last Name Unknown) used in the US.

And although I don't have knowledge about all the countries in the world, it may very well be that this leads to situations where the "no name" has to be handled specially or at least understood to be a special case, completely differently from other cases.
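One way to sketch the SQL analogy is a three-valued comparison, where placeholder names compare as "unknown" rather than equal. The placeholder set below is illustrative only and is an assumption on my part; an actual list would depend on the country and the data source.

```python
from typing import Optional

# Illustrative placeholders only; not a complete or authoritative list.
PLACEHOLDER_NAMES = {"john doe", "jane doe"}


def is_placeholder(name: Optional[str]) -> bool:
    """True for a missing name or a known stand-in for an unknown person."""
    if name is None:
        return True
    normalized = " ".join(name.split()).casefold()
    return normalized in PLACEHOLDER_NAMES


def names_match(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Return True/False for real names, or None ("unknown") if either is a placeholder."""
    if is_placeholder(a) or is_placeholder(b):
        # Like NULL = NULL in SQL: neither true nor false.
        return None
    return a == b
```

Callers then have to decide explicitly what "unknown" means for them (merge, keep separate, escalate to a human) instead of silently merging two John Doe records.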

varun_ch · 2025-05-02T22:05:21 1746223521

> Try to make less of them

lmm · 2025-05-03T00:59:44 1746233984

> So... what? What do I do with this? My program has to use something to represent text, and since I fail to be a large multinational consortium, I can't invent my own character set and expect it to work.

Maybe don't rush to remove your "legacy" encoding support because "everyone is using UTF-8"? Or at least check with some Japanese users with obscure names first.

fuzzer371 · 2025-05-02T17:02:32 1746205352

See, the issue is a lot of people have stupid names.