Oh, my point was that two kinds of names should not be used interchangeably. To ...

readme · on June 19, 2013

I think the act of storing both names is bad, because you double the amount of data that could possibly become wrong.

With lower(), we can expect to get the same transformation of string A each time. If instead we store string A, and then store string B as a copy of A.lower()... A.lower() will always be A.lower(), but it's much easier for someone to come along, screw with the database, and change B.

timv · on June 19, 2013

I'm not sure how they can avoid storing both.

They need to store the verbatim username in order to know how to display the username in the UI.

They need to store the canonical username in order to efficiently know whether a given canonical username is in use.

brokenparser · on June 19, 2013

Not necessarily, in PostgreSQL you could simply add a canonicalised index.
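A minimal sketch of the idea, using SQLite through Python's sqlite3 module only because it runs without a server (the table name, index name, and register() helper are made up for illustration). In PostgreSQL the equivalent would be a unique index on the expression lower(username).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Only the verbatim username is stored as a column.
conn.execute("CREATE TABLE users (username TEXT NOT NULL)")

# The canonical form lives only in the index, computed by the database.
conn.execute(
    "CREATE UNIQUE INDEX users_canonical_idx ON users (lower(username))"
)


def register(name):
    """Return True if stored, False if the canonical form is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


first = register("Alice")   # new canonical name
second = register("ALICE")  # same canonical name, rejected by the index
stored = [row[0] for row in conn.execute("SELECT username FROM users")]
```

The verbatim name is still there for display, and there is no second column for anyone to edit by hand.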

timv · on June 19, 2013

Well, in that case you're still storing it, you're just letting the db store it for you.

But when the issue here is the reliability of the implementation of the canonicalisation function, having it done once in Python and then again by PG is going to be a huge issue.
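To illustrate how two canonicalisation functions can disagree, here is a small sketch that compares two of Python's own case-normalising methods as stand-ins for "the app" and "the database". It does not reproduce PostgreSQL's actual lower(), whose behaviour depends on the locale and collation in use; canonical_a and canonical_b are hypothetical names.

```python
def canonical_a(name):
    """Stand-in for one implementation: simple lowercasing."""
    return name.lower()


def canonical_b(name):
    """Stand-in for a second implementation: Unicode case folding."""
    return name.casefold()


names = ["Alice", "STRASSE", "Straße"]

# Names for which the two implementations produce different canonical forms.
disagreements = [n for n in names if canonical_a(n) != canonical_b(n)]

# Whether each implementation treats these two names as the same user.
same_under_a = canonical_a("STRASSE") == canonical_a("Straße")
same_under_b = canonical_b("STRASSE") == canonical_b("Straße")
```

If one layer thinks two names collide and the other does not, the uniqueness check and the lookup can give different answers for the same input.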

a-nikolaev · on June 19, 2013

Yeah, I see. I'm not a web developer, so maybe that is why I did not think of this way to break the data.

Well, I still think that it is better to have two (hopefully) correct fields in the database, rather than only one. (Consistency of the two fields can be checked once in a while).
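Something like the following could serve as that periodic check. It is only a sketch: the users table, its two columns, and find_inconsistent() are assumptions for the example, and str.lower stands in for whatever canonicalisation the site really uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, canonical TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("Alice", "alice"),
        ("Bob", "bob"),
        ("Carol", "karol"),  # canonical field edited by hand, now wrong
    ],
)


def find_inconsistent(conn, canonicalise=str.lower):
    """Return (username, canonical) rows whose stored canonical form is stale."""
    rows = conn.execute(
        "SELECT username, canonical FROM users ORDER BY username"
    )
    return [(u, c) for u, c in rows if canonicalise(u) != c]


bad = find_inconsistent(conn)
```

Running the comparison in the application, rather than in SQL, means the check uses the same function that wrote the canonical field in the first place.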