Let’s talk about usernames

flurdy · on Feb 12, 2018

> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

Sorry, what? That seems pretty unnecessary. A third party system to dictate how a third party system handles its local alias system for emails? I can't see any benefit to that.

Whether a mail server handles '+' in a standard way is not guaranteed, and surely it is up to the user how they use that feature if enabled.

lentil_soup · on Feb 12, 2018

Absolutely this. It's not even a standards issue, for the outside world they're completely different email addresses. The fact that they happen to land on the same inbox for some providers is unrelated, it's just how that particular provider decided to handle all your different email accounts.

dalore · on Feb 15, 2018

Yes! And if you have your own domain, you can just set up a *@example.com to be your one email address. And then you can give a different email address out anyway even without needing to go +. As you said, each system can treat it differently.

sosilkj · on Feb 12, 2018

Agreed.

From RFC 2822: "The local-part portion is a domain dependent string."

Headaches await you if your code is making a lot of assumptions about how the originating domain manages that local-part portion.

oakesm9 · on Feb 12, 2018

Apple IDs make assumptions (they strip out the + and everything after it) and it's made it a nightmare to use iTunes Connect. Their multiple account support is still terrible so it's easiest to have an Apple ID per iTunes Connect account. Now I need to set up actual aliases in my GSuite account...

yxhuvud · on Feb 12, 2018

Stripping out things past the + will also break things for users that intentionally are creating more than one signup. This seems like a really bad idea.

jelder · on Feb 13, 2018

Agree. You're going to regret doing this when you try to test it yourself, with your own Gmail (or GSuite) account.

LambdaComplex · on Feb 12, 2018

That's the entire point, no?

hueving · on Feb 12, 2018

Why do you want to detect the difference between a user signing up with one email account twice or two different accounts? It's folly to try to detect this in this manner and you will only piss off regular users.

chii · on Feb 13, 2018

The services that strip the +'s are trying to prevent users from using their (usually free) service in a way that costs them more money but never converts to profit.

I understand why they do it, but I can't condone it, of course.

The service should just allow it, and make sure it charges the appropriate subscription fee.

rootkea · on Feb 13, 2018

> The services that strip the +'s are trying to prevent users from using their (usually free) service in a way that costs them more money but never converts to profit.

I can always create multiple email accounts, you know. So I don't see how spending development efforts to parse and detect + and . is effective.

shkkmo · on Feb 12, 2018

It's a pointless and stupid point.

It is extremely easy to set up extra email accounts, so preventing people from doing so with slightly less work is pointless.

It is stupid because it could prevent someone from using their real, valid email address because it matches another, different, valid email address.

detaro · on Feb 12, 2018

Agreed. The + thing probably is "only" unfriendly to powerusers, but stripping dots can easily get in the way. Several of my e-mail addresses have "duplicates" that just differ in a dot and are owned by entirely different people.

kevin_b_er · on Feb 12, 2018

> so it’s impossible without doing DNS lookups to figure out whether someone’s mail provider actually thinks johndoe and john.doe are distinct.

The author has announced that they believe them to be intentionally indistinct and has announced an intent to break handling for any mail servers that consider them unique. All on account of gmail ignoring them?

This, coupled with the author's intent around +, makes me loathe this behavior.

Glyptodon · on Feb 12, 2018

I was with him most of the way, but the advice about emails seems very user hostile to me.

adrianratnapala · on Feb 12, 2018

It is worse than unnecessary. If people start irrationally doing things like recognizing the '+', that creates facts on the ground which are rational reasons for others to have to recognize the '+' -- except nobody really knows why anyone is doing it or what semantics anyone else is using. Thus the whole thing evolves into a mess.

So please don't do that. Respect the bytes that be!

pfooti · on Feb 12, 2018

yeah, I actually use google's plus-email features to help me generate test accounts on some sites or to attribute mail spam to other sites.

dharmab · on Feb 12, 2018

If you don't do this your system may allow users to sign up for multiple accounts and double-dip on signup benefits, free trials and other "one per account" features. (Barring additional controls, of course).

Coryodaniel · on Feb 12, 2018

You’re gonna be sad to know that mailinator[1] and its 100s of domain aliases exist.

You’re fighting the wrong battle.

[1] https://www.mailinator.com/

jcomis · on Feb 12, 2018

Some places check for these types of services though

zingmars · on Feb 12, 2018

They do, but they're not very effective. They might have an up-to-date list of domains belonging to the big services (i.e. Mailinator), but you can always just google "Disposable e-mail", open the 10th page and pick the first one, and it's pretty much always guaranteed to work.

jstarfish · on Feb 12, 2018

I beg to differ. I use this blacklist and not much gets by. Anything that does, I blacklist manually.

https://github.com/martenson/disposable-email-domains/blob/m...

A lot of sites I've come across lately are going back to the old-school way of only whitelisting email addresses from .edu domains or ISP accounts ("@comcast.net").

antihero · on Feb 12, 2018

That's an absolutely ridiculous idea and a good way to destroy the amount of users you're getting. Do you whitelist Gmail? Then it's piss simple to sign up to multiple accounts. Do you not? Good luck getting a massive portion of the internet to sign up...

jstarfish · on Feb 13, 2018

I don't want the sort of users who are signing up to perform financial transactions using services like mailinator and guerrillamail. It's always fraudulent.

Gmail itself is allowed, but the addresses are normalized. I'm well aware of its potential for abuse.
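
A minimal sketch of what domain-scoped normalization along these lines might look like, assuming it is applied only to domains known to ignore dots and plus suffixes. The helper name `uniqueness_key` and the contents of `GMAIL_LIKE_DOMAINS` are illustrative assumptions, not what any particular site does. The key is meant for the uniqueness check only; the address as entered is still what gets stored and mailed.

```python
# Hypothetical helper: builds a key for uniqueness checks only.
# GMAIL_LIKE_DOMAINS is an assumption for illustration, not a complete list.
GMAIL_LIKE_DOMAINS = {"gmail.com", "googlemail.com"}


def uniqueness_key(address: str) -> str:
    """Return a comparison key; never use it as the delivery address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()  # domain names are case-insensitive
    if domain in GMAIL_LIKE_DOMAINS:
        local = local.split("+", 1)[0]  # drop the "+tag" suffix
        local = local.replace(".", "")  # Gmail ignores dots
        local = local.lower()           # Gmail ignores case
    return f"{local}@{domain}"
```

Addresses on any other domain pass through with only the domain lowercased, so custom domains hosted on Gmail and catch-all setups are not caught by a check like this.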

Coryodaniel · on Feb 13, 2018

I have infinity. I use Gmail for business with a catchall rule that forwards to the one address in my domain.

I have an email address for every site I sign up for.

Helps you figure out who sells your email.

How do you handle this? Or do you not?

I guess you could do a DNS lookup of the mail exchange records and see if it points back to gmail and compile a list of domains to allow one account from... but then that would break many companies emails. That’s no fun.

Whitestrake · on Feb 15, 2018

I use Fastmail in exactly the same manner (catch-all, I use "yourdomain.example@mydomain.example" as the email address for signups). Also, anyone could put a cheap domain name on Migadu with a catch-all for this purpose.

Services need a better way to figure out fraudulent or abusive behaviour than guessing based on the email account's domain name.

TheDong · on Feb 13, 2018

I think what the parent meant was that creating multiple gmail accounts is also piss-simple.

I personally own over 100 from a couple of years back, when reCAPTCHA was easily automatable and the phone-number requirement wasn't there.

WhyNotHugo · on Feb 13, 2018

> only whitelisting email addresses from .edu domains or ISP accounts ("@comcast.net").

I haven't seen ISPs give out email addresses like that in years (I know my last couple of ISPs have no such thing).

So you're basically limiting yourself to people from universities, or who've had an old-school ISP for a while.

dragonwriter · on Feb 13, 2018

I've yet to see a fixed ISP that doesn't give out email addresses; though I very rarely meet anybody who used them as anything other than a fallback for their webmail addresses.

Mobile carriers even still seem to, though they are optional and require an additional setup step.

aeruder · on Feb 12, 2018

Or they could just sign up at one of the numerous free email providers with a different username. Stripping the + suffixes is only providing one thing - pain for the users that want to use it.

reaperducer · on Feb 12, 2018

This is true. I know someone who says she's been doing the Hulu free month thing for a couple of years, and does it for other services as well.

It's all too much for me to keep track of, but for some people it's no big deal to create new e-mail addresses every month.

slhck · on Feb 12, 2018

The author suggests stripping the part before doing the uniqueness check. This does not mean that the username (email address in this case) would not be allowed.

bdhess · on Feb 12, 2018

It would for the second user whose address resolves to the same result from the “uniqueness” check.

coredog64 · on Feb 12, 2018

I wish this would happen. There's a "rocket surgeon" on the East Coast who has (tried) to sign up for Facebook, Twitter, Steam, and a bunch of sleazy 'message gurlz now' apps using my Gmail address without the period.

Obviously it never works, as I get the "I see you're trying to create a new account" email, but one of these days he's going to figure out a way to take over one of those accounts and then I'll really be fked.

bdhess · on Feb 12, 2018

I don’t understand your complaint. Google resolves the addresses john.smith@gmail.com and johnsmith@gmail.com to the same account, which you control.

(1.) What are you imagining is the attack vector exactly?

(2.) Are you asserting that all website owners should build to Google’s (non-standard) behavior?

reaperducer · on Feb 12, 2018

I have a similar problem with someone who keeps (hopefully accidentally) putting my landline number into Facebook, then I get a call with a recording asking me to press some number to verify my Facebook account.

joshmanders · on Feb 12, 2018

I'm curious as to who would use a service someone else is using, sharing their email but with a local addition?

Can me and my wife both sign up to HN and use my email but hers be josh+swife@joshmanders.com and mine be josh@joshmanders.com?

That's a strange usecase, isn't it?

pinkythepig · on Feb 12, 2018

gmail makes josh@... and josh+swife@... resolve to the same thing, but there is no guarantee that all other email services behave like that. For all you know, there is an email service that lets you register an email like that, so you have 2 users now whose email is: john+smith@... and john+brown@...

sodapopcan · on Feb 12, 2018

It's way easier to write a script to generate thousands of variations on the same email address than to sign up for a thousand different accounts. I've actually been bitten by this bug before... or rather, my company was bitten by an affiliate who neglected to sanitize their emails this way and someone was able to create thousands of gift cards in our system.

Having said that, in development, it's super nice to be able to create addresses with +'s in them.

pzxc · on Feb 12, 2018

What you say is not untrue, but it's still bad advice to do it -- a security red herring. First of all, you don't know that 100% of mail servers ignore characters after the +, so you can't safely strip those characters or you might not end up with a usable email address. That goes double for stripping the dots/periods, which gmail ignores but many other mail servers do not.

On top of that, it's just as easy to set up a catchall email address -- an email box that accepts all mail for a domain, literally anything@mydomain.com. So a malicious actor could sidestep this security attempt with minimal effort, but it still inconveniences legitimate users despite being worthless from a security perspective.

sodapopcan · on Feb 12, 2018

True, true. As I mentioned below, in my case it wasn't even usernames, just entering your email for a free gift card. The attacker actually used dots with a gmail address.

shkkmo · on Feb 12, 2018

There are soooo many ways to easily game the email side of it that you would be better off using other means of detecting uniqueness (rate limit per IP address, rate limit per hash of IP address and user-agent)
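
A rough sketch of the second idea, a signup limit keyed on a hash of IP address and user agent. The class name, the default limits, and the in-memory storage are all assumptions for illustration; a real deployment would need shared storage and would have to account for users behind a common NAT.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional


class SignupRateLimiter:
    """Sliding-window limiter keyed by a hash of IP address and user agent."""

    def __init__(self, max_signups: int = 3, window_seconds: float = 3600.0):
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of recent signups

    @staticmethod
    def _key(ip: str, user_agent: str) -> str:
        return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

    def allow(self, ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        """Record a signup attempt and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        events = self._events[self._key(ip, user_agent)]
        # Forget signups that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_signups:
            return False
        events.append(now)
        return True
```

The `now` parameter exists only so the window can be exercised deterministically in tests; callers would normally omit it.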

Nadya · on Feb 12, 2018

>It's way easier to write a script to generate thousands of variations on the same email address than to sign up for a thousand different accounts.

It's just as easy to write a script to use ephemeral hosts that you don't need to sign up for. Things like Mailinator.

All it does is irritate people like me who use +words as prefilters for email (and to see which companies are selling my email/user data).

sodapopcan · on Feb 12, 2018

Fair enough! In my case it wasn't actually usernames, just entering an email address through a phone company for a free gift card from my old company so yeah, my point is moot.

lukeschlather · on Feb 12, 2018

That's all fine, but except in pretty specific circumstances you're going to have valid reasons to want multiple accounts for a single email address. Kind of a crazy scale issue, but one example is wanting your AWS Account to be separate from your Amazon Retail account, even though they use the same underlying account store it's a good idea to use separate accounts even if they're tied to the same email.

eli · on Feb 12, 2018

Well, that's always a risk. The thing this guards against is someone accidentally creating two accounts with variations on the same address and being very confused.

chii · on Feb 13, 2018

i recall Amazon did something quite amazing, which is to suggest to the user who signed up twice whether an existing account is the same. but only after they've purchased!

tjoff · on Feb 12, 2018

If you as a service provider can't accept that you shouldn't be offering those free trials / benefits at all. This isn't an issue.

bradleybuda · on Feb 12, 2018

> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

Please don't "normalize" email addresses like this. Not all mail systems are Gmail, and many do treat "john.doe@example.com" and "johndoe@example.com" as different identities. And even if we are talking about Gmail - it's not your identity system's job to deduplicate different logical addresses for the same physical inbox.

jcranmer · on Feb 12, 2018

To emphasize your point: don't touch email addresses. You can get away with doing equality checks on NFKC case folding, but don't assume that you can store a lowercased email address and have it work properly.
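
One way an equality check on NFKC case folding might be sketched in Python, using the standard `unicodedata` module and `str.casefold`. The function names are made up for illustration, and the folded form is used only for comparison; the address as entered stays untouched.

```python
import unicodedata


def fold_for_comparison(address: str) -> str:
    """Return a form suitable for equality checks only, not for storage."""
    # Normalize, casefold, then normalize again: casefolding can
    # produce text that is no longer in NFKC form.
    normalized = unicodedata.normalize("NFKC", address)
    return unicodedata.normalize("NFKC", normalized.casefold())


def same_address(a: str, b: str) -> bool:
    """Compare two addresses without modifying either one."""
    return fold_for_comparison(a) == fold_for_comparison(b)
```

Note that this folds case and compatibility forms only; dots and plus suffixes are left alone, so `john.doe` and `johndoe` still compare as different.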

Lots of weird email systems exist. Don't assume that everybody works like gmail. And do test that things work right with uppercase letters in email addresses: I've been locked out of systems before because I use an uppercase letter in my email address and one half of the system was trying to match the lower cased version to the actual text.

inopinatus · on Feb 12, 2018

The local part (left of the @) of email addresses is case sensitive by definition and this should not be controversial.

RFC5321 s2.4:

   The local-part of a mailbox MUST BE treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   "smith" is different from the user "Smith".

Even though it goes on to say:

   However, exploiting the case sensitivity of mailbox local-parts impedes
   interoperability and is discouraged.

This doesn't prohibit a local delivery agent (such as Gmail) consolidating multiple variants into one mailbox, but everything up until that point must refrain from making assumptions.

craigds · on Feb 12, 2018

That's definitely controversial, and in my experience, just plain wrong. Many, many users sign up to a service with john.doe@example.com and get confused when they later try to log in with John.Doe@example.com. We had to add a fix for this in our Django user model a couple of years back. We haven't had any complaints with the new, case-insensitive system. IMHO this sort of thing should be upstreamed, but it seems Django have decided to stick with the spec in spite of usability.

jcranmer · on Feb 12, 2018

As you clarify later, both case-sensitive and case-insensitive are wrong. What you generally want is case-preserving: store the original case, but allow case-insensitive matches to select it.

chatmasta · on Feb 12, 2018

This is an especially common problem when entering email addresses on mobile. Often autocorrect will capitalize the first letter. I always make sure it’s lowercase when registering, but I can see how this could easily lead to confusion.

lsaferite · on Feb 12, 2018

Web pages (and apps) that don't indicate the input box is actually an email address so auto-correct won't capitalize for you are a serious peeve of mine.

craigds · on Feb 12, 2018

Oh also, we used a CITEXT field for storing email addresses. Store the case the user registers with, and use that for sending comms etc, but uniquify and match (on login etc) in a case insensitive way.

pluma · on Feb 12, 2018

I ran into this problem as well and in the end added a separate field for the normalized (lowercase) form with a uniqueness check on that. Users would still be presented the original case and it would be preserved when sending e-mails but for identification/uniqueness purposes e-mails were considered case insensitive.

papito · on Feb 17, 2018

Right, something like "email_lc", and "username_lc". This would be your unique index.
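
A small sketch of that layout using SQLite from the Python standard library. The table and column names follow the `email_lc` suggestion above; the helper functions are hypothetical, and plain `str.lower()` stands in for whatever folding a real system would choose.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL,            -- as entered; used for sending mail
            email_lc TEXT NOT NULL UNIQUE   -- lowercased; used for lookups only
        )
        """
    )
    return conn


def register(conn: sqlite3.Connection, email: str) -> bool:
    """Insert a user; return False if the lowercased address is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, email_lc) VALUES (?, ?)",
                (email, email.lower()),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def find_email(conn: sqlite3.Connection, typed: str):
    """Look up by lowercased form and return the address in its original case."""
    row = conn.execute(
        "SELECT email FROM users WHERE email_lc = ?", (typed.lower(),)
    ).fetchone()
    return row[0] if row else None
```

The UNIQUE constraint does the deduplication, so a second signup that differs only in case fails at the database rather than in application code.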

ubernostrum · on Feb 12, 2018

Unfortunately, Gmail rules the world and has trained people to expect that they can be sloppy and inconsistent and enter their email address as 'johndoe' or 'john.doe' or 'JohnDoe' or 'John.Doe'... etc. and it just works, I don't understand why your site is broken, because my email works!

And when that happens, trying to patiently pull a "well, technically" and explain to them about RFC this and the specs say that is a way to lose users.

(I actually have extremely strong feelings about email, email addresses and the whole associated mess of specs, but had to tone it down for this article since it was mostly about the various traps you can wander into from naïvely thinking that you can just read a spec or implement something obvious and get away with it)

kqr · on Feb 12, 2018

> And when that happens, trying to patiently pull a "well, technically" and explain to them about RFC this and the specs say that is a way to lose users.

Not in my experience. Show compassion: agree with them that what happened is terrible, that you wish things were different, but that you didn't call these shots back in the day, and that if they want improvement you can go together and complain to Google, the service which is actually broken.

Most people just want to be listened to. If you can do that, you'll earn a loyal fan, even if you don't do exactly what they tell you to when agitated. Some will even appreciate learning more about the email systems after the fact. They may even get the feeling you went above and beyond by offering to help them with matters outside of your site.

ubernostrum · on Feb 12, 2018

> Most people just want to be listened to.

Many people, even after being listened to, and even after having things patiently explained to them, still continue to enter someone else's email address into forms which will send sensitive information to that email address, and complain that they never got their important email, or that some "hacker" has "hacked" "their" email, etc.

In a perfect world this would not happen. We don't live in a perfect world and are unlikely to live in a perfect world any time soon, so we should not be asking "how can we be pedantic and tell users it's their fault for not reading the RFCs", we should be asking "how can we protect users from their ignorance of the RFCs".

(when I wrote this article, I did not expect that this would be the single most controversial line in it from HN's perspective, but I guess by this point I should have anticipated it)

jorams · on Feb 12, 2018

> has trained people to expect that they can be sloppy and inconsistent and enter their email address as 'johndoe' or 'john.doe'

Has it? In my experience, even if people know dots and the part after the + don't matter for their Gmail account (and most don't) they know it doesn't matter to Gmail, a peculiarity they can use to create multiple accounts on a single website without creating new email accounts.

(This is different from case insensitivity, which the majority of popular email providers seem to implement.)

spc476 · on Feb 12, 2018

And I constantly get email for other people with my name at gmail.com. Let's see ... I'm married, and I have a few girlfriends, I run marathons, and I've been to several family reunions. Oh, and I'm in the process of buying several different cars. Wheeee!

ajdlinux · on Feb 12, 2018

I suppose the tradeoff here is when you get two non-Gmail users, johndoe and john.doe, who are legitimately different people, one of whom gets very confused and is actually unable to use their correct address.

On the one hand, that's a (presumably extremely) rare corner case - on the other hand, some applications must handle those corner cases.

spc476 · on Feb 12, 2018

Then I must be lucky. https://news.ycombinator.com/item?id=16357271

kbenson · on Feb 12, 2018

People were trained for that long before Gmail even existed. Email systems have generally followed Postel's law, since it makes life easier for all parties.

Do you want to run a support system where when you ask for people's username or email address, you also have to ask them for their casing (and in the case that their problem is that they didn't know it mattered, they may not know what they signed up for)?

I worked at an ISP for many years. Even though it was all backed by Linux, usernames (which included email addresses) were considered case insensitive for the purpose of the service (all usernames were lowercase). It solves so many problems and the downsides are so small all the sane email providers did it.

The flip side of this is that it was a simpler time. Usernames and email addresses were ASCII, not Unicode. These days with Unicode, you can't even be sure that uppercasing and then lowercasing an already lowercased string yields the same characters.
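
A quick Python illustration of that last point. The three sample characters are all lowercase already, yet none of them survives an upper-then-lower round trip; the helper names are made up for the example.

```python
def round_trip(text: str) -> str:
    """Uppercase and then lowercase a string."""
    return text.upper().lower()


def survives_round_trip(text: str) -> bool:
    """True if uppercasing then lowercasing gives back the same string."""
    return round_trip(text) == text


# Lowercase characters whose round trip changes them:
SAMPLES = [
    "\u00df",  # sharp s: uppercases to "SS", which lowercases to "ss"
    "\u0131",  # dotless i: uppercases to "I", which lowercases to "i"
    "\ufb01",  # fi ligature: uppercases to "FI", which lowercases to "fi"
]

for sample in SAMPLES:
    print(repr(sample), "->", repr(round_trip(sample)), survives_round_trip(sample))
```

Plain ASCII usernames are unaffected, which is why the problem never came up when addresses were ASCII-only.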

neltnerb · on Feb 12, 2018

Oh, I assumed the recommendation was to store the actual email address as entered, and then also generate the fingerprint to compare for similarity. Wouldn't you just keep both?

mhandley · on Feb 12, 2018

The problem is that jo.elane@example.com and joe.lane@example.com can be different people, or jan.eton@example.com and jane.ton@example.com. On gmail, they can't, but on my university email system for example, this is the default naming convention for circa 50,000 people. If you consider them to be the same, one of them can't sign up to your service. How often does this happen? Probably not all that often. But if you run a popular service, it will bite a few people.
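
To make the collision concrete, here is a sketch of the blanket rule described in the quoted article (strip every dot from the local part, plus the + and anything after it) applied to those four addresses. The function names are hypothetical.

```python
def strip_dots_and_plus(address: str) -> str:
    """Blanket normalization: drop dots and any "+tag" from the local part."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def collisions(addresses):
    """Group addresses that the blanket rule treats as identical."""
    groups = {}
    for address in addresses:
        groups.setdefault(strip_dots_and_plus(address), []).append(address)
    return {key: group for key, group in groups.items() if len(group) > 1}


people = [
    "jo.elane@example.com",
    "joe.lane@example.com",
    "jan.eton@example.com",
    "jane.ton@example.com",
]
print(collisions(people))
```

Each pair collapses to a single key, so whoever registers second is told the address is already in use.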

caf · on Feb 12, 2018

Yes. The answer to the question posed in the article is actually simple - there are 3 unique email addresses listed. The fact that email addressed to some of them may end up in the same box is neither here nor there.

Alex3917 · on Feb 12, 2018

> The fact that email addressed to some of them may end up in the same box is neither here nor there.

It's not an issue if they're the same, the issue is if they're different. E.g. if I'm storing bitcoin on an exchange (I know) behind the email@domain.com, then I don't want someone else to be able to register Email@domain.com and then start looking for bugs in the service or start trying to socially engineer customer support. (The local part of the email address is case sensitive as per the spec.)

ubernostrum · on Feb 12, 2018

You hit the nail on the head.

People can hem and haw about the specs, but at the end of the day Gmail trained most of the world to believe email addresses are case-insensitive, dots don't matter, etc., and now we have to live with the consequences. If that means somebody can't sign up for twelve accounts using case and dot variations of their Gmail, well, so be it. And if that means they come to HN to rant about how that awful site didn't follow the RFCs, then they come to HN to rant about that, but their account will be safer in spite of it.

slavik81 · on Feb 12, 2018

Maybe. But, if I was legitimately john.doe@example.com and you refused to allow me to register an account just because somebody else had a similar email, I'd be pretty annoyed. There's a good chance I'd take my business elsewhere.

I've only once had my email address rejected by a website (for ending with an underscore), but I never bothered setting up another email just for them.

ubernostrum · on Feb 12, 2018

And if the call-center person decided that 'johndoe@example.com' should be able to reset the password for 'john.doe@example.com' because they read a lifehack article about how dots in email addresses don't matter, you'd take your business elsewhere.

No matter what, there are reasonable hypotheticals where you get angry and take your business elsewhere. The difference is my approach has you leave because you're angry at the signup page, and your approach has you leave because someone stole your stuff. I'll take my approach any day of the week.

slavik81 · on Feb 12, 2018

Given that we're developers and we make the tools that the help desk people use, why would we make a help-desk password reset that can send to a different address than the one registered with the account? Couldn't there be other social engineering tricks people could pull on your help desk employees that you haven't yet imagined? Maybe they also think it's ok to drop the underscore from my email address, despite that not being a part of the gmail scheme.

baystep · on Feb 12, 2018

What about the "I've lost that email account, can I switch it to another to recover my account?" case. Or the "I'm the legal guardian of this person and need the account control switched to my personal email", or in the business world "this employee doesn't work here anymore and as administrator I would like the account transferred to me" cases. Locking customer support systems down tightly has its own pros/cons.

slavik81 · on Feb 12, 2018

I suppose I was more specific than I really should have been. More broadly, I'm trying to say that you have control over the tools and processes followed by your customer service. They can be used to combat social engineering.

For something as important as the credentials for a bitcoin exchange account, as Alex gave as his example, there should be policies specifying the reasons why account credentials can be changed and what evidence must be presented to do so. Front-line customer service reps shouldn't be flying by the seat of their pants when making difficult decisions with potentially hundreds of thousands of dollars on the line.

Alex3917 · on Feb 12, 2018

What happens when someone calls the CS person and tells them to type in their email address instead of copy pasting it or whatever. If there are any bugs at all in the CS software then it won’t be hard for the CS person to believe there is a bug they need to work around similar to the other bugs that are already in their dashboard.

The point of social engineering attacks is that they’re innocuous requests that don’t raise suspicion, and are hard to train people against.

ubernostrum · on Feb 12, 2018

OK. You go get every single company, site and service on earth to deploy a 100% perfect credential-recovery system and only have it used by 100% perfect people who never ever make a mistake. And when you've finished, let me know and I'll rethink my approach to email.

slavik81 · on Feb 12, 2018

You don't need to make a system that's perfect. You just need to make a system in which checks are required to make _any_ change to the credentials on an account. Once those checks are complete, it makes no difference if the change is from john.doe@example.com to johndoe@example.com or to jonathan.doe@example.com.

Alex3917 · on Feb 12, 2018

It’s not just password resets that are the issue. It’s also things like:

- Functions to get the account based on the email address

- Internal tools

- Stored procedures and other SQL stuff that happens outside the main code base

- Third-party integrations (Mailgun, Sailthru, ZenDesk, SalesForce, etc.)

That’s a huge attack surface where if there is even a minor mistake by a junior dev that no one noticed then everyone is going to lose their assets under protection.

Doctor_Fegg · on Feb 12, 2018

> Gmail trained most of the world

At least here in the UK, gmail has high reach among techies but is very much a minority provider for normal people. Looking through the 3000ish registered accounts on the local community website I run, Hotmail, Yahoo, and particularly ISPs (btinternet.com, plus.net, sky.com) are all more common than gmail.

quickthrower2 · on Feb 12, 2018

the best way is to err on the side of allowing a single email inbox to create multiple accounts.

Who cares if they do? It's not like that person couldn't create multiple email addresses anyway. E.g. abc@gmail.com and def@gmail.com can be the same person. You can't validate for that, other than other "gamification" systems (e.g. how HN treats new users - disincentivizes creating multiple accounts because why be limited on your second one when you have a good first one).

by the way I have used the + trick on google to sign up for a service (and pay for it!) that wouldn't let me reuse my old account for some reason. So their relaxed validation made them money.

zbentley · on Feb 12, 2018

It's a difference in difficulty. More to the point, it's a difference in asymmetry of difficulty.

Say you run an online store. You offer USD $5 in first-time user credit.

A lot of people (even some nontechies I know) know about the "+" trick for gmail. Assuming your signup flow is easy and fast, it's very easy for those people to sign up for multiple accounts and get multiple $5 credits.

If a lot of people do that, it might significantly impact your bottom line. Not just because you have to give away a lot of inventory, but secondary effects also suck: you stop the $5-free promotional, and then all of the legitimate users who signed up during the campaign who told their friends to sign up and get some free money now have their friends bad-mouthing the site to them because "it didn't give me free shit the way you told me I should expect it to!". You might see a drop in sign-ups to below pre-promotional levels, or, worse, you might see people who signed up during the promotional trust and use your site less. I know trends/behaviors like this seem trivial--and they definitely would only affect a minority of users--but past a certain scale effects like those can have a real financial impact.

Now let's say your site is "smart" about the "+" trick and doesn't let people with gmail (or google-federated emails--boy, is figuring that out a bastard) accounts sign up multiple times. You'll lose the dubious potential business of folks who like gaming promotionals. You'll still be vulnerable to people creating second email accounts and signing up using those--but the difficulty asymmetry now favors you, the vendor: it's work for a user to make a second email account; work they probably won't do, especially if you blacklist typical temp email services like guerrillamail. If the promotional is large enough to entice first-time users but small enough to deter people from doing this, you have succeeded in minimizing your loss. If the user already has a fleet of accounts for this purpose they're probably going to just take your money anyway.

Of course, there's another more annoying scenario which I'll mention because sites should never do it: sites that think they're being smart about the "+" trick by not storing the part of the address between the plus and the domain part. This is usually done to get email campaigns (read: almost entirely spam) to show up in the user's main mailbox rather than some filter-purgatory. It will drive users away in two main ways: first, if I'm technical enough to use the "+" trick and a filter to route mail, I probably have enough obsessive annoyance with spam to immediately either junk-flag yours or delete my account, compared to a small chance I would have actually read it otherwise. Second, more than half of sites that do this which I've audited parse and strip the content of the email address wrongly (parsing emails is very hard, after all) in such a way that what they end up storing could be a totally different person's email, or an invalid one. That means signups just won't work. Whatever you do, don't do this.

This doesn't only apply to first-time-user promotionals, either. It also applies to:

- Referral bonus programs.

- Services that give hand-customized products to users (think a "one per user" etsy store with one person knitting cat dolls or something): multiple similar contacts from the same user would mean you spend a lot of time making their products--time which might be wasted, even if they paid for their products, compared with time spent making them for lots of different users and increasing your recognition/exposure.

- The same applies to tech-support contact-us forms: one user can "bogart" your support staff, clogging the queue with (legitimate or not) requests and defeating your rate limiters by using the "+" trick, making other users wait a long time for replies.

- Others I haven't thought of.

lorenzfx · on Feb 12, 2018

You can still buy your own domain and get as many new email addresses which pass your test as you want. So you probably need another solution to your problem.

What I've noticed several (large) services doing these days is asking for your credit card number on signup (even if they promise not to bill you).

zbentley · on Feb 13, 2018

You could. But most people won't. The ones that do this will game your system regardless; they have the time and resources to. It's about making it not worth it to buy a domain and set up MX forwarding/gmail federation or whatever, just for $5.

cestith · on Feb 12, 2018

Many email systems have an idea of a catchall account. I can have a domain for which cestith@mydomain goes into one account and anything else @mydomain goes into a second account.

quickthrower2 · on Feb 12, 2018

Stores that offer $5 credit usually don't sell things that cost $5. So it is nothing but a $5 discount code really.

If they are really giving away credit for free that can be used wholly then they probably need to verify your identity e.g. a $0 credit card authorisation or something.

nodesocket · on Feb 12, 2018

Completely agree. If somebody wants to signup for your service as ceo@company.com and ceo+sales@company.com let them. Two distinct user accounts.

nzjrs · on Feb 12, 2018

Dear sir. I enjoyed your free trial. May I have+1 another?

Hextinium · on Feb 12, 2018

If you wanted to continuously have free trials, it is trivially easy to just create entirely new email addresses.

protopeer · on Feb 13, 2018

yeah, i paid less than a dollar for a super cheap domain, and created a catch-all email forwarder that lets me create any random email forwarding address that all forwards to my one email. i've used this method in the past to get unlimited free things at certain establishments that offer welcome gifts to new rewards accounts.

zbentley · on Feb 12, 2018

I replied to this same idea above: https://news.ycombinator.com/item?id=16358625

hungerstrike · on Feb 12, 2018

Dear user. Sure. Enjoy losing all of your preferences and data every 14 days.

tbabb · on Feb 12, 2018

My thought would be to have two distinct fields: "normalized" and "pristine" version of the user's email. Then always mail to exactly what the user typed ("pristine"), and use the "normalized" mail to prevent multiple registrations to "john.doe@gmail" and "johndoe@gmail" and "johndoe+1@gmail" and "JOHNDOE@gmail.com" and so on.

Denying registration to a suspected-duplicate seems a lot safer than mailing to a different person.
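A minimal sketch of the two-field idea, assuming Gmail-style rules are applied only to gmail.com and every other domain is compared as typed. The names `uniqueness_key`, `Signups`, and `KNOWN_ALIASING_DOMAINS` are made up for illustration and do not come from django-registration or any other library.

```python
# Assumption: only gmail.com is listed, because its dot and "+" handling is
# the one discussed in this thread. Other domains are left untouched.
KNOWN_ALIASING_DOMAINS = {"gmail.com"}


def uniqueness_key(address):
    """Build the "normalized" form. Use it for comparison only, never for sending."""
    local, at, domain = address.strip().rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()  # domain names are case-insensitive
    if domain in KNOWN_ALIASING_DOMAINS:
        # Drop "+suffix", remove dots, and ignore case in the local part.
        local = local.split("+", 1)[0].replace(".", "").lower()
    return f"{local}@{domain}"


class Signups:
    def __init__(self):
        self._pristine_by_key = {}  # normalized key -> address exactly as typed

    def register(self, address):
        """Store the address as typed; refuse a suspected duplicate."""
        key = uniqueness_key(address)
        if key in self._pristine_by_key:
            return False
        self._pristine_by_key[key] = address
        return True

    def mail_to(self, address):
        """Return the pristine address on file for this mailbox."""
        return self._pristine_by_key[uniqueness_key(address)]


signups = Signups()
print(signups.register("john.doe@gmail.com"))   # True
print(signups.register("JohnDoe+1@gmail.com"))  # False, suspected duplicate
print(signups.mail_to("johndoe@gmail.com"))     # john.doe@gmail.com
```

Mail only ever goes to the pristine value, so a wrong guess about a domain's rules can at worst deny a signup, not deliver to a stranger.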

__david__ · on Feb 12, 2018

If you're going to do that then only do it for email addresses where you understand the local MTA conventions (like gmail). Do not do it for domains you don't know (because you cannot predict how they work).

For instance, on my email domain, david+abc@example.com, david.def@example.com, and david_ghi@example.com are all the same ('+', '.' and '_' all act like you'd expect '+' to), but david+xyz@example.com is not (it gets picked off and aliased somewhere else). Applying gmail conventions to other domains is silly and wrong.

sarreph · on Feb 12, 2018

It does surprise me how many big SAAS products with generous free trials don't check for '+'s on Gmail addresses...

pawelk · on Feb 12, 2018

You can still do a check for the normalized address for known services that allow moving the dots and adding a +something (e.g. to catch someone creating multiple fake accounts). Just be decent about it and store the user-provided address for communication purposes.

Alex3917 · on Feb 12, 2018

> it's not your identity system's job to deduplicate different logical addresses for the same physical inbox.

The problem is that then you're adding potential attack vectors to every single web app just to cater to the .01% of email clients that insist on implementing the email RFC exactly to spec. Not deduplicating email addresses creates an attack surface for both hard-to-spot technical issues and also social engineering attacks. E.g. what if your ESP at some point adds deduplication on their end, either mistakenly or on purpose, then suddenly you're sending password reset requests to the wrong users.

I think you should normalize email addresses to enforce account uniqueness, both for security purposes and usability, as long as you also store a second copy of the email address exactly as the user entered it and only send email to the latter version.

djsumdog · on Feb 12, 2018

At most you're allowing duplicate accounts to the same e-mail address. If you want to enforce 1 to 1 identities with e-mail accounts, you could have an issue. But in general, it shouldn't be a security concern.

Gracana · on Feb 12, 2018

> And while I could write this as one of those “falsehoods programmers believe about X” articles, my personal preference is to actually explain why this is trickier than people think, and offer some advice on how to deal with it, rather than just provide mockery with no useful context.

Thank you, so much.

I just wanted to highlight that for anyone who looked at the comments to decide whether or not to read the article.

ch4s3 · on Feb 12, 2018

Yeah, this is a really good article. I really wish more auth systems made use of the tripartite identity pattern.

jedberg · on Feb 12, 2018

In my 20 years of experience validating email addresses, I've found one thing that works every time without fail:

Send email to it

That is literally the only way to validate an email address. There is no regular expression or algorithm that can validate and/or deduplicate an email address.

You must simply treat every email as unique until you send an email to it and that person proves otherwise.

That being said, this article brings up a lot of important things about confusables that everyone should definitely be aware of, especially if you're going to have public identities.
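The send-an-email check described above could be sketched roughly like this. Nothing is actually mailed here; the class name `PendingVerifications` and its methods are hypothetical, and a real system would also expire tokens and persist them somewhere.

```python
import secrets


class PendingVerifications:
    def __init__(self):
        self._pending = {}     # token -> address awaiting confirmation
        self.verified = set()  # addresses whose owner clicked the link

    def start(self, address):
        """Create a one-time token; the caller mails a link containing it."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = address
        return token

    def confirm(self, token):
        """Mark the address verified if the token is known; each token works once."""
        address = self._pending.pop(token, None)
        if address is not None:
            self.verified.add(address)
        return address
```

The address is treated as unique and unverified until `confirm` succeeds; no parsing or deduplication of the local-part is involved.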

gitgud · on Feb 12, 2018

How does sending email to it prove it's not a duplicate? For example:

    john.doe@gmail.com

    johndoe@gmail.com

Both resolve to the same email at the users end.

jedberg · on Feb 12, 2018

Exactly, and then the consumer clicks a link to verify their email and they have a second account, and it's up to them to put them together, or up to you to see that they already have a login cookie and combine the accounts.

Sohcahtoa82 · on Feb 14, 2018

A user attempting to game a system like that could easily just delete their cookie or browse in incognito mode.

Spooks · on Feb 12, 2018

well with gmail they are the same email. But with other mail applications, those could be two different email addresses. You are going to need to test all the different email services and find out which do this and which do not do this... which is no easy task

gitgud · on Feb 13, 2018

Exactly, so you're better off not accounting for all these special cases. As there's a never-ending list of email providers.

A better approach might be to limit the effect that a new account can have in the system. Hackernews, Reddit, Stackoverflow all do this through reputation. A new account on these systems is unable to achieve much until time is spent proving the account is being used. Thus reducing the incentive for an individual to create multiple accounts.

TheDong · on Feb 13, 2018

> which is no easy task

Which is impossible; a number of people run their own email or use a small friend-run email server, and you can't possibly discern the delivery rules from the outside.

ubernostrum · on Feb 13, 2018

> number of people run their own

Yes. A small number is a number.

If we could go back in time and force every single implementer to follow every relevant RFC to utter perfection (and make sure all the RFCs were perfectly unambiguous), I'd be more sympathetic.

But email is fucked. The sheer number of oddball things, hacks, workarounds, deviations and other bits of mess that implementers have engaged in over the years means the RFCs should be treated as at best a loose hopeful outline of how email might in theory work.

Spivak · on Feb 12, 2018

> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

Isn't this a really dangerous game to play? Just because some major MTA's assign a class of addresses to each user doesn't make each member of that class not a unique identifier in general. Is it worth the headache to maintain a list of various email systems' policies rather than just treating them all as unique?

jcranmer · on Feb 12, 2018

The email RFCs explicitly say thou shalt not interpret the localpart of an email address, unless thou art the MTA of the domain in question. Even case folding is forbidden. And the wisdom of people who work with email is... the RFCs have good advice here: don't assume anything about how the localpart is structured.

You can generally get away with treating the names as case-preserving (as distinct from case-insensitivity), and you are probably safe in rejecting quoted localparts. But beyond that, even forcibly lowercasing email addresses, is likely to cause problems.

ubernostrum · on Feb 12, 2018

Like it or not, the RFCs have lost. "How Gmail does it" is now how email works in the minds of a stupendous number of people. So if Google says 'johndoe' and 'john.doe' are the same, we're stuck with the reality that 'johndoe' and 'john.doe' are the same.

fiddlerwoaroof · on Feb 12, 2018

It’s completely legitimate to use variations on an email address for different accounts on the same website.

Also, it’s useful to make use of +foo or varied usages of dots to create a unique email address for each site: for one thing, it’ll help if one site leaks your email address, then it’ll let you trace the origin of the leak if that email address gets unwanted email.

Finally attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.

ubernostrum · on Feb 12, 2018

There's a line in the Zen of Python: "practicality beats purity". If I can avoid someone filing a bug or a support request by knowing that Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct, I'm going to avoid the support request. The -- by comparison -- minuscule set of users who A) actually understand the relevant specs and B) care enough to yell at me in an HN comment are going to lose that battle every time.

fiddlerwoaroof · on Feb 12, 2018

> Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct

I'd be fairly surprised if your average user of gmail knew this: I know it and I use it in part because it lets me _distinguish_ different accounts on the same site. Second-guessing someone who's taking advantage of this feature is more likely to generate tech support requests than not.

zbentley · on Feb 12, 2018

Knowledge of the plus trick has gotten pretty widespread. Anecdotally, I know a lot of non-technical people that use a single +spam address to route to a spam folder.

Non-anecdotally, articles with large numbers of views/comments about the trick can be found with a quick Google search on non-techie sites like NYT/HuffPost/BusinessInsider/Buzzfeed/Pinterest/etc. Not that those are definitive, but I think knowledge of this is more widespread than you think.

notriddle · on Feb 12, 2018

If you know about gmail plus and dot addressing, and use it for verifying uniqueness, then I'll understand. I'll also probably just use mailinator to make the second and third accounts anyway, but whatever.

If you actually strip the dots and plusses from my email, and start sending stuff to my main address, then I will mark your messages as spam. You need to store the normalized and non-normalized versions of the address. Actually, you need to do this for normalizing on usernames anyway, to make sure you don't mutilate people's Arabic names or anything (Unicode-normalized cursive looks really bad; you need to preserve the original version, while keeping the normalized version around specifically for uniqueness checking).

smichel17 · on Feb 12, 2018

Without questioning this line of thought, it seems like deduplicating by lowercasing and perhaps removing dots is a good choice, but stripping +suffixes seems likely to generate more user annoyance than it prevents. If I filter based on those suffixes and you send me mail and strip the suffix, I'm going to be pissed.

zbentley · on Feb 12, 2018

> attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.

I think it's not even close. You have to transmit the content of the email address to the server, since you might need to email the person. Whether you validate/sanitize/perform voodoo on it there is up to you.

You don't have to transmit the password (because one way hashing), and should never do so.

leni536 · on Feb 12, 2018

Gmail doesn't break the RFC here. They just assign multiple email addresses to a unified mailbox.

LaGrange · on Feb 12, 2018

As long as there's one big school where different dots point to different people, no, it's not practical to strip the dots. Your support problem with asking a user to use the same gmail address they used to register is a lot less awkward than if the user can't use a different one.

zbentley · on Feb 12, 2018

So only perform normalization on emails whose domains are known to route "+"-trick emails to the same mailbox. Even if you just do this on @gmail.com, it removes a big swath of users that could abuse your promotionals and waste your time with multiple pseudo-accounts.

A harder question is what you should assume about people who run a "+"-tricky email service on their own domain (e.g. federated gmail) and who later switch to using a service that isn't "+"-tricky (e.g. federated gmail user switches to running their own mail server). What's your default policy: default-allow or default-deny? I suspect the answer will have to do more with the amount of potential revenue lost due to such users' likelihood to abuse the plus trick, and less about the technicals of how to address it.

LaGrange · on Feb 13, 2018

Or remove incentives for having multiple accounts in the first place (even multiple public identities should be something handled by your system without need for re-log), and stop messing with emails. On one hand, even your assumption that a shared inbox implies one person is wrong (I don't like it when people share inboxes and consider it toxic, but it is what it is), and your mitigations are futile (it's between cheap and free to just have an entire domain point to your one inbox - your 'federated gmail' is easily such a case).

pc86 · on Feb 12, 2018

> Many systems ask the username to fulfill all three of these roles, which is probably wrong.

What system with any non-trivial level of use uses the text username as (1) the FK in the database, as opposed to the generated or auto-incremented ID in the db; (2) the login name; and (3) the publicly displayed "name" of the user for others to see?

Plenty of forums etc use the login name for #2 and #3, and I'm not convinced by this article that that's the wrong way to do it. I haven't ever seen a single professional product that uses the text username that a user logs in with as the actual DB-level foreign key. That's grade school level database design.

cjslep · on Feb 12, 2018

When logging in, how do you get that autoincremented ID column? Some more complex variant of "SELECT *.id WHERE username = $1". So functionally, yes the username is the root identifier that pulls the very first record that then allows other joins to occur. But you are right, the username column is not literally the key being joined on.
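A small sketch of that lookup, using an in-memory SQLite database. The `users` and `posts` tables and their columns are assumptions made up for the example: the username is only used to find the row, and everything else joins on the generated id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,          -- surrogate key, used for every join
        username TEXT NOT NULL UNIQUE    -- login identifier, free to change
    );
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (username) VALUES (?)", ("john_doe",))

# Login: the username is the entry point that yields the internal id.
user_id = conn.execute(
    "SELECT id FROM users WHERE username = ?", ("john_doe",)
).fetchone()[0]
conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, "hello"))

# Renaming touches one row; posts stay attached because they reference the id.
conn.execute("UPDATE users SET username = ? WHERE id = ?", ("jdoe", user_id))
rows = conn.execute(
    "SELECT u.username, p.body FROM posts p JOIN users u ON u.id = p.user_id"
).fetchall()
print(rows)  # [('jdoe', 'hello')]
```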

There is also the security issue that having the login name also be the publicly displayed name lowers the bar for attempting a targeted attack on the site, as well as on other sites where the attacker suspects the victim may be using a similar login name. This can particularly be true in cases of harassment across platforms, which, while not a computer science security issue, is a personal psychological security issue.

bpicolo · on Feb 12, 2018

> the username column is not literally the key being joined on

That's exactly the point though. If you join on the username then allowing emails/usernames or whatever that identifier is to be edited is very hard. How you identify the row to auth against is literally the point of a username.

always_good · on Feb 12, 2018

Though I haven't seen too many systems where the static login name is actually hidden from the world. It seems to just end up being the canonical ID, so right there in every permalink or whatever.

gravypod · on Feb 12, 2018

> Uniqueness is harder than you think

Discord has a very interesting solution to this. They have user names and user ids. User IDs are tied to emails and the user's name seems to just be a random text identity for displaying to users. I assume most of their backend code uses a unique, sequential or random, integer ID to identify and talk about users while their frontend just maps the ID to a "user name". As long as you slap account creation behind a verification email and don't mind one user being able to sign up for multiple accounts you side step many of the larger problems that come from choosing user names because, in effect, you are choosing the "Real" username and you can make any guarantees that make writing all of your other software easy.

Klathmon · on Feb 12, 2018

Blizzard also uses a style like this. While it's great for some use cases, it really sucks for others.

In Blizzard's implementation, I can't add a friend by just knowing their name, I need their id number as well, and the process for finding it isn't exactly front-and-center.

ulzeraj · on Feb 12, 2018

I remember when I switched to bnet ID and my tag became username#1337. People sometimes ask me if I’ve bought the name or something like that.

Ambroos · on Feb 12, 2018

I got Ambroos#2772 or #2727 (can't remember, it's been a while). Which is fun, because I have a thing with 27 as it's the birthday of all my grandparents' (mom's side) grandchildren.

Zekio · on Feb 12, 2018

Well, with Discord you can literally pay to get a different number of your choosing through their subscription service. As long as the number isn't taken, you can pick whatever number you want, provided it is 4 characters long of course.

lbill · on Feb 12, 2018

I really like Blizzard's solution: {$Username}#{$number} is very practical! It complicates things a bit when you want to share your contact info with a friend, or when you try to remember a specific battletag, but it solves the uniqueness problem. And to be honest, on most sites I end up using numbers at the end of my username anyways, such as "Username0037".
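A rough sketch of how a name-plus-number scheme like this might hand out tags. It assumes a four-digit number, as mentioned for Discord earlier in the thread; `TagRegistry` is a hypothetical name, and this is not how Blizzard or Discord actually implement it.

```python
import random


class TagRegistry:
    """Hands out name#NNNN tags: names may repeat, full tags may not."""

    def __init__(self, rng=None):
        self._used = {}  # casefolded name -> set of numbers already taken
        self._rng = rng or random.Random()

    def claim(self, name):
        used = self._used.setdefault(name.casefold(), set())
        free = [n for n in range(10000) if n not in used]
        if not free:
            raise RuntimeError(f"no numbers left for {name!r}")
        number = self._rng.choice(free)
        used.add(number)
        return f"{name}#{number:04d}"
```

Two people can both be "Ambroos" while the combined tag stays unique, which is what removes the race for good names.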

mooreds · on Feb 12, 2018

This should be exhibit number one why you should always favor open source libraries rather than writing your own plumbing functionality, especially around authentication and authorization. The onus should be on the developer to explain why the open source library isn't a fit, rather than defaulting to 'roll your own'.

The edge cases discussed don't pop up that often unless you have lots of folks using your software or are really diligent about fuzzing and testing edge cases. If you roll your own, say, username system, you probably aren't going to fall into either of those two cases. Which means you're vulnerable.

devmunchies · on Feb 12, 2018

But then again, open source does not mean it's good software, obviously. There should be some way to quickly check whether a library meets security best practices, like some sort of "vetted software" reference.

always_good · on Feb 12, 2018

Also, using a 3rd party library for something as important as authentication because you don't know how it works doesn't sound much better nor secure.

Like storing sensitive data in the authn's session system because you don't understand encryption vs signing nor how to find out -- maybe it's time to just sit down and credentialize as a craftsman.

The authn/z systems I've used that were the biggest headaches in my life were kitchen sink frameworks trying to generalize over everyone's creature features, and they were often tied to a company/community culture of not-gonna-touch-it that only hurt users and security.

mooreds · on Feb 12, 2018

I think you should absolutely understand any third party systems/libraries you use, especially when it is as important as authentication. Using a third party component doesn't free you up to be lazy or to use it incorrectly.

My comment was stating that you should default to these types of libraries and only roll your own if you can't do what you need to, simply because they're more likely to handle edge cases that can have serious implications.

Do you do unicode normalization on your usernames? I freely admit that I don't, and wasn't aware it was needed until I read this post.

pc86 · on Feb 15, 2018

If you don't fall into any of the affected cases, what exactly is the vulnerability? The security aspect I get, and happen to agree with 100%, but if I roll my own sub-optimal username classification system for my CRUD app that gets 80 users in its entire lifetime, so what? I will have certainly learned a lot more than just typical `npm install usernames`.

rch · on Feb 12, 2018

Actually, the first step in a lot of Django projects is to rip out the default config and ditch usernames in favor of email addresses for login, and attach some form of unique internal id to the "account" (or whatever billing, social IDs, etc. might be associated with).

yjftsjthsd-h · on Feb 12, 2018

Should that be included by default, then?

NoGravitas · on Feb 12, 2018

Yes, but it would make upgrades to the first version that uses it a major flag day. Which limits when it could be rolled out.

yjftsjthsd-h · on Feb 12, 2018

Off by default so no changes for existing users?

rch · on Feb 12, 2018

Probably, but there's disagreement about how to do so, and it's not hard to make the changes anyway.

IgorPartola · on Feb 12, 2018

I agree with everything here except the email addresses example. Yes I do want to register igor@example.com and igor+work@example.com. Those are different accounts, please don’t mess with that.

zbentley · on Feb 12, 2018

There may be very good reasons for a site operator to disallow this. I mentioned some of them elsewhere in these comments: https://news.ycombinator.com/item?id=16358625

yeukhon · on Feb 12, 2018

I also employ the dot scheme for different purposes. Example: johndoe.hn@gmail.com vs johndoe.school@gmail.com, so I wouldn’t want anyone aliasing them without the dot.

pc86 · on Feb 15, 2018

For things that are legitimately tied to your physical identity as opposed to an email address it makes sense not to let you do this.

neya · on Feb 12, 2018

Hey community, shameless plug: For the purpose mentioned in the article to disallow certain usernames, I created this GitHub Repo sometime back. Feel free to submit a pull request :)

https://github.com/dsignr/disallowed-usernames

flurdy · on Feb 12, 2018

Seems I should add some more to https://github.com/flurdy/bad_usernames :)

neya · on Feb 13, 2018

Nice! Maybe we should merge our efforts.

timvdalen · on Feb 12, 2018

> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check.

Please don't do this, lots of people (including myself) use the '+' hack to separate accounts for different contexts (business/personal, different projects/clients, etc).

jrimbault · on Feb 12, 2018

I think he wasn't proposing to remove the '+' part from the address stored, but splitting the address into smaller parts when doing the unicode verification and checking whether there's already a 'john.doe+xxx' when you register with 'john.doe+yyy'.

timvdalen · on Feb 12, 2018

I realize that. What I'm trying to say is that I like to create 'john.doe+projectA@example.com' and 'john.doe+projectB@example.com' accounts with some services.

Checking for the existence of any 'john.doe@example.com'-like accounts would mean I have to register an entirely separate email account or set up (another) email forwarder/alias.

Ajedi32 · on Feb 12, 2018

Which is just as bad, since now you need to create an entirely new email account just to have two accounts on the same service.

kevin_b_er · on Feb 12, 2018

Also proposing to ignore all dots (.) on account of GMail doing that, while a moment before saying how that behavior is different. So they will break the experience for two people on a dot-unique mail server.

ghalvatzakis · on Feb 12, 2018

My Linkedin username: https://www.linkedin.com/in/Αdmin/

edent · on Feb 12, 2018

Brilliant! Does it cause you any problems?

ghalvatzakis · on Feb 12, 2018

I'm using this username for about 6 months now. No problems so far...

Alex3917 · on Feb 12, 2018

If you're going to allow unicode usernames then you should casefold them rather than lowercasing them before normalizing as NFKC.

You should ideally also store a second copy of the username in the original casing and normalized as NFC for display purposes, as some users care a lot about seeing their username exactly as they entered it. (And in fact not allowing this may be seen as culturally insensitive in some cases, much like not supporting unicode.) The same applies to the user's first and last name, which you can store in NFC for display purposes and casefolded into NFKC for string comparison (e.g. search) purposes.

That said, most sites limit usernames to ASCII characters so that they can be (easily) used in URLs. In this case you don't need to casefold or normalize, just converting to lowercase is enough.
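For sites that do allow Unicode usernames, the two stored forms described above might look like this in Python 3. This is a sketch that follows the order given in the comment (casefold, then NFKC); the function names are invented for illustration.

```python
import unicodedata


def comparison_key(username):
    """Form used only for uniqueness checks and search."""
    return unicodedata.normalize("NFKC", username.casefold())


def display_form(username):
    """Form shown back to the user: original casing, NFC-normalized."""
    return unicodedata.normalize("NFC", username)


# Fullwidth "Ｊｏｈｎ" collides with plain "john" for comparison purposes.
print(comparison_key("\uff2a\uff4f\uff48\uff4e") == comparison_key("john"))  # True
# "e" + combining acute accent is stored for display as the single character "é".
print(display_form("Jose\u0301") == "Jos\u00e9")  # True
```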

ubernostrum · on Feb 12, 2018

> If you're going to allow unicode usernames then you should casefold them rather than lowercasing them before normalizing as NFKC.

I wanted to stay out of the Python 2 vs. 3 quagmire in this article, but it's worth knowing that in Python 3.3+, strings have a 'casefold()' method:

https://docs.python.org/3/library/stdtypes.html#str.casefold

Unfortunately, since Python 2 still has around two years of upstream support before EOL, I can't universally recommend people just use 'casefold()', no matter how much I'd like to.
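A short Python 3 illustration of why `casefold()` is preferred over `lower()` for comparisons; the two helper functions are only there for the example.

```python
def same_by_lower(a, b):
    return a.lower() == b.lower()


def same_by_casefold(a, b):
    return a.casefold() == b.casefold()


# "Straße" and "STRASSE" are the same word in different casing.
print(same_by_lower("Stra\u00dfe", "STRASSE"))     # False: lower() keeps the ß
print(same_by_casefold("Stra\u00dfe", "STRASSE"))  # True: casefold() maps ß to "ss"
```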

unethical_ban · on Feb 12, 2018

These "battle hardened" articles are fascinating. It is the output of years of experience and learning from real problems. It's building the best practices guides, and building the tools to scan for the edge cases. A great read!

stasel · on Feb 12, 2018

Worth mentioning Spotify's account hijacking problem when using unicode https://labs.spotify.com/2013/06/18/creative-usernames/

mhandley · on Feb 12, 2018

When it comes to email address normalization, it sounds like we could do with a standardized way for a domain to express normalization policies.

Could be as simple as publishing a set of regular expression substitution rules, specifying (for example):

* render to lower case (because this particular domain is case insensitive)

* drop periods (because this domain treats them like gmail does)

* drop '+' and any subsequent characters (because this domain treats them like gmail does)

* ASCII only (because mail software is old, and doesn't support unicode)

Etc.

Each domain could then publish their own rule, perhaps in a DNS txt record, and anyone needing to check if two email addresses alias to the same could run the correct checks.
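To make the proposal concrete, here is a sketch of how a checker might apply such per-domain rules. No such standard exists: the `POLICIES` table, its rule format, and the function names are all invented for illustration, and the rules are hard-coded rather than fetched from DNS.

```python
import re

# Hypothetical published policies, keyed by domain. Each policy is a
# lowercase flag plus ordered regex substitutions on the local-part.
POLICIES = {
    "gmail.com": {
        "lowercase": True,
        "substitutions": [(r"\+.*$", ""), (r"\.", "")],
    },
}


def canonical(address):
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    policy = POLICIES.get(domain)
    if policy is None:
        # No published policy: leave the local-part exactly as given.
        return f"{local}@{domain}"
    if policy["lowercase"]:
        local = local.lower()
    for pattern, replacement in policy["substitutions"]:
        local = re.sub(pattern, replacement, local)
    return f"{local}@{domain}"


def same_mailbox(a, b):
    return canonical(a) == canonical(b)
```

With this table, `same_mailbox("John.Doe+x@gmail.com", "johndoe@gmail.com")` is true, while two addresses at a domain with no published policy are only equal if their local-parts match exactly.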

ino · on Feb 13, 2018

Some users capitalize their email address, and they expect to see it capitalized.

I think a better solution would be to use a case insensitive collation on the database for the email column.

If the user changes the capitalization of their email, treat it like any other email change (validate the new email via email token)
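A sketch of the collation approach using SQLite's built-in `NOCASE` collation; the `accounts` table is an assumption for the example. Note that `NOCASE` only folds ASCII letters, and other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL COLLATE NOCASE UNIQUE)"  # uniqueness ignores ASCII case
)
conn.execute("INSERT INTO accounts (email) VALUES (?)", ("John.Doe@Example.com",))

try:
    conn.execute("INSERT INTO accounts (email) VALUES (?)", ("john.doe@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups match regardless of case, but the stored capitalization is preserved.
stored = conn.execute(
    "SELECT email FROM accounts WHERE email = ?", ("JOHN.DOE@EXAMPLE.COM",)
).fetchone()[0]
print(duplicate_rejected, stored)  # True John.Doe@Example.com
```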

Paianni · on Feb 12, 2018

My strategy is to think of a name so unique that no one else would think of using it. The one on my HN account I originally cooked up in Nov. 2014, and I never had to extend it with numbers to get it accepted on various forums (yes, most of them were fine with changing the nick). My biggest gripe is that since YT changed their username system in October of that year, my most popular channel is stuck on an old username despite being a few months older.

Mayzie · on Feb 12, 2018

Aside from the author's library, django-registration, what other similar libraries for Python and other languages have taken all or some of this into consideration?

Excellent read by the way. Many things I have never considered or even worried about before.

jancsika · on Feb 12, 2018

> 3. Public identity, suitable for displaying to other users

Many sites-- like HN-- may not even need that. If you have system and login identity you can just display "dingus" as the name of every single user and the system should still work the same.

grzm · on Feb 12, 2018

I think without display names having discussions would be difficult, as people find some identifier useful to follow along. Even if it's a pseudonym, it's hard to build a sense of community without being able to distinguish those in your community. Or am I misreading you?

Anon1096 · on Feb 12, 2018

4chan and other imageboards manage to have discussion fine without unique identifiers.

zbentley · on Feb 12, 2018

It's a different kind of discussion, though. There is value to each type of threading, but no one-size-fits-all approach--as evidenced by the fact that 4chan lets you opt out of anonymity.

sergiotapia · on Feb 12, 2018

I don't follow the initial premise.

>Well, it’s easy until we start thinking about case. If you’re registered as john_doe, what happens if I register as JOHN_DOE? It’s a different username, but could I cause people to think I’m you? Could I get people to accept friend requests or share sensitive information with me because they don’t realize case matters to a computer?

Just this month we fixed this issue by using a citext column in postgres. So yes, it is easy. Maybe I'm missing an edge case here?

https://www.postgresql.org/docs/9.1/static/citext.html

bdcravens · on Feb 12, 2018

I assume that's not a standard datatype in all RDBMSs, and the article seems to focus on Django, so I think the author is speaking about ORMs (of course, you generally can also define custom validations in most ORMs, so even a lower() check would help in many cases).

imron · on Feb 12, 2018

The problem is not hard to fix from a technical standpoint, but from a practical standpoint it is impossible to fix due to breaking too many sites that were created without case insensitivity and that likely contain conflicts.

jarofgreen · on Feb 12, 2018

So citext just compares a lower case value of the string, unless I'm missing something?

If so, the rest of the article covers in great detail all the other edge cases :-)
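
A rough illustration of a few cases a plain lowercase comparison misses, using only Python's standard library. The helper names are hypothetical, and this is a sketch of one possible normalization (NFKC plus case folding), not what citext or django-registration actually does:

```python
import unicodedata

def naive_key(name):
    # What a simple lower() check compares.
    return name.lower()

def folded_key(name):
    # Compatibility normalization, full case folding, then normalize again
    # because case folding can produce non-normalized output.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)

# German sharp s: lower() keeps "ß", casefold() expands it to "ss".
sharp_s_naive = naive_key("Straße") == naive_key("STRASSE")
sharp_s_folded = folded_key("Straße") == folded_key("STRASSE")

# Fullwidth Latin letters look like "john" but are different code points.
fullwidth = "\uff4a\uff4f\uff48\uff4e"
fullwidth_naive = naive_key(fullwidth) == naive_key("john")
fullwidth_folded = folded_key(fullwidth) == folded_key("john")

# Cross-script lookalikes survive both: Cyrillic "а" (U+0430) is not Latin "a".
lookalike_folded = folded_key("\u0430dmin") == folded_key("admin")
```

The last case is the kind of thing that needs a separate confusables check; normalization alone does not catch it.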

acdha · on Feb 12, 2018

It’s easy if you thought about it before you have users; people didn’t and then need to ensure that a fix doesn’t break something.

craigds · on Feb 12, 2018

Yeah. We switched to CITEXT for email addresses after 10k users, and had to go through quite a complex process to merge about 30 user accounts that had been signed up as duplicates, with email addresses varying only in case. It was a major PITA.
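
For anyone facing the same migration, a small sketch of how the conflicting accounts could be found beforehand. The function name and sample addresses are made up, and it only considers simple lowercasing, which is an assumption about what the target column will treat as equal:

```python
from collections import defaultdict

def find_case_collisions(emails):
    """Group addresses that differ only in case; return groups with >1 variant."""
    groups = defaultdict(list)
    for email in emails:
        groups[email.lower()].append(email)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }

existing = [
    "bob@example.com",
    "Bob@example.com",
    "carol@example.com",
]
collisions = find_case_collisions(existing)
```

Each group in the result is a set of accounts that has to be merged or otherwise resolved before a case-insensitive unique constraint can be added.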

ubernostrum · on Feb 12, 2018

Good for you!

Now, how did you solve the other problems mentioned?

saurik · on Feb 12, 2018

The entire concept of usernames that are unique and permanent is stupid and even "cruel". The reality is that a relatively small handful of privileged early adopters get good usernames that match their identities, and everyone else gets screwed. These identifiers then act like tattoos that you got a long time ago and are stuck with for the rest of your life: people end up reminded every day of a sport they can no longer play due to an injury ("hockeystar") or loves lost ("iheartjessie"), attached to a joke that is no longer funny or to a thought that they found adorable as a 13 year old (when you are legally asked to "choose a username": a modern era coming of age scenario) but which adults find inane, or to a nickname that means something different than you realized to some people and now can't change.

The reality is that there are almost ten billion people on this planet and they live for upwards of a century. You are simply deluding yourself if you think it is reasonable to build a system with unique, permanent usernames. Nothing in the real world works like that, including trademarks. And it just helps enforce the very problem that people try to trust usernames and then get tricked by people who sniped usernames that are tied to other people's well-known identities (leading to abused "verified" badge systems and legal challenges and expensive hostage scenarios... it just sucks).

And for what? To make it easier to hand-type a URL? Does anyone even do that? I am super technical and I barely even do that in 2018, as if nothing else there are too many websites in existence to remember all of their one-off URL schemes. Like almost everyone, I either use the site's built-in search feature or I do a search on Google to find people, and let a combination of page rank and personalized results guide me to the right destination. Some web browsers don't even show URLs anymore!

Here is a great example of where it is completely insane: Facebook. There is absolutely no good reason for that website to have usernames for regular users, and they frankly shouldn't have usernames for businesses either. It isn't even clear to me that the app--which most users are using, not the website--even has a way to show people's usernames, which means this is an identifier which somehow everyone knows must be chosen and must be unique and is nigh-unto permanent but which somehow is also simultaneously meaningless but is also a horrible point of contention? What?

I am lucky. I spent a bunch of time in 1994 to select a username, and despite being 13, I was mature enough to come up with something that wouldn't ever come to cause me complex problems. People ask me what it means, and it essentially doesn't mean anything: it has only a positive connotation to me when I hear it, it is entirely neutral, and it had no existing usage I could find. Yet, I also still got screwed, as I am semi-famous, and everyone knows me as this username. I have kids who look up to me enough to want to take my name as a show of support and I have to essentially be the big bad asshole about it because in a world of unique and permanent usernames, people then assume the kid is really me. On the other side, I have been asked to rename myself by moderators of various forums as they couldn't believe the real saurik got an account on their site, and it was "confusing" people.

And so in the end we all have to deal with the worst-case scenario anyway: unless you do nothing but sign up for random sites rumored to be interesting constantly (which I seriously tried to do), you eventually will succumb to needing a way to prove who you are on multiple sites and tie together those identities. And for most users... as in virtually all "normal users", that moment comes when they are using only two websites, as their username was probably something like jay.freeman.178 as everything that was even remotely interesting to them was taken a decade earlier by literally a different generation of humans, so they let the website automatically generate one.

In a world where everyone is having to solve the worst-case problem anyway, every site should just have numbers as unique identifiers, at most have some kind of trust score for degrees of separation on the site (so you can get a feeling for "is this the saurik that I met?"), and everyone should be trained "names don't matter and if you see someone with that name it doesn't even slightly mean that they are the same person you met last week".

tjoff · on Feb 12, 2018

> And for what?

So that you can be identified? (I'm not talking identified in a mathematical sense, but in an informal conversational sense; you know, what usernames are actually used for.) The whole point of a username is that it is the most humanly convenient way to represent a user in text in the context of a certain site.

Because the alternative would be to have thirteen saurik in the same thread debating a topic and you would have no way of distinguishing them. Avatars are an attempt to fix that but it sucks and is bloated for many scenarios.

Sites that do allow you to change username break conversations where people refer to each other using the username (Stack Overflow comments are a really common and annoying example).

Yes, it is annoying when you don't get your first pick but it truly is not a big deal and it solves a real problem.

jay.freeman.178 is an excellent username. You are not your username.

laumars · on Feb 12, 2018

People can change their legal names, I see no reason why they shouldn't also be able to change their user names.

The problem with breaking continuity in forums and other similar networks can be mitigated via dynamic user name lookups (e.g. how Facebook does `@` mentions, though I have also seen some forums do this), supporting inline quoting (like how message boards often work), and nested replies (HN, reddit, etc.). Granted, there will still be occasions when references slip through the net, but we humans have a remarkable ability to deduce the context of the written word even when it doesn't always read perfectly.

tjoff · on Feb 12, 2018

I have not seen dynamic user name lookups work well in practice though, where the system has made it convenient enough to actually be used consistently. A different approach might be to leave the name as is for past conversations but use the new name for new interactions, with some way of showing the other name: xxx (formerly yyy) for new posts and xxx (now known as yyy) for past posts. People quite commonly do this manually in the real world (and on Facebook). It's not perfect and quickly gets complicated.

I'd argue that a username does not have the same function as a name. My name is not an identifier and most have not chosen their name - which is also why (I believe) the first name change is free of charge where I live. In most cases you can also create a different account. Depending on the type of service this might not be desirable but in others it is exactly what you would want.

Willamin · on Feb 12, 2018

Slack handles this quite well actually. I know a lot of people dislike Slack for a lot of valid reasons, but they handle a few things right. If I type a message on Slack saying "hey, @alice, what's up?" then "@alice" gets replaced with the person's real name "Alice". When I send the message, the "Alice" bit in the resulting message becomes a link that will open the user's profile. Additionally, if "Alice" ever changes her name OR username, the name updates accordingly.
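
A rough sketch of how that kind of behavior could work: store the mention as a stable user ID and only resolve it to a name when the message is displayed. The `<@id>` token format, the `users` dictionary, and the function names are all assumptions for illustration, not a description of Slack's actual implementation:

```python
import re

# Hypothetical user table keyed by a stable numeric ID.
users = {1: {"username": "alice", "display_name": "Alice"}}

MENTION = re.compile(r"@(\w+)")   # what the author types
TOKEN = re.compile(r"<@(\d+)>")   # what gets stored

def store_message(text, users):
    """Replace @username with an ID token; unknown names are left as typed."""
    by_username = {u["username"]: uid for uid, u in users.items()}

    def to_token(match):
        uid = by_username.get(match.group(1))
        return f"<@{uid}>" if uid is not None else match.group(0)

    return MENTION.sub(to_token, text)

def render_message(stored, users):
    """Resolve ID tokens to whatever the display name is right now."""
    def to_name(match):
        user = users.get(int(match.group(1)))
        return user["display_name"] if user else match.group(0)

    return TOKEN.sub(to_name, stored)

stored = store_message("hey, @alice, what's up?", users)
before = render_message(stored, users)

# Alice changes both her username and her display name later on.
users[1]["username"] = "alice_b"
users[1]["display_name"] = "Alice B."
after = render_message(stored, users)
```

Because the stored text never contains the name itself, old messages follow the rename without being rewritten.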

nfc · on Feb 12, 2018

I agree with you about the identification part. I was wondering whether we could have a user friendly way to have a system of usernames + "something else" that allowed usernames to not be unique while still solving the identification part.

A possible implementation would be to allow the user to give a "nickname" to add to usernames he wants to identify uniquely that would be visible only for them. For example since I talk now with you I could add to your username "user with whom I discussed identities and usernames" and this (or a short version of it) would be shown next to your username from now on.

A more automated way to do this is to create a unique image for the user based on the content they have posted, when they created their account, and so on. It's not so personal but requires less effort from the user, and I'm sure such systems exist on many sites to create avatars. Obviously, in these machine learning times we could do something much better.

hood_syntax · on Feb 12, 2018

The former battle.net (now Blizzard App) does what many games do and has the form "Username #XXXX", where the Xs are digits. So you can be Frank #0001 and I can be Frank #0002. Steam allows you to tag people on your list with user created descriptions. There are definitely applications out there that make this kind of user experience a priority; I feel like the gaming world is ahead of the curve on this point.
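
A toy sketch of the "Username #XXXX" scheme. Handing out the lowest free number is an assumption made to match the Frank example; the function name and the `taken` structure are hypothetical, and a real service may well assign the digits differently:

```python
def assign_tag(name, taken):
    """Return 'name #NNNN' using the lowest number not yet used for that name.

    `taken` maps a case-folded name to the set of numbers already handed out.
    """
    used = taken.setdefault(name.casefold(), set())
    for number in range(1, 10000):
        if number not in used:
            used.add(number)
            return f"{name} #{number:04d}"
    raise ValueError(f"no free numbers left for {name!r}")

taken = {}
first = assign_tag("Frank", taken)
second = assign_tag("Frank", taken)
other = assign_tag("Alice", taken)
```

Only the combination of name and number has to be unique, so nobody is locked out of a name just because someone else registered it first.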

Crespyl · on Feb 12, 2018

Steam also allows you to change your publicly visible username more or less at will.

There's one "account name" which you pick at sign up, use to log in, and can never change (as far as I know), but you can then pick a "user name". The "user name" is what gets shown in any interaction with other users (forums, profile page, chat, in games, etc) and IIRC it doesn't even need to be unique.

Tagging people on your friends list is a great feature to match, since the tags remain when the tagged user changes their user name, which some people do quite often (whether for a joke or just because they got bored of the old one).

It really seems like a great system and it'd be nice if other systems offered the same degree of flexibility.
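
A small sketch of that split, with hypothetical class and field names: a fixed account name for login, a display name that can change at will and need not be unique, and private tags stored by the viewer against the fixed identifier so they survive renames. This is only a model of the behavior described above, not Steam's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    account_name: str   # chosen at sign-up, used to log in, never changes
    display_name: str   # shown to other users, editable, not unique

    def rename(self, new_display_name):
        self.display_name = new_display_name

@dataclass
class FriendList:
    # Private tags, keyed by the friend's fixed account name.
    tags: dict = field(default_factory=dict)

    def tag(self, account, note):
        self.tags[account.account_name] = note

    def label(self, account):
        note = self.tags.get(account.account_name)
        if note:
            return f"{account.display_name} ({note})"
        return account.display_name

friend = Account(account_name="frank1987", display_name="Frank")
mine = FriendList()
mine.tag(friend, "Frank from work")

label_before = mine.label(friend)
friend.rename("xX_Joke_Name_Xx")
label_after = mine.label(friend)
```

The tag is attached to the thing that never changes, which is why it still identifies the friend after the display name turns into a joke.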

TheDong · on Feb 13, 2018

Steam has some poorly thought out shit; it has four identifiers, three of them unique:

1. Username, used to login, not visible elsewhere, unique and security related.

2. SteamID, id number which is visible elsewhere and used in their api fairly often; not too different from the username other than being public

3. UrlID - by default a "steamcommunity" link to the user's page is their SteamID, but it can optionally be edited by the user to a custom url. This ID is a globally unique namespace. For example, https://steamcommunity.com/id/gaben

4. Display name ... Yup, the display name. Arbitrarily editable.

Having that many things is dumb. Having an editable url component is dumb (it shoulda just been the steamid forever).

All in all, they're not a good example.

henrieri · on Feb 13, 2018

A combination of username, steamID and display name is completely understandable in my opinion. A user shouldn't be forced to log in using a randomly generated ID, but with something he can remember, and he has the option of keeping it private by using the display name.

UrlID can be beneficial for the user, so the user can choose an easily spellable identifier for the URL in case he needs to share the link often using voice.

Seems like an optimal system, except that the UrlID is probably a use case that doesn't come up that often. But it still won't really hurt anyone. If it wasn't the UrlID it would be the steamID, which does nothing to help you remember the URLs. So why is it poorly thought out if it gives benefits to some use cases without making anyone else worse off?

mrguyorama · on Feb 12, 2018

And as a user, that system is _utterly terrible_.

"I want to add you as a friend so we can play games together, what's your name?" "Mrguyorama + random junk I didn't choose."

It's user unfriendly.

hood_syntax · on Feb 12, 2018

Would you rather be xXTheRealMrGuyoramaXx or mrguyorama_123 instead? I really don't see how it's that different. It solves the problem of first come first serve, without forcing people to come up with the random meaningless bits themselves.

WhyNotHugo · on Feb 13, 2018

For Battle.net, you can just give out your email address and that works fine.

Sean1708 · on Feb 12, 2018

But now the onus is on other people to create a name for everyone they might interact with regularly. So instead of each person creating one name to denote themselves, each person is now creating multiple names to denote everyone else.

And I don't really see how this is functionally any different to adding digits to the end of your normal username.

Crespyl · on Feb 12, 2018

In Steam it is actually the case that a user can select their own display name (distinct from the account name used to log in) and change it at any time (and there are very few restrictions on what you can set it to; the system doesn't care if every single one of your friends has the same display name).

This function is separate from the "tagging" feature, which is helpful for keeping track of those friends who frequently change their display names and avatars. It's also possible to view the list of previous display names a given user has had, at least if you're friends with them.

hood_syntax · on Feb 12, 2018

If you're referring to the tagging on Steam, it's in addition to the regularly displayed username. If you're not, I guess I'm not sure what you're talking about.

remir · on Feb 12, 2018

Discord does this as well, plus you can change your public username on every server.

O4epegb · on Feb 12, 2018

It is battle.net again

hood_syntax · on Feb 12, 2018

Glad to hear it, had no idea

scrollaway · on Feb 12, 2018

It's actually Blizzard Battle.net, now.

I wish I was making this up. Maybe they're trying to get their chat app acquired by Google.

Natanael_L · on Feb 12, 2018

Asymmetric keypairs, where your nickname is associated to your public key. It's the only reliable way to do it across multiple sites. But then you have a keypair to protect.

zeveb · on Feb 12, 2018

> A possible implementation would be to allow the user to give a "nickname" to add to usernames he wants to identify uniquely that would be visible only for them. For example since I talk now with you I could add to your username "user with whom I discussed identities and usernames" and this (or a short version of it) would be shown next to your username from now on.

That sounds a bit like SDSI & SPKI's nicknaming functionality. Entities were identified by keyhashes, but you could (and in practice would) give them nicknames or use others' nicknames for them.

kemitche · on Feb 12, 2018

> Because the alternative would be to have thirteen saurik in the same thread debating a topic and you would have no way of distinguishing them.

In the real world, people almost always go by their first name, and we don't have this problem. When two people in a social circle have the same first name, we don't turn and say "well everyone has to use their whole name, always, now." Rather, we adjust our names (usually someone gets a nickname, or goes by their last name).

The steam system allows multiple people to have the same display name and it works just fine. Sure, people can troll with it when they join your tf2 server (and then you kick them off).

The blizzard system also works great. The unique identifier is there, if all other forms of attempting to add a friend fail, but mostly you end up working contextually.