Hacker News
Let’s talk about usernames (b-list.org)
900 points by acjohnson55 on Feb 12, 2018 | 429 comments



> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.
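The normalization described in the quote (drop every dot from the local-part, and drop "+" and anything after it, before the uniqueness check) can be sketched roughly as follows. This is a hypothetical illustration of the quoted idea, not django-registration's actual code; `uniqueness_key` is an invented name, and the sketch assumes an unquoted `local@domain` address.

```python
def uniqueness_key(address: str) -> str:
    """Key for a uniqueness check: local-part with dots and any +suffix removed.

    Hypothetical helper; assumes an unquoted local@domain address.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    local = local.split("+", 1)[0]  # drop "+" and everything after it
    local = local.replace(".", "")  # drop every dot
    return f"{local}@{domain}"


for addr in ["john.doe+news@gmail.com", "johndoe@gmail.com",
             "jo.elane@example.com", "joe.lane@example.com"]:
    print(addr, "->", uniqueness_key(addr))
# john.doe+news@gmail.com -> johndoe@gmail.com
# johndoe@gmail.com -> johndoe@gmail.com
# jo.elane@example.com -> joelane@example.com
# joe.lane@example.com -> joelane@example.com
```

Applied to every provider, this key also merges addresses that may belong to different people, which is the objection several replies in this thread raise.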

Sorry, what? That seems pretty unnecessary. A third-party system dictating how another third-party system handles its local aliases for email? I can't see any benefit to that.

Whether a mail server handles '+' in a standard way is not guaranteed, and surely it is up to the user how they use that feature if enabled.


Absolutely this. It's not even a standards issue, for the outside world they're completely different email addresses. The fact that they happen to land on the same inbox for some providers is unrelated, it's just how that particular provider decided to handle all your different email accounts.


Yes! And if you have your own domain, you can just set up a catch-all *@example.com as your one email address. Then you can give out a different address to each service without even needing +. As you said, each system can treat it differently.


Agreed.

From RFC 2822: "The local-part portion is a domain dependent string."

Headaches await you if your code is making a lot of assumptions about how the originating domain manages that local-part portion.


Apple IDs make assumptions (they strip out the + and everything after it), and it's made iTunes Connect a nightmare to use. Their multiple-account support is still terrible, so it's easiest to have an Apple ID per iTunes Connect account. Now I need to set up actual aliases in my GSuite account...


Stripping out things past the + will also break things for users that intentionally are creating more than one signup. This seems like a really bad idea.


Agree. You're going to regret doing this when you try to test it yourself, with your own Gmail (or GSuite) account.


That's the entire point, no?


Why do you want to detect the difference between a user signing up with one email account twice or two different accounts? It's folly to try to detect this in this manner and you will only piss off regular users.


Services that strip the + suffix are trying to prevent users from using their (usually free) service in a way that costs them more money but never converts into profit.

I understand why they do it, but of course I can't condone it.

The service should just allow it, and make sure to charge an appropriate subscription fee.


> Services that strip the + suffix are trying to prevent users from using their (usually free) service in a way that costs them more money but never converts into profit.

I can always create multiple email accounts, you know. So I don't see how spending development efforts to parse and detect + and . is effective.


It's both pointless and stupid.

It is pointless because setting up extra email accounts is extremely easy, so blocking a route that saves only a little of that work achieves nothing.

It is stupid because it could prevent someone from using their real, valid email address just because it matches another, different, valid email address.


Agreed. The + thing is probably "only" unfriendly to power users, but stripping dots can easily get in the way. Several of my e-mail addresses have "duplicates" that differ only by a dot and are owned by entirely different people.


> so it’s impossible without doing DNS lookups to figure out whether someone’s mail provider actually thinks johndoe and john.doe are distinct.

The author has announced that they believe them to be intentionally indistinct, and has announced an intent to break handling for any mail server that considers them distinct. All on account of Gmail ignoring them?

This, coupled with the author's intent around +, makes me loathe this behavior.


I was with him most of the way, but the advice about emails seems very user hostile to me.


It is worse than unnecessary. If people start irrationally doing things like recognizing the '+', then that creates facts on the ground that give others rational reasons to recognize the '+' too -- except nobody really knows why anyone is doing it or what semantics anyone else is using. Thus the whole thing evolves into a mess.

So please don't do that. Respect the bytes that be!


Yeah, I actually use Google's plus-addressing feature to generate test accounts on some sites, or to attribute spam to the sites it came from.


If you don't do this your system may allow users to sign up for multiple accounts and double-dip on signup benefits, free trials and other "one per account" features. (Barring additional controls, of course).


You’re gonna be sad to know that mailinator[1] and its 100s of domain aliases exist.

You’re fighting the wrong battle.

[1] https://www.mailinator.com/


Some places check for these types of services though


They do, but they're not very effective. They might have an up-to-date list of domains belonging to the big services (e.g. Mailinator), but you can always just google "Disposable e-mail", open the 10th page and pick the first one, and it's pretty much always guaranteed to work.


I beg to differ. I use this blacklist and not much gets by. Anything that does, I blacklist manually.

https://github.com/martenson/disposable-email-domains/blob/m...

A lot of sites I've come across lately are going back to the old-school way of only whitelisting email addresses from .edu domains or ISP accounts ("@comcast.net").
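A check against a blocklist of disposable-mail domains, like the one linked above, might look roughly like this. The two hard-coded domains are only sample entries, `is_disposable` is a hypothetical helper, and walking up to parent domains is an assumption about how you would want subdomains handled.

```python
# Sample entries only; in practice, load a maintained list instead.
DISPOSABLE_DOMAINS = frozenset({"mailinator.com", "guerrillamail.com"})


def is_disposable(address: str, blocklist: frozenset = DISPOSABLE_DOMAINS) -> bool:
    """True if the address's domain, or any parent domain, is on the blocklist."""
    domain = address.rpartition("@")[2].lower().rstrip(".")
    labels = domain.split(".")
    # For "x.mailinator.com" this checks "x.mailinator.com", "mailinator.com", "com".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


print(is_disposable("someone@Mailinator.com"))      # True
print(is_disposable("someone@sub.mailinator.com"))  # True
print(is_disposable("someone@gmail.com"))           # False
```

As the surrounding replies point out, any such list is only as good as its coverage of the long tail of disposable domains.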


That's an absolutely ridiculous idea and a good way to destroy the number of users you're getting. Do you whitelist Gmail? Then it's piss simple to sign up for multiple accounts. Do you not? Good luck getting a massive portion of the internet to sign up...


I don't want the sort of users who are signing up to perform financial transactions using services like mailinator and guerrillamail. It's always fraudulent.

Gmail itself is allowed, but the addresses are normalized. I'm well aware of its potential for abuse.


I have infinity. I use Gmail for business with a catchall rule that forwards to the one address in my domain.

I have an email address for every site I sign up for.

Helps you figure out who sells your email.

How do you handle this? Or do you not?

I guess you could do a DNS lookup of the MX records, see if they point back to Gmail, and compile a list of domains to allow only one account from... but then that would break many companies' email. That’s no fun.


I use Fastmail in exactly the same manner (catch-all, I use "yourdomain.example@mydomain.example" as the email address for signups). Also, anyone could put a cheap domain name on Migadu with a catch-all for this purpose.

Services need a better way to figure out fraudulent or abusive behaviour than guessing based on the email account's domain name.


I think what the parent meant was that creating multiple gmail accounts is also piss-simple.

I personally own over 100 from a couple of years back, when reCAPTCHA was easily automatable and the phone-number requirement wasn't there.


> only whitelisting email addresses from .edu domains or ISP accounts ("@comcast.net").

I haven't seen ISPs give out email addresses like that in years (I know my last couple of ISPs have no such thing).

So you're basically limiting yourself to people from universities, or who've had an old-school ISP for a while.


I've yet to see a fixed ISP that doesn't give out email addresses; though I very rarely meet anybody who used them as anything other than a fallback for their webmail addresses.

Mobile carriers even still seem to, though they are optional and require an additional setup step.


Or they could just sign up at one of the numerous free email providers with a different username. Stripping the + suffixes is only providing one thing - pain for the users that want to use it.


This is true. I know someone who says she's been doing the Hulu free month thing for a couple of years, and does it for other services as well.

It's all too much for me to keep track of, but for some people it's no big deal to create new e-mail addresses every month.


The author suggests stripping the part before doing the uniqueness check. This does not mean that the username (email address in this case) would not be allowed.


It would for the second user whose address resolves to the same result from the “uniqueness” check.


I wish this would happen. There's a "rocket surgeon" on the East Coast who has tried to sign up for Facebook, Twitter, Steam, and a bunch of sleazy 'message gurlz now' apps using my Gmail address without the period.

Obviously it never works, as I get the "I see you're trying to create a new account" email, but one of these days he's going to figure out a way to take over one of those accounts and then I'll really be fked.


I don’t understand your complaint. Google resolves the addresses john.smith@gmail.com and johnsmith@gmail.com to the same account, which you control.

(1.) What are you imagining is the attack vector exactly?

(2.) Are you asserting that all website owners should build to Google’s (non-standard) behavior?


I have a similar problem with someone who keeps (hopefully accidentally) putting my landline number into Facebook, then I get a call with a recording asking me to press some number to verify my Facebook account.


I'm curious: who would share an email address with someone else on the same service, distinguished only by a local addition?

Can my wife and I both sign up to HN using my email, with hers being josh+swife@joshmanders.com and mine josh@joshmanders.com?

That's a strange usecase, isn't it?


gmail makes josh@... and josh+swife@... resolve to the same thing, but there is no guarantee that all other email services behave like that. For all you know, there is an email service that lets you register an email like that, so you have 2 users now whose email is: john+smith@... and john+brown@...


It's way easier to write a script to generate thousands of variations on the same email address than to sign up for a thousand different accounts. I've actually been bitten by this bug before... or rather, my company was bitten by an affiliate who neglected to sanitize their emails this way and someone was able to create thousands of gift cards in our system.
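The asymmetry is easy to quantify for dot variants: a local-part with n characters has n - 1 gaps, each of which may or may not hold a dot, giving 2^(n-1) spellings that Gmail delivers to one inbox. A minimal sketch; `dot_variants` is a hypothetical name.

```python
from itertools import product


def dot_variants(local: str) -> list[str]:
    """Every spelling of `local` with an optional dot in each gap between characters."""
    if not local:
        raise ValueError("local-part must be non-empty")
    variants = []
    # Each of the len(local) - 1 gaps independently gets "" or ".".
    for gaps in product(("", "."), repeat=len(local) - 1):
        spelled = local[0] + "".join(gap + ch for gap, ch in zip(gaps, local[1:]))
        variants.append(spelled)
    return variants


addresses = [f"{v}@gmail.com" for v in dot_variants("johndoe")]
print(len(addresses))  # 64
```

For "johndoe" that is 2^6 = 64 addresses, before even adding "+" suffixes.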

Having said that, in development, it's super nice to be able to create addresses with +'s in them.


What you say is not untrue, but it's still bad advice to do it -- a security red herring. First of all, you don't know that 100% of mail servers ignore characters after the +, so you can't safely strip those characters or you might not end up with a usable email address. That goes double for stripping the dots/periods, which gmail ignores but many other mail servers do not.

On top of that, it's just as easy to set up a catchall email address -- an email box that accepts all mail for a domain, literally anything@mydomain.com. So a malicious actor could sidestep this security attempt with minimal effort, but it still inconveniences legitimate users despite being worthless from a security perspective.


True, true. As I mentioned below, in my case it wasn't even usernames, just entering your email for a free gift card. The attacker actually used dots with a Gmail address.


There are soooo many ways to easily game the email side of it that you would be better off using other means of detecting uniqueness (rate limit per IP address, rate limit per hash of IP address and user-agent)
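Keying a limit on a hash of IP address and User-Agent, as this comment suggests, could be sketched as a fixed-window counter. This is an assumption-laden illustration: in-memory, single-process, it never evicts old windows, and the class and method names are invented. A production system would typically keep the counters in a shared store.

```python
import hashlib
import time
from collections import defaultdict


class SignupRateLimiter:
    """Fixed-window limiter keyed on a hash of (IP address, User-Agent)."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)  # (client hash, window index) -> attempts

    @staticmethod
    def client_key(ip: str, user_agent: str) -> str:
        # Store a digest rather than the raw IP and User-Agent strings.
        return hashlib.sha256(f"{ip}\n{user_agent}".encode()).hexdigest()

    def allow(self, ip: str, user_agent: str) -> bool:
        """Count one attempt; False once the client exceeds `limit` in this window."""
        bucket = (self.client_key(ip, user_agent), int(self.clock() // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True
```

A plain per-IP limit is the same code with `client_key` returning the IP alone.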


>It's way easier to write a script to generate thousands of variations on the same email address than to sign up for a thousand different accounts.

It's just as easy to write a script to use ephemeral hosts that you don't need to sign up for. Things like Mailinator.

All it does is irritate people like me who use +words as prefilters for email (and to see which companies are selling my email/user data).


Fair enough! In my case it wasn't actually usernames, just entering an email address through a phone company for a free gift card from my old company so yeah, my point is moot.


That's all fine, but except in pretty specific circumstances, you're going to have valid reasons to want multiple accounts for a single email address. It's kind of a crazy-scale case, but one example is wanting your AWS account to be separate from your Amazon retail account: even though they use the same underlying account store, it's a good idea to use separate accounts even if they're tied to the same email.


Well, that's always a risk. The thing this guards against is someone accidentally creating two accounts with variations on the same address and being very confused.


I recall Amazon did something quite amazing: it asks users who signed up twice whether an existing account is also theirs, but only after they've made a purchase!


If you, as a service provider, can't accept that, you shouldn't be offering those free trials / benefits at all. This isn't an issue.


> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

Please don't "normalize" email addresses like this. Not all mail systems are Gmail, and many do treat "john.doe@example.com" and "johndoe@example.com" as different identities. And even if we are talking about Gmail - it's not your identity system's job to deduplicate different logical addresses for the same physical inbox.


To emphasize your point: don't touch email addresses. You can get away with doing equality checks on NFKC case folding, but don't assume that you can store a lowercased email address and have it work properly.
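Equality on case-folded, NFKC-normalized text can be sketched with the standard library's `unicodedata.normalize` and `str.casefold`. This approximates, but is not identical to, Unicode's NFKC_Casefold mapping (which, for example, also removes default-ignorable characters), and the helper names are hypothetical. The key is used only for comparison; the address is stored and used exactly as entered.

```python
import unicodedata


def caseless_key(address: str) -> str:
    """Comparison key: NFKC-normalize, case-fold, then NFKC-normalize again."""
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", address).casefold())


def same_address(a: str, b: str) -> bool:
    """Caseless equality check; never use the key as the stored address."""
    return caseless_key(a) == caseless_key(b)


print(same_address("John.Doe@Example.com", "john.doe@example.com"))  # True
print(same_address("STRASSE@example.de", "straße@example.de"))       # True
print(same_address("john.doe@example.com", "johndoe@example.com"))   # False
```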

Lots of weird email systems exist. Don't assume that everybody works like gmail. And do test that things work right with uppercase letters in email addresses: I've been locked out of systems before because I use an uppercase letter in my email address and one half of the system was trying to match the lower cased version to the actual text.


The local part (left of the @) of email addresses is case sensitive by definition and this should not be controversial.

RFC5321 s2.4:

   The local-part of a mailbox MUST BE treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   "smith" is different from the user "Smith".

Even though it goes on to say:

   However, exploiting the case sensitivity of mailbox local-parts impedes
   interoperability and is discouraged.

This doesn't prohibit a local delivery agent (such as Gmail) consolidating multiple variants into one mailbox, but everything up until that point must refrain from making assumptions.
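Code that follows the RFC text quoted above can still lowercase the domain, because DNS names are case-insensitive, while leaving the local-part exactly as entered. A minimal sketch, assuming ASCII domains; `canonical_address` is a hypothetical name.

```python
def canonical_address(address: str) -> str:
    """Preserve the local-part exactly; lowercase only the domain.

    Splits at the *last* "@" because a quoted local-part may itself contain "@".
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


print(canonical_address("Smith@Example.COM"))        # Smith@example.com
print(canonical_address('"john@doe"@Example.org'))  # "john@doe"@example.org
```

Under this rule, "Smith@example.com" and "smith@example.com" remain distinct, exactly as the RFC requires.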


That's definitely controversial, and in my experience, just plain wrong. Many, many users sign up to a service with john.doe@example.com and get confused when they later try to log in with John.Doe@example.com. We had to add a fix for this in our Django user model a couple of years back. We haven't had any complaints with the new, case-insensitive system. IMHO this sort of thing should be upstreamed, but it seems Django has decided to stick with the spec in spite of usability.


As you clarify later, both case-sensitive and case-insensitive are wrong. What you generally want is case-preserving: store the original case, but allow case-insensitive matches to select it.


This is an especially common problem when entering email addresses on mobile. Often autocorrect will capitalize the first letter. I always make sure it’s lowercase when registering, but I can see how this could easily lead to confusion.


Web pages (and apps) that don't indicate that the input box is actually for an email address, so that autocorrect won't capitalize it, are a serious pet peeve of mine.


Oh also, we used a CITEXT field for storing email addresses. Store the case the user registers with, and use that for sending comms etc, but uniquify and match (on login etc) in a case insensitive way.


I ran into this problem as well and in the end added a separate field for the normalized (lowercase) form with a uniqueness check on that. Users would still be presented the original case and it would be preserved when sending e-mails but for identification/uniqueness purposes e-mails were considered case insensitive.


Right, something like "email_lc", and "username_lc". This would be your unique index.
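The separate-column approach described in these replies can be sketched with the standard library's `sqlite3` module: the original case lives in `email`, and a lowercased copy in `email_lc` carries the `UNIQUE` constraint. Table, column, and function names are illustrative assumptions, and `str.lower()` stands in for whatever normalization you settle on.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # `email` keeps the case the user typed; `email_lc` carries the unique index.
    db.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " email_lc TEXT NOT NULL UNIQUE)"
    )
    return db


def register(db: sqlite3.Connection, email: str) -> int:
    # Raises sqlite3.IntegrityError if a case variant is already registered.
    cur = db.execute(
        "INSERT INTO users (email, email_lc) VALUES (?, ?)", (email, email.lower())
    )
    return cur.lastrowid


def find_by_email(db: sqlite3.Connection, email: str):
    # Case-insensitive lookup that returns the address as originally entered.
    return db.execute(
        "SELECT id, email FROM users WHERE email_lc = ?", (email.lower(),)
    ).fetchone()


db = open_db()
uid = register(db, "John.Doe@Example.com")
print(find_by_email(db, "JOHN.DOE@example.COM"))  # (1, 'John.Doe@Example.com')
```

Outgoing mail uses `email`; login and uniqueness checks use `email_lc`.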


Unfortunately, Gmail rules the world and has trained people to expect that they can be sloppy and inconsistent and enter their email address as 'johndoe' or 'john.doe' or 'JohnDoe' or 'John.Doe'... etc. and it just works. "I don't understand why your site is broken, because my email works!"

And when that happens, trying to patiently pull a "well, technically" and explain to them about RFC this and the specs say that is a way to lose users.

(I actually have extremely strong feelings about email, email addresses and the whole associated mess of specs, but had to tone it down for this article since it was mostly about the various traps you can wander into from naïvely thinking that you can just read a spec or implement something obvious and get away with it)


> And when that happens, trying to patiently pull a "well, technically" and explain to them about RFC this and the specs say that is a way to lose users.

Not in my experience. What works is showing compassion: agree with them that what happened is terrible, that you wish things were different but you didn't call these shots back in the day, and that if they want improvement you can both go complain to Google, the service which is actually broken.

Most people just want to be listened to. If you can do that, you'll earn a loyal fan, even if you don't do exactly what they tell you to when agitated. Some will even appreciate learning more about the email systems after the fact. They may even get the feeling you went above and beyond by offering to help them with matters outside of your site.


> Most people just want to be listened to.

Many people, even after being listened to, and even after having things patiently explained to them, still continue to enter someone else's email address into forms which will send sensitive information to that email address, and complain that they never got their important email, or that some "hacker" has "hacked" "their" email, etc.

In a perfect world this would not happen. We don't live in a perfect world and are unlikely to live in a perfect world any time soon, so we should not be asking "how can we be pedantic and tell users it's their fault for not reading the RFCs", we should be asking "how can we protect users from their ignorance of the RFCs".

(when I wrote this article, I did not expect that this would be the single most controversial line in it from HN's perspective, but I guess by this point I should have anticipated it)


> has trained people to expect that they can be sloppy and inconsistent and enter their email address as 'johndoe' or 'john.doe'

Has it? In my experience, people who know that dots and the part after the + don't matter for their Gmail account (and most don't) also know that this is specific to Gmail, a peculiarity they can use to create multiple accounts on a single website without creating new email accounts.

(This is different from case insensitivity, which the majority of popular email providers seem to implement.)


And I constantly get email for other people with my name at gmail.com. Let's see ... I'm married, and I have a few girlfriends, I run marathons, and I've been to several family reunions. Oh, and I'm in the process of buying several different cars. Wheeee!


I suppose the tradeoff here is when you get two non-Gmail users, johndoe and john.doe, who are legitimately different people, one of whom gets very confused and is actually unable to use their correct address.

On the one hand, that's a (presumably extremely) rare corner case - on the other hand, some applications must handle those corner cases.



People were trained for that long before Gmail even existed. Email systems have generally followed Postel's law, since it makes life easier for all parties.

Do you want to run a support system where when you ask for people's username or email address, you also have to ask them for their casing (and in the case that their problem is that they didn't know it mattered, they may not know what they signed up for)?

I worked at an ISP for many years. Even though it was all backed by Linux, usernames (which included email addresses) were considered case insensitive for the purpose of the service (all usernames were lowercase). It solves so many problems and the downsides are so small all the sane email providers did it.

The flip side of this is that it was a simpler time. Usernames and email addresses were ASCII, not Unicode. These days with Unicode, you can't even be sure that uppercasing and then lowercasing an already lowercased string yields the same characters.
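A quick way to see the Unicode problem, using Python's built-in, locale-independent case mappings; the helper name is hypothetical.

```python
def survives_case_round_trip(s: str) -> bool:
    """True if uppercasing and then lowercasing gives back exactly `s`."""
    return s.upper().lower() == s


# All four inputs are already lowercase.
for s in ["john", "ß", "ı", "\ufb01"]:
    print(repr(s), "->", repr(s.upper().lower()), survives_case_round_trip(s))
# 'john' -> 'john' True
# 'ß' -> 'ss' False   (sharp s uppercases to "SS")
# 'ı' -> 'i' False    (dotless i uppercases to plain "I")
# 'ﬁ' -> 'fi' False   (the "fi" ligature uppercases to "FI")
```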


Oh, I assumed the recommendation was to store the actual email address as entered, and then also generate the fingerprint to compare for similarity. Wouldn't you just keep both?


The problem is that jo.elane@example.com and joe.lane@example.com can be different people, or jan.eton@example.com and jane.ton@example.com. On Gmail, they can't, but on my university email system, for example, this is the default naming convention for circa 50,000 people. If you consider them to be the same, one of them can't sign up to your service. How often does this happen? Probably not all that often. But if you run a popular service, it will bite a few people.


Yes. The answer to the question posed in the article is actually simple - there are 3 unique email addresses listed. The fact that email addressed to some of them may end up in the same box is neither here nor there.


> The fact that email addressed to some of them may end up in the same box is neither here nor there.

It's not an issue if they're the same, the issue is if they're different. E.g. if I'm storing bitcoin on an exchange (I know) behind the email@domain.com, then I don't want someone else to be able to register Email@domain.com and then start looking for bugs in the service or start trying to socially engineer customer support. (The local part of the email address is case sensitive as per the spec.)


You hit the nail on the head.

People can hem and haw about the specs, but at the end of the day Gmail trained most of the world to believe email addresses are case-insensitive, dots don't matter, etc., and now we have to live with the consequences. If that means somebody can't sign up for twelve accounts using case and dot variations of their Gmail, well, so be it. And if that means they come to HN to rant about how that awful site didn't follow the RFCs, then they come to HN to rant about that, but their account will be safer in spite of it.


Maybe. But, if I was legitimately john.doe@example.com and you refused to allow me to register an account just because somebody else had a similar email, I'd be pretty annoyed. There's a good chance I'd take my business elsewhere.

I've only once had my email address rejected by a website (for ending with an underscore), but I never bothered setting up another email just for them.


And if the call-center person decided that 'johndoe@example.com' should be able to reset the password for 'john.doe@example.com' because they read a lifehack article about how dots in email addresses don't matter, you'd take your business elsewhere.

No matter what, there are reasonable hypotheticals where you get angry and take your business elsewhere. The difference is my approach has you leave because you're angry at the signup page, and your approach has you leave because someone stole your stuff. I'll take my approach any day of the week.


Given that we're developers and we make the tools that the help desk people use, why would we make a help-desk password reset that can send to a different address than the one registered with the account? Couldn't there be other social engineering tricks people could pull on your help desk employees that you haven't yet imagined? Maybe they also think it's ok to drop the underscore from my email address, despite that not being a part of the gmail scheme.


What about the "I've lost that email account, can I switch it to another to recover my account?" case. Or the "I'm the legal guardian of this person and need the account control switched to my personal email", or in the business world "this employee doesn't work here anymore and as administrator I would like the account transferred to me" cases. Locking customer support systems down tightly has its own pros and cons.


I suppose I was more specific than I really should have been. More broadly, I'm trying to say that you have control over the tools and processes followed by your customer service. They can be used to combat social engineering.

For something as important as the credentials for a bitcoin exchange account, as Alex gave as his example, there should be policies specifying the reasons why account credentials can be changed and what evidence must be presented to do so. Front-line customer service reps shouldn't be flying by the seat of their pants when making difficult decisions with potentially hundreds of thousands of dollars on the line.


What happens when someone calls the CS person and tells them to type in their email address instead of copy-pasting it or whatever? If there are any bugs at all in the CS software, then it won’t be hard for the CS person to believe there is a bug they need to work around, similar to the other bugs that are already in their dashboard.

The point of social engineering attacks is that they’re innocuous requests that don’t raise suspicion, and are hard to train people against.


OK. You go get every single company, site and service on earth to deploy a 100% perfect credential-recovery system and only have it used by 100% perfect people who never ever make a mistake. And when you've finished, let me know and I'll rethink my approach to email.


You don't need to make a system that's perfect. You just need to make a system in which checks are required to make _any_ change to the credentials on an account. Once those checks are complete, it makes no difference if the change is from john.doe@example.com to johndoe@example.com or to jonathan.doe@example.com.


It’s not just password resets that are the issue. It’s also things like:

- Functions to get the account based on the email address

- Internal tools

- Stored procedures and other SQL stuff that happens outside the main code base

- Third-party integrations (Mailgun, Sailthru, Zendesk, Salesforce, etc.)

That’s a huge attack surface: if a junior dev makes even a minor mistake that nobody notices, everyone is going to lose the assets the system is supposed to protect.


> Gmail trained most of the world

At least here in the UK, gmail has high reach among techies but is very much a minority provider for normal people. Looking through the 3000ish registered accounts on the local community website I run, Hotmail, Yahoo, and particularly ISPs (btinternet.com, plus.net, sky.com) are all more common than gmail.


The best way is to err on the side of allowing a single email inbox to create multiple accounts.

Who cares if they do? It's not like that person couldn't create multiple email addresses anyway. E.g. abc@gmail.com and def@gmail.com can be the same person. You can't validate for that, except through other "gamification" systems (e.g. how HN treats new users: it disincentivizes creating multiple accounts, because why be limited on your second account when you have a good first one?).

By the way, I have used the + trick on Google to sign up for a service (and pay for it!) that wouldn't let me reuse my old account for some reason. So their relaxed validation made them money.


It's a difference in difficulty. More to the point, it's a difference in asymmetry of difficulty.

Say you run an online store. You offer USD $5 in first-time user credit.

A lot of people (even some nontechies I know) know about the "+" trick for gmail. Assuming your signup flow is easy and fast, it's very easy for those people to sign up for multiple accounts and get multiple $5 credits.

If a lot of people do that, it might significantly hurt your bottom line. Not just because you have to give away a lot of inventory; the secondary effects also suck. You end the free-$5 promotion, and now all the legitimate users who signed up during the campaign and told their friends to sign up for some free money have those friends bad-mouthing the site to them because "it didn't give me free shit the way you told me I should expect it to!" You might see sign-ups drop below pre-promotion levels, or, worse, see the people who signed up during the promotion trust and use your site less. I know trends/behaviors like this seem trivial--and they definitely would only affect a minority of users--but past a certain scale, effects like those can have a real financial impact.

Now let's say your site is "smart" about the "+" trick and doesn't let people with Gmail (or Google-federated) accounts--boy, is figuring that out a bastard--sign up multiple times. You'll lose the dubious potential business of folks who like gaming promotions. You'll still be vulnerable to people creating second email accounts and signing up using those--but the difficulty asymmetry now favors you, the vendor: making a second email account is work, and work a user probably won't do, especially if you blacklist typical temporary-email services like guerrillamail. If the promotion is large enough to entice first-time users but small enough that it isn't worth that work, you have succeeded in minimizing your loss. If the user already has a fleet of accounts for this purpose, they're probably going to take your money anyway.

Of course, there's another, more annoying scenario, which I'll mention because sites should never do it: sites that think they're being smart about the "+" trick by not storing the part of the address between the plus and the "@". This is usually done to get email campaigns (read: almost entirely spam) to show up in the user's main inbox rather than some filter purgatory. It drives users away in two main ways. First, if I'm technical enough to use the "+" trick and a filter to route mail, I'm probably annoyed enough by spam to immediately junk-flag yours or delete my account, whereas otherwise there was a small chance I'd actually have read it. Second, more than half of the sites doing this that I've audited parse and strip the address wrongly (parsing email addresses is very hard, after all), so what they end up storing could be a totally different person's address, or an invalid one. That means signups just won't work. Whatever you do, don't do this.

This doesn't only apply to first-time-user promotionals, either. It also applies to:

- Referral bonus programs.

- Services that give hand-customized products to users (think a "one per user" Etsy store with one person knitting cat dolls or something): multiple similar contacts from the same user would mean you spend a lot of time making their products--time which might be wasted, even if they paid for their products, compared with time spent making them for lots of different users and increasing your recognition/exposure.

- The same applies to tech-support contact-us forms: one user can "bogart" your support staff, clogging the queue with (legitimate or not) requests and defeating your rate limiters by using the "+" trick, making other users wait a long time for replies.

- Others I haven't thought of.


You can still buy your own domain and get as many new email addresses which pass your test as you want. So you probably need another solution to your problem.

What I've noticed several (large) services doing these days is asking for your credit card number on signup (even if they promise not to bill you).


You could. But most people won't. The ones that do this will game your system regardless; they have the time and resources to. It's about making it not worth it to buy a domain and set up MX forwarding/gmail federation or whatever, just for $5.


Many email systems have an idea of a catchall account. I can have a domain for which cestith@mydomain goes into one account and anything else @mydomain goes into a second account.


Stores that offer $5 credit usually don't sell things that cost $5. So it is nothing but a $5 discount code really.

If they are really giving away credit for free that can be used wholly then they probably need to verify your identity e.g. a $0 credit card authorisation or something.


Completely agree. If somebody wants to signup for your service as ceo@company.com and ceo+sales@company.com let them. Two distinct user accounts.


Dear sir. I enjoyed your free trial. May I have+1 another?


If you wanted to continuously have free trials, it is trivially easy to just create entirely new email addresses.


yeah, i paid less than a dollar for a super cheap domain, and created a catch-all email forwarder that lets me create any random email forwarding address that all forwards to my one email. i've used this method in the past to get unlimited free things at certain establishments that offer welcome gifts to new rewards accounts.


I replied to this same idea above: https://news.ycombinator.com/item?id=16358625


Dear user. Sure. Enjoy losing all of your preferences and data every 14 days.


My thought would be to have two distinct fields: "normalized" and "pristine" version of the user's email. Then always mail to exactly what the user typed ("pristine"), and use the "normalized" mail to prevent multiple registrations to "john.doe@gmail" and "johndoe@gmail" and "johndoe+1@gmail" and "JOHNDOE@gmail.com" and so on.

Denying registration to a suspected-duplicate seems a lot safer than mailing to a different person.
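A minimal Python sketch of the pristine/normalized split, assuming that Gmail ignores dots and case in the local-part, drops everything after "+", and treats googlemail.com as gmail.com. The allowlist, the function names, and the choice to normalize only known domains are illustrative, not anything a particular library provides:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allowlist: only domains whose dot/plus behavior we believe we know.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


@dataclass(frozen=True)
class EmailRecord:
    pristine: str    # exactly what the user typed; always send mail here
    normalized: str  # used only for the duplicate-registration check


def normalize_email(address: str) -> str:
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()  # domain names are case-insensitive
    if domain in GMAIL_DOMAINS:
        # Assumed Gmail rules: drop "+tag", ignore dots, ignore case.
        local = local.split("+", 1)[0].replace(".", "").lower()
        domain = "gmail.com"
    # Any other domain: the local-part is left exactly as typed.
    return f"{local}@{domain}"


def make_record(address: str) -> EmailRecord:
    return EmailRecord(pristine=address, normalized=normalize_email(address))


def is_duplicate(address: str, existing_normalized: set[str]) -> bool:
    return normalize_email(address) in existing_normalized
```

Every domain outside the allowlist keeps its local-part untouched, since other servers can route dots and pluses however they like.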


If you're going to do that then only do it for email addresses where you understand the local MTA conventions (like gmail). Do not do it for domains you don't know (because you cannot predict how they work).

For instance, on my email domain, david+abc@example.com, david.def@example.com, and david_ghi@example.com are all the same ('+', '.' and '_' all act like you'd expect '+' to), but david+xyz@example.com is not (it gets picked off and aliased somewhere else). Applying gmail conventions to other domains is silly and wrong.


It does surprise me how many big SAAS products with generous free trials don't check for '+'s on Gmail addresses...


You can still do a check for the normalized address for known services that allow moving the dots and adding a +something (e.g. to catch someone creating multiple fake accounts). Just be decent about it and store the user-provided address for communication purposes.


> it's not your identity system's job to deduplicate different logical addresses for the same physical inbox.

The problem is that you're then adding potential attack vectors to every single web app just to cater to the .01% of email clients that insist on implementing the email RFC exactly to spec. Not deduplicating email addresses creates an attack surface for both hard-to-spot technical issues and social engineering attacks. For example, what if your ESP at some point adds deduplication on its end, either by mistake or on purpose? Suddenly you're sending password reset requests to the wrong users.

I think you should normalize email addresses to enforce account uniqueness, both for security purposes and usability, as long as you also store a second copy of the email address exactly as the user entered it and only send email to the latter version.


At most you're allowing duplicate accounts to the same e-mail address. If you want to enforce 1 to 1 identities with e-mail accounts, you could have an issue. But in general, it shouldn't be a security concern.


> And while I could write this as one of those “falsehoods programmers believe about X” articles, my personal preference is to actually explain why this is trickier than people think, and offer some advice on how to deal with it, rather than just provide mockery with no useful context.

Thank you, so much.

I just wanted to highlight that for anyone who looked at the comments to decide whether or not to read the article.


Yeah, this is a really good article. I really wish more auth systems made use of the tripartite identity pattern.


In my 20 years of experience validating email addresses, I've found one thing that works every time without fail:

Send email to it

That is literally the only way to validate an email address. There is no regular expression or algorithm that can validate and/or deduplicate an email address.

You must simply treat every email as unique until you send an email to it and that person proves otherwise.
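A minimal sketch of the "send email to it" step, using only the standard library. Storing a hash of the token rather than the token itself is a common precaution; the in-memory `pending` dict stands in for a real table, and token expiry is left out. Both are assumptions for illustration:

```python
import hashlib
import secrets

# Hypothetical store: sha256(token) -> address awaiting confirmation.
pending = {}


def _digest(token):
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def start_verification(address):
    """Create a one-time token; the caller emails a link containing it to `address`."""
    token = secrets.token_urlsafe(32)
    pending[_digest(token)] = address
    return token


def confirm(token):
    """Return the verified address, or None if the token is unknown or already used."""
    return pending.pop(_digest(token), None)
```

Only someone who can read mail at that address ever sees the token, which is the whole point: the address is proven by delivery, not by parsing.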

That being said, this article brings up a lot of important things about confusables that everyone should definitely be aware of, especially if you're going to have public identities.


How does sending email to it prove it's not a duplicate? For example:

    john.doe@gmail.com

    johndoe@gmail.com
Both resolve to the same inbox at the user's end.


Exactly, and then the consumer clicks a link to verify their email and they have a second account, and it's up to them to put them together, or up to you to see that they already have a login cookie and combine the accounts.


A user attempting to game a system like that could easily just delete their cookie or browse in incognito mode.


well with gmail they are the same email. But with other mail applications, those could be two different email addresses. You are going to need to test all the different email services and find out which do this and which do not do this... which is no easy task


Exactly, so you're better off not accounting for all these special cases, as there's a never-ending list of email providers.

A better approach might be to limit the effect that a new account can have in the system. Hacker News, Reddit, and Stack Overflow all do this through reputation. A new account on these systems is unable to achieve much until time has been spent proving the account is actually being used, which reduces the incentive for an individual to create multiple accounts.
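A rough sketch of that kind of gating, where what an account may do depends on its reputation and age. The action names and thresholds are made up for illustration and are not taken from any of the sites mentioned:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical privileges: action -> (minimum reputation, minimum account age).
PRIVILEGES = {
    "comment": (0, timedelta(0)),
    "submit_link": (10, timedelta(days=1)),
    "downvote": (100, timedelta(days=14)),
    "claim_referral_bonus": (50, timedelta(days=30)),
}


@dataclass
class Account:
    reputation: int
    created_at: datetime  # timezone-aware UTC timestamp


def may(account, action, now=None):
    """True if the account has earned `action`; unknown actions raise KeyError."""
    now = now if now is not None else datetime.now(timezone.utc)
    min_rep, min_age = PRIVILEGES[action]
    return account.reputation >= min_rep and now - account.created_at >= min_age
```

A freshly created duplicate account can still comment, but it can't immediately collect a bonus or swing votes, so making one buys much less.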


> which is no easy task

Which is impossible; a number of people run their own email or use a small friend-run email server, and you can't possibly discern the delivery rules from the outside.


> number of people run their own

Yes. A small number is a number.

If we could go back in time and force every single implementer to follow every relevant RFC to utter perfection (and make sure all the RFCs were perfectly unambiguous), I'd be more sympathetic.

But email is fucked. The sheer number of oddball things, hacks, workarounds, deviations and other bits of mess that implementers have engaged in over the years means the RFCs should be treated as at best a loose hopeful outline of how email might in theory work.


> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

Isn't this a really dangerous game to play? Just because some major MTAs assign a class of addresses to each user doesn't make each member of that class not a unique identifier in general. Is it worth the headache to maintain a list of various email systems' policies rather than just treating them all as unique?


The email RFCs explicitly say thou shalt not interpret the localpart of an email address, unless thou art the MTA of the domain in question. Even case folding is forbidden. And the wisdom of people who work with email is... the RFCs have good advice here: don't assume anything about how the localpart is structured.

You can generally get away with treating the names as case-preserving (as distinct from case-insensitivity), and you are probably safe in rejecting quoted localparts. But beyond that, even forcibly lowercasing email addresses, is likely to cause problems.
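A sketch of a uniqueness key that follows this advice: only the domain is lowercased (domain names are case-insensitive), the local-part is compared exactly as typed, and quoted local-parts are rejected as a local policy choice. The function name is hypothetical:

```python
def comparison_key(address):
    """Uniqueness key that leaves the local-part exactly as the user typed it."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    if local.startswith('"'):
        # Quoted local-parts are legal but rare; rejecting them is a policy choice.
        raise ValueError("quoted local-parts are not accepted")
    # Domain names are case-insensitive, so only the domain is folded.
    return f"{local}@{domain.lower()}"
```

Under this key, JohnDoe@example.com and johndoe@example.com are different accounts, which is exactly what the RFCs allow the receiving server to decide.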


Like it or not, the RFCs have lost. "How Gmail does it" is now how email works in the minds of a stupendous number of people. So if Google says 'johndoe' and 'john.doe' are the same, we're stuck with the reality that 'johndoe' and 'john.doe' are the same.


It’s completely legitimate to use variations on an email address for different accounts on the same website.

Also, it’s useful to make use of +foo or varied usages of dots to create a unique email address for each site: for one thing, it’ll help if one site leaks your email address, then it’ll let you trace the origin of the leak if that email address gets unwanted email.

Finally attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.


There's a line in the Zen of Python: "practicality beats purity". If I can avoid someone filing a bug or a support request by knowing that Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct, I'm going to avoid the support request. The -- by comparison -- minuscule set of users who A) actually understand the relevant specs and B) care enough to yell at me in an HN comment are going to lose that battle every time.


> Gmail has trained people to believe a bunch of distinct (according to RFC) mailboxes actually aren't distinct

I'd be fairly surprised if your average user of gmail knew this: I know it and I use it in part because it lets me _distinguish_ different accounts on the same site. Second-guessing someone who's taking advantage of this feature is more likely to generate tech support requests than not.


Knowledge of the plus trick has gotten pretty widespread. Anecdotally, I know a lot of non-technical people that use a single +spam address to route to a spam folder.

Non-anecdotally, articles with large numbers of views/comments about the trick can be found with a quick Google search on non-techie sites like NYT/HuffPost/BusinessInsider/Buzzfeed/Pinterest/etc. Not that those are definitive, but I think knowledge of this is more widespread than you think.


If you know about gmail plus and dot addressing, and use it for verifying uniqueness, then I'll understand. I'll also probably just use mailinator to make the second and third accounts anyway, but whatever.

If you actually strip the dots and plusses from my email, and start sending stuff to my main address, then I will mark your messages as spam. You need to store the normalized and non-normalized versions of the address. Actually, you need to do this for normalizing on usernames anyway, to make sure you don't mutilate people's Arabic names or anything (Unicode-normalized cursive looks really bad; you need to preserve the original version, while keeping the normalized version around specifically for uniqueness checking).


Without questioning this line of thought, it seems like deduplicating by lowercasing and perhaps removing dots is a good choice, but stripping +suffixes seems likely to generate more user annoyance than it prevents. If I filter based on those suffixes and you send me mail and strip the suffix, I'm going to be pissed.


> attempting to deduplicate email addresses before authentication is almost as bad as lowercasing the password before checking if it matches.

I think it's not even close. You have to transmit the content of the email address to the server, since you might need to email the person. Whether you validate/sanitize/perform voodoo on it there is up to you.

You don't have to transmit the password (because one way hashing), and should never do so.


Gmail doesn't break the RFC here. They just assign multiple email addresses to a unified mailbox.


As long as there's one big school where different dots point to different people, no, it's not practical to strip the dots. Your support problem with asking a user to use the same gmail address they used to register is a lot less awkward than if the user can't use a different one.


So only perform normalization on emails whose domains are known to route "+"-trick emails to the same mailbox. Even if you just do this on @gmail.com, it removes a big swath of users that could abuse your promotionals and waste your time with multiple pseudo-accounts.

A harder question is what you should assume about people who run a "+"-tricky email service on their own domain (e.g. federated gmail) and who later switch to using a service that isn't "+"-tricky (e.g. federated gmail user switches to running their own mail server). What's your default policy: default-allow or default-deny? I suspect the answer will have to do more with the amount of potential revenue lost due to such users' likelihood to abuse the plus trick, and less about the technicals of how to address it.


Or remove incentives for having multiple accounts in the first place (even multiple public identities should be something your system handles without needing a re-login), and stop messing with emails. For one thing, your assumption that a shared inbox implies one person is wrong (I don't like it when people share inboxes and consider it toxic, but it is what it is); for another, your mitigations are futile (it's somewhere between cheap and free to point an entire domain at your one inbox, and your 'federated gmail' is easily such a case).


> Many systems ask the username to fulfill all three of these roles, which is probably wrong.

What system with any non-trivial level of use uses the text username as (1) the FK in the database, as opposed to the generated or auto-incremented ID in the db; (2) the login name; and (3) the publicly displayed "name" of the user for others to see?

Plenty of forums etc use the login name for #2 and #3, and I'm not convinced by this article that that's the wrong way to do it. I haven't ever seen a single professional product that uses the text username that a user logs in with as the actual DB-level foreign key. That's grade school level database design.


When logging in, how do you get that autoincremented ID column? Some more complex variant of "SELECT id FROM users WHERE username = $1". So functionally, yes, the username is the root identifier that pulls the very first record, which then allows other joins to occur. But you are right, the username column is not literally the key being joined on.

There is also the security issue that having the login name double as the publicly displayed name lowers the bar for a targeted attack on the site, as well as on other sites where the attacker suspects the victim may be using a similar login name. This can be particularly true in cases of harassment across platforms, which, while not a computer-science security issue, is a personal psychological security issue.
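A minimal sketch of the split being described, using SQLite from the standard library. The table and column names are hypothetical; the point is that the login name is used once, to find the integer id, and every join goes through the id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,      -- system identity: what other tables reference
    login TEXT NOT NULL UNIQUE,  -- login identity: used only at sign-in
    display_name TEXT NOT NULL   -- public identity: shown to other users
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL
);
""")


def create_user(login, display_name):
    cur = conn.execute(
        "INSERT INTO users (login, display_name) VALUES (?, ?)", (login, display_name)
    )
    return cur.lastrowid


def user_id_for_login(login):
    # The one place the login name matters: resolving it to the integer id.
    row = conn.execute("SELECT id FROM users WHERE login = ?", (login,)).fetchone()
    return row[0] if row else None


def posts_with_authors():
    # Everything else joins on the id, never on the login name.
    return conn.execute(
        "SELECT users.display_name, posts.body FROM posts "
        "JOIN users ON users.id = posts.user_id ORDER BY posts.id"
    ).fetchall()


uid = create_user("jdoe", "John Doe")
conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (uid, "hello"))
# Changing the login name touches one row; the posts are unaffected.
conn.execute("UPDATE users SET login = ? WHERE id = ?", ("john.doe", uid))
```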


> the username column is not literally the key being joined on

That's exactly the point, though. If you join on the username, then allowing emails/usernames or whatever that identifier is to be edited is very hard. How you identify the row to auth against is literally the point of a username.


Though I haven't seen too many systems where the static login name is actually hidden from the world. It seems to just end up being the canonical ID, so right there in every permalink or whatever.


> Uniqueness is harder than you think

Discord has a very interesting solution to this. They have user names and user IDs. User IDs are tied to emails, and the user's name seems to just be a random text identity for displaying to users. I assume most of their backend code uses a unique (sequential or random) integer ID to identify and talk about users, while their frontend just maps the ID to a "user name". As long as you slap account creation behind a verification email and don't mind one user being able to sign up for multiple accounts, you sidestep many of the larger problems that come from choosing user names because, in effect, you are choosing the "real" username yourself and can make whatever guarantees make writing all of your other software easy.


Blizzard also uses a style like this. While it's great for some use cases, it really sucks for others.

In Blizzard's implementation, I can't add a friend by just knowing their name, I need their id number as well, and the process for finding it isn't exactly front-and-center.


I remember when I switched to bnet ID and my tag became username#1337. People sometimes ask me if I bought the name or something like that.


I got Ambroos#2772 or #2727 (can't remember, it's been a while). Which is fun, because I have a thing with 27: it's the birthday of all of my maternal grandparents' grandchildren.


Well, with Discord you can literally pick a different number of your choosing if you pay for their subscription service: as long as the number isn't taken, you can pick whatever number you want, as long as it is 4 characters long, of course.


I really like Blizzard's solution: {$Username}#{$number} is very practical! It complicates things a bit when you want to share your contact info with a friend, or when you try to remember a specific battletag, but it solves the uniqueness problem. And to be honest, on most sites I end up using numbers at the end of my username anyway, such as "Username0037".
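A rough sketch of how a name#number tag can be allocated so that the pair is unique even when the name is not. The class name, the four-digit width, and random assignment are assumptions for illustration, not how Blizzard or Discord actually implement it:

```python
import random


class TagRegistry:
    """Hypothetical allocator for Battle.net/Discord-style "name#1234" tags."""

    def __init__(self, digits=4, rng=None):
        self.digits = digits
        self.rng = rng if rng is not None else random.Random()
        self.taken = {}  # display name -> set of numbers already handed out

    def allocate(self, name):
        used = self.taken.setdefault(name, set())
        free = [n for n in range(1, 10 ** self.digits) if n not in used]
        if not free:
            raise ValueError(f"every {name}#number is taken")
        n = self.rng.choice(free)
        used.add(n)
        # Zero-pad so the tag always has the same width, e.g. name#0042.
        return f"{name}#{n:0{self.digits}d}"
```

With four digits, each display name can be shared by up to 9,999 people before it runs out, which is the trade-off: names stop being scarce, but you need the number to find someone.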


This should be exhibit number one why you should always favor open source libraries rather than writing your own plumbing functionality, especially around authentication and authorization. The onus should be on the developer to explain why the open source library isn't a fit, rather than defaulting to 'roll your own'.

The edge cases discussed don't pop up that often unless you have lots of folks using your software or are really diligent about fuzzing and testing edge cases. If you roll your own, say, username system, you probably aren't going to fall into either of those two cases. Which means you're vulnerable.


But then again, open source does not mean it's good software, obviously. There should be some way to quickly check whether a library meets security best practices, like some sort of "vetted software" reference.


Also, using a 3rd party library for something as important as authentication because you don't know how it works doesn't sound much better or more secure.

Like storing sensitive data in the authn's session system because you don't understand encryption vs signing nor how to find out -- maybe it's time to just sit down and credentialize as a craftsman.

The authn/z systems I've used that were the biggest headaches in my life were kitchen sink frameworks trying to generalize over everyone's creature features, and they were often tied to a company/community culture of not-gonna-touch-it that only hurt users and security.


I think you should absolutely understand any third party systems/libraries you use, especially when it is as important as authentication. Using a third party component doesn't free you up to be lazy or to use it incorrectly.

My comment was stating that you should default to these types of libraries and only roll your own if you can't do what you need to, simply because they're more likely to handle edge cases that can have serious implications.

Do you do unicode normalization on your usernames? I freely admit that I don't, and wasn't aware it was needed until I read this post.


If you don't fall into any of the affected cases, what exactly is the vulnerability? The security aspect I get, and happen to agree with 100%, but if I roll my own sub-optimal username classification system for my CRUD app that gets 80 users in its entire lifetime, so what? I will certainly have learned a lot more than from a typical `npm install usernames`.


Actually, the first step in a lot of Django projects is to rip out the default config and ditch usernames in favor of email addresses for login, and attach some form of unique internal id to the "account" (or whatever billing, social IDs, etc. might be associated with).


Should that be included by default, then?


Yes, but it would make upgrades to the first version that uses it a major flag day. Which limits when it could be rolled out.


Off by default so no changes for existing users?


Probably, but there's disagreement about how to do so, and it's not hard to make the changes anyway.


I agree with everything here except the email addresses example. Yes I do want to register igor@example.com and igor+work@example.com. Those are different accounts, please don’t mess with that.


There may be very good reasons for a site operator to disallow this. I mentioned some of them elsewhere in these comments: https://news.ycombinator.com/item?id=16358625


I also employ the dot scheme for different purposes. Example: johndoe.hn@gmail.com vs johndoe.school@gmail.com, so I wouldn't want anyone aliasing them by stripping the dot.


For things that are legitimately tied to your physical identity as opposed to an email address it makes sense not to let you do this.


Hey community, shameless plug: For the purpose mentioned in the article to disallow certain usernames, I created this GitHub Repo sometime back. Feel free to submit a pull request :)

https://github.com/dsignr/disallowed-usernames
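For a list like this to be effective, the check probably has to run on a normalized form; otherwise "Admin" and the fullwidth "ＡＤＭＩＮ" slip past a list containing "admin". A minimal sketch, with a tiny hypothetical sample standing in for the real list:

```python
import unicodedata

# Tiny hypothetical sample; a real reserved list is far longer.
RESERVED = {"admin", "administrator", "root", "support", "help", "api", "www", "login"}


def is_reserved(username):
    """Check the normalized form, so case and compatibility variants are caught too."""
    key = unicodedata.normalize("NFKC", username.casefold())
    return key in RESERVED
```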


Seems I should add some more to https://github.com/flurdy/bad_usernames :)


Nice! Maybe we should merge our efforts.


> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check.

Please don't do this, lots of people (including myself) use the '+' hack to separate accounts for different contexts (business/personal, different projects/clients, etc).


I think he wasn't proposing to remove the '+' part from the stored address, but to split the address into smaller parts when doing the uniqueness check, and to check whether there's already a 'john.doe+xxx' when you register with 'john.doe+yyy'.


I realize that. What I'm trying to say is that I like to create 'john.doe+projectA@example.com' and 'john.doe+projectB@example.com' accounts with some services.

Checking for the existence of any 'john.doe@example.com'-like accounts would mean I have to register an entirely separate email account or set up (another) email forwarder/alias.


Which is just as bad, since now you need to create an entirely new email account just to have two accounts on the same service.


They're also proposing to ignore all dots (.) because Gmail does that, while a moment before saying that this behavior is different elsewhere. So they will break the experience for two people on a dot-unique mail server.



Brilliant! Does it cause you any problems?


I've been using this username for about 6 months now. No problems so far...


If you're going to allow unicode usernames then you should casefold them rather than lowercasing them before normalizing as NFKC.

You should ideally also store a second copy of the username in the original casing and normalized as NFC for display purposes, as some users care a lot about seeing their username exactly as they entered it. (And in fact not allowing this may be seen as culturally insensitive in some cases, much like not supporting unicode.) The same applies to the user's first and last name, which you can store in NFC for display purposes and casefolded into NFKC for string comparison (e.g. search) purposes.

That said, most sites limit usernames to ASCII characters so that they can be (easily) used in URLs. In this case you don't need to casefold or normalize, just converting to lowercase is enough.
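A minimal Python sketch of that two-copy approach: NFC for display, and NFKC applied to the casefolded name as the comparison key. This is a simplification of Unicode's full caseless-matching recipe, and the `Username` type is hypothetical:

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Username:
    display: str  # NFC: what the user typed, canonically composed
    key: str      # NFKC of the casefolded name: used for uniqueness and lookup


def make_username(raw):
    return Username(
        display=unicodedata.normalize("NFC", raw),
        key=unicodedata.normalize("NFKC", raw.casefold()),
    )
```

Note that "Straße".lower() stays "straße", while casefolding gives "strasse", which is why casefold is the better basis for comparison.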


> If you're going to allow unicode usernames then you should casefold them rather than lowercasing them before normalizing as NFKC.

I wanted to stay out of the Python 2 vs. 3 quagmire in this article, but it's worth knowing that in Python 3.3+, strings have a 'casefold()' method:

https://docs.python.org/3/library/stdtypes.html#str.casefold

Unfortunately, since Python 2 still has around two years of upstream support before EOL, I can't universally recommend people just use 'casefold()', no matter how much I'd like to.
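One way to hedge across both versions is a small shim that uses `casefold()` when the string type has it and falls back to `lower()` otherwise. This is a sketch, and the fallback is knowingly weaker (it misses mappings like "ß" to "ss"):

```python
def fold_case(s):
    """Casefold where available (Python 3.3+ str), else fall back to lower()."""
    casefold = getattr(s, "casefold", None)
    if casefold is not None:
        return casefold()
    # Python 2's unicode type has no casefold(); lower() is the closest substitute.
    return s.lower()
```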


These "battle hardened" articles are fascinating. It is the output of years of experience and learning from real problems. It's building the best practices guides, and building the tools to scan for the edge cases. A great read!


Worth mentioning Spotify's account hijacking problem when using unicode https://labs.spotify.com/2013/06/18/creative-usernames/


When it comes to email address normalization, it sounds like we could do with a standardized way for a domain to express normalization policies.

Could be as simple as publishing a set of regular expression substitution rules, specifying (for example):

* render to lower case (because this particular domain is case insensitive)

* drop periods (because this domain treats them like gmail does)

* drop '+' and any subsequent characters (because this domain treats them like gmail does)

* ASCII only (because mail software is old, and doesn't support unicode)

Etc.

Each domain could then publish their own rule, perhaps in a DNS txt record, and anyone needing to check if two email addresses alias to the same could run the correct checks.
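A sketch of what applying such a policy might look like. The record format (`v=norm1; lower; nodots; noplus; ascii`) and the rule names are entirely hypothetical, since no such standard exists, and fetching the TXT record itself is left out:

```python
# Hypothetical record format: "v=norm1; lower; nodots; noplus; ascii".
KNOWN_RULES = {"lower", "nodots", "noplus", "ascii"}


def parse_policy(txt):
    """Parse a hypothetical normalization-policy record into a set of rule names."""
    parts = [p.strip() for p in txt.split(";") if p.strip()]
    if not parts or parts[0] != "v=norm1":
        raise ValueError("unrecognized policy record")
    rules = set(parts[1:])
    unknown = rules - KNOWN_RULES
    if unknown:
        # Refuse to guess: an unknown rule means we can't know what aliases what.
        raise ValueError(f"unknown rules: {sorted(unknown)}")
    return rules


def apply_policy(address, rules):
    """Return the canonical form of `address` under the domain's published rules."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    if "ascii" in rules and not local.isascii():
        raise ValueError("this domain only accepts ASCII local-parts")
    if "noplus" in rules:
        local = local.split("+", 1)[0]
    if "nodots" in rules:
        local = local.replace(".", "")
    if "lower" in rules:
        local = local.lower()
    return f"{local}@{domain.lower()}"
```

A domain that publishes no rules would simply get an empty set, and its addresses would be compared as typed.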


Some users capitalize their email address, and they expect to see it capitalized.

I think a better solution would be to use a case insensitive collation on the database for the email column.

If the user changes the capitalization of their email, treat it like any other email change (validate the new email via email token)
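A minimal illustration with SQLite, whose built-in `NOCASE` collation gives a case-insensitive UNIQUE constraint while keeping the stored capitalization. Note that SQLite's `NOCASE` only folds ASCII letters; in PostgreSQL, the `citext` type fills a similar role. The table and function names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE applies to the UNIQUE index and to comparisons on this column,
# while the stored value keeps whatever capitalization the user typed.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE COLLATE NOCASE)"
)


def register(email):
    """Insert a new account; False if a case-variant of `email` already exists."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO accounts (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def stored_email(email):
    row = conn.execute("SELECT email FROM accounts WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None
```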


My strategy is to think of a name so unique that no one else would think of using it. The one on my HN account I originally cooked up in Nov. 2014, and I never had to extend it with numbers to get it accepted on various forums (yes, most of them were fine with changing the nick). My biggest gripe is that since YT changed their username system in October of that year, my most popular channel is stuck on an old username despite being a few months older.


Aside from the author's library, django-registration, what other similar libraries for Python and other languages have taken all or some of this into consideration?

Excellent read by the way. Many things I have never considered or even worried about before.


> 3. Public identity, suitable for displaying to other users

Many sites-- like HN-- may not even need that. If you have system and login identity you can just display "dingus" as the name of every single user and the system should still work the same.


I think without display names having discussions would be difficult, as people find some identifier useful to follow along. Even if it's a pseudonym, it's hard to build a sense of community without being able to distinguish those in your community. Or am I misreading you?


4chan and other imageboards manage to have discussion fine without unique identifiers.


It's a different kind of discussion, though. There is value to each type of threading, but no one-size-fits-all approach--as evidenced by the fact that 4chan lets you opt out of anonymity.


I don't follow the initial premise.

>Well, it’s easy until we start thinking about case. If you’re registered as john_doe, what happens if I register as JOHN_DOE? It’s a different username, but could I cause people to think I’m you? Could I get people to accept friend requests or share sensitive information with me because they don’t realize case matters to a computer?

Just this month we fixed this issue by using a citext column in postgres. So yes, it is easy. Maybe I'm missing an edge case here?

https://www.postgresql.org/docs/9.1/static/citext.html


I assume that's not a standard datatype in all RDBMSs, and the article seems to focus on Django, so I think the author is speaking about ORMs (of course, you can generally also define custom validations in most ORMs, so even a lower() check would help in many cases).


The problem is not hard to fix from a technical standpoint, but from a practical standpoint it is impossible to fix due to breaking too many sites that were created without case insensitivity and that likely contain conflicts.


So citext just compares a lower case value of the string, unless I'm missing something?

If so, the rest of the article covers in great detail all the other edge cases :-)


It’s easy if you thought about it before you have users; people didn’t and then need to ensure that a fix doesn’t break something.


Yeah. We switched to CITEXT for email addresses after 10k users, and had to go through quite a complex process to merge the accounts of about 30 users who had signed up duplicate accounts with email addresses varying only in case. It was a major PITA.


Good for you!

Now, how did you solve the other problems mentioned?


The entire concept of usernames that are unique and permanent is stupid and even "cruel". The reality is that a relatively small handful of privileged early adopters get good usernames that match their identities, and everyone else gets screwed. These identifiers then act like tattoos that you got a long time ago and are stuck with for the rest of your life: people end up reminded every day of a sport they can no longer play due to an injury ("hockeystar") or loves lost ("iheartjessie"), attached to a joke that is no longer funny or to a thought that they found adorable as a 13 year old (when you are legally asked to "choose a username": a modern era coming of age scenario) but which adults find inane, or to a nickname that turns out to mean something different to some people than they realized and that they now can't change.

The reality is that there are nearly eight billion people on this planet and they live for upwards of a century. You are simply deluding yourself if you think it is reasonable to build a system with unique, permanent usernames. Nothing in the real world works like that, including trademarks. And it just helps enforce the very problem that people try to trust usernames and then get tricked by people who sniped usernames that are tied to other people's well-known identities (leading to abused "verified" badge systems and legal challenges and expensive hostage scenarios... it just sucks).

And for what? To make it easier to hand-type a URL? Does anyone even do that? I am super technical and I barely even do that in 2018, as if nothing else there are too many websites in existence to remember all of their one-off URL schemes. Like almost everyone, I either use the site's built-in search feature or I do a search on Google to find people, and let a combination of page rank and personalized results guide me to the right destination. Some web browsers don't even show URLs anymore!

Here is a great example of where it is completely insane: Facebook. There is absolutely no good reason for that website to have usernames for regular users, and they frankly shouldn't have usernames for businesses either. It isn't even clear to me that the app--which most users are using, not the website--even has a way to show people's usernames, which means this is an identifier which somehow everyone knows must be chosen and must be unique and is nigh-unto permanent but which somehow is also simultaneously meaningless but is also a horrible point of contention? What?

I am lucky. I spent a bunch of time in 1994 to select a username, and despite being 13, I was mature enough to come up with something that wouldn't ever come to cause me complex problems. People ask me what it means, and it essentially doesn't mean anything: it has only a positive connotation to me when I hear it, it is entirely neutral, and it had no existing usage I could find. Yet, I also still got screwed, as I am semi-famous, and everyone knows me as this username. I have kids who look up to me enough to want to take my name as a show of support and I have to essentially be the big bad asshole about it because in a world of unique and permanent usernames, people then assume the kid is really me. On the other side, I have been asked to rename myself by moderators of various forums as they couldn't believe the real saurik got an account on their site, and it was "confusing" people.

And so in the end we all have to deal with the worst-case scenario anyway: unless you do nothing but constantly sign up for random sites rumored to be interesting (which I seriously tried to do), you will eventually need a way to prove who you are on multiple sites and tie those identities together. And for most users... as in virtually all "normal users", that moment comes as soon as they use just two websites, as their username was probably something like jay.freeman.178: everything even remotely interesting to them was taken a decade earlier by literally a different generation of humans, so they let the website automatically generate one.

In a world where everyone is having to solve the worst-case problem anyway, every site should just have numbers as unique identifiers, at most have some kind of trust score for degrees of separation on the site (so you can get a feeling for "is this the saurik that I met?"), and everyone should be trained "names don't matter and if you see someone with that name it doesn't even slightly mean that they are the same person you met last week".


> And for what?

So that you can be identified? (I'm not talking about identification in a mathematical sense, but in an informal conversational sense, which is what usernames are actually used for.) The whole point of a username is that it is the most humanly convenient way to represent a user in text in the context of a certain site.

Because the alternative would be to have thirteen saurik in the same thread debating a topic and you would have no way of distinguishing them. Avatars are an attempt to fix that but it sucks and is bloated for many scenarios.

Sites that do allow you to change your username break conversations where people refer to each other by username (Stack Overflow comments are a really common and annoying example).

Yes, it is annoying when you don't get your first pick but it truly is not a big deal and it solves a real problem.

jay.freeman.178 is an excellent username. You are not your username.


People can change their legal names, I see no reason why they shouldn't also be able to change their user names.

The problem of broken continuity in forums and similar networks can be mitigated via dynamic username lookups (e.g. how Facebook does `@` mentions, though I have also seen some forums do this), inline quoting (as message boards often support), and nested replies (HN, reddit, etc.). Granted, there will still be occasions when references slip through the net, but we humans have a remarkable ability to deduce the context of the written word even when it doesn't read perfectly.


I have not seen dynamic username lookups work well in practice, though, in the sense of the system making them convenient enough to actually be used consistently. A different approach might be to leave the name as is in past conversations but use the new name for new interactions, with some way of showing the other name: xxx (formerly yyy) on new posts and yyy (now known as xxx) on past posts, as people commonly do manually in the real world (and on Facebook). Not perfect, and it quickly gets complicated.
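A minimal Python sketch of the "formerly / now known as" idea, assuming each user record keeps a dated history of display names (the `User` class, `name_at`, and `author_label` are hypothetical, not any forum's API). Each post is labelled with the name in effect when it was written, with the other name in parentheses:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    user_id: int
    # (effective_from, display_name) pairs, oldest first
    names: list[tuple[datetime, str]] = field(default_factory=list)

    def name_at(self, when: datetime) -> str:
        """Display name in effect at `when` (the first name if `when` predates it)."""
        current = self.names[0][1]
        for effective_from, name in self.names:
            if effective_from > when:
                break
            current = name
        return current

    @property
    def current_name(self) -> str:
        return self.names[-1][1]


def author_label(user: User, posted_at: datetime) -> str:
    """Label a post's author: 'old (now known as new)' or 'new (formerly old)'."""
    then, now = user.name_at(posted_at), user.current_name
    if then != now:
        return f"{then} (now known as {now})"
    if len(user.names) > 1:
        return f"{now} (formerly {user.names[-2][1]})"
    return now
```

Storing the history instead of overwriting the name is what makes both labels possible; the cost is a lookup against that history for every rendered post.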

I'd argue that a username does not have the same function as a name. My name is not an identifier and most have not chosen their name - which is also why (I believe) the first name change is free of charge where I live. In most cases you can also create a different account. Depending on the type of service this might not be desirable but in others it is exactly what you would want.


Slack handles this quite well actually. I know a lot of people dislike Slack for a lot of valid reasons, but they handle a few things right. If I type a message on Slack saying "hey, @alice, what's up?" then "@alice" gets replaced with the person's real name "Alice". When I send the message, the "Alice" bit in the resulting message becomes a link that will open the user's profile. Additionally, if "Alice" ever changes her name OR username, the name updates accordingly.
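A minimal sketch of how that kind of behaviour can work: at send time, `@handle` is replaced by a token holding a stable user ID, and the current name is looked up only when the message is displayed. The `<@U001>` token format is loosely modelled on Slack's mention markup, but the IDs, the `USERS` table, and both functions are hypothetical:

```python
import re

# Hypothetical user directory keyed by stable IDs; names can change at any time.
USERS = {"U001": "Alice", "U002": "Bob"}

# Stored mention token, loosely modelled on Slack's <@USERID> markup (an assumption).
MENTION_TOKEN = re.compile(r"<@(U\d+)>")
TYPED_MENTION = re.compile(r"@(\w+)")


def encode_mentions(text: str, handle_to_id: dict[str, str]) -> str:
    """At send time, swap each known '@handle' for a token holding the user's ID."""
    def replace(match: re.Match) -> str:
        user_id = handle_to_id.get(match.group(1).lower())
        return f"<@{user_id}>" if user_id else match.group(0)  # leave unknown handles alone
    return TYPED_MENTION.sub(replace, text)


def render(stored: str, users: dict[str, str]) -> str:
    """At display time, replace each token with the user's *current* name."""
    return MENTION_TOKEN.sub(lambda m: users.get(m.group(1), "unknown user"), stored)
```

Because the stored text contains only IDs, a rename shows up in every past mention the next time it is rendered, without rewriting old messages.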


I agree with you about the identification part. I was wondering whether we could have a user-friendly system of usernames plus "something else" that allows usernames to be non-unique while still solving the identification problem.

A possible implementation would be to let users attach a private "nickname" to any username they want to identify uniquely, visible only to them. For example, since I'm talking with you now, I could add "user with whom I discussed identities and usernames" to your username, and this (or a short version of it) would be shown next to your username from then on.
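A minimal sketch of such private labels, assuming accounts already have stable numeric IDs (the `PrivateLabels` class and its methods are hypothetical). Keying labels by ID rather than by username means they survive renames, and keying them by viewer keeps them private:

```python
from collections import defaultdict


class PrivateLabels:
    """Notes that one user attaches to another, visible only to their author."""

    def __init__(self) -> None:
        # viewer_id -> {target_id: label}
        self._labels: dict[int, dict[int, str]] = defaultdict(dict)

    def set_label(self, viewer_id: int, target_id: int, label: str) -> None:
        self._labels[viewer_id][target_id] = label

    def display(self, viewer_id: int, target_id: int, display_name: str) -> str:
        """Render the target's name as seen by the viewer, with the private label if any."""
        label = self._labels.get(viewer_id, {}).get(target_id)
        return f"{display_name} [{label}]" if label else display_name
```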

A more automated way would be to create a unique image for each user based on the content they have posted or when they created their account; it's less personal but requires less effort from the user, and I'm sure such systems exist on many sites for generating avatars. Obviously, in these machine-learning times we could do something much better.
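One well-known non-ML version of this is an "identicon": hash something stable about the account and turn the bits into a symmetric pixel grid. A minimal sketch, assuming the seed is a hypothetical string combining the user ID and sign-up date; it returns rows of characters instead of drawing pixels:

```python
import hashlib


def identicon(seed: str, size: int = 5) -> list[str]:
    """Return a left-right symmetric size x size grid of '#' and '.' cells.

    Each cell in the left half (plus the middle column) is switched on or off
    by one bit of the SHA-256 digest of `seed`, then mirrored to the right.
    """
    bits = int.from_bytes(hashlib.sha256(seed.encode("utf-8")).digest(), "big")
    half = (size + 1) // 2  # columns filled before mirroring
    rows = []
    for r in range(size):
        left = ["#" if (bits >> (r * half + c)) & 1 else "." for c in range(half)]
        right = left[: size - half][::-1]  # mirror, skipping the middle column if size is odd
        rows.append("".join(left + right))
    return rows


for row in identicon("user:4821|2018-02-12"):
    print(row)
```

Because the picture depends only on the seed, two accounts with the same display name still get different images, while one account keeps the same image everywhere. A seed built from posted content would instead change whenever that content does.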


The former battle.net (now Blizzard App) does what many games do and has the form "Username #XXXX", where the xs are digits. So you can be Frank #0001 and I can be Frank #0002. Steam allows you to tag people on your list with user created descriptions. There are definitely applications out there that make this kind of user experience a priority, I feel like the gaming world is ahead of the curve on this point.
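The "Username #XXXX" scheme boils down to making only the (name, number) pair unique. A minimal sketch, assuming four-digit discriminators picked at random from those still free for that name (real services differ in the details; `assign_tag` is hypothetical):

```python
import secrets


def assign_tag(name: str, taken: set[str]) -> str:
    """Give `name` a random free four-digit discriminator and record it in `taken`.

    Display names may repeat freely; only the full 'Name#NNNN' tag is unique.
    """
    free = [n for n in range(10_000) if f"{name}#{n:04d}" not in taken]
    if not free:
        raise ValueError(f"no discriminators left for {name!r}")
    tag = f"{name}#{secrets.choice(free):04d}"
    taken.add(tag)
    return tag
```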


Steam also allows you to change your publicly visible username more or less at will.

There's one "account name" which you pick at sign up, use to log in, and can never change (as far as I know), but you can then pick a "user name". The "user name" is what gets shown in any interaction with other users (forums, profile page, chat, in games, etc) and IIRC it doesn't even need to be unique.

Tagging people on your friends list is a great feature to match, since the tags remain when the tagged user changes their user name, which some people do quite often (whether for a joke or just because they got bored of the old one).

It really seems like a great system and it'd be nice if other systems offered the same degree of flexibility.


Steam has some poorly thought out shit; it has four identifiers, three of them unique:

1. Username, used to login, not visible elsewhere, unique and security related.

2. SteamID, an ID number that is visible elsewhere and used fairly often in their API; not too different from the username other than being public

3. UrlID - by default a "steamcommunity" link to the user's page is their SteamID, but it can optionally be edited by the user to a custom url. This ID is a globally unique namespace. For example, https://steamcommunity.com/id/gaben

4. Display name ... Yup, the display name. Arbitrarily editable.

Having that many things is dumb. Having an editable url component is dumb (it shoulda just been the steamid forever).

All in all, they're not a good example.


A combination of username, SteamID and display name is completely understandable in my opinion. Users shouldn't be forced to log in with a randomly generated ID, but with something they can remember, and having a separate display name lets the login name stay private.

The UrlID can be beneficial, since users can choose an easily spellable identifier for the URL in case they often need to share the link by voice.

It seems like an optimal system, except that the UrlID probably serves a use case that doesn't come up that often. But it still doesn't really hurt anyone: without the UrlID, the URL would use the SteamID, which does nothing to help people remember it. So why is it poorly thought out if it benefits some use cases without making anyone else worse off?


And as a user, that system is _utterly terrible_

"I want to add you as a friend so we can play games together, what's your name" Mrguyorama + random junk I didn't choose

It's user unfriendly


Would you rather be xXTheRealMrGuyoramaXx or mrguyorama_123 instead? I really don't see how it's that different. It solves the problem of first come first serve, without forcing people to come up with the random meaningless bits themselves.


For Battle.net, you can just give out your email address and that works fine.


But now the onus is on other people to create a name for everyone they might interact with regularly. So instead of each person creating one name to denote themselves, each person is now creating multiple names to denote everyone else.

And I don't really see how this is functionally any different to adding digits to the end of your normal username.


In Steam it is actually the case that a user can select their own display name (distinct from the account name used to log in) and change it at any time (and there's very few restrictions on what you can set it to, the system doesn't care if every single one of your friends has the same display name).

This function is separate from the "tagging" feature, which is helpful for keeping track of those friends who frequently change their display names and avatars. It's also possible to view the list of previous display names a given user has had, at least if you're friends with them.


If you're referring to the tagging on Steam, it's in addition to the regularly displayed username. If you're not, I guess I'm not sure what you're talking about.


Discord does this as well, plus you can change your public username on each server.


It is battle.net again


Glad to hear it, had no idea


It's actually Blizzard Battle.net, now.

I wish I was making this up. Maybe they're trying to get their chat app acquired by Google.


Asymmetric keypairs, where your nickname is associated with your public key. It's the only reliable way to do it across multiple sites. But then you have a keypair to protect.


> A possible implementation would be to allow the user to give a "nickname" to add to usernames he wants to identify uniquely that would be visible only for them. For example since I talk now with you I could add to your username "user with whom I discussed identities and usernames" and this (or a short version of it) would be shown next to your username from now on.

That sounds a bit like SDSI & SPKI's nicknaming functionality. Entities were identified by keyhashes, but you could (and in practice would) give them nicknames or use others' nicknames for them.
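In code, the SDSI-style idea can be sketched as a private table that maps nicknames to hashes of public keys, so people are compared by key rather than by the spelling of a name. This is a simplified illustration, not SDSI itself: the key bytes are placeholders, and real use would also verify signatures made with the matching private key, which needs a cryptography library and is omitted here (`key_id` and `LocalNames` are hypothetical):

```python
import hashlib
from typing import Optional


def key_id(public_key: bytes) -> str:
    """Global identifier for a principal: the SHA-256 hash of its public key."""
    return hashlib.sha256(public_key).hexdigest()


class LocalNames:
    """One user's private nickname table: nickname -> key hash.

    Two users can pick the same nickname for different people without any
    conflict, because only the key hash is ever compared.
    """

    def __init__(self) -> None:
        self._by_name: dict[str, str] = {}

    def bind(self, nickname: str, public_key: bytes) -> None:
        self._by_name[nickname] = key_id(public_key)

    def resolve(self, nickname: str) -> Optional[str]:
        return self._by_name.get(nickname)

    def matches(self, nickname: str, public_key: bytes) -> bool:
        """Is the holder of `public_key` the one I call `nickname`?"""
        return self._by_name.get(nickname) == key_id(public_key)
```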


> Because the alternative would be to have thirteen saurik in the same thread debating a topic and you would have no way of distinguishing them.

In the real world, people almost always go by their first name, and we don't have this problem. When two people in a social circle have the same first name, we don't turn and say "well everyone has to use their whole name, always, now." Rather, we adjust our names (usually someone gets a nickname, or goes by their last name).

The steam system allows multiple people to have the same display name and it works just fine. Sure, people can troll with it when they join your tf2 server (and then you kick them off).

The blizzard system also works great. The unique identifier is there, if all other forms of attempting to add a friend fail, but mostly you end up working contextually.


In the real world we have faces/voices/personalities etc. and an insane amount of context that we associate with a person, in most cases two people can have and use the same name and there won't be any confusion because the context is trivial. In written form this information is greatly reduced. This gets increasingly apparent on the internet where the number of participants can be huge and the time spent with each one minimal.

Your examples are games, which greatly limit interactions both in number of people and in time. I don't jump in and offer help in a game that ended 4 years ago. Everything is already in a very specific context.

It is often desirable for the username to be consistent across the entire site, if you recognize the username you remember past interactions and conversations which gives you a better context. This is a valuable part of a community.

Sure, there are places where you deliberately want to maintain pseudo-anonymity, where you'd get a new username in each discussion. But that's something else.


In the real world you can see the other person.


My name is Cory. Despite sharing that name with 120k other people in the U.S., occurrences of misidentification due to having the same name as somebody else are very rare and usually resolved by merely using my last name instead.

If your app's users report problems with identifying people, just allow users to add more specificity to their username. "People always get me confused with this other user 'chairdude', can I change my display name to 'armchairdude'?"

If thirteen 'saurik's want to have a fun time and create a confusing discussion thread together, so be it.


Mohammad Mohammad has over 2200 profiles on Linkedin.


And unique usernames don't solve that problem either. If I know a guy named "Mohammad Mohammad" and I want to find him on LinkedIn, I already don't know his unique username should he have one.

If I know him in person and ask for his LinkedIn, he can just send me a link to his profile. If Mohammad is advertising his LinkedIn presence, again, he can provide a link to his profile. Adding in unique usernames doesn't help much in this case.


I've actually not signed up for services at all because the couple username variants I tried were already registered.

I was also recently annoyed at being forced to switch to a new system for my credit card; it's a unified system shared with all their other card and banking customers, and it still uses a "username" (instead of an email) as the login, so of course my name was taken. I decided to just append some random characters, and then realized I could just generate my entire username and have been doing that since, when I don't care about identifying myself to others. My password manager saves it, so it really doesn't matter to me.


> then realized I could just generate my entire username and have been doing that since, when I don't care about identifying myself to others. My password manager saves it, so it really doesn't matter to me.

Even better, if they have your email and use it for password recovery, you can basically turn it into a two step authentication by not saving the password and using password recovery every new time you need to log in. Though, that can get annoying if their password recovery takes a while to send.


Creating two-step authentication for yourself, but allowing attackers to have one-step authentication... interesting concept.


If you are making up random usernames on a banking site, attackers aren’t going to know your username. So it is two-step.

This is not a great idea if you have a public profile connecting your username to your email because someone can hack your email.

But you not knowing your password doesn’t hurt your security as far as I can tell.


Assuming an attacker can't know your information is not a good idea.

Your login information can be obtained via keyloggers, network sniffing, phishing scams, malware, malicious employees, and all sorts of other methods.

This is why two-factor authentication is so important: it helps prevent your account from being compromised in the event that your username and password are.


The part I don’t get is how not knowing your password makes the situation worse. The password recovery mechanism exists whether or not you use it every time you log in.

The way I see it, not knowing your password removes some potential threats around managing that password incorrectly, at the cost of increasing the risk of losing access to your account if the recovery mechanism doesn’t work.


It doesn't make the situation worse, you're the only person suggesting that.

It offers some extra security, but very little. It's the digital equivalent of locking your back door but not bothering with the front door.


>It doesn't make the situation worse, you're the only person suggesting that.

The comment I originally responded to seemed to think so.


> If you are making up random usernames on a banking site, attackers aren’t going to know your username. So it is two-step.

The username isn't supposed to be a secret, social engineering will most likely be very easy.


Agreed, but this is true whether or not you’re using parent’s plan of not knowing the password.


As your sibling comment correctly inferred, this would hopefully be done after setting some sufficiently hard random password on the service in question. At least that's how I've seen it described by those here that have mentioned they do it.

A sufficiently large random unknown password is actually significantly less likely to be brute forced than the service itself is to be exploited.


I'm toying with the idea of an SMTP proxy service so you could have an SMTP server on your smartphone. I would like to use Let's Encrypt certificates so my service would be just a dumb pipe. Then you could register with a random email address for every service. SMTP would handle signal outages, as messages would eventually be delivered, or one could set up a secondary SMTP server in DNS. I would like to offer subdomain handling, so every address would be on a different subdomain. That would make blocking easier.

So you would have addresses like:

foo@mjlptle3sq.emailproxy.net

It would only be for receiving.

I know that there are lots of options already: 20-minute mail, user+whatever@gmail.com, or just a catch-all. But this could at least provide something close to end-to-end encryption.
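A minimal sketch of the per-service subdomain idea, assuming the proxy keeps a table from each random label to the service it was handed to (`AliasBook` and its methods are hypothetical; `emailproxy.net` is just the example domain from the comment). Blocking a service then means refusing mail for its labels:

```python
import secrets
import string

BASE_DOMAIN = "emailproxy.net"  # example domain from the comment, purely illustrative
ALPHABET = string.ascii_lowercase + string.digits


class AliasBook:
    """Maps random subdomain labels to the service each address was given to."""

    def __init__(self) -> None:
        self.service_for: dict[str, str] = {}
        self.blocked: set[str] = set()

    def new_address(self, service: str, local_part: str = "foo", length: int = 10) -> str:
        while True:
            label = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if label not in self.service_for:  # retry on the (unlikely) collision
                break
        self.service_for[label] = service
        return f"{local_part}@{label}.{BASE_DOMAIN}"

    def accept(self, recipient: str) -> bool:
        """Accept mail only for labels we issued and have not blocked."""
        domain = recipient.rsplit("@", 1)[-1].lower()
        label, _, rest = domain.partition(".")
        return rest == BASE_DOMAIN and label in self.service_for and label not in self.blocked

    def block(self, service: str) -> None:
        """Refuse all future mail to every address handed to `service`."""
        self.blocked |= {label for label, s in self.service_for.items() if s == service}
```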


The idea of having easy to use single-use email addresses is good, but I don't think it's practical to run an SMTP server on a smartphone (you need the relevant ports open and forwarded through various NATs) and email anonymisers are already banned from signup on many sites, so *.emailproxy.net would quickly be, too.


it'd be cool if you could use an ip as the mail domain, and simply having an (easy to use) app running locally allowed that IP to receive email.


If you're serious about this, consider running UUCP as the mail transport. It can be tunneled through SSH easily (or have TLS applied to it with something like stunnel), allows either end to initiate a transfer of data (assuming open ports), handles dynamic network addresses easily, and will likely be a much smaller drain on mobile phone batteries.

Plus it can allow sending and receiving of email and files, if you so choose.


I will look into it.

I'm serious but my todo list is quite long.

I was thinking about a dedicated application on the phone for it instead of a regular mail client, mainly to provide an easy interface for managing a large number of accounts, banning hosts, etc. The app could then also use a service like Google's Firebase Cloud Messaging, which would wake the app so it could fetch the message. I hope that is fast enough that the sending SMTP server would not time out.


Backup MX servers do this every day. One of the common uses of differing MX priorities is to provide a backup MX server that accepts mail on behalf of your mail server if it is overloaded or unavailable. This has been used in the past to provide reliable mail transport to mail servers that are not always connected.

One downside has traditionally been that backup MX servers were generally much less stringent in their connection level spam filtering/blocking (since the downstream server is generally responsible for that, and they may be a backup server for multiple downstream servers), so it became common for spammers to send directly to lower priority mail servers to take advantage of this and bypass a lot of that active filtering at the eventual destination. Expect a lot of spam to queue up.

In your case, you actually would control the backup MX and the eventual destination (if it indeed is a separate SMTP server), so that's less of a problem. You could just put a pretty harsh timer on the queued mail, and throw it away after 24 or 48 hours. Then again, you could probably do all this almost identically by replacing the SMTP server run on the client device with an IMAP client, and just have delivery end at your server.
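The "harsh timer" could be as simple as the sketch below, assuming the backup MX keeps queued messages with their arrival times (the `QueuedMessage` type and `purge_stale` are hypothetical; real MTAs have their own queue lifetime settings, such as Postfix's `maximal_queue_lifetime`):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_QUEUE_AGE = timedelta(hours=48)  # the comment suggests 24 or 48 hours


@dataclass
class QueuedMessage:
    recipient: str
    received_at: datetime


def purge_stale(queue: list[QueuedMessage], now: datetime) -> list[QueuedMessage]:
    """Keep only messages held for less than MAX_QUEUE_AGE; older ones are dropped."""
    return [m for m in queue if now - m.received_at < MAX_QUEUE_AGE]
```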


It seems that the EnvKey app/service does this. You only register with an email address, and if you need to log in to the app, it will email you a one time password to use.

Nowhere near as inconvenient as it sounds; it is the sort of service that you rarely log in to, mainly when setting up a new server or adding a third-party service to the list of environment variables on your server.

I can see that for a service that you would use several times a day (Twitter etc.) then it would be a major PITA.
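A minimal sketch of that emailed one-time code flow, assuming codes are stored only as hashes, expire after 15 minutes, and are consumed on the first attempt (all names here are hypothetical, and actually sending the email is left out):

```python
import hashlib
import hmac
import secrets

CODE_TTL_SECONDS = 15 * 60  # assumption: codes are valid for 15 minutes


class EmailLogin:
    """Passwordless login: email a one-time code, accept it at most once."""

    def __init__(self) -> None:
        # email -> (SHA-256 of the code, expiry time); the code itself is never stored
        self._pending: dict[str, tuple[bytes, float]] = {}

    def start(self, email: str, now: float) -> str:
        code = secrets.token_urlsafe(16)
        self._pending[email] = (hashlib.sha256(code.encode()).digest(), now + CODE_TTL_SECONDS)
        return code  # a real service would email this, never return it to the browser

    def finish(self, email: str, code: str, now: float) -> bool:
        entry = self._pending.pop(email, None)  # consumed by any attempt, right or wrong
        if entry is None:
            return False
        expected, expires_at = entry
        candidate = hashlib.sha256(code.encode()).digest()
        return now <= expires_at and hmac.compare_digest(expected, candidate)
```

Consuming the code on any attempt limits an attacker to one guess per email sent; the trade-off is that a typo forces the user to request a new code.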


> I can see that for a service that you would use several times a day (Twitter etc.) then it would be a major PITA.

Well, even then, it depends on how long sessions last in combination with your browser cookie policy. How often do you authenticate your HN account?


True. Long running sessions would help. There are those that would argue it is a security risk though.

Even EnvKey that I mentioned above has a session cookie of some sort - I can usually use the app for several days after logging in, even if I close the app - but after that I am prompted to instigate the email with my unique login key.


It would be nice if we eventually got to a point where control of whether a password was even allowed, how long your session cookies lasted, and the ability to list and invalidate all existing sessions was as common and expected as a password reset system.
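A minimal in-memory sketch of those session controls, assuming server-side sessions keyed by random tokens (`SessionStore` and its methods are hypothetical; a real deployment would persist this and tie it to the cookie layer):

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    token: str
    user_id: int
    created_at: datetime
    user_agent: str


class SessionStore:
    """Server-side sessions that users can list and revoke."""

    def __init__(self, lifetime: timedelta = timedelta(days=30)) -> None:
        self.lifetime = lifetime  # could instead be a per-user preference
        self._sessions: dict[str, Session] = {}

    def create(self, user_id: int, user_agent: str, now: datetime) -> str:
        token = secrets.token_urlsafe(32)
        self._sessions[token] = Session(token, user_id, now, user_agent)
        return token

    def is_valid(self, token: str, now: datetime) -> bool:
        s = self._sessions.get(token)
        return s is not None and now - s.created_at < self.lifetime

    def list_for(self, user_id: int) -> list[Session]:
        """What a 'where you're logged in' page would show."""
        return [s for s in self._sessions.values() if s.user_id == user_id]

    def revoke_all(self, user_id: int) -> int:
        """Log the user out everywhere; returns how many sessions were removed."""
        doomed = [t for t, s in self._sessions.items() if s.user_id == user_id]
        for t in doomed:
            del self._sessions[t]
        return len(doomed)
```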


Although I disagree that it's two step authentication in the general use of the term, I actually built this type of authentication flow into Remarkbox (https://www.remarkbox.com)

It works really well for most users although it does have some quirks.


How is it not, even in the general use of the term? Instead of site username and site password plus a separate token to a previously agreed upon authenticated service (whether phone or email account), it's site username and site account email (hopefully hidden), and a token sent to a previously agreed upon authenticated service.

If your account name and associated email is known, it's not really better than a username and password (except that it's delegated to what should be one of your strongest accounts that you protect more diligently), but if the email is not generally known for that account name then it's extra identifying information that must also be known to access the service account.


I’m annoyed at banks that require my username to have numbers in it but only let me pick an 8 character password.


That's because your password needs to be "easily" entered through the phone call service.


High quality speech recognition is ubiquitous in phone services these days so that isn't a good reason. Besides, if I want to set a 16 char password and am willing to enter that on a keypad, what's the problem?


Don't worry, you can enter 16 characters, but it will only care about the first 8; everything else can be random when you log in.

Also, if you have an a, b, c, or 2 in your first character, you can use any of them in place of that character index to log in.
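One plausible explanation for the behaviour described above, offered as a guess rather than anything the bank has confirmed: the password is checked in a keypad-friendly form, keeping only the first 8 characters and collapsing each letter to its digit on the common E.161 phone keypad layout. `keypad_form` and `weak_check` are hypothetical:

```python
# Letter groups on the common E.161 phone keypad layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}


def keypad_form(password: str, max_len: int = 8) -> str:
    """Hypothetical stored form: first `max_len` chars, letters collapsed to digits."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in password.lower()[:max_len])


def weak_check(stored: str, attempt: str) -> bool:
    """Accepts any attempt whose keypad form matches, as the comments describe."""
    return keypad_form(stored) == keypad_form(attempt)
```

If passwords are limited to letters and digits, every character collapses to one of ten digits, so eight characters leave at most 10^8 = 100,000,000 distinct passwords, however long the one you typed was.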


As banks have merged and eaten each other over the years, I've gone from four different websites to log into to one website with four different usernames.

There's no way to merge the online accounts, even though the banking accounts are merged and I can see all of the financial information from each no matter which login I use.


> Facebook [...] nigh-unto permanent

I found a way to change it and it still worked last time I tried.

You must install Facebook Messenger. I am using iOS, don't know if the Android version is the same.

Keep in mind that your old username will no longer lead to your profile once changed. For me this was exactly what I wanted but for some they might want to not change their username after all due to this.

In the Facebook Messenger app, tap your profile picture in the top left corner. This brings you to a screen with the title "me". Right under your picture it will say "username m.me/yourusername" where yourusername is your actual username. Tap on your username and select "edit username".

Once you've changed your username in the Facebook Messenger app your identifier on Facebook itself will change also so now when people go to your Facebook profile on facebook.com in their web browser they will see your new username in the address bar.

Figuring this out was actually very difficult, as most information online claimed that your username could not be changed.

Because it seems to me that Facebook also doesn't want people to change usernames, I ask that everyone who reads this keep it secret. HN pages usually don't rank highly on Google, so mentioning it here shouldn't matter too much.

If any Facebook employees read this, please either

a) Make it easy for others to find out by updating official documentation, or

b) Make it easy to change from the main facebook.com application, or

c) Forget that you saw my comment.

As I'd like to be able to change my username in the future as I have done in the past.


You should be able to change your username in the desktop site settings: https://www.facebook.com/settings?tab=account&section=userna...


That does mostly work (I only have a FB page with a fictitious name, which I change from time to time), but it leaves a trail of the old names here and there.

For example, when I send a photo via Messenger, it is attributed to my (n-2)th name.

One could probably infer something about the state of Facebook's architecture from further study.


> In a world where everyone is having to solve the worst-case problem anyway, every site should just have numbers as unique identifiers...

ICQ did that. Though it still led to interesting results, because lower numbers were thought to be more valuable, and people were buying/selling those.

Perhaps random numbers with the same number of digits, or UUIDs, may work without such issues. :)
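Both suggestions are easy to sketch with Python's standard library, assuming a hypothetical set of already-issued IDs: fixed-length random numbers (so no ID is shorter than another) with a retry on collision, or random version-4 UUIDs (`random_member_number` and `random_uuid` are hypothetical names):

```python
import secrets
import uuid


def random_member_number(taken: set[int], digits: int = 9) -> int:
    """Draw a uniformly random ID with exactly `digits` digits, retrying on collision."""
    low, high = 10 ** (digits - 1), 10 ** digits
    while True:
        candidate = low + secrets.randbelow(high - low)
        if candidate not in taken:
            taken.add(candidate)
            return candidate


def random_uuid() -> str:
    """Alternative: a random (version 4) UUID, unique in practice without coordination."""
    return str(uuid.uuid4())
```

The retry loop never terminates if every ID of that length is already taken, so a real system would size `digits` generously.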


I remember my ICQ number, not used since 2001. I remember my CompuServe number too (101611,1220), not used since 1996. I remember our first phone number as a kid, 818641, which we changed in the early 90s. I recall friends' numbers too, and a bank account I had from 1993 to 2004.

However my slashdot number, which I've had nearly 20 years, I know nothing more than it begins with 2.

The modern numbers I remember are my mobile phone number, my wife's, and my passport numbers (the phone numbers because we've had them for over a decade, and all of them because I have to write them on forms so often). The only other numbers that spring to mind are my staff number at work (used in various forms, had since 2003) and my bank numbers (needed to log on).

If you use a number a lot, you learn it. If you don't (like usernames which are saved) you forget it. I can barely remember my credit card pin as I use contactless so much, but muscle memory seems to work there.


> I remember my compuserve number too -- 101611,1220

I was (am?) 72167,3530. Does that make my ID older or newer than yours?


Yes


> Perhaps random numbers with the same number of digits, or UUIDs, may work without such issues. :)

Then people will be buying/selling UUIDs which are easier to pronounce or memorize. People will always consider patterns more valuable.


Indeed. When I was hunting for ICQ numbers (see my other post how I did that) there were 2 golden aspects:

1) Short number.

2) Repeating digits, e.g. containing only 2 or 3 distinct digits.

One of these was great, but both? Jackpot.

You could add a third factor: keypad pattern. It never occurred to me to use the keypad to remember the number, TBH, but IIRC one of my friends did care about that. That pattern option in Android actually frightens me, I kid you not; I'm afraid I'll forget the pattern!

Of my own numbers, discounting the leading 1 (I personally did not care about that one, but I know others did), one ended in 0s and the other contained only 2 different digits, one being twice the other. Extremely easy to remember.


I noticed that in many gaming related systems, this is already kind of the case. Blizzard appends random numbers to the end of each username in order to avoid name clashes, Steam lets you change your displayed username (although your account is still accessed through the old one), and so on.

UUIDs were also my first idea, but I have the feeling that sharing them (i.e. to invite a new friend) would be cumbersome. I wonder if a new system akin to what3words.com could help there.
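A what3words-like scheme for account IDs could be as simple as writing the number in base N, where N is the size of a wordlist. A minimal sketch, assuming a hypothetical 16-word list (what3words' own algorithm is proprietary and not reproduced here):

```python
# Hypothetical tiny wordlist; a real scheme would use thousands of short,
# distinct, easy-to-spell words.
WORDS = ["apple", "brick", "cloud", "delta", "eagle", "flint", "grape", "harbor",
         "ivory", "jelly", "koala", "lemon", "mango", "noble", "otter", "piano"]
INDEX = {word: i for i, word in enumerate(WORDS)}


def id_to_words(n: int, count: int = 3) -> str:
    """Write n in base len(WORDS), most significant word first."""
    base = len(WORDS)
    if not 0 <= n < base ** count:
        raise ValueError("ID out of range for this many words")
    parts = []
    for _ in range(count):
        n, digit = divmod(n, base)
        parts.append(WORDS[digit])
    return ".".join(reversed(parts))


def words_to_id(text: str) -> int:
    """Inverse of id_to_words."""
    n = 0
    for word in text.split("."):
        n = n * len(WORDS) + INDEX[word]
    return n
```

With only 16 words, three words cover just 4,096 IDs; a list of 2,048 words would give 2,048^3 ≈ 8.6 billion.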


You mean modern Blizzard does that.

Old-school Battle.net had a limit of 16 alphanumeric characters, with underscores allowed, and that was that. At least for Warcraft 3, that was the case from 2002 for a very long time (until last year, when they started allowing fancy symbols in usernames).

You also had to login at least once every 3 months or Blizzard purged your account.


> You mean modern Blizzard does that.

Of course, yes. I doubt this problem came up with the original Battle.net accounts :-)


Many people used @hotmail.com addresses back in the day (or other free email providers) to register their ICQ number. Heck, you could even search for people on ICQ who were using @hotmail.com addresses. Eventually those @hotmail.com addresses expired, and you could re-register them. Once you did that, you could recover the ICQ password, and bingo. The old UINs (ICQ's name for its numeric user IDs) were often no longer in use (my memory is vague on whether I ever encountered one in use; I think it happened once, and I struck up a friendship with the one person who messaged me). I traded many of these UINs away to friends and even told some friends about the trick. I never sold them. Eventually the supply dried up.

The weakness lies partly with ICQ: they made it easy to find all these people using @hotmail.com email addresses and even displayed them. Sure, you could opt out of this feature (IIRC it was called "yellow pages" or something similar), but still.

The other part of the weakness is exactly the issue of domain squatting, username squatting, email squatting, or whatever you want to call it. I understand Microsoft wanted to save space on their email servers back in the early '00s, but former usernames should be frozen, and mail to them either bounced or silently discarded to /dev/null (or whatever the Windows equivalent is).

Blizzard's WoW has the rule that you can only take a username from an inactive account, where an inactive account is one that did not play the previous expansion. That's their compromise. To be fair, it's not as if people use WoW usernames for password recovery.
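A minimal sketch combining the two policies above, assuming names are compared case-insensitively: names of deleted accounts are kept as tombstones and never reissued, while names of accounts inactive since before a cutoff date may be reclaimed, a date-based stand-in for the WoW expansion rule (`NameRegistry` and its methods are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Account:
    name: str
    last_active: datetime
    deleted: bool = False


class NameRegistry:
    """Deleted accounts leave tombstones; long-inactive ones can lose their name."""

    def __init__(self) -> None:
        self._holders: dict[str, Account] = {}  # keyed by case-folded name

    def can_register(self, name: str, inactive_before: datetime) -> bool:
        holder = self._holders.get(name.casefold())
        if holder is None:
            return True
        if holder.deleted:
            return False  # tombstone: never reissue, so old mail and links can't be hijacked
        return holder.last_active < inactive_before

    def register(self, name: str, now: datetime, inactive_before: datetime) -> Account:
        if not self.can_register(name, inactive_before):
            raise ValueError(f"{name!r} is not available")
        # An inactive previous holder would need a new name (not modelled here).
        account = Account(name, last_active=now)
        self._holders[name.casefold()] = account
        return account

    def delete(self, name: str) -> None:
        self._holders[name.casefold()].deleted = True
```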

As for using numbers as username: that is what UNIX does under the hood, it is what Facebook does under the hood as well, it is what Blizzard's WoW does under the hood as well, and what T9 converts to as well, and ICQ did as well in contrast to MSN. Turns out people are lousy at remembering a bunch of numbers. So they resort to 26 character system of letters, or 36 character system of letters plus numbers. (Some services are more or less strict.) So, no, using numbers as human-usable UUID is not a solution but using it under the hood is totally OK.


The article does a very intelligent job of disentangling system identifier, login, and display names (and helpfully links to a discussion of the tripartite identity pattern†), which obviates a lot of what you describe here.

http://habitatchronicles.com/2008/10/the-tripartite-identity...
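For anyone who hasn't seen the pattern, here is a minimal sketch of the three-way split, assuming a hypothetical in-memory store (`Account` and `AccountStore` are illustrative names, not from the article or any library). The internal numeric ID never changes, the login can be changed, and the display name has no uniqueness constraint at all.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Account:
    user_id: int        # system identifier: internal, immutable, never reused
    login: str          # sign-in identifier: unique, private to the user
    display_name: str   # public name: not unique, can change at any time


class AccountStore:
    """Hypothetical in-memory store; a real system would use a database."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._by_id: dict[int, Account] = {}
        self._by_login: dict[str, int] = {}

    def create(self, login: str, display_name: str) -> Account:
        if login in self._by_login:
            raise ValueError(f"login {login!r} is already taken")
        account = Account(next(self._ids), login, display_name)
        self._by_id[account.user_id] = account
        self._by_login[login] = account.user_id
        return account

    def rename(self, user_id: int, display_name: str) -> None:
        # Display names carry no uniqueness constraint, so no check is needed.
        self._by_id[user_id].display_name = display_name

    def change_login(self, user_id: int, new_login: str) -> None:
        if new_login in self._by_login:
            raise ValueError(f"login {new_login!r} is already taken")
        account = self._by_id[user_id]
        del self._by_login[account.login]
        account.login = new_login
        self._by_login[new_login] = user_id
```

Everything else (posts, purchases, reputation) would reference `user_id`, so renaming or changing the login never orphans any history.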


Yes, but it does not bother to motivate why people should use "tripartite identity" (and even that linked article doesn't really try to make an argument); then it goes on to talk about why uniqueness is hard at a technical level, leading people to respond with comments like "I just included a library in my stack or chose the right type for my PostgreSQL column that fully handled the case and Unicode mapping, so this wasn't a big deal".
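For reference, the "case and Unicode mapping" part usually boils down to comparing a normalized form of the name. A minimal sketch using only Python's standard library, assuming NFKC normalization followed by `str.casefold()` is the comparison you want (the function names are hypothetical). Note that it does not catch cross-script lookalikes such as Cyrillic "а" vs Latin "a"; that needs a separate confusables check.

```python
import unicodedata


def canonical_username(name: str) -> str:
    """Map a username to the form used for uniqueness checks.

    NFKC folds compatibility characters (e.g. full-width letters) into
    their ordinary forms; casefold() is a more aggressive lower() that
    also handles cases like German 'ß' -> 'ss'.
    """
    return unicodedata.normalize("NFKC", name).casefold()


def is_available(name: str, taken: set[str]) -> bool:
    # `taken` holds canonical forms of existing usernames; the spelling the
    # user originally typed can still be stored separately for display.
    return canonical_username(name) not in taken
```

With this, "Straße" and "STRASSE" collide, as do full-width "ＪＡＲＥＤ" and plain "jared".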

The real problem is that unique and permanent usernames serve as tattoos (which people may later find humiliating or depressing), disadvantage latecomer non-technical users (who will almost never get a good username, and almost never the same username on two websites), and lead to weird problems with assumptions people make about what usernames even mean (that they are a signal of identity) that are simply not true.


How would you handle making them non-permanent, though?

Expiring hotmail addresses have been problematic in many cases, and any unique lookup string will eventually be stored somewhere by someone and assumed still valid later on.

It's not even fully solved for phone numbers, even though everybody has known since long before our own lifetimes that they can expire and be reassigned.


> every site should just have numbers as unique identifiers

Do you realize how impractical it is for users to remember these numbers for every site? Until we get to the stage where every non-English-speaking user and their grandma finds a password manager convenient, this proposal won't even pass the laugh test.


Don't underestimate the ability of non-technical people. Issuing membership numbers has been standard organisational practice for centuries and continues to work today for everything from airline loyalty schemes to international sports associations.

Case in point, I operate a service that uses numeric identifiers. Going by the helpdesk queries, our users are more likely to get their email address wrong than their membership number.


How often do your customers need access to these membership numbers, and how many such membership numbers do they deal with in their lives? If every service they used did the exact same thing as you do, do you think the error rate would still be so low?

And how often do they forget their numbers (rather than getting them wrong)?


These questions might make sense to me if the notion of a membership number was some strange new construct that we should adopt warily until proven to work.

It's amazing the results you get from treating your users like competent human beings rather than idiot cattle.


> These questions might make sense to me if the notion of a membership number was some strange new construct that we should adopt warily until proven to work. It's amazing the results you get from treating your users like competent human beings rather than idiot cattle.

All this snark just to dodge obvious questions about your approach?


Your demands for additional data have crossed into crude sealioning. I already offered the key data point that represents our concerns, viz. that users are more reliably remembering their membership number than an email address.

Speculation on the potential scalability of this approach seems absurd given that membership numbers have been successfully used by organisations of every scale for centuries.


> Speculation on the potential scalability of this approach seems absurd given that membership numbers have been successfully used by organisations of every scale for centuries.

The concern is not with the scale of the organization, but with the number of organizations a user is a member of, which has exploded since website accounts appeared.


It really hasn’t. The number of organisations that wish to track people for marketing purposes has exploded, with all the unnecessary account creation that goes with it. That is not membership, and these are not organisations worth your engineering expertise.


Well, before the Internet I would never have needed an account in a club race management system; I would probably just have used a bunch of unwieldy papers.

While I don't disagree that there's a lot of useless account creation, I still have at least an order of magnitude more useful online accounts than I or my parents ever had memberships offline.


I would propose that maybe they can't remember their membership number, so they are forced to look it up on a piece of paper. Whereas they believe they know their email, and maybe make typos.

I know that whenever I have to contact a service for which I only have a long numeric identifier, I have some reference handy to make sure that I don't fuck it up.


> Your demands for additional data have crossed into crude sealioning. I already offered the key data point that represents our concerns, viz. that users are more reliably remembering their membership number than an email address. Speculation on the potential scalability of this approach seems absurd given that membership numbers have been successfully used by organisations of every scale for centuries.

Right, sure.


> Do you realize how impractical it is for users to remember these numbers for every site?

That's really not an issue. It isn't much easier to remember that I'm "John28161" on a busy site. Websites have been offering "I forgot my username" functionality for ages, so as long as you remember your e-mail address, you're fine. Also, just let users bookmark their personal profile page (foo.site.com/user/83755567565) easily and it's solved even if cookies are deleted. Apps won't have a problem either way.


You might as well have them just log in with their email address in that case, which brings you back to the original "cruel username" problem.


The number is not confidential, so people would not have to remember it, nor would they have to use a password manager.

Digits are much easier to spell over the phone than a mixture of letters and digits.

People are used to identifying themselves with a sequence of digits. If you're dealing with the tax office, the water company, the electricity company, or whatever, you get asked for your customer number, your meter number, your reference number, and so on, and usually these consist mostly of digits. Sometimes these identifiers are way too long, or several different identifiers are unnecessarily used, or the same sequence of digits is confusingly referred to by several different names ("customer reference", "account number", whatever), but those are separate problems.


> People are used to identifying themselves with a sequence of digits. If you're dealing with the tax office, the water company, the electricity company, or whatever, you get asked for your customer number, your meter number, your reference number, and so on, and usually these consist mostly of digits.

It's not a level comparison though. You don't need those numbers often, and when you do, you have all the resources you need with you, and it's not urgent. You generally need those when you call customer service, which you only do when you're at home and have time to spare. But if every email, messaging/social networking app, bank, credit card company, shopping site, etc. required me to go hunting for a 10+-digit number, I wouldn't have any trouble understanding it, but I would get fed up pretty fast.


That's why we need biometric hardware everywhere, and should use its data as a login, not as a password. Bio data is mapped onto a long UUID, and the user just sets whatever username he wants displayed. We even have a way to smoothly transition from no hardware to 100% coverage: just allow manual UUID input on systems where biometrics are unavailable. E.g. you have a face/ID pair on the phone, a fingerprint/ID pair on the laptop, and just the ID on the PC.


Serious question: what happens when you lose your finger? Or you have an accident and your face gets mangled? Or a ball hits your eye and you lose it?

You have to remember your UUID? Probably it would be more like a file than a string.


You keep your UUID as a backup. The fingerprint system on my phone still has an option to log in with password, the fingerprint is just a convenient, faster alternative. It's the same here, you keep the UUID in a folder with other sensitive documents and when you lose your finger, you fish it out, log in with it, and register another finger.


You'll have to remember multiple UUIDs, one per service. Now that I think of it, that will be rather cumbersome. And bio data has to be editable if it is to be usable at all, i.e. you can add and delete entries (fingers, eyes, etc.). Damn, it is harder than it seems. Anyway, bio data as a login should be implemented; we just need to think through exactly how.


And gain the ability for anyone to link and track your identity across all services you use. The NSA will be pleased :)


How? The UUID will be different per service. Anyway, it won't be worse than the current practice of using a single email address (or two or three) as a login everywhere, or a Facebook ID as a login everywhere.
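One way "different per service" could work without the user memorizing anything extra is to derive each service's identifier from a single secret. A minimal sketch, assuming the device already holds a stable secret (turning a biometric reading into stable bytes is a hard problem this does not address); `per_service_id` is a hypothetical name, and the idea is similar in spirit to OpenID Connect's pairwise subject identifiers.

```python
import hashlib
import hmac


def per_service_id(master_secret: bytes, service: str) -> str:
    """Derive a stable, service-specific identifier from one secret.

    Hypothetical scheme: HMAC-SHA256 keyed with the user's secret, over the
    service's name. The same secret and service always give the same ID,
    while different services see unrelated values, so they cannot join
    their records on the identifier alone.
    """
    digest = hmac.new(master_secret, service.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]  # truncated to 128 bits for readability
```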


Sure, I can see this working in principle in a distant future. Not in the current world or the foreseeable future though.


No. Just a token like a yubikey. Biometrics doesn't help.


Especially when you want to print this on your business card, or spell it out over the phone, or type it into a friend’s contacts app.

100595964940551841549 isn’t exactly usable for that. Even phone numbers, such as 01573 0677867 are much shorter, and they follow a pattern.


Would a number half as long as that not be sufficient?


That number is the value Google currently uses to identify accounts. 100001702987293 (Facebook’s) is still too long.

And in either case, someone will end up with huge, unwieldy numbers. By using ASCII identifiers, they'd be roughly half as long, though. Facebook’s number is just "WvN1+8Yd" in base64.
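The base64 figure can be checked with Python's standard `base64` module. This sketch assumes the number is encoded as the minimal number of big-endian bytes (the helper names are hypothetical). A 15-digit ID becomes 8 characters and a 21-digit one becomes 12, because each base64 character carries 6 bits versus about 3.3 bits per decimal digit.

```python
import base64


def int_to_base64(n: int) -> str:
    # Big-endian bytes, using the minimum number of bytes needed for n.
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return base64.b64encode(raw).decode("ascii")


def base64_to_int(s: str) -> int:
    return int.from_bytes(base64.b64decode(s), "big")


print(int_to_base64(100001702987293))  # WvN1+8Yd
```

The "+" and "/" characters are awkward in URLs and when read aloud; `base64.urlsafe_b64encode` swaps them for "-" and "_", though mixed-case strings are still harder to spell over the phone than digits.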

And at that point, why not allow people to choose identifiers? John.Doe.123 is still much more readable than 100001702987293


I wonder what happens if you still get to choose a username, but the service appends at least 3 digits (randomly; the first doesn't get 001; in fact let's say nobody gets a number below 101), and does some mangling especially for really short usernames (appends something or prepends something).

This gives every 'username' an effectively unlimited space of identifiers via the number suffix, and it trains users to realize that 'person who controls the account with username X' is not necessarily the same person as 'person who I know on other sites as username X'.
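A minimal sketch of that proposal, assuming underscore padding as the "mangling" rule for short names and a retry limit of 20 before widening the suffix (both are my assumptions, and `tagged_username` is a hypothetical name). It uses `secrets` so the suffixes aren't predictable.

```python
import secrets


def tagged_username(chosen: str, taken: set[str], min_len: int = 4) -> str:
    """Append a random suffix of at least three digits (never below 101).

    Short names are padded (an assumed rule), and if a suffix length looks
    crowded the suffix grows by one digit, so the space never runs out.
    """
    base = chosen + "_" * max(0, min_len - len(chosen))
    digits = 3
    while True:
        low = 101 if digits == 3 else 10 ** (digits - 1)
        for _ in range(20):
            candidate = f"{base}{low + secrets.randbelow(10 ** digits - low)}"
            if candidate not in taken:
                return candidate
        digits += 1  # this length looks crowded; widen the suffix space
```

Display could still show just the chosen part, with the full tag used wherever an unambiguous reference is needed.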


Discord (and I believe Blizzard/Battle.net) does this. Your site-wide username would be something like Example#5436 but on individual servers you can be tagged with @example


When Google introduced account names in G+ (separate from gmail account names), they started by doing this. But I suspect it was a fairly big turn-off for people who are used to getting their usernames.


Alternatively, the service provider could implement a procedural jingle generation system to produce a short, catchy customized song to help them recall the number.

0118999881999119725....3


12 digits would be more than enough.
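The arithmetic behind that: IDs 0 to N-1 need as many digits as the largest one has. A quick sketch (`digits_needed` is a hypothetical helper) using exact integer arithmetic rather than floating-point `log10`:

```python
def digits_needed(accounts: int) -> int:
    """Smallest number of decimal digits that gives every account its own ID.

    IDs run from 0 to accounts - 1, so the answer is the length of the
    largest ID written out in decimal.
    """
    return len(str(max(accounts - 1, 0)))
```

Ten digits already cover one ID per person on Earth, and 12 digits give 10**12 IDs, more than a hundred per person. That assumes dense allocation, though; IDs drawn sparsely at random (to make them hard to guess) would need a few more digits.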


Counterpoint: my grandma could easily memorize her own phone number, and all her friends' phone numbers. ICQ also survived for years with numeric identifiers.

Though yes, I agree that once the number of accounts goes above 1 or 2, a password manager will be required.


Exactly why I said "for every site".


Password managers really do need to be forced on people, not only because of unsafe password issues but because of username issues listed in the OP as well.


At the risk of going a bit on a tangent, what password manager would you recommend for my grandma?


Use a paper notepad. Generate passwords by opening a dictionary at random for 3 words, with a random number at the end.
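The software equivalent of flipping to random dictionary pages is drawing words with a cryptographically secure RNG. A minimal sketch, assuming a tiny hypothetical word list purely for illustration (a real list would have thousands of words) and a trailing number in the range 0 to 99:

```python
import secrets

# Hypothetical tiny word list for illustration only; each extra word adds
# log2(len(WORDS)) bits, so a real list should be much larger.
WORDS = ["apple", "harbor", "violet", "tractor", "pebble", "lantern",
         "meadow", "copper", "biscuit", "falcon", "glacier", "walnut"]


def passphrase(n_words: int = 3, words: list[str] = WORDS) -> str:
    # secrets.choice draws from the OS CSPRNG, unlike random.choice.
    chosen = [secrets.choice(words) for _ in range(n_words)]
    number = secrets.randbelow(100)  # assumed range: 0-99
    return " ".join(chosen) + f" {number}"
```

A physical dictionary isn't a uniform source (people open near the middle, skip obscure words), so the pen-and-paper version will be somewhat weaker than the same number of truly random words.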

It’s not as good as, say, 1Password but it’s more likely to get used. Combine it with the browser or OS level password manager. It’s good enough for grandma, definitely better than “kitten4” that she’s currently using everywhere.

On a tangent, stereotyping this as “grandma” is a bit unfair. Most of my colleagues are college educated males in their 20s, some of them developers. And their passwords are rubbish, with no password manager, and no 2fa.


Aside from how painful that sounds, paper notepads can easily get lost. And if she's out and wants to check stuff on her phone (or trying to check her bank account at my aunt's home, or whatever), is she supposed to carry it all around and risk getting it stolen? If that's the implication, I'd rather she just have kitten4 at that point.

(And re: the grandma thing: it's nothing specific to grandmas, it's because the moment you suggest your audience is "college educated developers in their twenties" as in your case, people throw the notion of UI/UX out the window and recommend you suggest they compile their own kernel first. It seems you just can't win.)


If we make a crude risk assessment, it is way more likely that her account will be randomly hacked by a botnet if she has "kitten4" as a password than someone actively stealing her purse to get her passwords. And if the notebook with passwords was stolen/lost, she would at least know it and be able to take preventive measures.

For most people, writing (good and unique) passwords down in a notepad is a way more secure system than having the same bad password for every account.


You only need unique passwords and a username.

A botnet guessing the random "kitten4" password for a random user account is about as likely as having your purse stolen for the passwords on that note. FWIW "m" is almost a secure password on a root account with an SSH that allows password authentication, even if you allow brute force attacks. That's empirically speaking; obviously it's going to fail in the end, but I hope you get my drift.


> FWIW "m" is almost a secure password on a root account with an SSH that allows password authentication

This is very counter-intuitive. Is the idea that guessing both the username and the password together is much harder than guessing the password when you already know the username?

In the kitten4 example, I would guess most botnets are working from a list of usernames/email addresses that they got from leaks.


Thanks, I misunderstood GP about how kitten4 was used.

> Is the idea that guessing both the username and the password together is much harder than guessing the password when you already know the username?

No; to be clearer, no one in the last 6 years has ever tried "m" as a password on my root accounts.

I feel very strongly that there is too much stigma around passwords; kitten4 is a nice password if you use it only once.


We are obviously talking about a different stereotype. My “grandma” already keeps various notepads - recipes, appointments, address books. And she never has an urgent need to check her bank account while at Auntie Rita’s. As such, this fits her needs and workflow.


Yeah. In fact most likely, she's already written down "kitten4" in a notepad somewhere, because she doesn't trust herself to remember. So asking her to use a slightly longer password is not a massive change.


> Use a paper notepad.

That's what my grandpa does. After failing to find his gmail address in it, he went through the "forgotten password" process. Then, after needing it the third time, we found the old password in the notebook, which was now wrong...


3 words isn't nearly enough. Typically you'd want at least 6-7, or ideally 8-9.


Depends on what you're protecting. Try not to lose sight of the idea that security doesn't exist in a vacuum.


xkcd's classic "correct horse battery staple" has about 44 bits of entropy, because its four words are selected uniformly at random from a fairly large pool of words.

I can assure you that the average user wouldn't get above 15 - 20 bits with self selected words. That's often worse than most current passwords.
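The entropy figures in this subthread follow from one formula, valid only when each word is drawn uniformly and independently at random: bits = number of words × log2(list size). A quick sketch (`passphrase_entropy_bits` is a hypothetical helper):

```python
import math


def passphrase_entropy_bits(list_size: int, n_words: int) -> float:
    """Entropy of n words drawn uniformly and independently from a list."""
    return n_words * math.log2(list_size)
```

Four words from a 2048-word list give 44 bits (the xkcd case); six words from a 7776-word Diceware list give about 77.5 bits. Self-chosen words don't meet the "uniformly at random" assumption, which is why they land far lower.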


Get her an iPhone or iPad, and have her use the built-in password manager.


Anything using cheaper/more common hardware? so the user doesn't have to buy new hardware and switch ecosystems just for the sake of being able to manage passwords? (i.e. anything PC/Android?)


Is your grandma REALLY tied to some ecosystem?

Is she a Visual Studio Code developer? Does she need to manage Docker containers?

Security does require new hardware because iOS is leaps-and-bounds better than any other system.

There is no other option. Nothing else comes close.


I mean, if you agree that the only option for mass adoption of password managers is to get people to shell out $$$+ for new hardware and switch ecosystems, I rest my case.


Not sure what your case is?

Were you expecting security in broken systems like Android? Instead of forcing security onto a broken system, just avoid that system?

The first step of security: stop using Android.

And you haven't explained why your grandma is tied to an ecosystem. I'm honestly asking if she's a developer or not?

What is her use case? Why does she need to be on a specific platform?


As developers we might be used to paying 100s of euros (or signing up to a contract to effectively do the same over time) on a phone, but the point is $grandma may not be willing to spend even 20% of iPhone budget (and definitely not replacing it by the time OS updates end)


You don't need to be a developer to be tied to an ecosystem. Maybe she wouldn't want to lose her Candy Crush purchases?


Is your grandma Grace Hopper? Or someone more stereotypical?


Not quite Grace Hopper, just Hedy Lamarr. :-P

(Wouldn't the question only make much sense in just one of those cases...? Not sure if I'm missing anything.)


There's a case made that statements like "so my grandma can use it" are (unintentionally) implicitly ageist and sexist -- Grace Hopper worked in computing until her death at age 85.


KeePassXC on desktop, KeepShare on mobile.


KeePassXC doesn't support synchronization though?


I use KeePassXC and the Android app and just email the database to myself if I update it. Not very elegant, but I couldn't think of an easier way.

I tried using Google Drive to sync it up, but Drive is useless for this - it doesn't open the file using the right intent on Android ("file type not recognised" or something similar it says, this used to work as well) and the Drive website makes it a pain to upload an updated file even from the desktop using Chrome.


Emailing your database to yourself after every change sounds... very painful. And error-prone.

In my case, I use KeePass 2 and KeePass2Android with Google Sync and it works decently well (I would recommend you try this). I would never recommend it to non-technically-minded folks though.


Nah, doesn't hurt at all :) The db doesn't change often so isn't a big deal.

Sync looks to be for Google-domains/business only. In fact Wikipedia says it has been discontinued! I used to sync over owncloud and that worked pretty well, but the provider shut down and I haven't gotten round to setting another up.


I'm confused, would you mind clarifying? What is for Google-domains/business only and has been discontinued? I'm using the software I mentioned with a regular @gmail.com account and it syncs fine with my Google Drive. I don't have gSuite/a business account/anything else.


You weren't talking about this? https://en.wikipedia.org/wiki/Google_Sync - I guess not. I think the sibling poster cleared it up though, the app I'm using is pretty old and doesn't integrate well any more in Android, there's a newer Keepass app that works with Drive natively.



KeePass2Android supports Google Drive natively. Open that app instead of opening the specific file from Drive.


Ahh! I have been using Keepass Droid, since I was using Keepass v1 files. For a long time Ubuntu LTS didn't have a good v2 client. A while back I upgraded my database to v2 on Ubuntu but stuck with KP Droid on Android. Maybe time to change app, thanks.


What kind of synchronization? Keeping the database in Nextcloud/Dropbox should work fine.


Interesting... that actually works fine? What happens if you make an edit to your password database on your phone, and then make another independent edit on your PC, and then they both get a chance to synchronize? Do they both persist, or do you lose one?


I can't honestly remember, it either lets you choose which file to keep or creates a copy. Might be that Dropbox and Nextcloud even behave differently. If I edit the file on mobile, I make a point of triggering the synchronization right after to avoid the problem.

If you're on Android, Keepass2Android [1] is an excellent app that implements the input with a special keyboard. This avoids risking your password via the clipboard. It even comes with a no-network-permission version!

[1] https://play.google.com/store/apps/details?id=keepass2androi...


I use Dropbox. I haven't actually tried that case, since I rarely change my database from my phone. That said, I have had conflicts between two computers, but KeePassXC's built-in merge tool has fixed those nicely.
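To illustrate the question above in generic terms, here is one simple way two diverged copies could be reconciled without losing either edit. This is a hypothetical per-entry "newest modification wins" rule, not a description of how KeePassXC's merge actually works; `Entry` and `merge` are illustrative names, and deletions are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str
    password: str
    modified: float  # last-modified time, e.g. a Unix timestamp


def merge(local: dict[str, Entry], remote: dict[str, Entry]) -> dict[str, Entry]:
    """Per-entry merge of two copies of a password database.

    Entries are keyed by a stable ID (e.g. a UUID). An entry present in only
    one copy is kept; when both copies have it, the more recently modified
    version wins. Without deletion records ("tombstones"), an entry deleted
    on one side would be resurrected by this sketch.
    """
    merged = dict(local)
    for key, entry in remote.items():
        if key not in merged or entry.modified > merged[key].modified:
            merged[key] = entry
    return merged
```

A whole-file sync tool such as Dropbox can't do this, since it only sees two versions of an opaque file; that's why it falls back to keeping a conflicted copy.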


Dashlane. Synchronization included, easy to use, works both offline and online.


Being unable to move past an old username without having to give up your history is rather uncomfortable too.

During my freshman year of college a particular sandwich shop hired a spokesperson who shared my first name. One thing led to another, and the name of that shop became a lasting nickname.

Unfortunately that spokesperson turned out to be quite a monster, leaving me in a bit of an awkward position on sites that don't allow username changes.


Don't worry about it, Jared. No one thinks you're a creep, and if you create a new account, you're only sacrificing internet points, which are worth even less than bitcoin.


> And for what? To make it easier to hand-type a URL? Does anyone even do that?

This dilutes your otherwise excellent point. URLs are great when done right, and lots of people prefer them to the site's search functionality. But that is entirely orthogonal to identities, which barely ever need to show up in a URL. (Unless treated as permanent and uniquely attached to physical people, which we agree they should not be.)


> Here is a great example of where it is completely insane: Facebook. There is absolutely no good reason for that website to have usernames for regular users, and they frankly shouldn't have usernames for businesses either.

Uh, Facebook doesn't have usernames, and hasn't had usernames for as long as I've been able to be a member.

There's an option to grab a unique identifier for your personal page, so that you become https://www.facebook.com/identifier, but it's completely optional, it's just a vanity thing.

Same for groups, they can grab a unique identifier, or stick to their auto-generated id.

Same for businesses.


In Facebook I can go to Settings > General to find an option to create a username. Been like this for as long as I can remember.


Yes, but they're not required, you can't use them to login, and they're not displayed to other users. They're only used to create a nicer URL for the static link to your personal page.

Actual usernames typically do all of the above.


Well, it adds an air of authenticity when everything lines up right as you judge the validity of a page. Let's assume for a moment you were the type to look for support for your bank on Facebook, and you searched for "MyBank Facebook".

First two results are:

facebook.com/mybank

facbeook.com/mm48283df884

Which is the "real one" based just off this information? How do you know that when you message one of them you're getting the actual bank and not a fraudster?

It's not just vanity, people do check these things, not all of them savvy enough to continue researching. Just peek at "safe browsing tips" you'll see from tech rags online and it's pretty clear we do a poor job of educating people about proper vetting online, so you get people instilled with dogmatic understandings of security. ("The URL clearly says Mybank, so it's the real one." "Google actively removes fraudulent websites from the top hit, so it must be the real one", etc)

It sucks, but it is important to try to control for such errors.


> they're not displayed to other users

They are shown in the address bar of your profile page once set.


> you can't use them to login

Incorrect, I can use my username in the email field and log in.


Around 2009/2010. I actually set my alarm for the time they opened them up, which was the middle of the night for me. My wife managed to get her first name as her username. Worth it for 10 minutes of interrupted sleep!


I was lucky enough to get a 3 character Instagram handle back in the day. I get 5+ password resets a day and frequent offers for it - the most I've been offered is $25,000 for it. It's a pain.


It occurs to me there's probably a whole industry built around snatching up interesting account names for all the new services in the hopes that one might make it big. It's slightly complicated by the fact that different norms develop for different platforms (e.g. @realDonaldTrump on twitter because of what I assume were satire accounts).

There are probably people out there with spreadsheets full of service types, account names and passwords of accounts they control that include all the two letter to four or five letter company names, and many celebrity names, just in the case that someone wants to pay for it. The domain name game, just evolved for the current climate.

I mean, it would take me less than $25k worth of my time to build something to automate this, even if I had to get rotating IPs and mobile accounts and use Mechanical Turk to solve CAPTCHAs (although with all those features it might be close), and you were offered that for one account.


If it's such a pain, why don't you sell it for 25K?


They wanted to pay via PayPal. I'm not accepting anything reversible for it.

Also, I don't want to be out of it if they report to Instagram that it's for sale etc.


Go big, ask ten times more and feel relieved from that pain.


Is your account yielding you more than 25k in discounted cash flow? If not, why not sell?


More generally, only very rarely should canonical identifiers for computer use ever be meaningful to humans. The (potential, and conventional and UI-encouraged) meaningfulness and tree structure of the HTTP URL path component has probably contributed seriously to the Web's link-rot. Having a tree-structured, human-readable name at the top of the browser window is great! It just shouldn't be the URL.
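A common compromise, seen on sites such as Stack Overflow whose question URLs carry both a numeric ID and a title slug, is to resolve pages by an opaque ID and treat the readable part as decoration. A minimal sketch, assuming a hypothetical `/user/<id>/<slug>` scheme and a plain dict of display names:

```python
import re

# Hypothetical URL scheme: /user/<numeric id>/<optional slug>.
# Only the ID is used for lookup; the slug is cosmetic.
PROFILE_PATH = re.compile(r"^/user/(\d+)(?:/([^/]*))?/?$")


def resolve(path: str, display_names: dict[int, str]) -> tuple[int, str]:
    """Return (user_id, canonical_path) for a profile URL.

    A caller would redirect when the requested path differs from the
    canonical one, so old links keep working after a rename.
    """
    match = PROFILE_PATH.match(path)
    if match is None:
        raise LookupError(path)
    user_id = int(match.group(1))
    if user_id not in display_names:
        raise LookupError(path)
    slug = re.sub(r"[^a-z0-9]+", "-", display_names[user_id].lower()).strip("-")
    return user_id, f"/user/{user_id}/{slug}"
```

A link containing an outdated slug still resolves to the same person, so renaming someone doesn't rot any links.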


This was solved within corporations, to an extent, 50 years ago. Institutions that used IBM mainframes tend to use an ID of the form aaann, where each a is a letter and each n a digit. An example is Matthew Garrett's ID "mjg59". Large corporations still use something similar. You can have a separate mail address and aliases, but they are not your ID.
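A minimal sketch of allocating IDs in that style, assuming up to three letters of initials plus the lowest free two-digit number (the allocation policy and the `allocate_id` name are assumptions; real sites assigned numbers in their own ways):

```python
import re


def allocate_id(initials: str, taken: set[str]) -> str:
    """Hypothetical allocator for mainframe-style IDs such as "mjg59".

    Takes up to three letters of initials and appends the lowest free
    two-digit number.
    """
    prefix = initials.lower()[:3]
    if not re.fullmatch(r"[a-z]{1,3}", prefix):
        raise ValueError("initials must be one to three ASCII letters")
    for n in range(1, 100):
        candidate = f"{prefix}{n:02d}"
        if candidate not in taken:
            return candidate
    raise RuntimeError(f"all 99 IDs with prefix {prefix!r} are in use")
```

The useful property is that the ID never has to change when a person's name, e-mail alias, or display name does.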


Names changed to protect the guilty:

I joined a Big Dinosaur Company early enough that they were still using mainframe RACF as the system-of-record for authentication, flowing downstream to LDAP. So indeed I received, for example, dlg28 as my ID and stem for e-mail address.

However, after several years the SoR was migrated to Windows LDAP (can't remember its brand name), and it generated 'sensible' IDs for all the newer staff. So someone received JimSmith as their ID & e-mail address.

We oldies felt old and uncool! So a project introduced self-selected e-mail aliases for the oldies, which then led to interpersonal conflicts because jsm22 wanted JimSmith@, but the 'new' Jim Smith already had that... But jsm22 felt he had title to it since he had worked there 40 years etc etc So he was given JimBSmith@ which of course led to misdirected e-mail. Hilarity ensued.

I'd rather they had never introduced the long-form IDs at all!


Interesting story, but it seems like it's the oldies who are at fault in this tale.


By the way, happy 3000th day on HN :)


To solve this problem in part, keybase exists. At its core it's a proof-of-identity service, with some fanciness on the side. Sure it doesn't fix the root of the username issue, but it helps a little bit, while having cool things like kbfs and kbp.


Publicly guessable addresses can still be a pain like sequential numerical ones. https://en.wikipedia.org/wiki/CompuServe#User_IDs_and_e-mail...
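If numeric IDs are used, the guessability problem from the CompuServe example can be reduced by drawing them at random from a large space instead of handing them out in sequence. A minimal sketch, assuming 10-digit IDs and a set of already-issued IDs (`random_member_id` is a hypothetical name):

```python
import secrets


def random_member_id(taken: set[int], digits: int = 10) -> int:
    """Pick an unused ID uniformly at random from all `digits`-digit numbers.

    Unlike sequential allocation, knowing one valid ID reveals little about
    which nearby IDs exist. The leading digit is never 0, so every ID has
    exactly `digits` digits when written out.
    """
    low, high = 10 ** (digits - 1), 10 ** digits
    while True:
        candidate = low + secrets.randbelow(high - low)
        if candidate not in taken:
            return candidate
```

This only helps while the space stays sparse; once most IDs are in use, random guesses start hitting real accounts again.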


I'll give an example of a permanent username that you can't change: Steam. For obvious reasons, Valve is way too short-staffed to do anything about this. Nothing malicious; they just don't bother trying, because there's no competition forcing them to.


That's true, but your username is not visible to other users, even for stuff like profile pages.


My Steam username is a nickname given to me by, and proudly worn in, a small, tight-knit online and real-life community.

Since I had a major falling-out with them fifteen years ago, it always stings when I have to manually enter this username.


It is in rare cases like Library Sharing.


> I'll give an example of a permanent username that you can't change: Steam.

No need to go that far... just look at the URL bar right here. :-)


Does Facebook actually have usernames the way other sites have usernames? I'm Bryan Rasmussen on Facebook not rasmussen.bryan127 or something, and if I search for somebody by name they recommend lots of people with that name.


There is a unique username and URL for every profile.


We used to laugh at those that chose personalised number plates, and yet there is probably more utility in one of those than a vanity url.


About the usernames for businesses: I just spent some time confused today, because it turns out there is a Viasat in the Nordic area, and I thought it was somehow related to the American Viasat.


TLDR all the way..

Yeah so what’s your alternative again?


> names don't matter and if you see someone with that name it doesn't even slightly mean that they are the same person you met last week

If this is your belief why have you stuck to a specific name, to the point of signing up to random sites just to take possession of it?


No, the reality is that everybody got used to it and nobody cares. What serious pages are there where you could not just create another account if you start being ashamed of what you have right now? This is seriously a non-problem, and giving people numbers is the worst way you could go from here. We are not some kind of 70s science-fiction robots who would love to be called by numbers... Damn, sometimes I have the feeling that commenters on here are robots. Please go outside some time and ask real people what they think about the new genius idea that just came into your mind.


The reality is that usernames are seen as a computery tech internet thing rather than a business thing. Unofficial, maybe even frivolous.

In business you have customer account numbers, bank account numbers, membership numbers, invoice numbers etc. There is no conflation with identity: you are not your account with your bank, gym or stationery supplier. It is only because of internet forums and login usernames that we have even gotten to this state of affairs in the first place.

Usernames (and internet aliases in general) were a part of internet culture that mainstream culture routinely mocked. People these days use services in spite of usernames rather than because of them. The President of the United States has 'real' in front of his name. Think about that. In no other medium do we have people asserting that they are the genuine person they claim to be. It's tautological.


Dunno, in what decade were people making fun of usernames? Back when I first downloaded MSN/AIM, people were having fun coming up with usernames and trading them. Everyone at school. Everyone.

People weren't just tapping random key combos either. They were coming up with their own identities because that's what people like to do.

I can't believe these comments that think people are itching to be assigned a #Reference ID on the internet. For example, you also think the mainstream used internet message boards.


In business I mainly have my email address: forename.surname@company. This is the most common uniform identifier. It's pretty easy, looks good on your card and can be remembered by a client who knows your name.

Everything else you've listed is part of an identification procedure that happens mostly between you and a machine, not between two humans (leaving out the call center semi-human who needs to type it into the machine first).


> What serious pages are there where you could not just create another account if you start to being ashamed for what you have right now?

Depends what's associated with the account, no?

If it's something like Steam, your username might be tied to hundreds of dollars of purchases.

If it's Slashdot, you'll lose your sweet low user ID.

If it's StackOverflow, you'll lose your various reputation scores.


Thankfully, on Steam your username is not displayed to any other person. You have a profile ID (globally unique, alphanumeric, and you can change it as many times as you want) and a profile name (not unique).


Unfortunately for Steam, your login username is still displayed in some places like the top right of the Steam app and any computers you choose to share your library on.

It's also likely that they used the login username as a primary key, which means it's unlikely to become changeable anytime soon.


You can't change your profile ID, though. That's what's called a "Steam ID". You can, however, change the URL (the Steam custom URL), and accessing your profile with the Steam ID will still work.


I'd prefer usernames that were changeable rather than numbers, but let's be honest, a numeric-only system isn't without precedent: ICQ.

To be honest, given that usernames are just an arbitrary reference, you could probably also add phone numbers, IP addresses, social security and national insurance numbers, and house numbers to the list of prior art.

Not that I'm advocating the use of numbers instead of names. I think Twitter gets it right: everyone gets a fixed number, but you can assign yourself a name, which can change. Most of the time people choose not to change it, but the option is still there.


I come from the ICQ time (...hell I still have it). The fact that you had a number, did not make it good in any way. At the same time we already had IRC where everybody had usernames and it was much better if you had to tell someone who you are. It is by far easier to remember. If I'd tell someone (in Germany) that my nick is Aluhut, they'll have a picture in their head instantly. If I'd tell them that I'm 13475456, they'll ask me to write it down for them.


You could argue your preconception point as a negative as well. To quote the GP:

> The reality is that a relatively small handful of privileged early adopters get good usernames that match their identities, and everyone else gets screwed. These identifiers then act like tattoos that you got a long time ago and are stuck with for the rest of your life: people end up reminded every day of a sport they can no longer play due to an injury ("hockeystar") or loves lost ("iheartjessie"), attached to a joke that is no longer funny or to a thought that they found adorable as a 13 year old (when you are legally asked to "choose a username": a modern era coming of age scenario) but which adults find inane, or to a nickname that means something different than you realized to some people and now can't change.

The best happy medium is a username that can change. But so many places make them static (sometimes for no better reason than that they made "username" the foreign key in their users table).

> I come from the ICQ time

Likewise; that's why I used it as an example ;) I think comparing ICQ to IRC is a bit disingenuous as they occupied slightly different use cases.


I've never seen it as something negative. There are many letters and signs. There is creativity, and Peter23 will only ever be Peter at best. Changing a nick on IRC is as easy as joining a channel.


We are talking about more than just IRC though.

Plus, at the risk of repeating myself, I'm not arguing against usernames on the whole, just against systems whose usernames cannot be changed.


For reference: "Aluhut" translates to "tinfoil hat".


Exactly. You can see it the moment you read it.


>This is seriously a no-problem

Most users have at least two email addresses because most of their mail is routed through email addresses like "PartyChick88@hotmail.com" but they don't feel comfortable putting that on job applications and medical forms.

It is a statistical certainty that people have missed job opportunities and subsequently defaulted on mortgages because they sent off a bunch of job applications on their "business" email address and then forgot to check it because they don't use it much.

One of the more common office security failures is to have your email client auto-fill to someone's personal account instead of their company-issued account, resulting in sensitive documents leaving the auditable environment of the office email server.

Now, for sure, it's not exactly up there with global warming and North Korea, but I'm not sure I'd call it a "no-problem". It's a fundamental UX failure that we're only just now starting to see get fixed with email address aliases becoming a more widespread feature, and even that is just a patch. We've all gotten used to it, but that doesn't mean it's not a problem.


Well, the problem here is obviously not the nick but someone's inability to drop an old email address or just forward it to another one. You don't have to have it in your mail client. You can manage that from the web interface and never log in to that mail address again.

I do/did quite a lot of IT support for friends, family, their friends and so on, and have never heard of anything like that. I'm also sure that the @hotmail.com domain alone would be enough to not get you a job at certain businesses.

> One of the more common office security failures is to have your email client auto-fill to someones personal account instead of their company-issued account

So...you are sending private emails from your business account? Again it's not the nick/names problem. The problem is your behavior. This is the root cause here and doing some make up won't solve your problem.


These comments have made me realize how many HNers are absolutely clueless about what people do and like in the world.


It explains a lot in the startup/tech industry though.


> real saurik got an account on their site, and it was "confusing" people.

I have no idea who you are and don’t recognise your username. Was that whole rant just a humblebrag that you’re “internet famous”? Because here at least, no one cares.


You don’t have to be mean about it! The world is big enough to fit lots of people like you who have no idea who he is, and also people like me who think “wait, is that the same saurik from...”

Seems like a real anxiety, not just a humblebrag. I’ve felt the same way and my response is to just make up new names all the time.



One can be "semi-famous", as OP said, in small circles, e.g. chess, and you wouldn't recognize his username, but lots of chess players might. I didn't see it as a humblebrag anyway, since it was a genuine part of his point.


I care and on HN there will be a lot of people who do. The wider internet, probably less so.


Maybe you can ask Mr. Saurik for an iOS 11 Jailbreak.


So you're saying give young kids funny dumb usernames but once they grow up they have to use their real name online?


That's one of the reasons I made my own bad-username lookup: https://github.com/flurdy/bad_usernames. It's a simple JSON file of usernames to disallow.

It does not address many of the other things highlighted in this post, but it is a start, at least for my services.


In the en version of your file I don't see any of the common c-level words: ceo, cfo etc...
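A minimal sketch of how a reserved-name list like that might be consumed. It assumes the file is a flat JSON array of strings (the linked repository's actual layout may differ), and the inline list, including the c-level words suggested above, is a hypothetical stand-in. Entries are case-folded once at load time so that "Admin" is caught as well as "admin".

```python
import json

# Hypothetical stand-in for a reserved-names file. Assumes a flat JSON array
# of strings; the real file's structure may differ.
RESERVED_JSON = '["admin", "administrator", "root", "support", "www", "ceo", "cfo"]'


def load_reserved(text: str) -> set:
    """Parse the JSON array and case-fold every entry once, up front."""
    return {name.casefold() for name in json.loads(text)}


def is_reserved(username: str, reserved: set) -> bool:
    """Case-insensitive membership test against the folded set."""
    return username.casefold() in reserved


RESERVED = load_reserved(RESERVED_JSON)
```

With this, `is_reserved("CEO", RESERVED)` is true and `is_reserved("alice", RESERVED)` is false. A set gives constant-time lookups, which matters little for a list this size but keeps the check cheap at registration time.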


> So if you’re enforcing unique email addresses, or using email addresses as a user identifier, you need to be aware of this and you probably need to strip all dot characters from the local-part, along with + and any text after it, before doing your uniqueness check. Currently django-registration doesn’t do this, but I have plans to add it in the 3.x series.

This is needlessly user-hostile. If users wish to use mailbox extensions to have multiple unique accounts, that's their right. They can always get multiple different email accounts, after all.

He doesn't mention the one thing he ought to do, which is to strip email addresses of comments before checking them: (foo)jdoe@example.com, jdoe(bar)@example.com, jdoe@example.com, jdoe@(home)example.com & (a (nested (comment)))jdoe(more)@example.com(all done) are all the same email address.
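A sketch of comment stripping along those lines, assuming comments are parenthesised, may nest, and may contain backslash escapes. Double-quoted local parts are passed through untouched, because a "(" inside quotes is literal text rather than a comment. This is not a full address validator, it does not fold whitespace, and in practice many sites simply reject such addresses instead.

```python
def strip_comments(address: str) -> str:
    """Remove parenthesised comments from an email address.

    Handles nested comments and backslash escapes; text inside a
    double-quoted local part is copied through unchanged.
    """
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string"
    i = 0
    while i < len(address):
        ch = address[i]
        if ch == "\\" and (in_quotes or depth > 0):
            # Escaped character: keep it inside quotes, drop it inside a comment.
            if depth == 0:
                out.append(address[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif depth == 0 and ch == '"':
            in_quotes = True
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')' in address")
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth or in_quotes:
        raise ValueError("unterminated comment or quoted string")
    return "".join(out)
```

All five examples from the comment reduce to `jdoe@example.com`. A quoted local part such as `"a(b)"@example.com` keeps its parentheses.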


I'm surprised nobody has mentioned PRECIS - the framework for Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols[0].

It defines a (small) set of profiles to validate and compare various types of string, including "Username" (in both case folded and case prepared variants) and "Nickname".

Want to compare two usernames for equality? Run the two strings through the comparison steps for the UsernameCaseMapped[1] profile.

It won't solve all of your problems, but it's a good place to start.

[0] https://tools.ietf.org/html/rfc8264

[1] https://tools.ietf.org/html/rfc8265#section-3.3
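Python's standard library has no PRECIS implementation, so the following is only a rough, non-conforming approximation of the UsernameCaseMapped comparison steps. It applies width mapping (fullwidth and halfwidth forms map to their decompositions), then lowercasing via `str.lower()` as a stand-in for Unicode toLowerCase(), then NFC. It skips the IdentifierClass code point check and the directionality rule entirely. For real use, reach for a dedicated PRECIS library.

```python
import unicodedata


def _width_map(ch: str) -> str:
    # Width mapping: fullwidth/halfwidth characters map to their
    # decomposition, e.g. U+FF2A "Ｊ" -> "J".
    if unicodedata.decomposition(ch).startswith(("<wide>", "<narrow>")):
        return unicodedata.normalize("NFKC", ch)
    return ch


def prepare_username(s: str) -> str:
    """Approximate UsernameCaseMapped preparation: width mapping,
    lowercasing, then NFC. Omits the IdentifierClass and bidi checks."""
    mapped = "".join(_width_map(ch) for ch in s)
    return unicodedata.normalize("NFC", mapped.lower())


def usernames_match(a: str, b: str) -> bool:
    # Comparison: apply the same rules to both strings, then compare code points.
    return prepare_username(a) == prepare_username(b)
```

Under this sketch, the fullwidth `Ｊｏｈｎ_Ｄｏｅ` and `JOHN_DOE` both compare equal to `john_doe`.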


For PHP, see the Spoofchecker class for similar functionality to the Python class discussed in the article.

http://php.net/manual/en/class.spoofchecker.php


> What we really want in terms of identifying users is some combination of:

> System-level identifier, suitable for use as a target of foreign keys in our database

> Login identifier, suitable for use in performing a credential check

> Public identity, suitable for displaying to other users

Some sites want a fourth one:

Public Identity, suitable for other users to use to refer to each other.

Like on Twitter: "Discussed this with @bob and @jane yesterday, you'll find ..."

Now, you don't need a unique username to be able to meet this requirement - StackOverflow is an example of a site that handles this I think? But having a unique username is a common pattern that many sites use to solve this so it seems worth mentioning.
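A sketch of keeping those four identifiers separate. The class and field names are hypothetical. The idea is that `@handle` mentions get resolved to the immutable system id when a post is written, so a later handle change does not break old references. The mention regex is deliberately naive and would also match the domain part of an email address.

```python
import re
import uuid
from dataclasses import dataclass, field


@dataclass
class User:
    login_email: str   # login identifier: only used for the credential check
    display_name: str  # public identity: not unique, freely changeable
    handle: str        # public reference: unique, what people type after "@"
    # System-level identifier: target of foreign keys, never shown or changed.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


# Naive pattern: it would also match the domain part of an email address.
MENTION = re.compile(r"@(\w+)")


def resolve_mentions(text: str, users_by_handle: dict) -> list:
    """Map "@handle" references to stable user ids; unknown handles are skipped."""
    ids = []
    for handle in MENTION.findall(text):
        user = users_by_handle.get(handle.casefold())
        if user is not None:
            ids.append(user.id)
    return ids
```

Storing the resolved ids rather than the handle text is the design choice that lets handles stay mutable.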


With all the Unicode problems, I'm surprised so few languages include ICU, which has everything Unicode-related. Things mentioned in the article:

- Normalization: http://userguide.icu-project.org/transforms/normalization

- Confusables: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SpoofCh...

(and there are so many more things)
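Even without ICU, Python's standard `unicodedata` module covers normalization. The sketch below shows why it matters for usernames. Two strings that render identically can differ at the code point level until they are normalized. NFKC additionally folds compatibility characters such as ligatures, which plain NFC leaves alone. The helper name is hypothetical.

```python
import unicodedata


def normalized(s: str, compatibility: bool = False) -> str:
    """NFC makes canonically equivalent strings identical; NFKC also folds
    compatibility variants (ligatures, fullwidth forms, superscripts)."""
    return unicodedata.normalize("NFKC" if compatibility else "NFC", s)


composed = "Jos\u00e9"     # "é" as a single code point (U+00E9)
decomposed = "Jose\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
ligature = "\ufb01sh"      # U+FB01 LATIN SMALL LIGATURE FI, then "sh"
```

`composed == decomposed` is false, but their NFC forms are equal. `normalized(ligature, compatibility=True)` gives `"fish"`, while NFC leaves the ligature as it is.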


> Django’s auth system doesn’t enforce case-insensitive uniqueness of usernames

Their routing (for URLs) is also not case-insensitive. The whole framework is case-sensitive by default. Honestly, kind of annoying.


Why would you need case insensitivity in URLs by default? I can imagine you might in the odd view here and there, though in ten years of django usage I don't think I've needed to do so


Honestly never really thought about it much. Wouldn’t that lead to accidentally duplicate URLs in the eyes of things like PageRank?


No. That’s why you embed a canonical URL (a `<link rel="canonical">` tag).


If you use mysql, a lot of things end up being case-insensitive by default. Not necessarily recommending it though.
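A framework-agnostic sketch of the middle ground, in which routing stays case-sensitive but profile lookups do not. The request is looked up by its case-folded form, and anything other than the canonical spelling gets a permanent redirect, so there is only one URL per user. The `/users/<name>/` scheme and the function are hypothetical. This is not Django's API.

```python
def resolve_profile(requested: str, canonical_by_key: dict):
    """Look a username up case-insensitively.

    Returns (200, username) for the canonical spelling,
    (301, canonical_path) for any other casing, or (404, None).
    """
    canonical = canonical_by_key.get(requested.casefold())
    if canonical is None:
        return 404, None
    if canonical != requested:
        # One URL per user, so crawlers do not index case variants as duplicates.
        return 301, f"/users/{canonical}/"
    return 200, canonical
```

A redirect and a canonical link tag address the same duplicate-URL concern. The redirect also fixes what users see in the address bar.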


It is certainly slower, but what I always did, even back around 1999-2000 when I first learned web programming, was to simply query the user DB to see if any user exists with the requested username before doing anything else. At some point I also decided to store a "broken down" version of the username with symbols removed, Os replaced with zeroes, etc., and check against that.

I also never allowed usernames with fewer than two letters, or with characters other than digits, Latin letters, a space and a few punctuation symbols.
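A sketch of both checks described above. The allowed-character pattern is a hypothetical reading of "numbers, Latin letters, a space and a few punctuation symbols". The lookalike table is purely illustrative and far smaller than Unicode's real confusables data. The skeleton would be stored in its own column, and each new sign-up checked against it.

```python
import re
import unicodedata

# Hypothetical policy: ASCII letters, digits, space, and . _ -
ALLOWED = re.compile(r"[A-Za-z0-9 ._-]+")

# Illustrative lookalike table only; real confusable data is much larger.
LOOKALIKES = str.maketrans({"o": "0", "l": "1", "i": "1"})


def is_allowed(name: str) -> bool:
    """Only allowed characters, and at least two letters."""
    letters = sum(ch.isalpha() for ch in name)
    return ALLOWED.fullmatch(name) is not None and letters >= 2


def skeleton(name: str) -> str:
    """'Broken down' form: fold case, map lookalikes, keep only letters/digits."""
    folded = unicodedata.normalize("NFKC", name).casefold().translate(LOOKALIKES)
    return "".join(ch for ch in folded if ch.isalnum())
```

With this table, `J0hn.Doe` and `john_doe` share the skeleton `j0hnd0e`, so the second registration would be refused.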


I'm currently working on a service and have put a lot of thought into about seven tenths of what is said in this article.

This is a very good read and one I have bookmarked to share with colleagues.


For my website, normally I'd be in favor of allowing users to create multiple accounts with variations of email addresses (e.g. foo@example.com, foo+bar@example.com, f.oo@example.com). I sometimes create multiple accounts like that myself as well.

Coincidentally, today a spammer is creating hundreds of accounts with such variations of the same email (gmail) address -- something that should be stopped right away.
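One way to reconcile both positions is sketched below. Variant addresses remain distinct accounts, and a normalized key is used only to spot bursts of sign-ups. It assumes that `gmail.com` and `googlemail.com` deliver to the same mailbox and ignore dots and `+tag` suffixes. Other providers' rules are unknown, so their addresses are left as typed, apart from lowercasing the domain. The threshold of 5 is an arbitrary placeholder.

```python
from collections import Counter

# Assumption: these domains share mailboxes and ignore dots in the local part.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def abuse_key(email: str) -> str:
    """Grouping key for spotting bulk sign-ups. Not a uniqueness rule."""
    local, _, domain = email.rpartition("@")
    domain = domain.lower()
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "").lower()
        domain = "gmail.com"
    return f"{local}@{domain}"


def suspicious_keys(emails, threshold: int = 5) -> set:
    """Keys that occur at least `threshold` times among recent sign-ups."""
    counts = Counter(abuse_key(e) for e in emails)
    return {key for key, n in counts.items() if n >= threshold}
```

Flagged keys could feed rate limiting or manual review, leaving legitimate users free to keep multiple accounts.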


> making usernames case-insensitive would be a massive backwards-compatibility break and nobody’s sure whether or how we could actually do it.

Couldn’t you store an upcase version of the username that is unique for this? You would still keep both columns so you have the upcase version for uniqueness and the original column for display name. This would also be backward compatible.
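A sketch of that two-column approach using the standard library's `sqlite3`. The original spelling is kept for display, and a folded copy carries the UNIQUE constraint. It uses NFKC plus `casefold()`, Python's method intended for caseless matching, rather than upper-casing. The table and function names are hypothetical.

```python
import sqlite3
import unicodedata


def username_key(name: str) -> str:
    # Folded form used only for the uniqueness check.
    return unicodedata.normalize("NFKC", name).casefold()


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT NOT NULL,            -- as typed, shown to people
            username_key TEXT NOT NULL UNIQUE  -- folded copy, only for uniqueness
        )
        """
    )


def register(conn: sqlite3.Connection, name: str):
    """Insert a user; return the new row id, or None if the folded name is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (username, username_key) VALUES (?, ?)",
                (name, username_key(name)),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

Registering `john_doe` and then `JOHN_DOE` returns an id for the first and `None` for the second. The stored display name is still `john_doe`.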


An upgrade script can find collisions and report them before you upgrade. You’ll be required to resolve those before enabling case insensitive user names.
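A minimal sketch of such a pre-upgrade report, assuming `casefold()` is the comparison the new rule will use. It groups existing usernames by their folded form and keeps only the groups that would clash.

```python
from collections import defaultdict


def find_collisions(usernames):
    """Return {folded_name: [original spellings]} for groups with more than one member."""
    groups = defaultdict(list)
    for name in usernames:
        groups[name.casefold()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

For `["john_doe", "JOHN_DOE", "alice", "Bob", "bob"]` the report lists the `john_doe` and `bob` groups, and an admin would resolve those before the case-insensitive constraint is switched on.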


> Well, it’s easy until we start thinking about case. If you’re registered as john_doe, what happens if I register as JOHN_DOE?

For most applications you are better off with making everything NOT case sensitive.

That's why the SunSed language is completely case-insensitive, from variable names and tags (functions) to all string comparisons. Users should not have to worry about case!


Doesn't case insensitivity combine poorly with (fairly unrestricted use of) Unicode though?
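It does, at least if "case-insensitive" is implemented as `lower()`. The sketch below contrasts a naive comparison with an approximation of Unicode compatibility caseless matching (normalize, case-fold, normalize again). Folding handles German ß and Greek final sigma, but it is locale-blind. A Turkish user would expect "İ" to match "i", and it does not.

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    return a.lower() == b.lower()


def folded(s: str) -> str:
    # Approximates compatibility caseless matching: normalize, case-fold,
    # then normalize again, since folding can produce unnormalized output.
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", s).casefold())


def folded_equal(a: str, b: str) -> bool:
    return folded(a) == folded(b)
```

`naive_equal("straße", "STRASSE")` is false while `folded_equal` is true. `folded_equal("İ", "i")` is still false, because `"İ"` folds to `"i"` followed by U+0307 COMBINING DOT ABOVE.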


I somehow lost my old HN account and had to register a new one not so long ago. After trying a multitude of user names I surrendered and just copied words from the browser menu bar. This is what I ended up with :)

I always wondered why we don't use emails as (unique) login names generally. I mean they can be shown with wildcards if that is the concern?!


I generally agree. But we've had customers who share an email address, whether businesses or personal, who want distinct accounts on our site.


Meanwhile I haven't found a good way to know my Skype user name (or id?). My account was migrated to Skype from Outlook I guess and it is something like "live:username" but people can't really find me with that. I take full blame if it turns out that I just don't know how to use Skype...


Somewhat of a tangent:

I somehow created a Reddit account without an email address in order to comment on something about 8 years ago.

Eventually, I decided to comment on something else but forgot the password.

I was unable to reset the password without an email address.

So I never commented on Reddit again...I don't want a username that isn't styfle :)


Creative usernames and Spotify account hijacking Posted on June 18, 2013 https://labs.spotify.com/2013/06/18/creative-usernames/


I actually loved the concept of bare-number UINs used in ICQ back when ICQ and AOL dominated the messenger market. They were neutral, easy to share verbally, and allowed mutable, non-unique screen names and emails. It's a pity people don't use them any more.


I like how many systems have problems with surrogate pairs and the soft hyphen \xad. Just the other week I had a support issue because I had used the very special character - (dash) in my email...

Also Atlassian Stride doesn't support sending the \xad character at all. It just fails...

\xad makes me \sad
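A small sketch of both pitfalls. A character outside the Basic Multilingual Plane, such as most emoji, is one code point in Python but two UTF-16 code units, which is what JavaScript's `.length` and many length limits count. The soft hyphen belongs to Unicode general category Cf (format), along with zero-width characters, and these can be stripped before comparison. Both helper names are hypothetical.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Length in UTF-16 code units; non-BMP characters need a surrogate pair."""
    return len(s.encode("utf-16-le")) // 2


def strip_format_chars(s: str) -> str:
    """Drop invisible format characters (category Cf), e.g. U+00AD SOFT HYPHEN
    and U+200B ZERO WIDTH SPACE. Caution: U+200D ZERO WIDTH JOINER is also Cf,
    so this breaks joined emoji sequences."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")
```

`len("\U0001F600")` is 1 in Python, while `utf16_units("\U0001F600")` is 2.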



I don't see what is wrong with treating an email as a username.

I wonder if Auth0 or Cognito resolve any of these issues.


This is a great article, thanks.


> No, really, uniqueness is harder than you think

No, it's very simple. Restrict usernames to ASCII. Use Unicode where it makes sense.


Would it be ridiculous to suggest that this is no different to requiring all usernames to be composed of CJK characters? That would be inconvenient and exclusive, no?

(also, a nitpick: "just ascii" is still the wrong approach, since you probably don't want people putting BEL or NUL in their usernames)
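A sketch of "printable ASCII" rather than "just ASCII", assuming Python 3.7+ for `str.isascii()`. `isascii()` alone would accept control characters such as BEL and NUL, and `isprintable()` rejects them. Refusing leading or trailing spaces is an extra, hypothetical policy choice.

```python
def is_plain_ascii_username(name: str) -> bool:
    """Non-empty, ASCII-only, no control characters, no surrounding spaces."""
    return (
        bool(name)
        and name.isascii()
        and name.isprintable()    # rejects BEL (0x07), NUL (0x00), etc.
        and name == name.strip()  # hypothetical: no leading/trailing spaces
    )
```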


Which is great for everyone using the Latin alphabet, but what about Cyrillic and other non western-world based languages?


They can also use ASCII. You do not have to use your real name as a username (in fact I'd recommend not using your real name; it is really concerning how people have lost the will for internet privacy they had in the 90s). And if you really, really want to, for some reason, you can just romanize it.


In the 90s and before privacy wasn't really a big thing -- looking at a couple of threads from usenet in the 90s, the majority of people used their real name (or at least I assume their real name).


I do not think Usenet and newsgroups are a good indicator. By the 90s, especially the mid to late 90s, when the floodgates to the commercial internet fully opened along with the fast rise of the web, people spent more time in forums, chatrooms, etc. than on Usenet, which had more of an 80s "small community" background (where people felt more comfortable using their real names; most likely they already knew each other IRL, considering they were usually from the same or a nearby university or organization). On forums, chatrooms, etc. everyone used nicknames, and people who used both forums and Usenet more often carried their nicknames over to Usenet than the other way around. Personally, I remember reading back in 2000 or so that it was considered good practice to use your real name in newsgroups (showing the real-name bias it had) so that people "take you seriously", and thinking that it was a bad idea (and not commenting anywhere, because I felt very uncomfortable using my real name online. Sadly this eroded over time; in the last few years I have sometimes been concerned about it, but I think it is too late now and anyone can find out who I am with a single Google search).

But regardless, even if people in the 90s weren't more privacy-minded (which really REALLY goes against my experience with every community I was aware of, and I have used the net since 1993), this doesn't change my opinion that people should not use their real names online and should instead limit themselves to Latin letters and a few symbols that cannot be forged.


They can use any username they like, as long as it is in ascii.


This is a non-solution if most of the names I like are non-ascii.


I would like to use emojis in my username. But as a programmer, I'd like to keep a system as simple as possible. If there were a trillion humans using my service, Unicode with emojis would be a useful thing for unique usernames. But in the current scenario, where the biggest services have users in the low billions, ASCII is the most efficient and simple thing to do. You can always romanize Hindi or whatever language you want to use. The Latin alphabet is unusually good at that. IMO it's not worth introducing the complexity Unicode brings into something like usernames, where the slightest problems can lead to stolen identities etc.


Why have usernames at all?

Think about it ... people who have you in their address book already have nicknames for you.

People who you let see your first and last name can already see that.

About the rest of the people - why do they care?

Only because you want your REPUTATION to be communicated to others in English. Hey it's "someguy22"!!

Yeah that's a pretty limited thing. Could be useful but really, how often do we remember names of others? Only celebrities. And who actually cares about Aziz Ansari and his sex life? Or any of the other dudes who we never met? Or why is one dude "the" Bill Gates and the others with same name are not verified by Twitter? My point is, think about what is the sociological meaning behind usernames.


I recognize people's usernames all the time on HN (including yours) and other communities. I gravitate towards people who write good stuff and I learn to avoid the usernames of people that are unreasonable.

I'd say it's less about broadcasting your reputation and more about letting other people make that judgement about you out of their own interest.


So, how should mail services handle emails accounts?

1004383737289302@example.tld ?


Email John

Otherwise how did you come across John's email? SPAM?


From his business card?

From him spelling it out to me when I asked him on transit?

I've got 100% of my contacts either by them spelling out their contact details to me, by them entering them into my contacts app, or by me telling them my email and them sending an email to it.

For all these use cases, 193938939302002 is a useless identifier.

So how should I handle emails, when I want to allow people to register emails?


Have you ever considered that verbally spelling something out may be a sign of a problem? If you were taking someone's credit card number, would you have them go one digit at a time over the phone, then read it back to them and so on, or email it so you can copy and paste it? Or how about just have them autofill a form?

Same here. John gave you a card? Cool, maybe it can have a QR code you can scan. Or do you enjoy reading and typing long URLs and having to double check them?

How do you handle emails, you ask? Well, how do people enter their emails? They can visit your site and the emails get autofilled. They can bump phones or use bluetooth or any number of ways that don't require verbally spelling things out.

And anyway, if you haven't allowed someone to email you, why should they be able to spam you?


> Have you ever considered that verbally spelling something out may be a sign of a problem? If you were taking someone's credit card number, would you have them go one digit at a time over the phone, then read it back to them and so on, or email it so you can copy and paste it? Or how about just have them autofill a form?

I don’t use credit cards, but I (and my parents) have memorized our account number (Kontonummer) and bank code (Bankleitzahl), and from those you can build the IBAN.
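Strictly speaking, a German IBAN is not a pure concatenation. Two check digits have to be computed with the ISO 7064 MOD 97-10 scheme. The sketch below covers the plain case: an 8-digit Bankleitzahl followed by a 10-digit, zero-padded account number. Some banks may apply special conversion rules that this does not handle. The function names are hypothetical.

```python
def german_iban(blz: str, account: str) -> str:
    """Build a German IBAN from an 8-digit Bankleitzahl and an account number."""
    if not (blz.isdigit() and len(blz) == 8):
        raise ValueError("Bankleitzahl must be 8 digits")
    if not (account.isdigit() and 1 <= len(account) <= 10):
        raise ValueError("account number must be 1-10 digits")
    bban = blz + account.zfill(10)
    # Move "DE" + "00" to the end, with letters as numbers (D=13, E=14),
    # then the check digits are 98 minus the remainder mod 97.
    check = 98 - int(bban + "131400") % 97
    return f"DE{check:02d}{bban}"


def iban_is_valid(iban: str) -> bool:
    """Generic check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and the result mod 97 must be 1."""
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, Bankleitzahl 37040044 with account number 532013000 yields `DE89370400440532013000`.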

> Same here. John gave you a card? Cool, maybe it can have a QR code you can scan. Or do you enjoy reading and typing long URLs and having to double check them?

I visit wikipedia by typing https://en.wikipedia.org/wiki/<topic>.

If a URL is designed well (HN’s aren’t), this is very easy. Wiktionary and Wikipedia do it well, reddit’s is also okay, e.g. https://redd.it/7wwtqy – most sites, in fact, work nicely like this.

> Cool, maybe it can have a QR code you can scan.

Most don’t.

Maybe you remember the time, just a few years ago, when everyone knew their phone number and email, and their friends’, by heart? Why force people to change that? Relying on autocomplete and autofill for everything is horrible, and creates massive network lock-in.


Why not remember your friend's IP address by heart? Come on. No one is "forcing" people to forget their friends' numbers. They could have always entered it even now. They CHOSE to stop remembering new ones and just press the thing in their contacts.

Are you one of those people who remembers every password on every site, because you think a password manager will one day screw you?


> Why not remembee your friend's IP address by heart?

51.15.1.223 is my server, in case I need to SSH into it.

> Are you one of those people who remembers every password on every site, because you think a password manager will one day screw you?

I remember two of them, the password to my password manager, and the password to the email that I’d need to reset the password to my password manager.

It’s always nice to use automated tools such as a digital contacts app or a password manager. But you shouldn’t rely on it.

During the @googlemail.com to @gmail.com switch for German gmail addresses, I told Google not to switch. A few years later, in '16, Google auto-switched me anyway, so I selected "undo". In that moment, Google (probably due to a sync bug) wiped all my stored passwords in Chrome, all my contacts, all my emails, and my entire Calendar. On all connected Android devices as well.

It took me ages to get everything working again after that, because I had relied on this technology. I’ve lost contact details for some friends that I’ll never be able to get back, because I only had them in Google Contacts or in Gmail.

So I won’t ever support any suggestion that would make me rely even more on these. I’m self-hosting everything now, I’ve got backups everywhere, and, just in case, I’ve got the most important info memorized.


But this is an example of a remote company controlling your identity. Stored passwords in Chrome shouldn't rely on some server. You should be using an app (preferably open source) that has your own biometrics and passwords as seeds from which to derive keys that can get the master key, which is stored encrypted with those keys on YOUR computers (and maybe some cloud backups). I think that is how Apple's keychain does it.

That's a totally different problem to remembering info of CONTACTS.



