Email address validation: please stop (sinjakli.co.uk)
69 points by sadiq on Feb 14, 2011 | 85 comments



What's even worse is when they have different validations in different places. I've had ticketmaster accept my name+tm@gmail.com email at sign up, only to never let me log in after that first time.


I ran into this with a flight reservation system. I lost hundreds of dollars and missed my flight when the front end accepted my email address but the back end didn't send me a confirmation.

A plausible explanation given to me was that the web application worked just fine, but it sent the data to some COBOL-ish back end written long before the advent of web bookings, and the integration code mangled my email address when it stuffed it into some data field that was never intended to hold an email address.

This is why any real-world system needs end-to-end integration testing and not just unit testing :-)


I recently changed our email validation: all I do now is check for the existence of an "@".


Then why validate at all? Just send the user an email with an activation link. If it doesn't work and they're still logged into the site (with a limited account), then let them change their email address.
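A rough sketch of that activation-link flow in Python, for anyone who wants something concrete. Everything named here is an assumption for illustration: the `pending_activations` dict stands in for a real database table, the `https://example.com/activate` URL is a placeholder, and actually sending the mail is left out.

```python
import secrets
from typing import Dict, Optional

# In-memory stand-in for a database table: token -> unconfirmed address.
pending_activations: Dict[str, str] = {}

def create_activation_link(email: str,
                           base_url: str = "https://example.com/activate") -> str:
    """Store a random token for this address and return the link to mail out."""
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    pending_activations[token] = email
    return f"{base_url}?token={token}"

def activate(token: str) -> Optional[str]:
    """Return the confirmed address, or None if the token is unknown or already used."""
    return pending_activations.pop(token, None)
```

No format check is involved at all: an address counts as confirmed only once someone clicks the link that was mailed to it, and each token works once.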


If you want your site to have a "login as guest" type feature that's one thing, but if an email address doesn't have an @ it will never work and it would be misleading to encourage people to check their email for a validation link.

People sometimes misread labels and enter their name on the line for their address. This would stop that.


Yeah exactly. It's more to catch wrong stuff entered in wrong field.


I had people write www.hotmail.com in the email field in the past.


In my experience, the most common problem with user entered email addresses is that they've mistakenly used a comma where they should have used a period.


The "+" feature of gmail is great, but I hesitate to use it after some weird validation problems I've had. I've stopped asking that people validate properly, and started hoping that they a) don't validate or b) fail gracefully.

One (very important) site properly validated my "+" email address on the front end (gave me no errors), but the backend failed and I never received the required confirmation email... all resulting in a customer service call. Arggg.


FYI this isn't a "feature of gmail". It's a feature of many, if not most mail systems (sendmail, postfix, qmail, etc.) and was around for many years before Google even existed.


Shouldn't the spammers have figured out the "+" feature by now? Just remove it and the suffix to get a valid address, for gmail or any other provider that uses the syntax.

return Regex.replace(email, "^([^+]*)\+[^@]*(@.*)$", "$1$2")
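The same idea as a minimal Python sketch. The helper name `strip_plus_tag` is made up for illustration, and it assumes the provider treats everything from the first "+" up to the "@" as a tag, which is not true of every mail system.

```python
import re

# Hypothetical helper: drop a "+tag" from the local part of an address.
# Group 1 is the text before the first "+", group 2 is "@" plus the domain.
_PLUS_TAG = re.compile(r"^([^+@]*)\+[^@]*(@.*)$")

def strip_plus_tag(email: str) -> str:
    """Return the address without its '+tag'; addresses without a tag come back unchanged."""
    return _PLUS_TAG.sub(r"\1\2", email)
```

For example, `strip_plus_tag("name+tm@gmail.com")` gives `"name@gmail.com"`, which is exactly why the suffix is weak protection against anyone who bothers to strip it.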


I don't see this feature as a way to fight spam, but to make it easier to label incoming mail by using the suffixes. As you said, it's easy to bypass it with some simple find and replace.


I like the idea of having validation but when the email doesn't match your pattern, give the user a warning that says "sorry, we don't think this is correct" but allow them to continue if they think it's legit, then have them click a link to validate so an incorrect email serves 0 purpose for them.


The point here is that "the pattern" used by developers is often grossly incorrect. It'd be better to not even attempt to enforce any pattern.


Optimize for a few hundred spam obsessed power users, or, prevent a major cause of the #2 most common CS complaint at many businesses. This does not take much pondering.

P.S. Trivially A/B testable at high volumes if your CS infrastructure is capturing sufficient data.


Which part of this is the #2 complaint - users making a mistake entering in their email address? Isn't this why you ask them to confirm it by sending them an activation email?


The activation email isn't a panacea for users fumble fingering (or misremembering, or not knowing) their email address. Users who don't receive it will either a) ignore it if you let them use the application anyway or b) frequently bounce hard if they don't get it, because they assume naturally that your Googles are broken.

The single most compelling reason to send people activation emails is -- I kid you not -- to remind them that they signed up for your website and how to get back to your website. A secondary consideration is not proving that they got their inbox right but proving that they didn't get someone else's inbox wrong.


This is a very good practice.

I've been working on a large project that takes this approach for more than just email verification. We try to validate required information of multiple types (emails, accounts, etc.), and provide a "Move forward without validating" option after a third unsuccessful attempt. It's made for quite a better user experience overall, reducing the frustration of users sure they're submitting correct data while "the computer" thinks it's wrong, allowing them to complete the process nonetheless.


Some people, when confronted with a problem, think "I know, I'll use regular expressions!". Now they have two problems.


While we're on the topic of emails, does anyone have any anecdotes or data on how often users will click activation links if I log them in after registration?

I always hated having to log in to my email after signing up, so I just create an account and login users without any upfront verification.

My email to the user says I will disable accounts that are not activated in 4 days, but it's just a bluff :)


It may be annoying to have to go into your email to confirm, but as someone who has her first name @ gmail.com, I'm very thankful for it. I get people signing up for random crap with my e-mail all the time; without e-mail account confirmation, it means I start getting random e-mails someone else signed up for.

Or sometimes I get no e-mails at all. When I installed Rapportive (<3!) I found out that "I" had a profile on hi5, which I promptly deleted.


Be careful about that - if you keep mailing them without requiring a confirmation, people will scream about "double opt-in" and put your mailserver on various blacklists. (A nastier sort will use your service to flood the mailboxes of people they don't like with your messages, hence the spamfighters' response.)


I sometimes put my email down as foobar@example.com, so I’m definitely not going to click your activation links. That’s probably not representative of most users though.


You'd be surprised how many people supply bogus email addresses for something they ostensibly actually would like emailed. Like the To field on "tell a friend" or the signup box for an email newsletter.


Ironic ... I tried to leave a comment on his blog with a random email address, but received this message:

"Error: please enter a valid email address."


This is truly annoying and makes me hesitate to use the '+' feature. The problem I've had is when I get manually subscribed to an email list (like when giving my email on paper or being added after sending a senator an email) and then cannot unsubscribe due to validation failures.


There are two separate issues going on here.

One: validating addresses to catch typos. A common example is typing a comma instead of a dot or typing just a username instead of a whole email address. Flagging these errors is a good thing.

Two: some developers believe that they can make people enter real email addresses by being very clever about only accepting strings that look like real email addresses. This is stupid, doesn't work, and often blocks legitimate addresses.
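To make issue one concrete, here is a small Python sketch of typo flagging that warns instead of rejecting. The function name and the specific checks are assumptions for illustration, picked from the mistakes mentioned in this thread (a comma for a dot, a bare username, a web address in the email field); it is not a complete list.

```python
from typing import List

def email_warnings(address: str) -> List[str]:
    """Return human-readable warnings; an empty list means nothing looked off."""
    warnings = []
    if "@" not in address:
        warnings.append("There is no '@'; did you enter only a username?")
    if "," in address:
        warnings.append("There is a comma; did you mean a period?")
    if address.lower().startswith("www."):
        warnings.append("This looks like a web address, not an email address.")
    return warnings
```

The caller would show these next to the field and still let the user submit, so an unusual but working address is never blocked.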


Re "Two", if you choose an email address that doesn't look like an email address and it gets blocked then I'm not sure that it is the developer [alone] who is being stupid.


I'm not talking about escaped @ symbols here (that's bonkers). There is still plenty of code out there that assumes a domain suffix is only ever 2 or 3 letters long and that usernames are only letters and numbers.


Please stop... to collect email addresses you don't really need.

When I participate in some kind of online community, I want to choose if I receive emails from them at all. And if not, it should be my choice if I provide any email address at all.

I have a small site where you can participate anonymously or log in, and when you create an account it's your choice if you provide an email address at all. If not, and you lose your password, you're out of luck.


True, although I'm a fan of 'tiered' services since robots (spammers, trolls, and others) also participate, I'd like a way of saying "this is a real person".

If you're looking for a startup idea how about a service that creates an anonymous ID (to me anyway) where the user provides that id to me, I send it to a service and get back a 'reputation' bit which says if you're a good guy or a bad guy (person what ever). And a way to report you've not been co-operating so that others can benefit.

Ebay reputation model but nominally anonymous. (at some point in some server somewhere there will be a way to link token a to token b but I'm totally ok if it can't be resolved into an actual person.)


That's true for many things, but often money is involved (say amazon or some other online shop).

And telling your customer that they are out of luck may not be acceptable.


I dunno. I know a guy who bought a list of 750,000 e-mail addresses from a shady source and was dismayed to discover that many of them didn't even have @ signs in them... ;-)


I tend to rely on http://www.regular-expressions.info/email.html when coming to validate an email address.

I do often fall into the trap of trusting the framework's built-in email validation to be correct.

Apparently, this is the regex to match RFC2822: (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])


The problem with matching against the fully fledged RFC compliant regex is that not all email addresses are RFC compliant. As I indicated in my comment above, I've abandoned trying to "correctly" or "completely" validate email addresses. There's only one thing certain in an email address: it contains the "@" character.


Can you give an example of a noncompliant address that actually works?


One cellular phone company in Japan used to allow people to register e-mail addresses with two periods in a row before the @ character. I have seen some addresses like this in the wild, but the decision to mark these as valid or not for an app depends on the domain you're working in. At my old job making web apps for Japanese companies, programmers would usually allow these types of addresses if we were making a mobile site.


Here's an example. I've stumbled across others in the past. http://www.reddit.com/r/programming/comments/fl6i0/email_add...


Thanks! You learn something new from Hacker News everyday. Never thought it'd be such a long regex for emails though. I just got really confused trying to parse that in my head.


A loosely related anecdote:

I was registering a general purpose domain name a couple of years back and asked a friend if he had any input on a good short name. He replied "nope", and being a Swede I registered nope.se.

There was a time I still forwarded all nope [at] nope [dot] se emails to my primary email, but as it turned out (not that unexpected), this was an address frequently used by Swedes to register "anonymously".

Anyway, it was an interesting/alternative way of keeping track of popularity of new communities etc. Clearly not all users expected that they had to verify their email addresses.


My fandango account is completely irrecoverable because of this ... I turned my name@gmail.com address into name+fandango@gmail.com and was able to log in without any trouble, but of course when I went back to log in months later, I had completely forgotten about it, so I couldn't get into my account (kept putting in name@gmail.com and couldn't figure out what the problem was).

Naturally I reset my password, and the temporary password arrives in my account, but when I went to put in the new password, it puked on the email I was using ... then I remembered what I had done, but till today whenever I go to put in the new password with the correct login (name+fandango@gmail.com) ... I keep getting sent back to the reset password screen, over, and over and over.

It knows it's me, because my credentials (Hi xxxx) are displayed in the top right hand corner, but it simply won't reset my password correctly.

Fandango support is well ... worse than useless.

It's all very maddening ... an account I've had for lord-knows-how-long containing my entire theater going experience, inaccessible. That's what I get for trying to be clever.


On the other hand, maybe the specification for email addresses is too loose.


Too loose for what though? To make it more useful as a communication format or to make it easier for developers to validate it? It's hard to believe that a tighter spec could have improved the former.


Well for starters, why do you need to put comments in your email? That can't be used to make it more useful as a communications format.


I'm afraid I don't understand your question


Then you are unfamiliar with email address formats, which have a specific formatting that allows for "comments", and it is kind of hairy.


For one thing it is easier to implement if it is more limited. This should help the client and server developers. Right now email software often seems to be quite complex.


Maybe. All I know is that no one should have to write a regexp like this one http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html


the regexp is very impressive, but it's not quite as bad as it would want you to believe. The regexp also matches real names in addresses and lists of addresses.

Usually, in forms, you would ask for the real name and for the email address in separate fields and you wouldn't allow lists to be passed in.

I do agree though: The email address format is (even without the real name part) too complicated to really validate. In my applications, I check if there's an @-sign (I usually need a proper SMTP email address to deliver anyways), but leave the rest to the mail server on send-time, handling the bounces.


I believe that misses some edge cases as well.


Here's an example of RFC 2822 using RegEx in case HackerNews comments filter out some of the symbols: http://bit.ly/g1uFMz


I note that you apparently put the source for the regex in there - http://tools.ietf.org/html/rfc2822?


That's the RFC it follows, yes. I found out about the regex here: http://www.regular-expressions.info/email.html

The author does say you shouldn't use it -- it's a crazy regular expression after all -- but it IS the RFC :D


As a little addendum to this piece, I now realise that WordPress does exactly the kind of horrible validation I was talking about in the article, and apologise for it.


I still think you should validate, but instead of rejecting "incorrect" email addresses, just ask the user to check if there's no mistake.


I wrote a little follow up to the article, covering some of the points mentioned here and on reddit. http://blog.sinjakli.co.uk/2011/02/15/email-address-validati...


I think that recaptcha is just a better idea in regards to this. Yes, it's an extra step, but it does two things: a. verify that the person is in fact a person and b. cancel out spam bots, because of the need for the spam bots to be able to read the image, which is almost always impossible to fake.

This way, email validation is not even important anymore to avoid spam.

Using part of what was suggested in your post, if we do both, use recaptcha and send an email validation link before sending any emails, we avoid spam to our servers and to the people from us and we save everybody a little bit of time. :)

The next issue arises with email delivery. How do we then ensure that our validation emails don't get filed as spam? Because if the user never sees it, then it becomes a hassle for them and chances are, unless they really, really wanted access to our site, they're not going to spend time contacting us to help them with validation so that they can log in or otherwise...


Interestingly enough - I subscribe to a email list that has an email address that fails to validate at google. Most irritating.


"If you want to know that you’re being given a valid address, send it an email and have the user click a validation link in it, and stop annoying your users!"

Epic fail. It's this sort of approach that ends up resulting in cross site scripting bugs. Oh just take whatever the user typed in, and send it to the server they told me to send it to. Boom!

The perl code is perfectly reasonable for validating RFC compliant addresses.


I never understood the point of enforcing the spec for user input. Even if done properly it may reject some working, but invalid email addresses. And it does nothing to increase your chances of getting a good email address. Your user is either willing to give you their real address or not. If they are willing, validating fully does not protect against typos, and if they are not, you will get a well formatted fake address - validating to spec serves no purpose and possibly harms. Just don't do it.

Check for an @ sign and possibly a top level domain name (at least one dot) and be done with it.
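In Python that check is only a few lines. This is a sketch under stated assumptions, not a recommendation: the name `looks_like_email` is invented here, it splits on the last "@" so that quoted or escaped "@" signs in the local part still pass, and requiring a dot will reject dotless domains such as `user@localhost`.

```python
def looks_like_email(address: str) -> bool:
    """Loose sanity check: something, an '@', then a domain with a dot inside it."""
    # rpartition splits on the LAST "@"; if there is none, `at` is empty.
    local, at, domain = address.rpartition("@")
    # strip(".") ignores leading/trailing dots so "a@b." and "a@.com" fail.
    return bool(at) and bool(local) and "." in domain.strip(".")
```

It accepts `name+tm@gmail.com` and rejects `www.hotmail.com`; whether the address actually works is still left to the mail server.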


Most forms I have seen validate this way - or at the most, check for x@x.x

Agree that anything beyond this may be a little overzealous- the real purpose and value of email validation is to help the form-filler discover typos/mistakes, to make sure you reach your willing customer. I can see though how some programmers would be tempted to distort this goal into the goal of having "a clean database." (Although worthy, but not the end goal)


Some really broken/misconfigured mailer software may still accept the 'foo@bar@qux' syntax (route mail for user foo@bar via qux, or the other way round - I forgot which, since no sane system has implemented this since the word 'spam' came to mean bad e-mail.)

So there may, theoretically, be some value in checking for the presence of exactly one @.


@'s are fine as long as they are escaped properly!


You should first ask yourself if you really need an email address. And see if you can get away with not having one.

Requiring that the user have an email on file is not as necessary a requirement as a lot of people seem to think. It seems like half the time or more, they just want to spam it anyway.


Honest question: how do you handle the extremely common case of a lost password without an email address?


As someone else mentioned in the thread, make the email address optional, with the understanding that without it they can't recover lost passwords.

This is probably an unacceptable solution for some sites, but I can definitely think of cases where the only reason you need email is for password recovery, and account loss isn't the end of the world.


While I agree that some kinds of validation are 'too eager' and annoying, just use a 'legit' e-mail address, ffs.

By including super-special characters and whatever extra features GMail or whoever provides, you're just asking for it, sorry.

Especially if you're a coder yourself, you can already assume that even if it passes the initial validation, it probably won't be properly stored or escaped when the actual mail is sent, when you try to log in with your address later, etc.


The relevant RFCs make it clear what is a correct eMail address. Why should we have to put up with lazy or incompetent coders who can't be bothered to meet the standard?


You probably got me wrong. In a perfect world, it would work just as specified in the RFC, everywhere and I am all for that. But obviously that's not the case.

And I am not encouraging people to be lazy and sloppy, I am just saying considering the 'real world', you are better off with an email-address that does not contain any too unusual characters (e.g. - _ . should be fine, as those are commonly used).


Because those RFCs are insane.

Seriously comments (nested even) in email addresses?


Yeah but not all email addresses are RFC compliant. Plenty of mail servers accept, or can be configured to accept, non-compliant addresses.


That's not the issue at hand - the grandparent comment advocates that you shouldn't even expect a service to accept RFC-Compliant addresses.

Validating against the RFC is more, not less permissive than the position held by the grandparent comment.


Yup, agreed. And of course RFC compliance only specifies syntax. It would pass blah@blah.hlah.


Well, just go ahead and try signing up at facebook with {^|~!}@gmail.com then.


So because Facebook don't know what counts as a valid eMail address, everyone else has to adapt to them?


When that's the behavior of the 800-pound gorilla, then the answer is yes. IE6 didn't know what counts as valid DOM or CSS but everybody else adapted to it.


How can other people work out what you mean by a 'legit' email address?

Are hyphens allowed? What about dots, underscores or numbers? Do any of those count as "super-special characters"? Just like "+", they are all permitted in standards-compliant email addresses, but I have no way of knowing whether they are permitted in addresses that you consider "legit".


Exactly. You have no way of knowing what all the code along the way between the mail-sending application and your inbox does, that was my whole point.


But how can I be considered to be "asking for it" if I put a dot, hyphen or number in my email address, when millions of email addresses have them; the standard permits them; and many institutional policies create emails with them by default?

Which characters can I have in my email address, that won't cause you to tell me it's my own fault when they get rejected?

I agree that you have no way of knowing what all the code along the way does, but you can at least hope that it behaves in something resembling a standards-compliant fashion. Otherwise, what's the point of email addresses at all?


I am not sure whether relying on 'hope' is a good approach.


Hope is all you have, if email is the black box wilderness you describe. You rely on hope regardless of whether or not your email address contains "super-special characters".

My whole point is that there is a standard for an email address, outlined in a freely-available document. If an application claims to handle email, that claim implies conformance to that standard. Any deviation should be documented.

Your claim appears to be that there is some other definition of a "legit" email address, that you can absolutely 100% guarantee will be handled by absolutely every single email-handling application ever (without relying on hope).

Please answer these questions -

1) What, exactly, is the format of such a "legit" email address, according to your definition?

2) Where does this definition come from?


Your view of this whole subject seems to be completely upside down.

There's no such thing as a 100% guarantee when it comes to email (Interwebs 101) but it should be completely obvious to any sane person that you are getting much closer to those 100% if you don't use any "super-special characters" in your e-mail address as opposed to people "asking for it" by using an address like {^|~!}@gmail.com - which will obviously get you into some kind of trouble, sooner or later, whether it's Facebook's validation rejecting it or mail applications which can't handle it properly.

So of course, while there can't be 100%, from the perspective of an application that deals with e-mails, you'd want to get as close as possible. And like I said in an earlier comment, I wouldn't expect problems with characters such as - _ . but yeah, who knows?

And actually, there's no way anyone could ever have that obscure example address used above, as hey surprise, GMail won't allow you to register it (same for Hotmail). So, I am not sure what this means now:

GMail only allows alphanumeric characters and dots in mail addresses because...

a) ...their coders don't know the RFC and hardly anything about that whole e-mail thing in general, so they provided us with a heavily flawed product, according to your definition.

b) ...their coders have already been doing this whole development and e-mail thing for a week or two and it was obvious to them that "super-special characters" could lead (and have previously led) to trouble, so they're saving less 'techy' people from registering addresses that are basically "asking for it".

Ever since the first comment I left here, my whole point is that everyone who has built applications sending a lot of e-mails just knows that it's insane to assume that everybody else "does it right" and that combinations of special characters, escaping and UTF-8 often result in 'lots of fun'... not. That's far from being fully RFC-compliant but that's how it is, out there in the wild.

One last example: according to Wikipedia, Hotmail "refuses to send mail to any address containing any of the following standards-permissible characters: ! # $ % * / ? ^ ` { | } ~". (And being aware of this, you'd be "asking for it".)


1) Regarding the kinds of mailbox names an email provider will provide you with, it's up to the provider. My employer, for example, only lets us have firstname.surname, with the addition of a single digit in case more than one employee has that name. These limitations have nothing to do with standards compliance.

2) Gmail does allow you to send mail to non-legit, but standards-compliant addresses like {^|~!}@example.com, because they know that their own mailbox name rules don't extend to other providers.

Regarding your point b) evidently they're not smart enough to grasp that addresses with + in them only work on the theoretical internet, and not the real one.


You are talking about naming conventions, that's something completely different. GMail does not enforce any 'firstname.lastname' patterns.


Completely different to standards-compliance, yes. However it is exactly the same thing as GMail's alphanumeric only rule.

To reiterate - the mailbox names that a email provider will provide are subject to the naming conventions defined by that provider. Whether the convention is firstname.lastname; 5-7 characters only; alphabetic only; or 13375p34k transcriptions of characters from Lord of the Rings. None of that has anything to do with whether they are standards-compliant.

The test is whether they will send email to any RFC-compliant email address regardless of whether it conforms to their own mailbox naming convention. Gmail does.



