
Ah, this brings me back to the time when I was trying to convince a product manager that it was a bad idea to "validate" email addresses with regular expressions. I failed, and the product was rolled out with the following regex: ^[a-z0-9.-]+@[a-z0-9.-]+\.[a-z]{2,4}$

I quit shortly thereafter.
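For anyone wondering what that regex does in practice, here is a minimal sketch in Python. The sample addresses are hypothetical, picked only to illustrate the failure modes, and `accepted` is just a helper name for this example:

```python
import re

# The regex quoted above, exactly as it was rolled out.
PRODUCT_RE = re.compile(r"^[a-z0-9.-]+@[a-z0-9.-]+\.[a-z]{2,4}$")

# Hypothetical sample addresses, for illustration only.
samples = [
    "alice@example.com",       # plain lowercase address
    "Alice@example.com",       # uppercase letter in the local part
    "alice+news@example.com",  # plus-addressing
    "first_last@example.com",  # underscore in the local part
    "alice@example.museum",    # TLD longer than four letters
    "..@...aa",                # nonsense made of dots
]


def accepted(address: str) -> bool:
    """Return True if the product regex lets the address through."""
    return PRODUCT_RE.match(address) is not None


for address in samples:
    verdict = "accepted" if accepted(address) else "rejected"
    print(f"{address!r}: {verdict}")
```

On these samples only the first and the last are accepted: the regex turns away uppercase letters, `+`, `_`, and long TLDs, yet lets a string of dots through.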




I used to hold this opinion, but this Stack Exchange post https://dba.stackexchange.com/a/165923/34006 changed my mind. In short, HTML5 defines its own specification of an email address here [2], and notes:

> This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

> The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

> /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

To quote the DBA Stack Exchange post, "if it's good enough for HTML5, it's probably good enough for you". And if you're using inputs with `type="email"` in HTML5, you already are constraining your emails to that format.

(That said, it sounds like the regular expression you were asked to use was not a good one.)
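If you want the same check outside the browser, a rough Python sketch follows. It assumes the pattern as published in the WHATWG spec, with the JavaScript `/.../` delimiters dropped. `is_html5_email` and the sample addresses are made up for this example:

```python
import re

# Pattern from the WHATWG "valid e-mail address" definition, adapted to
# Python's re module (no /.../ delimiters, "/" needs no escaping).
HTML5_EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def is_html5_email(address: str) -> bool:
    """Return True if the address matches the HTML5 pattern.

    fullmatch is used because "$" in Python also matches just before a
    trailing newline, which would let "a@b\\n" slip through.
    """
    return HTML5_EMAIL_RE.fullmatch(address) is not None


# Hypothetical sample addresses, for illustration only.
for address in [
    "alice+news@example.com",    # plus-addressing
    "Alice@Example.COM",         # mixed case
    "alice@localhost",           # no dot in the domain
    "alice@-example.com",        # label starting with a dash
    '"john smith"@example.com',  # quoted local part from RFC 5322
]:
    print(f"{address!r}: {is_html5_email(address)}")
```

The first three pass and the last two fail, which matches the spec's stated intent: no quoted strings or whitespace, and dashes only inside a domain label.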

[2] https://html.spec.whatwg.org/multipage/input.html#valid-e-ma...


That's an interesting choice of rules. When cutting down the complexity so drastically, do you really need to double the length of your regex just to enforce the rule that dashes only appear in the middle of a domain segment? This version is so much more legible:

  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]{1,63}(?:\.[a-zA-Z0-9-]{1,63})*$/

Hostnames starting or ending with dashes tend to work in the real world anyway.
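To see where the two versions actually disagree, here is a small Python sketch that builds both from the same local part and differs only in the domain-label rule. The names `STRICT`, `RELAXED`, `build` and the test addresses are assumptions for this example:

```python
import re

# Local part shared by both regexes.
LOCAL = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"

# HTML5-style label: dashes only in the middle, at most 63 characters.
STRICT_LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
# Label from the shorter regex: dashes anywhere, at most 63 characters.
RELAXED_LABEL = r"[a-zA-Z0-9-]{1,63}"


def build(label: str) -> re.Pattern:
    """Compile local@label(.label)* for the given label pattern."""
    return re.compile(rf"{LOCAL}@{label}(?:\.{label})*")


STRICT = build(STRICT_LABEL)
RELAXED = build(RELAXED_LABEL)

# Hypothetical addresses, for illustration only.
candidates = [
    "a@foo-bar.example",            # dash in the middle of a label
    "a@-foo.example",               # label starts with a dash
    "a@foo-.example",               # label ends with a dash
    "a@foo..example",               # empty label
    "a@" + "x" * 64 + ".example",   # label longer than 63 characters
]

for address in candidates:
    strict_ok = STRICT.fullmatch(address) is not None
    relaxed_ok = RELAXED.fullmatch(address) is not None
    if strict_ok != relaxed_ok:
        print(f"disagree on {address!r}: strict={strict_ok}, relaxed={relaxed_ok}")
```

Only the two addresses with a leading or trailing dash are reported. Both versions accept a dash in the middle of a label and reject empty and over-long labels, so the longer regex buys exactly that one rule.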


That regex is short by a few kilobytes [1]!

[1] http://referencesource.microsoft.com/#System.ComponentModel....


When I decided to learn more about regular expressions back in 2002-2003 I found a regular expression for RFC822 (with comments removed): http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

As an exercise, I deconstructed that into the pieces it was built from. Funny times :)


I don't see a timestamp on your link, but I suspect this one is more correct:

https://metacpan.org/source/RJBS/Email-Valid-1.202/lib/Email...



This is the only sane way.


Until for some reason a user really needs to use a classic UUCP bang-path. Poor help!trapped!in!lost!server ;)



