
> And you have no right to expect a web form will let you enter your address with nested comments.

While nested comments are a bit extreme, I'm not a fan of the attitude that a user has "no right" to use features that are a documented part of the spec. Just because a feature is uncommon or doesn't seem important to you doesn't mean it's not important to some small subset of your users.

For example, that's not far from saying that a user has "no right" to put +tag in their email address (after all barely anyone uses that), but some people find this extremely valuable.




No, nested comments are not part of "the" spec (in a way that would imply "the e-mail spec"); sure, they are a feature of "a" spec, but that spec is actually somewhat unrelated to what an e-mail address actually is: they are just a feature of the MIME specification for header field values.

If you look at the SMTP specification (which, given that it defines the protocol in charge of using e-mail addresses for actual delivery, has a much better claim to being "the" spec), you will note that you aren't allowed to use an e-mail address with nested comments in that context, as they have no meaning to SMTP.

However, you will also find that the rules for what characters are allowed and which have to be escaped are different, as that's what these specifications are actually discussing: how to escape an e-mail address for use with specific transport protocols.

An actual e-mail address? It seems to support pretty much anything followed by an @ followed by a domain name. It is just that in MIME, if you want to have a space character you will need to put it in quotes, or if you want a quote you will need to use a backslash.
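
As a rough illustration of that quoting rule, here is a minimal Python sketch that takes a raw local part and produces the form a header would need. The function name `quote_local_part` is made up for this example, and the set of characters treated as safe without quotes is my reading of the "atext" set in RFC5322, so treat it as an assumption rather than a reference implementation.

```python
import re

# Characters assumed safe without quoting (the "atext" set, as I read RFC5322).
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
# One or more atext runs separated by single dots, e.g. "john.smith".
_DOT_ATOM = re.compile(_ATEXT + r"(?:\." + _ATEXT + r")*")


def quote_local_part(raw: str) -> str:
    """Hypothetical helper: escape a raw local part for use in a header."""
    if _DOT_ATOM.fullmatch(raw):
        # Nothing special in it, so the raw form and the header form match.
        return raw
    # Backslash-escape backslashes and quotes, then wrap the whole thing in quotes.
    escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

So `john.smith` passes through untouched, while `john smith` only picks up quotes at the point where it is written into a header; the address itself is the same either way.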

The user of your web form, of course, is not typing MIME: there is a box that they can just type their e-mail address into, and it should probably support the raw syntax of their actual e-mail address, not a randomly chosen format required for escaping.

To make this more clear, one has to ask: why MIME escaping? Why not require the user to use HTML attribute escape sequences? That way, if their e-mail address contains a special character, instead of using quotation marks and backslash escaping, they'd use entities, like "&quot;".

Honestly, that makes about as much (if not more) sense. Meanwhile, of course, the user's username and password fields should also be escaped similarly, and if the user attempts to use a bare < or > they should get a validation error "please escape your password using RFC1866 (HTML)".

Previous, more detailed versions of this same complaint:

http://news.ycombinator.com/item?id=4794368

http://news.ycombinator.com/item?id=4486872


Thanks for the clarification, I did not realize that there are multiple parallel RFC tracks that define differing syntax and semantics of email addresses. Your claim then, is that all of the complicated syntax defined for email addresses in RFC2822 and RFC5322 is for the sole purpose of escaping characters that are significant to MIME? What about "+" -- is it just convention that most email hosts ignore everything to the right of that, or is that actually specified somewhere?


Yes: that is just convention. In fact, RFC5233 defines an extension to Sieve (a purposely-not Turing-complete language for filtering e-mail that is implemented as part of many mail systems) that parses those + addresses; this is the only e-mail-related standard I've so far come across that mentions this common feature (and I've read through numerous at this point ;P).

However, it does not define the syntax for + addresses (even so far as to define the "+"), as + is only a convention (as is the entire concept of having detailed/sub-addressing at all): it even has various examples, such as "5551212#123@example.com", that use alternate characters.

> NOTE: Because the encoding of detailed addresses are site and/or implementation specific, using the subaddress extension on foreign addresses (such as the envelope "from" address or originator header fields) may lead to inconsistent or incorrect results.

> Implementations MUST make sure that the encoding method used for detailed addresses matches that which is used and/or allowed by the encompassing mail system, otherwise unexpected results might occur. Note that the mechanisms used to define and/or query the encoding method used by the mail system are outside the scope of this document.
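
To make the "site-specific convention" point concrete, here is a small Python sketch of sub-address splitting where the separator is a parameter instead of a hard-coded "+". The name `split_subaddress` and the choice to split on the first occurrence of the separator are assumptions for illustration; nothing in RFC5233 mandates either.

```python
from typing import Optional, Tuple


def split_subaddress(local_part: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a local part into (user, detail).

    The separator is whatever the receiving mail system happens to use;
    "+" is only the common default, not something a standard requires.
    """
    user, found, detail = local_part.partition(separator)
    if not found:
        # No separator present, so there is no detail part at all.
        return local_part, None
    return user, detail
```

With the default, `split_subaddress("user+tag")` gives `("user", "tag")`; a site that uses "#" would call `split_subaddress("5551212#123", separator="#")` and get `("5551212", "123")`. Run it with the wrong separator for a foreign domain and you get exactly the "inconsistent or incorrect results" the note warns about.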

Also, yes: RFC5322 defines a ton of syntax, and all of that syntax is related to MIME headers; a "structured header" has particular rules related to whitespace and is allowed to contain comments, so e-mail addresses included as part of the address lists used in headers like To and From are going to be adapted to follow those rules.

FWIW, RFC5322 actually has a SHOULD NOT on the things that make it un-similar to the SMTP specification. The two specifications really do attempt to use fairly similar syntax. You thereby are allowed to have comments and crazy whitespace in weird places in MIME, but "please don't" ;P.

> Comments and folding white space SHOULD NOT be used around the "@" in the addr-spec.

The goal really did seem to be, I will happily admit, to have the two protocols be largely compatible to the extent that they could: the same list of reserved characters is used by both (as a key example, SMTP also doesn't allow the ()'s despite not supporting MIME comments). There are some weird differences, like RFC5321 allowing empty double-quotes as the local part; although, RFC821 did not seem to have that corner case, so I'm starting to think this is a bug introduced in RFC2821 (I had read mailing list posts about this issue a while back, but somehow it wasn't clear from those that it is a mistake).

I maintain, though, that it is very weird to be forcing this particular escape sequence set everywhere: when you lift e-mail addresses out of angle addresses and lists you don't need it anymore, as you can parse the address from the right unambiguously once you hit the @. Regardless, I do need to emphasize the statement in one of the earlier versions of my comment that RFC3696 has recommendations for e-mail address validation, and it includes the MIME escaping. I thereby doubt that my opinion, to be explicit, is shared by some of the people who worked on these specifications.
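
For what "parse the address from the right" looks like in practice, here is a minimal Python sketch. The function name `split_address` is hypothetical, and rejecting an empty local part or empty domain is my own assumption about what a web form would want, not something taken from any of the RFCs.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Hypothetical helper: split a raw, unescaped address at its last "@"."""
    # rpartition scans from the right, so any "@" inside the local part
    # stays in the local part and no quoting is needed to disambiguate.
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("expected something, then '@', then a domain")
    return local, domain
```

Given this, `split_address("john smith@example.com")` returns `("john smith", "example.com")` and `split_address("a@b@example.com")` returns `("a@b", "example.com")`, with no quotes or backslashes involved. Whether the domain is a sane domain name is a separate check.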

(That said, RFC3696 is weird... it mentions, for example, a limit of 64 characters on a username, but in fact that was just a "minimum maximum" from SMTP, and SMTP was quite clear that "TO THE MAXIMUM EXTENT POSSIBLE, IMPLEMENTATION TECHNIQUES WHICH IMPOSE NO LIMITS ON THE LENGTH OF THESE OBJECTS SHOULD BE USED", while at the same time saying that you must not send such things; I guess "welcome to Postel" ;P.)


I meant it in the spirit of Postel's Law [1]

[1] http://en.wikipedia.org/wiki/Robustness_principle


Great. Now I'm confused. Isn't "no right to expect a web form will let you enter your address with nested comments" the exact opposite of "liberal in what you accept from others"?


When choosing an email address to use and entering it in a form, you should be conservative. When accepting an address you should be liberal.



