
I don't think so. The traditional, canonical regular expression[1] for parsing a URL is

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

See https://tools.ietf.org/html/rfc3986#appendix-B

The authority section (which contains the host domain) must begin with "//" whether there's a scheme prefix or not. Otherwise it's just part of the path (or query or fragment). IIRC, these semantics are also fixed by HTML, such that any attribute like HREF or SRC is parsed as if using the canonical regex (but after entity substitution and whitespace trimming). Browsers might have implemented this differently many years ago, but I doubt it, since that would conflict with being able to use a bare path atom (e.g. foo.html).
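To make the "//" rule concrete, here's a quick Python sketch that applies the Appendix B regex and names the capture groups the way the RFC does. The function name split_uri and the dict keys are just my own labels for illustration, and this shows only what the regex does, not what any particular browser implements.

```python
import re

# Regex copied verbatim from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(ref):
    """Split a URI reference into its five components (hypothetical helper)."""
    m = URI_RE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# With a leading "//", the host lands in the authority component.
print(split_uri("//example.com/a"))
# Without it, the same text is just a relative path.
print(split_uri("example.com/a"))
```

The first call gives authority "example.com" and path "/a"; the second gives no authority at all and path "example.com/a", which is the same treatment a bare foo.html gets.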

[1] I normally eschew using regular expressions for proper parsing, but for URLs the canonical expression is both adequate and advisable for correctness.



