The ':' separates the scheme from the parts that are specific to that scheme. The // indicates that a hostname follows, not a directory. http:/test is a valid URL meaning the path /test on the machine that the current resource came from, while http://test is a URL that specifies a resource on the machine mapped to the name test. The double slash removes the ambiguity. So it really is two separate pieces: the ':' and the '//'.
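Python's standard urllib.parse module makes the difference easy to see. It is used here only as an illustration, and the URLs are made up. urlsplit shows that only the double slash introduces a network location (host), and urljoin resolves the single-slash form against the host of a hypothetical current page:

```python
from urllib.parse import urljoin, urlsplit

# Single slash: no network location, only an absolute path.
single = urlsplit("http:/test")
# Double slash: "test" is parsed as the host (netloc).
double = urlsplit("http://test")

print(repr(single.netloc), repr(single.path))  # '' '/test'
print(repr(double.netloc), repr(double.path))  # 'test' ''

# Resolved against a hypothetical current page, "http:/test" stays on that page's host.
resolved = urljoin("http://example.com/a/b", "http:/test")
print(resolved)  # http://example.com/test
```

Python applies the lenient rule from RFC 3986 here (a reference whose scheme matches the base is treated as relative); a strictly conforming parser would leave http:/test unchanged.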
In particular, RFC 1738 section 3.1 says: 'The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.' So the // is used by URLs that need any of this information: "//<user>:<password>@<host>:<port>/<url-path>".
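As a sketch, here is a made-up FTP URL that fills in every part of the common Internet scheme syntax, taken apart with Python's urllib.parse.urlsplit (the user, password, host and port are all hypothetical):

```python
from urllib.parse import urlsplit

# //<user>:<password>@<host>:<port>/<url-path>, every component present.
parts = urlsplit("ftp://alice:secret@ftp.example.com:2121/pub/readme.txt")

print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # ftp.example.com
print(parts.port)      # 2121 (converted to an int)
print(parts.path)      # /pub/readme.txt
```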
Note that a URL like //www.example.com uses the protocol of the context it appears in, so with img src="//www.example.com" the image is fetched over http or https depending on which one the page was loaded with. Very handy!
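A minimal sketch of that resolution using Python's urllib.parse.urljoin, assuming two hypothetical pages on example.org that embed the same protocol-relative image reference:

```python
from urllib.parse import urljoin

# Protocol-relative reference: host given, scheme left to the context.
img = "//www.example.com/logo.png"

# The same reference picks up whichever scheme the embedding page used.
from_http = urljoin("http://example.org/index.html", img)
from_https = urljoin("https://example.org/index.html", img)

print(from_http)   # http://www.example.com/logo.png
print(from_https)  # https://www.example.com/logo.png
```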
This is not Apache-specific. Look at the HTTP protocol and you will see why: every request contains the full path (e.g. GET /images/xyz.png HTTP/1.0), but in HTTP/1.0 it is actually optional to specify the domain name/host you want to reach. (Most browsers send the Host header anyway, because otherwise serving multiple domains from one IP address would be impossible.)
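To make the split concrete, here is a sketch that only formats the text of a simple GET request from a URL; it sends nothing over the network. The helper build_get_request and its parameters are hypothetical. The path lands in the request line, while the host can only travel in the Host header, which is optional in HTTP/1.0 (HTTP/1.1 later made it mandatory):

```python
from urllib.parse import urlsplit


def build_get_request(url: str, version: str = "1.0", send_host: bool = True) -> str:
    """Format (but do not send) a minimal GET request for an http:// URL."""
    parts = urlsplit(url)
    # The request line carries only the path (plus any query), never the host.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/{version}"]
    if send_host:
        # The host goes in a separate header line.
        host = parts.hostname or ""
        if parts.port is not None:
            host += f":{parts.port}"
        lines.append(f"Host: {host}")
    # Header lines end in CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


print(repr(build_get_request("http://www.example.com/images/xyz.png")))
print(repr(build_get_request("http://www.example.com/images/xyz.png", send_host=False)))
```

Without the Host line, nothing in the request tells a server hosting several domains which one the client meant.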
In a private conversation some years ago, Tim BL told me that he used Apollo workstations back in the 1980s and really liked Apollo Domain/OS, so he took the // from the Domain distributed filesystem, which used // to address possibly remote files, i.e., //hostname/path/to/file. I suspect the \\ in Microsoft UNC pathnames derives from the same source, probably through the influence of Paul Leach, who also came from Apollo.
RFC 1738 section 2.1 specifies the colon between the scheme (e.g., "http") and the scheme-specific-part (e.g., //news.ycombinator.com). The // is specified in section 3.1 as part of the "common internet scheme syntax." Specifically, the // is intended to identify the scheme-specific-part as complying with the CISS.
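Those two rules can be sketched by hand without any URL library: split at the first colon, then check whether the scheme-specific-part starts with //. The function split_rfc1738 is a hypothetical helper, not part of any standard API; mailto, which RFC 1738 also defines, is an example of a scheme that does not use the common Internet scheme syntax:

```python
from __future__ import annotations


def split_rfc1738(url: str) -> tuple[str, str, bool]:
    """Split a URL at its first ':' into scheme and scheme-specific-part.

    The third value says whether the scheme-specific-part starts with '//',
    i.e. whether it claims to follow the common Internet scheme syntax.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError(f"no ':' scheme separator in {url!r}")
    # Scheme names are case-insensitive, so normalise to lower case.
    return scheme.lower(), rest, rest.startswith("//")


print(split_rfc1738("http://news.ycombinator.com"))
# ('http', '//news.ycombinator.com', True)
print(split_rfc1738("mailto:someone@example.com"))
# ('mailto', 'someone@example.com', False)
```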
Yes, but there would be a couple of problems as well; specifically, DNS would need some major revamping. DNS was already operational long before the web came along, and it already used the '.' notation.
http:/com/test/www/someresource
and
http:/uk/co/test/www/someresource
would both have been valid resources, but it would be harder than it is now to figure out where the machine boundary lies.
You can't go by the number of labels (because of subdomains), and you can't go by www either.
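A small sketch of that ambiguity, assuming the hypothetical single-slash, big-endian URL form from the examples above: every prefix of the labels is a plausible host, and nothing in the string itself says which one is meant. The function candidate_splits is made up for illustration:

```python
from __future__ import annotations


def candidate_splits(url: str) -> list[tuple[str, str]]:
    """List every (host, path) reading of a hypothetical big-endian, single-slash URL."""
    # Drop the "scheme:/" prefix; what remains is just slash-separated labels.
    _, _, rest = url.partition(":/")
    labels = rest.split("/")
    splits = []
    # The first k labels could be the host, leaving at least one path segment.
    for k in range(1, len(labels)):
        host = ".".join(reversed(labels[:k]))  # back to familiar DNS order
        path = "/" + "/".join(labels[k:])
        splits.append((host, path))
    return splits


for host, path in candidate_splits("http:/uk/co/test/www/someresource"):
    print(f"{host:<16} {path}")
```

For the uk example this yields four readings, from host uk with path /co/test/www/someresource down to host www.test.co.uk with path /someresource.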
I think if they had gone that route, the // would have been 'reinvented' for practical reasons, and it would probably be placed like this:
I'm not sure what the implications for phishing, certificates and humans interpreting URLs would have been in that situation either, but I do find it convenient to be able to fish the host part out of a URL without any further knowledge on my side.
Heh, when _I_ first started using the internet, domain names in the UK were "backwards". My first email address was:
rwj@uk.ac.dl.cxa
In fact the first "domain names" I used were things like "lancs.pdsoft" but I think those were X.25 names, and the less said about X.25 the better.
At some point around '91 or '92, JANET reversed all its domain names to bring them into line with IETF standards. This caused some confusion with names beginning "cs.", which could be either the Computer Science department of some UK university or a domain in the old Czechoslovakia.
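The "cs." clash is easy to reproduce with a naive heuristic that guesses a name's order from whichever end looks like a top-level domain. This is only a sketch, not how any particular gateway actually worked: the TLD set is a tiny hypothetical subset, and someuni is a made-up university name. The JANET-style address is the one quoted above:

```python
from __future__ import annotations

# A tiny, hypothetical subset of early-1990s top-level domains ("cs" was Czechoslovakia).
KNOWN_TLDS = {"uk", "cs", "com", "edu", "org"}


def to_dns_order(janet_name: str) -> str:
    """Reverse the labels of a big-endian (JANET-style) name into DNS order."""
    return ".".join(reversed(janet_name.split(".")))


def plausible_orders(name: str, tlds: set[str] = KNOWN_TLDS) -> list[str]:
    """Naive guess: a name may be in an order if the label at its TLD end is a known TLD."""
    labels = name.lower().split(".")
    orders = []
    if labels[0] in tlds:
        orders.append("JANET (big-endian)")
    if labels[-1] in tlds:
        orders.append("DNS (little-endian)")
    return orders


print(to_dns_order("uk.ac.dl.cxa"))          # cxa.dl.ac.uk
print(plausible_orders("uk.ac.dl.cxa"))      # ['JANET (big-endian)']
print(plausible_orders("cs.someuni.ac.uk"))  # ['JANET (big-endian)', 'DNS (little-endian)']
```

The last name is ambiguous: read big-endian it sits under Czechoslovakia's cs, read little-endian it is a UK university department.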
And that would have wrapped DNS and HTTP together in a very uncomfortable way -- you'd have to make at least one extra DNS query for every HTTP request to disambiguate.
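A sketch of what that disambiguation might look like, assuming the hypothetical big-endian URL form discussed above. A plain dict stands in for DNS, and trying the longest prefix first is an arbitrary policy chosen for the example; a server at test.com could just as well have a /docs/index.html. The point is the query count:

```python
from __future__ import annotations

# Hypothetical zone data standing in for real DNS lookups.
FAKE_DNS = {"test.com": "192.0.2.10", "www.test.com": "192.0.2.11"}


def resolve_big_endian(url: str, dns: dict[str, str]) -> tuple[str, str, int]:
    """Find the longest label prefix that resolves; return (host, path, queries made)."""
    labels = url.partition(":/")[2].split("/")
    queries = 0
    # Longest candidate host first, always leaving at least one path segment.
    for k in range(len(labels) - 1, 0, -1):
        host = ".".join(reversed(labels[:k]))
        queries += 1  # each candidate costs one lookup
        if host in dns:
            return host, "/" + "/".join(labels[k:]), queries
    raise LookupError(f"no prefix of {url!r} resolves to a host")


print(resolve_big_endian("http:/com/test/www/someresource", FAKE_DNS))
print(resolve_big_endian("http:/com/test/docs/index.html", FAKE_DNS))
```

The second URL needs two lookups (docs.test.com fails, then test.com succeeds) before the client even knows which server to connect to.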
People would just end up using a different separator after the DNS part, and you'd be back where you started (just with big-endian domain names).
Other possible sources of confusion:
http://test.com:80/
You could then get:
http:/test.com:/
Position would give it away, but if you then complicate matters further by using the default protocol (in your browser, http), it looks like:
:/test.com:/
In many browsers, ://test.com:/ is perfectly legal.
Of course, you could strip that down further by dropping the default port colon to get:
://test.com/
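For comparison, here is how Python's urllib.parse treats these variants (an illustration of one parser, not of any browser). An explicit port is parsed, an empty port after the colon means "use the default", and a reference starting with // borrows the scheme of a hypothetical base page. The bare ://test.com/ form, however, gets neither a scheme nor a host from this parser; it is treated as a plain path:

```python
from urllib.parse import urljoin, urlsplit

full = urlsplit("http://test.com:80/")
print(full.hostname, full.port)  # test.com 80

# Empty port after the colon: no explicit port, so the scheme's default applies.
empty_port = urlsplit("http://test.com:/")
print(empty_port.hostname, empty_port.port)  # test.com None

# Scheme-relative reference: the scheme comes from the (hypothetical) base page.
borrowed = urljoin("http://example.org/", "//test.com/")
print(borrowed)  # http://test.com/

# Nothing before the ':', so urlsplit finds no scheme and no host.
odd = urlsplit("://test.com/")
print(repr(odd.scheme), repr(odd.netloc), repr(odd.path))  # '' '' '://test.com/'
```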
See here for a much longer (and probably better :) ) explanation:
http://tools.ietf.org/html/rfc1738
I hope that helps!