Ask HN: Why does a url use ://?
58 points by quizbiz on Aug 31, 2009 | 23 comments
Does anyone know why a URL is structured that way?



The ':' separates the scheme from the part that is specific to that scheme. The '//' indicates that a hostname follows, not a directory. http:/test is a valid URL indicating a path on the machine that the current resource came from, whereas http://test is a URL that specifies a resource on a machine mapped to the TLD test. The double slash removes the ambiguity. So it really is two pieces: a ':' and a '//'.
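For what it's worth, here is a quick sketch of that distinction using Python's standard urllib.parse (my choice of tool, nothing the RFC prescribes). It only shows how one parser splits the two forms; a browser may well treat http:/test differently.

```python
from urllib.parse import urlsplit

# One slash after the colon: no host component, only a path.
single = urlsplit("http:/test")

# Two slashes after the colon: "test" is parsed as the host.
double = urlsplit("http://test")

print(single.scheme, repr(single.netloc), repr(single.path))
print(double.scheme, repr(double.netloc), repr(double.path))
```

In the first case the host field comes back empty and "/test" is the path; in the second, "test" lands in the host field and the path is empty.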

Other possible sources of confusion:

http://test.com:80/

You could then get:

http:/test.com:/

Position would give it away, but if you then complicate matters further by relying on the default protocol (http, in your browser), it looks like:

:/test.com:/

In many browsers

://test.com:/ is perfectly legal.

Of course, you could strip that down further by dropping the default port colon to get:

://test.com/

See here for a much longer (and probably better :) ) explanation:

http://tools.ietf.org/html/rfc1738

I hope that helps!


Technically, not all of them do; "<scheme>:" is the common prefix, and everything else depends on the URL type.

See: http://www.ietf.org/rfc/rfc1738.txt

In particular, section 3.1 says: 'The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.' So the // is used by URLs that carry any of this information: "//<user>:<password>@<host>:<port>/<url-path>".


An example of this is mailto:foo@bar.com. It is still a totally valid URL, but email addresses don't use the // notation for nodes.
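To make the contrast concrete, here is a small sketch with Python's urllib.parse. The ftp URL and its credentials are made up for illustration; the point is only which fields get filled in when the "//" is present and which stay empty when it is not.

```python
from urllib.parse import urlsplit

# Hypothetical URL that fills in every part of the common Internet scheme syntax.
ftp = urlsplit("ftp://user:secret@host.example:2121/pub/file.txt")
print(ftp.username, ftp.password, ftp.hostname, ftp.port, ftp.path)

# mailto has a scheme and a colon but no "//", so no host part is parsed.
mail = urlsplit("mailto:foo@bar.com")
print(mail.scheme, repr(mail.netloc), mail.path)
```

For the mailto URL, everything after the colon ends up in the path field and the host field is empty.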


I should note that both Safari and Firefox understand the URL 'http:news.ycombinator.com/item?id=796434' as well.


But only in the URL bar; both turn a URL like that into a relative URL when used in an img src, for example.


Note that a url like //www.example.com will use the context's protocol, which means an img src="//www.example.com" will use http or https depending on which the page was loaded in. Very handy!
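A rough way to see this outside a browser is Python's urllib.parse.urljoin, which resolves a reference against a base URL. The image path below is hypothetical; the base URLs just stand in for "the page that was loaded".

```python
from urllib.parse import urljoin

# Scheme-relative reference; the image path is made up for illustration.
ref = "//www.example.com/logo.png"

# The same reference picks up whichever scheme the base page used.
print(urljoin("http://news.ycombinator.com/item?id=796434", ref))
print(urljoin("https://news.ycombinator.com/item?id=796434", ref))
```

The host and path come from the reference; only the scheme is inherited from the page.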


[deleted]


This is not Apache specific. If you take a look at the HTTP protocol, you will see why: every request has the full path in it (e.g. GET /images/xyz.png HTTP/1.0), but specifying the domain name/host that you want to reach is actually optional in HTTP 1.0 (most browsers send the Host header anyway, because serving multiple domains on one IP would be impossible otherwise).
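As a sketch of how a URL maps onto such a request: the path goes into the request line and the host goes into a separate Host header. The helper name build_request and the example.com URL are my own inventions for illustration, and this is nowhere near a complete HTTP client.

```python
from urllib.parse import urlsplit

def build_request(url: str, version: str = "1.0") -> str:
    """Build a minimal GET request for the given URL (illustrative helper)."""
    parts = urlsplit(url)
    # The request line carries only the path (plus query), never the host.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/{version}",
        f"Host: {parts.netloc}",  # the host travels in its own header
        "",
        "",
    ]
    return "\r\n".join(lines)

print(build_request("http://www.example.com/images/xyz.png"))
```

Drop the Host line and the server has no way to tell which of several domains on the same IP was meant.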


In a private conversation some years ago, Tim BL told me that he used to use Apollo workstations back in the 1980s and that he really liked Apollo Domain/OS, so he took the // from the Domain distributed filesystem, which used // as a way of addressing possibly remote files, i.e., //hostname/path/to/file . I suspect the \\ in Microsoft UNC pathnames is also derived from the same, probably due to Paul Leach's influence there as he was also from Apollo.


RFC 1738 section 2.1 specifies the colon between the scheme (e.g., "http") and the scheme-specific-part (e.g., //news.ycombinator.com). The // is specified in section 3.1 as part of the "common internet scheme syntax." Specifically, the // is intended to identify the scheme-specific-part as complying with the CISS.


Kinda wondering, but doesn't a question like this belong on Stack Overflow?


Possibly. But these are the cool questions that make Hacker News Hacker News and not Digg.


I think it's the answers, more than the questions, that make Hacker News.


Tim Berners-Lee, the creator of the Web and of URLs, wrote some time ago that he regretted the double slash in URLs, saying that one would suffice.


It went a little further than that; I think the scheme would have been:

http:/com/ycombinator/news/item?id=796434
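A minimal sketch of that rewrite in Python, assuming the only change is reversing the host labels and folding them into the path after a single slash. The function name to_big_endian is made up, and it ignores ports and userinfo.

```python
from urllib.parse import urlsplit

def to_big_endian(url: str) -> str:
    """Rewrite a conventional URL into the hypothetical single-slash form."""
    parts = urlsplit(url)
    # Reverse the DNS labels: news.ycombinator.com -> com/ycombinator/news
    host_path = "/".join(reversed(parts.hostname.split(".")))
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    return f"{parts.scheme}:/{host_path}{rest}"

print(to_big_endian("http://news.ycombinator.com/item?id=796434"))
```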


And that would have been awesome.


Yes, but there would be a couple of problems as well, specifically the DNS would need some major revamping. The DNS was already operational long before the web came along and it already used the '.' notation.

http:/com/test/www/someresource

and

http:/uk/co/test/www/someresource

Would have both been valid resources but it would be harder than now to figure out where the machine boundary is located.

You can't go by 'count' (because of subdomains) and you can't go by www either.

I think if they had gone that route, the // would have been 'reinvented' for practical reasons, and it would probably be placed like this:

http:/com/ycombinator/news//some/path/resource.html

I'm not sure what the implications for phishing, certificates and humans interpreting URLs would have been in that situation either, but I know that I find it convenient to be able to fish the 'hostpart' out of a URL without further knowledge on my side.
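To illustrate the machine-boundary problem, here is a sketch that lists every possible host/path split of a single-slash URL, next to a split that uses a '//' separator between host and path. Both function names are hypothetical, and the URL forms are the made-up ones from this comment, not anything from a standard.

```python
def candidate_splits(url: str) -> list[tuple[str, str]]:
    """List every (host, path) reading of a single-slash big-endian URL."""
    _scheme, rest = url.split(":/", 1)
    segments = rest.split("/")
    candidates = []
    for i in range(1, len(segments) + 1):
        # Treat the first i segments as reversed DNS labels, the rest as path.
        host = ".".join(reversed(segments[:i]))
        path = "/" + "/".join(segments[i:])
        candidates.append((host, path))
    return candidates

def split_with_separator(url: str) -> tuple[str, str]:
    """Split the variant where '//' marks the end of the host part."""
    _scheme, rest = url.split(":/", 1)
    host_part, path = rest.split("//", 1)
    host = ".".join(reversed(host_part.split("/")))
    return host, "/" + path

for host, path in candidate_splits("http:/uk/co/test/www/someresource"):
    print(host, path)
print(split_with_separator("http:/com/ycombinator/news//some/path/resource.html"))
```

Without the separator, nothing in the string itself says which of the candidates is the real host; with it, the split needs no outside knowledge.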


Heh, when _I_ first started using the internet, domain names in the UK were "backwards". My first email address was:

rwj@uk.ac.dl.cxa

In fact the first "domain names" I used were things like "lancs.pdsoft" but I think those were X.25 names, and the less said about X.25 the better.

At some point around '91 or '92 JANET reversed all the domain names to bring them into line with IETF standards. This caused some confusion with names beginning "cs.", which could either be the Computer Science dept of some UK university, or a domain in the old Czechoslovakia.


With UUCP, we used to put machine names in the order they were needed for routing, so in effect it was backward compared to today.


And that would have wrapped DNS and HTTP together in a very uncomfortable way -- you'd have to make at least one extra DNS query for every HTTP request to disambiguate.

People would just end up using a different separator after the DNS part, and you'd be back where you started (just with big-endian domain names).


Yes, that's exactly what I would have expected to happen, with the // as a prime candidate for that separator.

Big-endian domain names would have been very nice to have though, if only because it would have been a lot more orderly than the current jumble.

Big-endian for the 'major' components and the path name, little-endian for the host portion seems a bit weird.

mail:/com/domain/hostname//username

Would look strange though...


The : separates the protocol and location, and // is the root level of a path.





