URLs Are The Uniform Way to Locate Resources

pak · on March 31, 2010

I never liked the idea of putting passwords into URLs... it just gives people the wrong idea about how they should handle their password.

To me, URLs and passwords are orthogonal. One says "this is where it is", the other says "let me in please".

earle · on March 31, 2010

NO SHIT! This is the top link on Hacker News? Is this a fucking joke?

Why is -incorrect- cursory coverage of baseline RFCs Hacker News worthy?

psadauskas · on March 31, 2010

Because it's become very apparent how few web developers understand "web". The more articles like this that point them in the right direction, the better.

dutchflyboy · on March 31, 2010

I agree, this article is really useless. I mean, just look up what the acronym means: http://en.wikipedia.org/wiki/Uniform_Resource_Locator

It's the definition!

krainboltgreene · on March 31, 2010

Except URI's aren't uniform. They're used in so many ways as to be confusing. For instance:

"git://github.com/thoughtbot/paperclip.git" vs "smtp://user:pass@hostname/domain" vs "http://news.ycombinator.com/user?id=krainboltgreene" vs "chrome://history/"

The user name appears in the path, the username section, and the query params. All of these are pretty "standard" uses. So while they might be called "uniform" the reality is absolutely different.
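To make the comparison concrete, here is a small sketch that runs the four examples above through Python's standard urllib.parse and reports where a user name could be found. The parser is my choice, not something the thread uses, and the helper name locate_user is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def locate_user(url):
    """Return (userinfo name, path, parsed query) for a URL.

    Hypothetical helper: it only shows the three places a user name
    can end up, it does not decide which one is "right".
    """
    parts = urlsplit(url)
    return parts.username, parts.path, parse_qs(parts.query)


EXAMPLES = [
    "git://github.com/thoughtbot/paperclip.git",            # name in the path
    "smtp://user:pass@hostname/domain",                     # name in the userinfo
    "http://news.ycombinator.com/user?id=krainboltgreene",  # name in the query
    "chrome://history/",                                    # no name at all
]

if __name__ == "__main__":
    for url in EXAMPLES:
        print(url, "->", locate_user(url))
```

The syntax splits the same way every time; what differs is which slot each scheme chose for the user name.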

m0th87 · on March 31, 2010

All of your examples still portray uniformity:

> git://github.com/thoughtbot/paperclip.git

You are accessing a resource that is partially identified by the username.

> smtp://user:pass@hostname/domain

The username and password are used for authentication to access the resource, but are part of its unique identification.

> http://news.ycombinator.com/user?id=krainboltgreene

The mistake actually lies with HN, because part of what should be used to identify the resource is instead a param.

> chrome://history/

This still fits because you are accessing a uniquely identified resource. Just because it is user-dependent does not mean it breaks REST.

blasdel · on March 31, 2010

URL Parameters ≠ RPC

The resource is the whole string between the domain and the fragment identifier. '/' is no different from '?' or '&' — it's all the same once you drop your false-REST preconceptions.
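A rough sketch of that claim, assuming Python's urllib.parse as the parser. The function name request_target, the "#top" fragment, and the path-style URL are invented for illustration. It computes what an HTTP client would send as the request target: everything after the host and before the fragment, whether it is separated by '/' or '?'.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return path plus query, i.e. the part that names the resource.

    The fragment is dropped because it is handled by the client and
    is not sent to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    # Query-style identifier, as HN uses.
    print(request_target("http://news.ycombinator.com/user?id=krainboltgreene#top"))
    # Hypothetical path-style identifier for the same user.
    print(request_target("http://news.ycombinator.com/user/krainboltgreene#top"))
```

Either way the server receives one opaque string and decides what it means.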

The HTML spec does define a standard for concatenating the name-value pairs from form elements onto the form's action URL for use in GET requests, but that doesn't reify it into the definition of HTTP, much less the definition of REST.
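For anyone who has not looked at what a browser does with a GET form, here is a minimal sketch using Python's urllib.parse.urlencode, which produces the same name=value&name=value encoding. The helper form_get_url and the "q" field are made up for illustration, and the sketch assumes the action URL has no query string of its own.

```python
from urllib.parse import urlencode


def form_get_url(action, fields):
    """Build the URL a GET form submission would request.

    `fields` is a list of (name, value) pairs in form order.
    Assumes `action` does not already contain a '?'.
    """
    return action + "?" + urlencode(fields)


if __name__ == "__main__":
    print(form_get_url("http://news.ycombinator.com/user",
                       [("id", "krainboltgreene")]))
    # Spaces in values are encoded as '+' in this format.
    print(form_get_url("http://example.com/search",
                       [("q", "uniform resource")]))
```

The '?' and '&' here are a convention from HTML forms; nothing in the sketch depends on HTTP itself.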

m0th87 · on March 31, 2010

Except the use of parameters for resource identification both breaks REST and isn't SEO-friendly. Forms somewhat break this for GET submissions, but do at least adhere for POST.

mlLK · on March 31, 2010

I agree for the most part. Even Tim Berners-Lee has his regrets formulating the semantics of the URL, http://en.wikipedia.org/wiki/Uniform_Resource_Locator#Histor...

protocol:TLD/domain/www/path/file.html just makes more sense.

jerf · on March 31, 2010

"Except URI's aren't uniform."

Sure they are! They're structured data serialized to a binary blob that claims to be a string but contains no encoding indication with guidance given by an internet standard for the contents of the string before the first colon but no guidance whatsoever after that! What's not uniform about that?

Normally I wouldn't be so pedantic but at the point where you're talking about "mysql://myuser:mypass@db8.myhost.com:3306/mydatabase" as if it's some sort of solution to a problem you've really dropped the ball.

URLs are only meaningful in a given semantic context. "http" and "https" are meaningful because we all agree what they mean (see RFCs below). "git" is meaningful-ish because there's only one thing that plausibly can be said to give a definition, but your application does not magically gain any understanding of the subsequent what-might-as-well-be-a-binary-blob merely by virtue of sticking "git:" in front of it. "mysql" is simply meaningless. I use Perl and therefore DBI and I observe that it too has a sort of "mysql URL" but it looks nothing like Ruby's. Uniformity is relative to a universal agreement about what it means, and mysql lacks that. For that matter, git may very well lack it in the future, if someone implements other gits (and I've seen attempts). Without that universal agreement, you don't have a URL. You can't make a URL by fiat.
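A quick illustration of the mysql point, as a sketch only. The first string is the one quoted above; the second is my assumption of what an equivalent Perl DBI data source name would look like for the same server. Python's generic urllib.parse splitter is used as a stand-in for "a program that only knows generic URL syntax", and the helper name describe is made up.

```python
from urllib.parse import urlsplit


def describe(url):
    """Report what a generic URL splitter can recover from a string."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


RUBY_STYLE = "mysql://myuser:mypass@db8.myhost.com:3306/mydatabase"
# Assumed DBI-style spelling of the same connection (illustrative).
DBI_STYLE = "dbi:mysql:database=mydatabase;host=db8.myhost.com;port=3306"

if __name__ == "__main__":
    print(describe(RUBY_STYLE))
    print(describe(DBI_STYLE))
```

The generic splitter finds a host and port in the first string and nothing but a scheme and an opaque remainder in the second, even though both describe the same database. The meaning comes from the agreement, not from the colon.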

For non-standard URLs, what follows the colon... and for that matter what precedes it... is nothing more and nothing less than a binary blob in a constrained character set. And this article should be treated exactly as if it were an article about how all your resource location problems go away if you just express them in terms of opaque binary blobs, because once you leave http and https behind, that is what you are doing. Not "like" what you are doing, it is what you are doing, full stop. It may work for you and it may work for your friends, but that is not by the magic of calling your binary blob a URL, it is by the magic of agreeing to a way to interpret bits, and that's hardly any sort of breakthrough.

(Just to be clear, this is vehement agreement that acknowledges that you got to this point first.)

By the way, since I can already guess that someone will reply with something like m0th87's point, I invite you to read the URL RFC: http://www.ietf.org/rfc/rfc1738.txt But read it carefully, for what it actually mandates. Section 2.2: "Many URL schemes reserve certain characters for a special meaning"... none of them are universal to URLs, they are all scheme-specific, which means you can't trust their meaning in undefined schemes. Section 2.3: "Some URL schemes ... contain names that can be considered hierarchical"... / doesn't have a universal meaning, it's relative to the scheme. 3.1 describes the double-slash, which I can now say I've seen used incorrectly in both directions. Section 3.5 defining "mailto:" observes that URLs aren't even necessarily resources. (Section 3.10, file URLs defined in a way that violates the earlier discussion of double-slash. I understand why, but it's still a violation.)

And if you want to talk URI (http://www.ietf.org/rfc/rfc2396.txt ), section 3 starts right off with "The URI syntax is dependent upon the scheme."

derefr · on March 31, 2010

So, what you're saying, basically, is that URLs confer no special advantage over URNs—which are specified to just be binary blobs with a schema[/namespace] identifier attached.

I don't agree with this. URLs, in practice, are a standardized format, predicated mostly on how HTTP has handled them. Any active, well-known URL schema will use

    schema://username:password@host:port/resource/path?query=parameters&more=with%20percent%20encoding#and-fragment-identifier

as that is what we consider to be a URL, no matter what the RFC says. And that format is useful for encoding a great many things. Just because some libraries have chosen to create things that resemble URLs (such as MySQL, as you mentioned), does not mean that they are URLs as the term is descriptively, not prescriptively, defined.
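As a sketch of how that format decomposes, here is the example string above run through Python's urllib.parse. Two assumptions: the parser is my choice, and the literal "port" placeholder is replaced with 8080 because a real port has to be numeric. The helper name components is made up.

```python
from urllib.parse import urlsplit, parse_qs


def components(url):
    """Split a URL in the conventional format into named parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # parse_qs also undoes the percent-encoding in values.
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


EXAMPLE = ("schema://username:password@host:8080/resource/path"
           "?query=parameters&more=with%20percent%20encoding"
           "#and-fragment-identifier")

if __name__ == "__main__":
    for name, value in components(EXAMPLE).items():
        print(f"{name:9} {value!r}")
```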

jerf · on March 31, 2010

The scheme breaks down in practice at the resource point, and even before then is a stretch in some cases. But at the resource point it's all over. There's no agreement about "mysql". And...

"Just because some libraries have chosen to create things that resemble URLs (such as MySQL, as you mentioned), does not mean that they are URLs as the term is descriptively, not prescriptively, defined."

As it turns out, that's a key part of the point I was making. This is why the original article is silly.

derefr · on March 31, 2010

RDBMSes, of course, don't have hierarchically-organized "resources." That's the whole point of the "R" in there. However, the rest of the URL format still applies to them. The resource and query-parameter parts of the URL are indeed "a binary blob"—but that doesn't matter. Why?

The great thing about URLs, within a protocol like HTTP, is that they're discoverable: if the protocol associated with your schema guarantees some sort of non-destructive querying operation on a resource (à la GET or HEAD), then you can retrieve the / resource of a server to get a sitemap, and follow the hyperlinked URLs from there to find all of the site's resources in turn. You're supposed to treat the resource and query parameter parts of the URL as a binary blob; they're a token you give to the server to get other tokens.
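A toy sketch of that discoverability argument. Everything here is made up for illustration: SITE is an in-memory stand-in for a server, get() plays the role of a safe GET, and the resource names are treated as opaque tokens that are only ever handed back to the "server".

```python
from collections import deque

# Hypothetical site: each resource maps to the links it contains.
SITE = {
    "/": ["/about", "/users"],
    "/about": ["/"],
    "/users": ["/users/1", "/users/2"],
    "/users/1": ["/users"],
    "/users/2": [],
}


def get(resource):
    """Stand-in for a non-destructive GET: return the links found."""
    return SITE.get(resource, [])


def spider(start="/"):
    """Breadth-first walk from `start`, visiting each resource once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        resource = queue.popleft()
        order.append(resource)
        for link in get(resource):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    print(spider())
```

The spider never parses a path or a query string. It only needs one starting token and a guarantee that asking is harmless.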

URLs let you turn your meaningless binary blob (a.k.a. a URN) into a (where to ask, what to ask for, who I am) triple. This is a useful thing, even if the "what to ask for" part is still opaque! It means that you can always dereference a URL starting from the Internet as a whole, whereas with a URN the "where to ask" part has to be figured out on your own. If you combine this with a discoverable protocol, you enable your URLs to be spidered and thus indexed, and then they can all be found and used by anyone who has a single graph edge pointing into your site. That's way better than, say, an ISBN, isn't it?

"mailto" and "file" and those other ones you mentioned aren't URLs, as the term is commonly used. They are, in practice, URNs: they consist of a schema (namespace identifier) and an opaque blob. They don't decompose into a (where to ask, what to ask for, who I am) triple.

URLs are amazing, but to call something a URL, it has to already be uniform, and to decompose into one of those triples. So the original article was silly (see the sqlite3 example—that's a URN right there)—but at the same time correct. Use URLs to locate your resources. Just don't make something up and call it a URL; do the actual work of having a standard, uniform string that has a resource and a location in it. And if you don't have the weight to make your own standard and make everyone treat it as uniform? Use someone else's. Use HTTP, even! REST is basically people realizing HTTP guarantees nice things about URLs and taking advantage of that.
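One way to picture the triple, as a sketch under assumptions: the Triple type and decompose helper are invented here, Python's urllib.parse does the splitting, and the urn:isbn and mailto strings are illustrative examples rather than anything from the article.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Triple(NamedTuple):
    where: Optional[str]  # where to ask (host, plus port if given)
    what: str             # what to ask for (opaque to the client)
    who: Optional[str]    # who I am (user name, if any)


def decompose(url):
    """Split a URL into a (where, what, who) triple.

    `where` is None when the string has no authority part, which is
    the case this comment would call a URN.
    """
    parts = urlsplit(url)
    where = parts.netloc.rpartition("@")[2] or None
    what = parts.path + ("?" + parts.query if parts.query else "")
    return Triple(where, what, parts.username)


if __name__ == "__main__":
    print(decompose("smtp://user:pass@hostname/domain"))
    print(decompose("urn:isbn:0451450523"))
    print(decompose("mailto:user@example.com"))
```

For the last two strings the "where to ask" slot comes back empty, which is the distinction being drawn: the blob may name something, but it does not say who to ask.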

thwarted · on April 1, 2010

> RDBMSes, of course, don't have hierarchically-organized "resources." That's the whole point of the "R" in there.

Databases contain schemas, which contain tables, which contain columns. The hierarchy just has fixed types and limited depth.

And then there's S3, which names a resource but doesn't imply any hierarchy, since the namespace is actually flat, even though it looks like a hierarchical path with / as a separator.
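A small sketch of the S3 point, using a plain dict as a stand-in for a bucket. This is not the S3 API; the keys and the list_prefix helper are made up, loosely modeled on listing by prefix and delimiter. The point is that "folders" are computed from key names at listing time and do not exist as objects.

```python
# Hypothetical bucket: a flat mapping from key to object body.
BUCKET = {
    "photos/2010/march.jpg": b"...",
    "photos/2010/april.jpg": b"...",
    "photos/readme.txt": b"...",
}


def list_prefix(bucket, prefix, delimiter="/"):
    """Emulate a hierarchical listing over a flat key space.

    Returns (keys directly under the prefix, apparent sub-"folders").
    """
    keys, folders = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Group everything up to the next delimiter as one "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(folders)


if __name__ == "__main__":
    print(list_prefix(BUCKET, "photos/"))
    # There is no object named "photos/": the hierarchy is only apparent.
    print("photos/" in BUCKET)
```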

sjs · on March 31, 2010

Of course it depends on the protocol and/or application. That's the way it is meant to be and why they will be useful for a long time.

However, a popular convention is to use protocol://[username[:password]@]host[:port]/path/components/separated/by/slashes?with=query&params=like&this and in practice many URL schemes use this format or a subset of it.

sjs · on March 31, 2010

They are all protocol://host/path?params. Looks pretty uniform to me.

richcollins · on March 31, 2010

I don't see why the JSON format that he mentioned couldn't easily be made uniform through standard keys (protocol, port, host, path ... etc).
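Something like the following, as a rough sketch. The key names protocol, port, host and path come from the list above; username and password are my additions, the helper names are made up, and query strings and fragments are left out to keep it short.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def url_to_record(url):
    """Turn a URL into a dict with a fixed set of keys."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


def record_to_url(record):
    """Rebuild the URL from a record produced by url_to_record."""
    netloc = record["host"]
    if record.get("port") is not None:
        netloc = f"{netloc}:{record['port']}"
    if record.get("username") is not None:
        userinfo = record["username"]
        if record.get("password") is not None:
            userinfo = f"{userinfo}:{record['password']}"
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit((record["protocol"], netloc, record["path"], "", ""))


if __name__ == "__main__":
    url = "mysql://myuser:mypass@db8.myhost.com:3306/mydatabase"
    as_json = json.dumps(url_to_record(url), sort_keys=True)
    print(as_json)
    print(record_to_url(json.loads(as_json)))
```

The two forms carry the same information here; the JSON version is longer and needs the same agreement on what the keys mean.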

andrewtj · on March 31, 2010

It could — I'm doing something kind of similar with a DNS service I'm building. It has an HTTP interface which amongst other things exposes DNS-SD services (which include host, protocol, port and service-specific key-value pairs).

sjs · on March 31, 2010

We could do that but I can't think of any benefits. Not to say there aren't any, but the obvious negatives would likely outweigh them 100:1.

benkant · on March 31, 2010

Surprised you didn't know.