Where did all the HTTP referrers go? (smerity.com)
110 points by Smerity on May 28, 2013 | hide | past | favorite | 77 comments



The article makes it sound like every webmaster is entitled to see Referer headers. Really? How does enabling referers "save the world"?

A few weeks ago, I ran across a website (can't remember their domain) that refused to show me any content unless I enabled third-party cookies, and even contained a lengthy argument explaining why disabling third-party cookies hurts the web. IIRC the whole argument was 24-karat bullshit, written by people who feel entitled to keep making money with their outdated business models and who were obviously alarmed because modern browsers were doing sensible things. And now we're seeing a very similar argument, only this time it's about referers. Why do you think it matters whether I came to your site via Google, Reddit, HN or someone else's blog? What makes you think you have the right to know that?

Webmasters never had the right to know where your visitors were coming from, any more than the owner of a random gas station on the Interstate has the right to know which city his customers are driving from. If SSL is making Referer headers disappear, good riddance. We just closed a privacy hole, 99 more to go. Next in the TODO list: get rid of referers even when the referring website doesn't use SSL, because as the article correctly points out, we've got a bit of inconsistency there.


If you reread the article, I actually say you should add a meta tag regardless of whether you do or don't want to send referrers.

Why? It actually solves your "next in the TODO list". Most web sites that shouldn't send referrers don't use <meta name="referrer" content="never">, so they will be leaking referrers to other web sites. Adding this meta tag eliminates referrers in both HTTP and HTTPS.

So yes, my own preference is to keep HTTP Referrers, but I also explain how to kill HTTP referrers for webmasters who would like to as well.


I disagree with your suggestion to use such a meta tag, because it's an opt-out scheme. People should not have to opt out of potential privacy leaks. If a web page does not specify any referer policy, I think the default should be no referer.

Unfortunately I still can't remember the website where I found the "bullshit" argument, and it's not in my history because I probably ended up using a different browser to comply with their no-access-unless-you-accept-3rd-party-cookies policy. But if you understand why people like me want third-party cookies to be disabled by default, I think you'll also understand why I want referers to be disabled by default, too. It's not about user control as @untog suggests, because the user is always ultimately in control when it comes to HTTP headers. Rather, it's about having secure defaults.


That's still taking the decision out of the hands of the user, though.

I'm not saying that I totally agree with kijin, but I don't think your answer addresses his point.


I didn't realise I'd missed a point to address; I'll seek to clarify.

It's always the user's choice as to whether to send referrers or not, as the referrer is actually added by the user's web browser itself. Extensions exist for just about every major web browser[1][2][...] to modify the behaviour of the HTTP Referrer field. If you don't like the idea of sending referrers, it's entirely within your control to never send a single referrer.

In almost all cases, disabling the referrer entirely won't result in any broken behaviour, primarily as the HTTP Referrer is unreliable and can be spoofed anyway.

[1]: https://chrome.google.com/webstore/detail/referer-control/hn...

[2]: https://addons.mozilla.org/en-US/firefox/addon/refcontrol/
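The spoofability mentioned above is easy to demonstrate: the Referer is just another request header that the client fills in, or doesn't. A minimal sketch using Python's standard library, with made-up URLs (the requests are only constructed, never sent):

```python
# The Referer header is set by the client, so the server has no way
# to verify it -- it can be spoofed or omitted entirely.
import urllib.request

# A request with a spoofed Referer.
req = urllib.request.Request(
    "http://example.com/page",
    headers={"Referer": "http://totally-made-up-origin.example/"},
)
print(req.get_header("Referer"))  # whatever value we chose to claim

# A request with no Referer at all: equally valid HTTP.
bare = urllib.request.Request("http://example.com/page")
print(bare.get_header("Referer"))  # None
```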


I use referrers extensively (seeing how sites are doing traffic-wise, traffic trades, etc.), so I am definitely glad they exist, but I am always surprised it was ever added to the spec. I think it was added in 1.1 (so 1999). Without it, I guess a LOT of sites nowadays would link out with URLs like ?from=mysite.com so people knew where traffic was coming from.

It is, like you said, basically a massive privacy hole. But a very handy one.


I think search engines like Google should just standardize on a query string like your example. It always works, it's clearly opt-in, it's visible to the user, it's just as straightforward as a header (if not more) for web apps to process, and it's no more spoofable than a header anyway. If you think it's a bad idea to pollute URLs with such things, an alternative would be a URL fragment that is detected by JS.


Query string wouldn't always work, because the site may already use the query argument for something else; some sites may break or freak out when given unexpected arguments.

URL fragments aren't transmitted to the server, so that means you lose referrer analysis for static sites.
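The reason fragments never reach the server is that the browser strips them before the request line is built. A quick illustration with Python's urllib (hypothetical URL):

```python
# The fragment part of a URL stays on the client side: it is split off
# before the HTTP request path is constructed.
import urllib.request

req = urllib.request.Request("http://example.com/article#from=news.ycombinator.com")
print(req.selector)   # /article -- the fragment never leaves the browser
print(req.fragment)   # from=news.ycombinator.com
```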


> The article makes it sound like every webmaster is entitled to see Referer headers. [..] Why do you think it matters whether I came to your site via Google, Reddit, HN or someone else's blog ?

As an occasional small-time blogger, I like to know where the conversation happens, where people are interested - it enhances my engagement with my readers. As a reader, I'll gladly grant webmasters that courtesy.


Uh...I don't think anyone, OP included, is suggesting it's a right. Just that it used to be almost always available. And it's going away, sort of accidentally. On the scale of "useful things" vs "privacy invasions", it's pretty safely on the former side.


There are some legitimate (IMO) uses of referrers, though. Preventing hot-linking, for example.


There's many legitimate uses for them, but while they're convenient for preventing hot linking, they're not necessary. Preventing hotlinking can be done easily enough by appending a url argument that changes regularly, and can be made performant by adding that argument checking to the configuration of your frontend/caching servers.
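One way such a regularly changing URL argument could be sketched (this is not from any of the tools linked in this thread; the secret, window size and paths are made up):

```python
# Hot-link-resistant URLs without relying on Referer: sign the path
# together with a coarse timestamp, so links expire as tokens rotate.
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical secret, shared with the frontend config
WINDOW = 3600           # tokens rotate every hour

def token_for(path, when=None):
    bucket = int((when if when is not None else time.time()) // WINDOW)
    msg = f"{path}:{bucket}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

def url_ok(path, token, when=None):
    now = when if when is not None else time.time()
    # Accept the current and the previous window, so links don't break
    # the instant the token rotates.
    return any(
        hmac.compare_digest(token, token_for(path, now - shift))
        for shift in (0, WINDOW)
    )

t = token_for("/images/photo.jpg")
print(url_ok("/images/photo.jpg", t))   # True
print(url_ok("/images/other.jpg", t))   # False
```

The same check can be pushed into the frontend/caching layer, as the comment above suggests, so the application server never sees hot-linked requests at all.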


Not a bad idea, but you could probably just use the session ID (or a unique ID linked to the user's session if it's a secure site). So when the session expires, so does access to the content. The drawback to this is you're significantly increasing DB I/O.

Personally though, I'd rather not block hotlinking. But I understand why some people are against it.


It's actually implemented in a better way (no db required) with most web servers: http://wiki.nginx.org/HttpSecureLinkModule


No. The web is about links; if you don't want things to be linked to, don't make them accessible to the general public.


Spoken like someone who never got a large and surprising bandwidth bill because some jerk linked to a high value graphic on his site and then blogspammed every forum on the web to promote "his" content.

The ability to effectively DOS anyone's site in this way, not to mention presenting their content as your own without technically infringing copyright (or so the Ninth Circuit seem to feel in the US, though courts in other jurisdictions have differed) is a genuine and, for the unfortunate victim, potentially very expensive problem with the current state of the web. Doing this has always been bad netiquette, but these days even Google do it, and indeed used the fact that they were doing it in their defence in one of the aforementioned copyright cases.


You can responsibly link to an image on someone else's server without embedding it directly on your site. The web is no longer just hypertext, and the potential inequity in resources and scalability between servers matters. Why should I be expected to suffer and pay for anyone who decides they like a particular image on my site and wants to use it as their forum avatar? They can just as easily copy it locally and use it.


Meanwhile, in the real world people have server bills to worry about.


Spoken like someone who's about to get goatse'd:

http://ascii.textfiles.com/archives/1011

(No goatse images on the page I directly linked to. It does link to them, but the links are marked.)


And there are occasionally similar articles where the author informs us all that AdBlock and NoScript are literally destroying the internet. And that misguided blogger this week who wants Apple to force users to accept push notifications from any app they've installed to make his life easier.


Referers are a privacy trade-off, though, and personally I have never seen a good reason why the desire of some website to track where I am coming from should trump my desire not to broadcast this information to the world. Therefore I like to switch off referers anyway.


In our case referrers can sometimes be really helpful. More than once a (potential) customer has posted a link to our website with questions or comments about our products. This allowed us to head over there and answer those questions or generally get in touch with our target audience.


I am doing the same. Fortunately, the websites requiring the referer to be set are getting fewer and fewer – some years ago, I regularly got placeholder images ‘THIS IMAGE WAS STOLEN FROM XYZ’ when browsing XYZ without referers.


This is one of the good uses of HTTP referrers!

I host a small, very low traffic website. One day, the bandwidth shoots through the roof and stays high. The reason? One of the images on a page got added to the .sig of someone in a popular forum. Suddenly thousands of people are fetching the image.

The solution was to filter by referrer header, letting the image be seen by visitors to the actual page, but linking from other sites gets blocked. Note that usually it's best to allow requests that have no referrer header at all, otherwise you'll be blocking some legitimate viewers of your site.

End result: bandwidth back down to the usual, tiny levels. It's not that I cared about people copying the images, I just didn't want to foot the bill for the traffic!
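The filtering rule described above can be sketched roughly as follows (the hostnames are hypothetical); note that the empty-Referer case is deliberately allowed through:

```python
# Block image requests whose Referer points at a foreign site, but let
# through both same-site referrers and requests with no Referer at all.
from urllib.parse import urlparse

MY_HOSTS = {"example.org", "www.example.org"}   # hypothetical site

def allow_image_request(referer):
    if not referer:
        # No Referer header: could be a privacy-conscious browser or a
        # direct request -- blocking these hurts legitimate visitors.
        return True
    return urlparse(referer).hostname in MY_HOSTS

print(allow_image_request(None))                                # True
print(allow_image_request("http://www.example.org/page.html"))  # True
print(allow_image_request("http://some-forum.example/thread"))  # False
```

In practice this check usually lives in the web server configuration rather than application code, but the logic is the same.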


I think you misunderstand — this seems to have been claudius' point:

> Note that usually it's best to allow requests that have no referrer header at all, otherwise you'll be blocking some legitimate viewers of your site.


It was probably a badly-placed reply; my point was to illustrate a use for referrer headers and to state that sites blocking based on the lack of a referrer were doing it wrong.


Typekit and other @font-face providers rely heavily on them. So, with referrers off, every fancy designer website renders in Georgia.


I would call that a feature (though I prefer to use DejaVu Serif or Garamond rather than Georgia).


Yup.

It's just another way that (for some reason) my browser leaks information without asking me if I want to.

It gets switched off now, along with cookies, analytics and most other stuff.


I respect your desire to not be tracked via referer headers, and to turn them off/use SSL/use extensions. You are a person, and your privacy is important to you.

I also respect a site owner's desire to track your referer, and to decide the rules for who gets to access their content. They are (often) a business, and knowing where to focus their efforts to make money is important to them.

I don't respect either side feeling they are owed tracking information, or content, if they aren't willing to respect the other agent's rules and preferences.


So my understanding is that the default behaviour is:

1. Follow link from https://example.org to http://example.com --- referrer is not sent

2. Follow link from https://example.org to https://example.com --- referrer is sent

I don't understand how the same referrer can be too sensitive to be sent as plaintext, but harmless enough to be passed to a not-necessarily-trusted third party.


Let me fix that for you:

1. Follow link from https://example.org to http://example.com --- can be read by a third party if referrer were added

2. Follow link from https://example.org to https://example.com --- cannot be read by a third party so referrer can be added

The assumption is that secure pages are secure for a reason, and that the author of a secure page is linking to other secure pages and has some basis of trust by which the link is provided.


Example.com is the third party. (Example.org and a human user being the first two parties.)

Let me rephrase my question: why the default assumption that example.com is trusted not to misuse referrer information merely because example.org provides a link and the human user follows that link?


Example.com is the third party. (Example.org and a human user being the first two parties.)

I disagree. When you click a link on a page that you retrieved from example.org, one that leads to example.com, there is no communication between you and example.org, nor between example.com and example.org. The communication that takes place is between you (party 1, the initiator of the conversation) and example.com (party 2, the target). The HTTP request mentions example.org, but being a third party, it does not participate in it directly.

The only conversation in which example.org was a party was the one in which you requested the page that contained a link to example.com, which has already finished.

In that light, it seems strange to me that under HTML5 (assuming I understand the article correctly), example.org is given a mechanism to dictate how much information you give to example.com. Should that not be your choice, as the sender of said information?


1) User's browser requests page from Site A

2) Page from Site A suggests what should be sent in the referrer via the meta referrer

3) User clicks on link from Site A to Site B

4) User's browser requests page from Site B (referrer is set by either user's overriding option or the meta referrer from Site A)

So indeed, at no point does Site A speak to Site B directly. The meta referrer simply asks the user to either send or not send the referrer. If the meta referrer is not present or not supported, it falls back to default HTTP Referrer behaviour.

As the user, you can override this behaviour and force the referrer to do whatever you'd like. This includes refusing to send it, always sending it, or spoofing it. Firefox for example allows you to set network.http.sendRefererHeader and there are various browser extensions for any popular browser that will allow for finer grained referrer control.


Because the source site linked to it directly. That is the basis for trust.

If the destination is untrusted, then the source can just anonymise the redirect by sending it through a point that won't reveal the precise source. This is how services like http://anonym.to/en.html work.


By third party, do you really mean eavesdropper?


This is the default behaviour, though it indeed likely means sensitive referrers are being leaked from website X to third-party website Y. In most cases I'd wager people aren't aware this is the default behaviour.

An example where the default behaviour may be appropriate is Facebook interacting with third party apps.

Facebook may be happy to pass referrer information across to these third-party apps as long as they handle it securely. If the referrer goes across HTTP, it goes across the Internet in plaintext (insecure). By ensuring it travels over HTTPS, you're at least ensuring a minimal level of security.


It's more about who can see it along the way than it is about who receives the request.


I'm willing to bet that it gets sent even if the second HTTPS connection provides no effective security.

(If this is the protocol we're building economies on these days, I feel computer security is going to get much worse before it gets better...)


If your browser is communicating on the second HTTPS connection, that pretty much implies that you are happy with its security levels, right? After all, if you weren't (e.g. bad certificates, etc), your browser wouldn't have established & confirmed the secure connection, HTTP referrer headers or not.


You're right, of course.

Let me expand my argument a little.

We have two security domains, associated with the first and second server. The first server has sensitive URLs, the second receives these in the referrer header.

In making the first connection, the client and first server get to make some security policy decisions: the crypto used in securing the connection, the availability of countermeasures against known TLS protocol vulnerabilities, client and server authentication methods, etc. These parameters are complex, and you end up with a connection with some security level given some attacker model. If you think about it really hard, you can come up with a quantitative estimate of what security you might get from the connection in terms of attacker work -- anywhere from zero (trivially broken) to 256-bit security (very good).

Assume our first connection has a 128-bit security level and has countermeasures for TLS vulnerabilities. This is good going. Now we serve over this connection a https link, which the user clicks.

Unfortunately the client and/or second server is poorly configured and we only manage (for the sake of argument) a 40-bit security level and no TLS vulnerability countermeasures. Now we've reduced portions of the first connection to the security level offered by the second connection. This is really very surprising.


If you are unhappy about using 40 bits, you can configure your browser to reject sending requests to such sites. When the connection with the second site is negotiated, it will ultimately get rejected, long before HTTP or any referrer headers are sent to it.

I see what you mean about dropping the security level, but generally SSL is seen as a binary 'good enough/not good enough' choice. I don't know of any browser that gives a graduated measure of a site's security. Either it flags up a warning or it doesn't.


You may trust both of the websites you visit, but you may not trust them to know about each other.


That is a fair argument, but could equally be used for the non-HTTPS versions as well. The answer is the same in both cases: If you are not happy sharing such information, configure your browser not to use the referrer header.


I consider it incredibly rude to default to a lack of privacy, especially for visits to HTTPS sites, and especially because most users don't realize what's going on behind the scenes.


That may be, but it really shouldn't be. You cannot have sufficient trust in all parties involved to be certain that no sensitive data is going to a third party (the website you're going to) that is malicious.


You can selectively block referrer information if you are using a webkit browser (Chrome, Safari, etc.) by using http://lee-phillips.org/norefBookmarklet/

I've found the information in my referrer logs quite interesting and useful, despite the Russian referrer spam, and am sorry to see it going away.


When I recently tried to purge my server logs of referrer spam, I found that a whopping 80% of all visitors had either no referrer or referred from my own website. About a third of the rest were spam. So judging from my small sample, referrer headers are largely useless except as spam machines. They should probably be removed from HTTP anyway.


Oh I did not know this. Very good to know indeed! I too am missing referrers from websites, and since my website also supports https I suppose others are missing them from me as well. This way I can make the internet a very tiny bit better ^^.


I already hide/change my referrer through browser extensions, though I guess this might finally get those few sites who have functionality based on refs to stop using it.


Whilst I still think referrers should be used where appropriate, there are certainly places or reasons you want to nix them. HTTPS not passing along referrers by default is a sensible decision, as is the addition of <meta name="referrer" content="never"> for sites stuck in HTTP that may want to go under the radar.

I still feel that removing referrers entirely destroys many useful tools and analytics that we've traditionally been able to use. It removes the core way in which we understand connections across the Internet. By removing referrers, the best we can do is use link graphs, falling back to the original PageRank algorithm where we assume people are random bots that click on one of the links on the page.

Edit: Can't reply to you hnriot due to comment depth limit. My reference to PageRank is as links and backlinks could be used as a poor referrer substitute, though they aren't currently used as it's a lot more work and less accurate. I'm simply saying that, in the event that referrers all disappeared tomorrow, you'd see normal websites trying to estimate where their traffic comes from by using a PageRank inspired algorithm, or more naively by looking at who links where.


Referrers have nothing to do with PageRank. They're used in website analytics, of course, and in general leaking information between websites is a very bad idea. Search engines don't see your site's referrer data, so the association you tried to create with PageRank is misleading.


I, for one, didn't think his argument was remotely misleading.

I did, however, think that bringing PageRank into the argument was ill advised and probably detrimental to his goal of advocacy through education. It just confused things, introduced another rabbit-hole concept.

If the author thought that mentioning PageRank would lend credibility to the 'link counting' / 'random link clicking bots' foregone conclusion he set up, he was right. It did. So link the text to a footnote referencing PageRank and stay on topic.


This is how I discovered HN went HTTPS some time ago. I never noticed until now.


HN started using SPDY recently, and SPDY requires HTTPS. https://news.ycombinator.com/item?id=5660797


HN has been available over HTTPS for longer than that.


I wish Google would change their meta-tag from "origin" to "always". It's getting harder and harder to see which keywords bring traffic to your site these days.


If you enable SSL for your site, will you start seeing referrers again? Mainly interested in the case of Google searches.


Yes. I just tested it by searching for [haskell dbus] and clicking a result link to my site. Here's the referer in my nginx log:

  2620:0:1000:3509:ed78:b06d:4942:90e1 - - [28/May/2013:17:35:00 +0000] "GET /software/haskell-dbus/ HTTP/1.1" 200 5819 "https://www.google.com/search?client=ubuntu&channel=fs&q=haskell+dbus&ie=utf-8&oe=utf-8" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0"
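For what it's worth, pulling the search terms back out of a referrer like the one in that log line takes only a couple of lines of standard-library Python:

```python
# Extract the "q" query parameter (the search keywords) from a
# Google-search referrer URL.
from urllib.parse import urlparse, parse_qs

referer = ("https://www.google.com/search?client=ubuntu&channel=fs"
           "&q=haskell+dbus&ie=utf-8&oe=utf-8")
query = parse_qs(urlparse(referer).query)
print(query.get("q", [""])[0])   # haskell dbus
```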


No. You have to pay if you want referrer info from Google.

http://searchengineland.com/google-puts-a-price-on-privacy-9...


Or run https on your server.


The point is that Google obfuscates their search result links so that they do not include search keywords any more -- if you are interested in knowing the keywords, you [typically] have to pay. If you are just looking to know that the referrer was Google, then yes you can see that. However, this is not really useful information to most people.

They implement this in two ways: (1) If you go directly to google.com and type in your search, the results page uses a # in the url which keeps all the query parameters out of the referrer. (2) Google has used (not sure if they still do/randomly test whether or not to) JavaScript redirects which overwrite the url when a search result is clicked. I'm sure there are other ways for Google to hide the referrer -- plus Google and various browser extensions can turn parts of this on/off however they choose.

It is still possible to wind up with a referrer from a Google search where you can see the search keywords, if for example the search is done using the browser address/search bar, and the JavaScript overwriting result urls is not active (turned off by NoScript, etc.). However, this is not in Google's best business interest (if they can convince people to pay for the info) so I am counting on them trending towards making this the least likely of possible scenarios.


Thanks for the info, I was not aware of this meta tag. My blog is all HTTPS (with SPDY) and I added content="origin" to all visitor facing pages, leaving the admin pages dark.


I get referrers from httpS://www.google.com quite often in my logs.

And I'm running my blog on HTTP.

How is this so?


Read the article. This is explained about 80% of the way through.


Looks like someone didn't RTFA.


But why is that a bad thing? Just count your page impressions to see if your blog content is popular.

What websites your visitors consume is IMHO not your business.


> What websites your visitors consume is IMHO not your business.

But if your product is the subject of discussion on another page which links to yours, you may want to be able to join the discussion.

As a site owner, I don't think it's unreasonable to ask my visitors how and where they found out about my website.


It's two very different things if the sudden spike in page views is because of nytimes.com or reddit.com.


Yeah, that's a long known thing. I use HTTPS on parsebin.com specifically for stripping out referrers.


I fail to see why Referer is part of any spec. Referer is a '90s concoction from when we were all so naive about future uses. I'm sure it's made a lot of porn sites some money, but affiliate marketing is not the purpose of web standards.

As far as analytics goes, it's all done with tracking cookies now, which is another issue that needs to be addressed, but I'm not going to miss Referer should it ever really go away.


I see it rather differently. Referrer is a 1990s concoction based on the idea that the Web is a collaborative effort that we all engage in. I don't personally have any problem with a Web site knowing where I arrive from in most cases. It's a harmless bit of information (in most cases, as I say) which makes the Web site owner's life a little better.


Same here. Referer is part of the way the Web works, and I, for one, don't give two shits if Site A knows that I arrived via a link from Site B.

That said, I can see why certain people might see it as problematic... if you're browsing http://www.anarchistsbombmakingforum.com and somebody posts a link to http://www.fbi.gov, then maybe you don't want the FBI knowing you were at anarchistsbombmakingforum.com. But, still, barring other privacy problems, the FBI don't know who you are when you visit their site, just that you came from anarchistsbombmakingforum.com.

I have a hard time getting worked up about this though... for one, if you're surfing anarchistsbombmakingforum.com, common sense would dictate that following a link to fbi.gov isn't such a good idea (and a forum that fosters discussion of anything controversial should probably munge links to go through an anonymizer anyway), AND the people for whom this really matters are the people who have a referer-blocking plugin installed in their browser.


My honours project was on the problems of tracking users visiting different websites using cookies.

The problems are numerous. I think I went through 9 revisions of the "naive protocol" (including, at one point, ditching the referer header in favour of HTTPS). Subsequently I realised that my tracking protocol was broken anyhow.

My conclusion is that there's no reliable way to track users visiting multiple websites using the standard features of HTML/JS/HTTP in the face of malicious users or publishers.

You have to fall back on traffic analysis.

I developed a successor technology which works better in many respects (but not all). As it's the subject of a current patent application I can't really go into much detail.


I hope you've looked at evercookie before you go about filing patents on user tracking...


It's a quite different scheme.





