DuckDuckGo Searches Are Not Anonymous (grepular.com)
104 points by mike-cardwell on May 19, 2010 | 83 comments



In the settings, http://duckduckgo.com/settings.html, you can turn on POST requests as well as disable favicons and 0-click. Switching to POST alone should fix this issue for you. In particular, the Referer header then becomes:

Referer: https://duckduckgo.com/
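
For illustration only, a minimal sketch (not DDG's actual code) of why POST helps: the query arrives in the request body, so it never appears in the page URL, and the Referer the results page produces is just the page URL with no parameters:

<?php
    // Hypothetical search endpoint. With a POST the query lives in the
    // request body, not the URL, so embedded images and clicked links
    // only ever see "Referer: https://duckduckgo.com/".
    $q = isset($_POST['q']) ? $_POST['q'] : '';
    if ($q === '') {
        header('HTTP/1.0 400 Bad Request');
        exit;
    }
    echo 'Results for: ' . htmlspecialchars($q, ENT_QUOTES, 'UTF-8');
?>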


How about you set up a separate domain (e.g., supersecretduck.com) where all the options you mention above default to their paranoid settings?


Or make it automatic or at least default for https? If you're using https, the only thing you have to hide is your search term, right? And that's what's being exposed to Amazon, or whoever else is seeing your URL along the way.


If you're using HTTPS, the Referer is blocked by your browser.


If that separate domain can be used in the Chrome URL bar, count me in. Though it would be even better if it could be done using the main URL in the Chrome URL bar somehow...


If the domain exists it can be used in the Chrome URL bar. Any search engine that puts its queries in the URL can be added manually to Chrome.


The whole point of using POST is to remove the queries from the URL, though.


In Opera, when setting up search engines [which can be used directly in the address bar], you have the option of using POST instead of GET.


Why use Chrome if you seek privacy?


Why not? Could you please elaborate.


Better make that the default then for secure searches!


Some people, perhaps most, don't like POST because it is annoying to copy URLs and use the back button. I'm not opposed to making it the default for https, but I don't want to make a default that most people don't want either.


That's a good point. Maybe you could put a small warning on the page if they disable 'POST' that there might be leakage of their search terms to the sites they visit?

"Hello dear user, you probably know exactly what you're doing, but on the off-chance that you don't, please realize that disabling the POST option for https connections may leak your search terms to the sites you visit."

Or something to that effect.


The problem is not that the referer leaks to the sites you click through to. The referer leaks to sites as soon as the results page is displayed because there are externally hosted images embedded in the results page.


But a POST would take care of that, right?

After all, then the referring URL would just be the search page without any parameters.

So if you switch off POST then leakage would occur; with POST enabled you're fine.

edit: I see what you mean now: if they switch to GET mode it leaks the info even to sites they don't visit. One more good reason to use POST!


But what's the point of an https search if it's not really secure? The only thing the user is trying to hide is the query, and it's not being entirely hidden.

For users who are annoyed, you could explain to them somewhere on the site that you don't put it in the URL because it exposes their query. If they really want a secure search, I imagine they'll understand the tradeoff.


Does the POST switch work when you are using DDG via the Google Chrome URL bar? (I have it set to "https://duckduckgo.com/?q=%s".)


I search using ixquick.com because of their good privacy measures, and they search using POST.

They have a button on their website that adds it to your list of search engines in Chrome (Chromium in my case), so it definitely can be done.

EDIT: it turns out that ixquick switches to a different URL and uses GET instead of POST for this, so forget what I said.

PS: the Opera URL bar inline search has POST support.


Doesn't seem to, no. The page I land on is still http://duckduckgo.com/?q=hacker+news.


Encrypting the URL parameters is the way to go:

Requests for:

http://duckduckgo.com/?q=hacker+news

Should HTTP redirect to something like:

http://duckduckgo.com/?enc=34g7h3giuh3g

Where 34g7h3giuh3g would be the ciphertext generated by encrypting "hacker news". That page knows what the search term was because it will have decrypted the parameters on the server side, but any referers would just contain "garbage", and it would also mean people can still copy/paste the URL from the address bar.
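
A minimal sketch of that idea, assuming a server-side secret and hypothetical parameter names (this is not DDG's implementation, just the general shape):

<?php
    // Sketch: redirect ?q=... to ?enc=... so any Referer only ever
    // contains ciphertext. The secret would live in server config.
    $key = hash('sha256', 'server-side secret', true);

    if (isset($_GET['q'])) {
        $iv  = openssl_random_pseudo_bytes(16);
        $ct  = openssl_encrypt($_GET['q'], 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
        $enc = rtrim(strtr(base64_encode($iv . $ct), '+/', '-_'), '=');
        header('Location: http://duckduckgo.com/?enc=' . $enc);
        exit;
    }

    if (isset($_GET['enc'])) {
        $raw = base64_decode(strtr($_GET['enc'], '-_', '+/'));
        $q   = openssl_decrypt(substr($raw, 16), 'aes-256-cbc', $key,
                               OPENSSL_RAW_DATA, substr($raw, 0, 16));
        // Note: anyone who obtains this URL can still load it and see the
        // decrypted terms, which is the objection raised below.
        echo 'Results for: ' . htmlspecialchars($q, ENT_QUOTES, 'UTF-8');
    }
?>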


All the other person has to do then is perform a search to see the secret terms. Granted, it's better than it showing up in plain text in the referrer, but you could easily write a script to scrape the actual terms...


What if you use the IP address of the user as a seed for the encryption? Then if someone else used the same key from a different IP they'd get different search terms?


That embeds the IP in the process and could theoretically be reverse-engineered.


Are there session ids? I assume that HMAC(secret + sessionID + ip + search terms) would be fine.


No sessions.


I see you do settings through a cookie or URL params. I'm out of ideas, unless you hash the cookie + IP for a session ID for that purpose.
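
One way to read that suggestion, sketched with hypothetical names: derive the encryption key from a server secret plus the settings cookie and the client IP, so the same ?enc= token only decrypts back to the original terms for the client that performed the search:

<?php
    // Sketch: per-client key derivation. 'f' is just the example
    // settings-cookie name mentioned in this thread.
    $cookie = isset($_COOKIE['f']) ? $_COOKIE['f'] : '';
    $key    = hash_hmac('sha256', $cookie . '|' . $_SERVER['REMOTE_ADDR'],
                        'server-side secret', true);
    // $key would replace the fixed key in the encryption sketch above.
?>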


Yes, you're right. I didn't think of that.


Moving such stuff to "settings" is hardly a good solution for wanting to stay anonymous. After all, it probably means you'll have to be logged in.


There are no accounts and these settings can be set without cookies as well.


Furthermore, the setting cookies are simple transparent cookies (f=-1 to turn off favicons), they are not tracking cookies. They are only used when you set the preference and then only contain your preference itself.


How do you do it without cookies? Changing the settings for each search?


You hardwire a specific URL query string into your browser search bar. Here are the params: http://duckduckgo.com/params.html
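
As a hedged illustration (the parameter name f and value -1 are just the examples used above, not necessarily the real ones), the server can honor a preference from either the query string or the transparent cookie, so no cookie is strictly required:

<?php
    // Sketch: read a preference from the URL if present, otherwise from
    // the settings cookie, otherwise use the default.
    function pref($name, $default) {
        if (isset($_GET[$name]))    { return $_GET[$name]; }
        if (isset($_COOKIE[$name])) { return $_COOKIE[$name]; }
        return $default;
    }
    $show_favicons = (pref('f', '1') !== '-1');
?>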


Good catch, really. Trivial to fix, fortunately. Not that I care too much about super secure secret searches, but if they're advertised as such they should be. And even then, you should probably assume they're not (secret).


> Not that I care too much about super secure secret searches, but if they're advertised as such they should be.

Exactly. Sure you could use Tor and disable everything and log on a random wifi from a stolen computer that will then be smashed to bits and melted, but that's beside the point; if DDG advertises privacy, it should do everything it can to deliver on that promise.


I can't figure out what the trivial fix is. Would proxying all S3 requests through his server fix the issue, or do the headers get passed through to Amazon anyway?


One fix is to not send the users directly to the result page. Instead, link to a redirect script on the DDG servers, e.g., duckduckgo.com/goto.php?link=http://search-result.com/ and then have goto.php remove the Referer from the request headers.


Not sure if that will work in all browsers; IIRC a 301 or a 302 can still pass those headers on. The only trick I know of that will not do that is using a 'meta refresh' with the time set to '0', but that has bad implications for the working of the 'back' button.


I haven't tested this out, but I don't see why something like this wouldn't work...

<?php
    header("Location: the-result.com");
    header("Referer: ");
?>


Referer is a header the browser sends; Location is a header the server sends. Also, the Location header needs either a relative URL on the local machine or a fully qualified one. In this case it would have needed a fully qualified one.

So maybe you should have tested it ;) ?
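
For completeness, a hedged sketch of the 'meta refresh' variant mentioned above (goto.php and the link parameter are just the names used earlier in this thread); it sidesteps the Location-header issue, at the cost of the back-button behaviour already noted:

<?php
    // Sketch of goto.php: emit a zero-delay meta refresh instead of a
    // 301/302, since a normal redirect can still pass the Referer along.
    // A real version would also need to whitelist the target to avoid
    // being an open redirect.
    $link = isset($_GET['link']) ? $_GET['link'] : '/';
    $safe = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
    echo '<html><head><meta http-equiv="refresh" content="0;url=' . $safe . '">';
    echo '</head><body><a href="' . $safe . '">Continue</a></body></html>';
?>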


I wouldn't say there is a trivial fix. One option would be to encrypt the search term client-side with Blowfish (or some other 2-way hash scheme) and use that as the URL string.


Erm.. You probably mean something like a "symmetric block cipher", such as Blowfish. The point of (cryptographic) hashes is that they're one way only.

But as for a fix, at least Opera offers the option of never sending referrer information. That would be enough in this case.


Firefox offers that option as well - you just have to go to about:config and set network.http.sendRefererHeader to 0.


Don't put offsite images in secure searches.


These aren't really offsite images. ddgw.s3.amazonaws.com is a server instance run by DDG. It just happens to live in Amazon's cloud.


Yes, but the headers are sent in plaintext.



That was about the headers to the image server, which was not using https at the time, I believe?


No, they aren't.


Not any more but they were when that was written. See: http://news.ycombinator.com/item?id=1362122


I think you might be confusing S3 with EC2? I think "offsite images" is an apt description for stuff served from Amazon's web frontend to S3... DDG don't run it, they just upload the content to it.


If super secret squirrel searches are your need, you may wish to look at things like disabling the Referer header, Tor, disabling Flash completely (as it represents its own version of a cookie), disabling cookies, regularly DBANing your system, and even then realizing that you are still screwed according to the EFF's research in their Panopticlick project (https://panopticlick.eff.org/).


True. Torbutton does a lot of that, however, so if you enable it and completely disable JavaScript you can get your browser uniqueness down to about 1 in 1500.


The heck are you talking about?

Sure, Amazon S3 could be spying on your searches just like any web host (especially a cloud host) could spy on connections to its customers' sites. I don't think this is very likely, however.

Nothing shown in this blog post indicates that DDG's S3 account is logging IPs or that there's anything at all wrong with the privacy policy.


You misunderstand. The images are being served from Amazon's web frontend to S3. They probably log the requests, not for "spying" reasons, but simply because that's what people tend to do when serving websites: log the requests made against them. Because they do that, a government could contact Amazon and ask them for the data.

Knowing that, DDG may as well log the IP+search themselves, as it makes no difference. The data is already logged and retrievable by contacting Amazon, so what is the point in DDG not logging it anyway?


Isn't this sent by the user's browser, and not by DDG? It's a client-side configuration issue -- turn off referrers in your browser. Your browser would send the same header if you clicked through a DDG search result.


It is a duckduckgo issue because they should not be loading third-party graphics. The referrer will be on in almost all cases. People just look at the 'lock' icon and will assume they're safe.

Whatever happened to that 'mixed content' security warning? I thought that was pretty effective against stuff like this.


Because complaining about "mixed content" was completely pointless in the general case -- you could mix content across multiple https sites, but the certificates were never correlated, and the second site would still get all the headers just the same.

It's extraordinarily user-hostile, and would just add to the pile of pointless wankery that keeps people from using https (see also: shitting all over self-signed certs when in reality the CAs don't do shit for their rent and identity is useless anyway).

The actual solution is to never send Referer headers for cross-site requests from an HTTPS page.


> The actual solution is to never send Referer headers for cross-site requests from an HTTPS page.

That should be on someone's todo list at the major browser vendors. You're right, there really is no point in sending that header along, and sending it can cause all kinds of trouble.


Also in response to this original suggestion (months ago on reddit), I started serving all images over https. I do not think the referer is sent in plain text, e.g. http://stackoverflow.com/questions/499591/are-https-urls-enc...


I think the combination of a POST instead of a GET and the images via https should be pretty much bullet proof. If someone is stupid enough to re-enable GET requests for their https connections they have only themselves to blame if there is any leakage to the target sites.


"First they ignore you, then laugh at you. Then they fight you. Then you win." (Gandhi via Robbie Williams)

Rightly identified and well addressed by epi0Bauqu. And also worth noting that this sort of constructive criticism is a great sign of positive market traction. I'm bumping into DDG more often on the web, which is excellent.


POST and HTTPS are good, but here are a few other ideas:

1) Embed the images in iframes, which are then embedded in the page. The iframes will swallow the referrer, so provided they are hosted somewhere where logs are discarded it should be fine (I'd want to cross-browser test this before relying on it, though).

2) If the browser supports data: URIs, then embed the image in the page (see the sketch after this list). Obviously this might have some costs, but you could do it for HTTPS only perhaps?

3) Request the images in Base64-encoded form (or as binary strings) via XMLHttpRequest after the page has loaded. You can overwrite the Referrer header in XMLHttpRequest.

4) Preload the images BEFORE the search is done (i.e., on the search page). With appropriate caching headers Amazon won't see a request when the results load (this won't work from the browser bar, though; I could imagine some ways around that, but I'm not sure they are worth it).
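
To illustrate idea 2 above, a minimal sketch (the icon path is hypothetical) of inlining an image as a data: URI so the browser never makes a separate request that could carry a Referer:

<?php
    // Sketch: embed a favicon directly in the results page as a data: URI.
    $path = '/var/www/icons/example.ico'; // hypothetical local copy
    $data = base64_encode(file_get_contents($path));
    echo '<img src="data:image/x-icon;base64,' . $data . '" alt="">';
?>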


> 1) Embed the images in iframes, which are then embedded in the page. The iframes will swallow the referrer, so provided they are hosted somewhere where logs are discarded then it should be fine (I'd want to cross-browser test this before relying on it though).

https will swallow referer automatically.


Which is why I said "..HTTPS is a good idea".

There are valid reasons why serving everything under HTTPS isn't always a good idea. The obvious one is CPU cost, but cache performance can also be affected. See, for example: http://blogs.msdn.com/ieinternals/archive/2010/04/21/Interne...

http://blog.pluron.com/2008/07/why-you-should.html

(I'm not saying that https isn't the best option. I'm just pointing out other options that can work with plain http.)


Almost all web servers keep a log of HTTP requests if only to weed out troublesome pests.

The headers, IP address, and search query are always sent to any search engine regardless of its privacy policy or whether the query goes as POST or GET. So the worry is not that the information is sent, but rather that the query string in GET requests is most likely kept in a log file, at least for a few days.

If Duck Duck Go was upfront about how long this HTTP request log is kept, would that make GET requests acceptable as the default? I think having search queries sent as POST would be irksome as a default setting.


This isn't DuckDuckGo's fault.

People serious about privacy will be using a proxy with flash/javascript disabled and headers scrubbed.


The complaint is that the search terms could potentially be logged.

"They certainly could log that information if they wanted to."

This is a problem with ANY site and ANY system that I know of.


Right, if a user is worried about this the onus is really on them to ensure that their browser has referrer-sending disabled.


They basically say:

"We don't log your IP address with the search term"

As it stands, they should append this to the end:

"but the architecture of our site lets Amazon log it."


>As it stands, they should append this to the end:

>"but the architecture of our site lets Amazon log it."

This is the best suggestion yet. Of course there are numerous hacks possible to make it more secure, but each one comes at a cost to the user experience. DDG has so many things going for it; I don't think security is at the top of the list for the majority of users.


Do an HTTP redirect to a URL which has an encrypted version of the search term in the URL parameters before displaying the search results. That fixes the problem without hurting the user experience.


That's more accurate.

However, at the end of the day, everything can be logged anyway. If someone is really concerned about this, they shouldn't be using the internet.

Edit: I want to be clear. If someone is concerned about the fact that things can be logged, then they are better off not using the internet, as pretty much everything has the potential to be logged. This is, after all, what the complaint is: the potential of being logged.


Well, doesn't duckduckgo use other search engines' results and combine them with its own? Or does that not involve sending the search terms to the other search engines?


Yes, but then it is duckduckgo.com doing the query. Here they're passing on the query in the referrer string, and the user's own browser makes the non-secure request to Amazon.

That sort of gives it all away.

So if duckduckgo just sent back the answers to the query and didn't use a page that requests resources from third parties, they'd be fine.


They aren't passing on the query in the referrer string. Your browser is. This story is a non-story. Referrer is just doing what Referrer does.


It's a fine story, and it is fixable. And it probably should be fixed, if only because fixing it is less work than talking about it. And due to a terrible oversight the header is actually mis-spelled and it came out as 'Referer:'.

Fixing this will make duckduckgo.com even better, which is awesome.


> And due to a terrible oversight the header is actually mis-spelled and it came out as 'Referer

Well, can't blame DDG for it, though:

http://en.wikipedia.org/wiki/HTTP_referrer#Origin_of_the_ter...


No, definitely not, I just wanted to point that out.

I knew about this for years, having stared at a bug for a whole day before I realized it was the header that was misspelled. It never even crossed my mind until I was away from the keyboard for a few hours.


Except if the same user clicked through to a result, the server of the result page would receive the same referrer header w/ the DuckDuckGo query in it.


Not if he makes the 'POST' option the default for secure searches.


If you don't store the search term in the URL, then it doesn't get passed along in the referer. Store an encrypted version of the search term in the URL instead.


> Yes, but then it is duckduckgo.com doing the query.

Oh, yes, good catch.


Once again, bad defaults == class A bug.

Well exposed.



