Intelligent Tracking Prevention (webkit.org)
177 points by thmslee on June 8, 2017 | 107 comments



I am surprised at how complicated it is; they even use machine learning. It will look like a bug to developers when third-party cookies suddenly stop working for no obvious reason.

Why not just block third-party cookies except where they are enabled by the user? I think 100% of the sites I visit use third-party cookies only for tracking.

And of course this is not enough. Using a combination of an IP address and browser fingerprint allows tracking a user without any cookies.

And websites can still track users if they use a redirect through an analytics website (when a user visits a site for the first time, they are redirected to an analytics domain, which redirects them back with an identifier added to the URL).
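For concreteness, a minimal sketch of that redirect trick using Node's built-in http module; all domain, cookie, and parameter names here are made up for illustration:

```typescript
// analytics.example is the redirector: it reads or mints an ID cookie on its
// own domain, then bounces the visitor back with that ID appended to the URL.
import { createServer } from "http";
import { randomUUID } from "crypto";

createServer((req, res) => {
  const url = new URL(req.url ?? "/", "https://analytics.example");
  const returnTo = url.searchParams.get("return_to") ?? "https://publisher.example/";

  // Reuse the visitor's existing ID cookie if present, otherwise mint one.
  const match = /(?:^|; )visitor_id=([^;]+)/.exec(req.headers.cookie ?? "");
  const visitorId = match ? match[1] : randomUUID();

  res.writeHead(302, {
    "Set-Cookie": `visitor_id=${visitorId}; Max-Age=31536000; Secure; HttpOnly`,
    // The publisher reads ?vid=... on arrival, so the same ID links visits
    // across every site that bounces through this redirector.
    Location: `${returnTo}?vid=${visitorId}`,
  }).end();
}).listen(8080);
```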


Ad networks can track you without cookies. Cookies are just an easy first step.

Any time any user hits any page with your javascript[1] in it: record ip address, referer (sic), locale preferences, available fonts, what's in their cache, what extensions they have, and anything else you can get. You can even track if they have certain 3rd party sites blocked.

Then it is a matter of Bayesian guessing about who they are. Given that a typical person spends most of their time in only a few specific places (home, commute, work, favorite coffee shop/pub, etc.) you can make very good guesses about which person is visiting your page after only a few visits... even if they have cookies disabled.

[1]And if they have JS disabled they can't really use your site, because JS is basically required now. And you can still track that information and advertise to that cohort. How many people hit the site from these 5 IP addresses, use Chromium, have no cookies, and have JS turned off? One? Well, then they've just identified that person.
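A rough sketch of the kind of client-side signal collection being described, using only standard browser APIs; the /collect endpoint is hypothetical:

```typescript
// Gather a handful of passively available signals; combined with the IP
// address and Referer seen server-side, they narrow a visitor down quickly.
function collectFingerprint(): Record<string, unknown> {
  return {
    languages: navigator.languages,            // locale preferences
    userAgent: navigator.userAgent,
    platform: navigator.platform,
    screen: `${screen.width}x${screen.height}x${screen.colorDepth}`,
    timezoneOffset: new Date().getTimezoneOffset(),
    cookiesEnabled: navigator.cookieEnabled,
    hardwareConcurrency: navigator.hardwareConcurrency,
    touchPoints: navigator.maxTouchPoints,
  };
}

// Ship it home; /collect is a hypothetical endpoint.
navigator.sendBeacon("/collect", JSON.stringify(collectFingerprint()));
```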


We studied the behavior of many trackers in detail. We believe many track you solely with client-side state (including cookies, IndexedDB, LocalStorage, the HTTP cache, etc.). It's true that some others use fingerprinting like you describe, and it's something we will think about in the future. But stateful tracking is still the most reliable mechanism, and for many trackers it's the sole mechanism.


Don't we have a few extensions mimicking the most common settings to defeat such fingerprinting?


> I think 100% of the sites I visit use third-party cookies only for tracking.

You probably use sites that rely on 3rd-party cookies for single sign-on, too. Stack Overflow, for instance:

https://nickcraver.com/blog/2017/05/22/https-on-stack-overfl...


We've had to jump through numerous hoops to get SSO functioning across our sites on Safari due to their policy of rejecting third-party cookies.

When you control the full stack on all of your sites it's less of an issue, but when you're using multiple vendors' SaaS solutions with custom authentication it can be a huge pain.


The challenge lies in trying not to break any user-facing website functionality. Things like single sign-on, login with Google/FB, Share and Tweet This buttons, CDNs for static content, etc. We also had a goal of not blocking ads. These limitations require a more complex feature, but it can be on by default without upsetting users or publishers.


I can't upvote the complexity concerns in here enough. In effect you've banned 3rd party cookies, or at least that's the assumption that a lot of us will have to make to support Safari users because the state of their browser is harder to determine.


You should definitely code websites as if you don't have third-party cookies. It's always been our intent to ban them, and we're closing the loopholes as much as we can without breaking the web.


Yeah, we had to assume that because of the way iframes work in Safari. It's obviously great for privacy, but very frustrating from an application developer perspective. Too bad the interwebs are becoming more hostile and require these measures.

IMO the WebKit team would make developers' lives much easier if you more clearly articulated that intent; otherwise they have to guess the direction your team is heading and might build something in the short term that ends up breaking.


People who are aware of third-party cookies usually disable them anyway, so don't assume third-party cookie support is a given.


"Self-Destructing Cookies" is an extension for Firefox that automatically deletes cookies 30 seconds after you close the last tab associated with a site, unless you put it on a whitelist.

It is the most sensible and useful policy I have ever seen. Sadly, it can't be done as an extension in the new Firefox model.



That one looks great, it even supports the new container tabs with per container whitelists.



Pale Moon is lacking security-wise; maybe Waterfox is a more suitable alternative. Also, the Waterfox author plans to fund a startup to fork Firefox to support XUL and go from there: https://www.ghacks.net/2017/03/13/waterfox-dev-has-big-plans...


Thanks for the info, I'd never heard of that project before.


Yes it can, see here: https://addons.mozilla.org/en-US/firefox/addon/cookie-autode...

The only thing it can't do yet is delete local storage due to the API not being implemented.
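For intuition, the core of that delete-on-close policy fits in a short WebExtension background script. This is only a sketch: it glosses over the many edge cases real extensions handle, and the whitelist and 30-second delay are illustrative.

```typescript
// Track which host each tab is showing, and when the last tab for a
// non-whitelisted host closes, purge that host's cookies after 30 seconds.
const whitelist = new Set(["news.ycombinator.com"]);
const tabHosts = new Map<number, string>();

chrome.tabs.onUpdated.addListener((tabId, _change, tab) => {
  if (tab.url) tabHosts.set(tabId, new URL(tab.url).hostname);
});

chrome.tabs.onRemoved.addListener((tabId) => {
  const host = tabHosts.get(tabId);
  tabHosts.delete(tabId);
  if (!host || whitelist.has(host)) return;

  setTimeout(() => {
    chrome.tabs.query({}, (tabs) => {
      // Another tab may still be on this site; if so, keep its cookies.
      if (tabs.some((t) => t.url && new URL(t.url).hostname === host)) return;
      chrome.cookies.getAll({ domain: host }, (cookies) => {
        for (const c of cookies) {
          const cookieHost = c.domain.replace(/^\./, "");
          chrome.cookies.remove({ url: `https://${cookieHost}${c.path}`, name: c.name });
        }
      });
    });
  }, 30_000);
});
```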


Actually, I am against machine learning. It only makes debugging more difficult. There should be simple rules, not some classifier one cannot understand without a math degree.


You make it sound like machine learning is some kind of black box. Machine learning is a huge field and there are many analytical methods within it that can offer direct understandable insights into a process.


Rescinding downvote to give an explanation.

Yes, ML isn't magic, but that's irrelevant to the parent's point: the issue is that websites need a stable "browser API" to work against so they can know what the user will send back, where, and when.

If the user is selectively setting cookies based on the results of machine-learning some non-obvious signals, then it becomes that much harder to debug "why didn't this part of the site remember who they were?"


But that's what I mean. Machine learning as a field is enormous. There is a wide variety of implementations that can be 100% derived, with every step of logic being open for people to analyze. It would be no different than building any other form of conditional system.


That's still missing the point: from the fact that your model derivation is intelligible, it doesn't follow that websites will know that you're using the model, or will be able to produce consistent, desirable behavior for both you and everyone else using some different behavior.

Sites will break without warning for users until some engineer discovers that some ML model told the browser to only set cookies for a particular set of sites determined by IP, domain, "trustworthiness", and a few other factors engineers don't want to think about when designing the app.

If you're saying that you can publish the model openly with the expectation that sites will know you're using it, that's fine, and that's a valid approach, but that has nothing to do with ML per se, and is just another attempt at rearchitecting the browser/server interface with all of its associated issues, and continues the same arms race with sites that want to de-anonymize you.

In short: there's a tradeoff between "hiding information about yourself" and "providing a stable set of expectations for web sites to build off of"; ML can favor one of those objectives but not eliminate the tradeoff and somehow get the best of both worlds. The only way to get the best of both worlds is to rearchitect the API by which browser clients talk to webservers so that it's easy to separate out what you do want/need to tell the server vs what you don't.


That's definitely a valid consideration. I was more focused on refuting OP's statement implying machine learning is a black box because only a select few people are able to understand it due to its complexity. I think I should have made that more clear in my original comment in order to avoid any confusion.


Third-party cookies are blocked by default unless enabled by the user. This has always been the case with Safari. Always! It is the second sentence of the second paragraph of the article.

> From the very beginning, we’ve defaulted to blocking third-party cookies


That's right, browsers should ask permission for a site to set cookies. Which also means there should be a prior user action for a site to even offer cookies.

For example, opening a site and receiving cookies with it: doesn't work any longer. Receiving cookies after submitting a POST request: keeps working.


Congratulations, you just broke nearly every website out there. This would trigger bot detection all over the place.

This also means that on load the first thing that will happen is an AJAX POST in the background, or the moment you press the mouse on anything on the page in case that gets further restricted.


What? I browse without cookies (even first-party), and apart from the WE USE COOKIES banner that ironically can't be acknowledged most sites work just fine.

Obviously shopping sites and similar get whitelisted.


If you don't interact with sites (POSTing, like the parent comment is talking about) you obviously won't notice the breakage. Since you whitelist the sites you interact with (shopping sites and similar) you are already sidestepping the problem.


POST requests work perfectly fine without cookies.

Edit: I just saw your comment in the sibling thread. Guess I won't be interacting with your sites any time soon.

(brb updating my bots and scrapers to use cookies)


No shit. It's the actual server side implementation that will be like, "The fuck you doing?". Because you'll be breaking the web.


My proposal doesn't break anything. If site authors change things users will probably react against the sites. Currently browsing with cookies disabled works fine almost everywhere.

Moreover, a browser can easily differentiate an AJAX POST from a user submitting a form.


Except that most webapps are entirely javascript based and thus AJAX nowadays. But I guess we're just conveniently ignoring this?

And I'm not talking about changing anything. On most of my websites if you do a POST without having a session id (which gets assigned on first pageload, a GET), then you are a bot and your POST will be blocked. Byebye. There are many such schemes out there.


I wish the libraries I use to make sites made it easy to only send cookies after logging in, as opposed to before.


If you login to apps with Facebook, Google, or Microsoft, you use third party cookies for something besides tracking.


How is that true? OAuth2 doesn't set any cookies; it uses query strings, HTTPS, and redirects.

Maybe you are thinking about displaying the logged in state for things like Like Buttons or other social widgets that you'd be signed into such as Disqus or Facebook comments?


Logout. The standard mechanism for OAuth logout is to iframe the relying parties (RPs aka apps) which is a third party cookie context. Redirecting to each RP is substantially slower because logout is serialized rather than parallelized and requires well behaved RPs as any badly behaved RP breaks logout for the rest of the RPs.

(This is why some browsers have a setting like "allow third-party cookies from sites visited in a first-party context" - first party for login, third party for logout. But advertisers screwed this up by asking sites to do full-frame redirects through them.)
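Roughly, the front-channel logout pattern being described looks like this on the identity provider's logout page (the RP URLs are placeholders):

```typescript
// After clearing the IdP's own session, load each relying party's
// front-channel logout URL in a hidden iframe so all RPs clear their
// sessions in parallel. Inside those iframes the RPs' cookies are
// third-party cookies, which is why blanket third-party cookie blocking
// breaks this flow.
const rpLogoutUrls = [
  "https://app-one.example/frontchannel_logout",
  "https://app-two.example/frontchannel_logout",
];

for (const url of rpLogoutUrls) {
  const frame = document.createElement("iframe");
  frame.style.display = "none";
  frame.src = url;
  document.body.appendChild(frame);
}
```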


>>> The standard mechanism for OAuth logout is to iframe the relying parties

No it isn't.

OAuth2 doesn't specify cookies in the entire spec and definitely doesn't require third party cookies, especially for logout.

Here's the RFC: https://tools.ietf.org/html/rfc6749


OAuth is a family of RFCs, including 6749, 6750, and others. While 6749 is the minimum for OAuth2, in the specific case of the three largest providers under discussion, many other extension specs have been implemented. The OpenID Connect logout spec has become the de facto logout spec for OAuth (as the OAuth RFC makes no mention of logout). All three providers mentioned implement it as the primary logout mechanism on the web.

Spec: http://openid.net/specs/openid-connect-frontchannel-1_0.htm

Edit: I should say, this spec also doesn't require cookies - you can use HTML5 local storage or another mechanism. In practice however, if you log into a Google service like YouTube, or a Microsoft service like Outlook, or a Facebook third party like Spotify, they use cookies.


Facebook widgets are used for tracking users across the web too. So one cannot protect users from being tracked without breaking those widgets.


How do you implement browser fingerprinting?


An unintended consequence of this will be the complete dominance in affiliate and ad networks of major consumer sites like Google, Facebook, and Amazon, who already own products visited daily. It also means a larger focus by all ad networks on acquiring consumer sites, buying redirects, or otherwise hijacking users to get their ads to show/track. Ads and addictive sites are going to feed off of this rule.


Although your point is valid, currently Google uses quite a lot of domains other than google.com for analytics and tracking. For analytics, some (unknown to me) things are served from google-analytics.com, and I'm seeing cookies from googlesyndication.com and doubleclick.net as well.

As with many things in ML and NN, the premise of the tech is that it won't be a 100% perfect solution, but it'll hopefully be good enough or better than what you have.


> Google uses quite a lot of domains other than google.com for analytics and tracking

... for now. It's just a matter of time before analytics and tracking move. Just like Maps has moved to google.com in order for Search to also get access to the current location (which is a permission I'd gladly give to Maps, but not to the search engine).


They can do that, but it's a delicate choice: if google.com gets flagged as a "tracker domain", the google.com cookies will be deleted after 30 days of no interaction with google.com.

Probably not an issue for most of the users, but for those who don't use it very often, they'd get logged out. Not sure Google wants that.


I think the original point here is that almost everyone interacts with google.com daily.

As such, if a company with a non-ad business attaches a tracking business to their existing domains, they can circumvent the deletion policies.


You can disable using precise locations in Google search, and Google will no longer request your location except for when there is a maps card in the result (which makes sense)


I get why this needs to exist, but it feels more complicated than it should be and that sort of “magical” decision that’s based on ML is going to confuse both web developers and web users. It might be worth it, but I don’t think it is just yet.

On a side note though, it seems Apple is doubling down on making decisions for its users. This is one example, another is do not disturb when driving. These sorts of features make a lot of assumptions and use complex logic to decide for users, even when the users don't need that. I think that's concerning.


Tracking prevention, do not disturb when driving, iMessage communication encryption (and refusal to break it), OS encryption, and other decisions that Apple is making for its users seem quite positive. I'm not against any of these decisions. On the contrary, I think they're great! (Let's ignore developer-centric issues for the moment as we're a small fraction of their user base.)

But I agree it is a concerning trend. The situation is analogous to government-mandated morals. While I might be expected to welcome legislation that enforces my beliefs on the rest of the population, I personally would prefer that the government stayed completely out of making these decisions at all, because it's only a matter of time before they make a decision that does not match my beliefs.


> On a side note though, it seems Apple is doubling down on making decisions for its users. This is one example, another is do not disturb when driving. These sorts of features make a lot of assumptions and use complex logic to decide for users, even when the users don't need that. I think that's concerning.

All of the reports I'm reading say DNDWD is opt-in.


We came up with a hand-coded deterministic classifier that gives similar results. Part of the reason to use ML is to enable us to adapt to changes in the web.


Both of these things can be turned off in settings and DNDWD will ask you if you'd like to enable it.


Yeah, these aren't forced decisions. The driving mode is opt-in the first time.


Why is "cookies can't be used in 3rd-party context" not turned on from day 0?

Right now, isn't it a cat-and-mouse game of holding many domains/subdomains and passing the cookie flag across them? It seems technically possible to cancel out the protections of ITP. However, deciding cookies can never be used in a 3rd-party context may mean I have to make some additional logins to services from time to time, but gives much better tracking protection.

Is my assessment faulty?


> Why is "cookies can't be used in 3rd-party context" not turned on from day 0?

It's been the long-standing default of Safari to block 3rd-party cookies. As is described in the excellent post this links to, there is some functionality that 3rd-party cookies enable that can be beneficial. They use single-sign-on as an example.


Single sign-on can be implemented without third-party cookies using only redirects. That is how OpenID and OAuth work.

But social network widgets (Like buttons, comment form) would break without third-party cookies.


If I understand correctly, "Single sign-on" means that when you sign in at one site, you're automatically signed in at others. Like when you sign in to Gmail, you get logged in on YouTube without any extra clicks.

OAuth is typically a user-initiated action, like "Sign in with GitHub" or "Sign in with your domain" (https://indieweb.org/IndieAuthProtocol).

Though I think SSO should be possible with OAuth — maybe with a hidden iframe that does the auth process, or something with CORS requests… Or maybe a custom redirect-based protocol would be better.


OAuth can be modified so that it does not require any user action. When a user visits a site, they get redirected to the authorisation domain, which checks whether the user is logged in and redirects back to the original site, adding the authorisation result to the URL.
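A bare-bones sketch of that check on the authorisation domain (Node's http module, hypothetical domain names; a real flow would return a signed token and carry state/nonce values):

```typescript
import { createServer } from "http";

// auth.example answers "is this browser logged in?" using only redirects
// and its own first-party session cookie.
createServer((req, res) => {
  const url = new URL(req.url ?? "/", "https://auth.example");
  const returnTo = url.searchParams.get("return_to") ?? "https://app.example/";

  // The session cookie is first-party here, because the browser arrived at
  // auth.example via a top-level redirect, not inside an iframe.
  const loggedIn = /(?:^|; )session=/.test(req.headers.cookie ?? "");

  // Bounce straight back with the result in the URL.
  res.writeHead(302, {
    Location: `${returnTo}?sso=${loggedIn ? "ok" : "anonymous"}`,
  }).end();
}).listen(8443);
```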


I guess it is because of some legacy sites that would break. Cross-domain interaction rules are very complicated and poorly designed. For example there is no CSRF protection on the browser side and every developer has to implement it on the server side.

Some of the things that would break without third-party cookies are social network widgets - you would be unable to like some post or add a comment using Facebook login on a third-party site.


It would be nice if all new HTML5 features and APIs released from now on turned on a 'Modern Web Security Mode', which fixes CSRF and a bunch of other security issues in ways that are not backwards compatible.


The untold story here (that is, however, indirectly acknowledged by Brave) is that user tracking is required by websites to prevent being completely defrauded by ad providers / advertisers being defrauded by click bot farms. Without providing viable alternative incentives, this will just move the ad driven part of the web into closed networks (Twitter/FB) where everything is single domain.

Is this really the intended outcome?

(It's nice to see Brave have an effect, though.)


> user tracking is required by websites to prevent being completely defrauded by ad providers / advertisers being defrauded by click bot farms.

Maybe don't pay for clicks, only pay for sales.


How would you keep track of the sales without cookies? Most people don't buy on the first visit.


Unless I'm misunderstanding, the technology described in the article is designed to fight against cross-site tracking (i.e. following users all over the internet to different sites)

I suspect few users would object to a single web property remembering them across multiple visits.


So you'll have to rely on the webshop to keep track of purchases? I'd rather have a third party involved that has to keep both the publisher and the advertiser happy.


That gets defrauded via a technique called "cookie stuffing".

https://en.wikipedia.org/wiki/Cookie_stuffing


...which can be stymied by using first-party cookies for lead attribution, which is exactly what webkit is recommending that sites switch to. Voila!


The tracking is hated and unwanted, so hopefully it will go away and take with it the false assumption that online advertising requires this tracking. Ideally, it would be the end of the tracking-ads business model.


As they recommend, you can easily do ad attribution on the server. Simply have the ads link to yourdomain.coffee/widget/adnetwork, see if they convert, have a cold drink, and stop paying networks that don't convert.
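A toy version of that first-party attribution, assuming an imaginary /widget/<network> landing path and an in-memory counter:

```typescript
import { createServer } from "http";

// Count landings per ad network and, later, conversions carrying the same
// source tag. Everything stays on your own domain: no third-party cookies.
const landings = new Map<string, number>();
const conversions = new Map<string, number>();

createServer((req, res) => {
  const url = new URL(req.url ?? "/", "https://yourdomain.example");
  const landing = /^\/widget\/([a-z-]+)$/.exec(url.pathname);

  if (landing) {
    const network = landing[1];
    landings.set(network, (landings.get(network) ?? 0) + 1);
    // Remember the source in a first-party cookie so a later purchase
    // can be attributed to the network that sent the visitor.
    res.writeHead(302, {
      "Set-Cookie": `ad_src=${network}; Max-Age=604800; Path=/; Secure`,
      Location: "/", // then show the real landing page
    }).end();
    return;
  }

  if (url.pathname === "/checkout/complete") {
    const src = /(?:^|; )ad_src=([^;]+)/.exec(req.headers.cookie ?? "")?.[1];
    if (src) conversions.set(src, (conversions.get(src) ?? 0) + 1);
  }
  res.end("ok");
}).listen(8080);
```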


What/who is "Brave"?



It's a "privacy first" browser. https://brave.com


Isn't it a "we remove other people ads to put ours instead" browser ?


I think it's fair to say that both of our quoted descriptions are valid. Brave's advertising model is a bit different from the standard model for web advertising. Whether or not it'll work is a different story.


Website owners should rely on metrics like order or call counts rather than try to win against botnets (though I know of a case where an ad network hired people to make calls to a client (a realty company) to make better statistics for their ad campaign).

Of course bots can simulate visiting different websites and even logging into them if that is profitable.


I'm running a fairly old browser here; I have the ability to set when to accept cookies/third-party cookies, which sites to whitelist, and to set the retention policy to whatever I wish. Also, I can add extensions like Self-Destructing Cookies on a whim.

To me it looks more like Overengineered Tracking Prevention to be able to shovel in 'Machine Learning Classifier' for buzzword compliance.


You can, but are you better than the classifier at actually doing it? Have you done it for every web site you visited today?

Besides, there is some value in getting widespread blocking. When you're the only one who blocks, that's a signal. When Apple's new doodah blocks mostly-the-right-thing for 10% of the users, that's quite different. You get to be part of a crowd.


I would think so, yes, mainly because I don't have to take advertisers into consideration. And I can do a lot more for my browsing experience blocking wise, much much more.

Invalidation is hard. Once you accept you have to tread down the road of accepting and deleting cookies and local storage, I see no reason to wait 30 days. Why not purge them right on (tab) closing? If you really care about privacy, that is. Which could have been made the default in the next update, btw.


My browsers are set to purge all cookies upon exit. And I typically exit at least daily.


Then you log in to HN again, which enables HN to tie your new cookie to your old cookies and tell all its advertising and tracking partners.

HN has no such partners, of course. But many other sites do, and correlate cookies in order to tie devices together.


Quote: "From the very beginning, we’ve defaulted to blocking third-party cookies."

So what's the difference? Not everyone has the time to manually manage exceptions, so they're trying to make it smarter. If it works 90% of the time, it's still a big win, and you can still handle the remaining 10% yourself.


The difference is opt-out vs opt-in. If you care about your users' privacy, block-first should be preferred.


Apple is quite good at targeting the lowest common denominator; most people do not know about these settings.


What prevents a tracking script from setting a third-party cookie first and then setting a first-party cookie that references the third-party cookie? And when the script notices the third-party cookie is gone, it sets it again on the next visit. With enough websites using it, you would only lose a couple of visits.
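The mechanism being described is roughly the following; this is a sketch of the threat, not of how ITP handles it, and the endpoint and cookie names are made up:

```typescript
// Tracker snippet running in the publisher's first-party context.
// It mirrors the tracker's ID into a first-party cookie and re-seeds
// whichever copy has been purged, so the ID survives as long as at
// least one copy remains.
async function syncTrackerId(): Promise<void> {
  const firstParty = /(?:^|; )trk_id=([^;]+)/.exec(document.cookie)?.[1];

  // Ask the tracker domain for its ID; with credentials included this
  // rides on the tracker's (third-party) cookie if it still exists.
  const resp = await fetch("https://tracker.example/id?known=" + (firstParty ?? ""), {
    credentials: "include",
  });
  const { id } = await resp.json();

  // Re-seed the first-party copy on the publisher's domain.
  document.cookie = `trk_id=${id}; Max-Age=31536000; Path=/; Secure; SameSite=Lax`;
}
syncTrackerId();
```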


Cookies are called third party based on the context in which they are sent. A cookie itself is not third or first party; it can act as both depending on this context.


Yes, but only one of them is removed after a day.


I have so many questions that I'm sure will be cleared up over time.

I am very happy to hear this "classification" happens on-device. I am curious how opaque it is. I assume, like the rest of WebKit, the classifier is open source? In Safari, both desktop and mobile, is there a way to see which domains/cookies are blocked/purged or what the time remaining on them is (assuming I don't revisit the page)?

Also, I assume this can easily be beaten by smart ad networks. "Hey, publisher, just use our JS snippet which loads your first-party cookie we may have set previously and then loads a JS URL with that identifier as a query param in the URL along with other fingerprinting info, and we may set that first-party cookie after load too."

I would personally love the option to not send any third party cookies not on the same domain as the frame (or maybe an opt-in) on a per-site basis as I can only see it breaking social logins and other lazily-dev'd items like comment board integration (most SSO uses redirects anyways). Surely if a website wanted to offer its own domain's cookie store, it could (just like you can with local storage).


You know, we put a "padlock" on the address-line for a reason. Why can't we just make it easy for anyone to inspect the persistent state associated with a website. Websites already make site-maps to tell search engines how they work, why can't websites tell end-users what data they create and use with your browser? Treat them like capabilities and let the user decide if your website gets to treat your localStorage as a stalking ground of your behavior or if they'll have to work even harder.

This kind of contract can go both ways. The user can set up a permission set to give to the website. If the website balks, then the user can go somewhere else or negotiate with their own principles how much they want to be "the product."


Always wanted something like what you suggested. It would be fair to the users, hence why it's not implemented (lol).


In theory cookies are great. They put the choice on the user to be remembered as the same person, by either sending the cookie or not. The problem is that cookies were designed intransparently, leading to them being perceived as non-consensual tracking rather than as the consensual mechanism that puts the user in control, which they could have been. These solutions feel like a makeshift stopgap workaround, when I think the actual solution involves creating a way to put the user back in active control of if and how a site remembers them.


A machine learning model is used to classify which top privately-controlled domains have the ability to track the user cross-site, based on the collected statistics. Out of the various statistics collected, three vectors turned out to have strong signal for classification based on current tracking practices: subresource under number of unique domains, sub frame under number of unique domains, and number of unique domains redirected to.
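For intuition only, a toy stand-in over those three per-domain counts; the weights and cutoff are invented here, and WebKit's actual model is trained rather than hand-tuned like this:

```typescript
// Per-domain statistics of the kind the article describes.
interface DomainStats {
  subresourceUnderUniqueDomains: number; // loaded as a subresource on N unique domains
  subframeUnderUniqueDomains: number;    // loaded as a subframe on N unique domains
  uniqueDomainsRedirectedTo: number;     // redirect relationships with N unique domains
}

function looksLikeCrossSiteTracker(s: DomainStats): boolean {
  // A domain embedded by, framed by, or redirecting across many unrelated
  // sites has the *ability* to track users cross-site.
  const score =
    0.4 * Math.min(s.subresourceUnderUniqueDomains, 50) +
    0.3 * Math.min(s.subframeUnderUniqueDomains, 50) +
    0.3 * Math.min(s.uniqueDomainsRedirectedTo, 50);
  return score > 3; // invented cutoff
}

// Example: a widget host embedded on 12 sites, framed on 4, redirecting via 6.
console.log(looksLikeCrossSiteTracker({
  subresourceUnderUniqueDomains: 12,
  subframeUnderUniqueDomains: 4,
  uniqueDomainsRedirectedTo: 6,
})); // true under these made-up thresholds
```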

What's preventing ad networks (or whoever this is meant to protect against) from gaming this system?


Presumably the model will be updated to adapt to a changing threat environment from ad networks.


What's the difference between this and the EFF's Privacy Badger? It's not a complaint, as I realise the EFF wants browsers to adopt such technology natively.


I haven't been able to find such an explicit explanation of the policies Privacy Badger uses. It is all very vague machine learning talk.

I would like to see a clearer comparison as well.


Privacy Badger uses simple heuristics like cookie entropy estimation and domain prevalence (how many other domains embed that domain's stuff): https://github.com/EFForg/privacybadger/blob/master/src/js/h...

and static block lists https://www.eff.org/files/cookieblocklist_new.txt
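For a sense of what "cookie entropy estimation" means, a naive character-frequency version (not Privacy Badger's actual implementation):

```typescript
// Estimate how identifying a cookie value is: high-entropy values look like
// per-user identifiers, low-entropy ones like shared flags ("lang=en").
function estimateEntropyBits(value: string): number {
  const counts = new Map<string, number>();
  for (const ch of value) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropyPerChar = 0;
  for (const n of counts.values()) {
    const p = n / value.length;
    entropyPerChar -= p * Math.log2(p);
  }
  return entropyPerChar * value.length;
}

console.log(estimateEntropyBits("en"));                       // low: 2 bits
console.log(estimateEntropyBits("9f2b4c7e1a8d3f6092ab45cd")); // high: tens of bits
```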


So this new WebKit one is more complex?


Of course. They shoved goddamn machine learning into the WebKit one.


More on how Intelligent Tracking Prevention works and how it may benefit Facebook and Google: https://news.ycombinator.com/item?id=14514859


Does this affect Google analytics?


Probably not, but hopefully at some point Google Analytics will be made useless and irrelevant, so unethical website owners and developers will stop giving away users' privacy and data to Google without their consent.

Hopefully the new EU law coming into effect next year will allow website visitors to sue website owners for tracking them with Google Analytics without express consent and without allowing them to opt out.


This is my main problem. GA loads a script that sets first party cookies.

I really want Safari to have easy integration points to make a good cookie whitelist interaction using plugins.

Sure it's painful the first few days having to gradually build up that list, but like Little Snitch, once you're past that stage, you just set it to say no to any new domains setting cookies.

Then users could cooperate to build cookie whitelists...


> This is my main problem. GA loads a script that sets first party cookies.

Only when scripting is enabled.


Probably not. Analytics sets first party cookies.


Does anyone know if this breaks/affects affiliate networks? They use third-party cookies, I think. Any purchase more than 24h after the click cannot be attributed to it.


I've had third-party cookies disabled for > 5 years and never had a real problem. I'm not convinced that such a complicated solution is necessary.


I already disable 3rd party cookies; it's always a good idea, believe me.

Why do I say that? 1. No retargeting ads. 2. No site-to-site tracking. 3. Less chance of being identified uniquely.


So is this a personal suggestion by a developer in the WebKit community? I see nothing that says this has landed anywhere.



