No boundaries: Exfiltration of personal data by session-replay scripts (freedom-to-tinker.com)
198 points by ploggingdev on Nov 15, 2017 | 51 comments



Home Depot does this in a way that consumes my whole upload bandwidth, dragging down the entire connection (moved and haven't gotten around to reintegrating the proper router with tc(8)). As a result, I've moved towards using Lowes to spec things out, even though it's a 45 minute drive and their products are of generally inferior quality. Good job, surveillance parasites - you're starting to kill your hosts!

(I'm sure Lowes is or will be doing something similar, as faux-competition duopolies tend to move in lockstep. But the outright callous, boneheaded execution still amazes me.)


To be fair, FullStory spends a lot of time in their onboarding, UI and docs encouraging you to check and double check that anything sensitive is excluded. They broadcast this message so clearly that it's obvious that they take privacy seriously (or, about as seriously as any over-the-shoulder-peeking service could), and they strongly encourage their users to adopt the same stance.

This article makes it seem like their defaults are the only exclusion settings possible, which is very far from the truth.

I feel like FullStory is being blamed for even trying to provide some minimal default exclusion settings. I assume the same holds for competing services.

I'm not saying the article's core premise is wrong: there are many things to dislike about session recording services. But the article goes on and on about a few defaults instead of focusing on the dangers of the core concept, and loses the argument that way IMO.


Yes, the onboarding for FullStory is excellent, and they go out of their way to help you get it right. Odds are you'll end up doing a better job of protecting privacy in FullStory than in your own log files, but YMMV.


Does anyone know if uBlock Origin blocks this kind of stuff? Yet another reason to never disable it. I'm starting to realize it's a lot more than an ad blocker - more like a firewall protecting the client against malicious sites with crypto miners, trackers, and this stuff...


Hi, one of the authors here. We discuss this in the last section of the post. uBlock Origin uses lists to determine which requests to block. We tested the two largest, EasyList and EasyPrivacy, and both fail to block scripts from FullStory, Smartlook, and UserReplay.


I suggest uBO's medium mode, which blocks 3rd-party scripts by default.

[1] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...
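
For context, medium mode is essentially just a pair of global dynamic filtering rules on top of the usual static lists. A sketch of the rules it sets, per the wiki above:

  # uBO dynamic filtering rules: block all 3rd-party scripts and frames
  * * 3p-script block
  * * 3p-frame block

You then add per-site noop/allow rules as you whitelist things that break.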


Thank you for the suggestion in this unexpected place. I wish I'd known of this uBO feature before!


I checked the hosts file I normally use (MVPS) and they don't seem to be in there either, unless they serve the scripts from a non-obvious domain.

Also fun fact, TurboTax uses SmartLook.


That doesn't seem very problematic to me. You're already trusting TurboTax with a bunch of sensitive information.


Yes, but you may not realize that SmartLook is also getting all that information.


How about NoScript, RequestPolicy, and Privoxy?

Were the blocking capabilities of any of these tested by your team?


Assuming it's used correctly, NoScript/RequestPolicy should block these, as they are 3rd-party <script src>s.
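
Roughly speaking, the embed looks like this (the domain is made up for illustration), which is exactly the shape of request those extensions refuse by default:

  <!-- session-replay vendors load as a cross-origin script -->
  <script src="https://replay-vendor.example/record.js" async></script>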


RequestPolicy appears to have been killed by FF57.


uMatrix stops everything 3rd-party from loading by default, but makes it easy to whitelist as you go. It makes browsing extra work for most sites (it adds the step of having to guess what you need to enable for the site to function properly - very, very few sites work when third parties are blocked), but I find it's worth it.


Try uMatrix.


+1 for highlighting the privacy concerns, but -1 for blaming the software for not having strong enough defaults.

As someone who has integrated FullStory into a production site, I spent several days doing a careful audit of our forms and redacting fields from being tracked. FullStory has an excellent, universal account setting to automatically redact fields based on any CSS selector, so it's very, very easy to tell it to remove any sensitive information - or even all form fields! - if that's what the website publisher desires. Out of the box I found that it correctly blocked credit card fields and passwords, and we were able to add the additional fields that are sensitive for us.
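
To make that concrete, here's a hedged sketch - the class name is made up; the actual selectors are whatever you register in FullStory's exclusion settings:

  <!-- register ".do-not-record" as an excluded selector in FullStory -->
  <form>
    <input type="text" name="ssn" class="do-not-record">  <!-- excluded -->
    <input type="text" name="zip">                        <!-- recorded -->
  </form>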

Again, it's fair that a website publisher may want more information than you'd like to give, but they could also store your info in plaintext in the database, making it easy for hackers to exfiltrate as well. Yes, this is another vector, but hardly the easiest one.


It's still a broken process. Are you going to re-audit every future change to your web site? I doubt it.

The default, as the article suggests, should be to redact all fields, then let the company opt-in the fields that they really mean to record.


If a company is serious about security and privacy, they have to do those audits for every feature, regardless of whether they use these tools. PCI requires this if you handle money online.

Still, you're right that many companies have a surprising lack of security. This vector of unintentional exfiltration may pale in comparison to the intentional mismanagement and lack of security focus internally. Equifax, anyone?


What would be the problem with making the default to collect nothing (so the dev would have to opt in for everything they wanted to collect)?


Correction: I wrote redacting when I meant exclusion. My apologies!


Read the article. Noob Q: surely not ALL browser tabs are vulnerable to getting recorded? In other words, only the tabs connected to websites that contain these recording JS scripts are vulnerable, correct?


Correct. I think there's a caveat: if two different tabs render two different documents on the same domain, interactions in each could be recorded by either tab.


Is there a browser extension that warns you about the various tracker scripts a website is utilizing?


I use Disconnect in both Chrome and Firefox: https://disconnect.me/

It blocks scripts for analytics, social sharing, etc., and gives a simple UI for re-enabling any of them - for those situations where someone wrote their JavaScript such that button presses fail if Google Analytics is not loaded, which is not nice.
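
As an aside, that failure mode is easy to avoid on the publisher's side. A hedged sketch ('#buy' and startCheckout are made-up stand-ins for the real page):

  // Guard the analytics call so a blocked script doesn't break the button
  document.querySelector('#buy').addEventListener('click', function () {
    if (typeof ga === 'function') {            // analytics.js may be blocked
      ga('send', 'event', 'checkout', 'click');
    }
    startCheckout();                           // proceed either way
  });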


The closest I know of would be using uMatrix plus general knowledge of which sites do what activities.


uMatrix is great.


Privacy Badger can do this.


Thanks!


Anyone know where I can find a list of the domain names these companies use? I want to block everything from all of their domains.
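
As a starting point, a hedged hosts-file sketch covering the vendors named in this thread - the base domains are the companies', but the exact serving hostnames/CDNs are assumptions, so check the article for the real list:

  # /etc/hosts - sinkhole session-replay vendors
  0.0.0.0 fullstory.com
  0.0.0.0 smartlook.com
  0.0.0.0 userreplay.com

Note that hosts files match exact hostnames only (no subdomain wildcards), so a wildcard-capable blocker like uBO or dnsmasq is more robust.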


DAMMIT. Once again the question that immediately comes to mind is "Why the FUCK do browsers facilitate this shit?"

C'mon, you stupid web devs on HN, tell me again all your excuses for needing these capabilities. Sorry to generalize to those of you who don't do this, but many of you still want those capabilities that have opened the door. And those browser devs... it's like they compete to sell out the users by adding "features".


Ok, here's a real example of how FullStory can help provide a better product: bug reporting. A user gets a JS error on the page; say my application is pretty complex, and the state machine I've built can't handle a particular state. I can easily walk through the recording of what the user did and reproduce the issue.

I can also hop on a call with this user and walk them through how to do something while looking at what page they're on and what inputs they have filled in. This removes the frustrating back and forth of "What do you see now?" that happens without this tool.

Say I want to influence user decisions by offering subtle cues to push them towards something that will be overall beneficial for them. By watching certain key users we can know what frustrates them (erratic mouse movements, long time searching for features etc) and what things they grok easily.

The article totally glosses over a super important feature of FullStory: excluding elements[0]. You can exclude via a simple class name, or specifically select whatever you'd like to exclude.

[0] http://help.fullstory.com/technical-questions/exclude-elemen...


+1 - having session recordings is a HUGE win to help track down hard-to-reproduce bugs. You can get the strangest bug reports, and by watching the session, as a dev, you can instantly translate it into the technical terms that the end user lacked and you can fix the problem.

It's also great for ad-hoc usability testing to see how people are using your features, where they slow down to read, what elements they try to click on but can't, and other UX improvements that you'd pay consultants six figures to put in a report for you.


You guys are illustrating my point nicely. I have no doubt that session recording is very helpful in your debugging activity, and provides feedback for site design. The problem is that those capabilities are often used against the user as well. The people who enabled you to do that also enabled others to do bad things. The attack surface has been hugely increased in the name of convenience for developers. Remember, with the internet the whole world is on your doorstep and that doesn't consist of just developers trying to get their UX right.


> but many of you still want those capabilities that have opened the door

Rest assured the majority of (web) developers do not like this crap one bit. Most of the pressure to add hundreds of analytics toolkits, trackers, or these snoopers comes from marketing - they (or worse, the C-level execs) get convinced that they need to integrate tool XYZ to "stay competitive" or "improve customer retention" or whatever buzzword goes today, and the devs at the bottom of the chain are left to implement it.


Sometimes it's just laziness.

I was forced to add GTM to a site because it meant marketing could just hand over the GTM login and a pile of money to another company which could then provide them with pretty reports on what the customers were doing. The analytics company promised to not do anything bad so it was OK.

And that was after an incident where the entire site was turned purple by another external JavaScript...


> And that was after an incident where the entire site was turned purple by another external JavaScript...

lol, what was the root cause? Defacement/script-kiddie attack, or a "background covering" ad that didn't recognize the content area and painted over the whole screen instead?


The script was for a yearly user survey run by a small company. It has an awards ceremony attached, so I think that's why some marketing people like it. After the survey ended, the script was not removed, of course.

Around a year later the site changed color when their script started injecting a new stylesheet into our pages. They never really said what happened, only that they had restored the old version of the script. Maybe some developer pushed dev code to the old production URL, or maybe they were hacked.


All this started with the first ad scripts and invisible pixels.

It got really crazy with Google Analytics and then all the social beacons. Now the only limit is the CPU and RAM available to the browser.


> Now the only limit is the CPU and RAM available to the browser.

... and in Germany, the data cap if you're on mobile. Video ads with autoplay, tons of trackers, no wonder I regularly hit 3GB a month, which is actually the biggest package my provider offers.


Browsers became a way to deliver general-purpose applications over the Internet rather than just structured documents. Once they did, then all of the features that are being maliciously used here have very useful applications like responding to mouse input.


> Once again the question that immediately comes to mind is "Why the FUCK do browsers facilitate this shit?"

Because their developers were too busy wondering if they could to wonder if they should.

The modern web is like Moria: we've delved too deep and awoken a slumbering horror.


It's not "the browsers" it's primarly the culture of including "whatever" on the pages one maintains. It's so easy as it typically doesn't affect negatively those who decide to do so.

And it's typically not a decision of one person.


I was giving this matter some thought early today after getting some stupid malware popup on my phone (where the phone vibrates, says it has lots of viruses, etc) while using Chrome. It wasn't even on any kind of dodgy site, but most likely it was part of a banner rotation for an ad network.

There has to be a way for website creators to sandbox content which comes from third parties. I think we have to accept that all of these ad networks and third-party scripts aren't going away, so until everyone uses adblocking, what can be done in the meantime?

It's problematic that including content from elsewhere in your page (like in an iframe) would grant it "first class" behavior with equivalent privileges to one's page. I know it's opening a can of worms, but why not implement a way to show untrusted content?

On the other side of things, I'm using a relatively recent version of Chrome. Why is it vulnerable to this dumb sort of popup alert? Why can't I escape from it easily? The back button doesn't work. The UI hangs, so I can't close the tab. If I close and re-open my browser, it just re-opens my tabs.


Several parts of your comment resonate with me:

>> There has to be a way for website creators to sandbox content which comes from third parties.

The whole 3rd-party thing came about because advertisers needed to establish both ad distribution and trust (they rightly don't want to pay for ads unless the ads are actually shown, etc.).

>> It's problematic that including content from elsewhere in your page (like in an iframe) would grant it "first class" behavior with equivalent privileges to one's page.

I agree; this one is on the browser devs and the standards creators. Safety by default is the way to go, but then people have nifty ideas that wouldn't be possible with limitations.

>> I know it's opening a can of worms, but why not implement a way to show untrusted content?

That's exactly what the browser is supposed to be in the first place.

>> Why can't I escape from it easily? The back button doesn't work.

Because browser devs decided there was some reason content should be able to alter or override the design of the viewer. I can think of no use case for this that is legitimate from the user's perspective. The list of stuff like this is long and ridiculous, and they keep doubling down on it too. First we had cookies, but that wasn't enough, so now there's a whole client-side database...


I guess what I find frustrating is that it's the same class of problem as Captain Crunch's whistle: in-band signaling. I think we're getting to the point where it has to be sandboxes all the way down (running things in sandboxes, inside VMs, with memory protection, etc.), but it's still not enough. This class of problem must be extremely difficult to solve: how do you run Turing-complete code which might be hostile? All these layers upon layers of sophisticated tools, and to what end? To create a merger of TV and magazine advertising. But one could always turn the page or change the channel.


> How do you run Turing-complete code which might be hostile?

That's the key problem that (almost) nobody wants to talk about. We've been trying to solve the decision problem for a long time, and we already know that even relatively simple problems are provably undecidable[1]. Any real program will be much more complex[2]. An unknown program could generate any output it wants and we cannot know that without running it.

The only solution is to remove output methods. If a program can only, e.g., draw to a framebuffer without the ability to trigger future network activity, the worst it can do is waste CPU and RAM. Allow literally any interface to generate network activity (even indirectly) and people will find ways to tunnel data over that interface.

The original design for the web was (probably) safe. It didn't require anonymous Turing complete code, and provided quite a bit of functionality with declarative markup. It even allowed simple (but still useful) server-side applications with 3270-style forms (again, no code needed). This was wonderfully useful, reasonably safe, and most importantly it was understandable by both humans and machines.

Today's web requires trusting a new set of undecidable software on each page load. We're supposed to trust 3rd parties even though trust is not transitive. We're supposed to accept the risk of running 3rd party software even though risk is transitive. Without some sort of miraculous total reversal where browsers revert back to pre-javascript days, this is going to end badly.

[1] https://www.scottaaronson.com/blog/?p=2725

[2] If your program uses >7918 Turing machine states, [1] proves that its behavior cannot be analyzed by ZF set theory.


>> The original design for the web was (probably) safe. It didn't require anonymous Turing complete code, and provided quite a bit of functionality with declarative markup. It even allowed simple (but still useful) server-side applications with 3270-style forms (again, no code needed). This was wonderfully useful, reasonably safe, and most importantly it was understandable by both humans and machines.

Thanks for that! So many web devs today can't even comprehend the idea that you can have interactivity without JS. Part of the reason for what we have came from offloading work from the server. I once made an Othello game with nothing client-side but an auto-reload after a timeout - everything was in a CGI script on the server.
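
For anyone who hasn't seen the pattern, a hedged sketch (the CGI path and move value are made up): the page refreshes itself on a timer and every move is a plain form post, no client-side script at all.

  <!-- poll the server every 5 seconds for the opponent's move -->
  <meta http-equiv="refresh" content="5">
  <!-- each move is an ordinary form submission to the CGI script -->
  <form action="/cgi-bin/othello.cgi" method="post">
    <button name="move" value="c4">Play c4</button>
  </form>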


You make a lot of good points. I agree, there's no going back. The websites we have today offer too much functionality to ever go back to the way things were. While I like the ability to turn off JavaScript on a page by page basis, and use the web mostly as a library, most people would be completely put off by a "dumb" Web.


There are some limited ways to sandbox with iframes.

https://www.w3schools.com/tags/att_iframe_sandbox.asp
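
A minimal sketch of how it works: a bare sandbox attribute strips scripts, form submission, popups, and same-origin access, and each token selectively re-enables one capability (the URL here is illustrative).

  <!-- third-party content may run scripts but gets no same-origin access -->
  <iframe src="https://third-party.example/widget.html"
          sandbox="allow-scripts"></iframe>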


This is something I was hoping would be possible in the future, but didn't realize an implementation was already in place. I'm surprised I've never heard of it.

There's one problem though. Here's how it should be implemented, in my opinion.

<iframe src="this_iframe_is_sandboxed.htm">

<iframe src="this_iframe_is_not.htm" nosandbox>

I disagree with making security opt-in.


Agreed. Customer service sites are prime targets for solutions where JavaScript is used to enable training, screen sharing, user tracking, and advertising, and the upside is so compelling, especially to non-technical customer service operations managers ("those who decide to do so"), that they can't conceive of how this little widget can scrape the whole screen or, worse, be maliciously tipped to steal or inject information.

The site owners are at fault, but the browsers should also make it easier to disable active content. We can presume that most users aren't discerning, but I'd switch to both plain-text e-mail and plain-text browsing if possible, falling back to bloated sites when I can't find an alternative - and, like the other comment here, using Lowes instead of Home Depot because their site is less awful.

I'll be sharing the link to the No Boundaries page next time I'm talking with someone who wants to add a tracking feature to a page.



