No boundaries: Exfiltration of personal data by session-replay scripts (freedom-to-tinker.com)
198 points by ploggingdev on Nov 15, 2017 | 51 comments



Home Depot does this in a way that consumes my whole upload bandwidth, dragging down the entire connection (moved and haven't gotten around to reintegrating the proper router with tc(8)). As a result, I've moved towards using Lowes to spec things out, even though it's a 45 minute drive and their products are of generally inferior quality. Good job, surveillance parasites - you're starting to kill your hosts!

(I'm sure Lowes is or will be doing something similar, as faux-competition duopolies tend to move in lockstep. But the outright callous, boneheaded execution still amazes me.)


To be fair, FullStory spends a lot of time in their onboarding, UI and docs encouraging you to check and double check that anything sensitive is excluded. They broadcast this message so clearly that it's obvious that they take privacy seriously (or, about as seriously as any over-the-shoulder-peeking service could), and they strongly encourage their users to adopt the same stance.

This article makes it seem like their defaults are the only exclusion settings possible, which is very far from the truth.

I feel like FullStory is being blamed for even trying to provide some minimal default exclusion settings. I assume the same holds for competing services.

I'm not saying the article's core premise is wrong: there are many things to dislike about session recording services. But the article goes on and on about a few defaults instead of focusing on the dangers of the core concept, and loses the argument that way IMO.


Yes, the onboarding for FullStory is excellent, and they go out of their way to help you get it right. Odds are you'll end up doing a better job of protecting privacy in FullStory than in your own log files, but YMMV.


Does anyone know if uBlock Origin blocks this kind of stuff? Yet another reason to never disable it. I'm starting to realize it's a lot more than an ad blocker - more like a firewall protecting the client against malicious sites with crypto miners, trackers, and this stuff...


Hi, one of the authors here. We discuss this in the last section of the post. uBlock Origin uses lists to determine which requests to block. We tested the two largest, EasyList and EasyPrivacy, and both fail to block scripts from FullStory, Smartlook, and UserReplay.


I suggest uBO's medium mode, which blocks 3rd-party scripts by default.

[1] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...
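
For context, medium mode is essentially just a pair of global dynamic filtering rules on top of the usual static lists. A sketch of the rules it sets, per the wiki above:

  # uBO dynamic filtering rules: block all 3rd-party scripts and frames
  * * 3p-script block
  * * 3p-frame block

You then add per-site noop/allow rules as you whitelist things that break.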


Thank you for the suggestion in this unexpected place. I wish I'd known of this uBO feature before!


I checked the hosts file I normally use (MVPS) and they don't seem to be in there either, unless they serve the scripts from a non-obvious domain.

Also fun fact, TurboTax uses SmartLook.


That doesn't seem very problematic to me. You're already trusting TurboTax with a bunch of sensitive information.


Yes, but you may not realize that SmartLook is also getting all that information.


How about NoScript, RequestPolicy, and Privoxy?

Were the blocking capabilities of any of these tested by your team?


Assuming it's used correctly, NoScript/RequestPolicy should block these, as they are 3rd-party <script src>s.
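
Roughly speaking, the embed looks like this (the domain is made up for illustration), which is exactly the shape of request those extensions refuse by default:

  <!-- session-replay vendors load as a cross-origin script -->
  <script src="https://replay-vendor.example/record.js" async></script>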


RequestPolicy appears to have been killed by FF57.


uMatrix stops everything 3rd-party from loading by default, but makes it easy to whitelist as you go. It makes browsing extra work for most sites (it adds the step of having to guess what you need to enable for the site to function properly - very, very few sites work when third parties are blocked), but I find it's worth it.


Try uMatrix.


+1 for highlighting the privacy concerns, but -1 for blaming the software for not having strong enough defaults.

As someone who has integrated FullStory into a production site, I spent several days doing a careful audit of our forms and redacting fields from being tracked. FullStory has an excellent, universal account setting to automatically redact fields based on any CSS selector, so it's very, very easy to tell it to remove any sensitive information - or even all form fields! - if that's what the website publisher desires. Out of the box I found that it correctly blocked credit card fields and passwords, and we were able to add the additional fields that are sensitive for us.
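
To make that concrete, here's a hedged sketch - the class name is made up; the actual selectors are whatever you register in FullStory's exclusion settings:

  <!-- register ".do-not-record" as an excluded selector in FullStory -->
  <form>
    <input type="text" name="ssn" class="do-not-record">  <!-- excluded -->
    <input type="text" name="zip">                        <!-- recorded -->
  </form>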

Again, it's fair that a website publisher may want more information than you'd like to give, but they could also store your info in plaintext in the database, making it easy for hackers to exfiltrate as well. Yes, this is another vector, but hardly the easiest one.


It's still a broken process. Are you going to re-audit every future change to your web site? I doubt it.

The default, as the article suggests, should be to redact all fields, then let the company opt-in the fields that they really mean to record.


If a company is serious about security and privacy, they have to do those audits for every feature, regardless of whether they use these tools. PCI requires this if you handle money online.

Still, you're right that many companies have a surprising lack of security. This vector of unintentional exfiltration may pale in comparison to the intentional mismanagement and lack of security focus internally. Equifax, anyone?


What would be the problem with making the default to collect nothing (so the dev would have to opt in for everything they wanted to collect)?


Correction: I wrote redacting when I meant exclusion. My apologies!


Read the article. Noob Q: surely not ALL browser tabs are vulnerable to getting recorded? In other words, only the tabs connected to websites that contain these recording JS scripts are vulnerable, correct?


Correct. I think there's a caveat: if two different tabs render two different documents on the same domain, interactions in each could be recorded by either tab.


Is there a browser extension that warns you about the various tracker scripts a website is utilizing?


I use Disconnect in both Chrome and Firefox: https://disconnect.me/

It blocks scripts for analytics, social sharing, etc., and gives a simple UI for re-enabling any of them - for those situations where someone wrote their JavaScript such that button presses fail if Google Analytics is not loaded, which is not nice.
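
As an aside, that failure mode is easy to avoid on the publisher's side. A hedged sketch ('#buy' and startCheckout are made-up stand-ins for the real page):

  // Guard the analytics call so a blocked script doesn't break the button
  document.querySelector('#buy').addEventListener('click', function () {
    if (typeof ga === 'function') {            // analytics.js may be blocked
      ga('send', 'event', 'checkout', 'click');
    }
    startCheckout();                           // proceed either way
  });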


The closest I know of would be using uMatrix plus general knowledge of which sites do what activities.


uMatrix is great.


Privacy Badger can do this.


Thanks!


Anyone know where I can find a list of the domain names these companies use? I want to block everything from all of their domains.
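
As a starting point, a hedged hosts-file sketch covering the vendors named in this thread - the base domains are the companies', but the exact serving hostnames/CDNs are assumptions, so check the article for the real list:

  # /etc/hosts - sinkhole session-replay vendors
  0.0.0.0 fullstory.com
  0.0.0.0 smartlook.com
  0.0.0.0 userreplay.com

Note that hosts files match exact hostnames only (no subdomain wildcards), so a wildcard-capable blocker like uBO or dnsmasq is more robust.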


DAMMIT. Once again the question that immediately comes to mind is "Why the FUCK do browsers facilitate this shit?"

C'mon, you stupid web devs on HN, tell me again all your excuses for needing these capabilities. Sorry to generalize to those of you who don't do this, but many of you still want those capabilities that have opened the door. And those browser devs... it's like they compete to sell out the users by adding "features".


Ok, here's a real example of how FullStory can help provide a better product: bug reporting. A user gets a JS error on the page; say my application is pretty complex, and the state machine I've built can't handle a particular state. I can easily walk through the recording of what the user did and reproduce the issue.

I can also hop on a call with this user and walk them through how to do something while looking at what page they're on and what inputs they have filled in. This removes the frustrating back and forth of "What do you see now?" that happens without this tool.

Say I want to influence user decisions by offering subtle cues to push them towards something that will be overall beneficial for them. By watching certain key users we can know what frustrates them (erratic mouse movements, long time searching for features etc) and what things they grok easily.

The article totally glosses over a super important feature of FullStory: excluding elements[0]. You can exclude via a simple class name, or specifically select whatever you'd like to exclude.

[0] http://help.fullstory.com/technical-questions/exclude-elemen...


+1 - having session recordings is a HUGE win to help track down hard-to-reproduce bugs. You can get the strangest bug reports, and by watching the session, as a dev, you can instantly translate it into the technical terms that the end user lacked and you can fix the problem.

It's also great for ad-hoc usability testing to see how people are using your features, where they slow down to read, what elements they try to click on but can't, and other UX improvements that you'd pay consultants six figures to put in a report for you.


You guys are illustrating my point nicely. I have no doubt that session recording is very helpful in your debugging activity, and provides feedback for site design. The problem is that those capabilities are often used against the user as well. The people who enabled you to do that also enabled others to do bad things. The attack surface has been hugely increased in the name of convenience for developers. Remember, with the internet the whole world is on your doorstep and that doesn't consist of just developers trying to get their UX right.


> but many of you still want those capabilities that have opened the door

Rest assured the majority of (web) developers do not like this crap one bit. Most of the pressure to add hundreds of analytics toolkits, trackers, or these snoopers comes from marketing - they (or worse, the C-level execs) get convinced that they need to integrate tool XYZ to "stay competitive" or "improve customer retention" or whatever buzzword goes today, and the devs at the bottom of the chain are left to implement it.


Sometimes it's just laziness.

I was forced to add GTM to a site because it meant marketing could just hand over the GTM login and a pile of money to another company which could then provide them with pretty reports on what the customers were doing. The analytics company promised to not do anything bad so it was OK.

And that was after an incident where the entire site was turned purple by another external JavaScript...


> And that was after an incident where the entire site was turned purple by another external JavaScript...

lol, what was the root cause? Defacement/script-kiddie attack, or a "background covering" ad that didn't recognize the content area and painted over the whole screen instead?


The script was for a yearly user survey run by a small company. It has an awards ceremony attached, so I think that's why some marketing people like it. After the survey ended, the script was not removed, of course.

Around a year later the site changed color when their script started injecting a new stylesheet into our pages. They never really said what happened, only that they had restored the old version of the script. Maybe some developer pushed dev code to the old production URL, or maybe they were hacked.


All this started with the first ad scripts and invisible pixels.

It got really crazy with Google Analytics and then all the social beacons. Now the only limit is the CPU and RAM available to the browser.


> Now the only limit is the CPU and RAM available to the browser.

... and in Germany, the data cap if you're on mobile. Video ads with autoplay, tons of trackers, no wonder I regularly hit 3GB a month, which is actually the biggest package my provider offers.


Browsers became a way to deliver general-purpose applications over the Internet rather than just structured documents. Once they did, then all of the features that are being maliciously used here have very useful applications like responding to mouse input.


> Once again the question that immediately comes to mind is "Why the FUCK do browsers facilitate this shit?"

Because their developers were too busy wondering if they could to wonder if they should.

The modern web is like Moria: we've delved too deep and awoken a slumbering horror.


It's not "the browsers" it's primarly the culture of including "whatever" on the pages one maintains. It's so easy as it typically doesn't affect negatively those who decide to do so.

And it's typically not a decision of one person.


I was giving this matter some thought early today after getting some stupid malware popup on my phone (where the phone vibrates, says it has lots of viruses, etc) while using Chrome. It wasn't even on any kind of dodgy site, but most likely it was part of a banner rotation for an ad network.

There has to be a way for website creators to sandbox content which comes from third parties. I think we have to accept that all of these ad networks and third-party scripts aren't going away, so until everyone uses adblocking, what can be done in the meantime?

It's problematic that including content from elsewhere in your page (like in an iframe) would grant it "first class" behavior with equivalent privileges to one's page. I know it's opening a can of worms, but why not implement a way to show untrusted content?

On the other side of things, I'm using a relatively recent version of Chrome. Why is it vulnerable to this dumb sort of popup alert? Why can't I escape from it easily? The back button doesn't work. The UI hangs, so I can't close the tab. If I close and re-open my browser, it just re-opens my tabs.


Several parts of your comment resonate with me:

>> There has to be a way for website creators to sandbox content which comes from third parties.

The whole 3rd-party thing came about because advertisers needed to establish both ad distribution and trust (they rightly don't want to pay for ads unless the ads are actually shown, etc.).

>> It's problematic that including content from elsewhere in your page (like in an iframe) would grant it "first class" behavior with equivalent privileges to one's page.

I agree; this one is on the browser devs and the standards creators. Safety by default is the way to go, but then people have nifty ideas that wouldn't be possible with limitations.

>> I know it's opening a can of worms, but why not implement a way to show untrusted content?

That's exactly what the browser is supposed to be in the first place.

>> Why can't I escape from it easily? The back button doesn't work.

Because browser devs decided there was some reason content should be able to alter or override the design of the viewer. I can think of no use case for this that is legitimate from the user's perspective. The list of stuff like this is long and ridiculous, and they keep doubling down on it too. First we had cookies, but that wasn't enough, so now there's a whole client-side database...


I guess what I find frustrating is that it's the same class of problem as Captain Crunch's whistle: in-band signaling. I think we're getting to the point where it has to be sandboxes all the way down (running things in sandboxes, inside VMs, with memory protection, etc.), but it's still not enough. This class of problem must be extremely difficult to solve: how do you run Turing-complete code which might be hostile? All these layers upon layers of sophisticated tools, and to what end? To create a merger of TV and magazine advertising. But one could always turn the page or change the channel.


> How do you run Turing-complete code which might be hostile?

That's the key problem that (almost) nobody wants to talk about. We've been trying to solve the decision problem for a long time, and we already know that even relatively simple problems are provably undecidable[1]. Any real program will be much more complex[2]. An unknown program could generate any output it wants and we cannot know that without running it.

The only solution is to remove output methods. If a program can only, e.g., draw to a framebuffer without the ability to trigger future network activity, the worst it can do is waste CPU and RAM. Allow literally any interface to generate network activity (even indirectly) and people will find ways to tunnel data over that interface.

The original design for the web was (probably) safe. It didn't require anonymous Turing complete code, and provided quite a bit of functionality with declarative markup. It even allowed simple (but still useful) server-side applications with 3270-style forms (again, no code needed). This was wonderfully useful, reasonably safe, and most importantly it was understandable by both humans and machines.

Today's web requires trusting a new set of undecidable software on each page load. We're supposed to trust 3rd parties even though trust is not transitive. We're supposed to accept the risk of running 3rd party software even though risk is transitive. Without some sort of miraculous total reversal where browsers revert back to pre-javascript days, this is going to end badly.

[1] https://www.scottaaronson.com/blog/?p=2725

[2] If your program uses >7918 Turing machine states, [1] proves that its behavior cannot be analyzed by ZF set theory.


>> The original design for the web was (probably) safe. It didn't require anonymous Turing complete code, and provided quite a bit of functionality with declarative markup. It even allowed simple (but still useful) server-side applications with 3270-style forms (again, no code needed). This was wonderfully useful, reasonably safe, and most importantly it was understandable by both humans and machines.

Thanks for that! So many web devs today can't even comprehend the idea that you can have interactivity without JS. Part of the reason for what we have came from offloading work from the server. I once made an Othello game with nothing client-side but an auto-reload after a timeout - everything was in a CGI script on the server.
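
For anyone who hasn't seen the pattern, a hedged sketch (the CGI path and move value are made up): the page refreshes itself on a timer and every move is a plain form post, no client-side script at all.

  <!-- poll the server every 5 seconds for the opponent's move -->
  <meta http-equiv="refresh" content="5">
  <!-- each move is an ordinary form submission to the CGI script -->
  <form action="/cgi-bin/othello.cgi" method="post">
    <button name="move" value="c4">Play c4</button>
  </form>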


You make a lot of good points. I agree, there's no going back. The websites we have today offer too much functionality to ever go back to the way things were. While I like the ability to turn off JavaScript on a page by page basis, and use the web mostly as a library, most people would be completely put off by a "dumb" Web.


There are some limited ways to sandbox with iframes.

https://www.w3schools.com/tags/att_iframe_sandbox.asp
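
A minimal sketch of how it works: a bare sandbox attribute strips scripts, form submission, popups, and same-origin access, and each token selectively re-enables one capability (the URL here is illustrative).

  <!-- third-party content may run scripts but gets no same-origin access -->
  <iframe src="https://third-party.example/widget.html"
          sandbox="allow-scripts"></iframe>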


This is something I was hoping would be possible in the future, but didn't realize an implementation was already in place. I'm surprised I've never heard of it.

There's one problem though. Here's how it should be implemented, in my opinion.

<iframe src="this_iframe_is_sandboxed.htm">

<iframe src="this_iframe_is_not.htm" nosandbox>

I disagree with making security opt-in.


Agreed. Customer service sites are prime targets for solutions where JavaScript is used to enable training, screen sharing, user tracking, and advertising, and the upside is so compelling, especially to non-technical customer service operations managers ("those who decide to do so"), that they can't conceive of how this little widget can scrape the whole screen or, worse, be maliciously tipped to steal or inject information.

The site owners are at fault, but the browsers should also make it easier to disable active content. We can presume that most users aren't discerning, but I'd switch to both plain-text e-mail and plain-text browsing if possible, falling back to bloated sites when I can't find an alternative - and, like the other comment here, using Lowes instead of Home Depot because their site is less awful.

I'll be sharing the link to the No Boundaries page next time I'm talking with someone who wants to add a tracking feature to a page.



