Blocklist Facebook domains (github.com/jmdugan)
1349 points by z0a on March 20, 2018 | 348 comments



I highly recommend using uMatrix[1][2] if you're very privacy-conscious. It's the full-blown everything-at-your-fingertips console.

By default, it blocks third-party scripts/cookies/XHRs/frames (with an additional explicit blacklist). You then manually whitelist on a matrix which types of requests from which domains you want to allow. Your preferences are saved.

It is a bit annoying the first time you visit any new domain, because you need to go through a bootstrapping whitelist process to make it work. After a while I find I do it almost automatically though.

I use it in conjunction with uBlock Origin and Disconnect, and it still catches the vast majority of things. As a nice side-effect, I find I keep pretty up-to-date with new SAAS companies coming out!

---

[1] https://chrome.google.com/webstore/detail/umatrix/ogfcmafjal...

[2] https://addons.mozilla.org/en-US/firefox/addon/umatrix/


Any browser plugin is inferior to using a hosts file. Hosts files blackhole any network request before even attempting to make a connection. These browser plugins only help if you're using the specific browser — they aren't going to help that electron/desktop app that's phoning home. They won't help block inline media links (Messages on a Mac pre-rendering links) that show up in your chat programs which attempt to resolve to Facebook. They also won't block any software dependency library that you install without properly checking if it's got some social media tracking engine built in.

I don't even waste time or CPU cycles with browser-based blocking applications. Steven Black's[1] maintained hosts files are the best for blocking adware, malware, fake news, gambling, porn and social media outlets.
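
If you want to try this, here is a minimal sketch of applying the unified list on Linux/macOS (the raw-file URL points at the repo's default unified list; the repo also ships scripts with more options, so treat this as illustrative):

    # back up the current hosts file, then append the unified blocklist
    sudo cp /etc/hosts /etc/hosts.bak
    curl -sSL https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts |
      sudo tee -a /etc/hosts > /dev/null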

[1] - https://github.com/StevenBlack/hosts/


That doesn't stop anyone using IP addresses directly, and I find that a small minority does (a minority that includes, e.g., Microsoft for some Win10 updates).

Depending on your threat model, you might need to go the proxy/firewall route.

The hosts file is a weird middle ground - it has to be installed and maintained on every device, many of which (e.g. iPhone/iPad) won't let you do that. It's better to set up a local DNS server, which will serve every local machine; and as I mentioned, doing this at the firewall level is better yet.


That only works on your local network. You always need something on the device if you want to take it anywhere.


Homelab: pi-hole, pfsense, openvpn

Mobile devices: openvpn client


I started using the Brave browser, which I noticed blocks a lot of network requests. Loading the NYT took about 25 seconds to finish all requests using Chrome. With Brave it took 4 seconds. Also, Brave gives you the option to pay the sites you visit with BAT tokens, if you want. Brave in conjunction with Pi-hole looks even more secure, and perhaps even faster at loading pages.

Edit: spelling


That's mostly what I use at home too. It works very well.

It doesn't quite work well while on the road sometimes. For those cases, I have a docker container running diginc/pi-hole (with some additional hosts file blocking going on), then I point my laptop's DNS towards it and am good to go.


Even better, replace pfsense with PF running on OpenBSD. Pi-Hole is awesome though, can't recommend it enough.


With PfSense you can block by country or ASN and avoid huge blocklists


This is pretty much my jam at home and away. Works great.


Running your own DNS server, and intercepting DNS traffic on your router, is better for two reasons:

1) Processes and machines that bypass the hosts file are also caught.

2) A large hosts file takes time to parse, line by line, slowing every DNS lookup. A DNS server can cache the entire list and answer lookups faster.

Dnsmasq takes entries in a hosts file format.
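
For example, a minimal dnsmasq sketch along those lines (the blocklist path and upstream resolver are illustrative):

    # /etc/dnsmasq.conf -- minimal blocking-resolver sketch
    addn-hosts=/etc/blocklist.hosts   # extra hosts-format blocklist, e.g. Steven Black's
    no-resolv                         # ignore /etc/resolv.conf
    server=9.9.9.9                    # upstream resolver for everything else
    cache-size=10000                  # parse the list once, answer from cache after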


I have scripts and compiled utilities that transform HOSTS files to tinydns format. I use nsd as well, which uses BIND format. I prefer the tinydns format.
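
A crude one-line approximation of such a transform (not the author's actual scripts; blocklist.hosts and the 86400 TTL are placeholders, and only plain two-field entries are handled):

    # hosts line "0.0.0.0 ads.example.com" -> tinydns A record "+ads.example.com:0.0.0.0:86400"
    awk '!/^[[:space:]]*#/ && NF >= 2 { print "+" $2 ":" $1 ":86400" }' blocklist.hosts > data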

Zone files are more flexible than HOSTS files, but I still use HOSTS as well. I have never had a concern about the speed of using HOSTS. It is certainly faster than DNS.

There is a comment in this thread where someone asserts that HOSTS was never designed to handle "millions of entries".

I would be interested in reading about a user who visits "millions" of websites or otherwise needs to do lookups on "millions" of domains.

I maintain a database of every IP address I am likely to use on a repeated basis, in several formats, with kdb+ as the "master" format. I believe most users will never come close to intentionally accessing "millions" of IP addresses in their entire lifetime. (FN1) I could be mistaken, and it would be interesting to learn of a user who can dispel this belief.

FN1. If you think about this, it may cause you to question the necessity of DNS for such users. Or not. Times have changed since the advent of HOSTS. They have also changed since the advent of DNS. For example, using "consumer" hardware, I can fit all the major ICANN TLD zones on external storage and the entire com zone (IMO, by far the most important) in memory. This is many, many more domains than I will ever need to look up. Assuming at best I will not live much longer than 100 years, I could not and will not explore them all or even a significant fraction.


If you know which DNS names you will need to know, then yes, there's no need for more than a hosts file.

Until DNS changes.

If I move a server from one IP to another, I change DNS, and in $TTL time everyone's pointing at the new server. Apart from you with a hosts file. How does that work if everyone has a hosts file?

If I say "check out this interesting story on blahblah.com", you don't have it in your hosts file, how do you get it?

I maintain a list of every phone number I am likely to use on a repeated basis, but sometimes I need to look up a phone number I don't know (in the old days this was a phone book locally, and directory inquiries further afield. Now it's ddg and assume they have a website. Which isn't in my hosts file or dns cache, and I've never visited before)

I maintain DNS entries for my home network of a dozen devices -- I host it on my mikrotik, but it's handy to have, when I type "ssh laptop" rather than remembering if it's on .71 or .73. It's one step better than a plain text file, as there's a standard based way to remotely query it. At work I maintain a DNS server with 2000 entries on my network, which is actually hosts file powered, but again I use dnsmasq for the DNS server rather than rsyncing that hosts file to 2000 machines.


"How does that work if everyone has a hosts files?"

In your particular case, I don't know. You have to do what best suits your needs, whatever they are.

Here is how someone else solves the problem of changing IP addresses. For my needs, I actually like this method.

The entire ICANN DNS used to be bootstrapped to a small text file called "root.hints", db.cache, named.root, named.cache, or something else. As far as I know, it still is.

How does one know the IP address from which to retrieve this text file?

Maybe they have it memorized, or written down somewhere, or perhaps it is written into some DNS software default configuration. In all cases, they have this address stored locally.

No remote lookup.

What happens when the administrator of the server that publishes the text file wants to change IP addresses?

This does not happen very often, but it does happen. What do they do? Considering that the entire ICANN DNS was bootstrapped to this one file, and assuming this is truly meant to be a dynamic system, then this is arguably the most important IP address on the internet.

They notify users in advance that the IP address is going to change.

That's it.

As a www user, of course I would have to do a DNS lookup for blahblah.com. However I do not do lookups for the server with db.cache, for the .com nameservers, and in most cases not for the nameserver for blahblah.com either, and I do not do lookups using recursive caches. If blahblah.com changes its IP address I do not have to wait for changes to propagate through the system via TTLs. I am querying the authoritative nameserver, RD bit unset. If an IP address changes from the one I have stored, I know immediately when I try to access it. (I like being aware of these changes.) If I was relying on a recursive cache I would probably not notice that the IP address had changed.
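
With dig, that kind of lookup looks something like this (blahblah.com and its nameserver are placeholders; +norecurse sends the query with the RD bit unset):

    dig +norecurse @ns1.blahblah.com blahblah.com A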

IME, IP address changes happen less frequently than people writing about DNS on the web would have one believe. Hence this system works well for me. Most domain names I encounter keep the same IP address for long periods.

Ideally, if blahblah.com is not changing IP addresses frequently or unexpectedly but needs to make a change, she could publish a notice somewhere on her web server informing users she will be moving to a new IP address, just like the server that serves db.cache.


Also a viable solution, although slightly more complicated than adding a line to a file. A question I have is whether you're talking about running your own DNS on your machine or on a server you control.

One benefit to the hosts file is that it travels with you everywhere you go. I have my DNS configured at home, but my hosts file covers me when I'm at a coffee shop, on a plane, on a work trip, or on vacation.


I work around this with an IPsec VPN to home. DNS is set up on my router with Unbound; I just point to that. When at a coffee shop or on any untrusted network, I VPN into home.


He refers to a setup where egress DNS traffic is routed to a local DNS server. Thus, regardless of machine configuration, all machines use the local DNS server.


A convenient GUI app for managing the hosts file on macOS machines is Gas Mask, btw. You can have a local hosts file to block your pet peeves, then subscribe to a remote hosts file (such as the one linked above), and activate the combined hosts file.

https://github.com/2ndalpha/gasmask/releases


I personally find this one better: SwitchHosts https://github.com/oldj/SwitchHosts


Windows 8-10 users that use Windows Defender will notice that some hosts file entries will be ignored (for popular domains like facebook.com). You will also need to add an exception for the hosts file in Windows Defender.


I wonder what their rationale for this is. I know in the past malware has modified the hosts file to block malware removal tool domains, but why ignore entries for Facebook?

I heard that blackholing requests to Microsoft telemetry URLs also has no effect. Any way of finding the unblockable list, I wonder.


> why ignore entries for Facebook?

Malware would redirect facebook.com to some scam site probably.

Given how popular FB is, Microsoft decided to "fix" this.

(This is all a hypothetical, I don't actually know this for sure.)


I mean, considering the use of 99.9% of Windows PCs out there, it probably makes sense.


Windows 8-10 users will also notice that hosts file entries for any Microsoft or Bing domains will be ignored.

I guess they hardwire the IPs into Windows.


> I guess they hardwire the IPs into Windows.

More likely they just bypass looking at the local hosts file for such names, so the request always goes out to your DNS servers.

Therefore blocking these names by redirecting to 127.0.0.1 will work if done at your DNS server (for instance if you run an instance of https://pi-hole.net/ for that).

Unless of course they make the lookup use specific name servers that they run, instead of the local resolvers that your machine is configured to look at, for those names, but that is less likely.


In that last case, you can often redirect those queries on your router if they are standard DNS requests. That's how my local network is configured -- all DNS requests are sent to my Pi-hole instance, except those coming from the Pi-hole itself. Even something like:

    nslookup google-analytics.com 8.8.8.8
will return a local IP:

    Server:  google-public-dns-a.google.com
    Address:  8.8.8.8

    Name:    google-analytics.com
    Addresses:  192.168.1.2
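
On a Linux-based router, that redirect is typically a NAT rule along these lines (a sketch: the LAN interface name br-lan is an assumption, and 192.168.1.2 is the Pi-hole address from the example above):

    # rewrite all LAN DNS queries to the Pi-hole, except the Pi-hole's own
    iptables -t nat -A PREROUTING -i br-lan -p udp --dport 53 ! -s 192.168.1.2 \
      -j DNAT --to-destination 192.168.1.2:53
    iptables -t nat -A PREROUTING -i br-lan -p tcp --dport 53 ! -s 192.168.1.2 \
      -j DNAT --to-destination 192.168.1.2:53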


Any browser plugin is inferior to using a hosts file.

This is objectively false. A hosts file blocklist cannot:

- Globally blacklist and whitelist select URLs

- Block all sub-domains of a given URL

- Block third-party iframes and scripts (with whitelist)

A hosts file's feature set is not a superset of a browser based solution and the two complement each other.


DNSMasq can and does accomplish the first two items on your list, largely in hosts-file format.

Mind, I use both hosts and uMatrix. Each has its place.


Many users can't modify HOSTS on work computers, etc but can install an extension.

Hosts file is a real pain to disable/enable, at least quite a bit worse than clicking one button in your browser.

I do something very similar to a hosts file (at the router level rather than the OS), but there are a few drawbacks compared to an extension.


On top of that, on a non-jailbroken/rooted device, a user cannot replace/overwrite the hosts file.

On my jailbroken iphone(s) one of my first steps of security hardening is to replace the hosts file.


See my reply to TAForObvReasons - if you invest in a Safari Content Blocker that allows custom rules, you can effectively do the same thing as a hosts file. Safari Content Blockers prevent network traffic much like a hosts file does. The only thing you won't be able to block is some app that has a Facebook analytics or Facebook login dependency inside, e.g. Spotify.


what is the recommended iOS way of blocking the facebook domains?


This works great for keeping spam off your devices -- off your local network at any rate. It's not possible to modify the iOS hosts file without jailbreaking.

* Pi-hole®: A black hole for Internet advertisements – curl -sSL https://install.pi-hole.net | bash || https://pi-hole.net/


On an iPhone without jailbreak you can use 1blocker[1]. Since every browser on the iPhone is basically a UIWebView/SFSafariViewController controlled by iOS, Safari Content Blockers[2] apply globally preventing web visits. Safari Content Blockers also prevent Messages from rendering Facebook content inline.

My 1blocker rule called "Bye Facebook" is:

  https?://([a-z\.]*)?facebook\.com.*
I should probably update it to factor in a lot of these other TLD URLs now that I think about it.
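
For instance, a hypothetical extended rule covering the domain families tallied at the bottom of this thread might look like the following (the shared Akamai CDN domains in that list are deliberately skipped, since blocking them would break unrelated sites):

  https?://([a-z0-9.-]*\.)?(facebook\.(com|net)|fbcdn\.(net|com)|fbsbx\.com|fb\.(com|me)|tfbnw\.net|instagram\.com|cdninstagram\.com|whatsapp\.com).*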

- [1] https://1blocker.com (This costs money)

- [2] https://developer.apple.com/library/content/documentation/Ex...


> Since every browser on the iPhone is basically a UIWebView/SFSafariViewController controlled by iOS

Most browsers rely on WKWebView, which uses the Safari Nitro JavaScript engine but allows customization of the user interface as well. UIWebView is pretty much legacy at this point, and SFSafariViewController does not allow any customization beyond basic theming.

> Safari Content Blockers apply globally preventing web visits

Unfortunately, this is not true. They only work in Safari and Safari view controllers.

> Safari Content Blockers also prevent Messages from rendering Facebook content inline

Are you sure about this? As far as I know, Messages uses a web view.


Actually I just double-checked and I was incorrect. I just remembered that I have another app on my phone called AdBlock[1] which is responsible for blocking requests at the network layer. They run their own DNS server and create a custom list to black hole network requests that match certain formats. If you add Facebook as a custom rule to AdBlock, that will prevent Messages from pre-rendering content and also block any requests to Facebook from any service on your phone, as long as you're connected to their VPN.

Sorry about the confusion... I'm really doing a lot to keep myself off of Facebook's radar.

- [1] https://itunes.apple.com/us/app/adblock/id691121579?mt=8


I do not believe this is true. In Messages, previews need approval the first time a given domain loads a preview, and then this setting is stored. AFAIK there is no way to revoke the permission though.


> they aren't going to help that electron/desktop app that's phoning home.

What's your threat model? Mine is third-party tracking cookies, and desktop apps don't share my browser's cookie jar. So while technically I can be tracked by IP from a desktop app, Facebook can't tell if it's me or someone else at the same coffee shop.

In particular, one nice thing about Chrome extensions is that they don't apply to incognito windows. I regularly use HTTPS Everywhere in block-all-HTTP-requests mode + an incognito window on wifi connections I don't trust, because the incognito window will permit plaintext requests, but it doesn't read my cookies or write to my cache, so it's sandboxed from my actual web browsing. I can safely read some random website that doesn't support HTTPS with my only concern being my own eyes reading a compromised page; none of my logged-in sessions are at risk.

> any software dependency library that you install without properly checking if it's got some social media tracking engine built in.

... is this a thing? (I totally believe that it's becoming a thing, I just haven't seen it yet and am morbidly curious.)


> "Facebook can't tell if it's me or someone else at the same coffee shop."

Eventually, they will tie your various devices to you.

There's a chapter/section on this (and FB) in Chaos Monkeys.

https://www.antoniogarciamartinez.com/chaos-monkeys/

That book was published 2+ yrs ago. I can only assume the technology is more thorough and sophisticated now.

p.s. see also Dragnet Nation

http://juliaangwin.com/dragnet-nation-available-now/


Browser fingerprinting is an easy path toward a “stronger than ip” correlation. [1] is an interesting starting point.

1: https://panopticlick.eff.org


That works only with JavaScript active, which uMatrix blocks for third parties. The sites one mainly visits are not known for first-party fingerprinting (that's mainly done by the ad networks). The extra paranoid (like me) can also block JS for certain first-party sites.

I use uMatrix only experimentally (I rely on NoScript) but it offers a fascinating flexibility of control if one is in the mood. As well, NoScript is near useless when doing stuff with AWS where uMatrix offers the right flexibility (allow from site Y, but only when fetched from site X).


Derp, I missed the obvious. Thanks.

I had heard of uMatrix but didn't realize it had that functionality, which is pretty cool! Thanks for sharing!


While I acknowledge that your use case may be confined to browsing the internet, I still don't see what prevents a desktop app from reading your cookie jar.

Edit: your browser history (which may contain your profile URI) might be pretty out in the open, too.


Oh, yes, none of it is sandboxed from an actively malicious app—but an actively malicious app can just ignore your hosts file, too.

My threat model is a developer who includes a standard tracking snippet from a third party but is not going out of their way to reliably violate my privacy at all costs (because they have other features to ship, and the tracking snippet works on most computers). If your threat model includes actively malicious developers, stop running native apps from them at all.


>> stop running native apps from them at all.

I would dearly love to, if all OSes came with a permission system other than just "run in admin mode/sudo".


>don't apply to incognito windows

you can enable extensions to run in incognito mode in the settings


Just browsing through the "fake news" section, that hosts file is ridiculous. There's a tremendous number of completely legitimate news sources that are blocked, and many that, while lurid, are not in any sense "fake news." The list includes both liberal and conservative legitimate news sites.


uBlock has much the same effect. Requests error out in the console with "ERR_BLOCKED_BY_CLIENT" instead of a 404, so technically it is possible to detect this with a first-party JavaScript file.


That's how the "please don't use an ad-blocker" stuff works.


What if the ad networks start using IPs directly instead of domains?


Block all access to non-local (192.168, 10., etc.) IP-based URLs. It will break some legit stuff, but not that much for most users, and they can just whitelist that as it comes up.

IP whitelists could also be aggregated and shared on github similar to the current DNS blacklists.


How can I do that? (osx or linux)


I don't know of any way currently. :-\ You'd probably have to write a browser plugin or get one of the existing ones like uBlock to implement it.
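
For what it's worth, uBlock Origin does accept raw regex filters (a custom static filter written between forward slashes), so something along these lines might already get close. An untested sketch; the exclusions cover only the common private ranges:

    ! untested sketch: block IP-literal URLs, skipping common private ranges
    /^https?:\/\/(?!127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)\d{1,3}(\.\d{1,3}){3}([:\/?]|$)/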


If you're using uBlock Origin or uMatrix to block third party scripts it won't matter. That's what makes using the dynamic filtering in uBlock Origin so powerful. Easily the best bang for your buck ad blocking that you can do.


if this form of blocking becomes popular, they will.


And together with the Gas Mask tool on macOS, the list is a perfect way to really block FB.


The trade-off is that hosts file blocks more processes, but the browser plugin updates more frequently.


apk, is that you?


LOL who is that? — I don't think I'm apk.


A troll on Slashdot who commonly advertises a HOSTS-based block list, claiming it superior to practically every security solution ever.


Oh, now I have to look this up. Whoever that person is sounds quite entertaining.


This is downright silly.

HOSTS files are static. They were never designed for blocking ads or tracking. And for all we know, every connection does a linear search through the HOSTS file so the larger it gets, the more wasted time, because it was never designed to have millions of entries.


To add to this, I've seen that stupid hosts file blacklist from SpyBot cause some Windows network service to get locked up for 40 seconds every time the laptop was resumed from suspend or booted up in Windows 7. Parsing the hosts file took a relatively extreme amount of time for exactly this reason; massive hosts files are a kludge at best.


I swear by uBlock/uMatrix, but it's amazing how much of the web it breaks and how little of the content of some sites is hosted by the site itself. The web has become very reliant on CDNs.

My public broadcaster (http://www.abc.net.au/news/) for instance is completely reliant on third parties for its "live story" functionality. It loses half its functionality at work, where Twitter is blocked, and uBlock kills the other half. It also kills the live stream when it can't load one of the half a dozen trackers on the page.

I'd love the NoScript functionality to be merged in too, so that I could turn off JavaScript by default.


You can turn on Disconnect's blocklists in uBlock Origin rather than run both. uBlock Origin comes with quite a few lists, but most of them are turned off by default.


I recommend uBlock Origin in medium mode (check "I'm an advanced user" in the settings, and read this guide: https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...).

I default deny all 3rd party scripts and frames, in addition to the blocklists, and I only sparingly noop relevant domains, the bare minimum to make pages work, on a page-by-page basis.
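
In uBlock's "My rules" pane, that setup corresponds to dynamic filtering rules like these (cdn.example.com stands in for a domain you've chosen to noop):

    * * 3p-script block
    * * 3p-frame block
    * cdn.example.com * noop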

On top of that, I have Privacy Badger, Cookie AutoDelete, and Decentraleyes, and I've turned on first-party isolation.

It's mostly unobtrusive once my most important websites have been properly noop'ed, and it's relatively simple to add temporary exceptions if needed.


I do the same thing with uBlock Origin with the "advanced user" setting enabled.


I'd prefer a solution that does not just work for a specific browser, but instead blocks all traffic regardless of browser, application, virtual machine, ...

That's just putting rules into /etc/hosts?

edit - answered my own question :) Yes it will.


Consider something like Pi-hole (https://pi-hole.net/) as the DNS server on your network, where it will affect all devices on the network.


My problem with Pi-hole is that I'd rather have it return NXDOMAIN, instead of redirecting to some other IP.


If you want NXDOMAIN on Pi-Hole, upvote this feature request: https://discourse.pi-hole.net/t/implement-response-zone-poli...


This will only protect you while you're on your own network. A lot of the juiciest data is about your public location; for that you need something device/browser-specific.


There's nothing (except possibly your ISP) stopping you from opening your firewall and using it remotely. I personally run dnsmasq (manually configured, but otherwise similar to Pi-hole) on a VPS.


> There's nothing (except possibly your ISP) stopping you from opening your firewall and using it remotely.

My ISP won't but there are ways around that. The biggest problem I've faced is on the modem side of things, finding something I'd trust to be open to the internet, ideally something I can install openWRT or similar on and something I know will work in my market. It's an options minefield.


Just run a local caching resolver; on Linux that is super easy and uses little resources.
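
A minimal sketch with unbound, for example (the blocked zone is illustrative, and always_nxdomain needs a reasonably recent unbound):

    # /etc/unbound/unbound.conf -- local caching resolver with one blocked zone
    server:
        interface: 127.0.0.1
        access-control: 127.0.0.0/8 allow
        # answer NXDOMAIN for the domain and all subdomains
        local-zone: "facebook.com." always_nxdomain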


I've got a RaspberryPi Zero (WiFi via USB..ugh). Would that be too slow for DNS, or would having my DNS server be local vs remote negate that slow interface?


It's running fine for me on a Pi Zero W. There's honestly like no slowdown at all


I don't have a Pi0, but I've had no problems running it on a single-core CHIP.


Doesn't support OS X though.


How wouldn't it support OS X? You just set it up as the device that provides a Mac with its network. It doesn't have anything to do with the OS.


Wait, do you mean you want to run it on OS X, or that the DNS in OS X is somehow different and won't work with Pi-hole as its server?


The idea is you use some inexpensive hardware (like a Raspberry Pi) and simply set all your devices to use it as a DNS server.


I use Little Snitch[1] (and its sibling Micro Snitch[2]) for filtering connections at the system level. I don't interact with it too often though, because I rarely install new apps.

Not to say /etc/hosts doesn't work; these days I just find I prefer things with better UX.

---

[1] https://www.obdev.at/products/littlesnitch/index.html

[2] https://www.obdev.at/products/microsnitch/index.html


Using Little Snitch to block all Facebook connections is like using a goat to land on the moon...


That made me laugh out a boogie.


To clarify, I whitelist my browser entirely in Little Snitch and delegate to uMatrix and other extensions.

I also don't pre-emptively load in rules into Little Snitch - I have it running in active/interrupt mode, so it prompts me whenever it tries to make a new connection I haven't signed off on before. Unsurprisingly, not very many apps try to connect to Facebook.


What a funny example, why is that?


Because it is completely impractical. I used LS, but it's a waste of time to check and block ad servers or malicious domains, which is why most garbage should be blocked from hosts or dnsmasq.


The maintenance aspect of LS is definitely on the high side, and only really dedicated folks will stick with it; if it were to come with auto-updated maintained lists it would most likely be used more.


Little Snitch is for macOS. As a Linux user I desperately looked for an equivalent and found none. Douane was suggested. It's no good. What a sorry state of affairs. We need a simple app-level filtering solution.


There's OpenSnitch[1], though it hasn't been touched in a while. Someone needs to step up and maintain it (maybe I should do that...).

[1]: https://github.com/evilsocket/opensnitch


Same story. I have always been dreaming of a Linux equivalent to Little Snitch. More than a decade has passed since I switched to Linux, and still nothing...


Even better would be doing it on a device. It's a reason to have an intelligent router on your network where you run a custom dnsmasq or whatever, then you cover your phones and all the hootenanny that comes with a digital life. Like your fridge.


Does /etc/hosts support wildcarding subdomains?


No. It supports no wildcards at all.
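
The usual workaround is to wildcard one layer up, in a local DNS server; dnsmasq, for example, covers a domain and every subdomain with one line (the second form, with no address, is accepted by recent dnsmasq versions):

    # dnsmasq: facebook.com and all subdomains resolve to 0.0.0.0
    address=/facebook.com/0.0.0.0
    # or, with no address given, answer NXDOMAIN for the whole domain
    address=/facebook.com/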


Is there a way to export the list for a /etc/hosts.allow?


/etc/hosts won't work for your VM, probably


It will work if you set the /etc/hosts inside the VM.


Aye, that it will


I see no reason why not actually.


Your VM uses its own network stack and handles its own DNS resolution. /etc/hosts isn't a firewall; it's a local name-lookup table.


I have uBlock Origin and Privacy Badger. Would uMatrix replace those?


If you want uMatrix like features in uBlock Origin, go to settings and just click "I am an advanced user".

The dynamic filtering matrix should be available after that.

Granted, the defaults are not as strict as uMatrix's, but it's a good middle ground.


Just enabling dynamic filtering doesn't do anything as far as blocking is concerned. You'll still have to set up how you want to handle it.

I would recommend "Medium mode"[1] as defined in the uBlock wiki.

[1] https://github.com/gorhill/uBlock/wiki/Blocking-mode:-medium...


Great, thanks!


Same here. I have zero desire to do any sort of manual configuration any time I visit a new website. Blocking third-party cookies will eliminate like 90% of tracking, and uBlock Origin and Privacy Badger handle a significant percentage of the rest.


You can also turn on Tracking Protection in Firefox:

https://support.mozilla.org/en-US/kb/tracking-protection

It runs after any add-ons. It's a useful second step that can catch things add-ons miss.


Using uMatrix together with uBlock Origin is a little bit redundant, as uBlock also offers matrix functionality (enable advanced options). As a matter of fact, IIRC uBlock was developed upon uMatrix's codebase by the same author.


I tried uMatrix several times and quit. It's crazy heavy. It takes forever to whitelist parts of a page until it's functional: scripts, then XHR, then more cascaded scripts, then frames...


A hidden beauty of uMatrix is that with a little training, it doubles as a reminder against low-value search results -- content scrapers, marketers you've already decided never to buy from again. Just red-out the first-party website, and search engines that can't keep up with the SEO optimizers, e.g. DuckDuckGo, become more usable.


I second this. uMatrix is my go-to add-on to prevent the vast majority of connections I do not want. However, I pretty much always use my browser in privacy mode no matter what (for several reasons), so the bootstrapping process never ends for me, but that's a price I am more than willing to pay.


Just out of curiosity, what benefits do you get from privacy mode over several profiles?


I don't know enough about how different profiles operate to say with any certainty, unfortunately. So it's possible there is no benefit. However, for mobile there does appear to be at least one benefit. When I recently checked to see what data Google had on me using their free tool, I saw that the only web history they had associated with my device was all from URLs visited while not in privacy mode. These were all links that I clicked from text messages, which will always default to open in regular, non-private tabs, even if you only have incognito tabs open. That right there is enough for me to always and forever use nothing but privacy mode. For the PC, this is a non-issue as I never log into my browser. Although I do use privacy mode there near 100% of the time as well.


> As a nice side-effect, I find I keep pretty up-to-date with new SAAS companies coming out!

What do you mean?


I'm continuing to have issues with frames not displaying unless I completely disable the addon. It's extremely frustrating. I'm in advanced mode and all options are as they should be.


How can you recommend these extensions without mentioning NoScript?


Every time this is brought up, in this context, people say the other things do what NoScript does.

And, btw, I use Privacy Badger instead of Disconnect.


uMatrix supersedes NoScript by a wide margin.


Not at all. NoScript does way more than just blocking JavaScript by domain.


what features does it offer over umatrix?


One minor feature I miss from NoScript is that (unless I've missed a setting) uMatrix can't block site scripts but allow bookmarklets. Though with the new extension API, I don't know if it's possible at all or not.


Look up ABE, for example, or just look through the Options, especially under Advanced.


Yet again, software freedom fighters got there years ago.

The Free Software Foundation got there earlier, publishing https://www.fsf.org/facebook on Dec 20, 2010. FSF & GNU Project founder Richard Stallman has been rightly objecting to Facebook for years in his talks and on his personal website at https://stallman.org/facebook.html.

Long-time former FSF lawyer Eben Moglen rightly called Facebook "a monstrous surveillance engine" and pointed out the ugliness of Facebook's endless surveillance (at length in http://snowdenandthefuture.info/PartIII.html but in other places in the same lecture series as well). See http://snowdenandthefuture.info/ for the entire series of talks.


Yes, but where do they offer solutions to transition (emphasis on transition) from what people currently use to a more open ecosystem?

At least in the software licensing arena, having personally visited a lecture from Stallman, I was left with the impression that he wasn't offering a solution, just a vision of a Utopia without any guidance on how to transition to it -- more specifically, how would we make money from open source software, when currently proprietary software is the default for making money.


> how would we make money from open source software

There are many existing examples, so this is clearly a solved problem already. You charge for support, or for feature requests, and so on. That's how SUSE and RedHat make their money.

The flaw with looking at proprietary software's monetisation is that it usually just boils down to "pay for the binary". This obviously won't work with free software; you need to charge for development rather than access (though you can also use a seat-based model where you only provide support for machines that have valid licenses).

(I work for SUSE.)


SUSE or RedHat are excellent examples when it comes to monetizing OSS for large businesses. What about consumer software? I do not think OSS will survive monetization when it comes to dealing with individual consumers.

And then there's the question of SaaS. OSS exists, but a lot of high-quality alternatives are paid. I don't think services like Todoist, Pocket, Evernote etc. would exist on the open source model you described.


This is only half the story. Several business models around open source are working and proven (e.g. hardware vendors writing kernel drivers), but there is also a lot of unpaid volunteer work that everyone seems to take for granted. GIMP is one example. And probably a large number of the packages on NPM. Or think about OpenSSL: everyone was just assuming it must be well-funded, while nobody was actually funding it.


That definitely is one solution, but it was perhaps made possible by restricting distribution of software in the first place to get a leg up (see the comments regarding improvements and distribution restrictions on installer scripts in SLS: http://www.linuxjournal.com/article/2750). This suggests that it isn't a solved problem, because the initial conversion to an open source model (or free software) with support on the side may have required a different model to start the venture.

Nonetheless, it's admirable, and hopefully a net benefit for everyone.

My point was more about Stallman and co. calling foul with regards to software freedom, codifying their own ideal, but not giving directions to reach that ideal. This feels like a safe pulpit to sit upon, where their view isn't falsifiable, useful when they want to say "I told you so", while eventually taking all the credit for everyone else's efforts in between to make the end goal possible.


> perhaps made possible by restricting distribution of software in the first place to get a leg up

This is incorrect; you can download the full ISOs for SLE from the SUSE website, with 30 days' worth of updates. The source code (and the system required to build it) is all publicly available on https://build.opensuse.org/. I believe RedHat has something similar.

I'm also not sure that an interview from 1994 with the creator of Slackware is a good indicator of the current state of distribution business models. Though even in 1994, both RedHat and SUSE were selling enterprise distributions.


Since the past determines the future, the relevant part of the 1994 interview is "... Instead, he claimed distribution rights on the Slackware install scripts since they were derived from ones included in SLS...", which as I understand it, is the restriction of distribution I was referring to (perhaps redistribution is more accurate).

This suggests that the business model benefited from restricting redistribution and modification of the source code, so breaks the assumptions that the business model was purely based on making money from open source, and so doesn't fully support the idea that proprietary software is unnecessary, in the case where we take SUSE as an example of saying it is "already solved".


Given the bang up job we've done so far, I'm not sure people making money off of software has been a net positive for humanity. Maybe it's the idea that money is the only way we can get software that is the problem.

I remember when the internet was mostly people's personal websites that they didn't make money off of, and frankly, the internet was better then. The best websites that exist now started in that era.


Isn't that what the "tragedy of the commons" describes, with the internet being diluted or polluted by content that doesn't add much value?

I think you can probably find a suitable subset of the internet and it still feels like the old days, but then you have to be happy with a much smaller community.

And fair point to regard money from software as a potential net negative, and I just don't have an answer that is objective. There is a lot of software that highlights the creativity of people, and I like it, and am happy to pay for it, and also happy to get it for free. Like paying for books or borrowing them from the library.


> I think you can probably find a suitable subset of the internet and it still feels like the old days, but then you have to be happy with a much smaller community.

I think most people would be happier with a smaller community. Facebook encourages a large number of low-quality connections, which are actually worse than not being connected at all: I don't want or need to interact with my friend's racist cousin I met at a party three years ago or my ex-roommate's mom who always wants to sell me homeopathy supplies. These people are actively detracting from your life.

Those are extreme examples, but even people I might get along with suck up your time. If I am not connected emotionally/socially with someone enough to get their phone number and send them a text occasionally, I probably don't need to give them even a few seconds of my time on a regular basis.

> And fair point to regard money from software as a potential net negative, and I just don't have an answer that is objective. There is a lot of software that highlights the creativity of people, and I like it, and am happy to pay for it, and also happy to get it for free. Like paying for books or borrowing them from the library.

I'd be happy to pay for good internet too, but unfortunately there are few businesses willing to do this. Ad sellers won the race to the bottom on price (a strange game--the only winning move is not to play) by simply being "free" to users. This works because of short-term thinking: users don't think ahead to how ads will affect their lives, and content providers don't plan to grow a business slowly the way a for-pay business grows.


Great response. Thanks!


> I was left with the impression that he wasn't offering a solution, just a vision of a Utopia without any guidance on how to transition to it

Stallman quit his job to write an entire free software operating system and essentially dedicated his entire life to it. What more do you want?! I don't even want to imagine a world without Richard Stallman.


That doesn't really offer a solution for those of us that need to get paid. The choice he may have made to quit his job and write free software is anecdotal and simply won't generalise to everyone.

Stallman judges those that write proprietary software, calling this type of software immoral, and yet doesn't offer guidance to get from the current situation to a better situation. Without realistic and generalisable guidance, it is simply self righteousness on his part, the same as any extreme idealist.


> That doesn't really offer a solution for those of us that need to get paid.

If you have anything at all to do with software you're "getting paid" by using the wealth of free software that makes up GNU Linux, OS X, etc. Free software constitutes a powerful non-scarcity-based model that is extremely good for productivity. Just because he doesn't kowtow to standard capitalist race-to-the-bottom models doesn't mean he's not "realistic".


Why should he come up with a solution to help you get paid? If you're making a living unethically it's not up to me to figure out a way for you to do it ethically. In any case, there are plenty of other people who have figured out how to do that and there isn't the slightest reason to believe that all software being free means nobody would get paid to write it.


He owes me nothing but he's not offering credible proof that his way is applicable to anyone but himself.

And as for calling something immoral which is the result of someone's hard work and doesn't impact them if they choose not to use it, that simply sounds like sour grapes and self-righteousness.


All I can say is I'm glad we live in a world where some people do care about things which don't directly affect them.


No reason to disagree with that.


>more specifically, how would we make money from open source software,

See RedHat.


> how would we make money from open source software

https://en.wikipedia.org/wiki/Business_models_for_open-sourc...


> do they offer solutions to transition (emphasis on transition) from what people currently use to a more open ecosystem?

A comment I made two weeks ago that is pertinent to this discussion:

Niche market software, used by a limited number of highly specialized professionals, is somewhat incompatible with the open source economic model. When a piece of software is used by very many users, and there is a strong overlap with coders or companies capable of coding, say an operating system or a web server, open source shines: there is adequate development investment by the power-users, in their regular course of using and adapting the software, that can be redistributed to regular users for free in an open, functional package.

At the other end of the spectrum, when the target audience is comprised of a small number of professionals that don't code, for example advanced graphic or music editors or an engineering toolbox, open source struggles to keep up with proprietary because the economic model is less adequate: each professional would gladly pay, say, $200 each to cover the development costs for a fantastic product they could use forever, but there is a prisoner's dilemma in that your personal $200 donation does not make others pay and does not directly improve your experience. Because the userbase is small and non-software oriented, the occasional contributions from outside are rare, so the project is largely driven by the core authors who lack the resources to compete with proprietary software that can charge $200 per seat.

And once the proprietary software becomes entrenched, there is a strong tendency for monopolistic behavior (Adobe) because of the large moat and no opportunity to fork, so people will be asked to pay $1000 per seat every year by the market leader simply because it can.

A solution I'm brainstorming could be a hybrid commercial & open source license with a limited, 5-year period where the software, provided with full source, is commercial and not free to copy (for these markets DRM is not necessary; license terms are enough to dissuade most professionals from making or using a rogue compile with license key verification disabled).

After the 5-year period, the software reverts to an open source hybrid, and anyone can fork it as open source, or publish a commercial derivative with the same time-limited protection. The company developing the software gets a chance to cover its initial investment and must continue to invest in it to warrant the price for the latest non-free release, or somebody else might release another free or cheap derivative starting from a 5-year-old release. So the market leader could periodically change, and people would only pay to use the most advanced and innovative branch, ensuring that development investment is paid for and then redistributed to everybody else.


Nice one! There are probably caveats to the 5-year timing, or whatever mechanism is chosen, but at least it's an idea to iterate on and try out.

Thanks for such a well thought out response.


I wonder how Facebook devs feel when they read such posts. Do they feel rejected? Ashamed? Does their salary really outweigh this collective disapproval of their peers?


I actually just got a job there out of school, starting in a few months. Reading these comments is certainly interesting although it's not news to me that Hacker News hates Facebook.

I've long been skeptical of the effects of social media though, and I'm taking this job mostly just because doing otherwise seems like a really poor career choice. Plus it seems like Facebook is here to stay, and I can dream of helping to fix the problem instead of just enabling it.


Good attitude (:

EDIT: Is HN's Facebook hate getting so heated I'm getting down-voted for sending some good vibes to a newgrad about to start her/his first job?

I bet you all took your first job at Doctors Without Borders helping children in Angola. FFS, the Waltons are the scourge of this world, but I don't blame the kids going off to work at Walmart. I bet a lot of you pay taxes in the US too -- those taxes financed the war in Afghanistan, but you didn't move to Morocco, did you?

Give me a break. Let this kid come in with a good attitude, eyes open, loud and proud. Who knows, maybe he'll turn some heads. The guy signed, and it is a good career move; what's wrong with cheering the guy up? Disappointed in you, HN.


Actually, it is down to you to make a change and do ethical things. There are ways to influence the things you mentioned, from the war in Afghanistan to privacy issues with Facebook. But to do that, you have to care.


I agree (:


> I can dream of helping to fix the problem

That's about all though, isn't it? There's a negligible chance you'll actually fix the problem, unless you manage to leak evidence to the media or similar.

I write software for biologists, and I feel I'm much, much happier doing that than I would be working at Facebook.


You are enabling it. You know that and it will gnaw at your soul and make you deeply unhappy as long as you work there.


Oh yes. He will just be crying into piles of money. Get real, do you think people actually care that much? You are making huge assumptions.


I spilled OJ on myself when I read his post. I mean, it's great that he is so gullible as to believe he can change something at such a big corp; he reminds me of myself when I was 16, with a head full of dreams about how to change the world.

But seriously - do we know of any single example of an intern coming to a big corp and "saving it" - by that I mean steering it out of the dark and deceiving waters and actually bringing it into the light for the good of society and people in general?


Didn't Snowden have intentions to leak NSA documents right from the start?


Getting a job offer from Facebook is a great achievement. Some people I know just moved to the US to pursue a master's and then apply at Facebook. If you have offers from other tech giants like Google, then you can do your own analysis (SWOT, maybe) to choose the right option. Some years down the line you can always switch to any company in the world.


I worked for a less-than-stellar online publication in Australia. Think low-budget Daily Mail. I didn't care; I still got paid and got to switch off and do my own thing when I wanted to. I'm not my job.


“I’m not my job”

You are when your job requires you to sidestep your morals.


Sure. Okay then.

That doesn't bother me. I still go to the pub after work with friends and have a good time.


Must be nice to live free of consequence.


Can you explain your reasoning for claiming that working at a place that publishes low-quality news is immoral?


Examine the present imbroglio over Facebook, Cambridge Analytica, weaponised viral clickbait, fake news, radicalisation, etc., etc.

Media are the information sourcing and feedback loop for societies. The print media went through its crisis of awareness in the early 20th century. See especially Lippmann's Public Opinion.


I heard once that if there is something - whether of a monetary value or not - that allows you to "sidestep your morals", then you didn't have any morals in the first place.


Then I'd argue that I don't have any morals in the first place.

There are differing levels of how strongly I feel about certain moral values I hold. For example, working for a company that dealt in wholesale killing of others is obviously worse than working for an advertising network. Would I work for doubleclick for $1M a year? Hell yeah. Would I spy on citizens of my country for the same amount? No.

Does that make me not have morals? I don't know.


That is what I suspect, but still... Respect in the eyes of peers is a human need. If I met a developer and learned he works for FB, it would be visible on my face that I feel somewhat put off.


A lot of people have more important things to worry about than the respect of their peers.


>Respect in the eye of peers is a human need

'Peer' is very flexible. This could be a comparison to people the same age in other careers.

Also, keep in mind that Facebook engineers are constantly surrounded by other Facebook engineers so their SE peers probably do approve. They collectively don't think Facebook is a problem so they implicitly approve of each other.


I worked for eHow. We felt the same as the people complaining about it. But it paid the bills. Did get to learn how to scale REALLY fast.


I don't use FB and definitely do not like their business model, but it's also true that their users are there voluntarily and most of them are at least vaguely aware that they're being aggressively data harvested by an amoral, profit-maximizing corporation. So I don't see a legit reason to blame or hate on the devs there. Especially when you consider that most of them work on the "good" rather than the "evil" part of their stack.


Except for the folks who didn't make an account whose friends have contributed to "shadow profiles" for them...

I mean, let's just admit that you don't have to be a Facebook user and you don't have to sign a Facebook TOS for them to accumulate data about you, so it's not quite as cut-and-dry as you make it out to be.

As far as the "good" and "evil" parts of the stack... fair point. I think most devs are somewhat abstracted away from the collectively malicious vision, since most of the constituent parts are relatively benign on their own -- "let's identify faces in photos!", "let's automatically identify faces in photos", etc. It's product folks, or maybe even higher up than that, who connect the powerful pieces produced by devs to actually make Facebook the monster it is today. I'd guess that even the devs who have impact on that vision don't really have the power to dramatically sway that vision, they've got a bit of technical input at best.

Still, have you ever worked on a product you don't believe in? If you're just cashing in a check, I guess it could work, but if you're as idealistic as me you want to work on something that's doing good in the world. Especially when so many tech companies proclaim their intent to "make the world a better place" or "do cool things that matter."


> their users are there voluntarily

Not really. Many people just feel they have to use Facebook to connect with the society efficiently. Some people even consider those who don't use Facebook weird.


That's their problem. I never use FB and it does not hurt one bit. People have to be responsible for their own choices. No one is forced to use FB. Their weak will is not my problem.


The network effect is strong but it does not make FB use necessary or involuntary. You will not be put in a cage, fined, beaten, fired, etc, for not using FB.


Yep, but you might be left out of many events and slowly become a social outcast amongst your friends. As a first-hand non-facebook user, this aspect really blows.


Would you say the same about gambling software? What about software related to selling heroin?

All of these products are designed to be as addictive as possible (to varying degrees). The whole point of an addiction is that you are there voluntarily. (Not saying Facebook is as bad as the things above, just that they are all designed to addict.)


They certainly do make it as addictive as possible, but IMO that still does not mean users are there involuntarily. Persuasion is not the same as coercion.


Doesn't FB keep track of (possibly profile) people who don't directly use FB? They can possibly profile you from using data brokers and information actual FB users give to them.


That's a fair point but I would put much of the blame for that on the US government for not having proper EU-like privacy laws which make it illegal, or at least mandate an opt-out mechanism.


Or developers and companies could self regulate, because they care about other people.

Or we could have a legal system where everything is forbidden except for those things explicitly allowed.


In capitalism it's mostly the customer's responsibility to switch when a company becomes too evil in some way. Depending on your politics you could, perhaps, argue with some truth that network effects make that extremely difficult for social media sites, so the only solution is to have the government extensively regulate FB, Twitter, etc., but I am not entirely convinced. IIRC, FB engagement with teenagers and early-20s folks is already declining noticeably in the US (though there are probably multiple reasons for that beyond just privacy concerns).


Usually a replacement comes along to make it easier.

Slack vs irc

Facebook vs myspace vs geocities


> Or developers and companies could self regulate, because they care about other people.

I hope this is sarcasm.


Not really but probably should have been


Does their salary really outweigh this collective disapproval of their peers?

Isn't that the going ethos currently?

Something akin to: Pay me as much as possible, don't ask me to be part of your culture, don't ask me to work more than 40 hours a week, don't ask me to take stock, don't have a mission statement, make sure I'm working on something that is engaging mentally.


Depends on what you mean by “going ethos.” Healthy perspective on an employment relationship from your perspective, yes. Anathema to a large number of Silicon Valley startups where you will likely get branded as neither a culture fit nor a team player, as well, yes.

It’s wise in the Valley to hold such an opinion but not make it very prominently known. There are a number of people who want your job and will say something more palatable to your employer, and eat your free meal while shipping a feature at 10pm because they buy into the posters on the wall. Many allegations of ageism (but not all) can probably trace back to something like this, in my uninformed gut opinion, because you will almost certainly get replaced by a new grad when the hammer falls. I don’t even see it as personal, but a demonstration of incentives: they can get a loud 40 from you or a quiet 80 from a newly minted BSc. QED.

Just decline invites and push back on over 40. Expressing that attitude at a typical company within this audience basically paints “please lay me off” on your back. When you’re getting into post-senior titling, or you’re really specialized in a tough req to fill, is when that approach becomes more feasible.


I met some of them. They either don't think about it, or they actively revel in not being one of the proles that aren't in on it. The ones that stress about it eventually leave.


I worked for an AV vendor for years. The online hate it gets is huge and constant (mostly wrong, but some points are valid). And even though I don't like the monetization strategies, for example, I believe the product itself was OK and, most importantly, helpful to many.

As a Facebook user I obviously don't like what they do with the data, but at the same time I think they provide an OK service that is beneficial to many. I wouldn't mind working as a developer there in what I imagine is the overwhelming majority of positions.


I think most of their peers are mature enough to realize that the poor (historic) management decisions of a company (which are now catching up to said company) are not reflective of the personality or character of the hundreds of thousands of employees performing various roles at said company.


They’re probably brushing up their LinkedIn pages


Pi-Hole [1] is another nice way to filter domains at the DNS level network wide, if you want a wider reaching solution that supports wildcards. Great way to use an extra Pi if you have one sitting around.

---

[1] https://pi-hole.net/


If your router is running pfSense, pfblockerng is the equivalent of pi-hole. You can put the same blocklists into it, even. Though the Steven Black combined list is usually enough on its own.


Sadly, Pi-Hole is not integrated into Debian. I feel uneasy running software not from the Debian repository. I hope Pi-Hole will be packaged soon.


More bare-bones than PiHole (no white/black list), simpler too: https://gitlab.com/moviuro/moviuro.bin/blob/master/lie-to-me


Debian has had its share of fuck-ups in its package management system. There's very little difference between blindly trusting Debian and blindly trusting Pi-hole. Don't pretend you check out the contents of all the packages you use.


That is unnecessarily aggressive. Also, besides the snarkiness, it is a bad argument: what you write seems to be an appeal to hypocrisy. To see that, the following analogy might be helpful: one doesn't need to personally comprehend every decision in a democracy to have more trust in it than in a dictatorship, and one can say that without living in a democracy.


Looks like this is already covered by the "Social" add-on to StevenBlack's hosts:

https://github.com/StevenBlack/hosts/blob/master/extensions/...


Steven Black's list is better. More complete and also has hosts for other social outlets, ad networks and trackers to block.

https://github.com/StevenBlack/hosts/


Is there a way to redirect to a local HTML file for any blacklisted host file addresses? Something like "You tried to access a site that's blocked in hosts file"? I tend to add blacklists like this then few months later wonder why some site doesn't work.


Yes. As the hosts file redirects to localhost, you can run a local server there that displays a notification. As root:

    # emit a minimal HTTP response so modern browsers will actually render the message
    while true; do printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nblocked by hosts file\n' | nc -q 1 -l -p 80; done


You can run a server at 127.0.0.1:80, but it won’t work for HTTPS sites… unless you also configure your own certificates on the server.


That won't help if the site fails due to a backend request to FB falling over.


Is it really any use trying to enumerate all variants under *.facebook.com and similar?

The counts:

    307 facebook.com
    295 fbcdn.net
    250 tfbnw.net
     12 whatsapp.com
      9 instagram.com
      3 fb.com
      3 edgesuite.net
      2 metrix.net
      2 fbsbx.com
      2 fbcdn.com
      2 facebook.net
      2 edgekey.net
      2 cdninstagram.com
      2 akamaihd.net
      1 fb.me
      1 appspot.com


A bit further down in the replies reustle mentions: `It's a shame /etc/hosts doesn't support wildcards`


I find that ridiculous. Is there a reason why it's that way?


It's been around since the beginning of time itself, I guess. You can try something like dnsmasq. One-liner in the conf file: address=/.facebook.com/127.0.0.1

edit: For Ubuntu this should work (on versions from Trusty and newer):

sudo touch /etc/NetworkManager/dnsmasq.d/local

Put these lines into the above file and save:

  address=/.facebook.com/127.0.0.1
  address=/.fbcdn.net/127.0.0.1
  address=/.tfbnw.net/127.0.0.1
  address=/.whatsapp.com/127.0.0.1
  address=/.instagram.com/127.0.0.1
  address=/.fb.com/127.0.0.1
  address=/.edgesuite.net/127.0.0.1
  address=/.metrix.net/127.0.0.1
  address=/.fbsbx.com/127.0.0.1
  address=/.fbcdn.com/127.0.0.1
  address=/.facebook.net/127.0.0.1
  address=/.edgekey.net/127.0.0.1
  address=/.cdninstagram.com/127.0.0.1
  address=/.akamaihd.net/127.0.0.1
  address=/.fb.me/127.0.0.1
  address=/.appspot.com/127.0.0.1
And then: sudo systemctl restart network-manager
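To check that the wildcard is actually in effect, here is a quick query through the system resolver (a sketch; it assumes NetworkManager's dnsmasq is now handling DNS as configured above):

    getent hosts www.facebook.com
    # expected output once the block is live:
    # 127.0.0.1       www.facebook.com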


My thought was also: why so many subdomains? I wonder if it is to make the list seem more impressive and Facebook more all-encompassing.

"If you want to block facebook you need to block almost a thousand websites!"


"Then we will browse in the shade."


Where/How did you get that list?


grep -Po '\w+\.\w+$' | sort | uniq -c | sort -rhk1

Bit sloppy because it doesn't pick up the domain names with dashes. But my point was that if you want to blacklist *.facebook.com you shouldn't try to enumerate every single variant of it, that's not durable.
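A dash-tolerant variant of the same pipeline, for anyone reproducing the counts (the input filename here is illustrative; feed it whatever hosts list you have):

    grep -Po '[\w-]+\.[\w-]+$' facebook-hosts.txt | sort | uniq -c | sort -rhk1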


Let's put this in global context:

    Adblocking is a non-trivial task, but there are trivial solutions.
    
    1.) Install hosts-gen from http://git.r-36.net/hosts-gen/
    
    % git clone http://git.r-36.net/hosts-gen
    % cd hosts-gen
    % sudo make install
    
        # Make sure all your custom configuration from your current /etc/hosts is
        # preserved in a file in /etc/hosts.d. The files have to begin with a
        # number, a minus and then the name.
    
    % sudo hosts-gen
    
    2.) Install the zerohosts script.
    
    # In the above directory.
    % sudo cp examples/gethostszero /bin
    % sudo chmod 775 /bin/gethostszero
    % sudo /bin/gethostszero
    % sudo hosts-gen 
Add a cron job, and enjoy your faster and adfree-er internet. Further, you can add your custom (this FB) block to the local files in /etc/hosts.d, which then will be concatenated automatically.

[source]: https://surf.suckless.org/files/adblock-hosts/
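For the cron step, a root crontab entry along these lines should work (paths assumed from the install commands above; verify where make install actually put hosts-gen on your system):

    # refresh the blocklist nightly at 03:00, then regenerate /etc/hosts
    0 3 * * * /bin/gethostszero && hosts-gen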


This is a good thing to enable, but I think that smartphones contribute exponentially more data to Facebook services than laptops and browsers do. Smartphones give easy access to location, background services, and the microphone. Even if you deny the app these permissions, Facebook gets the data from the data providers that use Facebook ads.


On android you can use dns66[0] to use these sort of blocklists.

[0] https://f-droid.org/packages/org.jak_linux.dns66/


Looks like a good alternative if you can't get root. It uses VPN to blackhole requests. If you have root, I'd use AdAway[0] which changes the hosts file directly.

[0] https://f-droid.org/packages/org.adaway/


Another great alternative is Blokada[1]; it does the same as DNS66 and AdAway, but in my experience it felt much more reliable. It is available on F-Droid[2].

[1]https://blokada.org/index.html

[2]https://f-droid.org/en/packages/org.blokada.alarm/


"Exponentially" means nothing here. Perhaps you are looking for "orders of magnitude"


Well, he could be referring to the relative changes over time of what is contributed by a desktop computer and what is contributed by a smart phone. Antiprivacy features on phones seem to get better at a much faster rate than antiprivacy features on a computer.


[amount of data collected from phones]=[amount of data collected from desktops]^[some exponent]


If the exponent is less than 1 then the amount of data collected from phones is less than that from desktops.


When you hear 'exponential back off algorithm', do you envision one that keeps retrying faster and faster?


I'd say better than 50% chance that the delay increases. But the phrase would be unambiguous if it were called an "exponentially increasing back off algorithm".


This is unnecessarily pedantic. An exponential back off algorithm has a 100% chance of increasing the delay, that's the whole point. Nowhere other than pure mathematics would I see the phrase "exponentially" and even consider a <1 exponent.


I think "needlessly" would work better than "unnecessarily" there.


My probability was for hearing the phrase "exponential back off algorithm" without knowing anything about the algorithm. I don't work in that field and had never heard of the term before the earlier post.

Experience suggests that most of the time when people say exponentially they mean an exponent greater than 1, but I have been surprised by what people have meant before so I personally wouldn't say that probability is greater than 90%. That's what I meant in more detail.


Correction: For some reason I was thinking x^n where n < 1 rather than n^f(x). If f(x) monotonically decreases with x then what I said still applies.


For iPhones, enterprise management will allow you to set things like /etc/hosts. I do this with my phone.

I don't know what Androids do.


Can you explain this a little more? Can this be done on a personal phone? I was under the impression that the hosts file was essentially untouchable on an iPhone.


Google around - 'iphone mobile device management'. There's a service that's free for a couple devices[1]. Apple also makes a (terrible) app called Configurator. There are a bunch of others, but most of them are designed for (and priced for) corporate use.

You need to learn a little about what you're doing if you want to go this route, and there is some setup. But basically, you're taking on the role of a corporate IT department, pre-configuring and possibly locking down the phone.

I set up a profile in Configurator a few years ago and am a little afraid of touching it - that application makes iTunes look thoughtfully designed and stable.

[1] https://www.jamf.com/products/jamf-now/


I advocate for iptables instead of DNS filtering.

Process of enumerating and rejecting Facebook IPs:

* Query the RADB (http://radb.net/query/), search for AS32934

* Enumerate ip ranges by http://radb.net/query/?advanced_query=1

* Check inverse query by origin, use AS32934

* Grep the response route and route6 CIDR ranges

* Build a netfilter script with REJECT

That gives these scripts for iptables (updated once in a while):

* https://cdn.rawgit.com/smigniot/mu/ea0f32867907b855063c56ae8...

* https://cdn.rawgit.com/smigniot/mu/ea0f32867907b855063c56ae8...

* https://cdn.rawgit.com/smigniot/mu/ea0f32867907b855063c56ae8...

* https://cdn.rawgit.com/smigniot/mu/ea0f32867907b855063c56ae8...

To enable:

* iptables -I OUTPUT -j no_facebook_out

* iptables -I INPUT -j no_facebook_in

* ip6tables -I OUTPUT -j no_facebook_out

* ip6tables -I INPUT -j no_facebook_in

By design, instagram and connect-with-facebook get muted too.


To get a list of all Facebook IPs:

  whois -h whois.radb.net '!gAS32934' | tr ' ' '\n' | awk '!/[[:alpha:]]/' > facebook.list
  whois -h whois.radb.net '!6AS32934' | tr ' ' '\n' | grep '::' >> facebook.list
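One way to feed that list into netfilter is an ipset. Here is a sketch for the IPv4 half only; the IPv6 entries would need a second set created with family inet6 and matching ip6tables rules:

    # build a set of Facebook CIDR ranges and reject all traffic to them
    ipset create facebook hash:net -exist
    grep -v ':' facebook.list | while read -r net; do
        [ -n "$net" ] && ipset add facebook "$net" -exist
    done
    iptables -I OUTPUT -m set --match-set facebook dst -j REJECT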


I don't see https://messenger.com or https://m.me (which also leads to messenger)


The last commit to the file was on 4 Oct 2016, so that's to be expected.


Ok, we've added 2016 above.

Unsurprisingly, there is recent stuff on https://github.com/jmdugan/blocklists/pulls. If anyone notices it getting updated, could you tell us? hn@ycombinator.com is best.


It's actually quite annoying to block all of Facebook. A lot of innocuous sites have at least some small reliance on Facebook, and blocking all of it makes using those sites a tad difficult / poor UX.


Any examples? I have blocked Facebook for many years, and I can't think of a single time where it has mattered.

I run without JavaScript by default, so maybe I just don't notice those kinds of things after years of conditioning.


I imagine running without JS would be way more impactful on site function, so yeah, it wouldn't change your experience much.


you run without js by default? God your internet must be boring :)


Only whitelisted sites run JS in my browser. If by 'boring', you mean vastly less annoying, yes, it is terribly boring.

I'd likely never look at the bulk of commercial websites if I had to render them the way owners intended them to render.


I block all js by default and whitelist as I go. Ironically my user experience is far better because I don't deal with intrusive dynamic behavior.


For anyone who's interested, I also maintain a tracking protection list for Internet Explorer. It's based originally on the Ghostery and Disconnect lists, but I now update it independently. It's designed to be concise and speedy, yet also comprehensive. Note, however, that due to the limitations of tracking protection lists in IE, it can't block everything. You may need to supplement it with a small hosts file. Check it out here: https://github.com/amtopel/tpl



It's a shame /etc/hosts doesn't support wildcards

0.0.0.0 *.facebook.com


You might want to take a look at dnsmasq. It's a nice choice for when you want a DNS server but BIND is overkill.


You could sort of work around that by just blocking their IP ranges:

https://stackoverflow.com/a/11164738


The ranges could change over time.

If you run your own DNS resolver you can use the wildcard trick.

Something like this in an RPZ zone should do it:

    facebook.com    IN CNAME .
    *.facebook.com  IN CNAME .
    facebook.net    IN CNAME .
    *.facebook.net  IN CNAME .
    fbcdn.com       IN CNAME .
    *.fbcdn.com     IN CNAME .
    fbcdn.net       IN CNAME .
    *.fbcdn.net     IN CNAME .
    fb.com          IN CNAME .
    *.fb.com        IN CNAME .
    fb.me           IN CNAME .
    *.fb.me         IN CNAME .
    tfbnw.com       IN CNAME .
    *.tfbnw.com     IN CNAME .
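To activate such a zone in BIND, something along these lines in named.conf should work (the zone and file names here are illustrative):

    options {
        // consult the policy zone before answering clients
        response-policy { zone "rpz.block"; };
    };

    zone "rpz.block" {
        type master;
        file "/etc/bind/db.rpz.block";
    };

Note that the zone file itself also needs the usual SOA and NS records at the top, or BIND will refuse to load it.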


    *.facebook.com IN CNAME .
should be unnecessary, since the DNS zone above it, facebook.com, is already CNAME'd. Most resolvers will treat a CNAME as "any further requests go here", which in my experience usually includes NS servers.

(This is also why you don't CNAME your root domain; CNAME conflicts with any other record type.)


What software actually parses /etc/hosts, at least on Linux?


> What software actually parses /etc/hosts, at least on Linux?

glibc resolver

A good entry point for reading more about it:

$ man nsswitch.conf

If your /etc/nsswitch.conf file's "hosts" line contains the keyword "files", then it potentially uses /etc/hosts. If "files" is first (typical default config), it looks there first, before the other places listed.

This is done under the hood when programs use resolver functions like gethostbyname or getaddrinfo.
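For reference, the relevant line in a typical default /etc/nsswitch.conf looks something like:

    hosts:          files dns

With "files" first, /etc/hosts entries win over DNS, which is exactly what makes hosts-based blocking work.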


You can see this in action in the musl source code, which is arguably a much more readily understandable implementation of libc:

- Function that actually parses /etc/hosts is name_from_hosts(), implemented here: http://git.musl-libc.org/cgit/musl/tree/src/network/lookup_n...

- Which is called by __lookup_name() on the same file: http://git.musl-libc.org/cgit/musl/tree/src/network/lookup_n...

- Which is, in turn, called directly from getaddrinfo() [http://git.musl-libc.org/cgit/musl/tree/src/network/getaddri...], the actual function exposed to you as libc user.


you can do this with a DNS zonefile at your local resolver, though.


That would only slightly help, considering they own so many domains.


Looking at that list, it'd be 16 wildcard entries vs. 895 hostnames. A significant improvement.


I don't use that list; I use Steven Black's [1], which has 1004 entries and is more complete than this one. With wildcards it would be fewer entries, though more than 16. Even so, you're right that it would definitely reduce the size.

- [1] https://github.com/StevenBlack/hosts/


It'd be more future-proof too.


Someone should start a business for this:

Provide people who care about privacy with a public DNS server they can use that auto-blocks those domains (and updates its lists). I would pay for it (a few dollars a month).

Feature suggestion: allow people to add their own entries so I can purposely block reddit or hacker news to reduce distractions.

Pretty sure I would set this DNS server on both my phone and desktop.


Can somebody elaborate on why this link from 2016 is gaining steam here? Is it because Cambridge Analytica misused FB data? Maybe I am missing something; do we know if Facebook was wittingly complicit?


I'm not a big fan of Facebook but I do find it useful. That being said, this feels to me like a coordinated attack campaign. Take an issue, blow it up, attach various other nebulous bad things and push it to a public that was already primed against that very service. Seems to be working great.

I do not know who is pushing this campaign or why, but it definitely feels organized and calculated.


If Facebook is negligent or complicit, I'd certainly change how I interact with the platform. I haven't kept up with the details and would love to be pointed to some summation / analysis of the facts.


The sudden mainstream outrage toward Facebook is most likely because the Cambridge event is the perfect confluence of "Facebook", "Trump", and "bad".


The whole conversation, without my having read everything here in absolute detail, seems to be very tool-oriented. Am I the only one here overwhelmed by the sheer amount of domains involved?


It's mostly subdomains, since hosts files can't use wildcards (*.domain.com). Setting up such a large hosts file might slow down your computer a bit, though.

There are some tools that let you use wildcards in the hosts file, but I can't remember the names at the moment.


Again, a tooling concern. Not trying to downplay the possible solutions, but rather to bring attention to the magnitude.


I'm not sure what you mean; I see 13 different domains in that list, and the rest are subdomains of those same 13. You can't count that as "sheer amount of domains". Our company probably has 2000 different subdomains on 5 domains. Subdomains we can create as we want to; it's just some letters before the domain part of the address, e.g. subdomain.domain.com. That is what the wildcard is for: *.domain.com catches them all, no matter how many extra we create. A wildcard domain is needed, for example, on an SSL certificate to accept any subdomain of the domain you ordered.


It's a ridiculous amount; it looks like someone just created a pull request with 500+ more Facebook domains.


block all of Google's IP addresses: https://support.google.com/a/answer/60764?hl=en (note: your internet (the web) will stop working properly if you do block all of those IPs, which is a big problem)


Additionally:

https://github.com/jmdugan/blocklists/blob/master/corporatio...

or

https://pastebin.com/V9nzBXnx

I only blocked some of these domains. Blocking all Google domains basically makes the Internet unusable.


Can you be more specific about your internet not working with those addresses blocked? What exactly doesn't work?


A lot of sites pull in popular js libraries from google; the idea being that they'll already be in a user's cache and even if they're not, google has a better (cheaper, faster and/or lower latency) CDN than the site author.


Most of this can be worked around by installing Decentraleyes, which replaces common CDN-loaded resources with local copies.


I'd like something so I can use the web from China (without a VPN) . Right now a lot of the common JS/fonts/etc from Google break - and webpages go wonky. Is there a way to preload a cache?


That sounds great, installing... It should be part of Firefox at this point, because it is getting ridiculous.


it's an advertising company's dream to be able to load code on most computers browsing the web....


In their defense I believe they do tend to host that stuff on domains that do not set or retrieve the regular google tracking cookies. Though there are other tracking methods that they might still be using.


it depends on what sites you are using, but I beg you to try it and I can almost guarantee you that it will break your browsing experience... (even if you aren't using google search or gmail)


> your internet (the web) will stop working properly if you do block all of those IPs

Hmm, how does China handle this then?


good question... I'd be curious to know if they block all of Google's IP addresses.


does this include instagram, messenger, and whatsapp domains too? I'm not sure if these services use their own domains.

'fb' itself will eventually be, if it's not already, just a data holding company for these and other acquisitions.


Yes it does. But you could have found this out way faster by just searching on the page.


I wish it were that easy. Good start, but Facebook will still:

1. Get your data from other websites/apps that you allow

2. Get your data through your friends that use Facebook


shouldn't this keep javascript from facebook domains from loading?


Yes, but a lot of data transfer happens on the backend without the client being involved.


This is certainly possible, but it hasn't been my experience. Most Facebook stuff is XHR that is easy to block on the client side.

It's certainly possible that services are doing this on the backend, but it seems far easier to plug in Facebook's libraries on the frontend.


Why would you block all the domains but still keep your account that you would no longer be able to access? The account is the problem not the domains. You would have to block the domains on every device you use. Just kill the problem at the source and delete your entire surveillance account with facebook.


Because facebook tracks you even if you don't have an account.


Why only Facebook? All companies which store data are suspect.


Similar solution: blocking things at your local recursive DNS resolver, assuming you have a captive pool of devices (let's say in 10.240.0.0/24) in a LAN, all of which are given DHCP addresses and DHCP-assigned DNS resolvers, and you're in control of a bind9 server on the same LAN.

It's not going to prevent people with admin rights on their workstations from using another DNS resolver (or VPN, or whatever), but it's a fairly low-effort solution.

https://community.jisc.ac.uk/library/janet-services-document...


There is more coverage of this topic here: https://news.ycombinator.com/item?id=11791052


Does anyone have this for Google ads domains and/or YouTube?


I don't have a list that I can easily share, but you can curate your own off of https://github.com/StevenBlack/hosts


Didn’t know about this. Thanks for the link!


Man, that person put in some effort. That’s a lot of good lists.

Scrolling through them it’s really interesting to see the other sites companies own.

I always forget WhatsApp is Facebook.


This list presumably updates/moves around often.

Is there a service that, say, subscribes to a live list of this domain set (like Adblock consumes EasyList) and updates my hosts file automatically?

If not, that is a piece of software I would find useful and worth paying for (with the ability to audit whether the software phones home about the rest of my hosts file).
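In the meantime, a rough DIY approximation is a cron job that re-fetches a list and splices in your own entries. A sketch, assuming curl is available; the URL is the raw Steven Black file mentioned elsewhere in this thread, and /etc/hosts.local is a hypothetical file holding your machine-specific lines:

    # refresh the blocklist, keeping local entries at the top of /etc/hosts
    curl -fsSL https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts -o /tmp/hosts.block \
        && cat /etc/hosts.local /tmp/hosts.block | sudo tee /etc/hosts > /dev/null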


Your host file, hmm. Maybe something based on disconnect.me. If you're mostly worried about the browser (which seems sensible for most users), you can just enable tracking protection in Firefox: https://support.mozilla.org/en-US/kb/tracking-protection


It would be useful to know how to generate this list in the first place; then we could just adapt that to create the list on our own, instead of coming back to this GitHub repo to sync every so often.

I do not see this in the repository, presumably to get people to come back to his GitHub repo for updates, but that's just my cynicism.


I wrote a small tool that translates AdBlock Plus filter lists into hosts file format [1]. It can only translate simple domain-name rules but might be of interest to people in this thread.

[1] https://github.com/wwalexander/hostsblock


Thank you, I didn't know about the `cat -` trick to read from stdin (works the same as `echo hi | cat /dev/stdin`). Even after all this time, I still learn something new every day.


    echo hi | cat -
is also equivalent to

    echo hi | cat
You only need the - if you are concatenating other files with stdin [1]. Incidentally, any use of

    echo x | y
can be replaced (at least in Bash) with

    y <<< "x"
This is called a "here string" [2].

[1] https://www.freebsd.org/cgi/man.cgi?query=cat&manpath=FreeBS...

[2] http://tldp.org/LDP/abs/html/x17837.html


A lot of commenters mention dnsmasq. I wrote some scripts a while ago to help minimize a dnsmasq config that had been generated from a hosts file. People in this thread might find them useful.

https://petedeas.co.uk/dnsmasq/



Your list does basically nothing for Google tracking domains. Here is mine (note that this blocks reCAPTCHA, which a lot of websites now annoyingly require for login). I add entries for both IPv4 and IPv6 (0.0.0.0 and ::1 respectively).

    0.0.0.0 google.com
    0.0.0.0 www.google.com
    0.0.0.0 fonts.googleapis.com
    0.0.0.0 google-analytics.com
    0.0.0.0 apis.google.com
    0.0.0.0 tpc.googlesyndication.com
    0.0.0.0 ssl.google-analytics.com
    0.0.0.0 www.google-analytics.com
    0.0.0.0 www-google-analytics.l.google.com
    0.0.0.0 stats.g.doubleclick.net
    0.0.0.0 clients.l.google.com
    0.0.0.0 pagead.l.doubleclick.net
    0.0.0.0 pagead2.googlesyndication.com
    0.0.0.0 googleads.g.doubleclick.net
    0.0.0.0 www-googletagmanager.l.google.com
    0.0.0.0 googleadapis.l.google.com
    0.0.0.0 gstatic.com
    0.0.0.0 ssl.gstatic.com
    0.0.0.0 www.gstatic.com
    0.0.0.0 www.googletagservices.com
    0.0.0.0 www.googletagmanager.com
    0.0.0.0 securepubads.g.doubleclick.net

To log in to a Google service such as Gmail, or to use a captcha, comment out the three (*.)gstatic domains.
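Toggling those by hand gets old quickly; here is a hypothetical pair of one-liners to flip the gstatic entries (GNU sed; adjust similarly for the ::1 lines):

    # disable the gstatic blocks so login/captcha works
    sudo sed -i -E 's/^(0\.0\.0\.0 .*gstatic\.com)/#\1/' /etc/hosts
    # re-enable them afterwards
    sudo sed -i -E 's/^#(0\.0\.0\.0 .*gstatic\.com)/\1/' /etc/hosts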


nice!


Minor segue: is there any easy way to geo-block URLs, both by ccTLD and by geolocation of IPs from certain countries?

I have Pi-hole running, but it doesn't support that currently; the best it does is wildcards, and even for that it needs a full domain and won't match on just the ccTLD.


ASNs, kinda, maybe.


Nice to see Hacker News users creating pull requests to make the list more up to date. I hope they get merged.

https://github.com/jmdugan/blocklists/pulls


This is a terrible approach. Facebook can rotate many of these names whenever they feel like.


Interesting to see several domain names/servers with 'mqtt' referenced. Wondering if Facebook interacts with IoT devices routinely, or perhaps they use MQTT for Messenger message transfers etc.?


I want to share my favorite HOSTS file provider [1] which includes FB addresses.

[1]: http://someonewhocares.org/hosts/


goatse!

:)


On macOS I use a bash script to get all Facebook IP addresses:

  whois -h whois.radb.net '!gAS32934' | tr ' ' '\n' | awk '!/[[:alpha:]]/' > "/etc/pf.anchors/usr.home.sub/facebook.list"
and then use a pfctl anchor to block them all

  table <facebook> persist file "/etc/pf.anchors/usr.home.sub/facebook.list"
  block drop quick to <facebook>
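To load it, reference the anchor from /etc/pf.conf and reload the ruleset (a sketch; the anchor name here just mirrors the paths above):

    # in /etc/pf.conf:
    #   anchor "usr.home.sub"
    #   load anchor "usr.home.sub" from "/etc/pf.anchors/usr.home.sub/facebook"
    sudo pfctl -f /etc/pf.conf   # reload rules
    sudo pfctl -e                # enable pf if it isn't already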


I need something like this that I can install on friends' and family members' phones/iPads/computers whenever they ask me to fix something for them >:)


Gas Mask is a neat macOS app to manage hosts. You can subscribe to a remote hosts file, too.



A blacklist approach to this is for sure a cat and mouse game. A better approach is to incrementally whitelist the domains you trust.


In general, blacklists are the better choice for non-technical users. Do you really want an angry text message or phone call every time $FAMILY_MEMBER has some site that's rendering poorly because they haven't properly whitelisted one of the 12 legit domains it hits? And do you really trust them not to whitelist some ad & tracking domains?


Presumably, $FAMILY_MEMBER would have to get past the phone number whitelist too. So it might not be that bad.


Not sure why you were downvoted, but this is correct. Facebook (or anyone else) can easily create more domains.


I might do this. Just curious if this will break the internet for me... Will certain non-Facebook pages fail to load?


The list has fbcdn-profile-a.akamaihd.net, but it missed fbcdn-creative-a.akamaihd.net

If anyone wants it


Are there any implications to having 40,000+ lines in your /etc/hosts?


It's basically a big lookup table, trading storage for speed.

The most noticeable effect is that your web pages load faster, because a lot of requests for unnecessary data (e.g. Facebook in this example) fail immediately. Occasionally you will miss out on a webpage that depends on it.

Think uBlock Origin, but not for just your browser but your entire system.


Thanks. I have used /etc/hosts for a long time; I just realized exactly how big mine is getting.


I've seen no noticeable impact to 100k+ lines.


This is good to know. Thank you for the reply.


One of the posts I wish I could upvote more than once. Thank you.


This list must've updated a lot since 2016.


what is the difference between 0.0.0.0 and 127.0.0.1 with respect to redirection?

will redirecting to localhost eat more cpu cycles?


0.0.0.0 is not a host at all, so nothing is contacted. 127.0.0.1 is localhost, and a connection to it will still be attempted.

If you have a web server running there, its logs might get busy with the blocked requests.


It's pathetic that it takes a literal propaganda campaign to make people see the problem with facebook after 10 years, but whatever I'll take it.


Any way to do this at the router level?


If you use dnsmasq you can just save this file, set 'addn-hosts=/path/to/list' in the dnsmasq config, and restart the service.
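Concretely, something like this (paths illustrative):

    # in the dnsmasq config, e.g. /etc/dnsmasq.conf:
    #   addn-hosts=/etc/blocklists/facebook-hosts
    sudo systemctl restart dnsmasq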


Pi-hole on a raspberry pi


Beavis?

:)


This is quite a powerful message!


Why would you block WhatsApp?


Owned by FB.


It may be owned by Facebook, but it's one of the few viable secure messaging apps for people who don't use Signal. The other is Wire.


It's only viable if you're comfortable sharing your contact list and metadata with Facebook.


E2E encryption that isn't MTProto? Done.


Wow the hate/dislike is very real.


Thanks.


I can block domains on my laptop, no problem. But I have not been able to figure out any convenient way to block websites on my Android phone. My Android phone comes with a Chrome browser. Any ideas about how to block websites reliably on an unrooted/jail-not-broken Android phone?


Block at DNS level on a device (router or DNS server) and proxy all Android traffic to said device.

I use a pfsense router running OpenVPN and pfblockerNG. PfblockerNG sinkholes all DNS requests to domains from a list such as this one. Then by using OpenVPN I simultaneously encrypt my connection when roaming remotely and I can specify to use my home DNS server to sinkhole ad/tracking domains.


Thanks for the suggestion. I think this will work fine on a home network that I can control. But it is not going to work when I am traveling and using my carrier's 4G network, am I right? Is there any nifty solution to address the latter?

I am a little disappointed that I can't do something as simple as install plugins for my phone browser that can block sites.


> But this is not going to work when I am traveling and using my carrier's 4G network.

That's what VPNs are for. See openvpn, for example (or tinc, strongswan, etc)


Why not do something like *facebook.com?


Hosts syntax doesn't allow for that.

dnsmasq would, however, allow you to specify just each base domain.


I'd like to mention a problem with blocklists like this that you put into /etc/hosts. I've noticed that many sites trivially evade the blocklist by adding a redirect. I.e., if example.com is blocked, but it redirects to example.ru or example123.com or example.team, then it still works. The spammers and advertisers don't have to change all the existing links to example.com -- they simply need to add a new redirect every few weeks.


That's not how /etc/hosts works. The domain listed in /etc/hosts (example.com) will point to 0.0.0.0 (or 127.0.0.1); you'll never even make it to the server, so you won't get the redirect.


Oops, you're right. I discovered that it was my browser that was "helpfully" adding www in front of lots of domains I had blocked in /etc/hosts. For instance, if I blocked example.com, my browser would automatically try www.example.com (which might then redirect to something else entirely).

In my case, I'm using Firefox. I can stop this behavior by setting "browser.fixup.alternate.enabled" to "false" in about:config.



