
I was able to download and build the Gtk3 version of this in under 5 minutes. For compactness alone, it gets my seal of approval. I'm posting this comment from NetSurf; HN works fine.


I'll try this tomorrow. The old Opera Presto also built in under five minutes and was in general very lean, but it crashes on most modern JavaScript-heavy sites. I still sometimes wonder where it would be today if they hadn't abandoned it, or had at least properly open-sourced it.


Does it block ads?

Asking because: DNS-level adblocking is stupid.


DNS-level adblocking is fantastic, especially since it can be done network-wide so easily and used as an extra layer of ad filtering.
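
For example, with dnsmasq running on the router, a few lines sink ad domains for every device behind it (a minimal sketch; the domains and the blocklist path are just illustrations):

   # /etc/dnsmasq.conf -- answer these zones locally instead of forwarding
   address=/doubleclick.net/0.0.0.0
   address=/googlesyndication.com/0.0.0.0
   # or load a maintained hosts-format blocklist:
   addn-hosts=/etc/dnsmasq.d/ad-hosts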


> DNS-level adblocking is fantastic, especially since it can be done network-wide so easily and used as an extra layer of ad filtering.

It's great, but DNS over HTTPS will end the party soon enough. (If I were a smart TV manufacturer, I would be prioritizing adding DNS over HTTPS to the device firmware to subvert network blocks.)


I do not understand. I run my own DNS, but I also sometimes gather DNS data in bulk over DNS over HTTPS or DNS over TLS. I retrieve the data outside of the browser and put it into my own zone files. Are you saying that applications and devices will make it practically impossible for the user to change DNS settings to point to localhost or RFC 1918-bound DNS servers? How would they be able to do that, assuming the user controls the first upstream router? Even if they could, it seems a bit too heavy-handed.

It would be much easier, I think, for application developers to just make an ad-blocking extension, e.g., uMatrix, stop working. For example, they could say this is because the application now has its own built-in ad blocker. Never mind that the developers are paid from the sale of web advertising services.


DNS-adblocking in a router can be complemented by the router's firewall blocking outbound to all DoH provider IPs.

(It'll need to be a constantly updating blocklist, but the DNS-adblock lists already work that way.)
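
A minimal sketch of that complement with nftables, assuming a hypothetical doh-ips.txt list of resolver addresses kept fresh from cron:

   # create a set of DoH resolver IPs and drop forwarded HTTPS to them
   nft add table inet filter
   nft add chain inet filter forward '{ type filter hook forward priority 0; policy accept; }'
   nft add set inet filter doh_ips '{ type ipv4_addr; flags interval; }'
   nft add rule inet filter forward ip daddr @doh_ips tcp dport 443 drop

   # refresh the set from the (hypothetical) list, e.g. from cron
   nft flush set inet filter doh_ips
   while read ip; do nft add element inet filter doh_ips "{ $ip }"; done < doh-ips.txt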


I can't vouch for these since I haven't tried them yet, but it can apparently also be complemented by configuring your local DNS server to return NXDOMAIN for use-application-dns.net [1] and using a DoH proxy to protect upstream requests from snooping [2].

[1] https://support.mozilla.org/en-US/kb/canary-domain-use-appli...

[2] https://github.com/aarond10/https_dns_proxy
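
For the canary-domain part, a one-line sketch for unbound (Firefox treats an NXDOMAIN for this domain as a signal to keep DoH off by default):

   # unbound.conf: NXDOMAIN for Mozilla's DoH canary domain
   server:
       local-zone: "use-application-dns.net" always_nxdomain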


That gives the consumer one more reason not to hook it up to the network, ever.


I have to agree. I think DNS adblocking off a Raspberry Pi is the best. Unfortunately, a power loss corrupted my Pi's SD card.


Can you elaborate?


Cosmetic filtering and blocking specific files can't be done via DNS filtering.


Could you give a working example, i.e., a website where you are doing cosmetic filtering or blocking specific files?

I also use a local proxy in addition to DNS which allows me to serve alternative resources or block/redirect certain URLs based on prefix/suffix/regex.


Not OP, but where I do cosmetic filtering is on Stack Overflow. They display "hot network questions" on every page, with extremely interesting stuff from non-work Stack Overflow clones. "How many cats did Cmdr. Data have in Star Trek", that sort of stuff.

It has made me lose my focus on work repeatedly, and Stack Overflow really is a site related to work for me. So I block that column.
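
In uBlock Origin that block is a single static cosmetic rule (a sketch; the element id reflects Stack Overflow's markup and may change):

   ! hide the "Hot Network Questions" sidebar on stackoverflow.com
   stackoverflow.com###hot-network-questions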


Thanks for that. Now I can definitely see the usefulness of this for interactive website use.

I am more of a non-interactive user and do not use a graphical, javascript-enabled browser much.

Here is a snippet I used to remove the annoying "hot network questions" from the page:

   sed '/./{/div id=\"hot-network-questions/,/<\/ul>/d;}' page.html
Out of curiosity I wanted to see if I could access all these networking questions non-interactively. That is, download all the questions, then download all the answers.

Some years ago, like 10 years or more, I was making some incremental page requests on SO, e.g., something like /q/1, /q/2, ..., and I got blocked by their firewall. What amazed me at the time was that the block lasted many months; it may even have been a year. This is one of the harshest responses to crawling I have ever encountered, and one of the very few times I have ever been blocked by any site -- the only time I was ever blocked for more than a few hours.

Things have definitely changed since then. To get all the networking questions, I pipelined 277 HTTP requests in a single TCP connection. No problems.

Here is how I got the number of pages of networking questions:

   # scrape the last page number from the pagination's rel="next" link
   y=https://networkengineering.stackexchange.com/questions
   x=$(curl $y|sed -n 's/.*page=//;s/\".*//;N;/rel=\"next\"/P;N;')
   echo no. of pages: $x
To generate the URLs:

   # emit one questions-page URL per page, 1..$x
   n=1;while true;do test $n -le $x||exit;
   echo $y?page=$n;n=$((n+1));done
I have simple C programs I wrote for HTTP/1.1 pipelining that generate HTTP, filter URLs from HTML and process chunked encoding in the responses.
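
In lieu of those C programs, here is a rough sketch of the pipelining idea with stock tools, assuming a urls.txt written by the loop above (the filenames are mine, not the author's):

   # send every GET at once over a single TLS connection (HTTP/1.1
   # pipelining), using openssl s_client instead of custom C programs
   while read url; do
     printf 'GET %s HTTP/1.1\r\nHost: networkengineering.stackexchange.com\r\nAccept-Encoding: gzip\r\n\r\n' \
       "${url#https://networkengineering.stackexchange.com}"
   done < urls.txt |
   openssl s_client -quiet -connect networkengineering.stackexchange.com:443 \
     -servername networkengineering.stackexchange.com > answers.raw
   # (a final request carrying Connection: close would make the server hang
   # up when done; otherwise the connection idles until it times out)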

Fastly is very pipelining-friendly. No max-requests=100; there seem to be no limits at all.

There were 13,834 networking questions in total.

Wondering just how many requests Fastly would allow in one shot, I tried pipelining all 13,834 in a single TCP connection. It just kept going, no problems. Eventually I lost the connection but I think the issue was on my end, not theirs. At that point I had received 6,837 first pages of answers. 211MB of gzipped HTML.

So, it is quite easy these days to get SO content non-interactively.

It was also easy to split the incoming HTML into separate files, e.g., into a directory that I could then browse with a web browser.

   # count the responses, then split the stream into one file per response
   x=$(zgrep -c ^HTTP answers.gz)
   mkdir newdir; cd newdir;
   zcat ../answers.gz|csplit -k - '/^HTTP/' '{'$x'}'


Heh, you sound like Richard Stallman; it's rumored he also doesn't use a browser.

As an aside, I've seen mirrors of Stack Overflow pop up when I use DuckDuckGo to search. Google seems to filter these out.


From what I have read of his philosophies, I can imagine the reasons Stallman might not use a browser -- and I think that is an oft-repeated, old, unsubstantiated rumour. I would bet he uses one. The reasons I prefer the command line to an over-sized, slow, graphical program are different. I was introduced to computers in the VAX era, not the Javascript era.

Curious: if those SO mirrors did not show cruft like "Hot Network Questions" on every page, would you use them instead of relying on an ad blocker?


At the moment, I completely ignore the SO mirrors as my instinct is to go to the original source. I'll start paying attention; they might actually be the better website.


uBlock Origin with all but the regional-language filter lists shows, in its stats:

133,215 network filters + 155,733 cosmetic filters

Network filters are URL-based, not just domain-based. The lists are easy to view from the uBlock settings page if you want an endless supply of examples. They come in pretty much every style of list: ad, privacy, annoyances, cookie banners, tracking.


AFAIK ads on Google search results can't be blocked by DNS alone.


Can you provide a working example?

I certainly block plenty of Google-controlled domains. I normally do not use Google search, and even when I do I never seem to trigger any ads. Maybe I am just not searching for things people want to sell. In the rare event I do trigger an ad, because I am not using a "modern" browser to do searches, these ads are not distracting and I can easily edit them out of the text stream if I want to.


>Can you provide a working example?

Literally any search that uses the "expensive" keywords[1]. "car insurance quotes" would do nicely, for instance.

[1] https://www.wordstream.com/articles/most-expensive-keywords
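
And because those ads are served from www.google.com itself, a DNS block would take out search entirely; a URL-based rule (illustrative uBlock/ABP syntax) can scope the block to the ad click-through path:

   ! block Google's ad click-through path without blocking search
   ||google.com/aclk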


It looks like even with all the more recent nonsense Google inserts into the results, it is still easy to just extract the result URLs and leave behind the rest of the crud. If you want to retain the description text, it is a little more work.

Interestingly, the /aclk? ad URLs do not use HTTPS.

Seeing that these ad URLs are still unobtrusive, I wonder why anyone would want to remove them from the search results page. For cosmetic reasons?

I prefer searching from the command line. To remove the /aclk? ad URLs I used sed and tr.

   #!/bin/sh
   # usage: $0 query > 1.htm
   # x: an EOT control char used as a delimiter (never occurs in the HTML)
   x=$(echo y|tr y '\004');
   # z: the search URL, spaces percent-encoded
   z=$(echo https://www.google.com/search?q=$@\&num=100|sed 's/ /%20/g');
   # fetch (empty user-agent header), mark each result anchor with the
   # delimiter, split on it, keep /url?q= links, drop /aclk? ad URLs
   curl --resolve www.google.com:443:172.217.17.100 -Huser-agent: "$z"|
   sed "s/<a href=\"\/url?q=/"$x"&/g"|
   tr '\004' '\012'|
   sed -n '/url?q=/{s/.url?q=//;s/&amp;sa=.*\"><h3/\"><h3/;s/&amp;sa=.*\"><span/\"><span/;s/$/<br>/;/aclk?/d;p;}'



