
It's a read-only blog with 2 pages. What do you gain by putting this over HTTPS?



Do you really want your ISP to know which piracy sites you frequent? This is all being sent in plain text. They could also change the content, insert a redirect, or inject ads without your knowledge. TLS is needed on all websites, not just those with interaction.


I hate to break it to you, but why do you think ISPs override DNS responses with the TTL set to 0?

TLS itself is only useful if you also rely on DNS over HTTPS/TLS. Well, setting aside the issues with TLS 1.2 and earlier.


Those are problems too, but they aren't exploited nearly as often as MITMing cleartext has been historically. The solution you mention is already becoming widely supported, as are newer protocols like QUIC that discourage snooping.

There's no reason to ignore a good solution just because it's not 100% perfect.


> why do you think ISPs override the DNS responses with TTL set to 0

mine doesn't.


https won't keep your ISP from knowing you visited the site. And the rest of those? For a text-only blog, they seem kinda trivial.


> https won't keep your ISP from knowing you visited the site

If you use DoH, yes it does, unless I'm mistaken. They only know the IP address of the remote server.
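
A minimal sketch of what that looks like in practice, using Cloudflare's public DoH JSON endpoint (any DoH resolver would do). On the wire, the ISP sees only a TLS connection to the resolver, not the query inside it:

  import json
  import urllib.request

  # Resolve a name over HTTPS via Cloudflare's DoH JSON API. The query
  # travels inside TLS, so an on-path observer sees only the resolver's IP.
  req = urllib.request.Request(
      "https://cloudflare-dns.com/dns-query?name=example.com&type=A",
      headers={"Accept": "application/dns-json"},
  )
  with urllib.request.urlopen(req) as resp:
      answer = json.load(resp)

  for record in answer.get("Answer", []):
      print(record["name"], record["data"])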


> They only know the IP address of the remote server.

It's the internet. Anyone can scrape sites, measure which assets they load, and correlate that against observed traffic to infer which websites were likely visited.

Especially since nearly every web page these days is fairly unique in terms of which assets (network streams) of what byte sizes are loaded at which points in the document loading timeline.

Now include the TLS fingerprint of your web browser and, well, privacy went to shit.

HTTP needs an upgrade with scattering and rerouting on the fly, otherwise these deanonymization techniques can never be fixed.
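
To make the deanonymization point concrete, here's a toy sketch (the site names and numbers are hypothetical) of how an observer could match an observed sequence of (response size, time offset) pairs against fingerprints built by crawling candidate sites. Real traffic-analysis attacks are far more robust; this only shows the principle:

  # Toy illustration: (response_size_bytes, time_offset_s) pairs.
  observed = [(14200, 0.00), (87331, 0.12), (4096, 0.15)]

  # "Fingerprints" an observer could build by crawling candidate sites.
  fingerprints = {
      "blog-a.example": [(14200, 0.00), (87331, 0.10), (4096, 0.16)],
      "blog-b.example": [(9000, 0.00), (50000, 0.20)],
  }

  def distance(a, b):
      # Size mismatch dominates; timing jitter is tolerated but penalized.
      if len(a) != len(b):
          return float("inf")
      return sum(abs(x[0] - y[0]) + 1000 * abs(x[1] - y[1])
                 for x, y in zip(a, b))

  guess = min(fingerprints, key=lambda s: distance(observed, fingerprints[s]))
  print("best guess:", guess)  # -> blog-a.example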


> HTTP needs an upgrade with scattering and rerouting on the fly, otherwise these deanonymization techniques can never be fixed.

Isn't that, e.g., Tor's job? It doesn't belong in HTTP.


And the SNI: until ECH is widely adopted, the SNI is leaked in plaintext when connecting to a server. It has to be, because otherwise how would the server know which TLS cert to reply with?
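
You can see this for yourself: the server name arrives in the clear, before any key exchange. A rough sketch of a listener that pulls the SNI straight out of the ClientHello bytes (offsets per RFC 8446; it assumes a well-formed, unfragmented hello, which real code must not):

  import socket

  def extract_sni(data):
      pos = 5 + 4                      # TLS record header + handshake header
      pos += 2 + 32                    # legacy_version + random
      pos += 1 + data[pos]             # session_id
      pos += 2 + int.from_bytes(data[pos:pos + 2], "big")  # cipher_suites
      pos += 1 + data[pos]             # compression_methods
      ext_end = pos + 2 + int.from_bytes(data[pos:pos + 2], "big")
      pos += 2
      while pos + 4 <= ext_end:
          ext_type = int.from_bytes(data[pos:pos + 2], "big")
          ext_len = int.from_bytes(data[pos + 2:pos + 4], "big")
          pos += 4
          if ext_type == 0:            # server_name extension
              name_len = int.from_bytes(data[pos + 3:pos + 5], "big")
              return data[pos + 5:pos + 5 + name_len].decode()
          pos += ext_len
      return None

  # Accept one TLS connection and print whatever SNI the client sent,
  # without completing a handshake or touching a single key.
  srv = socket.create_server(("127.0.0.1", 8443))
  conn, _ = srv.accept()
  print("SNI:", extract_sni(conn.recv(4096)))
  conn.close()

Point openssl s_client -connect 127.0.0.1:8443 -servername example.com at it and the name prints before any encryption has been negotiated. An ISP on the path sees exactly the same bytes.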


And nobody would ever think of keeping a reverse DNS index.


With shared hosting, NAT, etc. there may be several sites sharing the same IP.


Doesn't amount to much if any one of those sites is on a watchlist.


Only the destination IP. TLS encryption is inside TCP and wraps around the HTTP protocol.


Until ESNI becomes mainstream (and browsers offer the ability to enforce it), the domain name is also sent out in plaintext.


ESNI has been dropped; a new spec alters how it works and renames it Encrypted Client Hello (ECH):

https://blog.mozilla.org/security/2021/01/07/encrypted-clien...


ECH looks quite interesting, but isn't it quite easy to do a reverse DNS lookup for most domains?


The answer is no. There was a Cloudflare article on ECH a while back that mentioned the fallibility of using reverse DNS, but I am having trouble locating it. In any event, the people working on ECH have coined the term "anonymity set". Below is a Cloudflare article that uses this term.

https://blog.cloudflare.com/handshake-encryption-endgame-an-...

The "anonymity set" refers to the number of possible domains using a single IP address. The existence of that term implies that some IP addresses must have a number of domains associated with them, greater than 1. With these IP addresses, one cannot determine the domain name, the one that the www user sent, from a PTR query alone. Even prior to the introduction of SNI to TLS, when the only way to offer HTTPS was by using a dedicated IP address, discovering the contents of the encrypted Host header via reverse DNS was neither easy nor reliable.

If there are still people reading HN who believe that reverse DNS is reliable and makes plaintext SNI and ECH moot, and who are going to comment as such in the future, I would be happy to post the results of an experiment: take the DNS data for all the domains currently submitted to HN, i.e., a list of IP addresses found in the A records for these names, and do a PTR query on each one. We can then look at whether "most domains" are identifiable through PTR records.
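
For anyone who wants to try it themselves, the core of the experiment is only a few lines; the domain list here is just a stand-in for the real HN data set:

  import socket

  # Stand-in list; the real experiment would use every domain submitted to HN.
  domains = ["news.ycombinator.com", "example.com", "blog.cloudflare.com"]

  for name in domains:
      try:
          ip = socket.gethostbyname(name)       # follow the A record
      except socket.gaierror:
          continue
      try:
          ptr = socket.gethostbyaddr(ip)[0]     # PTR query on that address
      except socket.herror:
          ptr = "(no PTR record)"
      hit = "match" if ptr.rstrip(".") == name else "differs"
      print(f"{name:28} {ip:16} PTR={ptr} [{hit}]")

On shared hosting and CDNs you will mostly see "(no PTR record)" or a generic provider name, which is the point.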

Also remember the question is not whether ECH protects 100% from someone discovering what domain name the user sent. It does not. The question is whether ECH makes it more difficult to discover than simply sniffing plaintext SNI on the wire, which, of course, is even easier and more reliable than reverse DNS.


I will bet $10 that with reverse DNS + DPI to suss out page size and caching behaviour, you can identify anyone accessing this website and downloading the 7TB database.


No one has to "access this website" because they can read its contents in Internet Archive, Common Crawl, Google Cache, etc. Page size and caching behaviour will not work if the person is using HTTP/1.1 pipelining to request multiple pages from a variety of websites from Internet Archive, over a single TCP connection. (Using the CDX API, not the HTML form on the Wayback Machine page.)
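
A rough illustration of that pipelining idea, using the Wayback CDX endpoint (and assuming a server that tolerates pipelined requests, which not all do): several queries go out back to back before any response is read, so per-page size/timing analysis on the single connection gets much harder.

  import socket
  import ssl

  # Two CDX API queries pipelined over one TLS connection (HTTP/1.1).
  host = "web.archive.org"
  paths = [
      "/cdx/search/cdx?url=example.com&limit=1",
      "/cdx/search/cdx?url=iana.org&limit=1",
  ]

  ctx = ssl.create_default_context()
  with socket.create_connection((host, 443)) as raw:
      with ctx.wrap_socket(raw, server_hostname=host) as s:
          # Send all requests before reading anything; close on the last.
          for i, p in enumerate(paths):
              conn = "close" if i == len(paths) - 1 else "keep-alive"
              s.sendall((f"GET {p} HTTP/1.1\r\n"
                         f"Host: {host}\r\n"
                         f"Connection: {conn}\r\n\r\n").encode())
          buf = b""
          while True:
              chunk = s.recv(4096)
              if not chunk:
                  break
              buf += chunk

  print(buf.decode(errors="replace")[:500])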

The 7TB is via torrent, not via HTTPS. No rDNS needed.


Not disagreeing but presenting a hypothetical:

If the user requests the page from Internet Archive, Common Crawl or even Google Cache, how does the ISP know what the user requested? (NB: Neither IA nor Google Cache requires sending SNI,^1 so the ISP may only see IP addresses.)

With IA, the IP address alone does not reveal which IA site or page the user is requesting. There is more to IA than only Wayback Machine.

With Common Crawl, the user can send the Cloudfront domain name instead of a commoncrawl.org domain. Are all ISPs going to know that this is Common Crawl? Even if they expend the effort to learn, what benefit is achieved?

With Google Cache, the IP address alone does not reveal which Google site the user is accessing. Needless to say, there are many, many domains using these IP addresses.

There is nothing that requires any web user to retrieve web pages from a given host. The page may be mirrored at a number of hosts. Some of those hosts might offer HTTPS, support TLS 1.3, and not require plaintext SNI (i.e., offer Encrypted ClientHello).

Even assuming an ISP can determine what domain name a customer is sending in a Host header or ClientHello packet, it would still be necessary to subpoena the archive/CDN/cache to figure out precisely what pages were being requested.

1. The same party is controlling all the server certificates. IA controls the certificates for all IA domains, Amazon (issues and) controls all the certificates for Cloudfront customers, and Google controls all the certificates for Google domains. Perhaps there are web users commenting on HN who believe that ingress/egress traffic for a site saved/hosted/cached at an archive/CDN/cache is somehow private as against the company running the archive/CDN/cache in a meaningful way. I am not one of them.

As for the question of an ISP modifying the contents of web pages, this is an issue that could be addressed contractually in a subscriber agreement. It stands to reason that if this were a serious issue, and not merely a hypothetical one raised by nerds debating the merits of TLS, then it would be addressed in such agreements.

As for the "injection of advertising" issue as a argument in favour of the way TLS^2 is being administered on the web, IMO this is a bit silly since (a) it is trivial to filter such advertising (e.g., Javascript in the examples I saw) out out of the page and/or block it from running/connecting/loading and (b) the amount of "tech" company-mediated advertising that web users endure in spite of using TLS is enormous. More likely than being seen as a threat to web users, the injection of advertising by ISPs was seen as a threat to the advertising revenue of "tech" companies. The later are responsible for facilitating the injection of advertising (by their customers, not their competitors, i.e., ISPs), not preventing it.

2. By "TLS administration" I do not mean encryption as a concept nor certificates as a concept. I mean TLS administration measures designed to support "tech" companies first and web users second, if at all. A system where the questions of "threat model" and "trust" are both decided by "tech" companies not users.


> For a text-only blog

If somebody MITMs it, they can serve you anything they want.


> they can serve you anything they want.

Great. More books!

No really, I don't understand this argument. A static site served by plain http is perfectly appropriate. It's like a poster hanging on the wall for all to see. Of course people can paint over it, but it doesn't really matter.


HTTP connections can be used as a weapon against others. One example is China’s Great Cannon.

https://citizenlab.ca/2015/04/chinas-great-cannon/


That's quite dated by now. If you are in a position to inject traffic, you are likely also able to simply use that uplink to send traffic of your own. I'd be surprised if this is still in use, especially outside of China (or a poor, not-so-tech-savvy country like North Korea), and wasn't just a quick hack at the time.


It's only dated if the vulnerability has been patched. In this case, the only way to patch it is to serve over HTTPS.

If it was a quick hack, then that makes it even worse. Difficulty is a mitigating factor.


They could serve you JavaScript that exploits your browser. At the very least, they could replace that Bitcoin donation address with their own. That's a tempting target if nothing else.


And "they" isn't just your ISP. It's also that free wifi hotspot you connected to, or the hotel service, or your company's network. Even if you trust your ISP (and you probably shouldn't), there are other bad actors to be aware of.


I can confirm this. A friend uses a government-backed ISP, and he frequently receives popups with local government announcements.


If you think you're high-value enough to have someone target you specifically by getting on your LAN, or gaining access to (or coercing) an upstream ISP to serve you a browser 0-day reachable only by lying in wait for you to visit an HTTP site because there is no other way in, that's not going to be for a free books website.


This is a site asking you to commit piracy. I can totally see some agency intercepting it and replacing the onion addresses with theirs so they can track everyone down.


If enough people find it okay to take such extreme measures for mundane, nearly victimless crimes, I hate to think what the future will be like. In the past, hoarding exploits was considered something for the military, for national security, and even there it was a hot debate and controversial and many parties/countries wanted restrictions like time limits until it's reported to the vendor. Entering homes was a thing of warrants because we wanted to limit government overreach. Now it's okay to employ both of these for reading books without permission? If there's one of you then there's probably more. The future is bright.


A man-in-the-middle attack against an HTTP website to de-anonymize people would not require a 0-day exploit.

I agree that it would be an extreme measure.


Consider that the downloads page for this site tells you to use their Tor hidden service. If you open http://pilimi.org/ in Tor, you'll go through an exit node that could be MITMing everybody opportunistically, not targeting you specifically.


They still know which sites you visit even with https.


They don't know the page. In the case of this site it probably doesn't matter, but which page you're looking at is always going to be more interesting and informative than which site you looked at.

When the prosecutor is looking through your internet records and they see 50 Wikipedia hits in some relevant time period, they're going to be upset that https exists.


Only the destination IP. TLS encryption is inside TCP and wraps around the HTTP protocol.


You don’t need to copy and paste your reply everywhere it’s relevant on HN. Even us flea brains can carry your remarks in our head and apply them to similar comments.


And, unless you set up an appropriate DNS server instead of the default from your ISP, they also know that you looked up the site's hostname(s).


The same thing you always get. Assurance that your free Wi-Fi hotspot isn’t tampering with the page.


This, and also a MITM (like the ISP) would need to make their own request to this site to know what I read. And technically, they cannot really be sure it's what I've read, since nothing says this site is static.

I'm not that offended, and the torrents are only available via Tor anyway, but I do actually appreciate the sentiment. There's no reason not to be using TLS.


Trust?


Malicious sites can still use Let's Encrypt.



