This is known as a "subdomain takeover" and is a common problem; it's probably one of the most frequently reported bug types on HackerOne.
I wonder -- has anyone written code to spin up EC2 instances and check for subdomains pointing to the IP? Not sure how you could do that efficiently (does rDNS still work after the IP has been recycled?), but a starting point might be gathering as many dangling subdomains as possible, filtering for the ones pointing at cloud providers, and starting instances until you get a match.
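A rough sketch of that allocate/check/release loop with boto3 (hypothetical: the candidates mapping is something you'd build beforehand from passive DNS or Certificate Transparency scraping, mapping cloud IPs to the hostnames that still point at them):

    # Sketch: cycle Elastic IPs, keeping any that stale DNS records still
    # point at. "candidates" (IP -> hostnames) is a hypothetical input,
    # built beforehand from passive DNS / CT-log scraping.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def hunt(candidates, attempts=100):
        hits = []
        for _ in range(attempts):
            addr = ec2.allocate_address(Domain="vpc")
            ip, alloc_id = addr["PublicIp"], addr["AllocationId"]
            if ip in candidates:
                hits.append((ip, candidates[ip]))  # keep this address
            else:
                ec2.release_address(AllocationId=alloc_id)  # back to the pool
        return hits

One catch: a freshly released address can be handed right back to you, so in practice you'd probably hold a batch of allocations before releasing any.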
That's surprisingly good. And it turns out that if I search Google for my server's IP, I'll find even more things I host, albeit through listings on a bunch of random semi-garbage sites.
Interesting. I used this to discover that someone is pointing their (pretty decent) domain name towards my server. I won't, but I could theoretically make it display anything of my choosing.
I suggested something similar while working at an AWS competitor: automated RBL (realtime blackhole list) searching. A huge number of our public IPs had been used by spammers or phishers on the free tier. They would use a public IP until it was blocked, then release it and get another from the pool.
As a cloud host, it's embarrassing to have to explain to your customers that their shiny new IP is on an RBL because it was last used to steal passwords.
Did this company have an automated process for dealing with abuse tickets? Usually you'll get at least one abuse ticket before an IP is added to an RBL, right? It shouldn't come as a surprise when your IP is added to the list after you've had X number of abuse complaints about it.
Seems like a better (or at least, complementary) solution would be more proactive monitoring of the abuse@ email and prompt terminations of clients who generate too many complaints.
We didn't have an automated process; we manually investigated and shut down accounts. On the supply side, we had poor fraud detection, also manual and handled almost entirely by one person.
There was lots of room to automate things like this, but instead we had a robust internal PowerShell library and numerous Slack bots. They embraced the "do things that don't scale" mantra a little too literally.
Definitely true in my experience, which is funny because it renders the whole point of the RBL entirely moot. It’s certainly not a sustainable solution.
It's comparable to using an RBL that excludes dynamic IP clients. The difference is one of culpability, IMO: the dynamic IP clients are likely to be innocents with compromised boxes, while these cloud instance spammers are actively hostile (insofar as spammers are hostile).
I received an email from campus IT claiming my phone was compromised.
On further digging, it turned out I would load up a game every once in a while on the campus wireless, and that game used a popular/legit Chinese CDN to host something on its news page which was flagged.
It was easier to just use my cellular data plan than to explain that my phone wasn't compromised.
Is IPv6 big enough to not recycle IPs in the modern high-churn deployment world? (It's probably big enough to give every human (or payment-card-holding customer) a few million blocks of IPs that they can personally control, and to recycle IPs within each accountable block.)
For IPv6, clearly yes, at least so far. It's pretty typical to get at least a /64 (i.e., one LAN's worth) with a VPS. That's about 18 x 10^18 addresses. ISP customers typically get a /56 (256 LANs) or a /48 (65536 LANs).
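For reference, the arithmetic behind those numbers:

    >>> 2 ** 64            # addresses in a single /64
    18446744073709551616
    >>> 2 ** (64 - 56)     # number of /64 LANs in a /56
    256
    >>> 2 ** (64 - 48)     # number of /64 LANs in a /48
    65536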
Sites like DomainTools have vast numbers of DNS records indexed and allow reverse lookups by IP address (a search of forward records by IP, not actual reverse DNS), which would make your idea pretty doable.
You could probably do that quite efficiently with passive DNS data. There are a bunch of providers (e.g. Farsight, RiskIQ, many others) that collect and aggregate DNS request data and make it searchable over time.
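For instance, Farsight's DNSDB has an rdata-by-IP lookup over plain HTTPS. A sketch from memory (the endpoint and field names are assumptions to verify against their docs, and you need your own API key):

    # Ask a passive-DNS provider which names have resolved to a given IP.
    import json
    import requests

    def names_for_ip(ip, api_key):
        resp = requests.get(
            "https://api.dnsdb.info/lookup/rdata/ip/" + ip,
            headers={"X-API-Key": api_key, "Accept": "application/json"},
            timeout=30,
        )
        resp.raise_for_status()
        # Response is newline-delimited JSON, one record per line.
        return [json.loads(line)["rrname"] for line in resp.text.splitlines() if line]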
RDNS is probably not going to be helpful. I'm not aware of any cloud provider setting a PTR record by default, and I think most won't allow you to do that at all.
GCP as well as DigitalOcean definitely allow configuring PTR records. AWS does too, I think. At least with GCP, you have to demonstrate ownership of the domain.
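Checking what PTR record (if any) an IP currently publishes is easy with dnspython:

    import dns.resolver
    import dns.reversename

    def ptr_for(ip):
        rev = dns.reversename.from_address(ip)  # e.g. 4.4.8.8.in-addr.arpa.
        try:
            return [str(r) for r in dns.resolver.resolve(rev, "PTR")]
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return []  # nothing published for this IP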
> But before deleting it, I copied the IP address so I could open a support ticket on DigitalOcean. I figured they would like to know that someone is illegally distributing content on one of their servers. Now that they know the IP address, they can shut it down.
It’s also possible you exposed a web service that wasn’t meant to be public.
This. The article was interesting but the copyright knighthood put me off.
What if the person was simply serving those files for himself over the internet (I've done it countless times) and Google caught it because the author was careless with handling DNS entries? Now DO has an IP and an accusation, more power is given to the DMCA-strike-first-ask-later status quo, and for what? It's not child pornography we're talking about; it's books, for Christ's sake. There's no harm and it does not affect your life, so why go through the effort of bringing trouble to someone else because of your own lack of care with sensitive issues such as DNS entries?
> What if the person was simply serving those files for himself over the internet
There are a couple of reasons to believe that this is not the case.
First, there were thousands of them. Someone having thousands of books is not unreasonable, of course, but the breadth and depth of this collection are such that it is extremely unlikely to be someone's personal library.
Second, the PDFs aren't the actual books. They are just short blurbs describing each book, containing download deep links into bookfreenow.com. A couple of examples: [1] [2]. Clicking through to create an account so you can "start downloading" redirects through some ad companies (and possibly some shady affiliate marketing companies), eventually reaching some download site (I think) that tells you no free slots are available and asks you to make an account.
(The bookfreenow.com pages for each book all seem to be the same template with just the book info substituted. Even the comments on each page are from the same people, at the same times, and say the exact same things, except they have the correct book title on each page. They aren't even trying to make it look like the comments are legit.)
> First, there were thousands of them. Someone having thousands of books is not unreasonable, of course, but the breadth and depth of this collection are such that it is extremely unlikely to be someone's personal library.
Sounds like my personal library actually.
> Second, the PDFs aren't the actual books. They are just short blurbs describing each book, containing download deep links into bookfreenow.com. A couple of examples: [1] [2]. Clicking through to create an account so you can "start downloading" redirects through some ad companies (and possibly some shady affiliate marketing companies), eventually reaching some download site (I think) that tells you no free slots are available and asks you to make an account.
Well that’s a lot harder to explain in charitable terms! So is this even piracy, or just some kind of scam based on the promise of piracy?
It's a scam. I was searching for some ebook for free and there are many links like that. It's funny that scammers might be more successful saving paid books than copyright warriors :)
> It's funny that scammers might be more successful saving paid books than copyright warriors
You'll never actually get the book, because they don't have it. It's a scam to try to trap people who are trying to find free downloads of ebooks rather than paying for them.
(It's also kind of a funny definition of "save" you have there. With all due respect, if you want to save paid books, you should -- crazy as this may sound -- pay for them.)
"Saving" is the wrong word, I guess. I mean that someone who's trying to find a pirated book will just stop trying after a few unsuccessful attempts. For example, copyright owners are forcing Google to hide search results with pirated content, but maybe polluting search results with fake content is a better strategy.
Probably the second one. At some point in the signup process it will offer a free trial and require a credit card. Not sure what happens next because I've never proceeded, but I'm pretty sure they don't actually have the books they claim to have.
The only thing "shady" going on here is this discussion's unquestioned acceptance of propaganda language -- "piracy"/"pirate" -- as a descriptive term for alleged copyright infringement; copyright infringement which nobody has proven occurred in the first place. See https://www.gnu.org/philosophy/words-to-avoid.html#Piracy for more on how "piracy" is propaganda.
> What if the person was simply serving those files for himself over the internet (I've done it countless times) and Google caught it because the author was careless with handling DNS entries?
Author here. I reached out to DO because, as a fellow content creator, I felt morally obligated to report this.
I wouldn't like someone pirating my content, and the people who created those 390,000+ PDFs put a lot of their time into making their content.
> There's no harm and it does not affect your life, so why go through the effort of bringing trouble to someone else because of your own lack of care with sensitive issues such as DNS entries?
It does affect my life. I've noticed that a couple of copyright infringement notices have now been submitted against my domain (because of this PDF incident).
When you run an online business and your website is your entire brand, something like that is a big deal.
Also if you Googled for my name before I removed the A record, that SSL subdomain was coming up which was competing with my actual site's content. Not good!
> Also if you Googled for my name before I removed the A record, that SSL subdomain was coming up which was competing with my actual site's content. Not good!
Maybe do a 301 redirect from ssl.nickjanetakis.com to your homepage; it can help with SEO.
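Something like this nginx server block would do it (a sketch; it assumes the same box still terminates TLS for the subdomain):

    server {
        listen 443 ssl;
        server_name ssl.nickjanetakis.com;
        # ...existing ssl_certificate directives...
        return 301 https://nickjanetakis.com$request_uri;
    }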
I would agree with you in principle, but this is just another one of those spam sites that flood the search results with useless crap when you're looking for bookz; there's no copyright infringement here, just spam. They're easily distinguished from "real" PDFs of books because the text clearly doesn't make any sense.
Thus, in a "the enemy of my enemy is my friend" sort of way, I'm thankful for the OP for removing another fake ebook site from the Internet.
But Google also always respects robots.txt. If you don't want your content to appear, just serve a disallow-all to every domain. You can easily set that up in nginx by returning static content; you don't even need to create a file in the right directories.
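A minimal version of that trick, assuming nginx (drop it into whatever catch-all server block handles the domains you don't want indexed):

    location = /robots.txt {
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /\n";
    }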
Just got some new IPs and I've been on the other end of this.
Some staging domain of a website still points to one of the IPs I got, and there's a health checker that keeps trying to ping /health on the domain.
Nowhere near the scale of this though, just some background noise I'll ignore
Same here. Soon after launching a cloud-hosted virtual machine I noticed that it was getting a lot of unexpected HTTP traffic on port 80. They were GET requests with parameters that suggested to me that they came from some kind of JavaScript browser tracker.
I rebooted the VM to get a new IP address and the traffic stopped. It's somebody else's problem now.
The problem for you is that you can't do much about it. The domain owner can easily change the record; you can't. The only real option is to blackhole all requests coming in for domains other than the ones you want to host. Or is there any way to get the DNS records checked?
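With nginx, for example, the usual blackhole is a catch-all default server that drops any request whose Host you don't explicitly serve:

    server {
        listen 80 default_server;
        server_name _;
        return 444;  # nginx-specific: close the connection without a response
    }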
> Nowhere near the scale of this though, just some background noise I'll ignore
Same situation here. However, being on the receiving end of a "formerly" Russian camgirl site makes this a little more than just noise. Any good ideas for what one could do with that?
I can see myself doing the same thing, particularly if I didn't have much free time when it happened. I wouldn't expect Google Alerts to warn me of a security issue!
The problem is, when the Google Alert hit my inbox it didn't show the ssl.nickjanetakis.com subdomain in the alert snippet. It just showed "Nick Janetakis".
Still, I should have clicked through to see what was up, but then again, the links looked very suspicious. I don't make a habit of clicking a bunch of unknown links sent via email, especially not when running Windows.
Second this. I've found Google to be the best indicator of issues with a web page. They have pretty mature systems and crawl actively; anything flagged up there is likely to be a real issue. Even if it had been spam, you'd want to know who is creating spam pages impersonating your domain.
One of my old staging subdomains had an old DigitalOcean address left in it for a bit while we migrated some servers, and Google indexed some random ebook pirate site there too; here[1] is a snap of the logs for anyone who is curious. Once I updated the DNS, Googlebot started to blow us up.
I never would have even noticed, had it not been for Googlebot indexing the crap out of us and causing tens of thousands of sessions to be created in a short time, which threw our Munin graphs off the charts.
The site we were staging ran fine, and Redis handled it without breaking a sweat, but we're not a public-facing service, so I just straight up blocked Googlebot with an nginx rule.
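The rule can be as simple as a user-agent match, something like this (a sketch; note the UA string is trivially spoofed, so for well-behaved crawlers robots.txt is the politer fix):

    if ($http_user_agent ~* "Googlebot") {
        return 403;
    }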
> I know I made a stupid mistake by not removing the A record but this could happen to anyone. I would like to see more services only allow for DNS based authentication by adding TXT records.
There are plenty of reasons why one would prefer HTML verification over DNS TXT verification. It's usually faster and more predictable. Plus, DNS is far from being completely secure.
It's also pretty nice if you're a provider hosting a website for the client (e.g. GitHub Pages, Shopify, etc.). Getting them to point an A record to us is hard enough, but at least it only happens once. Then you can use HTML verification for setting up LE certificates, Analytics, etc.
Just out of curiosity, how do you know this just wasn't an intentional side effect of someone hosting a website on a DO box? Namely, was the box just responding to anything that would connect to it?
Google has probably already crawled that domain previously, and when it asked for that IP address, it found some other person's website.
The screenshot shows the blog author's name attached to all of the search results as a proper, spaced name. (As opposed to a domain substring) It looks like intended impersonation.
I'm not very security-minded. I have old domains and subdomains that I used to use that have long since lapsed, which lived on server IPs that have also long since passed out of my ownership.
I just double-checked all of my old stuff, and there's not a trace left out there. Apparently I cleaned up all my old DNS entries as things moved on, even though none of those domains are my "brand" (as the author states his is). As a non-security-minded person, I find it hard to believe a security-minded person, who's good at his trade, forgets to do this.
What does this have to do with anything even remotely related to this article, other than it being a webserver? Symlink takeover is not a new vulnerability, and if someone has a user account on your server, you're already owned anyway. Escalating to root is trivial almost always.
The author spends the beginning of the article talking about how he takes security very seriously, that his webserver is practically uncompromisable, and that the odds of it being compromised are so remote because he has "the reflexes of a highly trained ninja" and doesn't run nginx as root.
I'm pointing out that his server isn't as uncompromisable as he's trying to lead the reader to believe.
If the author were truly a "ninja", they wouldn't be running their web application as the nginx www-data user in the first place, and then a web application exploit wouldn't give anyone access to the nginx user to exploit the log-rotation mechanism via symlink either. One can read more about the CVE you linked here[1]. But basically the gist of it is this:
> As the /var/log/nginx directory is owned by www-data, it is possible for local attackers who have gained access to the system through a vulnerability in a web application running on Nginx (or the server itself) to replace the log files with a symlink to an arbitrary file.
This assumes the web application is also running as www-data, which wouldn't be that smart.
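A minimal sketch of the separation being described, with illustrative names: run the app under its own dedicated user (here via systemd and gunicorn) and let nginx, whose workers run as www-data, merely proxy to the app's socket. A web-app exploit then lands in an account that owns neither nginx's logs nor its config:

    # /etc/systemd/system/myapp.service -- illustrative, not the author's setup
    [Service]
    User=myapp          # dedicated unprivileged user, NOT www-data
    Group=myapp
    RuntimeDirectory=myapp
    ExecStart=/usr/bin/gunicorn --bind unix:/run/myapp/app.sock wsgi:app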
They probably didn't even know it was accessible via that domain. Their webserver responded to any request with the default site, and Google decided to crawl ssl.nickjanetakis.com and found all the PDFs.
That's what I thought too but then I noticed that someone bothered to put "- Nick Janetakis" in the titles of those PDF pages (check the screenshot in the article).
I don't think that's exactly what was going on, though; perhaps somebody else can chime in.
I don't think the "- Nick Janetakis" is actually in the title of the PDF; rather, Google has appended it to the actual title (the end of which has been replaced with an ellipsis).
I think Google can get this from either the title of an HTML page or from an og:site_name entry (I'm not 100% sure on all this). It's possible that Google took these from the "actual" ssl.nickjanetakis.com and still remembers the og:site_name, applying it to the PDF files?
I think that may be something Google appends to its search results for some links?
I Googled my own domain (site:flurdy.com) and it appends "- flurdy" to some of my static pages, but not all of them, and especially not for subdomain apps. So I am not 100% sure.
What's the advantage of that compared to using a bare IP? You still need a server to host your illicit content, so you're still exposed that way. Also, DMCA requests are sent to service providers, not to whoever owns the domain. The whole arrangement is probably worse than a bare IP because your site can be "taken down" by someone else with no warning.
At a school I used to attend, the firewall filtered "inappropriate" content (which is a fun story on its own...). It was a poor system and, in theory, had a loophole around it...
I guess a spammer would be able to use the domain's ranking for their spam until Google detects that the content has changed. Probably easier than promoting a new spam domain.
So, if I had a subdomain set up that way, and sci-hub or somebody came along and started using it, would I have any legal obligation to do anything about it? Could my domain be seized?
Google is aggressive when it comes to crawling (which I think is OK), so it's very much possible that whoever hosted these PDFs had no idea that Google had crawled the site, or that it was reachable under that domain.
It seems like they knew what they were doing, given that the search results have the author's name attached to them. (As a proper name, rather than cutting "nickjanetakis" out of "ssl.nickjanetakis.com")
What's the purpose for spreading PDFs like this? Are there ways to embed malware into PDFs so they attack the host machine of whoever downloads the files?
I don't disagree with the idea, but it is hard to think of a world where we only optionally pay for things. Do you think it's okay because the cost is prohibitive? Or because if we truly value it we will support it if it is free or not free? I don't necessarily know how to reason this issue out.
> I have Google Alerts set up so I get emailed when people link to my site. A few months ago I started to receive an absurd amount of notifications, but I ignored them. I chalked it up to “Google is probably on drugs”.
Between this quote and the bozo-level advice in the "Domain Validation Should Be More Strict" section ("I would like to see more services only allow for DNS based authentication by adding TXT records" is going to solve this problem? permanently decommissioning IPv6 addresses?), the one lesson I can take away from this article is to stay as far away as possible from any of this guy's security related courses.