How I found a data leak of a company during a college lecture (ruwhof.net)
142 points by achillean on May 14, 2016 | 63 comments



> On the next screen I just hit the connect button to see what would happen:

> Not really sure what it does, but I found several Droisys e-mail addresses in the database and decided to mail them that their database was exposed on the internet.

Oh man, this is a bad idea.

If you find something out there that you think you can connect to - poor password, no password, etc - you must be very careful.

In some cases, just reporting it will cause you problems. At this point, you probably haven't broken the law but some companies are just jackasses.

But once you connect, the game changes. It is "unauthorized access" and it's probable that you broke your local laws. At that point, it's not just a jackass company that gets involved, it's local law enforcement too.

And then you get data from the database. It's probably going to get even worse for you.

When you go a step further and share a howto on the web... this is a bad idea all around.

This is not proper disclosure. If I was him, I'd get a good lawyer.


I've always wondered about the legality of, say, going to google and entering "not for public release filetype:pdf" then downloading (presumably accidentally shared) confidential documents from someone's server. If no authentication is required, is it fair game?


That search, restricted to the past month, returns some interesting results. You can damn well bet I didn't click through.


I did, and clicked every result in the top 20.

I can assure you that nothing is going to happen to me.


I get exactly 4 results when I put it in quotes, and they are all boring as hell.

Here's the link for anyone curious https://www.google.com/search?num=50&biw=1680&bih=949&tbs=qd...


If Google has already accessed, indexed, and published it, you are in pretty good company. At the least, you have the lawyers of a multi-billion $$$ company backing you.


No, you don't. Knowledge and intent are key factors in many crimes, and you and Google aren't similarly situated.


If google is providing you illegal knowledge, that is Google's problem.


The law doesn't work that way: knowledge is rarely illegal. Knowingly gathering without permission may be. Using a Google product as a tool in a crime doesn't make Google responsible for the crime and relieve your responsibility.


Can you construct a hypothetical situation where clicking a link on a Google result page (or, any page, for that matter) would be a crime?

If such a thing were possible, I would view it as the ultimate betrayal of the browser's "sandbox". Certainly it would be a top priority to categorize links into "known safe to click" and "clicker beware". Who knows, maybe Google's successor will be such an engine.


"Clicking a link" on a results page by itself is unlikely to be a crime (in any US jurisdiction). OTOH, having a certain criminal intent, constructing a search query aimed to realize that intent, and then clicking on a resulting link to complete the realization of that intent might be.


> Can you construct a hypothetical situation where clicking a link on a Google result page (or, any page, for that matter) would be a crime?

I'm not sure that's even necessary, and there's no point getting into a debate about the browser (you commanded it to do something, after all).

IANAL, but I don't think you need to be one to appreciate the potential for legal trouble. Depending on your interpretation of the CFAA and whether or not you agree with the assertion that the Ninth Circuit limited the scope of the CFAA's reach by requiring a certain degree of intent [1], unauthorized access alone could be construed as a crime. If you want a particularly extreme interpretation of the statute, you can find such almost anywhere you look (here's one from 2005 [2]).

In the latter case, it's notable that if you access material it 1) need not be trademarked, copyrighted, a trade secret, or even particularly sensitive--it need only be "valuable" and 2) unauthorized access is defined rather loosely as accessing "information in the computer that the accessor is not entitled so to obtain." One could argue that password protected resources or databases that are not publicly advertised are not considered something for dissemination to the public and therefore protected by statute.

So, if we apply the CFAA in a manner similar to what you might expect of a prosecutor who is up for re-election this year, let's look at the abuses the article's author committed:

Unauthorized access - check? There's no obvious revocation of the right to access Unilever's MongoDB database, but it probably passes the "reasonable person" test that this information isn't intended to be public. Playing the game of "intent" is a bit risky, so this might be another option in mounting a defense.

"Valuable" information - definite check (the author stated rather plainly: "Within the databases I found personal details like names, e-mail addresses and also private chat logs;" I suspect this would be considered "valuable" information). I don't think this is something I would have admitted. I certainly wouldn't have posted screen captures.

I admit the timing of this is funny, because I was just about to watch a few videos on bosnianbill's Youtube channel earlier when I got to thinking about how inconsistent lockpick possession laws are in the US, and it's interesting how it applies to this story. In some states (notably Tennessee), simply owning a lockpick without the appropriate license can land you a misdemeanor (fine, maybe jail time, depending on my memory of their law), while other states (like my own) require intent and/or possession of multiple "burglary tools" (e.g. a crowbar in addition to a lockpick). While intent alone is insufficient protection from particularly enthusiastic prosecutors, it does at least afford some defense if you wind up in front of a jury. Hoping for the same under the framework of the CFAA is a bit like playing with fire even if you successfully mount a defense (legal costs, opportunity costs from the time wasted on defense, etc).

Not worth it.

[1] http://www.bullivant.com/Computer-Fraud-Abuse-Act

[2] https://www.dorsey.com/newsresources/publications/2005/02/cf...


Federal prosecutors -- the only ones that can prosecute for criminal CFAA violations -- are Presidential appointees; they are never up for election. So that scenario never actually occurs.


Google isn't going to back you in any way - why would they?


I think the parent alludes to Google being named as accomplices along with you. I do however think you're right that that might not mean much for your case. If nothing else, if two parties commit a crime, and one has a major legal team, it seems like the most probable outcome is that the other party will take the fall.


That's beyond a fantasy - Google isn't party to a conspiracy because they have a built-in affirmative defense: our actions weren't taken to further a conspiracy, they were incidental.


The question is whether the content is fair game (to access). Google has already proved it to be fair game and if anyone wants to argue otherwise, they would need to then argue with the most flagrant offender, Google, who has much more than just "Confidential" PDFs.

Google would be guilty of any charge that could be levied against someone for accessing data that Google actively provides.


I'm sorry but I think this is rather ridiculous. Google's position is that they have automatically indexed everything that the server said they could, but will remove anything on request and give websites a way of doing this.

Your position would have to be that you searched for obviously confidential documents, found them and downloaded them without knowing you shouldn't.


Guys, I think we got out in the weeds a little bit with the google thing. The question is if someone puts up a web server on the internet with no authentication and no notice that it's not open for public use, can they get me for "unauthorized access" if I download content from it? If not, what makes HTTP special - why not SQL or SMB?


The relevant question is not whether there is an explicit notice, but whether common sense suggests that you are intentionally making unauthorized accesses - as would be the case with the Google search you mentioned.

See also:

https://en.wikipedia.org/wiki/Goatse_Security#AT.26T.2FiPad_...


Common sense?

If you send a valid HTTP GET to someone's server and they respond with a 200 OK and some content, the access was not unauthorized. The HTTP protocol actually makes authorization an explicit mechanism that may be disabled or loosened at the implementor's leisure.


To be fair, the EFF took a position in the case I linked that suggests they might agree with you in the present Google hypothetical too:

https://www.eff.org/deeplinks/2013/07/weevs-case-flawed-begi...

Not only that, I was actually surprised to find that the New Jersey court cited a state precedent along similar lines:

http://cdn.arstechnica.net/wp-content/uploads/2014/04/weevru...

->

http://caselaw.findlaw.com/nj-superior-court/1508996.html

...though that was interpreting a state law and brought up the fact that the state law has some subtle differences from the federal CFAA (despite very similar wording, quite vague in both cases).

On the other hand, in Craigslist v. 3Taps, a district judge found that simply evading an IP ban, while otherwise accessing entirely (intentionally) public information, counts as unauthorized access under the federal law. And then there's the case of Aaron Swartz.

But anyway, even under the more permissive of the possible standards, your logic is too simplistic. What if I send an HTTP GET like this?

    GET /viewarticle.php?title=x%27%20UNION%20ALL%20SELECT%20%2A%20FROM%20%27users HTTP/1.1

It's a perfectly valid and well-formed request according to the HTTP standard, and even valid at the application level, in the sense that you technically can't rule out that an article might exist titled "x' UNION ALL SELECT * FROM 'users", and a correctly written server-side script would interpret the request simply as searching for such an article. But suppose the script isn't correct, and instead of showing an article dumps its user table. Would you say that my access to user data is authorized?

Well, I actually don't know how you'd answer the previous question, but I strongly doubt any court would answer yes. If you say no, then the implication follows that either the difficulty of constructing the dubious request, or perhaps the intent, or something else relatively wishy-washy and subjective can make the difference between authorized and unauthorized. It can't be reduced to some strict technical standard.
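
To make the "correct script" vs. "broken script" distinction above concrete, here is a minimal sketch in Python with an in-memory SQLite database. Everything in it is an assumption for illustration: the `articles` and `users` tables, their columns, and the two `view_article_*` functions are hypothetical and are not the code behind any real `viewarticle.php`. Only the payload is taken from the request above.

```python
import sqlite3
from urllib.parse import unquote

# The title parameter from the GET request above, URL-decoded.
PAYLOAD = unquote("x%27%20UNION%20ALL%20SELECT%20%2A%20FROM%20%27users")


def make_db():
    """Build a throwaway database with hypothetical tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE articles (title TEXT, body TEXT);
        CREATE TABLE users (name TEXT, email TEXT);
        INSERT INTO articles VALUES ('hello', 'An ordinary article');
        INSERT INTO users VALUES ('alice', 'alice@example.com');
        """
    )
    return conn


def view_article_unsafe(conn, title):
    """The broken script: the title is spliced into the SQL text itself."""
    query = "SELECT title, body FROM articles WHERE title = '" + title + "'"
    return conn.execute(query).fetchall()


def view_article_safe(conn, title):
    """The correct script: the title is bound as a parameter, so it is only data."""
    query = "SELECT title, body FROM articles WHERE title = ?"
    return conn.execute(query, (title,)).fetchall()
```

With the same request, `view_article_safe(make_db(), PAYLOAD)` just looks for an article with that odd title, while `view_article_unsafe(make_db(), PAYLOAD)` runs the attacker's `UNION` and returns rows from `users`. The HTTP request is byte-for-byte identical in both cases, which is the point being made above.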


If the script is mixing title comments into executed SQL code, then I don't think there is much hope for it. This line of argument allows post facto rationalization for determining unauthorized access. To make a claim that something was unauthorized is to claim that there is some procedure that can determine whether something is authorized or not. That procedure is the thing that should actually be executed when deciding to serve a request. We are talking about cases where the written procedure says the request was authorized, but someone else claims that the actual procedure gives a different result [insert ad-hoc, post-facto rationalization here (i.e. not policy)].

This is clearly nonsense, though it may take some time for courts to figure it out.


> This line of argument allows post facto rationalization for determining unauthorized access.

Ah, so the burglar with the bump key is allowed in because the action of the lock determines criminality? "If it opens it's allowed?"

You seem to be making the same fundamental mistake many technical individuals make when they interact with things outside of their knowledge sphere - you're attempting to map a space that is foreign to you into the world you know.

The legal system is not a computer. It does not run on rigid rules. That's actually a really good thing: it allows flexibility in considering whether an action is a crime or not.

There's a spectrum to consider. It's clear on one end that a person who searches for "not for release filetype:pdf" may be looking for historical documents, and a person who attempts a SQL injection against a web application has sufficient guilty knowledge and intent.


The legal system does run on rigid rules. Yes, there is no perfect executor (subjectivity will still exist), but the rule of logic still applies. A legal system where you may be convicted of a crime on a whim is not a legal system, it is a farce.

Everyone seems to be ignoring that a 200 OK is explicit authorization, per the protocol. It would be one thing if we were talking about a protocol with no built in authorization primitive, but we aren't. Using HTTP establishes an authorization procedure. Claiming that it may be illegal to receive responses to well-formed requests to the server requires one to make the fundamental mistake of not understanding the technical protocols that are being used to communicate.

The legal system operates on a subset of the logic involved in the technical world. Its ideas and understanding will necessarily lag the reality being created and will be subservient to the logic being established, not adversarial.

Burglary is a crime because it is an intent to commit further crime, not because a door was opened. The difference with an HTTP authorization lock is that the authorizor gets to examine every request and must run their authorization policy on every one. Arguing that the policy that was actually run was "wrong" is an admission of incompetence.

The analogous situation is where a business posts an "OPEN 24/7" sign by their open front door, but shotgun blasts people who walk through the door.


That's a good point. 401 Unauthorized... They even used the right word.
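
For anyone who hasn't looked at the mechanism being argued about, here is a rough sketch of the decision a server makes between 200 and 401 when HTTP Basic auth is (or isn't) configured. It is only an illustration of the protocol mechanics, not a claim about what any court treats as authorization. The function name `status_for` and the idea of passing `credentials=None` for an unconfigured server are my own assumptions.

```python
import base64
import hmac


def status_for(authorization_header, credentials):
    """Return the status code a Basic-auth-style server would answer with.

    credentials is None when the operator configured no authentication at all,
    otherwise a (user, password) tuple. Both conventions are assumptions made
    for this sketch.
    """
    if credentials is None:
        # No auth configured: every well-formed request is served.
        return 200
    if authorization_header is None:
        return 401
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or non-UTF-8 bytes: treat as missing credentials.
        return 401
    user, sep, password = decoded.partition(":")
    if not sep:
        return 401
    expected_user, expected_password = credentials
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return 200 if (user_ok and password_ok) else 401
```

The `credentials is None` branch is the situation in the article: the server never asks, so it never says 401. Whether that silence amounts to permission is exactly what the rest of this thread disagrees about.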


Documents are not obviously confidential if there is an established process for removing confidential documents, but the documents still show up in a simple search.

Your position is that you viewed everything that Google thought it could publish in regard to your query. It is ridiculous that someone could be jailed as a result of clicking a link on a Google search result page.


Consider the Google search that started this:

"not for public release filetype:pdf"

That's a pretty flagrant attempt at accessing confidential documents. It isn't like someone googles "how to catch a roadrunner" and accidentally downloads confidential Acme documents. This is a full on attempt to find poorly secured documents.

Now, consider what Google does. It runs bots (that respect things like robots.txt) and then publishes links to everything that they can find.

Maybe I'm missing some subtlety, but I don't understand how these are similar. Can you explain yourself further?
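
For reference, the "respects robots.txt" part can be sketched with Python's standard library. This is only an illustration of the general mechanism, not Google's actual crawler logic; the helper name `allowed`, the bot name, and the example rules are all made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules a site operator might publish.
EXAMPLE_RULES = "User-agent: *\nDisallow: /internal/\n"
```

A well-behaved bot would call something like `allowed(EXAMPLE_RULES, "ExampleBot", url)` before each fetch and skip anything under `/internal/`. Note that this is purely opt-out: a document the operator forgot to list is crawled like anything else.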


That is a perfectly legitimate query. I would expect to find all manner of historical documents. Further, it does not matter what a document says. Claiming to be not for public release doesn't make it a crime to release it. The only possible exception here is for national secrets, but even then many exceptions have been made.


Good answer - thanks very much for clarifying!


Because there isn't going to be anything confidential that the search result returns. And anything you access is something that was widely available.

It'd be like googling, "Bank of America's Secret Backdoor Password to steal all its money".


It's possible that I have missed some subtleties in your argument so let me ask for a bit of clarification.

> Because there isn't going to be anything confidential that the search result returns.

Doesn't this assume that sysadmins are actually competent? And isn't there a ton of evidence that suggests that sysadmins have routinely allowed confidential data to be indexed by Google??

In that case, isn't this analogous to what would happen if I left my front door unlocked and you 'broke' in and stole my collection of Taylor Swift CDs? (I don't actually own any Taylor Swift CDs, but it makes my point easier).

Granted, I did a shitty job of securing my valuable music collection, and Taylor Swift CDs are widely available. But fundamentally, you still came in without permission and took something that belonged to me.

Recent history has shown that you can be prosecuted for all sorts of things in cyberspace. Accessing confidential directories, downloading poorly secured files, and exploiting poorly designed APIs have all been successfully prosecuted.

I wish that we lived in a world where doing things like that would be considered a part of intellectual freedom, but the unfortunate truth is that laws are applied in such a way as to make this highly risky. The silly thing is that the state of the law actually benefits hard core criminals...


I just did some searching, and found something similar. I came across a Samba share for a company that has no security on it whatsoever.

I can't tell who it is specifically, without opening up any of the files, but it's either the brand, or a franchise of it.

I'm guessing I shouldn't connect to, or open anything on there due to 'unauthorized access'? But then I cannot report it either... guess it's their bad luck then?

Edit: Ok, so I found out the owner by running a whois on the IP. It's not who I initially thought it was, but after researching the company they're worth just over $3m... too worried to report it though, I've never reported any findings like this before.


Why not forward it to the technical contact of the domain and/or ip address?


Sign up for a disposable email account using Tor, then email blast a single email with all the details for reproducing the vulnerability to every official email address you can find, including a note that this is an anonymous address and there will be no further contact.

The larger problem is what percentage of companies with glaring vulnerabilities in their infrastructure have the competence to fix them?


This is sad. Probably smart, but jerk companies and DAs have bred a culture in which people rightly fear being the messenger, because the messenger always gets shot.


i don't see any contact info in your profile, but i'll report it for you? (not that you have any particular reason to trust me, just thought i'd offer to take the "heat")


It actually looks like the company was OK with it. The disclosure timeline shows they were responsive and addressed the problem quickly when they were made aware of its existence.


Right, but he didn't know this when he originally connected.


Edit: Too much info. Don't want to be google indexed.


Need more details on how you got a criminal record for that. Did you settle? Did you do more than just that?


Edit: Too much info.


Appears to be an alternative revenue stream traditional journalism has been desperately seeking.


Why are you lying? You didn't get a criminal record for that: [removed link proving his lie by request]


Thanks for pointing out my personal information.

You don't know local laws and haven't read the final agreement.


Oh come on. Like, don't 50% of Estonians have a misdemeanor because of traffic violations? Unless you wanted to work for Kapo or military, you should have no issues with that fine. Guys who have drug offenses still get work as software developers.


People are saying this guy accessed databases illegally, but what about Shodan? They crawl, port-scan, access, and index everything they can find, not just HTTP servers. They are what made it possible for this guy to find these unsecured databases in the first place. They even counted the total size of the exposed databases' records.

Why isn't Shodan being sued to pieces for illegally accessing private databases and being a "Google for black hats"? I've even found them port-scanning my home DSL connection.


So true, no good deed goes unpunished.


In the words of Phineas Fisher:

"NoSQL, or rather NoAuthentication, has been a huge gift to the hacker community. Just when I was worried that they'd finally patched all of the authentication bypass bugs in MySQL, new databases came into style that lack authentication by design."

From his account of the Hacking Team Hack, worth a read if you missed it.

http://pastebin.com/raw/0SNSvyjJ


Step 1) Bemoan how {OldSoftwarePackage} doesn't do X, Y, Z

Step 2) Write {NewSoftwarePackage} that does most of what {OldSoftwarePackage} did + X

Step 3) Spend an order of magnitude more time than expected finishing Y, which turns out to actually be rather hard because {Messy Real World Engineering Details}

Step 4) Never get to Z & eventually come up with a narrative about how Z was stupid anyway


Not sure what exactly you mean wrt NoAuth DBs, but having a simple on-connect password would be an improvement to all those databases.

Things like Elasticsearch not having even a basic password (esp since it's HTTP so it's trivial) is simply silly. And it's probably not a good idea to support no-auth connections at all - if it's really a hassle, just set the user/pass to the host name.


> Elasticsearch

Yea. I found that surprising when it came time to use ElasticSearch for my own purposes. If you want security, you need to set up something between the ElasticSearch server and the clients to moderate.


Setting up nginx to proxy to Elasticsearch with HTTP auth on top is fairly trivial. There's a couple of good articles on the web if you google for it. Also, should you have an Elasticsearch support contract, you get access to the Shield plugin which has extensive access control.

But yes, the fact that it is open OOTB is frustrating.


This is exactly what I do, but the fact that it's wide open and relies on you to use a different (and de-coupled) service for permissions was surprising to me (at the time).


Heh, sounds like how software is improved. Then when the next guy who wants Z comes along...


Don't worry .. it's Agile! We'll just keep pushing that feature back until we lose business. Later: "Remember people that was an MVP"


So you do everything you can to hide the IP address, but reveal that the domain is-savvy.nl resolves to it.

Either you're not actually that good at IT security or you're just making a huge brain fart.

    $ dig +short is-savvy.nl
    37.59.238.165

So it's not really a mystery what the X.X.238.165 and X.X.238.166 addresses actually refer to.
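
The same check can be written as a few lines of Python, as a sketch of why redacting only the first two octets doesn't help once a hostname is known. The helper names `matches_redacted` and `resolve` are mine, and the `X` wildcard convention simply mirrors how the article blanked the addresses.

```python
import ipaddress
import socket


def matches_redacted(address, pattern):
    """Return True if a full IPv4 address fits a redacted pattern like X.X.238.165."""
    octets = str(ipaddress.IPv4Address(address)).split(".")
    parts = pattern.split(".")
    if len(parts) != 4:
        return False
    # An "X" stands for a blanked-out octet; any other part must match exactly.
    return all(p.upper() == "X" or p == o for p, o in zip(parts, octets))


def resolve(hostname):
    """Look up the IPv4 addresses for a hostname (performs a live DNS query)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses
```

Something like `[a for a in resolve("is-savvy.nl") if matches_redacted(a, "X.X.238.165")]` would then list any current A records that fit the redacted screenshot. DNS records change over time, so today's answer may differ from the `dig` output above.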


Wouldn't it be better and easier to do a WHOIS lookup on the actual IP address, and email whoever shows up there? Sometimes you'll get the hosting provider (and they can contact the customer for you, maybe even anonymously), and sometimes you'll find the company itself.

At least you're not bouncing around domains which can be pointed anywhere
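
A WHOIS lookup is simple enough to sketch directly: open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. The code below is a minimal sketch under assumptions: `whois.iana.org` is used as a starting server (it generally refers you on to the regional registry, which you would then have to query yourself), and the email regex is a rough heuristic rather than a full address parser. The function names are hypothetical.

```python
import re
import socket

# Rough pattern for email-looking strings; an approximation, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def whois_query(query, server="whois.iana.org", port=43, timeout=10):
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closes the connection when it is done
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def contact_emails(whois_text):
    """Pull unique email-looking strings out of a WHOIS response, in order."""
    seen = []
    for match in EMAIL_RE.findall(whois_text):
        if match not in seen:
            seen.append(match)
    return seen
```

Usage would be along the lines of `contact_emails(whois_query("37.59.238.165", server=...))` against whichever registry the first answer points to. Many registries publish a dedicated abuse mailbox, which is usually a better target than a random address scraped from the record.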


I don't know why he bothered to blank out the IP address for is-savvy.nl. DNS will tell you that there is literally one A record which matches the last two bytes.


>The database had no username and password configured to protect it, so I assumed it was a public database with data for everyone to see ;-)

No, a wink won't protect you in court. You illegally accessed someone's data. As they say: just because the house is unlocked doesn't mean you're allowed in. Pray this doesn't get you in trouble. Sjonge jonge jonge...


For those having trouble accessing the site, this is the archived version: https://web.archive.org/web/20160514214255/http://sijmen.ruw...


A few years back I anonymously reported a huge data leak. Weeks later they asked for my name and address so they could "send me a gift".

To this day they still haven't fixed the leak. And I took a pass on that "gift".


What kind of laws apply in this locality?

In the US I wouldn't be able to connect to their MongoDB server without it being a crime, password or not.



