Hacker News
Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years (krebsonsecurity.com)
1252 points by snaky on March 21, 2019 | 429 comments



> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.

This imo is the truly alarming takeaway. FB employees were retrieving user passwords? Around two thousand FB employees? How in God's name is Zuckerberg going to deliver his usual performative contrition about this one?

I'm just trying to imagine the data structures that were being retrieved from the databases. Either they stored something like a big user-account data type that contained the password in plaintext, which imo is a really weird design choice, or logs for other services were being mixed in with logs leaking the user/pass combos.

Surely one of the engineers could have noticed over the course of the years and said 'wait a minute... those are logins'? We hear all the time that people want FB to follow responsible social practices (the debate on what those are rages on, which is great imo), but can't FB at least wrangle its own code base?

On the other hand, we shouldn't take the stance that heads should roll, imo - it would just create a chilling effect that would deter other companies from ever going public about their own security mishaps.

edit: I should probably tone it down in this comment but I'll leave it for posterity


>FB employees were retrieving user passwords? Around two thousand FB employees?

This sounds to me like they were writing it into some log analysis tool, and people happened to pull it up.

I have no knowledge of FB internals, but as an example let's say they have Splunk, which is a popular distributed log analysis tool (or some custom equivalent, because FB is huge). Splunk makes it very easy to pull up tons of logging for any app - that's what it's for. So write a poorly-constrained query and you can get millions of log lines from many different apps.

It's also feasible that everyone (or nearly so) in the company has access to Splunk.

So if somebody, somewhere accidentally writes passwords into a logfile, and it gets indexed into Splunk, a ton of people might theoretically see it, but just skim past it because it's not what they were looking for. Picture trying to hunt down an error, and wading through many screens full of logging from different apps - if there's a password in there you could easily not notice if you're not looking for it.

That's all speculation, but it's also entirely plausible. All it takes is accidentally logging out those passwords to somewhere that Splunk can see.

Also not trying to say this is in any way OK or acceptable, just that I can understand how a seemingly-small mistake could quickly result in thousands of people having access to passwords.


This should be the main takeaway. If true, then 2000 people decided not to raise the alarm. And 9,000,000 queries with results that had passwords. This completely invalidates the other thread of people discussing giving facebook a pass because accidental logging could happen to any company.


> If true, then 2000 people decided not to raise the alarm

Have you ever worked at a software co, discovered <terrible software practice>, raised a flag, and were told "thanks, we've added a ticket to the backlog"?

I'd bet that some of them did flag it.


Explicitly logging a password is one of those practices that doesn't sit on the backlog.

It's probably a bit more complicated than that. Usually the things that I encounter have to do with how HTTP requests are logged.

For example, putting sensitive information in a URL that's loaded over HTTPS is considered insecure because many companies have policies where they log every URL that their employees visit. (Think of a password reset link.)

A lot of inexperienced programmers don't realize this, because they don't realize that you can man-in-the-middle yourself, and that most corporate computers come preconfigured to allow the employer to man-in-the-middle everyone.

So, if a password reset link never expires, it means that some guy in IT can own an account that was reset on a corporate computer.

(This, basically, is how they catch people viewing porn on their work computers.)

Anyway, my point is that the problem is probably something where a junior programmer transmitted a password in a way that they didn't realize was being logged.


> Explicitly logging a password is one of those practices that doesn't sit on the backlog.

If that is your experience, then that's a truly wonderful thing.

Might it be possible that at many companies, teams with deadlines to hit will tend to prioritize feature work over details like this? Perhaps especially so when teams are not rewarded for fixing vulnerabilities and are punished for not meeting deadlines? Particularly when the actual bug at hand is that the full contents of a POST are being logged, and a PM might not read the ticket enough to understand that this includes a password...

Again, you're completely right in every way about what should happen. It just may not reflect the experience every software engineer has had.


> Explicitly logging a password is one of those practices that doesn't sit on the backlog.

For you maybe. For Facebook obviously it's different.


>For Facebook obviously it's different.

What evidence do you have for that? Nowhere in the article does it say Facebook "explicitly logged" passwords. The logging likely happened through some unintended and roundabout process that is far from explicit.


Why would this be down voted?

I think incompetence over malice is almost certainly the right answer here.


How many times does a company of Facebook's size get to say "oopsie, we're sorry" before you'll stop giving them the benefit of the doubt? I see this as a malicious disregard for the security and privacy of their users, and their history aligns with that view.


That's a valid point of view, of course. The failure is certainly well beneath what one should expect of Facebook.

Nonetheless, I stick to incompetence over malice. There's just so much more of the former in the world than the latter!


Back when I was a sysadmin, I caught everyone watching porn by suddenly walking into their offices, but I'm sure your method is better.


You're probably right, and I would love to see that confirmed. FB wouldn't have to link to the Jira ticket or give the name or details of the ticket, but they could verify that was the case and then explain why nothing was done.


Still, it's interesting that not one of them went all out and publicly derided the company. Whistleblower protections would probably have covered them. However, our whistleblower laws could use a brush-up.


If it's not a feature that makes money, why spend developer money on it? -- Every manager ever.

GDPR may levy fines but ultimately it hasn't done anything to stop managers focusing on new features and sales income.


How much do you want to bet that some of those FB employees decided to log in by turning off the "location detection" flag and using an incognito browser?

I would not be surprised. Some of those 9,000,000+ queries and 2,000+ employees must have had some nefarious use. Statistically speaking...


Judging by the maturity level of the discourse on teamblind, this is precisely what happened.


Judging by working with hundreds of software engineers over a decade, this seems very unlikely to have happened.


No, you're misrepresenting what (probably) happened. They had some system that logged API requests (fine and normal). Some API requests include plaintext passwords (also fine and normal).

The issue is presumably that they had no exclusion to the logging for sensitive information like passwords, which is honestly very easy to overlook.

So two thousand Facebook employees were not "retrieving passwords". They were looking at the API logs, which is a normal thing to do.


This. It's a bit baffling to me that even the HN audience doesn't come to this conclusion immediately - maybe people want to assume the worst because it's Facebook?


Definitely. This always happens when Facebook comes up here. People forget to use their brains and just rant about how evil Facebook is.


> Some API requests include plaintext passwords (also fine and normal)

Wait, no? How is that fine and normal? It doesn't seem fine and normal at all.


It's completely standard. If you encrypt the password in the browser, potential hackers have access to the encryption code which makes it useless.

Check the network tab in dev-tools when logging in to hn for example and you'll see your password in the request body :)


Maybe normal but still not best practice


What that quote is saying is that the logs that contained the passwords were accessed by 2,000 engineers. Most of those engineers would only be looking at data relevant to their job; only a security engineer would be in a position to notice the passwords, which is what happened.


Isn't it possible that those developers just wanted to look at something harmless, like failed logins, and didn't even notice that the log messages also included the password? But if so, it's a bit worrying that 2,000 pairs of eyes did not spot the bug.


Maybe many people were just using SELECT * FROM ... and got plain text passwords they weren't necessarily looking for.


I read the article and this is what I got from it too. I `SELECT *` all the time and see hashed passwords. It is very rare that I need that hash, usually I'm doing something entirely unrelated, but it's just easy to get all the rows and only use what you need, especially when you are troubleshooting and don't even know what you need yet.
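To illustrate the difference (a minimal sqlite3 sketch; the table and column names are made up): naming columns explicitly keeps fields like a password hash out of result sets where you never needed them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'bcrypt-hash-here')")

# SELECT * drags the hash along even when you don't need it...
row = conn.execute("SELECT * FROM users WHERE id = 1").fetchone()

# ...whereas naming columns keeps the sensitive field out of the result.
safe = conn.execute("SELECT id, email FROM users WHERE id = 1").fetchone()
```

The habit is cheap insurance: queries that never request the sensitive column can never leak it into a log or a screenshot.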


Probably some poor engineers trying to check if their partners were cheating on them. Tools like FB make it easier after all.


(Edit: reworded a bit to make it clear I don't think this is acceptable)

Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Dumb mistake but it's not hard to imagine this happening, considering that FB probably has a bunch of services involved in the login/signup flow to prevent bots/spam, abuse, etc.

Not to imply this is acceptable, especially at an IT company like FB with vast resources and know-how. Raw passwords are an especially big screw-up. There are a lot of failures here, from actually logging something so sensitive, to giving access to so many employees, to not noticing this for years. (Assuming this was actually log data.)

BUT if we are honest, anonymizing log data is rarely a priority. Even when it is, sensitive data can leak at a lot of different points in the infrastructure: actual application code, client and server exception tracing (just imagine a deserialization exception which contains part of the input), web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

This is a good time to look in the mirror and audit your logging and tracing data. Unless you are in a highly regulated field like finance/healthcare or there is a strong company-wide culture for security/privacy with regular audits already, I can almost guarantee you will find at least one data point that should not be where it is.

Protecting sensitive data needs to be a big consideration for every dev, ops and especially management, which has to allocate enough time for security reviews and audits.


> BUT if we are honest, anonymizing log data is rarely a priority. Even when it is, sensitive data can leak at a lot of different points in the infrastructure: actual application code, client and server exception tracing (just imagine a deserialization exception which contains part of the input), web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

I think so too, and I agree with you.

A couple of months ago we had a discussion with a colleague who wanted to log the request/response body because "without it, debugging is almost impossible". We certainly don't log it, and we made sure that even exceptions don't spit out internal data; if that happens, it's an exceptional case, a bug (so far, I haven't seen one). It's tough, because of course when you have access to the plain data you can do anything you want, but we owe our customers this additional step. Laziness can't be justified in this context.

However, generally speaking, I think it's a common practice in many companies, it's not the first time I see this happen, and I bet it's not gonna be the last.

It's a matter of mindset. The more I read such articles, the more I lose trust even in large companies. Typically smaller companies don't care, in order to move "fast". But when this happens at large corporations, where does it end? Today it's Facebook with passwords; tomorrow it might be Amazon with tons of credit card numbers because of a legacy system that's no longer maintained...


On that note, it might be better to change the title to "Facebook logged... passwords in plaintext". When you say FB stored them in plaintext, that's generally understood to mean "used plaintext as the primary mode for storage when authenticating users".


> readable for anyone with access to the logging/tracing infrastructure

The article says more than 20,000 Facebook employees. A quick search shows that they have around 35k now - do 60% of employees at Facebook really need access to all of those logs?


Or is the article actually correct?


You are technically correct: it is a dumb mistake and sadly not that hard to imagine happening. It's also inexcusable, and I would expect even junior engineers to know better than to log credentials as part of request processing.


I doubt someone wrote "log.print(user.creds)". They probably wrote "log.print(req.args)" in what (they felt) was an unrelated section of code. Sucks, but could easily happen.

I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.


> I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.

Have all strings printed to logs go through a common checking routine. That checking routine simply checks for the presence of certain hard coded sequences, and raises an alarm if they are found.

Whenever a production system is updated, run a test suite. The tests include logging in to a test account whose password is one of the aforementioned hard-coded sequences. If your system accepts payments, the tests can include a test purchase using a test credit card number that is one of the aforementioned hard-coded sequences. In general, for each type of sensitive information, have a test that supplies sensitive information of that type, with that information being one of the hard-coded sequences the log checker checks for.

This won't stop you from accidentally logging sensitive data in production, but it should catch it during the post-deployment tests so you can fix it quickly.
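A rough sketch of the idea, assuming the test suite plants known sentinel strings (the sentinel values below are invented for illustration):

```python
# Hypothetical sentinel values planted by the post-deployment test suite.
SENTINELS = {
    "Canary-Passw0rd-7f3a",   # password of a dedicated test account
    "4111111111111111",       # well-known test credit card number
}

def check_log_line(line: str) -> list:
    """Return any sentinel values present in a single log line."""
    return [s for s in SENTINELS if s in line]

def scan_log(lines):
    """Yield (line_number, sentinel) pairs for every leak found."""
    for n, line in enumerate(lines, start=1):
        for hit in check_log_line(line):
            yield n, hit
```

Run the scanner over freshly ingested logs right after the post-deployment tests: any hit means some code path is logging data it shouldn't, and you know roughly where to look.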


Interesting, but way harder than it sounds. Very often log systems live as a service for multiple microservices that do real world work. Propagating sentinel data through each of those systems is a nightmare because:

- those systems often have real-world secondary effects

- they sometimes have a tendency to validate away sentinel data prior to logging because the data is no good (e.g. a credit card number that isn't real or a password for a user that doesn't exist), although that can depend on scale and cost

- cross-team coordination of how to handle the sentinel data introduces coupling across teams/services which is contrary to the goal of microservices.


How is this not a priority? And even worse, how is it possible that a bazillion of the supposedly smartest engineers did not fix it? In many other industries you'd get killed over this as a software engineer.


I don't buy that. In any other industry, you'd have a hard time even explaining what the problem is. "Someone wrote down my password in plaintext - yeah, that was me, it's sitting right there on my monitor".


If you read the Google Dapper paper from way back in 2010 it has this to say about sensitive information and logging:

> Logging some amount of RPC payload information would enrich Dapper traces since analysis tools might be able to find patterns in payload data which could explain performance anomalies. However, there are several situations where the payload data may contain information that should not be disclosed to unauthorized internal users, including engineers working on performance debugging. Since security and privacy concerns are nonnegotiable, Dapper stores the name of RPC methods but does not log any payload data at this time. Instead, application-level annotations provide a convenient opt-in mechanism: the application developer can choose to associate any data it determines to be useful for later analysis with a span.

Given how influential this paper was and how it likely influenced FB's own tracing system, it's crazy that they would choose an opt-out model. To me this system design is their biggest mistake, and one that could easily have been prevented.


If I can make sure that passwords are securely stored and nowhere available in plain text even for hobby projects that will never see real users, how can Facebook, with allegedly the top developers, not manage the same thing? And if you do log something like this, then everybody who ever saw those logs should be alarmed and press for changing it. It's inexcusable for this to go on for more than 2 minutes. In 2012 Facebook was not a small startup with just Zuck zucking along... This company is just so bad.


The article does clearly address that Twitter and GitHub had to admit to the same issue, but the scope and duration of the problem were far smaller.

> Both Github and Twitter were forced to admit similar stumbles in recent months, but in both of those cases the plain text user passwords were available to a relatively small number of people within those organizations, and for far shorter periods of time.

That is to say, it is especially serious given all the services that use Facebook as a login.


Can happen for sure. Have seen it happen in other places too.

But this going on for up to 7 years? Yeah, I'm less than impressed by FB.

Had a less than stellar opinion of them before; they are definitely hitting the Mariana Trench now.


100% agree. Logging sensitive data is something that we should never do, but in orgs with 100s-1000s of devs, it’s something that almost always happens. At some point somewhere a piece of middleware or whatever logs full request bodies, query params, etc., and nobody notices. It’s an incredibly common mistake because it’s so easy to make, and so easy to not notice.

This is still a very significant failure by Facebook, but as a dev I understand how it happens. It’s a mistake that happens at the vast, vast majority of tech companies.


Any privacy-conscious company would not allow random data, especially auth data, to be logged. I guess we are used to FB not caring about any of that, but it's not normal nor acceptable.


How do you stop it and still have an effective development org? Services need to be debugged, so requests and responses need to be logged...


It's pretty easy: you configure your logging library NOT to log the attribute, key/value pair, or whatever contains the credential. If you can't modify it on the server side (which you can, lazy bones), you tell your central logging system to mask it out before it is written to disk.

This isn't difficult or non-standard. If you are logging all client request/responses full take including auth creds, credit cards, SSN, etc, you are likely doing it wrong, and possibly violating some industry regulations.
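As one possible shape of this, a filter in Python's stdlib logging can mask sensitive key=value pairs before any handler sees them (the key names below are assumptions; a real setup would match your own field names):

```python
import logging
import re

# Key names to mask; extend this list for your own schema.
SENSITIVE = re.compile(r"\b(password|token|secret)=\S+", re.IGNORECASE)

class RedactingFilter(logging.Filter):
    """Rewrite log records so sensitive key=value pairs never reach a handler."""
    def filter(self, record):
        # NB: record.args would also need scrubbing if you use %-style formatting.
        record.msg = SENSITIVE.sub(r"\1=***", str(record.msg))
        return True  # keep the (now redacted) record
```

Attaching this filter to the root logger, or masking at the central log pipeline instead, gives the same effect without trusting every call site to remember.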


At a company I worked for, if we logged any production data, we had to confirm there was no PII in there and no passwords or tokens, and very few people had access to these logs.

There are many layers of wrong in what FB did: carelessly logging production data, letting thousands of employees access these logs, and of all these people apparently none of them cared to mention there was a problem here, or if they did, it was ignored by management. They don't have any excuse here.


You implement your logging infrastructure with awareness of PII and other sensitive information. Whitelisting fields to log would nearly fix it, blacklisting fields would be a bare minimum.


This is a good question and I have been at multiple shops which had this bug induced by a "let's just log everything" accident somewhere in the codebase. It's very nice to think of logging as a sort of "aspect" or a middleware that gets deployed across the whole stack, but it's a bit of a mistake.

You have something like four options for fixing it:

1. Every request is responsible for its own logging. This is actually not a bad approach because state-altering requests really need to be logged whereas state-viewing requests are much more optional, they help to try and guess “what were they doing when they ran into this bug they’ve reported?” but mostly they just occupy database rows. The risk is that someone is in a rush and commits something which does not discharge its logging obligation. You can build a system which forces this if you want, “the router will dispatch to your function and one of your function's arguments will be a locally-stateful logger, and once you are finished I will check whether the logger has handled anything, and if not I will log an error. So you should always `$logger->noLoggingNecessary()` somewhere explicitly in the codebase and then if this is wrong it gets caught in code review more consistently.”

2. The sensitive data is used to generate a bearer token and this flow is outsourced to its own un-logged server. You explicitly use the bearer token to construct everything important about the user account in a step before the logging begins, then delete the bearer token from the rest of the request. This flow can actually get really slick: the bearer token can contain the user data, optionally encrypted, with a message authentication code to ensure the user didn't tamper with it: you can then hit a near-empty Redis instance (or a near-empty table) looking for revoked bearer tokens super-fast, since you probably don’t see too much session revocation. So, user data lookup actually becomes unbelievably cheap because it's mostly CPU bound with (check empty key/value store, MAC-or-decrypt, parse body, pass to the handler function).

3. The logging service becomes controller-aware: each controller specifies whether it is supposed to be logged and the logging service just respects that flag and is otherwise global. So it might log that the login controller was accessed, but it doesn't log anything else about the controller.

4. The logging service becomes message-model-aware. This one is actually kind of slick, too, it means that you describe declaratively what sorts of data types are present in the messages that are transmitted to and from the server: and the first thing you do when you get a request is to validate the request against the model you have declared for messages to that request's namespace. So you will have a `validate($model, $value)` function that takes some arbitrary JSON data and a model and returns a normalized version of that data; a natural extension to this traversal that you're already doing (either by returning two normalized results or calling the function with an extra `options={removeSensitiveData: true}` type of argument) will allow you to define in the message-model itself whether the property is sensitive and should never be logged.
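A toy version of option 4, assuming a simple dict-based message model (the field names and model format are invented for illustration):

```python
# Hypothetical message model: field name -> {"type": ..., "sensitive": bool}
LOGIN_MODEL = {
    "email":    {"type": str, "sensitive": False},
    "password": {"type": str, "sensitive": True},
}

def validate(model, value, remove_sensitive=False):
    """Normalize a request body against its declared model; optionally strip
    fields marked sensitive so the result is safe to hand to the logger."""
    out = {}
    for field, spec in model.items():
        if field not in value:
            raise ValueError("missing field: " + field)
        if not isinstance(value[field], spec["type"]):
            raise ValueError("bad type for field: " + field)
        if remove_sensitive and spec["sensitive"]:
            continue
        out[field] = value[field]
    return out
```

The request handler would consume `validate(model, body)`, while the logging layer only ever sees `validate(model, body, remove_sensitive=True)`, so the sensitivity decision lives in one declarative place.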


I don't mean to engage in whataboutism here, but unless you're under the impression that no tech company is privacy conscious, what you're saying isn't true (though that'd be a reasonable impression, to be fair).

It's absolutely a security failure and it's not acceptable. But it's not a matter of "allowing" it to happen so much as, "Which vulnerability will we be caught by?" And it actually is a pretty normal vulnerability. For example, Apple[1], GitHub[2] and Twitter[3] have been vulnerable to this exact issue in recent memory.

I also don't mean to be defeatist. This kind of problem is preventable. But it's merely one dumb mistake in a universe of dumb mistakes that leads to serious security failures, all of which are easy to make. The most sophisticated and well-funded information security teams in the world - usually the FAANG teams - still miss things which look pretty silly in isolation.

At this scale being privacy conscious is necessary but insufficient. You can't realistically conclude anything about a company's dedication to privacy based on whether or not it was impacted by this kind of vulnerability. Making a corporate policy to hash passwords in the database instead of storing them in plaintext is easy to codify, easy to implement and easy to verify. A corporate policy to never log authentication credentials is not nearly as well-defined, even if it's equally as important. That means more mental overhead, disagreement and uncertainty in preventing it. Ultimately, it also means more mistakes can be - and are - made.

________________________

1. https://darthnull.org/security/2014/03/10/cve-2014-1279-touc...

2. https://www.zdnet.com/article/github-says-bug-exposed-accoun...

3. https://arstechnica.com/information-technology/2018/05/twitt...


> Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Wouldn't that also include failed login attempts complete with the failed passwords?

I mention this due to FB's history of even logging the entered passwords of failed attempts, which Zuck supposedly used to hack into people's Email accounts [0].

Because if that also applies here, then a whole lot of people just had way more leaked than just their FB password.

[0] https://www.businessinsider.com/henry-blodget-okay-but-youve...


A written statement from Facebook provided to KrebsOnSecurity says the company expects to notify “hundreds of millions of Facebook Lite users, tens of millions of other Facebook users, and tens of thousands of Instagram users.”

Does this imply it was a Facebook Lite-specific problem?


Yes


At the scale of Facebook, with literally billions of users, this is pretty much inexcusable.

This community usually scoffs at entities that store passwords in plain text. Why do you give Facebook a pass?


Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind. One is a bad security decision (because you must decide how to store passwords in the DB), and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

Same same but different.


I've worked in healthcare / biotech for more than a decade and I can promise you that the FDA would see no difference between the two types of gaffes. As custodians of sensitive information it was our responsibility to ensure that said information didn't leak, period.

I don't know anything about FB's infrastructure, but when I was lead I would have viewed leaks in logs as far worse than something in the DB, because our DBs were harder to gain access to.

I get what you're saying, but it's irrelevant. Easy to screw up, hard to screw up, doesn't matter; just don't screw up because the result is the same. This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.


I don't think anyone's arguing that it's ok, just that it's less "easy to screw up" vs "hard to screw up" and more "did something really stupid on purpose" and "did something really stupid by accident."

But yes, either way the end result is the same and you shouldn't do it.


Yes but how many people looked at these logs, saw the passwords, and said nothing? If it was logged presumably the logs were viewed from time to time. The mistake is easy to make, but it's also easy to correct if there is a culture to report such mistakes.


The article indicates it was searchable and used for something by employees, so most of these comments to the effect of ‘they must have been doing this accidentally because Facebook is just so big therefore they are excused’ are invalid. Some group of people did this on purpose and knew it was happening.


You're probably being downvoted for speculation, but that sounds completely reasonable to me. At least one person had to have noticed a password in a log file they were viewing at some time. Most people viewing log files know that passwords should not be there. It would trigger alarm bells, and depending on the person, also excitement -- you have somebody's facebook password.

It's conceivable that someone then told their friend at work, and a few of these 2,000 developers knew of this secret internal stash of passwords they could access whenever they wanted to "prank" someone on facebook...


Why would humans be reading these log files?


For plenty of reasons, I'm sure. From the article:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


That doesn't sound like a human reading the log. It sounds like automation.

A script counting the number of successful logins, for instance, could read the same data element that unintentionally contained passwords.


We don’t have enough information to make that judgment.


When you’re developing a feature like this, don’t you look at the data that you’re logging to make sure things are working properly? I would imagine that Facebook has many layers of abstraction, but somewhere in there, incompetence or inattention must be involved, sometimes known as negligence, if not outright knowledge of this. The definition of negligence is a complex thing, but if someone could have reasoned that this was logging plaintext passwords, and either saw it and didn’t bother to change things, or didn’t think about it carefully enough to realize what it was doing, that would be considered negligent. I know that, as a user, I would feel that my trust in this organization has been betrayed.


The logs being in a searchable index makes it more likely that the password storage was inadvertent, not less. It implies that the primary usage model for the logs was targeted queries, not people starting at the top of the logs and reading down in such a way that nothing could have been missed.


My interpretation was that they were using the plaintext password as one of the searchable fields, presumably for development related to authentication.


That's true. Krebs' article says:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


Those numbers are pretty meaningless without some concept of what queries they were making. When I was working with large logs I'd regularly pull huge amounts of logs to only look at maybe a handful of lines, even then often only scanning for the things I was specifically working on. Not saying this is what happened, just that those 2,000 employees almost certainly were not doing careful analysis on the complete results of their 9 million queries (and shouldn't have been!).


'Only scanning for the things one is working on' to the point of missing huge security holes or safety risks is exactly what's wrong with the industrial division-of-labor approach. Workers start to assume that even if they do notice a problem, someone else is either going to fix it or has examined and OKed the current procedure. Workers who speak up and assert that something they noticed actually looks more important than whatever task they were originally assigned are typically punished rather than rewarded, and fired if they make too much of a fuss, because otherwise managerial authority might be diminished. Most firms run internally like a dictatorship, and given Facebook's absurd ownership structure (there are voting and non-voting shares, and Zuckerberg retains a majority of the voting shares) it's a recipe for dysfunction.


This is true but not very interesting. Like literally every large tech company in the entire industry, Facebook has a specialized team of people that work on security --- it has one of the better versions of that team, for what it's worth. Just like ordinary Apple engineers, ordinary Google engineers, ordinary LinkedIn engineers, and ordinary Airbnb engineers, ordinary Facebook engineers don't have an end-to-end picture of how all of security at Facebook works.

There's no evidence that there was whistleblowing suppression about this issue anywhere at Facebook, is there?


Out of 2,000 engineers, there was not one who had a look at a username/password? Impressive. And these were developers, during development. Seriously, this is not justifiable; it's pure laziness, if we want to be charitable.


What if one among the 2,000 was curious?


Zuck doesn't believe in hiring employees over 30 because they're not 'smart' about tech. Great example why that's a mistake.

If he did, someone with 10-15 years of corporate experience would have seen that and found it simply unacceptable. They also would have had the gravitas to know how to escalate things to the top and get it fixed.


I understand that, but I believe they do not have the correct mentality here. I'm not certain how many of the people saying that have worked on systems as sensitive as FB. They think it's easier to screw this up because, for them, it is, but they don't operate at this level.

This sort of thing should not be any easier to screw up when you operate at the scale of FB. If it happens then you have the wrong procedures in place. In my personal example our logs and logging practices/code were audited in the same way our DB layer was. There was no difference; if a system touches sensitive data, it is a massive vulnerability.


This. If you have two or three engineers at a startup it's a very easy mistake to make. With thousands of engineers it's an impossible mistake to make, so it could have only been done purposely.


"could only have been done purposely". In this case there's also a chance it was negligence. Who gives a shit. Move fast. A negligent culture can probably do more damage than the odd individual doing something deliberately.


And even with a startup of two or three engineers, it's pretty much unforgivable.


It depends on how many users you have and what sort of data you're protecting. If you have a billion users then sending password reset tokens to your analytics provider is a very serious vulnerability. If you have 100 users and you're running a generic message board then it's not really a vulnerability, because the cost of that method of attacking the assets under protection almost certainly exceeds their value.


> that method of attacking the assets under protection almost certainly exceeds their value

This was most likely the reasoning Sony had used (prior to 2012) when deciding how to safeguard the account info of users who had registered for marketing promotions.

It turned out that user credentials are unlike, say, office furniture, in that the cost of a data breach can vastly exceed the "value" of the protected "asset".


That's the best practice in the security world though. There's no such thing as an unhackable system, only systems where A) the cost of attacking them is more than the value of the assets under protection B) where the time it takes to attack the system is long enough for the attack to be detected.

Cf. Bruce Schneier's book: https://www.schneier.com/books/beyond_fear/


That book was from 2003.

This is Schneier in 2016: https://www.schneier.com/essays/archives/2016/03/data_is_a_t...

Data Is a Toxic Asset, So Why Not Throw It Out?

"All this makes data a toxic asset, and it continues to be toxic as long as it sits in a company's computers and networks. The data is vulnerable, and the company is vulnerable. It's vulnerable to hackers and governments. It's vulnerable to employee error. And when there's a toxic data spill, millions of people can be affected. The 2015 Anthem Health data breach affected 80 million people. The 2013 Target Corp. breach affected 110 million.

If data is toxic, why do organizations save it?

..."


I really could not disagree with this more.


The problem is, you can't ensure that. You can only do best-effort filtering/scanning. Let's say you have an API and someone writes a client with a memory corruption bug which results in the password and some other field being sent swapped 1 in 10k times. Now you have passwords logged. You may not even be able to tell exactly where, and it's close to impossible to detect automatically. "Just don't screw up" is not possible in this case, and the chance you'll learn about it soon is really small.

Or if you think a client bug is a stretch: If you're running a large enough website which logs usernames on failed logins, I bet you have at least one password or a concatenated usernamepassword in your logs. Just because someone wasn't paying attention to the text box focus.
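For what it's worth, the best-effort filtering usually looks something like this (a minimal sketch; the field names and redaction marker are made up for illustration), and it illustrates the point above: a deny-list scrubber does nothing for a client that swaps fields before sending.

```python
import logging

# Hypothetical deny-list of field names; a swapped or misnamed field
# (the memory-corruption case above) sails right past this check.
SENSITIVE_KEYS = {"password", "passwd", "pw", "token", "secret"}

def scrub(params: dict) -> dict:
    """Return a copy of request params with known-sensitive values masked."""
    return {
        k: ("***REDACTED***" if k.lower() in SENSITIVE_KEYS else v)
        for k, v in params.items()
    }

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")

def log_failed_login(params: dict) -> None:
    # Log only the scrubbed view, never the raw request body.
    log.info("failed login: %s", scrub(params))

log_failed_login({"username": "alice", "password": "hunter2"})
# If a buggy client swaps fields, {"username": "hunter2", "password": "alice"}
# gets logged with the real password sitting in the "username" slot, undetected.
```

The scrubber only knows about key names, not values, which is exactly why it's best-effort.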


Large orgs could have a central knowledge base of all places someone could type a password (as required by code reviews, and perhaps by automated detection of password-type inputs), and a bot which attempts to fill correct and incorrect passwords for a number of canary users, and scans across all logs for plaintext canary passwords. It's shocking that someone like Facebook, which has so many partner-facing and user-facing interfaces, wouldn't have developed something like this.

Free startup idea: build a SaaS for the bot phase of this, and make the log-scanner simple and open source so enterprise security teams can deploy it freely. Offer a certification process and consulting. Hire lobbyists (and folks with inroads into insurance companies) to paint plaintext passwords as the devil and make yourself a de facto legal requirement. And make sure your log-scanner doesn't do any logging of its own :)


You can never ensure anything completely in engineering. Perfection is never the goal, but you can do a much better job than FB has here.

You're talking in hypotheticals. Nowhere did I say "nothing ever goes wrong if you know what you're doing". What about the issue at hand? What's your opinion on passwords being logged on every request for years?


> This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.

If what you meant by that is "best effort" then sure. The issue at hand is only obvious now that we're reading about it. It could easily be missed depending on how their infrastructure works.


>The issue at hand is only obvious now that we're reading about it

No, it's not. It's completely obvious to anyone who has ever worked on a system which deals with sensitive information. It can only "easily be missed" if you're not auditing your code and logs. This stuff is basic to anyone who works in industries where the data is a liability.


This is exactly how Zuck was able to steal login information in his stories about founding facebook.

I find it hard to believe this is accidental.


This is borderline illegal, if not simply illegal. What if he entered Paypal accounts or bank accounts? I don't want a world where, if I enter the wrong password, I have companies poring over my other accounts trying to breach them. Let alone if I reuse a password by accident.


Do you have a reference? I was wondering the same but have no reference.



Yep, I remember this story. So clever and yet so dirty.


Wow! That does not look accidental at all!


> I don't know anything about FB's infrastructure

This explains a lot of the comments here. Facebook’s scale is not like anything most engineers have worked on. Facebook probably has logs in the 100s of terabytes. Ensuring that sensitive data isn’t logged takes more than some occasional greps.


This feels like a slightly differently framed version of the "Too big to fail" argument and does pretty much nothing for me.

Collecting data on such massive scales is literally FB's whole business.

But with that also comes a responsibility that shouldn't simply be waved away with "But they are so big, it's so difficult!"

Because when it's about monetizing their massive amounts of often illegally collected data, FB seems to have no issues having everything in order and getting stuff to work, regardless of how "difficult" it might be.

Probably has to do with the fact that there's no money in protecting users' data properly, and FB seems to be pretty much immune from negative PR having any bad consequences.


>takes more than some occasional greps

Of course at FB scale you'd automate this by creating a set of canary accounts with unique passwords that you perform a search for in the ETL pipeline, or some other handy place. This will at least catch inadvertent plaintext password logging.
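A minimal sketch of that canary idea (all names hypothetical): seed test accounts with unique, high-entropy passwords that appear nowhere else, then scan log/ETL output for those exact strings. Any hit means plaintext passwords are leaking into that stream.

```python
import secrets

def make_canaries(n: int) -> dict:
    """Canary account -> unique high-entropy password, used only for this check."""
    return {f"canary_user_{i}": secrets.token_hex(16) for i in range(n)}

def scan_logs(lines, canaries):
    """Yield (line_number, account) wherever a canary password appears in a log line."""
    for lineno, line in enumerate(lines):
        for account, password in canaries.items():
            if password in line:
                yield lineno, account

# Fixed example values instead of make_canaries(), to keep the demo deterministic.
canaries = {"canary_user_0": "d41d8cd98f00b204e9800998ecf8427e"}
log_lines = [
    "login ok user=canary_user_0",
    "DEBUG request: {user: canary_user_0, pw: d41d8cd98f00b204e9800998ecf8427e}",
]
hits = list(scan_logs(log_lines, canaries))  # -> [(1, "canary_user_0")]
```

You'd want to log in with each canary regularly through every user-facing surface, so the scan exercises the same code paths real users hit.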


Facebook ops team: "We're waist-deep in dead canaries."


My point is that it's irrelevant. Have more data? Need more auditing. Any system which touches sensitive data is subject to security review. Yes, FB systems are massive. They should have massive oversight as well. You may as well be defending a nuclear meltdown because designing nuclear power plants is hard.

>Ensuring that sensitive data isn’t logged takes more than some occasional greps.

Right; it requires investment into process, requirements, testing, and oversight. Most importantly, it requires a company wide, top down mentality that your customer's privacy and protection is more important than your margins.

If you can't (won't) dedicate the resources required to ensure your customers data is protected then you have no business operating at such a scale.


Exactly. Can every piece of data logged be tied back to a legitimate business purpose? What’s needed here is a mentality change: These logs should be thought of as liabilities rather than assets. You should log only what you need, while you need it, and then turn off the log when you’re done. If your mentality is “log everything, always, because maybe we’ll need it later” then these privacy and security trash fires should be expected.


Log everything: you don't care about privacy.

Treat logs as liabilities: why can't you solve the issue I experienced yesterday?

You can't win. Either you log more than you think you need right now or you can't do engineering investigations on past data. You're going to end up somewhere in the middle realistically.


Everything is a tradeoff. How much cost did FB incur due to this incident? I would guess not much; at least not big enough to warrant the massive resources needed.


Well, that's the problem really; they obviously don't care.


According to the timeline of events in The Fine Article, this story only exists because someone cared. At many companies the story would end at "one diff reviewer noticed passwords getting logged in one diff". All of the numbers in this story come from an internal investigation to see where else they're making the same mistake, _so they stop doing that_. That's not what you do if you don't care.


If FB truly cared it would likely never have happened, let alone gone on for years. I'm not saying no employee at FB cares; I'm saying FB as an organization doesn't, and we have plenty of "I'm sorry, we'll do better" statements to back that up.


[flagged]


Crossing into personal attack isn't allowed here, regardless of how right you are or feel you are. Most of your comments to HN have unfortunately been like this. Would you mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the spirit of this site to heart? It's particularly important when your points are right, so as not to discredit them.


It's hard to take you seriously when your argument boils down to name calling and how obviously awesome FB is.


Only 100s of terabytes?


I work fairly extensively in health care technology and have since ~2005 and am not aware of the particular regulation that would cause the "FDA" to react to a password storage security gap. Could you be more specific about the regulations you're referring to?

Unsafe logging of PHI would be explicitly problematic, but passwords aren't PHI.


I never said the FDA was specifically interested in passwords. The FDA was brought up in reference to my work with PHI. The FDA does care about how you secure access to sensitive data however. If you leak passwords which allow unauthorized access to PHI, you're going to have a problem.

The analogy was intended to focus on practices which should be employed by companies which handle sensitive data, not the specifics of what the FDA is looking for.


These aren't "leaked" passwords; they're logged in plaintext internally, which is not great, but is not the same thing as having leaked them.


"leaked" in the sense that they're somewhere they shouldn't be. I could have chosen a better word. if I were to store database passwords on a file share the entire company had access to we would consider that a "leak" even though they weren't exposed externally.


HIPAA/HITECH would not necessarily; the analysis would have to include compensating controls and who had access. The fact pattern in this Facebook case would probably not support a violation. Again: crappy stored passwords aren't PHI.


I think perhaps I wasn't clear enough on the boundaries of my analogy and where it falls apart.

You work in the industry so you know that the FDA is not highly prescriptive. They give you a general outline of, essentially, process. HIPAA goes into more detail, but still, an organization has quite a bit of leeway as to how they meet regulatory requirements.

As far as passwords go, you're correct. Passwords in logs aren't a direct violation as far as I'm aware. If e.g. an employee gained access to data they shouldn't have access to per internal policy and used that data in an illegal manner, _then_ you have a problem.

Still though, I never meant to imply that what happened at FB would violate FDA regs. What I said was:

>I've worked in healthcare / biotech for more than a decade and I can promise you that the FDA would see no difference between the two types of gaffes.

The "two types" in question here were defined by the GP ("easy to make" mistakes like logging and something something about database design.) I wasn't referring to passwords specifically, and I don't believe that this is as bad as e.g. PHI just sitting around for anyone to see.

The analogy was meant to convey this; in a health care environment we are required to secure (encrypt) PHI and PII at rest. If we were found in violation of that, it wouldn't matter if it got there via logs or poor database design.

Again, I realize this is not an FDA/HIPAA situation. My experience with sensitive data is in that sort of environment, and I believe the same sort of mentality should be taken by FB in regards to the security and privacy of their users.

Wow, didn't expect that to get so long. My thumbs aren't made for this.


This is how professional software engineering and infrastructure implementation is done, and I appreciate you pointing it out so forcefully.


Corollary: Most web development simply lacks the rigor of engineering.


A lot of HNers probably have the job title "software engineer" without any of the qualifications of an "engineer"... that's probably why you're being down-voted.

But you're not wrong.


Most software engineers aren’t. Engineering requires licensing and exams through a governing body. Can’t have the prestige of the title without the responsibility. Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).


In the US, you only need certifications and testing for a "professional engineer". Most electrical, chemical, mechanical, or civil engineers are not certified PEs. Yet I don't think anyone is claiming they aren't engineers.


I have a PhD in chemical engineering, but not a PE, since I didn't go the industry route. I wouldn't feel comfortable calling myself an Engineer without it, and that was the attitude expressed by my professors and cohort coming out of undergrad.


It's my opinion that the title "Engineer" should be on par with things like "Doctor" or "Lawyer". It should require certification and there should be consequences for claiming to be one if you aren't.


I don't think anyone would get in trouble for job titles like "computer doctor" or "network protocol lawyer". Practicing medicine or practicing law (or in some cases, practicing as an engineer) without the required qualifications, regardless of job title ... different story.


> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

This doesn't come from software developers.

This comes from the SV culture of "move fast and break things."


I painted too broad strokes. I agree, and admit my mistake.


Just replace "software developer" with "self-proclaimed software engineer" in your comment and you'd be spot-on.


Nothing about engineering really covers security best principles and practices.

I went through classical engineering school, and I'd summarize with three categories:

* Newtonian Physics & The Mathematics Behind It (i.e. Differential Equations)

* Ethics

* Technical Communications


That's because there is no software engineering program, and other engineering programs don't deal with "security" per se.


My alma mater offers a degree in Software Engineering.


I think that was actually invented at Facebook in the early days, unless I'm mistaken. It's a good mantra, unless it's security related.


> Engineering requires licensing and exams through a governing body.

Offering engineering services to the public requires licensure. Working at places like Boeing, General Motors, Texas Instruments, and Caterpillar does not (for the vast majority of positions); It just requires an accredited engineering degree.

Note that this is independent of the software industry needing more engineering rigor, which it desperately does.


I couldn't agree more.


> Most software engineers aren’t.

"Software Engineer" is just a job title.

> Engineering requires licensing and exams through a governing body.

"Software Engineering" doesn't, it's just a job title.

> Can’t have the prestige of the title without the responsibility.

There is no real-world prestige associated with having "Software Engineer" as a job title (as opposed to "Software Developer").

> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

"Power and respect" is not actually something that is commonly associated with a guy sitting at a desk for a salary.

> Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).

You just learned that Facebook stored passwords in plaintext for years and nothing bad happened. In fact, it's hard to conceive that anything particularly disastrous could have happened. Yet, you call for more regulation that will drive up costs for everything. Whole industries could become unprofitable and those high-paid "Software Engineers" would become redundant.

I have the opposite opinion: The fact that computer security is generally poor teaches people how to properly use computers. They need to assume that there are no secrets, because that is the plain reality. No amount of regulation will make computers secure. No certification or best-practice encryption scheme will prevent users from writing their password on post-it notes, or from opening "sexy.jpg.exe", or from answering that call from the "Microsoft Service Center".


In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

The point is that the public should have confidence in engineers. Software developers who tout themselves as "software engineers" actively piggy-back upon, and undermine this confidence. It's malicious, and you should feel bad for defending the practice.

If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.


I believe there are exceptions like the "Combat Engineer" in the Canadian Forces.


> In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

Yeah well, good for them I guess. I don't think anybody else cares.

> The point is that the public should have confidence in engineers.

Why? "Engineer" is a very broad term used in a variety of occupations for very different things. If you're looking at someone's credentials and you're so uninformed that you can't tell the difference between a licensed engineer in a particular profession and a guy that has "Engineer" written on their business card, you're not qualified to make a decision either way.

> If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.

I can see the argument for why a medical doctor or a lawyer should be afforded some amount of protection, because they are directly dealing with laymen. I don't buy the same argument for the word "Engineer", much less "Software Engineer". It just doesn't really mean anything. We already have degrees and certifications for when specifics matter.


In Canada, the term engineer is protected, but in the states it is specialization dependent. I went to school for software engineering and I have my ring (I'm not a full engineer for a few more years though).

We focus a lot on testing and I just finished a mandatory ethics workshop. It's a different approach to software than I read about elsewhere. Almost no time spent on algorithms either - bad for interviews, but great for being able to write software that works.


> "Software Engineer" is just a job title.

And like the job title "sandwich engineer", it should probably be derided.

Apologies to anybody working in the honorable trade of sandwich crafting. I love sandwiches and appreciate your work. They bring me much more enjoyment than software.


I agree. The job title is literally _engine_er. Unless you oversee the operation of engines (as in locomotives), you are not an engineer.

But seriously, the definition of engineer in regular lexicon is someone who designs, builds, or maintains complex systems. I struggle to think how software does not tick all of those boxes. Sandwiches are a different story.

Perhaps you are confusing engineer with professional engineer? I can see how it might be easy to mix them up if you are not paying attention, but professional engineer carries a different meaning. Most software engineers are not professional engineers but are unquestionably engineers.


The word engine signifies a product of ingenuity. A siege engine is a clever device to attack fortifications. Engineers don't just have to be engaged with engines in the modern sense to do engineering. The PE bureaucracy is just gate-keeping by the guild masters.


A 'software engineer' need not even have a college degree, the only qualification for calling yourself a software engineer is finding an employer willing to give you that title. Even non-PE mechanical engineers at least have the bare minimum decency to get an undergraduate engineering degree before calling themselves engineers.

Of course, some people get software engineering degrees, or computer science degrees (that term is another can of worms entirely), but that is not the rule.


It becomes a meaningless word if anyone can just call themselves one. There has to be some standard, some governing body, some degree of rigour applied here.


> It becomes a meaningless word if anyone can just call themselves one.

Only to the extent that all words are meaningless. However, we keep a reference known as a dictionary that helps maintain some consistency around meanings of words.

The Oxford Dictionary defines engineer as:

1. A person who designs, builds, or maintains engines, machines, or structures.

2. A person who controls an engine, especially on an aircraft or ship.

3. A skilful contriver or originator of something.

Is software an engine, machine, or structure? That is debatable, although I would suggest that it does meet the definition of machine. Under that suggestion, software engineer is a perfectly appropriate term. Designing, building, and maintaining is exactly what software engineers do.

If you disagree that software falls under the definition of engine, there is still that third definition. Is software created by a skilful contriver or originator of something? I think that is a definite yes.

> There has to be some standard, some governing body, some degree of rigour applied here.

Professional engineers are expected to display those things. That is unrelated to engineers. Different terms with different meanings. Software engineers may not be professional engineers, although they can be.


> 1. A person who [...] maintains engines, machines, or structures.

So we should call car mechanics "mechanical engineers"? And maintaining structures? With all due respect to the janitors out there, their job is vital to society but it's not engineering.


> So we should call car mechanics "mechanical engineers"?

By definition, I don't see why not. I suspect the spirit of maintain may be a little more nuanced, but words are fluid and if someone wants to interpret it that way, then it is so.

> With all due respect to the janitors out there, their job is vital to society but it's not engineering.

If by janitor you are thinking of someone who sweeps the floors, I think that is well beyond the spirit of maintain. That, to me, is cleaning.

Maintenance of a structure is more like addressing a beam that is cracking. I doubt that is a role that an average janitor deals with, and in many cases will legally require a professional engineer to spec the resolution.

But words are fluid, so if you interpret a janitor as someone who maintains a structure, then I guess engineer fits. It really makes no difference either way.


Professional engineers do engineering.

Title only engineers work on complex systems.

This distinction has largely been lost and most "Software Engineers" think they are doing PE Engineering when they are not.


Do you think that if someone calls themselves a sandwich engineer, people will begin to believe that designing cars is like making subs? Do you really think the public can't tell the difference?


I think "Software Engineers" can't tell the difference.


Same with electrical, chemical, mechanical, and civil engineers since almost none of them are PEs?

Seems a bit silly.


Worked in two data-sensitive industries as well. There is no excuse; completely agree. This is amateur and completely indicative of a broken dev process.


Data anonymization and de-identification is nothing new. Especially when it comes to logging. I don't get why a lot of you are downplaying it in this thread. This is just as bad as plain text passwords.


The difference is intent and/or gross negligence. Storing passwords in plain text in a database requires a gross lack of security mentality in both the design and implementation stages. It is also drilled into people's brains constantly how bad of an idea it is. To put it simply, it cannot happen accidentally.

Logging like this can easily be attributed to an accident. The person who implemented this logging should get hit with some repercussions, because they surely tested the logging and must have seen the passwords when glancing over the output. But other than that, this was clearly a minor oversight.


Yes. Consider this not-unlikely scenario: The people who implemented the logging did it as a feature of a generic API proxy (there’s no way Facebook implements logging separately for each of their bazillion services!), and no doubt put in a provision for masking sensitive data. They tested it and it worked fine.

Then some devs miles and years away didn’t use that feature properly and accidentally logged passwords from an incoming request. They may not even have been looking at those request logs, because that wasn’t the request they were testing.

Then that feature went into production and this oversight was magnified millions of times.

At large scale you don’t just tail the production log firehose and look for stuff. You have to search for specifics to find anything at all. So if nobody was debugging this thing in production, it’s quite plausible nobody saw the passwords in the log.

One way to catch this sort of thing is sentinel data — in this case, have a unique value for a test account’s password and test every service with it, then search everywhere you can think of for that value.


So here is the thing: It was presumably relatively easy for you to come up with that scenario, which you called "not-unlikely". Then what you do is you put that scenario into your risk analysis when you're designing the authentication architecture, and figure out mitigations to make sure that particular mistake becomes (very) unlikely.

The notion that "it could easily happen" that is being brought up throughout this thread really only suggests that people aren't doing even rudimentary security assessments (or, hopefully, that they're not working with security-sensitive software).

If you can't solve it technically, you solve it through processes and training. The same goes for any other industry -- if a construction worker said they were just one bad morning away from dropping a two-tonne girder on a playground, we would never accept that. Or a pilot who might crash an airliner into the waiting hall when they're supposed to land. Somehow it seems that large parts of the software industry simply haven't reached the level of maturity we expect from pretty much all other industries.

Facebook is an enormous company. They should be able to have entire departments working on these topics. It's not a one-person hobby project we're talking about.


I'm sure they did figure out mitigations. They failed. Things fail. Two airliners just failed rather spectacularly, and that's the very industry you're benchmarking against.

>Somehow it seems that large parts of the software industry simply hasn't reached the level of maturity we expect from pretty much all other industries.

True, but that's a rather broad brush — in terms of actual risk of damages there is nowhere near an equivalence between "airliner crashing into waiting hall" and "logging some plaintext passwords".

Of course the culture, priorities, and domain are also very different between social network engineering and airliner engineering, which is by the way one reason Facebook could grow from nothing to mind-bogglingly gigantic in a decade, while it takes a decade to get just one new airliner into production.


The point I was making by comparing to a pilot, which I realise I could have expressed a lot more clearly, is that it is perfectly possible to mitigate risks through proper training and procedures even if it's not possible technically. (I.e. all it takes for a plane to crash is to turn the flight controls a few centimetres in the wrong way at the wrong time, yet it almost never happens.)

Of course things fail and people screw up. What I don't agree with are arguments along the lines of this just being a slight oversight, and that those can easily happen. It should require serious failure on multiple levels for anything like this to happen at that scale, if they are implementing things properly, not minor oversight.


Exactly — my scenario was an example of how failures at multiple levels could have caused this to happen. My "not-unlikely" is meant retroactively — now that it's happened, what's a not-unlikely explanation for how it was allowed to happen in a company the size of Facebook?

I didn't intend to imply it was a "slight oversight" — it's clearly a significant oversight — but there are people saying it's obviously gross negligence because how could this ever happen in a company that wasn't completely incompetent, etc. No, terrible accidents can and do happen even in companies that are trying hard to do a good job. Just like when a 737 crashes, you shouldn't assume Boeing is totally incompetent, but rather that several things must have gone wrong at once.


> The notion that "it could easily happen" that is being brought up throughout this thread should really only suggests that people aren't doing even rudimentary security assessments

Precisely.


How is this in any way better or more excusable? Calling it a 'minor oversight' could also apply to the DB being in plaintext.

Storing passwords in plaintext in a DB happens in the same manner: a dev is lazy or ignorant of the security repercussions. Which is what happened here; barring evidence that this was done maliciously, we can assume it was accidental. But that doesn't make it any more excusable, since it should be clear that logs can also contain sensitive data that needs to be protected/anonymized.

Considering how selective FB is for hiring, I would hope we could expect a higher standard.


Not to mention 200-600 million users had their passwords exposed for many years. That must be a massive trove of log files.


I see this differently. If someone stores passwords in plaintext in the database, they’re idiots that don’t know any better.

If someone logs your password in plaintext but has it encrypted in the database that’s grossly negligent.


You assume that storing passwords in plaintext is intent as opposed to gross negligence. In many past instances it’s been intentional and gross negligence because the people making all the design and implementation choices were not knowledgeable about best practices.


Because we all know we are one bad morning away from doing it ourselves. People are distinguishing between extreme incompetence (storing plaintext passwords in databases) vs people trying to do their jobs.

It's the same reason when there are large outages, where the comments are split between the enraged customers, and then the ones that are "Man, sucks to be them" as know they could easily have been the one that got the config push wrong.


The fact that sensitive data like this is just one config push away from being exposed means it's a brittle design (not just the code, but the entire design /review / deployment process - this is a platform with 1 billion users, not your 1-dev WordPress webpage). That said, this is further aggravated by the simple fact that for whatever reason, 20k of their employees had access to these logs.

All in all it just paints a picture of an irresponsible company where their house is not in order.


Exactly. It's one thing not to have enough controls in place to catch something like this on the forum for your warcraft guild. It's another thing entirely when you operate at the scale of Facebook.


"people are distinguishing between extreme vs people trying to do their jobs"

Those things are in no way exclusive. Storing passwords in plaintext is awful practise. Including the contents of form submissions in logging is a similarly awful practise. I really don't see what the difference is.


I like logging, and at our company we blanket-log all GET parameters on every request. Similarly, I think it's a Good Idea™ to never log POST parameters, but there are some pages where we've started logging some whitelisted ones because they can help diagnose errors we run across. In an environment where all data comes into the server as JSON blobs in request bodies, though, it can be a bit harder to build any sort of blanket whitelist.


Earlier in the thread:

>Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind.

Basically, the conscious decision to store passwords in plain text is worse than unintentionally doing so in logs. One is purposeful, one is not. Yes it's bad, but it's not as bad as if it were done on purpose. And generally, this is how most laws are enforced or implemented.


Can't both be attributed to ignorance? You could store passwords in plaintext in a database because you didn't know better. You can store plaintext passwords in logs because you don't know better. If I store passwords in plaintext am I being purposeful or am I being ignorant of best practises?

I think people are drawing a distinction where one doesn't really exist.


>You can store plaintext passwords in logs because you don't know better.

This whole conversation is about how it likely was not done on purpose, and that it was the result of logging HTTP requests and responses.

If they wrote logger.info(password), yea I'd say that's as outrageous.


If a single developer is able to push something like this to production without scrutiny in "one bad morning" on a platform with billions of users, the company itself is the problem.


I'm working with Rails and Phoenix on two different projects right now and all of them filter out passwords in the log files [1] [2].

I'm also working on a Django project but we're not logging HTTP calls arguments there. I think we could use a filter like [3] but I'd rather have the framework to automatically take care of that.

And if I were writing a web app from scratch, even after 25 years of doing web work, I'm sure I'd make a lot of silly mistakes. That's why I prefer to build on top of frameworks.

[1] https://guides.rubyonrails.org/configuring.html#rails-genera...

[2] https://hexdocs.pm/phoenix/Phoenix.Logger.html

[3] https://djangosnippets.org/snippets/2966/
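For the Django case, a plain `logging.Filter` can do what Rails' `filter_parameters` does out of the box. This is a hypothetical sketch, not the code from snippet [3]; the key names in the regex are my assumptions:

```python
import logging
import re

class SensitiveDataFilter(logging.Filter):
    """Masks values of sensitive-looking keys before a record is emitted,
    in the spirit of Rails' config.filter_parameters."""

    PATTERN = re.compile(r'(password|passwd|token|secret)=([^&\s]+)',
                         re.IGNORECASE)

    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the message in place; return True so the record is kept.
        record.msg = self.PATTERN.sub(r'\1=[FILTERED]', str(record.msg))
        return True
```

Attach it to the relevant handlers via `LOGGING["filters"]` and you at least catch the common `key=value` shape, though as others note below, nothing regex-based catches every encoding.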


You're right, the impact is just as bad. I think people in this thread (including myself) are downplaying it because it's a very easy mistake to make, rather than a terrible design decision.


The major difference is that we don't have standard patterns for protecting against one attack, vs the other.

With password storage everyone knows the patterns, or we expect them to.

With everything between the request and password storage, we don't.

This type of attack could easily be prevented. When secrets come in, immediately store them (ideally at the web framework level) in a type that overrides print/debug formatting. Then add a "get_raw" to it, and you can now grep for that being used anywhere outside of storage to a DB (and your DB libs should take the Secret type too). Or don't use a `get_raw` and instead use a `hash` method that returns a safely hashed version of it.

Further, your secret type could at the very least add a round of SHA256, maybe even with a pepper, just to be sure.

This isn't hard, I've done it before. The problem is that it isn't something that people feel embarrassed not to do, vs storing plaintext creds.

Impact is the same - creds are plaintext in a DB. Attackers always expect sensitive data in logs, so it isn't as if you'd get lucky and they'd miss this.
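A minimal Python sketch of such a secret type (the names `get_raw` and `hash` follow this comment, not any real library):

```python
import hashlib

class Secret:
    """Wraps a sensitive string so it cannot leak through print/repr/logs."""

    def __init__(self, value: str):
        self._value = value

    def __repr__(self) -> str:
        # Anything that formats this object sees only a mask.
        return "Secret(****)"

    __str__ = __repr__

    def get_raw(self) -> str:
        # The one greppable escape hatch; audit every call site, and
        # ideally only the DB layer should ever need it.
        return self._value

    def hash(self) -> str:
        # An extra round of SHA-256 so other code paths can compare
        # secrets without ever touching the raw value.
        return hashlib.sha256(self._value.encode()).hexdigest()
```

Any accidental `logger.info(secret)` then writes `Secret(****)`, and `grep -r get_raw` gives you the audit surface.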


> When secrets come in, immediately store them (ideally at the web framework level) ...

But, in practice, a lot of this stuff is being logged by things like proxies that could be several layers in front of the "web framework level".


Do proxies usually log POST data? Are these proxies terminating TLS?

Either way, yeah, you're right - it is not a perfect solution. But it means that for any system your engineers build, so long as they build them using your web framework that imposes this password type, you can grep your codebase for bugs.

You could implement client-side hashing as a best-effort "in transit" mechanism, though that has its own tradeoffs. Not sure how I'd feel about that approach in practice, but I can't see a big downside.


A mistake that should have been found very quickly, given the eyes on it. Not one that persists for years without apparently ever being noticed (I find that hard to believe).


When you're FB, Google, ADP, a bank, whatever, you have failed if this sort of thing is "an easy mistake to make." Responsibility should never fall on a single dev or team to begin with. This was going on for _years_.

It's either amateur hour over there or the organization simply doesn't care enough to invest heavily in the protection of their users.


This happened to Apple on the desktop, CVE-2014-1317 - In that case, they were logging the body of failed API requests (hex encoded) to a file in /var/log. It turned out that the login request occasionally hit an error, logging your AppleID password.

It was easy to overlook, since as you said, it was intended to log something else (failed API requests), it only happened in the case of an error, and it was hex encoded. I only happened to stumble across it out of curiosity.


The Apple ID team at Apple is one of the worst.


Interesting. Care to elaborate?


Anecdotally, Apple ID auth on iOS/macOS has been a mess for me. Changing my password would result in multiple login prompts per device. Sometimes inputting the new password works, sometimes another prompt comes up after a few minutes.

Also, though the flexibility of being able to use a different ID for iCloud, home sharing, iTunes, App Store, Messages, etc. is neat, it's pretty annoying to need to set it in each of those places. (And TBH it seems like being able to share purchases/access/etc between IDs is more useful than being able to have separate ones, yet the iCloud family stuff took a while to arrive.)

The issues I've seen aren't as bad now as they used to be, but they haven't left a good impression.


> and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

This is not an excuse. If you're logging all request data, you need to strip or encrypt sensitive information in that request data. Handling persistence of sensitive data is web development 101. Just because it's not in a database doesn't give you a pass to leave it unprotected.

This level of incompetency is unacceptable.


Different causes, identical security implications for the user's passwords that were improperly stored.

Both are security fails, regardless of the cause of the fail.

And a company the size of, and with the resources of, facebook, most certainly should not get a pass for the "oops, we logged more than we should have" cause. A corp. of their size, and with their resources, should be doing password handling correctly, every single time, no exceptions, no excuses.


The point perhaps is that they’re both terrible security designs and the impact to the user is the same regardless of how the “accident” happened. Nobody in FB’s position gets a pass for being irresponsible and negligent.


I disagree that they are both terrible security designs. We expect every company to not use plaintext for auth. We do not expect every company to have infra/ops setup to prevent logging on login requests.


By extension of this logic, a car manufacturer should be blamed for not designing proper brakes for their car. But if a worker then accidentally installs the brakes wrong, they are not responsible?

Imho, a company (especially as big as facebook) should have the right process and procedures to prevent these kind of problems and ensure developers have proper training to make them aware of the consequences of their actions.


Interesting analogy and I see your point. If a car manufacturer's process made it easy to install the brakes wrong they would be held responsible (probably with a recall or damages for lives lost due to faulty brakes).

I guess part of this is that passwords aren't considered that important to many people :(


Who is "we" in this scenario? Governing bodies do. Engineers I've worked with in health care do, as do our PM's, security officers, etc.

I designed a clinical testing platform a couple of years ago. Our initial requirements stated very clearly that PHI and PII were not to appear in logs. This is basic stuff for anyone who actually works at this scale / level of sensitivity.


>We do not expect every company to have infra/ops setup to prevent logging on login requests.

What? I absolutely expect every company to not log my password in plain text. In my 15 years as a developer across several companies and industries, I have never seen anybody log passwords, or advocate for logging passwords.

I'm struggling to think why any employee of any company should be able to view a plain text password in any form. Why would there not be an expectation here?


You're talking about the expectation not to log the password. That's fine.

The parent was talking about the expectation of infrastructure that validates this. That is both very uncommon and impossible to do 100% correctly. You can scan logs for a prefix (password=), you can do entropy counting, you can try to decode hex values in text. But if you find a base64-encoded hex string representing "foobar" - how do you even know it's a password?

Short of trying all possible decodings of all possible substrings against your full password database, this is an impossible task. (You can do best-effort things though)
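A sketch of what "best-effort things" might look like: try a few common decodings of each log token and check them against known credentials (hypothetical code; in reality you'd compare against hashes rather than a plaintext set):

```python
import base64
import binascii

def candidate_decodings(token: str):
    """Best-effort: yield plausible decodings of a token found in a log.
    As noted above, this can never be exhaustive."""
    yield token
    try:
        yield bytes.fromhex(token).decode("utf-8", "ignore")
    except ValueError:
        pass  # not valid hex
    try:
        yield base64.b64decode(token, validate=True).decode("utf-8", "ignore")
    except (binascii.Error, ValueError):
        pass  # not valid base64

def matches_known_password(token: str, known: set) -> bool:
    # 'known' as a plaintext set keeps the sketch short; a real scanner
    # would hash each candidate and compare digests.
    return any(d in known for d in candidate_decodings(token))
```

This catches the easy encodings but, per the point above, nested or exotic encodings will always slip through.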


Ah ok thanks, I misunderstood.


That’s certifiably depressing.


It might be depressing, but I think it's true. What do you think?


It is not true in my experience. It may seem true to the typical CRUD app web dev, but do they really know what it's like to be a custodian of so much sensitive information?


I would expect FB to have that, though.


there's two types of storing passwords in plain text

nope.

there are n types of storing passwords in plain text (including storing them in memory).

you just named 2.

all n types fall under infosec responsibility to prevent, and none are acceptable. there is no pass because logs were less intentional than storing in a table


I think they do indeed have the same outcome, but I'd count plain-text DB passwords as grossly negligent, while this was merely negligent. Clearly both are terrible and deserve a shaming, but not hashing passwords going into a DB is a decision where someone pretty much necessarily had the full scope in front of them and chose badly. This logging issue could potentially be attributed to a mistake in team communication (i.e. the people we hired to auto-log requests didn't realize that `/login` should be blacklisted for logging, and for some reason we never explicitly told them to do so).

So I agree that these actions are indeed same but different.

They ultimately speak to a failure at Facebook, but it doesn't speak to utter incompetence at Facebook.


Yeah, there's so many ways this could happen. The one that came to mind for me was an engineer who thought passwords were being filtered by the time it got to their part of the stack, but somehow didn't notice they weren't because the payload was minified or because they tested in a test environment where they knew there wasn't the filtering but didn't look at prod (where there was also not the filtering).

So many ways this could have gone down, so easy to do.


>the other is a very easy mistake to make which can go unnoticed for a long time

Sure, then the question for anyone at Facebook would be from the Bobs: "What exactly is it, you'd say, you do around here?"

FB claims to have "top minds" in essentially every discipline...

Heck, they are IMO a revolving door to the .gov/nsa/infosec community...

So... I call BS on your statement as "whoops! Easy mistake!"

WTF is it that you'd say you do around here, Mr. FB-Security-Guy??


Maybe the HN crowd needs to re-think login security. About 25 years ago challenge-response was a HUGE thing in login security. As https gained momentum and traction that went away. But the nice thing about challenge-response (at least for Javascript enabled clients) is that the password is one-way hashed before sending to the server.

I wonder if it's time to up our game again and go back to that model.


I was just thinking the same thing. There is no reason the password itself even has to be sent over the wire.


The second is way worse. You don’t need access to the database for it. Most systems make it difficult to log a plaintext password; it should be filtered in your logs. This is day 1 shit.


What I'm hearing is that, because they collected it and stored the passwords as part of their reckless attempt to log every action that every human makes online in order to manipulate them, it's more excusable than if it was in a specific and protected user credential database?

It's only an easy mistake when you're vacuuming up everything people do.


>Same same but different.

They're both issues with which any moderately competent engineer would be very familiar. Heck, I've seen commercial contracts that specified that audits be done to ensure no passwords or PII leak into log files. Everywhere I've worked in the past 20 years that logged stuff conducted regular audits to check that sensitive data wasn't being logged or inadvertently stored in a database.

The grey area might be, for example, when some data gets logged as a blob of HEX that turns out to contain a password if you run the right decoder function on it. I've seen cases like that show up in audits, though -- you see a blob of HEX or MIME and go find out what might be in it.

Anyway, it really isn't accurate to say that this is a kind of <slaps forehead> <doh, why didn't we think of that> thing. It gets thought about all the time.


The logging issue is something we've become aware of more recently, but it's just as bad. There is no intrinsic difference between the statement "don't store passwords in plain text" and "don't log PII or passwords in plain text."


All the more reason to make the client side send HMAC(HMAC(username + password) + Unix Epoch rounded to last 5-min block) over the wire in its POST to the auth endpoint.

All the transport encryption and DB encryption/hashing/salting won't protect you from this kind of logging mistake, but the above would.

P.S. There are ways to make the above even better by adding a nonce that has to be requested from the server before POST etc.
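A minimal sketch of what the client would compute (Python; the shared key and field separators are my assumptions, since the comment doesn't specify how the HMACs are keyed):

```python
import hashlib
import hmac
import time

# Assumption: a key agreed out of band between client and server.
KEY = b"shared-hmac-key"

def wire_token(username: str, password: str) -> str:
    # Inner HMAC: the stable per-user credential; this is also what the
    # server would store instead of the password itself.
    inner = hmac.new(KEY, (username + ":" + password).encode(),
                     hashlib.sha256).hexdigest()
    # Outer HMAC binds it to the current 5-minute block, so a token
    # scooped out of a request log expires almost immediately.
    window = str(int(time.time()) // 300)
    return hmac.new(KEY, (inner + ":" + window).encode(),
                    hashlib.sha256).hexdigest()
```

Note the caveat raised in the replies: the inner HMAC effectively becomes the password as far as the server is concerned.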


If they were following best practices, the server would never have access to plaintext passwords. The client / frontend would hash the password contents and send that across the wire.


That's actually not a best practice; on the contrary it's extremely uncommon. Not because it's bad, but because it just doesn't actually add a meaningful security improvement. On the other hand, it does add non-negligible complexity to your authentication system. In particular, it would have done absolutely nothing to prevent this specific vulnerability.

If you hash your users' passwords using a key-derivation algorithm on the client-side, each user's password simply becomes the original password's digest. From the server's perspective nothing has changed. Moreover the server will need to re-hash the password digest sent over the wire, because if the server is compromised the password digests can be directly replayed to the server to compromise corresponding accounts.

Additionally, since the shared secret between the client and server is the user's password digest, the password needs to be hashed using the same salt every time the user authenticates. So each user's password digest is still a de facto unique password which will be sent over the wire anyway. This scheme basically retrofits the server's job onto the client with added complexity. It's like the (slightly) faster horses version of password authentication, when we could really be experimenting with developing cars (like two/multi-factor authentication, more robust server-side controls and provable correctness).

That's not to say the scheme has no benefits whatsoever. It does mean that user passwords will be more complex, because the actual token stored by the server (the de facto password) is a digest. But there are two drawbacks - with enough users you'll still see many duplicated digests in your password database, even if you randomly generate salts on the client side. More importantly, you're offloading hashing to the client side in JavaScript. JavaScript can be very fast in 2019, but companies like Facebook and Google still maintain very low latency, substantially stripped-down versions of their websites[1] because client-side hashing isn't going to be nearly as fast as server-side hashing for a huge number of people. There's also a sizable population of people who don't even have JavaScript enabled, or who might have incompatible browsers.

tl;dr - Client-side hashing is not a best practice (and not widely deployed) because it comes with a nontrivial complexity increase, lower client compatibility and negligible security benefits. It also would not have prevented this vulnerability.

_______________________

1. For example, mbasic.facebook.com.


Your criticism (outside of complexity) of the suggestion may be unfounded, consider:

  Client: asks server for nonce
  Server: sends nonce
  ---- OR ----
  Nonce arrives with login page

  Client: sends HMAC(nonce + HMAC(username + password + appname) + Unix Epoch rounded to last 5-min block)

  Server:
  1. gets response
  2. using username as key, pulls HMAC(username + password + appname) from DB
  3. computes HMAC(last nonce sent to username + DB HMAC + Unix Epoch rounded to last 5-min block) and compares to the user's token
  4. last nonce is cleared

This algorithm would have prevented the attack (only the client-computed HMAC would appear in the logs) and is not subject to replay.
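A rough Python rendering of the protocol steps (the HMAC key handling and time tolerance are my assumptions; the comment leaves them unspecified):

```python
import hashlib
import hmac
import os
import time

KEY = b"app-wide-hmac-key"  # assumption: how the HMACs are keyed
DB = {}                     # username -> HMAC(username + password + appname)
NONCES = {}                 # username -> last nonce issued

def H(msg: str) -> str:
    return hmac.new(KEY, msg.encode(), hashlib.sha256).hexdigest()

def enroll(username, password, appname="example"):
    # The server only ever stores the inner HMAC, never the raw password.
    DB[username] = H(username + password + appname)

def issue_nonce(username):
    NONCES[username] = os.urandom(16).hex()
    return NONCES[username]

def client_token(nonce, username, password, appname="example"):
    window = str(int(time.time()) // 300)  # 5-minute block
    return H(nonce + H(username + password + appname) + window)

def verify(username, token):
    nonce = NONCES.pop(username, None)     # step 4: the nonce is single-use
    if nonce is None or username not in DB:
        return False
    now = int(time.time()) // 300
    for window in (now, now - 1):          # tolerate a block boundary
        if hmac.compare_digest(H(nonce + DB[username] + str(window)), token):
            return True
    return False
```

Popping the nonce before verification is what kills replay: the same token presented twice fails the second time.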


To be fair what you're describing is a PAKE, which is substantially different from "merely" moving the key-derivation functionality of password hashing from the server to the client. They're categorically different things. But you're right - if you're going down the rabbit hole of client-side hashing, you might as well implement a PAKE instead.

This kind of gets to the heart of what I was referring to when I said client-side hashes are like faster horses rather than cars. If you're spending this much effort, a superior protocol is better than an unorthodox, modified one. SRP is a PAKE which basically takes your proposal and moves it into a different layer of abstraction (TLS), and OPAQUE makes improvements upon it which allow you to use elliptic curves[1]. There are other reasons not to use PAKEs, but they're a much more coherent and defensible suggestion than just bolting the key derivation system onto the client rather than the server.

______________________

1. https://blog.cryptographyengineering.com/2018/10/19/lets-tal...


This is the same as:

    Server sends nonce
    Client sends HMAC(nonce + password + time)

Your inner HMAC becomes the new password, which is now stored in plaintext in the DB. You just call it something else.

There are better ways to implement this idea, like SRP/PAKE https://en.m.wikipedia.org/wiki/Secure_Remote_Password_proto...


That’s hardly a best practice and if followed consistently just means that the hash is your actual password, which means if someone steals the hash from a log file they can still impersonate you.

Unless you actually mean some sort of challenge response scheme which is rather uncommon to see, e.g. http “digest” authentication or SRP.


The big problem of a service having your true password (instead of the hash of the password) is that many users use the same password for a multitude of services (read Gmail, Hotmail, Yahoo, Amazon, ...)

So it's bad practice to keep or transfer the user's cleartext password. It should never leave her browser/client. Period.


If the service has only the hash of the password, then that's the password, which would then be subject to the same problems as a "cleartext password".


Password managers and service specific 2FA solve that problem quite nicely for now (edit: although yes most users aren’t willing to do that).


The irony of this post is that most people would probably beef up the security of their DB, far beyond whatever layer of security they prop up for their logs.


A company with a lower acceptance rate than Harvard is "accidentally logging passwords"? Yeah, sure :D


Just because it's an easier oversight to make doesn't mean FB should be cut any more slack over it.


What are logs but a database with a particular schema?


This isn't a case of a company intentionally storing passwords in plaintext in their database. That's a big security no-no, because password hashing is such a basic and fundamental part of password-based authentication that it's nearly impossible to miss if anyone involved in maintaining the system is remotely security-conscious.

This situation, on the other hand, seems to be more of an unintentional capture of passwords by their logging system. That's much harder to notice and more akin to a bug or security vulnerability, which is something pretty much every system suffers from sooner or later.

In such cases I'm more concerned with the company's response to the vulnerability than the fact that it exists at all. Here it seems like Facebook noticed the problem and proactively fixed it before it was exploited, which to me is a good sign.


Impossible is just a couple of mistakes away. Paper holds whole worlds.


Agree this is pretty much inexcusable.

Logging request or response payloads without an explicit whitelist should raise flags for any developer. There are very few cases where you can assert that not only in the present but also for all future use cases of a system, the entirety of a payload will not contain sensitive user data.

Only a whitelist will suffice to maintain good security. It's common for developers to attach sensitive data for debugging and other use cases under arbitrary paths.

Systems can improve further by adding patterns and other heuristics to drop values from the whitelist that look like sensitive data.
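A sketch of that default-deny approach with a crude entropy heuristic layered on top (the field names and the 4.5-bit threshold are illustrative choices, not from any standard):

```python
import math

ALLOWED = {"path", "status", "user_agent"}  # example whitelist

def shannon_entropy(s: str) -> float:
    """Bits per character; random secrets score high, normal text low."""
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def loggable(payload: dict) -> dict:
    out = {}
    for key, value in payload.items():
        if key not in ALLOWED:
            continue  # default-deny: unknown fields never reach the logs
        if isinstance(value, str) and shannon_entropy(value) > 4.5:
            out[key] = "[REDACTED]"  # heuristic: high entropy, secret-like
        else:
            out[key] = value
    return out
```

The whitelist means a newly added sensitive field is safe by default; the entropy check is the extra heuristic for secrets that sneak into whitelisted fields.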


I would argue that an organization the scale of Facebook is the most likely place for this mistake to happen.

Log in and sign up pages -- especially on mobile web -- are being constantly iterated on by "growth" and "emerging markets" teams whose first priority is getting graphs to go up and to the right, not making sure the pages are secure. Their entire mission is to make things work with yesterday's technology (feature phones, old Android tablets, and so forth).

On the other hand, the dedicated security folks are focusing on ensuring there are multiple factors protecting high-profile targets, so they're working on SMS as a second factor, Login Codes, Security Keys, and primarily focused on Desktop Web, iOS, and Android. Or they're doing fuzzing, or building robots to chase down engineers for buffer overflows and the like.


Dude, you'd be surprised how much data your average company leaks in logging.

Ask any HIPAA-compliant company how they scrub errors that include medical data from their logs and you'll truly see how bad things are.


This is, supposedly, one of the top 4 global tech companies. Not some regional med-tech enterprise shop with a fleet of ageing WindowsNT servers. It is literally inexcusable.


It happens at fortune 5 medical companies. When you sign up to Facebook you know you’re signing up to Facebook. When you go to the doctors office you are probably completely unaware that your PHI is being shared with a couple dozen other companies often intentionally and sometimes unintentionally. If you are a human in the United States there is a high probability your PHI is sitting in a log file on a server owned by some company you’ve never even heard of.

Fwiw, none of this is necessarily a breach of HIPAA laws.


Wait till you get into companies that allow doctors to have other doctors review your charts... your data gets silently shipped to a third party company who shares it with doctors... and in most cases you don't even get told it happens.


OT: how much of a pain is it under HIPAA to deal with medical data provided by the patients themselves through channels that aren't supposed to be used for such data?

I've never dealt with medical data, but do deal with credit card data so have to deal with PCI. With credit card data we run into customers who send emails, or use online chat, or leave voice mails telling us they have a new card and giving the number. We never ask for them to do this, and indeed everything we tell them says never to send us a credit card by such means, but they do it anyway.

That brings all those systems into scope for PCI, which is a pain in the ass.

We use helpdesk and support chat software licensed from a third party. We had to write scripts that understand its DB schema and data formats and can find and remove credit card information, and keep them up to date as the helpdesk and chat software is updated.

A few years ago, the vendor tried to discontinue the stand-alone version and move its users to their cloud service version, but had to drop that plan when they found out that a lot of their customers were doing the same thing we were, and absolutely could not move to a cloud service unless that cloud service handled PCI issues. Apparently they didn't want to deal with making their cloud service handle that, and there had been no talk since then of dropping the stand alone version.

Do people dealing with HIPAA run into similar issues?


Yes, this is why people build in HIPAA-compliant clouds like AWS... but I left that business years ago. You should always audit your third-party services to see what data is sent in debug logs.


Just as alarming for me is that Facebook engineers don't seem to understand risk management:

"In this situation what we’ve found is these passwords were inadvertently logged but that there was no actual risk that’s come from this. We want to make sure we’re reserving those steps and only force a password change in cases where there’s definitely been signs of abuse."

Inadvertently logging passwords is a risk. If those logs were accessed then that's a bigger risk. Signs of abuse is an issue. There is no such thing as an "actual risk", there are just probabilities (and possible consequences). Once a consequence happens, it is no longer a risk -- then it's actual.


Because storing plain-text passwords in the DB is a sign of pure incompetence, while accidentally logging a password is a huge screw-up, but an unintentional one - and frankly a type of mistake that is not unimaginable for any of us. You need to log full requests to urgently debug some issue, and in all the mess you forget to remove the logger afterwards... and boom, you've got yourself a log full of passwords, credit cards, and all other kinds of secrets. In the real world it's just like that: humans make errors. That's why your logs should always be heavily protected and regularly audited.


I had this kind of issue, a sloppy logging tool, when I had <100 customers. It was fixed before we got to 200.

Early FB (c. 2005) could get the excuse: new team, just getting started. But there's no excuse after the first round of funding.


We scoff at companies that intentionally store passwords in plain text, then compare that stored value with the credentials provided to authenticate a session; this is a common mistake for novice engineers. We are dismayed by, but I wouldn't quite say scoff at, accidental logging of secure data, which is a much more common accident, perpetrated by otherwise educated and advanced engineers every day.


Yeah, I agree this is pretty much inexcusable. I'm not sure why they get the pass.


Facebook didn't mean to log failed sign-in information. It just kind of happened.


People scoff at companies that store passwords in plain text on purpose. Inadvertently logging them is not the same thing. And Facebook’s scale is why it’s understandable how it could happen.


Someone had to write the code and then use the logs: individuals, like at any company. Then the article says it was actually searchable and used for something by employees. Why does the entire company's scale make this excusable?


I don’t think “pretty much”, it’s just inexcusable.


the_duke is not "the community"

so saying "the community usually scoffs at entities which store passwords in plain text - why do you give Facebook a pass?" is a fallacy of composition.


I don’t get it!!!

Facebook is supposed to be hiring the best developers. It has tons of money. This is the most basic security consideration - hashing passwords on the client at least. At LEAST.

I am constantly surprised by how basic practices were ignored by these corporations that got so big, while the small guys implement them. I guess people really were "dumb fucks" to trust Zuckerberg with their passwords.


so yeah, devs have access to the password of your account, but really they don't need it; they probably have access to the hash in the database and could log in anyway. So really NO harm has been done


Having (read) access to the hash shouldn't give any access to the account.

Edit: You've made this comment twice in these threads. It's wrong.


Yeah I was wrong, it is only true if the dev has access to the prod database


It is neither a standard practice nor a best practice to give your devs access to your production auth databases.


Perhaps not in more mature organizations, but it's standard practice at every startup I've ever worked for. One place had the dev office VPN'd into production at all times.


Did any of those startups have 200-600 million users?


No, these were small organizations. I'm just saying it's not uncommon. Sadly, some places don't even have a dev environment...


This does not sound like logging, because I would imagine that logs wouldn't be kept for years (2012!). The article implies that the passwords were stored and searchable in clear text. Yes, logs can be searchable, but this isn't what the article implies.


The article doesn't claim that passwords from 2012 are viewable today, just that they have been storing them in plaintext since 2012.


I really don’t know about this. Encrypting our log information was literally one of the first things we did when starting our latest product.

We store maybe a few million customer records without any significant PII.

Now there is Facebook which apparently hasn’t even bothered? And is logging the full body of requests somewhere? That’s a major WTF, even for Facebook.


>prevent bots/spam, abuse, etc.

Do they inspect the passwords too?


The fact that the top comment is Facebook apologism informing others how this is "understandable" makes me wonder if there's something more than organic voting going on...


Please don't break the site guidelines with insinuations about astroturfing.

https://news.ycombinator.com/newsguidelines.html

This is extremely well-trodden territory for HN: https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme.... The short version is that insinuations without evidence (someone posting a view you disagree with is not evidence) are at least as toxic and far more common than the thing being insinuated.


To think the only organic response is mob mentality against anything Facebook is silly. This is HN. We're here to discuss technology. As much as I stay away from Facebook these days, this particular flavor of security issue has happened to the best of companies in the past year, and it's natural to discuss that and why.

To genuflect with "I hate Facebook too" before having a rational discussion is worthless virtue signaling.


There are still lots of "organic posts" on Reddit threads about Facebook's fuck ups implying that the "anti-facebook narrative" is a "Soros conspiracy", which leads me to believe Facebook is still working with the "Definers".


Dumb mistake?

This was a feature for the government. Plain and simple.


Yes because the government would want Facebook to store user passwords. But only some of them, specifically facebook lite users, you know, the users that can't even afford a modern phone and reasonable internet. And they would want to use the passwords directly, instead of having some other backdoor for getting data from these accounts, so that way the users they spy on can see that they're being spied on because of the suspicious logins using their password. You've cracked the case =)


I think the technical security problem here (properly whitelisting parameters in logs) is just a symptom and not the core underlying concern (as the article mentions, Twitter and GitHub just dealt with similar issues).

To me, it seems very likely someone before 2019 laid eyes on these logs and either:

  a) Decided not to report it (implies serious security culture issue)
 
  b) Reported it and no action was taken (implies serious security process issue)

  c) Didn't even acknowledge it was inappropriate (implies a serious security training failure)
If you've already become complicit in regularly violating the privacy of your end users, one can easily understand an employee devaluing the seriousness of clear-text passwords in a log.

Are FB employees so regularly exposed to sensitive data that they have become desensitized to the seriousness of clear-text passwords in an internally accessible log?


This is very basic stuff, all user-identifying information should be tokenized before being logged. Controlling access to production login logs or exposing them on only a need to know basis is another basic security principle. Sounds like FB is actually a wild wild west internally.


What is now common practice was nonexistent a few years ago. 20 years ago you could bypass the Windows login with a few clicks. 15 years ago CORS was nonexistent. 15 years ago it was common to send sensitive data unencrypted... and so on.

Ex-Facebook people told me that until around 2010, FB management turned a blind eye to employees digging around their databases. Then they issued a warning that people should stop, and started locking prod data down. A few months later, those who still peeked at prod data were let go.


As the article mentions, Twitter disclosed a similar mistake a year ago...and GitHub before them. These are just the organizations responsible enough to say something.

Can we take a moment to acknowledge that this is an easy mistake to make? A logger doesn't care if it's a password or not. Strings are strings. As long as the answer is "humans should be more careful" we'll be seeing these kinds of disclosures regularly across the industry.

My best attempt to address this in my teams has been to use different data types for different data classifications. Naked strings must be loaded into one of these data types after input sanitization. That makes it easier to catch accidental inappropriate use. This is useful for managing PII as well.
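As an illustration, a minimal version of this idea in Python (the `Secret` type and its method names are hypothetical, not any particular library):

```python
class Secret:
    """Wrapper type for sensitive values (passwords, tokens, PII)."""

    def __init__(self, value: str):
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value out; easy to grep for in review.
        return self._value

    def __str__(self) -> str:
        return "[REDACTED]"

    __repr__ = __str__


# A careless log line now leaks nothing:
password = Secret("hunter2")
print(f"login attempt, password={password}")  # prints: login attempt, password=[REDACTED]
```

Linters or reviewers can then flag any logging sink that receives a `Secret` without an explicit `reveal()` call.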


Interestingly, Apple seems to have something like this in place where format strings, by default, are not logged (they show up as <private>). I'm thinking that something like this might be useful to have in general.


That's why modern languages need something like a typedef!

At least C# has a concept of a SecureString for things like passwords.


How does SecureString actually work? Presumably it's possible to coerce one into a "normal" string, and I'm sure that someone will end up doing this in the codebase because it's more convenient to work with.


The point of SecureString is to ensure that sensitive information gets removed from RAM in a timely manner. It's just a wrapper around some native memory that gets zeroed when the SecureString object is disposed.

Otherwise, when you use standard C# strings, you don't know when the garbage collector will collect them.

(Granted, you could do similar things with pinning a char[] to a char* and zeroing it before you end the pin, but who wants to jump through those hoops?)


But what if I want to pass a SecureString to someone? Doesn't a copy have to take place in that case?



> The general approach of dealing with credentials is to avoid them and instead rely on other means to authenticate, such as certificates or Windows authentication.

Sure... I bet you eat right and get plenty of exercise, right?

Anyway, that link appears to be best practices when writing cross-platform code.


Yes, this is an industry-wide problem. Many logging APIs are designed for convenience, not security. We need systematic solutions that are still easy to use (or they won't be used).


I mean, yeah it is easy to make but it is also easy to fix. How long did this occur prior to correction? 1000's of searches for said data in the logs. I suspect we are only hearing about it because Facebook, all of a sudden, has been publicly humiliated multiple times for poor engineering/business practices and has decided that it is time to grow up and take privacy seriously. Otherwise, they risk their existence.


Yet another example of why it's important to use _unique_ passwords for every site you have an account on. Even if the site you're using does password storage properly, that's no guarantee plaintext credentials couldn't leak through other means, such as, in this case, improperly configured logging systems.

In the future, WebAuthn may be able to solve this problem for good, as sites will only have access to a unique public key rather than a plaintext password. Until then, a password manager is your best defense against this type of issue.


This is so prevalent in technology companies that it is funny to read this thread, with everybody throwing mud at Facebook without considering that most probably the company they are working at has had (or maybe even currently has) the same issue.


That's a fallacy my dude. You can throw shade at your own company and Facebook for the same reason. It doesn't and shouldn't minimize what Facebook has done.


Posted this on the other thread from Facebook, but at what point do we start imposing strict fines on companies that are found to have done this?

Granted, I guess we wouldn't be hearing about this instance at all if there was to be some sort of fine attached - it would have just been swept under the rug - so maybe that's not a good idea. I'm just tired of the "oops we stored your passwords in plaintext lol" from companies with engineers that should clearly know better.


We can start fining companies for their mistakes when developers can start getting fired for any mistake immediately.

I don't care about Facebook, but storing user passwords in the plain is not a "privacy violation": even without the password, they still have access to all your data. Storing user passwords by logging them is a stupid amateur security mistake.

I could understand fining them if Facebook had stored the passwords and used them to access your other accounts with permission to invite users. But mistakes are mistakes, and they owned up to it.


Storing passwords in plain text is beyond a simple slap-on-the-wrist mistake, and it has real security implications.


>>We can start fining companies for their mistakes when developers can start getting fired for any mistake immediately.

Sounds good to me. Plenty of people in other industries get fired all the time for violating best practices, regulations, and so on. Why should software be any different? Are we special?


Most tech companies have a “blame the process, learn, and fix the process” approach. I’m not sure what industries you’re talking about but manufacturing and aviation seem like they have a similar process.


An internal tool that allowed for logging clear text passwords was a mistake. A culture that allowed said system to exist for 7 years without being surfaced by any type of internal security audit is something else entirely. Financial penalties could/should/would target the latter not the former.


I wouldn't mind that; more responsibility on the software developer means more leverage to push back. I guarantee you I'm not rushing for a deadline if I think I'm compromising security in a way that may put a black mark on my career.


It's likely to have been a breach of GDPR, so if this situation had existed when GDPR was in force, the answer to your question would be "at this point".


I guess that's true, yeah. Will be interesting to see if there are fines from the EU.


It all depends on if one can prove negligence.


If being bad at your job is a crime then lock me up.


Depends on the job. If you're a licensed professional, this could very much be the result of you "being bad at your job".


If you get a license and are bad at your job then lock up the licensor.


I agree, it's time for there to be criminal negligence penalties for these most egregious failures of even basic security practice.


So they logged plaintext passwords - what a dumb mistake. To top it all, it sounds like 20K employees had access to these infra components. I have had my doubts about FB's internal access control model for a while now, especially after one particular story where an employee claimed to have internal access to private information and used it to stalk people on Tinder - https://www.wsj.com/articles/facebook-fires-employee-who-bra...


This seems like the passwords were inadvertently written into some log files. While this is a very bad security issue for a company like Facebook, I am pretty sure that this type of bug is much more prevalent in the industry than we would like to assume.


It's just such an easy mistake to make. Some new ops guy logs all the traffic through a subsystem and boom - plaintext passwords.


Yeps. And at Facebook scale, by the time you realize the issue after it has gone live, you have already "stored hundreds of millions of user passwords".


That's a good point in favor of less hacking things together and more engineering - Facebook seems not to have embraced strict engineering processes, which are the only way to tackle these things. Facebook has to change its own principles to succeed long term.


I would say that this needs to change on a broader scale within the software development community than just with Facebook.


An easy mistake with serious potential consequences. I'd be interested in hearing how Facebook will prevent this from happening again in the future


A lot of folks are angry about this, but I'm not seeing much in the way of "this is how it should be handled" aside from "be vigilant." The closest is "encrypt logs," but those logs exist to be seen, else they wouldn't exist.


Hash client side. Send the hash.

This is how it has been done for 20 years (by those who truly care about their users).


If this is so common, it should be easy for you to provide a link to the javascript password hasher on a site you use. Does this site do that? Pretty sure it doesn't. But you sound very confident about this, so I'm sure you'll have no trouble finding some other site that handles its passwords this way.


What?! No!!!

As many have said before, transmitting the hash simply turns the hash into the password itself. Anyone who has your hash, has your password.

The biggest reason for hashing is that, even if an attacker gets access to the hashed passwords, they still can't use them to log in. Client-side hashing completely and utterly undoes that benefit.

Passwords in transit should be protected by SSL. Not a hash.


You should check out SCRAM. It solves this problem without sending the hash or the plain text.


Fascinating. Yes that does solve the problem, and most importantly doesn't send the hash (unlike what the commenter I was responding to was suggesting, which continues to horrify me).

Most interestingly, it solves the problem where the server may not be trusted. I see how this would have protected against the Facebook problem.

I'm curious, have any major websites you're aware of adopted SCRAM as a best security practice? It feels kind of like overkill, since generally the server is considered to be trusted... but at the same time, it would definitely prevent accidental logging.


I have not heard of any website using SCRAM, but recent versions of PostgreSQL use it for password authentication.


The solution is trivial: a watchdog. Have a watchdog log into a dummy account with a "well-known" password and then inspect logs for that well-known password.
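A minimal sketch of such a watchdog in Python (the canary value and log format here are made up):

```python
CANARY_PASSWORD = "canary-3f9a1c-not-a-real-secret"  # well-known dummy credential

def scan_log_for_canary(log_text: str) -> list:
    """Return the line numbers where the canary password appears in cleartext."""
    return [lineno
            for lineno, line in enumerate(log_text.splitlines(), start=1)
            if CANARY_PASSWORD in line]

# the watchdog logs into a dummy account, then scans what the services wrote
sample_log = (
    "INFO login ok user=watchdog\n"
    "DEBUG raw request: user=watchdog pass=" + CANARY_PASSWORD + "\n"
)
print(scan_log_for_canary(sample_log))  # prints: [2]
```

The key property is that the watchdog tests the pipeline end to end: it doesn't matter which subsystem did the logging, only that the canary surfaced somewhere it shouldn't.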


That doesn't necessarily help when logs are sampled. Eg, for high volume services or endpoints, only 1 in X,000 requests might be logged, and you have no theoretical guarantee that you would ever catch any one individual request in your logs.


Good point, though I consider that as just an implementation detail. Bypass sampling for designated system accounts and you will be able to see if systems leak cleartext secrets into logs or not.


But then it's unbelievable that they didn't have a watchdog like this in place, given how big and user-facing that company is. Also, out of those 2,000 employees with access, at least some would have noticed, wouldn't they? And it still took far too long to correct.


Why does traffic going through the subsystem contain plaintext passwords?

It was my understanding that my password, on a properly configured login page, never left my browser, much less crossed multiple machines that had the ability to read it.


No, your browser sends the password to the server in the HTTP request, only protected during transport by it being HTTPS. On Facebook's side, the HTTPS gets decrypted and the request is passed on (again potentially encrypted in transport, but the machines handling the request obviously need to see what is in the request, which includes your PW)


Ideally, Facebook's website, or any website, should encrypt the password client side and send that to the server, where the server decrypts it internally and authenticates the login session?

Unfortunately, it is 2019 and client-side hashing is rare because people use SSL instead.

One can argue that a company like Facebook, well stocked with tech resources, should have figured this out already, but here we are.


Let's say a company does implement client side hashing. Now the server just needs the hash to log you in. Isn't the hash now basically the password? Now if the hash leaks or gets logged, a malicious user can still login to the company service using that hash. Only difference is, since it's not a plaintext password, with proper salting the user is somewhat protected from having their other services with similar password hacked.


Consider this approach: Let H be a hashing function and p be the user's password. In the database you store a nonce n, and H(H(p⊕n)). When authenticating, the server sends n, and the client responds with x=H(p⊕n). Now the server can compute H(x) and compare with the stored value to authenticate the client. Finally, after it has been authenticated, the client generates a new nonce n', and sends H(H(p⊕n')) and n' to the server, which is stored in the database for the next login.

Replace the outer H with a proper key derivation function for extra credits.

This avoids sending any secret value over to the server, so no server side logging will cause a problem.
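A toy version of this scheme in Python, with SHA-256 standing in for H and one assumed convention for padding the nonce in the XOR (illustration only, not production crypto):

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(p: bytes, n: bytes) -> bytes:
    # XOR the password with the nonce, repeating the nonce as needed
    return bytes(a ^ b for a, b in zip(p, (n * len(p))[:len(p)]))

# enrollment: server stores the nonce n and H(H(p XOR n))
password = b"correct horse"          # known only to the client
n = os.urandom(16)
stored = H(H(xor(password, n)))

# login: server sends n; client answers x = H(p XOR n), never the password
x = H(xor(password, n))
assert H(x) == stored                # server side: one more hash, then compare

# rotation: client picks n' and sends (n', H(H(p XOR n'))) for next login
n_next = os.urandom(16)
stored, n = H(H(xor(password, n_next))), n_next
```

Note that an attacker who steals the database entry (n, H(H(p⊕n))) cannot compute the response x without p, which is the point of the double hash.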


After reading the comments I have an idea. I'm sure someone thought of it already though. If the password is p and the hash function is H(), server stores the hashed password H(p).

When the login screen loads, the server sends its time, so with reasonably fast internet the client can estimate the server time; call the estimate t. On login, the client sends H(H(p)+t) along with t. Now the server can compute H(H(p)+t) with the t from the client, verify that the hashes match, and also check that t is within a few seconds of the current server time.

This way, if any data that goes over the network leaks or gets logged, it'll only be valid for a few seconds. Also, salting before hashing should go somewhere in there, but it'll make it a bit more complicated.
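A toy sketch of that idea (plain SHA-256 standing in for H, no salt; illustration only). One caveat: H(p) on the server is now effectively the long-term secret, so it still has to be protected like any password hash:

```python
import hashlib
import time

def H(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

stored = H("hunter2")                      # server keeps only H(p)

# client: estimate the server time t and send (proof, t)
t = int(time.time())
proof = H(H("hunter2") + str(t))           # client knows p, so it can compute H(p)

# server: recompute from the stored hash and the client's t, then check freshness
assert proof == H(stored + str(t))
assert abs(int(time.time()) - t) <= 5      # reject stale or replayed proofs
```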


SCRAM has solved this: with it, the server can verify that the client has the plain-text password while only storing a hashed password and without exchanging a plain-text password.


FB could also rotate the hash any time, with a short overlap period to allow for already-loaded pages, to eliminate this issue.


No. It would be nice if this were the case, but browsers sadly just send the password to the servers instead of using something like SCRAM, which only requires sending the password when setting it, not on every authentication.


Facebook's own announcement is titled "Keeping Passwords Secure"[0]

Facebook comms: You can't really use the word "keep" when the entire post is about how you have failed to keep passwords secure, by storing it in plaintext.

Or at least, you probably shouldn't use that word. It signals that you intend to keep doing what you're doing, which wasn't just keeping passwords insecure up until sometime in January.

It's up until today that you effectively disclosed that you hadn't notified hundreds of millions of users that their passwords were compromised, for months after discovering it. I know they "were never visible to anyone outside of Facebook", but that group still includes some 25K+ people.

[0] https://newsroom.fb.com/news/2019/03/keeping-passwords-secur...


Facebook comms knows that. That's why Facebook comms used the word "keep." It's also why they said "some" passwords and not "hundreds of millions" of passwords.

Facebook comms also knows that "there is nothing more important to us than protecting people’s information" is a bald-faced lie, since Facebook is literally in the business of selling people's information to companies like Cambridge Analytica, but sometimes it's okay to lie as long as it sounds like a meaningless platitude so that nobody would be expected to seriously believe it.


And they found 2,000 employees accessed the data. I don't care how secure they think the passwords are in those 2,000 people's hands. You can bet those have been sold on the black market if they leaked to that many people. It just takes one unscrupulous person, which is why you hash passwords: even if they leak, the leaker can't get the original!


Why inform them and not reset their passwords as a standard security measure? I think the headline makes it even worse because it's so tone-deaf.


To me it’s no surprise these things happen, considering these companies only care about hiring leetcode experts.

Who knew data structures and algorithms don’t qualify you to build a secure real-world web application?

Knowledge of practical things like security has zero value to these companies when hiring engineers.


Well, if you hire enough people, you're not going to be 100% successful at selecting the good ones... and even good people make mistakes at times. Almost impossible to prevent. The most actionable thing you can do is build fewer things and focus on using off-the-shelf microservices (either in-house or third party) for everything.


But as someone else said, even though it may be a common mistake (I've seen similar mistakes at my work), how is it that none of those thousands of engineers raised the alarm and got traction sooner on securing it? Companies need to incentivize engineers not to have a "not my problem" attitude.


Reminder that Facebook also stores(stored?) 3 separate hashes for your password: passwoRd, PASSWOrD, and PasswoRd. They weren't super transparent about this fact.

Source: https://www.zdnet.com/article/facebook-passwords-are-not-cas...


I don't really see a reason they needed to be super transparent about that fact. It doesn't really have a significant security impact. It's just a small QOL improvement they implemented.


Honestly, given the possibility of accidental caps lock, and how mobile keyboards try to capitalize the first letter...

...that seems like a clever feature, and it doesn't reduce security in any meaningful way at all.

I would never bother to program that, out of my own laziness, but I respect that they did. It's really thinking about users.
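For what it's worth, accepting these variants doesn't require storing extra hashes: a server can keep one hash and try the likely typo variants at verification time. A hypothetical sketch (SHA-256 standing in for a proper slow hash like bcrypt):

```python
import hashlib

def H(p: str) -> str:
    # a real system would use bcrypt/scrypt/argon2 here, not a fast hash
    return hashlib.sha256(p.encode()).hexdigest()

stored = H("PasswoRd")  # single hash of the password exactly as enrolled

def variants(typed: str):
    yield typed                                 # as entered
    yield typed.swapcase()                      # caps lock was on
    if typed:
        yield typed[0].swapcase() + typed[1:]   # mobile auto-capitalization

def check(typed: str) -> bool:
    return any(H(v) == stored for v in variants(typed))

print(check("pASSWOrD"))   # caps-lock inversion -> True
print(check("password"))   # a genuinely different password -> False
```

As the thread notes, this only shrinks the search space by a small constant factor, so the security impact is negligible.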


Good to know. Someone in Facebook probably writes all his emails in all caps.


You don't need to store 3 hashes; in fact, you would need many hashes for all possible cases. It is easier to drop all cases pre-hashing, and only store a lower-case hashed value.


Hashing the lowercase version of the password string is not the same as what they’re doing. What you’re suggesting greatly reduces the security of the password; what they’re doing only divides the search space by 3 (which is nothing).


My Instagram was compromised a couple of years ago. My password was very secure and completely unique to Instagram so I found this extremely odd. Someone had logged in from the Bay Area. I believe it was compromised twice within 24 hours and then my Facebook was too with a completely unique password as well. Facebook emailed me to change all passwords. I have perhaps over 50 acquaintances working at Facebook. Could someone internally have accessed it as a prank? I don’t remember all the details but I did change my bank account immediately after as well just in case.


I find it pretty shocking that other commenters are looking at this as excusable. I mean, is that OK/excusable at your company? Logging payloads/bodies of sensitive requests in plain text - 0 obfuscation. That's ok? Wow. Other commenters are saying "it's logging so it's a forgivable mistake". Is it though? Obviously the world won't end because of these decisions, but holy hell I can't believe this wasn't caught/brought up in some type of code review. This seems pretty 101-ish


This kind of thing can often be hard to catch in a code review, because often it's the combination of several systems that cause this to happen. Tracing the user's password from submission form all the way to logger would probably require jumping through several layers, most of which are just handed black box blobs that they hand to the next system.


Personally, as a developer, I'm not thinking Facebook is bad, but rather that in general we are bad: we pass plain-text passwords to endpoints (through SSL) and just trust the data will not be abused (logged). It would be cool to have client-side encryption prior to sending to any third party: a way to do it cleanly on different devices so that a password, even if it's the same everywhere, is always encrypted uniquely for a particular company before going over the wire. That way, even an internal compromise would only affect that particular service. This is probably just a legacy problem in the end, where a new approach, like a YubiKey or a built-in protocol, would simply make it obsolete.


All the answers I see on Stack Overflow say there is no gain in hashing or encrypting prior to sending. I'm not sure I can agree with this. Let's say you hash a password and some attacker gains the hash. They can now log into that website, BUT NOT everywhere else where you happen to use the same password. Jeez, am I wrong here?


I have looked into this in the past and came to the same conclusion. Essentially, the "password" sent to the website is the client-side salted/hashed version of the user's actual password. The server could then salt/hash the "password" another time before storing it. This could result in the same issue if the "password" is logged, but it protects the user's true password from being discovered. Maybe a security expert can weigh in on this, because I don't understand why this wouldn't be the standard.
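A rough sketch of that two-layer idea (plain SHA-256 standing in for slow KDFs; the domain-separation string and salt handling here are assumptions of mine):

```python
import hashlib

def client_derive(password: str, username: str) -> str:
    # runs on the client: a site-specific "password" derived from the real one
    return hashlib.sha256(f"{username}:example.com:{password}".encode()).hexdigest()

def server_store(derived: str, salt: str) -> str:
    # the server only ever sees `derived`, and still salts and hashes it again
    return hashlib.sha256((salt + derived).encode()).hexdigest()

record = server_store(client_derive("hunter2", "alice"), salt="s3rv3r-salt")

# login: client re-derives, server re-hashes and compares
assert server_store(client_derive("hunter2", "alice"), "s3rv3r-salt") == record
```

If `derived` leaks from a log it can still be replayed against this one site, but the user's real password, possibly reused elsewhere, stays private, which matches the trade-off described above.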


Ideally the server doesn’t need to remember the salt it sent to the client, so it should be signed together with a timestamp to avoid reuse.

While you’re at it you can also add some hash puzzle to be solved by the client increasing difficulty with failed logins.


Facebook posted a response with the title "How we secure passwords".

The euphemism of company press releases always make me laugh.


What irks me is that they're not going to issue a mandatory password reset, since they only do so "in cases where there’s definitely been signs of abuse."

Even if it's an internal server, having passwords stored in plain text always merits a password reset. Storing them on a file server unencrypted somewhere is just as bad (and arguably worse) than storing them in plaintext in the database.


We're sorry.


We take security very seriously.


> There is nothing more important to us than protecting people’s information, and we will continue making improvements as part of our ongoing security efforts at Facebook," Pedro Canahuati, the company's vice president for engineering security and privacy, wrote in the post.


Nah; That's Equifax!


Twitter found they did the same thing last year https://news.ycombinator.com/item?id=16989534


Same for Github: https://news.ycombinator.com/item?id=16974851

1 comment, "Crazy level of transparency, they noticed, fixed it and notified the users even though this was an internal leak. I bet most businesses would just sweep it under the rug."


Some number of Facebook employees ABSOLUTELY knew this was going on, quietly talked amongst themselves, maybe brought it up, but then it was knocked back down. 100% I guarantee it. One company I worked for had plain-text passwords and it was quietly discussed (or not discussed) until we EVENTUALLY fixed it (years). Fuck facebook with a rusty pitchfork. This is the tip of the iceberg with their data-handling practices. I bet their systems are fucked throughout.


I remember when I was in college in Greece and we were building a login app with PHP and MySQL. We were actually storing every password in plain text, and I asked my professor why we shouldn't hash them for better security. He was like, "that's not necessary, because we'd overload the server hashing all those passwords." For a moment I thought I was paranoid, but then I started thinking about how vulnerable some systems may be.


How do companies keep messing this up? The cardinal rule of web application security is to NEVER store plain-text passwords anywhere. The only time your application should have access to a plain-text password is when it is hashing the password or verifying the password against a hash.

If you need to log all request data for some reason, strip the passwords out.

It really isn't difficult.
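Most frameworks have a hook for exactly this (Rails' parameter filtering, for example). A minimal standalone sketch of the idea:

```python
SENSITIVE_KEYS = {"password", "passwd", "pass", "token", "secret"}

def scrub(payload):
    """Recursively redact sensitive fields before a request body is logged."""
    if isinstance(payload, dict):
        return {k: ("[FILTERED]" if k.lower() in SENSITIVE_KEYS else scrub(v))
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [scrub(v) for v in payload]
    return payload

body = {"user": "alice", "password": "hunter2", "extra": [{"token": "abc"}]}
print(scrub(body))
# prints: {'user': 'alice', 'password': '[FILTERED]', 'extra': [{'token': '[FILTERED]'}]}
```

The key-name list is the weak point; a request that smuggles a password under an unexpected key still gets through, which is why canary-style log scanning is a useful complement.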


In properly built software, the clear text password never leaves the client (browser in this case). There is no real need to have the password sent over the net.


Meanwhile Facebook's stock has risen 0.5%. How does this even work? This is an enormous lack of oversight and proper processes.

In addition, Facebook's original post is rich in media speak:

First they speak of _some_ users and a _readable format_, and in the next paragraph of _hundreds of millions_:

We estimate that we will notify hundreds of millions of Facebook Lite users, tens of millions of other Facebook users, and tens of thousands of Instagram users.

As part of a routine security review in January, we found that some user passwords were being stored in a readable format within our internal data storage systems.

With this technique, we can validate that a person is logging in with the correct password without actually having to store the password in plain text.


Better question:

As a system administrator, how does one find issues like "improper logging that leads to credential capture"? And how does one find credentials for internal services crammed in and hidden in old crufty source tomes?

Obviously, when you find it, you nuke the creds, change them, and fix it. But you can't automate, because that requires the raw data. And if you hash the passwords, you end up having to hash everything in order to perhaps find the bad hashes. And when dealing with MB/s of logs, you can't offset-hash everything to scan... and you're back to raw text-matching passwords.

How do others do this, so we can do the right thing?


> Obviously, when you find it, you nuke the creds, change them, and fix it.

The status quo is to toggle a boolean on their account. If they sign in and this boolean is true, then don't actually sign them in. Tell them they need to change their password first.

Then you'd send out an email to the account holder to change their password. If they actually own the email, and aren't an imposter, then you're good to go.

So in terms of automation, it's literally just a boolean check during log ins, and make sure you're actually doing password resets correctly.
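
The flow above might look something like this sketch (the user record and its must_reset flag are hypothetical names):

```python
def attempt_login(user: dict, password: str, verify) -> str:
    """Sketch of the forced-reset flow; `user` is a hypothetical record
    carrying a must_reset flag set when its password was exposed."""
    if not verify(password, user["password_hash"]):
        return "denied"
    if user["must_reset"]:
        # Correct password, but it may have been exposed: block the
        # session and require a reset via the account's email address.
        return "reset_required"
    return "ok"
```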


You've reduced the scope of my original question, to that of compromised accounts. My question is a magnitude larger than that.

For example, someone checks in code and leaves a config file semi-populated with a live login credential. Obviously the answer is "Don't do that!", but we are all human. People accidentally post API keys, credentials and other secure things. It happens.

But how do you automatically find issues like this? If PII is radioactive, credentials are gamma-emitters.

We solved this at a job long ago with a list of plaintext passwords on a machine you couldn't log into except local (butt-in-seat), and it took your SVN commit, scanned it, and gave a pass/fail. That methodology at least stopped our internal shared service accounts from being accidentally compromised.
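
A sketch of that kind of commit scanner; the known_secrets set stands in for the plaintext credential list kept on the locked-down machine:

```python
def scan_diff(diff_text: str, known_secrets: set) -> list:
    """Return added lines that contain a known live credential.
    known_secrets would live only on the locked-down scanning machine."""
    return [line for line in diff_text.splitlines()
            if line.startswith("+")
            and any(secret in line for secret in known_secrets)]

diff = "+db_password = 's3cret'\n-removed line\n unchanged line"
if scan_diff(diff, {"s3cret"}):
    print("FAIL: live credential found in commit")
```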


Ah, sorry about that. I got hung up on the second half of your post. Can't say I can think of a good automated solution for bad security hygiene among employees. :/

Though if you guys primarily use Git, this does wonders: https://github.com/zricethezav/gitleaks

The only thing I can think of that would help prevent credential compromises is to implement a company password manager (akin to your butt-in-seat solution) with an ACL, accessible only on the local network. That shouldn't be too much friction for employees to actually use.

Next is having a secure channel to transmit secrets. That + Password manager has personally helped stop my coworkers in the past from sending passwords in emails, slack messages, post-it notes, text files on their computer, a committed file in a repo, etc.


That's fine, it happens :)

Yeah we've seen that git module. We use something similar to that.

I was also thinking inadvertent log capture as well. Capturing creds in an ELK stack is just as bad. And I'm guessing this FB issue is some variation of complete logging snarfing creds.

I'm just trying to think of coherent ways we can scan, detect, and act on creds that work across a backend. Seems to be a rather large hole.


Facebook's press release https://newsroom.fb.com/news/2019/03/keeping-passwords-secur... on the matter is pretty nonchalant:

> Keeping Passwords Secure
>
> As part of a routine security review in January, we found that some user passwords were being stored in a readable format within our internal data storage systems.

Nothing to see here folks.


Facebook seems to be the biggest cyber-threat and insecurity to the entire United States. When does Facebook get shut down for being a national security threat?


Sadly many companies did this in their early days, even when it was known to be bad. BlackBerry stored plaintext passwords in a Postgres DB up until BES 2.0 (circa 2007). Worse, there were no constraints, so many, many people had passwords like "pass", "god" and the like.

So let us not forget the legacy systems we have all encountered, or the nostalgic stories we have read about how old XYZ kit is still in use today.

Or to put it another way: consider the amount of work that went into Y2K just to fix a date field. Fixing plaintext password storage is more work, and we have had no Y2K-like event for passwords; any event is local, with some company falling foul, getting hacked and having its data shared publicly.

So even today, you can guarantee that there are still systems out there using plaintext password storage. Some for legacy-hangover reasons, but not all.

But it sure does make you think how many legacy systems are still in use today (and by legacy, they don't even have to be that old in some fields of work). It sure does make you wonder about satellite security, given the age of some of those still operational and in orbit.


Why is it normal to not hash passwords in the browser before sending them over the wire? Seems like it's opening the door to issues such as this one, or even more serious ones like when cloudflare leaked web server memory into other sites.

I understand that you'd still need another hashing layer before inserting it into the db, but wouldn't some hashing client-side be useful?


Yes, we should be hashing to avoid these sorts of password-logging issues. This is one of those old internet tech debts that should be resolved in the future, either via JS or an HTML standard where the password is hashed before being sent to the server.

Reasons for this:

- Internet services have multiplied, and it's too much of a burden for users to have a different password for every service.

- Even if users have different passwords, they often use the same pattern and just add a number or special character at the end, which defeats the purpose once the pattern is revealed.

In order to protect the pattern of passwords across multiple services, these login services should use client-side hashing. Think of it as something along the lines of the SSL green icon in Chrome: services not using client-side hashing may be leaking passwords somewhere (there's no way to tell that they aren't).


This does not help at all. If you hash client side, then the hash becomes your password. If you accidentally log this hash, anyone who has access to it can easily log in as you by constructing a login request.


For all intents and purposes, that hash becomes the user's password at that point. Also, you can't really salt that hash in a meaningful way.


With 20,000 employees who potentially had access, there's no way of knowing if any password was abused. They should message their users to change their password, especially if they use it on other sites, just to be sure. And because many sites use Facebook OAuth, there's a lot of room for abuse.


With 20,000 employees who potentially had access, we can rest assured that passwords were abused.


Exactly! It's nearly certain that at least one of them had an enemy, ex, etc, that he/she wanted to exploit.


Or just a person saving the data for a rainy day


I hope Amazon, Apple and Google are doing better. How can we know for sure? It's not unreasonable to ask because I would have thought it completely NUTS to think FB used clear text but apparently they did (or still do). What about everyone else? Does anyone know how to find out?


> Does anyone know how to find out?

Join the ops team at each of those companies, work your way to a position where you can analyze log files, then start analyzing them and see if you can find data that should have been obscured.

I don’t see any way to find out otherwise, we are talking about querying TBs of internal, company specific private logs to see if someone made a mistake. The best anyone could tell you is “I work for company X and I don’t think we’ve had this problem.”

Edit: just using a password manager with unique password for every site will solve most of the problems from a customer perspective. My Facebook password is unique to Facebook and I have 2FA so even an engineer with my password from a log couldn’t login to my account.


At Google you don't get to read random log files, sometimes even from your own project. There are entire pipelines for raw and sanitized logs, with expiration dates and corresponding access controls. Unless you're on the security team, crawling randomly through logs for no good reason is a quick way to get in trouble.


> At Google you don't get to read random log files

> crawling randomly through logs for no good reason is a quick way to get in trouble.

So do you get to or do you not?


I meant crawling your own logs


If it happens to a company with the best engineers, a focus on security, and an almost endless budget...


No, this did not happen by mistake. Plain and simple: they do not focus on security.

There is no true need for the cleartext password to leave the users browser.


So if it's not by mistake, do you think they're intentionally trying to reduce security? Are you saying that every website that hashes on the server (almost 100% of the internet) isn't just negligent, but consciously sabotaging user security?


Possibly this is the biggest deal of all:

> and searchable by thousands of Facebook employees

Any time we use a big Web service, we understand that there must be some employee at that service who could see our activity, but we expect the number of employees who could see our data to be strictly limited.

Thousands of Facebook employees could see user passwords? If true, that is an amazing number.

I know that when I use Gmail, there must be some employee at Google who could potentially see what I've written. But I've also read that Google has strict controls about who could see my email, and the activity of their employee is logged.

What Facebook allowed sounds serious, if only because they were so lackadaisical about enforcing strict policies around data access.


What I find more and more in life is that those who get ahead are doing so by not doing things the right way. We have such a large population now that it seems those at the top got there because they did something wrong in order to get there.


This is really inexcusable, but so much of what facebook has done or does is at this point I'm not surprised. I blocked all their IPs and do not have an account.

If I had to guess, it was probably legacy code that caused this. Mark Z. started this with raw PHP; if he had had access to, or had used, a framework, it would have provided salted password hashing for free (it is basically impossible to store plaintext passwords in Django, Rails, etc.). That doesn't make it OK, since they have thousands of engineers qualified to fix it, but they probably have/had higher priorities; clearly security is not one of them.


Feels like this falls on the infosec team. Our team has a scanner that parses all* logs for passwords and PII.

We know that the infosec team at Facebook was a second-class citizen. So it's no surprise that this got through.

*sampling on large sets


What's PII here? I'm not familiar with this acronym.


Personally Identifiable Information


Let's say that this problem occurred because they logged the body of post requests (including the passwords).

Could this problem be mitigated by hashing the password on the front-end (and a second time on the back-end)?

Like this, the server would never see the plain text password.

Also, if a facebook employee intercepts the hashed password, he could only use it to login to a facebook account. But, I guess he could do that anyway if he has access to the db.

However, this employee would have no way to know the real password and use it on another website.
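
A sketch of the double-hashing idea described above; because the client-side hash effectively becomes the password, the server still has to salt-and-hash what it receives (function names here are illustrative, not Facebook's):

```python
import hashlib
import hmac
import os

def client_side_hash(password: str, site: str = "example.com") -> str:
    # Hypothetical client step (in practice: WebCrypto in the browser).
    # A site-specific salt keeps the value unique to this service.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), site.encode(), 100_000).hex()

def server_store(client_hash: str) -> bytes:
    # The server must still salt-and-hash what it receives; otherwise
    # the client hash is itself a reusable password if it leaks.
    salt = os.urandom(16)
    return salt + hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode(), salt, 100_000)

def server_verify(client_hash: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

A leaked log line would then contain only the client hash, which reveals nothing about the password the user reuses elsewhere.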


I wonder if this is related to the fact Facebook passwords are (were?) not case sensitive?[1]

(Ex: some script checks for hashes of the password, but not hashes of the permutations that are also accepted?)

[1] https://www.zdnet.com/article/facebook-passwords-are-not-cas...


Facebook passwords are case sensitive. The only case swapping that happens is to correct for leaving your caps lock key on, or for a keyboard that auto-capitalizes the first letter (like some mobile keyboards). This reduces the search space by a factor of 3 (aka nothing), not by all permutations of capitalization.
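
The accepted variants described above can be sketched like this (an illustration, not Facebook's actual code):

```python
def candidate_variants(typed: str):
    """The three accepted forms: as typed, caps-lock inverted,
    and first letter case-toggled (auto-capitalize correction)."""
    yield typed
    yield typed.swapcase()
    if typed:
        yield typed[0].swapcase() + typed[1:]

def check_login(typed: str, verify) -> bool:
    # verify() checks a single candidate against the stored hash.
    return any(verify(variant) for variant in candidate_variants(typed))
```

Only three hash checks per attempt, which is why this barely changes the attacker's search space.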


>Facebook passwords are case sensitive. The only case swapping that happens is to correct for leaving your caps lock key on

That's not the definition of "case sensitive" I've seen used by non-FB engineers.


I've seen this so many times in self-rolled auditing kludges that it has become kind of a running joke. I'm now surprised if anyone does anything more sophisticated than symmetric encryption of master credentials used to guard secrets, without using the same secret for both as a first pass. Embedded master passwords abound in most legacy OSS configurations.


To Facebook's credit: I created a brand new email address and unique password to be used exclusively for my Facebook account, and 6 years later (with no password changes), I have received no spam and had no discernible security issues.

Maybe I was just fortunate, but everyone should use a burner email if they choose to use Facebook.


Oh the fun that is scanning source repositories, pretty much a guarantee at any large company you are going to find plain text passwords, as in hard coded.

As for more fun with plain text passwords: it is no worse than any system which permits an admin to extract a user's current password, and those systems still exist.


Move fast and break things, am I right?



One of the many, many reasons to not ever use the "login with your Facebook account" (or Google or whatever) on any site. And that's before you even take into consideration what sites like Facebook and Google are doing with that information.


> an ongoing investigation has so far found no indication that employees have abused access to this data

what sort of indication would they be looking for? presumably it wouldn't be hard for an employee to have made a copy for themselves without leaving a trace of evidence.


I wonder this every time I see it in a report. It’s not like every file access is recorded for 10 years. Or at all. If you’re lucky you know who accessed a machine since the last time you rotated logs. But let’s say the data was mounted and accessible to all internal machines; literally anyone could have looked at it and done whatever they wanted without anyone knowing.


Is anybody surprised? This is the same company that served its login form over unencrypted http for many years. Prior to using a password manager, I used my password for "likely incompetent" companies for my Facebook account.


Another reason to use passwordless login: just send a one-time login code via email.

That’s what my companies do and there’s no way we can leak passwords... there aren’t any!

I honestly don’t understand the point of using passwords when a website allows reset by email.
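
A minimal sketch of issuing and redeeming such a one-time code (the TTL is an assumed value):

```python
import secrets
import time

CODE_TTL_SECONDS = 600  # assumed 10-minute validity window

def issue_code():
    # 6-digit numeric code from a CSPRNG, plus its expiry timestamp.
    code = f"{secrets.randbelow(1_000_000):06d}"
    return code, time.time() + CODE_TTL_SECONDS

def redeem(submitted: str, issued: str, expires: float) -> bool:
    # Constant-time compare; the code should be single-use and short-lived.
    return time.time() < expires and secrets.compare_digest(submitted, issued)
```

There is nothing long-lived to leak into a log: the code expires in minutes, so an inadvertently logged copy is worth far less than a password.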


>In this situation what we’ve found is these passwords were inadvertently logged but that there was no actual risk that’s come from this.

Yeah, because with all those 2,000 devs who had access to them, surely nobody could ever have stolen them...


Did someone say that it happened because Facebook believes in transparency?


Is it even possible for this to be the result of negligence when Facebook has some of the best programming talent in the world? Maybe, but I don't think so. I think this was INTENTIONAL. The question is why? IDK. Some FB employees probably didn't mind, because it gave them unbelievable spying powers. Another thing to consider is that the NSA had a strong relationship with FB for years, and the odds of them not knowing about FB's clear-text practices are zero. They knew. People within law enforcement probably knew as well. Given that nothing happens at FB without Zuck's approval, there's no doubt he knew as well.


It's a shame we're still using passwords, even more so that they are sent to the server. FIDO UAF or U2F is the way this should be done.


I'm surprised more sites don't hash passwords with a per-user salt, then hash/match with a per-session salt before transmitting them.
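
A sketch of that scheme: a per-user salt produces a stored verifier, and a per-session nonce keeps the transmitted proof from being replayable. (Note the stored verifier is still password-equivalent to the server here; real protocols like SRP or OPAQUE avoid that.)

```python
import hashlib
import hmac

def derive_verifier(password: str, user_salt: bytes) -> bytes:
    # Stored server-side in place of the password (per-user salt).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), user_salt, 100_000)

def client_proof(password: str, user_salt: bytes, session_nonce: bytes) -> bytes:
    # The client recomputes the verifier and binds it to this session's
    # nonce, so the value on the wire is useless for any other session.
    verifier = derive_verifier(password, user_salt)
    return hmac.new(verifier, session_nonce, hashlib.sha256).digest()

def server_check(proof: bytes, verifier: bytes, session_nonce: bytes) -> bool:
    expected = hmac.new(verifier, session_nonce, hashlib.sha256).digest()
    return hmac.compare_digest(proof, expected)
```

If a proof ends up in a log, it cannot be replayed once the server issues a fresh nonce for the next login.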


What the Fck FB?

Why would you ever store plaintext password?


How the hell are plaintext passwords ever touching a FB server in 2019 ? oO

You should always hash the pwd client-side first... wtf ?


> You should always hash the pwd client-side first... wtf ?

This is very rare. What production systems do you know of that use SRP or similar mechanisms? As far as I know the vast majority of companies and applications use server side hashing.


How about private posts? Every time you type something on Facebook, does it all go to the logs?


This is just not acceptable. I deleted my Facebook account over a year ago and I am doing just fine. I don't miss a bit of it.

At a company of Facebook's scale, mistakes like this are not acceptable. It looks like engineers there don't have much experience in the field, but they are probably very good at flipping binary trees on a whiteboard.


Not a great look for Stamos.


I do wonder if there was a devops feature built on top of this.


Wasn't it up to 6% of worldwide annual revenue for each individual case, according to the GDPR?

In reality though, I would be very surprised if this resulted in any fine at all.


4% for each European user, theoretically. Realistically, nothing is gonna happen and they will keep happily earning a fuckton of money from unaware users.


Orwellian Cyberpunk Idiocracy


this is why it’s a good idea to delete your old logs


I'm shocked.


but can you code the DAG in 30 minutes?


Somewhat unrelated, but if you have two Facebook accounts with very similar login names/email addresses and different passwords, it doesn't matter which login name you use: it will log you into the account based on the password alone. I contacted Facebook about it and they say it is an intentional "feature".


I can actually log into Facebook with a password I used on there roughly two years ago, which to me seems like a VERY bad security issue. I'm hoping they are at least fingerprinting my devices so that if someone else tries to access my account with an old password it won't let them, but still...


What do you mean by "here"?


Criminal negligence that could invite a class action suit


[flagged]


He controls 60% of the vote, so he probably won't.

But wait a few days, until he crawls out of his hidey-hole to assure us, once again, that this mistake is totally on him and to promise betterment.

On a less sarcastic note: It troubles me that their new "privacy mindedness" is taken at face value in some of the "serious" press. That's despite Facebook's history of outright lies.


I'm skeptical that they can turn their culture around, but I do think they do see damage to their reputation as an existential threat (either via users leaving the platform or via regulatory pressure.)


Because some developer typed:

> log.info(req.body)

and no one realized that login data would get captured?

Get real.


Smart companies typically overload loggers to do tokenization over many types of requests automatically to avoid this kind of inadvertent exposure.
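
One common shape for that in Python is a logging filter that scrubs records before any handler sees them (the regex and field names here are assumptions; tune them to your log format):

```python
import logging
import re

class RedactingFilter(logging.Filter):
    """Scrub password-looking fields before any handler sees the record.
    The pattern and field names are assumptions; tune to your log format."""
    PATTERN = re.compile(r'("?(?:password|passwd|secret)"?\s*[=:]\s*)\S+', re.I)

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = self.PATTERN.sub(r"\1[REDACTED]", str(record.msg))
        return True

logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())
logger.warning("login body: password=hunter2 user=alice")
```

The point is that no individual developer has to remember to sanitize: a careless `log.info(req.body)` still goes through the filter.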


Why? Revenue is growing.


[flagged]


Could you please stop posting unsubstantive comments to Hacker News?


My facebook feed has really become stale in the past few years. It used to be a good way to see what your friends are up to, and see pictures. But now it's ads, news stories, posts from my old university, bad memes, and that's about it. Even instagram has become full of influencers, models, brands, etc. I just want a platform to stay in touch with friends and that's it.


Why is that an issue? Devs will also likely have access to the hash of a user's password, and could then access that account.

We don't care if passwords leak; it's your own problem if you use the same password on every website. Passwords are not really private from developers inside a company.


It's an issue for the same reason storing passwords in plaintext in the database is an issue. Reading the hash != access to the account. Passwords should be private to everyone except the account owner. If devs have access to all user passwords then you have a fucking issue.


On many systems, "real passwords" are just a tcpdump away (most sites aren't doing end-to-end TLS. It's TLS terminated at the load balancer / proxy level, everything else is in the clear.)


That is a much smaller issue than passwords stored in logs, because the people who can run tcpdump usually also have the ability to intercept traffic in other ways (e.g. by updating the application to log all traffic, or by inspecting RAM). On the other hand, access to logs is usually handed out to a much bigger group of people.


True, but thankfully service meshes like Istio are increasingly used which encrypt the traffic between each service.


Also true.


>Why is that an issue ?

Are you serious?

>devs will also likely have access to the hash of the password of a user

No, devs should absolutely not have access to the password hashes of users in a properly configured environment



