Hacker News
Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years (krebsonsecurity.com)
1252 points by snaky on March 21, 2019 | 429 comments



> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.

This imo is the truly alarming takeaway. FB employees were retrieving user passwords? Around two thousand FB employees? How in God's name is Zuckerberg going to deliver his usual performative contrition about this one?

I'm just trying to imagine the data structures that were being retrieved from the databases. Either they stored something like a big user-account data type that contained the password in plaintext, which imo is a really weird design choice, or logs for other services were being mixed in with logs leaking the user/pass combos.

Surely one of the engineers could have noticed over the course of the years and said 'wait a minute... those are logins'? We hear all the time that people want FB to follow responsible social practices (the debate on what those are rages on, which is great imo), but can't FB at least wrangle its own code base?

On the other hand, we shouldn't take the stance that heads should roll, imo - it would just create a chilling effect that would deter other companies from ever going public about their own security mishaps.

edit: I should probably tone it down in this comment but I'll leave it for posterity


>FB employees were retrieving user passwords? Around two thousand FB employees?

This sounds to me like they were writing it into some log analysis tool, and people happened to pull it up.

I have no knowledge of FB internals, but as an example let's say they have Splunk, which is a popular distributed log analysis tool (or some custom equivalent, because FB is huge). Splunk makes it very easy to pull up tons of logging for any app - that's what it's for. So write a poorly-constrained query and you can get millions of log lines from many different apps.

It's also feasible that everyone (or nearly so) in the company has access to Splunk.

So if somebody, somewhere accidentally writes passwords into a logfile, and it gets indexed into Splunk, a ton of people might theoretically see it, but just skim past it because it's not what they were looking for. Picture trying to hunt down an error, and wading through many screens full of logging from different apps - if there's a password in there you could easily not notice if you're not looking for it.

That's all speculation, but it's also entirely plausible. All it takes is accidentally logging out those passwords to somewhere that Splunk can see.

Also not trying to say this is in any way OK or acceptable, just that I can understand how a seemingly-small mistake could quickly result in thousands of people having access to passwords.


This should be the main takeaway. If true, then 2000 people decided not to raise the alarm. And 9,000,000 queries with results that had passwords. This completely invalidates the other thread of people discussing giving facebook a pass because accidental logging could happen to any company.


> If true, then 2000 people decided not to raise the alarm

Have you ever worked at a software co, discovered <terrible software practice>, raised a flag, and were told "thanks, we've added a ticket to the backlog"?

I'd bet that some of them did flag it.


Explicitly logging a password is one of those practices that doesn't sit on the backlog.

It's probably a bit more complicated than that. Usually the things that I encounter have to do with how HTTP requests are logged.

For example, putting sensitive information in a URL that's loaded over HTTPS is considered insecure because many companies have policies where they log every URL that their employees visit. (Think of a password reset link.)

A lot of inexperienced programmers don't realize this, because they don't realize that you can man-in-the-middle yourself, and that most corporate computers come preconfigured to allow the employer to man-in-the-middle everyone.

So, if a password reset link never expires, it means that some guy in IT can own an account that was reset on a corporate computer.

(This, basically, is how they catch people viewing porn on their work computers.)

Anyway, my point is that the problem is probably something where a junior programmer transmitted a password in a way that they didn't realize was being logged.


> Explicitly logging a password is one of those practices that doesn't sit on the backlog.

If that is your experience, then that's a truly wonderful thing.

Might it be possible that at many companies, teams with deadlines to hit will tend to prioritize feature work over details like this? Perhaps especially so when teams are not rewarded for fixing vulnerabilities and are punished for not meeting deadlines? Particularly when the actual bug at hand is that the full contents of a POST are being logged, and a PM might not read the ticket enough to understand that this includes a password...

Again, you're completely right in every way about what should happen. It just may not reflect the experience every software engineer has had.


> Explicitly logging a password is one of those practices that doesn't sit on the backlog.

For you maybe. For Facebook obviously it's different.


>For Facebook obviously it's different.

What evidence do you have for that? Nowhere in the article does it say Facebook "explicitly logged" passwords. The logging likely happened through some unintended and roundabout process that is far from explicit.


Why would this be down voted?

I think incompetence over malice is almost certainly the right answer here.


How many times does a company of Facebook's size get to say "oopsie, we're sorry" before you'll stop giving them the benefit of the doubt? I see this as a malicious disregard for the security and privacy of their users, and their history aligns with that view.


That's a valid point of view, of course. The failure is certainly well beneath what one should expect of Facebook.

Nonetheless, I stick to incompetence over malice. There's just so much more of the former in the world than the latter!


Back when I was a sysadmin, I caught everyone watching porn by suddenly walking into their offices, but I'm sure your method is better.


You're probably right, and I would love to see that confirmed. FB wouldn't have to link to the Jira ticket or give the name or details of the ticket, but they could verify that was the case and then explain why nothing was done.


Still, it's interesting that not one of them went all out and publicly derided the company. Whistleblower protections would probably have covered them. However, our whistleblower laws could use a brush-up.


If it's not a feature that makes money, why spend developer money on it? -- Every manager ever.

GDPR may levy fines but ultimately it hasn't done anything to stop managers focusing on new features and sales income.


How much do you want to bet that some of those FB employees decided to log in by turning off the "location detection" flag and using an incognito browser?

I would not be surprised. Some of those 9,000,000+ queries and 2,000+ employees must have had some nefarious use. Statistically speaking...


Judging by the maturity level of the discourse on teamblind, this is precisely what happened.


Judging by working with hundreds of software engineers over a decade, this seems very unlikely to have happened.


No, you're misrepresenting what (probably) happened. They had some system that logged API requests (fine and normal). Some API requests include plaintext passwords (also fine and normal).

The issue is presumably that they had no exclusion to the logging for sensitive information like passwords, which is honestly very easy to overlook.

So two thousand Facebook employees were not "retrieving passwords". They were looking at the API logs, which is a normal thing to do.


This. It's a bit baffling to me that even the HN audience doesn't come to this conclusion immediately - maybe people want to assume the worst because it's Facebook?


Definitely. This always happens when Facebook comes up here. People forget to use their brains and just rant about how evil Facebook is.


> Some API requests include plaintext passwords (also fine and normal)

Wait, no? How is that fine and normal? It doesn't seem fine and normal at all.


It's completely standard. If you encrypt the password in the browser, potential hackers have access to the encryption code which makes it useless.

Check the network tab in dev-tools when logging in to hn for example and you'll see your password in the request body :)


Maybe normal but still not best practice


What that quote is saying is that the logs that contained the passwords were accessed by 2,000 engineers. Most of those engineers would only be looking at data relevant to their job; only a security engineer would be in a position to notice the passwords, which is what happened.


Isn't it possible that those developers just wanted to look at something harmless, like failed logins, and didn't even notice that the log messages also included the password? But if so, it's a bit worrying that 2,000 pairs of eyes did not spot the bug.


Maybe many people were just using SELECT * FROM ... and got plain text passwords they weren't necessarily looking for.


I read the article and this is what I got from it too. I `SELECT *` all the time and see hashed passwords. It is very rare that I need that hash, usually I'm doing something entirely unrelated, but it's just easy to get all the rows and only use what you need, especially when you are troubleshooting and don't even know what you need yet.
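To illustrate the difference (a minimal sqlite3 sketch; the table and column names are made up): naming columns explicitly keeps fields like a password hash out of result sets where you never needed them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, password_hash TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'bcrypt-hash-here')")

# SELECT * drags the hash along even when you don't need it...
row = conn.execute("SELECT * FROM users WHERE id = 1").fetchone()

# ...whereas naming columns keeps the sensitive field out of the result.
safe = conn.execute("SELECT id, email FROM users WHERE id = 1").fetchone()
```

The habit is cheap insurance: queries that never request the sensitive column can never leak it into a log or a screenshot.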


Probably some poor engineers trying to check if their partners were cheating on them. Tools like FB make it easier after all.


(Edit: reworded a bit to make it clear I don't think this is acceptable)

Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Dumb mistake but it's not hard to imagine this happening, considering that FB probably has a bunch of services involved in the login/signup flow to prevent bots/spam, abuse, etc.

Not to imply this is acceptable, especially at an IT company like FB with vast resources and know-how. Raw passwords are an especially big screw-up. There are a lot of failures here, from actually logging something so sensitive, to giving access to so many employees, to not noticing this for years. (Assuming this was actually log data.)

BUT if we are honest, anonymizing log data is rarely a priority. Even when it is, sensitive data can leak at a lot of different points in the infrastructure: actual application code, client and server exception tracing (just imagine a deserialization exception which contains part of the input), web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

This is a good time to look in the mirror and audit your logging and tracing data. Unless you are in a highly regulated field like finance/healthcare or there is a strong company-wide culture for security/privacy with regular audits already, I can almost guarantee you will find at least one data point that should not be where it is.

Protecting sensitive data needs to be a big consideration for every dev, ops and especially management, which has to allocate enough time for security reviews and audits.


> BUT if we are honest, anonymizing log data is rarely a priority. Even when it is, sensitive data can leak at a lot of different points in the infrastructure: actual application code, client and server exception tracing (just imagine a deserialization exception which contains part of the input), web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

I think so too, and I agree with you.

A couple of months ago we had a discussion with a colleague who wanted to log the request/response body because "without it, debugging is almost impossible". We certainly don't log it, and we made sure that even exceptions don't spit out internal data; if that happens, it's an exceptional case, a bug (so far, I haven't seen one). It's tough, because of course when you have access to the plain data you can do anything you want, but we owe our customers this additional step. Laziness can't be justified in this context.

However, generally speaking, I think it's a common practice in many companies, it's not the first time I see this happen, and I bet it's not gonna be the last.

It's a matter of mindset. The more I read such articles, the more I lose trust even in large companies. Typically smaller companies don't care, in order to move "fast". But when this happens at large corporations, where does it end? Today it's Facebook with passwords; tomorrow it might be Amazon with tons of credit card numbers because of a legacy system that's no longer maintained...


On that note, it might be better to change the title to "Facebook logged... passwords in plaintext". When you say FB stored them in plaintext, that's generally understood to mean "used plaintext as the primary mode for storage when authenticating users".


> readable for anyone with access to the logging/tracing infrastructure

The article says more than 20,000 Facebook employees. A quick search shows that they have around 35k now - do 60% of employees at Facebook really need access to all of those logs?


Or is the article actually correct?


You are technically correct: it is a dumb mistake and sadly not that hard to imagine happening. It's also inexcusable, and I would expect even junior engineers to know better than to log credentials as part of request processing.


I doubt someone wrote "log.print(user.creds)". They probably wrote "log.print(req.args)" in what (they felt) was an unrelated section of code. Sucks, but could easily happen.

I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.


> I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.

Have all strings printed to logs go through a common checking routine. That checking routine simply checks for the presence of certain hard coded sequences, and raises an alarm if they are found.

Whenever a production system is updated, run a test suite. The tests include logging in to a test account whose password is one of the aforementioned hard-coded sequences. If your system accepts payments, the tests can include a test purchase using a test credit card number that is one of the aforementioned hard-coded sequences. In general, for each type of sensitive information, have a test that supplies sensitive information of that type, with that information being one of the hard-coded sequences the log checker checks for.

This won't stop you from accidentally logging sensitive data in production, but it should catch it during the post-deployment tests so you can fix it quickly.
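A rough sketch of the idea, assuming the test suite plants known sentinel strings (the sentinel values below are invented for illustration):

```python
# Hypothetical sentinel values planted by the post-deployment test suite.
SENTINELS = {
    "Canary-Passw0rd-7f3a",   # password of a dedicated test account
    "4111111111111111",       # well-known test credit card number
}

def check_log_line(line: str) -> list:
    """Return any sentinel values present in a single log line."""
    return [s for s in SENTINELS if s in line]

def scan_log(lines):
    """Yield (line_number, sentinel) pairs for every leak found."""
    for n, line in enumerate(lines, start=1):
        for hit in check_log_line(line):
            yield n, hit
```

Run the scanner over freshly ingested logs right after the post-deployment tests: any hit means some code path is logging data it shouldn't, and you know roughly where to look.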


Interesting, but way harder than it sounds. Very often log systems live as a service for multiple microservices that do real world work. Propagating sentinel data through each of those systems is a nightmare because:

- those systems often have real-world secondary effects

- they sometimes have a tendency to validate away sentinel data prior to logging because the data is no good (e.g. a credit card number that isn't real or a password for a user that doesn't exist), although that can depend on scale and cost

- cross-team coordination of how to handle the sentinel data introduces coupling across teams/services which is contrary to the goal of microservices.


How is this not a priority? And even worse, how is it possible that a bazillion of the supposedly smartest engineers did not fix it? In many other industries you'd get killed over this as a software engineer.


I don't buy that. In any other industry, you'd have a hard time even explaining what the problem is. "Someone wrote down my password in plaintext - yeah, that was me, it's sitting right there on my monitor".


If you read the Google Dapper paper from way back in 2010 it has this to say about sensitive information and logging:

> Logging some amount of RPC payload information would enrich Dapper traces since analysis tools might be able to find patterns in payload data which could explain performance anomalies. However, there are several situations where the payload data may contain information that should not be disclosed to unauthorized internal users, including engineers working on performance debugging. Since security and privacy concerns are nonnegotiable, Dapper stores the name of RPC methods but does not log any payload data at this time. Instead, application-level annotations provide a convenient opt-in mechanism: the application developer can choose to associate any data it determines to be useful for later analysis with a span.

Given how influential this paper was and how it likely influenced FB's own tracing system, it's crazy that they would choose an opt-out model. To me this system design is their biggest mistake, and one that could easily have been prevented.


If I can make sure that passwords are securely stored and nowhere available in plain text even for hobby projects that will never see real users, how can Facebook, with allegedly the top developers, not manage the same thing? And if you do log something like this, then everybody who ever saw those logs should be alarmed and press for changing it. It's inexcusable for this to go on for more than 2 minutes. In 2012 Facebook was not a small startup with just Zuck zucking along... This company is just so bad.


The article does clearly address that Twitter and GitHub had to admit to the same issue, but the scope and duration of the problem were far smaller.

> Both Github and Twitter were forced to admit similar stumbles in recent months, but in both of those cases the plain text user passwords were available to a relatively small number of people within those organizations, and for far shorter periods of time.

That is to say, it is especially serious given all the services that use Facebook as a login.


Can happen for sure. Have seen it happen in other places too.

But this going on for up to 7 years? Yeah, I'm less than impressed by FB.

Had a less than stellar opinion of them before; they are definitely hitting the Mariana Trench now.


100% agree. Logging sensitive data is something that we should never do, but in orgs with 100s-1000s of devs, it’s something that almost always happens. At some point somewhere a piece of middleware or whatever logs full request bodies, query params, etc., and nobody notices. It’s an incredibly common mistake because it’s so easy to make, and so easy to not notice.

This is still a very significant failure by Facebook, but as a dev I understand how it happens. It’s a mistake that happens at the vast, vast majority of tech companies.


Any privacy-conscious company would not allow random data, especially auth data, to be logged. I guess we are used to FB not caring about any of that, but it's not normal nor acceptable.


How do you stop it and still have an effective development org? Services need to be debugged, so requests and responses need to be logged...


It's pretty easy: you configure your logging library NOT to log the attribute, key/value pair, or whatever contains the credential. If you can't modify it on the server side (which you can, lazy bones), you tell your central logging system to mask it out before it is written to disk.

This isn't difficult or non-standard. If you are logging all client request/responses full take including auth creds, credit cards, SSN, etc, you are likely doing it wrong, and possibly violating some industry regulations.
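As one possible shape of this, a filter in Python's stdlib logging can mask sensitive key=value pairs before any handler sees them (the key names below are assumptions; a real setup would match your own field names):

```python
import logging
import re

# Key names to mask; extend this list for your own schema.
SENSITIVE = re.compile(r"\b(password|token|secret)=\S+", re.IGNORECASE)

class RedactingFilter(logging.Filter):
    """Rewrite log records so sensitive key=value pairs never reach a handler."""
    def filter(self, record):
        # NB: record.args would also need scrubbing if you use %-style formatting.
        record.msg = SENSITIVE.sub(r"\1=***", str(record.msg))
        return True  # keep the (now redacted) record
```

Attaching this filter to the root logger, or masking at the central log pipeline instead, gives the same effect without trusting every call site to remember.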


At a company I worked for, if we logged any production data, we had to confirm there was no PII in there and no passwords or tokens, and very few people had access to these logs.

There are many layers of wrong in what FB did: carelessly logging production data, letting thousands of employees access these logs, and of all these people apparently none of them cared to mention there was a problem here, or if they did, it was ignored by management. They don't have any excuse here.


You implement your logging infrastructure with awareness of PII and other sensitive information. Whitelisting fields to log would nearly fix it, blacklisting fields would be a bare minimum.


This is a good question and I have been at multiple shops which had this bug induced by a "let's just log everything" accident somewhere in the codebase. It's very nice to think of logging as a sort of "aspect" or a middleware that gets deployed across the whole stack, but it's a bit of a mistake.

You have something like four options for fixing it:

1. Every request is responsible for its own logging. This is actually not a bad approach because state-altering requests really need to be logged whereas state-viewing requests are much more optional, they help to try and guess “what were they doing when they ran into this bug they’ve reported?” but mostly they just occupy database rows. The risk is that someone is in a rush and commits something which does not discharge its logging obligation. You can build a system which forces this if you want, “the router will dispatch to your function and one of your function's arguments will be a locally-stateful logger, and once you are finished I will check whether the logger has handled anything, and if not I will log an error. So you should always `$logger->noLoggingNecessary()` somewhere explicitly in the codebase and then if this is wrong it gets caught in code review more consistently.”

2. The sensitive data is used to generate a bearer token and this flow is outsourced to its own un-logged server. You explicitly use the bearer token to construct everything important about the user account in a step before the logging begins, then delete the bearer token from the rest of the request. This flow can actually get really slick: the bearer token can contain the user data, optionally encrypted, with a message authentication code to ensure the user didn't tamper with it: you can then hit a near-empty Redis instance (or a near-empty table) looking for revoked bearer tokens super-fast, since you probably don’t see too much session revocation. So, user data lookup actually becomes unbelievably cheap because it's mostly CPU bound with (check empty key/value store, MAC-or-decrypt, parse body, pass to the handler function).

3. The logging service becomes controller-aware: each controller specifies whether it is supposed to be logged and the logging service just respects that flag and is otherwise global. So it might log that the login controller was accessed, but it doesn't log anything else about the controller.

4. The logging service becomes message-model-aware. This one is actually kind of slick, too, it means that you describe declaratively what sorts of data types are present in the messages that are transmitted to and from the server: and the first thing you do when you get a request is to validate the request against the model you have declared for messages to that request's namespace. So you will have a `validate($model, $value)` function that takes some arbitrary JSON data and a model and returns a normalized version of that data; a natural extension to this traversal that you're already doing (either by returning two normalized results or calling the function with an extra `options={removeSensitiveData: true}` type of argument) will allow you to define in the message-model itself whether the property is sensitive and should never be logged.
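A toy version of option 4, assuming a simple dict-based message model (the field names and model format are invented for illustration):

```python
# Hypothetical message model: field name -> {"type": ..., "sensitive": bool}
LOGIN_MODEL = {
    "email":    {"type": str, "sensitive": False},
    "password": {"type": str, "sensitive": True},
}

def validate(model, value, remove_sensitive=False):
    """Normalize a request body against its declared model; optionally strip
    fields marked sensitive so the result is safe to hand to the logger."""
    out = {}
    for field, spec in model.items():
        if field not in value:
            raise ValueError("missing field: " + field)
        if not isinstance(value[field], spec["type"]):
            raise ValueError("bad type for field: " + field)
        if remove_sensitive and spec["sensitive"]:
            continue
        out[field] = value[field]
    return out
```

The request handler would consume `validate(model, body)`, while the logging layer only ever sees `validate(model, body, remove_sensitive=True)`, so the sensitivity decision lives in one declarative place.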


I don't mean to engage in whataboutism here, but unless you're under the impression that no tech company is privacy conscious, what you're saying isn't true (though that'd be a reasonable impression, to be fair).

It's absolutely a security failure and it's not acceptable. But it's not a matter of "allowing" it to happen so much as, "Which vulnerability will we be caught by?" And it actually is a pretty normal vulnerability. For example, Apple[1], GitHub[2] and Twitter[3] have been vulnerable to this exact issue in recent memory.

I also don't mean to be defeatist. This kind of problem is preventable. But it's merely one dumb mistake in a universe of dumb mistakes that leads to serious security failures, all of which are easy to make. The most sophisticated and well-funded information security teams in the world - usually the FAANG teams - still miss things which look pretty silly in isolation.

At this scale being privacy conscious is necessary but insufficient. You can't realistically conclude anything about a company's dedication to privacy based on whether or not it was impacted by this kind of vulnerability. Making a corporate policy to hash passwords in the database instead of storing them in plaintext is easy to codify, easy to implement and easy to verify. A corporate policy to never log authentication credentials is not nearly as well-defined, even if it's equally as important. That means more mental overhead, disagreement and uncertainty in preventing it. Ultimately, it also means more mistakes can be - and are - made.

________________________

1. https://darthnull.org/security/2014/03/10/cve-2014-1279-touc...

2. https://www.zdnet.com/article/github-says-bug-exposed-accoun...

3. https://arstechnica.com/information-technology/2018/05/twitt...


> Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Wouldn't that also include failed login attempts complete with the failed passwords?

I mention this due to FB's history of even logging the entered passwords of failed attempts, which Zuck supposedly used to hack into people's Email accounts [0].

Because if that also applies here, then a whole lot of people just had way more leaked than just their FB password.

[0] https://www.businessinsider.com/henry-blodget-okay-but-youve...


A written statement from Facebook provided to KrebsOnSecurity says the company expects to notify “hundreds of millions of Facebook Lite users, tens of millions of other Facebook users, and tens of thousands of Instagram users.”

Does this imply it was a Facebook Lite-specific problem?


Yes


At the scale of Facebook, with literally billions of users, this is pretty much inexcusable.

This community usually scoffs at entities that store passwords in plain text. Why do you give Facebook a pass?


Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind. One is a bad security decision (because you must decide how to store passwords in the DB), and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

Same same but different.


I've worked in healthcare / biotech for more than a decade and I can promise you that the FDA would see no difference between the two types of gaffes. As custodians of sensitive information it was our responsibility to ensure that said information didn't leak, period.

I don't know anything about FB's infrastructure, but when I was lead I would have viewed leaks in logs as far worse than something in the DB, because our DBs were harder to gain access to.

I get what you're saying, but it's irrelevant. Easy to screw up, hard to screw up, doesn't matter; just don't screw up because the result is the same. This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.


I don't think anyone's arguing that it's ok, just that it's less "easy to screw up" vs "hard to screw up" and more "did something really stupid on purpose" and "did something really stupid by accident."

But yes, either way the end result is the same and you shouldn't do it.


Yes but how many people looked at these logs, saw the passwords, and said nothing? If it was logged presumably the logs were viewed from time to time. The mistake is easy to make, but it's also easy to correct if there is a culture to report such mistakes.


The article indicates it was searchable and used for something by employees, so most of these comments to the effect of ‘they must have been doing this accidentally because Facebook is just so big therefore they are excused’ are invalid. Some group of people did this on purpose and knew it was happening.


You're probably being downvoted for speculation, but that sounds completely reasonable to me. At least one person had to have noticed a password in a log file they were viewing at some time. Most people viewing log files know that passwords should not be there. It would trigger alarm bells, and depending on the person, also excitement -- you have somebody's facebook password.

It's conceivable that someone then told their friend at work, and a few of these 2,000 developers knew of this secret internal stash of passwords they could access whenever they wanted to "prank" someone on facebook...


Why would humans be reading these log files?


For plenty of reasons, I'm sure. From the article:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


That doesn't sound like a human reading the log. It sounds like automation.

A script counting the number of successful logins, for instance, could read the same data element that unintentionally contained passwords.


We don’t have enough information to make that judgment.


When you’re developing a feature like this, don’t you look at the data that you’re logging to make sure things are working properly? I would imagine that Facebook has many layers of abstraction, but somewhere in there, incompetence or inattention must be involved, sometimes known as negligence, if not outright knowledge of this. The definition of negligence is a complex thing, but if someone could have reasoned that this was logging plaintext passwords, and either saw it and didn’t bother to change things, or didn’t think about it carefully enough to realize what it was doing, that would be considered negligent. I know that, as a user, I would feel that my trust in this organization has been betrayed.


The logs being in a searchable index makes it more likely that the password storage was inadvertent, not less. It implies that the primary usage model for the logs was targeted queries, not people starting at the top of the logs and reading down in such a way that nothing could have been missed.


My interpretation was that they were using the plaintext password as one of the searchable fields, presumably for development related to authentication.


That's true. Krebs' article says:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


Those numbers are pretty meaningless without some concept of what queries they were making. When I was working with large logs I'd regularly pull huge amounts of logs to only look at maybe a handful of lines, even then often only scanning for the things I was specifically working on. Not saying this is what happened, just that those 2,000 employees almost certainly were not doing careful analysis on the complete results of their 9 million queries (and shouldn't have been!).


'Only scanning for the things one is working on' to the point of missing huge security holes or safety risks is exactly what's wrong with the industrial division-of-labor approach. Workers start to assume that even if they do notice a problem, someone else is either going to fix it or has examined and OKed the current procedure. Workers who speak up and assert that something they noticed actually looks more important than whatever task they were originally assigned are typically punished rather than rewarded, and fired if they make too much of a fuss, because otherwise managerial authority might be diminished. Most firms run internally like a dictatorship, and given Facebook's absurd ownership structure (there are voting and non-voting shares, and Zuckerberg retains a majority of the voting shares) it's a recipe for dysfunction.


This is true but not very interesting. Like literally every large tech company in the entire industry, Facebook has a specialized team of people that work on security --- it has one of the better versions of that team, for what it's worth. Just like ordinary Apple engineers, ordinary Google engineers, ordinary LinkedIn engineers, and ordinary Airbnb engineers, ordinary Facebook engineers don't have an end-to-end picture of how all of security at Facebook works.

There's no evidence that there was whistleblowing suppression about this issue anywhere at Facebook, is there?


Out of 2,000 engineers, there was not one who had a look at a username/password? Impressive. And these were developers, during development. Seriously, this is not justifiable; it's pure laziness, if we want to be charitable.


What if one among the 2,000 was curious?


Zuck doesn't believe in hiring employees over 30 because they're not 'smart' about tech. Great example why that's a mistake.

If he did, someone with 10-15 years of corporate experience would have seen that and found it simply unacceptable. They also would have had the gravitas to know how to escalate things to the top and get it fixed.


I understand that, but I believe they do not have the correct mentality here. I'm not certain how many of the people saying that have worked on systems as sensitive as FB. They think it's easier to screw this up because, for them, it is, but they don't operate at this level.

This sort of thing should not be any easier to screw up when you operate at the scale of FB. If it happens then you have the wrong procedures in place. In my personal example our logs and logging practices/code were audited in the same way our DB layer was. There was no difference; if a system touches sensitive data, it is a massive vulnerability.


This. If you have two or three engineers at a startup it's a very easy mistake to make. With thousands of engineers it's an impossible mistake to make, so it could have only been done purposely.


"could only have been done purposely". In this case there's also a chance it was negligence. Who gives a shit. Move fast. A negligent culture can probably do more damage than the odd individual doing something deliberately.


And even with a startup of two or three engineers, it's pretty much unforgivable.


It depends on how many users you have and what sort of data you're protecting. If you have a billion users then sending password reset tokens to your analytics provider is a very serious vulnerability. If you have 100 users and you're running a generic message board then it's not really a vulnerability, because the cost of that method of attacking the assets under protection almost certainly exceeds their value.


> that method of attacking the assets under protection almost certainly exceeds their value

This was most likely the reasoning Sony had used (prior to 2012) when deciding how to safeguard the account info of users who had registered for marketing promotions.

It turned out that user credentials are unlike, say, office furniture, in that the cost of a data breach can vastly exceed the "value" of the protected "asset".


That's the best practice in the security world though. There's no such thing as an unhackable system, only systems where A) the cost of attacking them is more than the value of the assets under protection B) where the time it takes to attack the system is long enough for the attack to be detected.

Cf. Bruce Schneier's book: https://www.schneier.com/books/beyond_fear/


That book was from 2003.

This is Schneier in 2016: https://www.schneier.com/essays/archives/2016/03/data_is_a_t...

Data Is a Toxic Asset, So Why Not Throw It Out?

"All this makes data a toxic asset, and it continues to be toxic as long as it sits in a company's computers and networks. The data is vulnerable, and the company is vulnerable. It's vulnerable to hackers and governments. It's vulnerable to employee error. And when there's a toxic data spill, millions of people can be affected. The 2015 Anthem Health data breach affected 80 million people. The 2013 Target Corp. breach affected 110 million.

If data is toxic, why do organizations save it?

..."


I really could not disagree with this more.


The problem is, you can't ensure that. You can only do best-effort filtering/scanning. Let's say you have an API and someone writes a client with a memory corruption bug which results in the password and some other field being sent swapped 1 in 10k times. Now you have passwords logged. You may not even be able to tell exactly where, and it's close to impossible to detect automatically. "Just don't screw up" is not possible in this case, and the chance you'll learn about it soon is really small.

Or if you think a client bug is a stretch: If you're running a large enough website which logs usernames on failed logins, I bet you have at least one password or a concatenated usernamepassword in your logs. Just because someone wasn't paying attention to the text box focus.
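For what it's worth, the best-effort filtering usually looks something like this (a minimal sketch; the field names and redaction marker are made up for illustration), and it illustrates the point above: a deny-list scrubber does nothing for a client that swaps fields before sending.

```python
import logging

# Hypothetical deny-list of field names; a swapped or misnamed field
# (the memory-corruption case above) sails right past this check.
SENSITIVE_KEYS = {"password", "passwd", "pw", "token", "secret"}

def scrub(params: dict) -> dict:
    """Return a copy of request params with known-sensitive values masked."""
    return {
        k: ("***REDACTED***" if k.lower() in SENSITIVE_KEYS else v)
        for k, v in params.items()
    }

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")

def log_failed_login(params: dict) -> None:
    # Log only the scrubbed view, never the raw request body.
    log.info("failed login: %s", scrub(params))

log_failed_login({"username": "alice", "password": "hunter2"})
# If a buggy client swaps fields, {"username": "hunter2", "password": "alice"}
# gets logged with the real password sitting in the "username" slot, undetected.
```

The scrubber only knows about key names, not values, which is exactly why it's best-effort.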


Large orgs could have a central knowledge base of all places someone could type a password (as required by code reviews, and perhaps by automated detection of password-type inputs), and a bot which attempts to fill correct and incorrect passwords for a number of canary users, and scans across all logs for plaintext canary passwords. It's shocking that someone like Facebook, which has so many partner-facing and user-facing interfaces, wouldn't have developed something like this.

Free startup idea: build a SaaS for the bot phase of this, and make the log-scanner simple and open source so enterprise security teams can deploy it freely. Offer a certification process and consulting. Hire lobbyists (and folks with inroads into insurance companies) to paint plaintext passwords as the devil and make yourself a de facto legal requirement. And make sure your log-scanner doesn't do any logging of its own :)


You can never ensure anything completely in engineering. Perfection is never the goal, but you can do a much better job than FB has here.

You're talking in hypotheticals. Nowhere did I say "nothing ever goes wrong if you know what you're doing". What about the issue at hand? What's your opinion on passwords being logged on every request for years?


> This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.

If what you meant by that is "best effort" then sure. The issue at hand is only obvious now that we're reading about it. It could easily be missed depending on how their infrastructure works.


>The issue at hand is only obvious now that we're reading about it

No, it's not. It's completely obvious to anyone who has ever worked on a system which deals with sensitive information. It can only "easily be missed" if you're not auditing your code and logs. This stuff is basic to anyone who works in industries where the data is a liability.


This is exactly how Zuck was able to steal login information in his stories about founding facebook.

I find it hard to believe this is accidental.


This is borderline illegal, if not simply illegal. What if he entered Paypal accounts or bank accounts? I don't want a world where, if I enter the wrong password, I have companies poring over my other accounts trying to breach them. Let alone if I reuse a password by accident.


Do you have a reference? I was wondering the same but have no reference.



Yep, I remember this story. So clever and yet so dirty.


Wow! That does not look accidental at all!


> I don't know anything about FB's infrastructure

This explains a lot of the comments here. Facebook’s scale is not like anything most engineers have worked on. Facebook probably has logs in the 100s of terabytes. Ensuring that sensitive data isn’t logged takes more than some occasional greps.


This feels like a slightly differently framed version of the "Too big to fail" argument and does pretty much nothing for me.

Collecting data on such massive scales is literally FB's whole business.

But with that also comes a responsibility that shouldn't simply be waved away with "But they are so big, it's so difficult!"

Because when it's about monetizing their massive amounts of often illegally collected data, FB seems to have no issues having everything in order and getting stuff to work, regardless of how "difficult" it might be.

Probably has to do with the fact that there's no money in protecting users' data properly, and FB seems to be pretty much immune from negative PR having any bad consequences.


>takes more than some occasional greps

Of course at FB scale you'd automate this by creating a set of canary accounts with unique passwords that you perform a search for in the ETL pipeline, or some other handy place. This will at least catch inadvertent plaintext password logging.
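A minimal sketch of that canary idea (all names hypothetical): seed test accounts with unique, high-entropy passwords that appear nowhere else, then scan log/ETL output for those exact strings. Any hit means plaintext passwords are leaking into that stream.

```python
import secrets

def make_canaries(n: int) -> dict:
    """Canary account -> unique high-entropy password, used only for this check."""
    return {f"canary_user_{i}": secrets.token_hex(16) for i in range(n)}

def scan_logs(lines, canaries):
    """Yield (line_number, account) wherever a canary password appears in a log line."""
    for lineno, line in enumerate(lines):
        for account, password in canaries.items():
            if password in line:
                yield lineno, account

# Fixed example values instead of make_canaries(), to keep the demo deterministic.
canaries = {"canary_user_0": "d41d8cd98f00b204e9800998ecf8427e"}
log_lines = [
    "login ok user=canary_user_0",
    "DEBUG request: {user: canary_user_0, pw: d41d8cd98f00b204e9800998ecf8427e}",
]
hits = list(scan_logs(log_lines, canaries))  # -> [(1, "canary_user_0")]
```

You'd want to log in with each canary regularly through every user-facing surface, so the scan exercises the same code paths real users hit.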


Facebook ops team: "We're waist-deep in dead canaries."


My point is that it's irrelevant. Have more data? Need more auditing. Any system which touches sensitive data is subject to security review. Yes, FB systems are massive. They should have massive oversight as well. You may as well be defending a nuclear meltdown because designing nuclear power plants is hard.

>Ensuring that sensitive data isn’t logged takes more than some occasional greps.

Right; it requires investment into process, requirements, testing, and oversight. Most importantly, it requires a company wide, top down mentality that your customer's privacy and protection is more important than your margins.

If you can't (won't) dedicate the resources required to ensure your customers data is protected then you have no business operating at such a scale.


Exactly. Can every piece of data logged be tied back to a legitimate business purpose? What’s needed here is a mentality change: These logs should be thought of as liabilities rather than assets. You should log only what you need, while you need it, and then turn off the log when you’re done. If your mentality is “log everything, always, because maybe we’ll need it later” then these privacy and security trash fires should be expected.


Log everything: you don't care about privacy.

Treat logs as liabilities: why can't you solve the issue I experienced yesterday?

You can't win. Either you log more than you think you need right now or you can't do engineering investigations on past data. You're going to end up somewhere in the middle realistically.


Everything is a tradeoff. How much cost did FB incur due to this incident? I would guess not much; at least not big enough to warrant the massive resources needed.


Well, that's the problem really; they obviously don't care.


According to the timeline of events in The Fine Article, this story only exists because someone cared. At many companies the story would end at "one diff reviewer noticed passwords getting logged in one diff". All of the numbers in this story come from an internal investigation to see where else they're making the same mistake, _so they stop doing that_. That's not what you do if you don't care.


If FB truly cared it would likely never have happened, let alone gone on for years. I'm not saying no employee at FB cares; I'm saying FB as an organization doesn't, and we have plenty of "I'm sorry, we'll do better" statements to back that up.


[flagged]


Crossing into personal attack isn't allowed here, regardless of how right you are or feel you are. Most of your comments to HN have unfortunately been like this. Would you mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the spirit of this site to heart? It's particularly important when your points are right, so as not to discredit them.


It's hard to take you seriously when your argument boils down to name calling and how obviously awesome FB is.


Only 100s of terabytes?


I work fairly extensively in health care technology and have since ~2005 and am not aware of the particular regulation that would cause the "FDA" to react to a password storage security gap. Could you be more specific about the regulations you're referring to?

Unsafe logging of PHI would be explicitly problematic, but passwords aren't PHI.


I never said the FDA was specifically interested in passwords. The FDA was brought up in reference to my work with PHI. The FDA does care about how you secure access to sensitive data however. If you leak passwords which allow unauthorized access to PHI, you're going to have a problem.

The analogy was intended to focus on practices which should be employed by companies which handle sensitive data, not the specifics of what the FDA is looking for.


These aren't "leaked" passwords; they're logged in plaintext internally, which is not great, but is not the same thing as having leaked them.


"leaked" in the sense that they're somewhere they shouldn't be. I could have chosen a better word. if I were to store database passwords on a file share the entire company had access to we would consider that a "leak" even though they weren't exposed externally.


HIPAA/HITECH would not necessarily; the analysis would have to include compensating controls and who had access. The fact pattern in this Facebook case would probably not support a violation. Again: crappy stored passwords aren't PHI.


I think perhaps I wasn't clear enough on the boundaries of my analogy and where it falls apart.

You work in the industry so you know that the FDA is not highly prescriptive. They give you a general outline of, essentially, process. HIPAA goes into more detail, but still, an organization has quite a bit of leeway as to how they meet regulatory requirements.

As far as passwords go, you're correct. Passwords in logs aren't a direct violation as far as I'm aware. If e.g. an employee gained access to data they shouldn't have access to per internal policy and used that data in an illegal manner, _then_ you have a problem.

Still though, I never meant to imply that what happened at FB would violate FDA regs. What I said was:

>I've worked in healthcare / biotech for more than a decade and I can promise you that the FDA would see no difference between the two types of gaffes.

The "two types" in question here were defined by the GP ("easy to make" mistakes like logging and something something about database design.) I wasn't referring to passwords specifically, and I don't believe that this is as bad as e.g. PHI just sitting around for anyone to see.

The analogy was meant to convey this; in a health care environment we are required to secure (encrypt) PHI and PII at rest. If we were found in violation of that, it wouldn't matter if it got there via logs or poor database design.

Again, I realize this is not an FDA/HIPAA situation. My experience with sensitive data is in that sort of environment, and I believe the same sort of mentality should be taken by FB in regards to the security and privacy of their users.

Wow, didn't expect that to get so long. My thumbs aren't made for this.


This is how professional software engineering and infrastructure implementation is done, and I appreciate you pointing it out so forcefully.


Corollary: Most web development simply lacks the rigor of engineering.


A lot of HNers probably have the job title "software engineer" without any of the qualifications of an "engineer"... that's probably why you're being down-voted.

But you're not wrong.


Most software engineers aren’t. Engineering requires licensing and exams through a governing body. Can’t have the prestige of the title without the responsibility. Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).


In the US, you only need certifications and testing for a "professional engineer". Most electrical, chemical, mechanical, or civil engineers are not certified PEs. Yet I don't think anyone is claiming they aren't engineers.


I have a PhD in chemical engineering, but not a PE, since I didn't go the industry route. I wouldn't feel comfortable calling myself an Engineer without it, and that was the attitude expressed by my professors and cohort coming out of undergrad.


It's my opinion that the title "Engineer" should be on par with things like "Doctor" or "Lawyer". It should require certification and there should be consequences for claiming to be one if you aren't.


I don't think anyone would get in trouble for job titles like "computer doctor" or "network protocol lawyer". Practicing medicine or practicing law (or in some cases, practicing as an engineer) without the required qualifications, regardless of job title ... different story.


> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

This doesn't come from software developers.

This comes from the SV culture of "move fast and break things."


I painted too broad strokes. I agree, and admit my mistake.


Just replace "software developer" with "self-proclaimed software engineer" in your comment and you'd be spot-on.


Nothing about engineering really covers security best principles and practices.

I went through classical engineering school, and I'd summarize with three categories:

* Newtonian Physics & The Mathematics Behind It (i.e. Differential Equations)

* Ethics

* Technical Communications


That's because there is no software engineering program, and other engineering programs don't deal with "security" per se.


My alma mater offers a degree in Software Engineering.


I think that was actually invented at Facebook in the early days, unless I'm mistaken. It's a good mantra, unless it's security related.


> Engineering requires licensing and exams through a governing body.

Offering engineering services to the public requires licensure. Working at places like Boeing, General Motors, Texas Instruments, and Caterpillar does not (for the vast majority of positions); It just requires an accredited engineering degree.

Note that this is independent of the software industry needing more engineering rigor, which it desperately does.


I couldn't agree more.


> Most software engineers aren’t.

"Software Engineer" is just a job title.

> Engineering requires licensing and exams through a governing body.

"Software Engineering" doesn't, it's just a job title.

> Can’t have the prestige of the title without the responsibility.

There is no real-world prestige associated with having "Software Engineer" as a job title (as opposed to "Software Developer").

> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

"Power and respect" is not actually something that is commonly associated with a guy sitting at a desk for a salary.

> Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).

You just learned that Facebook stored passwords in plaintext for years and nothing bad happened. In fact, it's hard to conceive that anything particularly disastrous could have happened. Yet, you call for more regulation that will drive up costs for everything. Whole industries could become unprofitable and those high-paid "Software Engineers" would become redundant.

I have the opposite opinion: The fact that computer security is generally poor teaches people how to properly use computers. They need to assume that there are no secrets, because that is the plain reality. No amount of regulation will make computers secure. No certification or best-practice encryption scheme will prevent users from writing their password on post-it notes, or from opening "sexy.jpg.exe", or from answering that call from the "Microsoft Service Center".


In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

The point is that the public should have confidence in engineers. Software developers who tout themselves as "software engineers" actively piggy-back upon, and undermine this confidence. It's malicious, and you should feel bad for defending the practice.

If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.


I believe there are exceptions like the "Combat Engineer" in the Canadian Forces.


> In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

Yeah well, good for them I guess. I don't think anybody else cares.

> The point is that the public should have confidence in engineers.

Why? "Engineer" is a very broad term used in a variety of occupations for very different things. If you're looking at someone's credentials and you're so uninformed that you can't tell the difference between a licensed engineer in a particular profession and a guy that has "Engineer" written on their business card, you're not qualified to make a decision either way.

> If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.

I can see the argument for why a medical doctor or a lawyer should be afforded some amount of protection, because they are directly dealing with laymen. I don't buy the same argument for the word "Engineer", much less "Software Engineer". It just doesn't really mean anything. We already have degrees and certifications for when specifics matter.


In Canada, the term engineer is protected, but in the states it is specialization dependent. I went to school for software engineering and I have my ring (I'm not a full engineer for a few more years though).

We focus a lot on testing and I just finished a mandatory ethics workshop. It's a different approach to software than I read about elsewhere. Almost no time spent on algorithms either - bad for interviews, but great for being able to write software that works.


> "Software Engineer" is just a job title.

And like the job title "sandwich engineer", it should probably be derided.

Apologies to anybody working in the honorable trade of sandwich crafting. I love sandwiches and appreciate your work. They bring me much more enjoyment than software.


I agree. The job title is literally _engine_er. Unless you oversee the operation of engines (as in locomotives), you are not an engineer.

But seriously, the definition of engineer in regular lexicon is someone who designs, builds, or maintains complex systems. I struggle to think how software does not tick all of those boxes. Sandwiches are a different story.

Perhaps you are confusing engineer with professional engineer? I can see how it might be easy to mix them up if you are not paying attention, but professional engineer carries a different meaning. Most software engineers are not professional engineers but are unquestionably engineers.


The word engine signifies a product of ingenuity. A siege engine is a clever device to attack fortifications. Engineers don't just have to be engaged with engines in the modern sense to do engineering. The PE bureaucracy is just gate-keeping by the guild masters.


A 'software engineer' need not even have a college degree, the only qualification for calling yourself a software engineer is finding an employer willing to give you that title. Even non-PE mechanical engineers at least have the bare minimum decency to get an undergraduate engineering degree before calling themselves engineers.

Of course, some people get software engineering degrees, or computer science degrees (that term is another can of worms entirely), but that is not the rule.


It becomes a meaningless word if anyone can just call themselves one. There has to be some standard, some governing body, some degree of rigour applied here.


> It becomes a meaningless word if anyone can just call themselves one.

Only to the extent that all words are meaningless. However, we keep a reference known as a dictionary that helps maintain some consistency around meanings of words.

The Oxford Dictionary defines engineer as:

1. A person who designs, builds, or maintains engines, machines, or structures.

2. A person who controls an engine, especially on an aircraft or ship.

3. A skilful contriver or originator of something.

Is software an engine, machine, or structure? That is debatable, although I would suggest that it does meet the definition of machine. Under that suggestion, software engineer is a perfectly appropriate term. Designing, building, and maintaining is exactly what software engineers do.

If you disagree that software falls under the definition of engine, there is still that third definition. Is software created by a skilful contriver or originator of something? I think that is a definite yes.

> There has to be some standard, some governing body, some degree of rigour applied here.

Professional engineers are expected to display those things. That is unrelated to engineers. Different terms with different meanings. Software engineers may not be professional engineers, although they can be.


> 1. A person who [...] maintains engines, machines, or structures.

So we should call car mechanics "mechanical engineers"? And maintaining structures? With all due respect to the janitors out there, their job is vital to society but it's not engineering.


> So we should call car mechanics "mechanical engineers"?

By definition, I don't see why not. I suspect the spirit of maintain may be a little more nuanced, but words are fluid and if someone wants to interpret it that way, then it is so.

> With all due respect to the janitors out there, their job is vital to society but it's not engineering.

If by janitor you are thinking of someone who sweeps the floors, I think that is well beyond the spirit of maintain. That, to me, is cleaning.

Maintenance of a structure is more like addressing a beam that is cracking. I doubt that is a role that an average janitor deals with, and in many cases will legally require a professional engineer to spec the resolution.

But words are fluid, so if you interpret a janitor as someone who maintains a structure, then I guess engineer fits. It really makes no difference either way.


Professional engineers do engineering.

Title only engineers work on complex systems.

This distinction has largely been lost and most "Software Engineers" think they are doing PE Engineering when they are not.


Do you think that if someone calls themselves a sandwich engineer, people will begin to believe that designing cars is like making subs? Do you really think the public can't tell the difference?


I think "Software Engineers" can't tell the difference.


Same with electrical, chemical, mechanical, and civil engineers since almost none of them are PEs?

Seems a bit silly.


Worked in two data-sensitive industries as well. There is no excuse; completely agree. This is amateur and completely indicative of a broken dev process.


Data anonymization and de-identification is nothing new. Especially when it comes to logging. I don't get why a lot of you are downplaying it in this thread. This is just as bad as plain text passwords.


The difference is intent and/or gross negligence. Storing passwords in plain text in a database requires a gross lack of security mentality in both the design and implementation stages. It is also drilled into people's brains constantly how bad of an idea it is. To put it simply, it cannot happen accidentally.

Logging like this can easily be attributed to an accident. The person who implemented this logging should get hit with some repercussions, because they surely tested the logging and must have seen the passwords when glancing over the output. But other than that, this was clearly a minor oversight.


Yes. Consider this not-unlikely scenario: The people who implemented the logging did it as a feature of a generic API proxy (there’s no way Facebook implements logging separately for each of their bazillion services!), and no doubt put in a provision for masking sensitive data. They tested it and it worked fine.

Then some devs miles and years away didn’t use that feature properly and accidentally logged passwords from an incoming request. They may not even have been looking at those request logs, because that wasn’t the request they were testing.

Then that feature went into production and this oversight was magnified millions of times.

At large scale you don’t just tail the production log firehose and look for stuff. You have to search for specifics to find anything at all. So if nobody was debugging this thing in production, it’s quite plausible nobody saw the passwords in the log.

One way to catch this sort of thing is sentinel data — in this case, have a unique value for a test account’s password and test every service with it, then search everywhere you can think of for that value.


So here is the thing: It was presumably relatively easy for you to come up with that scenario, which you called "not-unlikely". Then what you do is you put that scenario into your risk analysis when you're designing the authentication architecture, and figure out mitigations to make sure that particular mistake becomes (very) unlikely.

The notion that "it could easily happen" that is being brought up throughout this thread really only suggests that people aren't doing even rudimentary security assessments (or, hopefully, that they're not working with security-sensitive software).

If you can't solve it technically, you solve it through processes and training. The same goes for any other industry -- if a construction worker said they were just one bad morning away from dropping a two-tonne girder on a playground, we would never accept that. Or a pilot who might crash an airliner into the waiting hall when they're supposed to land. Somehow it seems that large parts of the software industry simply haven't reached the level of maturity we expect from pretty much all other industries.

Facebook is an enormous company. They should be able to have entire departments working on these topics. It's not a one-person hobby project we're talking about.


I'm sure they did figure out mitigations. They failed. Things fail. Two airliners just failed rather spectacularly, and that's the very industry you're benchmarking against.

>Somehow it seems that large parts of the software industry simply hasn't reached the level of maturity we expect from pretty much all other industries.

True, but that's a rather broad brush — in terms of actual risk of damages there is nowhere near an equivalence between "airliner crashing into waiting hall" and "logging some plaintext passwords".

Of course the culture, priorities, and domain are also very different between social network engineering and airliner engineering, which is by the way one reason Facebook could grow from nothing to mind-bogglingly gigantic in a decade, while it takes a decade to get just one new airliner into production.


The point I was making by comparing to a pilot, which I realise I could have expressed a lot more clearly, is that it is perfectly possible to mitigate risks through proper training and procedures even if it's not possible technically. (I.e. all it takes for a plane to crash is to turn the flight controls a few centimetres in the wrong way at the wrong time, yet it almost never happens.)

Of course things fail and people screw up. What I don't agree with are arguments along the lines of this just being a slight oversight, and that those can easily happen. It should require serious failure on multiple levels for anything like this to happen at that scale, if they are implementing things properly, not minor oversight.


Exactly — my scenario was an example of how failures at multiple levels could have caused this to happen. My "not-unlikely" is meant retroactively — now that it's happened, what's a not-unlikely explanation for how it was allowed to happen in a company the size of Facebook?

I didn't intend to imply it was a "slight oversight" — it's clearly a significant oversight — but there are people saying it's obviously gross negligence because how could this ever happen in a company that wasn't completely incompetent, etc. No, terrible accidents can and do happen even in companies that are trying hard to do a good job. Just like when a 737 crashes, you shouldn't assume Boeing is totally incompetent, but rather that several things must have gone wrong at once.


> The notion that "it could easily happen" that is being brought up throughout this thread should really only suggests that people aren't doing even rudimentary security assessments

Precisely.


How is this in any way better or more excusable? Calling it a 'minor oversight' could also apply to the DB being in plaintext.

Storing passwords in plaintext in a DB happens in the same manner: a dev is lazy or ignorant of the security repercussions. Which is what happened here; barring evidence that this was done maliciously, we can assume it was accidental. But that doesn't make it any more excusable, since it should be clear that logs can also contain sensitive data that needs to be protected/anonymized.

Considering how selective FB is for hiring, I would hope we could expect a higher standard.


Not to mention 200-600 million users had their passwords exposed for many years. That must be a massive trove of log files.


I see this differently. If someone stores passwords in plaintext in the database, they’re idiots that don’t know any better.

If someone logs your password in plaintext but has it encrypted in the database that’s grossly negligent.


You assume that storing passwords in plaintext is intent as opposed to gross negligence. In many past instances it’s been intentional and gross negligence because the people making all the design and implementation choices were not knowledgeable about best practices.


Because we all know we are one bad morning away from doing it ourselves. People are distinguishing between extreme incompetence (storing plaintext passwords in databases) vs people trying to do their jobs.

It's the same reason when there are large outages, where the comments are split between the enraged customers, and then the ones that are "Man, sucks to be them" as know they could easily have been the one that got the config push wrong.


The fact that sensitive data like this is just one config push away from being exposed means it's a brittle design (not just the code, but the entire design /review / deployment process - this is a platform with 1 billion users, not your 1-dev WordPress webpage). That said, this is further aggravated by the simple fact that for whatever reason, 20k of their employees had access to these logs.

All in all it just paints a picture of an irresponsible company where their house is not in order.


Exactly. It's one thing not to have enough controls in place to catch something like this on the forum for your warcraft guild. It's another thing entirely when you operate at the scale of Facebook.


"people are distinguishing between extreme vs people trying to do their jobs"

Those things are in no way exclusive. Storing passwords in plaintext is awful practise. Including the contents of form submissions in logging is a similarly awful practise. I really don't see what the difference is.


I like logging, and at our company we blanket-log all GET parameters on every request. Similarly, I think it's a Good Idea™ to never log POST parameters, but there are some pages where we've started logging some whitelisted ones because they can help diagnose errors we run across. In an environment where all data comes into the server as JSON blobs in request bodies, though, it can be a bit harder to build any sort of blanket whitelist.


Earlier in the thread:

>Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind.

Basically, the conscious decision to store passwords in plain text is worse than unintentionally doing so in logs. One is purposeful, one is not. Yes it's bad, but it's not as bad as if it were done on purpose. And generally, this is how most laws are enforced or implemented.


Can't both be attributed to ignorance? You could store passwords in plaintext in a database because you didn't know better. You can store plaintext passwords in logs because you don't know better. If I store passwords in plaintext am I being purposeful or am I being ignorant of best practises?

I think people are drawing a distinction where one doesn't really exist.


>You can store plaintext passwords in logs because you don't know better.

This whole conversation is about how it likely was not done on purpose, and that it was the result of logging HTTP requests and responses.

If they wrote logger.info(password), yea I'd say that's as outrageous.


If a single developer is able to push something like this to production without scrutiny in "one bad morning" on a platform with billions of users, the company itself is the problem.


I'm working with Rails and Phoenix on two different projects right now and all of them filter out passwords in the log files [1] [2].

I'm also working on a Django project but we're not logging HTTP calls arguments there. I think we could use a filter like [3] but I'd rather have the framework to automatically take care of that.

And if I were writing a web app from scratch, even after 25 years of doing web work, I'm sure I'd make a lot of silly mistakes. That's why I prefer to build on top of frameworks.

[1] https://guides.rubyonrails.org/configuring.html#rails-genera...

[2] https://hexdocs.pm/phoenix/Phoenix.Logger.html

[3] https://djangosnippets.org/snippets/2966/
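For the Django case, a plain `logging.Filter` can do what Rails' `filter_parameters` does out of the box. This is a hypothetical sketch, not the code from snippet [3]; the key names in the regex are my assumptions:

```python
import logging
import re

class SensitiveDataFilter(logging.Filter):
    """Masks values of sensitive-looking keys before a record is emitted,
    in the spirit of Rails' config.filter_parameters."""

    PATTERN = re.compile(r'(password|passwd|token|secret)=([^&\s]+)',
                         re.IGNORECASE)

    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the message in place; return True so the record is kept.
        record.msg = self.PATTERN.sub(r'\1=[FILTERED]', str(record.msg))
        return True
```

Attach it to the relevant handlers via `LOGGING["filters"]` and you at least catch the common `key=value` shape, though as others note below, nothing regex-based catches every encoding.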


You're right, the impact is just as bad. I think people in this thread (including myself) are downplaying it because it's a very easy mistake to make, rather than a terrible design decision.


The major difference is that we don't have standard patterns for protecting against one attack, vs the other.

With password storage everyone knows the patterns, or we expect them to.

With everything between the request and password storage, we don't.

This type of attack could easily be prevented. When secrets come in, immediately store them (ideally at the web framework level) in a type that overrides print/debug formatting. Then add a "get_raw" to it, and you can now grep for that being used anywhere outside of storage to a DB (and your DB libs should take the Secret type too). Or don't use a `get_raw` and instead use a `hash` method that returns a safely hashed version of it.

Further, your secret type could at the very least add a round of SHA256, maybe even with a pepper, just to be sure.

This isn't hard, I've done it before. The problem is that it isn't something that people feel embarrassed not to do, vs storing plaintext creds.

Impact is the same - creds are plaintext in a DB. Attackers always expect sensitive data in logs, so it isn't as if you'd get lucky and they'd miss this.
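A minimal Python sketch of such a secret type (the names `get_raw` and `hash` follow this comment, not any real library):

```python
import hashlib

class Secret:
    """Wraps a sensitive string so it cannot leak through print/repr/logs."""

    def __init__(self, value: str):
        self._value = value

    def __repr__(self) -> str:
        # Anything that formats this object sees only a mask.
        return "Secret(****)"

    __str__ = __repr__

    def get_raw(self) -> str:
        # The one greppable escape hatch; audit every call site, and
        # ideally only the DB layer should ever need it.
        return self._value

    def hash(self) -> str:
        # An extra round of SHA-256 so other code paths can compare
        # secrets without ever touching the raw value.
        return hashlib.sha256(self._value.encode()).hexdigest()
```

Any accidental `logger.info(secret)` then writes `Secret(****)`, and `grep -r get_raw` gives you the audit surface.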


> When secrets come in, immediately store them (ideally at the web framework level) ...

But, in practice, a lot of this stuff is being logged by things like proxies that could be several layers in front of the "web framework level".


Do proxies usually log POST data? Are these proxies terminating TLS?

Either way, yeah, you're right - it is not a perfect solution. But it means that for any system your engineers build, so long as they build them using your web framework that imposes this password type, you can grep your codebase for bugs.

You could implement client-side hashing as a best-effort "in transit" mechanism, though that has its own tradeoffs. Not sure how I'd feel about that approach in practice, but I can't see a big downside.


A mistake that should have been found very quickly, given the eyes on it. Not one that persists for years without apparently ever being noticed (I find that hard to believe).


When you're FB, Google, ADP, a bank, whatever, you have failed if this sort of thing is "an easy mistake to make." Responsibility should never fall on a single dev or team to begin with. This was going on for _years_.

It's either amateur hour over there or the organization simply doesn't care enough to invest heavily in the protection of their users.


This happened to Apple on the desktop, CVE-2014-1317 - In that case, they were logging the body of failed API requests (hex encoded) to a file in /var/log. It turned out that the login request occasionally hit an error, logging your AppleID password.

It was easy to overlook, since as you said, it was intended to log something else (failed API requests), it only happened in the case of an error, and it was hex encoded. I only happened to stumble across it out of curiosity.


The Apple ID team at Apple is one of the worst.


Interesting. Care to elaborate?


Anecdotally, Apple ID auth on iOS/macOS has been a mess for me. Changing my password would result in multiple login prompts per device. Sometimes inputting the new password works, sometimes another prompt comes up after a few minutes.

Also, though the flexibility of being able to use a different ID for iCloud, home sharing, iTunes, App Store, Messages, etc. is neat, it's pretty annoying to need to set it in each of those places. (And TBH it seems like being able to share purchases/access/etc between IDs is more useful than being able to have separate ones, yet the iCloud family stuff took a while to arrive.)

The issues I've seen aren't as bad now as they used to be, but they haven't left a good impression.


> and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

This is not an excuse. If you're logging all request data, you need to strip or encrypt sensitive information in that request data. Handling persistence of sensitive data is web development 101. Just because it's not in a database doesn't give you a pass to leave it unprotected.

This level of incompetency is unacceptable.


Different causes, identical security implications for the user's passwords that were improperly stored.

Both are security fails, regardless of the cause of the fail.

And a company the size of, and with the resources of, facebook, most certainly should not get a pass for the "oops, we logged more than we should have" cause. A corp. of their size, and with their resources, should be doing password handling correctly, every single time, no exceptions, no excuses.


The point perhaps is that they’re both terrible security designs and the impact to the user is the same regardless of how the “accident” happened. Nobody in FB’s position gets a pass for being irresponsible and negligent.


I disagree that they are both terrible security designs. We expect every company to not use plaintext for auth. We do not expect every company to have infra/ops setup to prevent logging on login requests.


By extension of this logic, a car manufacturer should be blamed for not designing proper brakes for their car. But if a worker then accidentally installs the brakes wrong, they are not responsible?

Imho, a company (especially as big as facebook) should have the right process and procedures to prevent these kind of problems and ensure developers have proper training to make them aware of the consequences of their actions.


Interesting analogy and I see your point. If a car manufacturer's process made it easy to install the brakes wrong they would be held responsible (probably with a recall or damages for lives lost due to faulty brakes).

I guess part of this is that passwords aren't considered that important to many people :(


Who is "we" in this scenario? Governing bodies do. Engineers I've worked with in health care do, as do our PM's, security officers, etc.

I designed a clinical testing platform a couple of years ago. Our initial requirements stated very clearly that PHI and PII were not to appear in logs. This is basic stuff for anyone who actually works at this scale / level of sensitivity.


>We do not expect every company to have infra/ops setup to prevent logging on login requests.

What? I absolutely expect every company to not log my password in plain text. In my 15 years as a developer across several companies and industries, I have never seen anybody log passwords, or advocate for logging passwords.

I'm struggling to think why any employee of any company should be able to view a plain text password in any form. Why would there not be an expectation here?


You're talking about the expectation not to log the password. That's fine.

The parent was talking about the expectation of infrastructure that validates this. That is both very uncommon and impossible to do 100% correctly. You can scan logs for a prefix (password=), you can do entropy counting, you can try to decode hex values in text. But if you find a base64-encoded hex string representing "foobar" - how do you even know it's a password?

Short of trying all possible decodings of all possible substrings against your full password database, this is an impossible task. (You can do best-effort things though)
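A sketch of what "best-effort things" might look like: try a few common decodings of each log token and check them against known credentials (hypothetical code; in reality you'd compare against hashes rather than a plaintext set):

```python
import base64
import binascii

def candidate_decodings(token: str):
    """Best-effort: yield plausible decodings of a token found in a log.
    As noted above, this can never be exhaustive."""
    yield token
    try:
        yield bytes.fromhex(token).decode("utf-8", "ignore")
    except ValueError:
        pass  # not valid hex
    try:
        yield base64.b64decode(token, validate=True).decode("utf-8", "ignore")
    except (binascii.Error, ValueError):
        pass  # not valid base64

def matches_known_password(token: str, known: set) -> bool:
    # 'known' as a plaintext set keeps the sketch short; a real scanner
    # would hash each candidate and compare digests.
    return any(d in known for d in candidate_decodings(token))
```

This catches the easy encodings but, per the point above, nested or exotic encodings will always slip through.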


Ah ok thanks, I misunderstood.


That’s certifiably depressing.


It might be depressing, but I think it's true. What do you think?


It is not true in my experience. It may seem true to the typical CRUD app web dev, but do they really know what it's like to be a custodian of so much sensitive information?


I would expect FB to have that, though.


there's two types of storing passwords in plain text

nope.

there are n types of storing passwords in plain text (including storing them in memory).

you just named 2.

all n types fall under infosec responsibility to prevent, and none are acceptable. there is no pass because logs were less intentional than storing in a table


I think they do indeed have the same outcome, but I'd count plain-text DB passwords as grossly negligent, while this was merely negligent. Clearly both are terrible and deserve a shaming, but not hashing passwords going into a DB is a decision where someone pretty much necessarily had the full scope in front of them and chose badly. This logging issue could potentially be attributed to a mistake in team communication (i.e. the people we hired to auto-log requests didn't realize that `/login` should be blacklisted for logging, and for some reason we never explicitly told them to do so).

So I agree that these actions are indeed same but different.

They ultimately speak to a failure at Facebook, but it doesn't speak to utter incompetence at Facebook.


Yeah, there's so many ways this could happen. The one that came to mind for me was an engineer who thought passwords were being filtered by the time it got to their part of the stack, but somehow didn't notice they weren't because the payload was minified or because they tested in a test environment where they knew there wasn't the filtering but didn't look at prod (where there was also not the filtering).

So many ways this could have gone down, so easy to do.


>the other is a very easy mistake to make which can go unnoticed for a long time

Sure, then the question for anyone at Facebook would be from the Bobs: "What exactly is it, you'd say, you do around here?"

FB claims to have "top minds" in essentially every discipline...

Heck, they are IMO a revolving door to the .gov/nsa/infosec community...

So... I call BS on your statement as "whoops! Easy mistake!"

WTF is it that you'd say you do around here, Mr. FB-Security-Guy??


Maybe the HN crowd needs to re-think login security. About 25 years ago challenge-response was a HUGE thing in login security. As https gained momentum and traction that went away. But the nice thing about challenge-response (at least for Javascript enabled clients) is that the password is one-way hashed before sending to the server.

I wonder if it's time to up our game again and go back to that model.


I was just thinking the same thing. There is no reason the password itself even has to be sent over the wire.


The second is way worse. You don’t need access to the database for it. Most systems make it difficult to log a plaintext password; it should be filtered in your logs. This is day 1 shit.


What I'm hearing is that, because they collected it and stored the passwords as part of their reckless attempt to log every action that every human makes online in order to manipulate them, it's more excusable than if it was in a specific and protected user credential database?

It's only an easy mistake when you're vacuuming up everything people do.


>Same same but different.

They're both issues with which any moderately competent engineer would be very familiar. Heck, I've seen commercial contracts that specified that audits be done to ensure no passwords or PII leak into log files. Everywhere I've worked in the past 20 years that logged stuff conducted regular audits to check that sensitive data wasn't being logged or inadvertently stored in a database.

The grey area might be, for example, when some data gets logged as a blob of HEX that turns out to contain a password if you run the right decoder function on it. I've seen cases like that show up in audits, though -- you see a blob of HEX or MIME and go find out what might be in it.

Anyway, it really isn't accurate to say that this is a kind of <slaps forehead> <doh, why didn't we think of that> thing. It gets thought about all the time.


The logging issue is something we've become aware of more recently, but it's just as bad. There is no intrinsic difference between the statement "don't store passwords in plain text" and "don't log PII or passwords in plain text."


All the more reason to make the client side send HMAC(HMAC(username + password) + Unix Epoch rounded to last 5-min block) over the wire in its POST to the auth endpoint.

All the transport encryption and DB encryption/hashing/salting won't protect you from this kind of logging mistake, but the above would.

P.S. There are ways to make the above even better by adding a nonce that has to be requested from the server before POST etc.
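A minimal sketch of what the client would compute (Python; the shared key and field separators are my assumptions, since the comment doesn't specify how the HMACs are keyed):

```python
import hashlib
import hmac
import time

# Assumption: a key agreed out of band between client and server.
KEY = b"shared-hmac-key"

def wire_token(username: str, password: str) -> str:
    # Inner HMAC: the stable per-user credential; this is also what the
    # server would store instead of the password itself.
    inner = hmac.new(KEY, (username + ":" + password).encode(),
                     hashlib.sha256).hexdigest()
    # Outer HMAC binds it to the current 5-minute block, so a token
    # scooped out of a request log expires almost immediately.
    window = str(int(time.time()) // 300)
    return hmac.new(KEY, (inner + ":" + window).encode(),
                    hashlib.sha256).hexdigest()
```

Note the caveat raised in the replies: the inner HMAC effectively becomes the password as far as the server is concerned.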


If they were following best practices, the server would never have access to plaintext passwords. The client / frontend would hash the password contents and send that across the wire.


That's actually not a best practice; on the contrary it's extremely uncommon. Not because it's bad, but because it just doesn't actually add a meaningful security improvement. On the other hand, it does add non-negligible complexity to your authentication system. In particular, it would have done absolutely nothing to prevent this specific vulnerability.

If you hash your users' passwords using a key-derivation algorithm on the client-side, each user's password simply becomes the original password's digest. From the server's perspective nothing has changed. Moreover the server will need to re-hash the password digest sent over the wire, because if the server is compromised the password digests can be directly replayed to the server to compromise corresponding accounts.

Additionally, since the shared secret between the client and server is the user's password digest, the password needs to be hashed using the same salt every time the user authenticates. So each user's password digest is still a de facto unique password which will be sent over the wire anyway. This scheme basically retrofits the server's job onto the client with added complexity. It's like the (slightly) faster horses version of password authentication, when we could really be experimenting with developing cars (like two/multi-factor authentication, more robust server-side controls and provable correctness).

That's not to say the scheme has no benefits whatsoever. It does mean that user passwords will be more complex, because the actual token stored by the server (the de facto password) is a digest. But there are two drawbacks - with enough users you'll still see many duplicated digests in your password database, even if you randomly generate salts on the client side. More importantly, you're offloading hashing to the client side in JavaScript. JavaScript can be very fast in 2019, but companies like Facebook and Google still maintain very low latency, substantially stripped-down versions of their websites[1] because client-side hashing isn't going to be nearly as fast as server-side hashing for a huge number of people. There's also a sizable population of people who don't even have JavaScript enabled, or who might have incompatible browsers.

tl;dr - Client-side hashing is not a best practice (and not widely deployed) because it comes with a nontrivial complexity increase, lower client compatibility and negligible security benefits. It also would not have prevented this vulnerability.

_______________________

1. For example, mbasic.facebook.com.


Your criticism (outside of complexity) of the suggestion may be unfounded, consider:

  Client: asks server for nonce
  Server: sends nonce
  ---- OR ----
  Nonce arrives with login page

  Client: sends HMAC(nonce + HMAC(username + password + appname) + Unix Epoch rounded to last 5-min block)

  Server:
  1. gets response
  2. using username as key, pulls HMAC(username + password + appname) from DB
  3. computes HMAC(last nonce sent to username + DB HMAC + Unix Epoch rounded to last 5-min block) and compares to the user's token
  4. last nonce is cleared

This algorithm would have prevented the attack (only the client-computed HMAC would appear in the logs) and is not subject to replay.
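A rough Python rendering of the protocol steps (the HMAC key handling and time tolerance are my assumptions; the comment leaves them unspecified):

```python
import hashlib
import hmac
import os
import time

KEY = b"app-wide-hmac-key"  # assumption: how the HMACs are keyed
DB = {}                     # username -> HMAC(username + password + appname)
NONCES = {}                 # username -> last nonce issued

def H(msg: str) -> str:
    return hmac.new(KEY, msg.encode(), hashlib.sha256).hexdigest()

def enroll(username, password, appname="example"):
    # The server only ever stores the inner HMAC, never the raw password.
    DB[username] = H(username + password + appname)

def issue_nonce(username):
    NONCES[username] = os.urandom(16).hex()
    return NONCES[username]

def client_token(nonce, username, password, appname="example"):
    window = str(int(time.time()) // 300)  # 5-minute block
    return H(nonce + H(username + password + appname) + window)

def verify(username, token):
    nonce = NONCES.pop(username, None)     # step 4: the nonce is single-use
    if nonce is None or username not in DB:
        return False
    now = int(time.time()) // 300
    for window in (now, now - 1):          # tolerate a block boundary
        if hmac.compare_digest(H(nonce + DB[username] + str(window)), token):
            return True
    return False
```

Popping the nonce before verification is what kills replay: the same token presented twice fails the second time.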


To be fair what you're describing is a PAKE, which is substantially different from "merely" moving the key-derivation functionality of password hashing from the server to the client. They're categorically different things. But you're right - if you're going down the rabbit hole of client-side hashing, you might as well implement a PAKE instead.

This kind of gets to the heart of what I was referring to when I said client-side hashes are like faster horses rather than cars. If you're spending this much effort, a superior protocol is better than an unorthodox, modified one. SRP is a PAKE which basically takes your proposal and moves it into a different layer of abstraction (TLS), and OPAQUE makes improvements upon it which allow you to use elliptic curves[1]. There are other reasons not to use PAKEs, but they're a much more coherent and defensible suggestion than just bolting the key derivation system onto the client rather than the server.

______________________

1. https://blog.cryptographyengineering.com/2018/10/19/lets-tal...


This is the same as:

    Server sends nonce
    Client sends HMAC(nonce + password + time)

Your inner HMAC becomes the new password, which is now stored in plaintext in the DB. You just call it something else.

There are better ways to implement this idea, like SRP/PAKE https://en.m.wikipedia.org/wiki/Secure_Remote_Password_proto...


That’s hardly a best practice and if followed consistently just means that the hash is your actual password, which means if someone steals the hash from a log file they can still impersonate you.

Unless you actually mean some sort of challenge response scheme which is rather uncommon to see, e.g. http “digest” authentication or SRP.


The big problem of a service having your true password (instead of the hash of the password) is that many users use the same password for a multitude of services (read Gmail, Hotmail, Yahoo, Amazon, ...)

So it's bad practice to keep or transfer the user's cleartext password. It should never leave her browser/client. Period.


If the service has only the hash of the password, then that's the password, which would then be subject to the same problems as a "cleartext password".


Password managers and service specific 2FA solve that problem quite nicely for now (edit: although yes most users aren’t willing to do that).


The irony of this post is that most people would probably beef up the security of their DB, far beyond whatever layer of security they prop up for their logs.


A company with a lower acceptance rate than Harvard is "accidentally logging passwords"? Yeah, sure :D


Just because it's an easier oversight to make doesn't mean FB should be cut any more slack over it.


What are logs but a database with a particular schema?


This isn't a case of a company intentionally storing passwords in plaintext in their database. That's a big security no-no, because password hashing is such a basic and fundamental part of password-based authentication that it's nearly impossible to miss if anyone involved in maintaining the system is remotely security-conscious.

This situation, on the other hand, seems to be more of an unintentional capture of passwords by their logging system. That's much harder to notice and more akin to a bug or security vulnerability, which is something pretty much every system suffers from sooner or later.

In such cases I'm more concerned with the company's response to the vulnerability than the fact that it exists at all. Here it seems like Facebook noticed the problem and proactively fixed it before it was exploited, which to me is a good sign.


Impossible is just a couple of mistakes away. Paper holds whole worlds.


Agree this is pretty much inexcusable.

Logging request or response payloads without an explicit whitelist should raise flags for any developer. There are very few cases where you can assert that not only in the present but also for all future use cases of a system, the entirety of a payload will not contain sensitive user data.

Only a whitelist will suffice to maintain good security. It's common for developers to attach sensitive data for debugging and other use cases under arbitrary paths.

Systems can improve further by adding patterns and other heuristics to drop values from the whitelist that look like sensitive data.
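A sketch of that default-deny approach with a crude entropy heuristic layered on top (the field names and the 4.5-bit threshold are illustrative choices, not from any standard):

```python
import math

ALLOWED = {"path", "status", "user_agent"}  # example whitelist

def shannon_entropy(s: str) -> float:
    """Bits per character; random secrets score high, normal text low."""
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def loggable(payload: dict) -> dict:
    out = {}
    for key, value in payload.items():
        if key not in ALLOWED:
            continue  # default-deny: unknown fields never reach the logs
        if isinstance(value, str) and shannon_entropy(value) > 4.5:
            out[key] = "[REDACTED]"  # heuristic: high entropy, secret-like
        else:
            out[key] = value
    return out
```

The whitelist means a newly added sensitive field is safe by default; the entropy check is the extra heuristic for secrets that sneak into whitelisted fields.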


I would argue that an organization the scale of Facebook is the most likely place for this mistake to happen.

Log in and sign up pages -- especially on mobile web -- are being constantly iterated on by "growth" and "emerging markets" teams whose first priority is getting graphs to go up and to the right, not making sure the pages are secure. Their entire mission is to make things work with yesterday's technology (feature phones, old Android tablets, and so forth).

On the other hand, the dedicated security folks are focusing on ensuring there are multiple factors protecting high-profile targets, so they're working on SMS as a second factor, Login Codes, Security Keys, and primarily focused on Desktop Web, iOS, and Android. Or they're doing fuzzing, or building robots to chase down engineers for buffer overflows and the like.


Dude, you'd be surprised how much data your average company leaks in logging.

Ask any HIPAA-compliant company how they scrub errors that include medical data from their logs and you'll truly see how bad things are.


This is, supposedly, one of the top 4 global tech companies. Not some regional med-tech enterprise shop with a fleet of ageing WindowsNT servers. It is literally inexcusable.


It happens at fortune 5 medical companies. When you sign up to Facebook you know you’re signing up to Facebook. When you go to the doctors office you are probably completely unaware that your PHI is being shared with a couple dozen other companies often intentionally and sometimes unintentionally. If you are a human in the United States there is a high probability your PHI is sitting in a log file on a server owned by some company you’ve never even heard of.

Fwiw, none of this is necessarily a breach of HIPAA laws.


Wait till you get into companies that allow doctors to have other doctors review your charts... your data gets silently shipped to a third party company who shares it with doctors... and in most cases you don't even get told it happens.


OT: how much of a pain is it under HIPAA to deal with medical data provided by the patients themselves through channels that aren't supposed to be used for such data?

I've never dealt with medical data, but do deal with credit card data so have to deal with PCI. With credit card data we run into customers who send emails, or use online chat, or leave voice mails telling us they have a new card and giving the number. We never ask for them to do this, and indeed everything we tell them says never to send us a credit card by such means, but they do it anyway.

That brings all those systems into scope for PCI, which is a pain in the ass.

We use helpdesk and support chat software licensed from a third party. We had to write scripts that understand its DB schema and data formats and can find and remove credit card information, and keep them up to date as the helpdesk and chat software is updated.

A few years ago, the vendor tried to discontinue the stand-alone version and move its users to their cloud service version, but had to drop that plan when they found out that a lot of their customers were doing the same thing we were, and absolutely could not move to a cloud service unless that cloud service handled PCI issues. Apparently they didn't want to deal with making their cloud service handle that, and there had been no talk since then of dropping the stand alone version.

Do people dealing with HIPAA run into similar issues?


Yes, this is why people build in HIPAA-compliant clouds like AWS... but I left that business years ago. You should always audit your third-party services to see what data is sent in debug logs.


Just as alarming for me is that Facebook engineers don't seem to understand risk management:

"In this situation what we’ve found is these passwords were inadvertently logged but that there was no actual risk that’s come from this. We want to make sure we’re reserving those steps and only force a password change in cases where there’s definitely been signs of abuse."

Inadvertently logging passwords is a risk. If those logs were accessed then that's a bigger risk. Signs of abuse is an issue. There is no such thing as an "actual risk", there are just probabilities (and possible consequences). Once a consequence happens, it is no longer a risk -- then it's actual.


Because storing plain-text passwords in the DB is a sign of pure incompetence, while accidentally logging a password is a huge screw-up, but an unintentional one - and frankly a type of mistake that is not unimaginable for any of us. You need to log full requests to urgently debug some issue, and in all the mess you forget to remove the logger afterwards... and boom, you've got yourself a log full of passwords, credit cards, and all other kinds of secrets. In the real world it's just like that: humans make errors. That's why your logs should always be heavily protected and regularly audited.


I had this kind of issue, a sloppy logging tool, when I had <100 customers. It was fixed before we got to 200.

Early FB (c. 2005) could get the excuse: new team, just getting started. But there's no excuse after the first round of funding.


We scoff at companies that intentionally store passwords in plain text, then compare that stored value with the credentials provided to authenticate a session; this is a common mistake for novice engineers. We are dismayed by, but I wouldn't quite say scoff at, accidental logging of secure data, which is a much more common accident, perpetrated by otherwise educated and advanced engineers every day.


Yeah, I agree this is pretty much inexcusable. I'm not sure why they get the pass.


Facebook didn't mean to log failed sign-in information. It just kind of happened.


People scoff at companies that store passwords in plain text on purpose. Inadvertently logging them is not the same thing. And Facebook’s scale is why it’s understandable how it could happen.


Someone had to write the code and then use the logs: individuals, like at any company. Then the article says it was actually searchable and used for something by employees. Why does the entire company's scale make this excusable?


I don’t think “pretty much”, it’s just inexcusable.


the_duke is not "the community"

so saying "the community usually scoffs at entities which store passwords in plain text - why do you give Facebook a pass?" is a fallacy of composition.


I don’t get it!!!

Facebook is supposed to be hiring the best developers. It has tons of money. This is the most basic security consideration - hashing passwords on the client at least. At LEAST.

I am constantly surprised by how basic practices were ignored by these corporations that got so big, while the small guys implement them. I guess people really were "dumb fucks" to trust Zuckerberg with their passwords.


so yeah, devs have access to the password of your account, but really they don't need it; they probably have access to the hash in the database and could log in anyway. So really NO harm has been done


Having (read) access to the hash shouldn't give any access to the account.

Edit: You've made this comment twice in these threads. It's wrong.


Yeah I was wrong, it is only true if the dev has access to the prod database


It is neither a standard practice nor a best practice to give your devs access to your production auth databases.


Perhaps not in more mature organizations, but it's standard practice at every startup I've ever worked for. One place had the dev office VPN'd into production at all times.


Did any of those startups have 200-600 million users?


No, these were small organizations. I'm just saying it's not uncommon. Sadly, some places don't even have a dev environment...


This does not sound like logging, because I would imagine that logs wouldn't be kept for years (2012!). The article implies that the passwords were stored and searchable in clear text. Yes, logs can be searchable, but this isn't what the article implies.


The article doesn't claim that passwords from 2012 are viewable today, just that they have been storing them in plaintext since 2012.


I really don’t know about this. Encrypting our log information was literally one of the first things we did when starting our latest product.

We store maybe a few million customer records without any significant PII.

Now there is Facebook which apparently hasn’t even bothered? And is logging the full body of requests somewhere? That’s a major WTF, even for Facebook.


>prevent bots/spam, abuse, etc.

Do they inspect the passwords too?


The fact that the top comment is Facebook apologism informing others how this is "understandable" makes me wonder if there's something more than organic voting going on...


Please don't break the site guidelines with insinuations about astroturfing.

https://news.ycombinator.com/newsguidelines.html

This is extremely well-trodden territory for HN: https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme.... The short version is that insinuations without evidence (someone posting a view you disagree with is not evidence) are at least as toxic and far more common than the thing being insinuated.


To think the only organic response is mob mentality against anything Facebook is silly. This is HN. We're here to discuss technology. As much as I stay away from Facebook these days, this particular flavor of security issue has happened to the best of companies in the past year, and it's natural to discuss that and why.

To genuflect with "I hate Facebook too" before having a rational discussion is worthless virtue signaling.


There are still lots of "organic posts" on Reddit threads about Facebook's fuck ups implying that the "anti-facebook narrative" is a "Soros conspiracy", which leads me to believe Facebook is still working with the "Definers".


Dumb mistake?

This was a feature for the government. Plain and simple.


Yes because the government would want Facebook to store user passwords. But only some of them, specifically facebook lite users, you know, the users that can't even afford a modern phone and reasonable internet. And they would want to use the passwords directly, instead of having some other backdoor for getting data from these accounts, so that way the users they spy on can see that they're being spied on because of the suspicious logins using their password. You've cracked the case =)


I think the technical security problem here (properly whitelisting parameters in logs) is just a symptom and not the core underlying concern (as the article mentions, Twitter and GitHub just dealt with similar issues).

To me, it seems very likely someone before 2019 laid eyes on these logs and either:

  a) Decided not to report it (implies serious security culture issue)
 
  b) Reported it and no action was taken (implies serious security process issue)

  c) Didn't even acknowledge it was inappropriate (implies a serious security training failure)
If you've already become complicit in regularly violating the privacy of your end users, one can easily understand an employee devaluing the seriousness of clear-text passwords in a log.

Are FB employees so regularly exposed to sensitive data that they have become desensitized to the seriousness of clear-text passwords in an internally accessible log?


This is very basic stuff, all user-identifying information should be tokenized before being logged. Controlling access to production login logs or exposing them on only a need to know basis is another basic security principle. Sounds like FB is actually a wild wild west internally.


What is now common practice was nonexistent a few years ago. 20 years ago you could bypass the Windows login with a few clicks. 15 years ago CORS was nonexistent. 15 years ago it was common to send sensitive data unencrypted... and so on.

Ex-Facebook people told me that until around 2010, FB management turned a blind eye to employees digging around their databases. Then they issued a warning that people should stop, and started locking prod data down. A few months later, those who still peeked at prod data were let go.


As the article mentions, Twitter disclosed a similar mistake a year ago...and GitHub before them. These are just the organizations responsible enough to say something.

Can we take a moment to acknowledge that this is an easy mistake to make? A logger doesn't care if it's a password or not. Strings are strings. As long as the answer is "humans should be more careful" we'll be seeing these kinds of disclosures regularly across the industry.

My best attempt to address this in my teams has been to use different data types for different data classifications. Naked strings must be loaded into one of these data types after input sanitization. That makes it easier to catch accidental inappropriate use. This is useful for managing PII as well.
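As an illustration, a minimal version of this idea in Python (the `Secret` type and its method names are hypothetical, not any particular library):

```python
class Secret:
    """Wrapper type for sensitive values (passwords, tokens, PII)."""

    def __init__(self, value: str):
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value out; easy to grep for in review.
        return self._value

    def __str__(self) -> str:
        return "[REDACTED]"

    __repr__ = __str__


# A careless log line now leaks nothing:
password = Secret("hunter2")
print(f"login attempt, password={password}")  # prints: login attempt, password=[REDACTED]
```

Linters or reviewers can then flag any logging sink that receives a `Secret` without an explicit `reveal()` call.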


Interestingly, Apple seems to have something like this in place where format strings, by default, are not logged (they show up as <private>). I'm thinking that something like this might be useful to have in general.


That's why modern languages need something like a typedef!

At least C# has a concept of a SecureString for things like passwords.


How does SecureString actually work? Presumably it's possible to coerce one into a "normal" string, and I'm sure that someone will end up doing this in the codebase because it's more convenient to work with.


The point of SecureString is to ensure that sensitive information gets removed from RAM in a timely manner. It's just a wrapper around some native memory that gets zeroed when the SecureString object is disposed.

Otherwise, when you use standard C# strings, you don't know when the garbage collector will collect them.

(Granted, you could do similar things with pinning a char[] to a char* and zeroing it before you end the pin, but who wants to jump through those hoops?)


But what if I want to pass a SecureString to someone? Doesn't a copy have to take place in that case?



> The general approach of dealing with credentials is to avoid them and instead rely on other means to authenticate, such as certificates or Windows authentication.

Sure... I bet you eat right and get plenty of exercise, right?

Anyway, that link appears to be best practices when writing cross-platform code.


Yes, this is an industry-wide problem. Many logging APIs are designed for convenience, not security. We need systematic solutions that are still easy to use (or they won't be used).


I mean, yeah it is easy to make but it is also easy to fix. How long did this occur prior to correction? 1000's of searches for said data in the logs. I suspect we are only hearing about it because Facebook, all of a sudden, has been publicly humiliated multiple times for poor engineering/business practices and has decided that it is time to grow up and take privacy seriously. Otherwise, they risk their existence.


Yet another example of why it's important to use _unique_ passwords for every site you have an account on. Even if the site you're using does password storage properly, that's no guarantee plaintext credentials couldn't leak through other means, such as, in this case, improperly configured logging systems.

In the future, WebAuthn may be able to solve this problem for good, as sites will only have access to a unique public key rather than a plaintext password. Until then, a password manager is your best defense against this type of issue.


This is so prevalent in technology companies that it is funny to read this thread, with everybody throwing mud at Facebook without considering that most probably the company they are working at has had (or maybe even currently has) the same issue.


That's a fallacy my dude. You can throw shade at your own company and Facebook for the same reason. It doesn't and shouldn't minimize what Facebook has done.


Posted this on the other thread from Facebook, but at what point do we start imposing strict fines on companies that are found to have done this?

Granted, I guess we wouldn't be hearing about this instance at all if there was to be some sort of fine attached - it would have just been swept under the rug - so maybe that's not a good idea. I'm just tired of the "oops we stored your passwords in plaintext lol" from companies with engineers that should clearly know better.


We can start fining companies for their mistakes when developers can start getting fired for any mistake immediately.

I don't care about Facebook, but storing user passwords in the plain is not a "privacy violation": even without the password, they still have access to all your data. Storing user passwords by logging them is a stupid amateur security mistake.

I could understand fining them if Facebook had stored the passwords and used them to access your other accounts with permission to invite users. But mistakes are mistakes, and they owned up to it.


Storing passwords in plain text is beyond a simple slap-on-the-wrist mistake, and it has real security implications.


>>We can start fining companies for their mistakes when developers can start getting fired for any mistake immediately.

Sounds good to me. Plenty of people in other industries get fired all the time for violating best practices, regulations, and so on. Why should software be any different? Are we special?


Most tech companies have a “blame the process, learn, and fix the process” approach. I’m not sure what industries you’re talking about but manufacturing and aviation seem like they have a similar process.


An internal tool that allowed for logging clear text passwords was a mistake. A culture that allowed said system to exist for 7 years without being surfaced by any type of internal security audit is something else entirely. Financial penalties could/should/would target the latter not the former.


I wouldn't mind that; more responsibility on the software developer means more leverage to push back. I guarantee you I'm not rushing for a deadline if I think I'm compromising security in a way that may put a black mark on my career.


It's likely to have been a breach of GDPR, so if this situation had existed when GDPR was in force, the answer to your question would be "at this point".


I guess that's true, yeah. Will be interesting to see if there are fines from the EU.


It all depends on if one can prove negligence.


If being bad at your job is a crime then lock me up.


Depends on the job. If you're a licensed professional, this could very much be the result of you "being bad at your job".


If you get a license and are bad at your job then lock up the licensor.


I agree, it's time for there to be criminal negligence penalties for these most egregious failures of even basic security practice.


So they logged plaintext passwords - what a dumb mistake. To top it all, it sounds like 20K employees had access to these infra components. I have had my doubts about FB's internal access control model for a while now, especially after one particular story where an employee claimed to have internal access to private information and used it to stalk people on Tinder - https://www.wsj.com/articles/facebook-fires-employee-who-bra...


This seems like the passwords were inadvertently written into some log files. While this is a very bad security issue for a company like Facebook, I am pretty sure that this type of bug is much more prevalent in the industry than we would like to assume.


It's just such an easy mistake to make. Some new ops guy logs all the traffic through a subsystem and boom - plaintext passwords.


Yeps. And at Facebook scale, by the time you realize the issue after it has gone live, you have already "stored hundreds of millions of user passwords".


That's a good point in favor of less hacking things together and more engineering - Facebook seems not to have embraced strict engineering processes, which are the only way to tackle these things. Facebook has to change its own principles to succeed long term.


I would say that this needs to change on a broader scale within the software development community than just with Facebook.


An easy mistake with serious potential consequences. I'd be interested in hearing how Facebook will prevent this from happening again in the future


A lot of folks are angry about this, but I'm not seeing much in the way of "this is how it should be handled" aside from "be vigilant." The closest is "encrypt logs," but those logs exist to be seen, else they wouldn't exist.


Hash client side. Send the hash.

This is how it has been done for 20 years (by those who truly care about their users).


If this is so common, it should be easy for you to provide a link to the javascript password hasher on a site you use. Does this site do that? Pretty sure it doesn't. But you sound very confident about this, so I'm sure you'll have no trouble finding some other site that handles its passwords this way.


What?! No!!!

As many have said before, transmitting the hash simply turns the hash into the password itself. Anyone who has your hash, has your password.

The biggest reason for hashing is that, even if an attacker gets access to the hashed passwords, they still can't use them to log in. Client-side hashing completely and utterly undoes that benefit.

Passwords in transit should be protected by SSL. Not a hash.


You should check out SCRAM. It solves this problem without sending the hash or the plain text.


Fascinating. Yes that does solve the problem, and most importantly doesn't send the hash (unlike what the commenter I was responding to was suggesting, which continues to horrify me).

Most interestingly, it solves the problem where the server may not be trusted. I see how this would have protected against the Facebook problem.

I'm curious, have any major websites you're aware of adopted SCRAM as a best security practice? It feels kind of like overkill, since generally the server is considered to be trusted... but at the same time, it would definitely prevent accidental logging.


I have not heard of any website using SCRAM, but recent versions of PostgreSQL use it for password authentication.


The solution is trivial: a watchdog. Have a watchdog log into a dummy account with a "well-known" password and then inspect logs for that well-known password.
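A minimal sketch of such a watchdog in Python (the canary value and log format here are made up):

```python
CANARY_PASSWORD = "canary-3f9a1c-not-a-real-secret"  # well-known dummy credential

def scan_log_for_canary(log_text: str) -> list:
    """Return the line numbers where the canary password appears in cleartext."""
    return [lineno
            for lineno, line in enumerate(log_text.splitlines(), start=1)
            if CANARY_PASSWORD in line]

# the watchdog logs into a dummy account, then scans what the services wrote
sample_log = (
    "INFO login ok user=watchdog\n"
    "DEBUG raw request: user=watchdog pass=" + CANARY_PASSWORD + "\n"
)
print(scan_log_for_canary(sample_log))  # prints: [2]
```

The key property is that the watchdog tests the pipeline end to end: it doesn't matter which subsystem did the logging, only that the canary surfaced somewhere it shouldn't.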


That doesn't necessarily help when logs are sampled. Eg, for high volume services or endpoints, only 1 in X,000 requests might be logged, and you have no theoretical guarantee that you would ever catch any one individual request in your logs.


Good point, though I consider that as just an implementation detail. Bypass sampling for designated system accounts and you will be able to see if systems leak cleartext secrets into logs or not.


But then it's unbelievable that they didn't have a watchdog like this in place, given how big and user-facing that company is. Also, out of those 2,000 employees with access, at least some would have noticed, wouldn't they? And it still took far too long to correct.


Why does traffic going through the subsystem contain plaintext passwords?

It was my understanding that my password, on a properly configured login page, never left my browser, much less crossed multiple machines that had the ability to read it.


No, your browser sends the password to the server in the HTTP request, only protected during transport by it being HTTPS. On Facebook's side, the HTTPS gets decrypted and the request is passed on (again potentially encrypted in transport, but the machines handling the request obviously need to see what is in the request, which includes your PW)


Ideally, Facebook's website, or any website, should encrypt the password client side and send that to the server, where the server decrypts it internally and authenticates the login session?

Unfortunately, it is 2019 and client-side hashing is rare because people use SSL instead.

One can argue that a company like Facebook, well stocked with tech resources, should have figured this out already, but here we are.


Let's say a company does implement client side hashing. Now the server just needs the hash to log you in. Isn't the hash now basically the password? Now if the hash leaks or gets logged, a malicious user can still login to the company service using that hash. Only difference is, since it's not a plaintext password, with proper salting the user is somewhat protected from having their other services with similar password hacked.


Consider this approach: Let H be a hashing function and p be the user's password. In the database you store a nonce n, and H(H(p⊕n)). When authenticating, the server sends n, and the client responds with x=H(p⊕n). Now the server can compute H(x) and compare with the stored value to authenticate the client. Finally, after it has been authenticated, the client generates a new nonce n', and sends H(H(p⊕n')) and n' to the server, which is stored in the database for the next login.

Replace the outer H with a proper key derivation function for extra credits.

This avoids sending any secret value over to the server, so no server side logging will cause a problem.
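A toy version of this scheme in Python, with SHA-256 standing in for H and one assumed convention for padding the nonce in the XOR (illustration only, not production crypto):

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(p: bytes, n: bytes) -> bytes:
    # XOR the password with the nonce, repeating the nonce as needed
    return bytes(a ^ b for a, b in zip(p, (n * len(p))[:len(p)]))

# enrollment: server stores the nonce n and H(H(p XOR n))
password = b"correct horse"          # known only to the client
n = os.urandom(16)
stored = H(H(xor(password, n)))

# login: server sends n; client answers x = H(p XOR n), never the password
x = H(xor(password, n))
assert H(x) == stored                # server side: one more hash, then compare

# rotation: client picks n' and sends (n', H(H(p XOR n'))) for next login
n_next = os.urandom(16)
stored, n = H(H(xor(password, n_next))), n_next
```

Note that an attacker who steals the database entry (n, H(H(p⊕n))) cannot compute the response x without p, which is the point of the double hash.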


After reading the comments I have an idea. I'm sure someone thought of it already though. If the password is p and the hash function is H(), server stores the hashed password H(p).

When the login screen loads, the server sends its time, so with reasonably fast internet the client can estimate the server time; call the estimate t. On login, the client sends H(H(p)+t) along with t. Now the server can compute H(H(p)+t) with the t from the client, verify that the hashes match, and also check that t is within a few seconds of the current server time.

This way, if any data that goes over the network leaks or gets logged, it'll only be valid for a few seconds. Also, salting before hashing should go somewhere in there, but it'll make it a bit more complicated.
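A toy sketch of that idea (plain SHA-256 standing in for H, no salt; illustration only). One caveat: H(p) on the server is now effectively the long-term secret, so it still has to be protected like any password hash:

```python
import hashlib
import time

def H(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

stored = H("hunter2")                      # server keeps only H(p)

# client: estimate the server time t and send (proof, t)
t = int(time.time())
proof = H(H("hunter2") + str(t))           # client knows p, so it can compute H(p)

# server: recompute from the stored hash and the client's t, then check freshness
assert proof == H(stored + str(t))
assert abs(int(time.time()) - t) <= 5      # reject stale or replayed proofs
```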


SCRAM has solved this: with it, the server can verify that the client has the plain-text password while only storing a hashed password and without exchanging a plain-text password.


FB could also rotate the hash any time, with a short overlap period to allow for already-loaded pages, to eliminate this issue.


No. It would be nice if this were the case, but browsers sadly just send the password to the servers instead of using something like SCRAM, which only requires sending the password when setting it, not on every authentication.


Facebook's own announcement is titled "Keeping Passwords Secure"[0]

Facebook comms: You can't really use the word "keep" when the entire post is about how you have failed to keep passwords secure, by storing it in plaintext.

Or at least, you probably shouldn't use that word. It signals that you intend to keep doing what you're doing, which wasn't just keeping passwords insecure up until sometime in January.

It's up until today that you effectively disclosed that you hadn't notified hundreds of millions of users that their passwords were compromised, for months after discovering it. I know they "were never visible to anyone outside of Facebook", but that group still includes some 25K+ people.

[0] https://newsroom.fb.com/news/2019/03/keeping-passwords-secur...


Facebook comms knows that. That's why Facebook comms used the word "keep." It's also why they said "some" passwords and not "hundreds of millions" of passwords.

Facebook comms also knows that "there is nothing more important to us than protecting people’s information" is a bald-faced lie, since Facebook is literally in the business of selling people's information to companies like Cambridge Analytica, but sometimes it's okay to lie as long as it sounds like a meaningless platitude so that nobody would be expected to seriously believe it.


And they found 2,000 employees accessed the data. I don't care how secure they think the passwords are in those 2,000 people's hands. You can bet those have been sold on the black market if they leaked to that many people. It just takes one unscrupulous person, which is why you hash passwords: even if they leak, the leaker can't get the original!


Why inform them and not reset their passwords as a standard security measure? I think the headline makes it even worse because it's so tone-deaf.


To me it’s no surprise these things happen, considering these companies only care about hiring leetcode experts.

Who knew data structures and algorithms don’t qualify you to build a secure real-world web application?

Knowledge of practical things like security has zero value to these companies when hiring engineers.


Well, if you hire enough people, you're not going to be 100% successful at selecting the good ones... and even good people make mistakes at times. Almost impossible to prevent. The most actionable thing you can do is build fewer things and focus on using off-the-shelf microservices (either in-house or third party) for everything.


But as someone else said, even though it may be a common mistake (I've seen similar mistakes at my work), how is it that none of those thousands of engineers raised the alarm and got traction sooner on securing it? Companies need to incentivize engineers not to have a "not my problem" attitude.


Reminder that Facebook also stores(stored?) 3 separate hashes for your password: passwoRd, PASSWOrD, and PasswoRd. They weren't super transparent about this fact.

Source: https://www.zdnet.com/article/facebook-passwords-are-not-cas...


I don't really see a reason they needed to be super transparent about that fact. It doesn't really have a significant security impact. It's just a small QOL improvement they implemented.


Honestly, given the possibility of accidental caps lock, and how mobile keyboards try to capitalize the first letter...

...that seems like a clever feature, and it doesn't reduce security in any meaningful way at all.

I would never bother to program that, out of my own laziness, but I respect that they did. It's really thinking about users.
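For what it's worth, accepting these variants doesn't require storing extra hashes: a server can keep one hash and try the likely typo variants at verification time. A hypothetical sketch (SHA-256 standing in for a proper slow hash like bcrypt):

```python
import hashlib

def H(p: str) -> str:
    # a real system would use bcrypt/scrypt/argon2 here, not a fast hash
    return hashlib.sha256(p.encode()).hexdigest()

stored = H("PasswoRd")  # single hash of the password exactly as enrolled

def variants(typed: str):
    yield typed                                 # as entered
    yield typed.swapcase()                      # caps lock was on
    if typed:
        yield typed[0].swapcase() + typed[1:]   # mobile auto-capitalization

def check(typed: str) -> bool:
    return any(H(v) == stored for v in variants(typed))

print(check("pASSWOrD"))   # caps-lock inversion -> True
print(check("password"))   # a genuinely different password -> False
```

As the thread notes, this only shrinks the search space by a small constant factor, so the security impact is negligible.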


Good to know. Someone in Facebook probably writes all his emails in all caps.


You don't need to store 3 hashes; in fact, you would need many hashes for all possible cases. It is easier to drop all cases pre-hashing, and only store a lower-case hashed value.


Hashing the lowercase version of the password string is not the same as what they’re doing. What you’re suggesting greatly reduces the security of the password; what they’re doing only divides the search space by 3 (which is nothing).


My Instagram was compromised a couple of years ago. My password was very secure and completely unique to Instagram so I found this extremely odd. Someone had logged in from the Bay Area. I believe it was compromised twice within 24 hours and then my Facebook was too with a completely unique password as well. Facebook emailed me to change all passwords. I have perhaps over 50 acquaintances working at Facebook. Could someone internally have accessed it as a prank? I don’t remember all the details but I did change my bank account immediately after as well just in case.


I find it pretty shocking that other commenters are looking at this as excusable. I mean, is that OK/excusable at your company? Logging payloads/bodies of sensitive requests in plain text - 0 obfuscation. That's ok? Wow. Other commenters are saying "it's logging so it's a forgivable mistake". Is it though? Obviously the world won't end because of these decisions, but holy hell I can't believe this wasn't caught/brought up in some type of code review. This seems pretty 101-ish


This kind of thing can often be hard to catch in a code review, because often it's the combination of several systems that cause this to happen. Tracing the user's password from submission form all the way to logger would probably require jumping through several layers, most of which are just handed black box blobs that they hand to the next system.


Personally, as a developer, I'm not thinking Facebook is bad, but rather that in general we are bad: we pass plain-text passwords to endpoints (through SSL) and just trust the data will not be abused (logged). It would be cool to have client-side encryption prior to sending to any third party: a way to do it cleanly on different devices so that a password, even if it's the same everywhere, is always encrypted uniquely for a particular company before going over the wire. That way, even an internal compromise would only affect that particular service. This is probably just a legacy problem in the end, where a new approach, like a YubiKey or a built-in protocol, would simply make it obsolete.


All the answers I see on Stack Overflow say there is no gain in hashing or encrypting prior to sending. I'm not sure I can agree with this. Let's say you hash a password and some attacker gains the hash. They can now log into that website, BUT NOT everywhere else where you happen to use the same password. Jeez, am I wrong here?


I have looked into this in the past and came to the same conclusion. Essentially, the "password" sent to the website is the client-side salted/hashed version of the user's actual password. The server could then salt/hash the "password" another time before storing it. This could result in the same issue if the "password" is logged, but it protects the user's true password from being discovered. Maybe a security expert can weigh in on this, because I don't understand why this wouldn't be the standard.
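A rough sketch of that two-layer idea (plain SHA-256 standing in for slow KDFs; the domain-separation string and salt handling here are assumptions of mine):

```python
import hashlib

def client_derive(password: str, username: str) -> str:
    # runs on the client: a site-specific "password" derived from the real one
    return hashlib.sha256(f"{username}:example.com:{password}".encode()).hexdigest()

def server_store(derived: str, salt: str) -> str:
    # the server only ever sees `derived`, and still salts and hashes it again
    return hashlib.sha256((salt + derived).encode()).hexdigest()

record = server_store(client_derive("hunter2", "alice"), salt="s3rv3r-salt")

# login: client re-derives, server re-hashes and compares
assert server_store(client_derive("hunter2", "alice"), "s3rv3r-salt") == record
```

If `derived` leaks from a log it can still be replayed against this one site, but the user's real password, possibly reused elsewhere, stays private, which matches the trade-off described above.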


Ideally the server doesn’t need to remember the salt it sent to the client, so it should be signed together with a timestamp to avoid reuse.

While you’re at it you can also add some hash puzzle to be solved by the client increasing difficulty with failed logins.


Facebook posted a response with the title "How we secure passwords".

The euphemism of company press releases always make me laugh.


What irks me is that they're not going to issue a mandatory password reset, since they only do so "in cases where there’s definitely been signs of abuse."

Even if it's an internal server, having passwords stored in plain text always merits a password reset. Storing them on a file server unencrypted somewhere is just as bad (and arguably worse) than storing them in plaintext in the database.


We're sorry.


We take security very seriously.


> There is nothing more important to us than protecting people’s information, and we will continue making improvements as part of our ongoing security efforts at Facebook," Pedro Canahuati, the company's vice president for engineering security and privacy, wrote in the post.


Nah; That's Equifax!


Twitter found they did the same thing last year https://news.ycombinator.com/item?id=16989534


Same for Github: https://news.ycombinator.com/item?id=16974851

1 comment, "Crazy level of transparency, they noticed, fixed it and notified the users even though this was an internal leak. I bet most businesses would just sweep it under the rug."


Some number of Facebook employees ABSOLUTELY knew this was going on, quietly talked amongst themselves, maybe brought it up, but then it was knocked back down. 100% I guarantee it. One company I worked for had plain-text passwords and it was quietly discussed (or not discussed) until we EVENTUALLY fixed it (years). Fuck facebook with a rusty pitchfork. This is the tip of the iceberg with their data-handling practices. I bet their systems are fucked throughout.


I remember when I was in college in Greece and we were building a login app with PHP and MySQL. We were actually storing every password in plain text, and I asked my professor why we shouldn't hash them for better security. He was like, "that's not necessary, because we'd overload the server hashing all those passwords." For a moment I thought I was paranoid, but then I started thinking about how vulnerable some systems may be.


How do companies keep messing this up? The cardinal rule of web application security is to NEVER store plain-text passwords anywhere. The only time your application should have access to a plain-text password is when it is hashing the password or verifying the password against a hash.

If you need to log all request data for some reason, strip the passwords out.

It really isn't difficult.
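Most frameworks have a hook for exactly this (Rails' parameter filtering, for example). A minimal standalone sketch of the idea:

```python
SENSITIVE_KEYS = {"password", "passwd", "pass", "token", "secret"}

def scrub(payload):
    """Recursively redact sensitive fields before a request body is logged."""
    if isinstance(payload, dict):
        return {k: ("[FILTERED]" if k.lower() in SENSITIVE_KEYS else scrub(v))
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [scrub(v) for v in payload]
    return payload

body = {"user": "alice", "password": "hunter2", "extra": [{"token": "abc"}]}
print(scrub(body))
# prints: {'user': 'alice', 'password': '[FILTERED]', 'extra': [{'token': '[FILTERED]'}]}
```

The key-name list is the weak point; a request that smuggles a password under an unexpected key still gets through, which is why canary-style log scanning is a useful complement.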


In properly built software, the clear text password never leaves the client (browser in this case). There is no real need to have the password sent over the net.


Meanwhile Facebook's stock has risen 0.5%. How does this even work? This is an enormous lack of oversight and proper processes.

In addition, Facebook's original post is rich in media speak:

First they speak of _some_ users and a _readable format_, and in the next paragraph of _hundreds of millions_:

We estimate that we will notify hundreds of millions of Facebook Lite users, tens of millions of other Facebook users, and tens of thousands of Instagram users.

As part of a routine security review in January, we found that some user passwords were being stored in a readable format within our internal data storage systems.

With this technique, we can validate that a person is logging in with the correct password without actually having to store the password in plain text.


Better question:

As a system administrator, how does one find issues like "improper logging that leads to credential capture"? And how does one find credentials for internal services crammed in and hidden in old crufty source tomes?

Obviously, when you find it, you nuke the creds, change them, and fix it. But you can't automate, because that requires the raw data. And if you hash the passwords, you end up having to hash everything in order to perhaps find the bad hashes. And when dealing with MB/s of logs, you can't offset-hash everything to scan... and you're back to raw text-matching passwords.

How do others do this, so we can do the right thing?


> Obviously, when you find it, you nuke the creds, change them, and fix it.

The status quo is to toggle a boolean on their account. If they sign in and this boolean is true, then don't actually sign them in. Tell them they need to change their password first.

Then you'd send out an email to the account holder to change their password. If they actually own the email, and aren't an imposter, then you're good to go.

So in terms of automation, it's literally just a boolean check during log ins, and make sure you're actually doing password resets correctly.
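
The flow above might look something like this sketch (the user record and its must_reset flag are hypothetical names):

```python
def attempt_login(user: dict, password: str, verify) -> str:
    """Sketch of the forced-reset flow; `user` is a hypothetical record
    carrying a must_reset flag set when its password was exposed."""
    if not verify(password, user["password_hash"]):
        return "denied"
    if user["must_reset"]:
        # Correct password, but it may have been exposed: block the
        # session and require a reset via the account's email address.
        return "reset_required"
    return "ok"
```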


You've reduced the scope of my original question, to that of compromised accounts. My question is a magnitude larger than that.

For example, someone checks in code and leaves a config file semi-populated with a live login credential. Obviously the answer is "Don't do that!", but we are all human. People accidentally post API keys, credentials and other secure things. It happens.

But how do you automatically find issues like this? If PII is radioactive, credentials are gamma-emitters.

We solved this at a job long ago with a list of plaintext passwords on a machine you couldn't log into except local (butt-in-seat), and it took your SVN commit, scanned it, and gave a pass/fail. That methodology at least stopped our internal shared service accounts from being accidentally compromised.
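
A sketch of that kind of commit scanner; the known_secrets set stands in for the plaintext credential list kept on the locked-down machine:

```python
def scan_diff(diff_text: str, known_secrets: set) -> list:
    """Return added lines that contain a known live credential.
    known_secrets would live only on the locked-down scanning machine."""
    return [line for line in diff_text.splitlines()
            if line.startswith("+")
            and any(secret in line for secret in known_secrets)]

diff = "+db_password = 's3cret'\n-removed line\n unchanged line"
if scan_diff(diff, {"s3cret"}):
    print("FAIL: live credential found in commit")
```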


Ah, sorry about that. I got hung up on the second half of your post. Can't say I can think of a good automated solution for bad security hygiene among employees. :/

Though if you guys primarily use Git, this does wonders: https://github.com/zricethezav/gitleaks

The only thing I can think of that would help prevent credential compromises is to implement a company password manager (akin to your butt-in-seat solution) with an ACL, accessible only on the local network. That shouldn't be too much friction for employees to actually use.

Next is having a secure channel to transmit secrets. That + Password manager has personally helped stop my coworkers in the past from sending passwords in emails, slack messages, post-it notes, text files on their computer, a committed file in a repo, etc.


That's fine, it happens :)

Yeah we've seen that git module. We use something similar to that.

I was also thinking inadvertent log capture as well. Capturing creds in an ELK stack is just as bad. And I'm guessing this FB issue is some variation of complete logging snarfing creds.

I'm just trying to think of coherent ways we can scan, detect, and act on creds that work across a backend. Seems to be a rather large hole.


Facebook's press release https://newsroom.fb.com/news/2019/03/keeping-passwords-secur... on the matter is pretty nonchalant:

> Keeping Passwords Secure
>
> As part of a routine security review in January, we found that some user passwords were being stored in a readable format within our internal data storage systems.

Nothing to see here folks.


Facebook seems to be the biggest cyber-threat and insecurity to the entire United States. When does Facebook get shut down for being a national security threat?


Sadly many companies did this in their early days, even when it was known to be bad. BlackBerry stored plaintext passwords in a Postgres DB up until BES 2.0 (circa 2007). Worse, there were no constraints, so many, many people had passwords like "pass", "god" and the like.

So let us not forget the legacy systems we have all encountered, or the nostalgic stories we have read about how old XYZ kit is still in use today.

Or to put it another way: consider the amount of work that went into Y2K just to fix a date field. Fixing plaintext password storage is more work, and we have had no Y2K-like event for passwords; any event is local, with some company falling foul, getting hacked and having its data shared publicly.

So even today, you can guarantee that there are still systems out there using plaintext password storage. Some for legacy-hangover reasons, but not all.

But it sure does make you think how many legacy systems are still in use today (and by legacy, they don't even have to be that old in some fields of work). It sure does make you wonder about satellite security, given the age of some of those still operational and in orbit.


Why is it normal to not hash passwords in the browser before sending them over the wire? Seems like it's opening the door to issues such as this one, or even more serious ones like when cloudflare leaked web server memory into other sites.

I understand that you'd still need another hashing layer before inserting it into the db, but wouldn't some hashing client-side be useful?


Yes, we should be hashing to avoid these sorts of password-logging issues. This is one of those old internet tech debts that should be resolved in the future, either via JS or an HTML standard where the password is hashed before being sent to the server.

Reasons for this:

- Internet services have multiplied, and it's too much of a burden for users to have a different password for every service.

- Even if users have different passwords, they often use the same pattern and just add a number or special character at the end, which defeats the purpose once the pattern is revealed.

In order to protect the pattern of passwords across multiple services, these login services should use client-side hashing. Think of it as something along the lines of the SSL green icon in Chrome: services not using client-side hashing may be leaking passwords somewhere (there's no way to tell that they aren't).


This does not help at all. If you hash client side, then the hash becomes your password. If you accidentally log this hash, anyone who has access to it can easily log in as you by constructing a login request.


For all intents and purposes, that hash becomes the user's password at that point. Also, you can't really salt that hash in a meaningful way.


With 20,000 employees who potentially had access, there's no way of knowing if any password was abused. They should message their users to change their password, especially if they use it on other sites, just to be sure. And because many sites use Facebook OAuth, there's a lot of room for abuse.


With 20,000 employees who potentially had access, we can rest assured that passwords were abused.


Exactly! It's nearly certain that at least one of them had an enemy, ex, etc, that he/she wanted to exploit.


Or just a person saving the data for a rainy day


I hope Amazon, Apple and Google are doing better. How can we know for sure? It's not unreasonable to ask because I would have thought it completely NUTS to think FB used clear text but apparently they did (or still do). What about everyone else? Does anyone know how to find out?


> Does anyone know how to find out?

Join the ops team at each of those companies, work your way to a position where you can analyze log files, then start analyzing them and see if you can find data that should have been obscured.

I don’t see any way to find out otherwise, we are talking about querying TBs of internal, company specific private logs to see if someone made a mistake. The best anyone could tell you is “I work for company X and I don’t think we’ve had this problem.”

Edit: just using a password manager with unique password for every site will solve most of the problems from a customer perspective. My Facebook password is unique to Facebook and I have 2FA so even an engineer with my password from a log couldn’t login to my account.


At Google you don't get to read random log files, sometimes even from your own project. There are entire pipelines for raw and sanitized logs, with expiration dates and corresponding access controls. Unless you're on the security team, crawling randomly through logs for no good reason is a quick way to get in trouble.


> At Google you don't get to read random log files

> crawling randomly through logs for no good reason is a quick way to get in trouble.

So do you get to or do you not?


I meant crawling your own logs


If it happens to a company with the best engineers, a focus on security, and an almost endless budget...


No, this did not happen by mistake. Plain and simple: they do not focus on security.

There is no true need for the cleartext password to leave the users browser.


So if it's not by mistake, do you think they're intentionally trying to reduce security? Are you saying that every website that hashes on the server (almost 100% of the internet) isn't just negligent, but consciously sabotaging user security?


Possibly this is the biggest deal of all:

> and searchable by thousands of Facebook employees

Any time we use a big Web service, we understand that there must be some employee at that service who could see our activity, but we expect the number of employees who could see our data to be strictly limited.

Thousands of Facebook employees could see user passwords? If true, that is an amazing number.

I know that when I use Gmail, there must be some employee at Google who could potentially see what I've written. But I've also read that Google has strict controls about who could see my email, and the activity of their employee is logged.

What Facebook allowed sounds serious, if only because they were so lackadaisical about enforcing strict policies around data access.


What I find more and more in life is that those who get ahead are doing so by not doing things the right way. We have such a large population now that it seems those at the top got there because they did something wrong in order to get there.


This is really inexcusable, but so much of what facebook has done or does is at this point I'm not surprised. I blocked all their IPs and do not have an account.

If I had to guess, it was probably legacy code that caused this. Mark Z. started this with raw PHP; if he had had access to, or had used, a framework, it would have provided salted password hashing for free (it is basically impossible to store plaintext passwords in Django, Rails, etc.). That doesn't make it OK, since they have thousands of engineers qualified to fix it, but they probably have/had higher priorities; clearly security is not one of them.


Feels like this falls on the infosec team. Our team has a scanner that parses all* logs for passwords and PII.

We know that the infosec team at Facebook was a second-class citizen. So it's no surprise that this got through.

*sampling on large sets


What's PII here? I'm not familiar with this acronym.


Personally Identifiable Information


Let's say that this problem occurred because they logged the body of post requests (including the passwords).

Could this problem be mitigated by hashing the password on the front-end (and a second time on the back-end)?

Like this, the server would never see the plain text password.

Also, if a facebook employee intercepts the hashed password, he could only use it to login to a facebook account. But, I guess he could do that anyway if he has access to the db.

However, this employee would have no way to know the real password and use it on another website.
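
A sketch of the double-hashing idea described above; because the client-side hash effectively becomes the password, the server still has to salt-and-hash what it receives (function names here are illustrative, not Facebook's):

```python
import hashlib
import hmac
import os

def client_side_hash(password: str, site: str = "example.com") -> str:
    # Hypothetical client step (in practice: WebCrypto in the browser).
    # A site-specific salt keeps the value unique to this service.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), site.encode(), 100_000).hex()

def server_store(client_hash: str) -> bytes:
    # The server must still salt-and-hash what it receives; otherwise
    # the client hash is itself a reusable password if it leaks.
    salt = os.urandom(16)
    return salt + hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode(), salt, 100_000)

def server_verify(client_hash: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac(
        "sha256", client_hash.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

A leaked log line would then contain only the client hash, which reveals nothing about the password the user reuses elsewhere.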


I wonder if this is related to the fact Facebook passwords are (were?) not case sensitive?[1]

(Ex: some script checks for hashes of the password, but not hashes of the permutations that are also accepted?)

[1] https://www.zdnet.com/article/facebook-passwords-are-not-cas...


Facebook passwords are case sensitive. The only case swapping that happens is to correct for leaving your caps lock key on, or for a keyboard that auto-capitalizes the first letter (like some mobile keyboards). This reduces the search space by a factor of 3 (aka nothing), not by all permutations of capitalization.
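
The accepted variants described above can be sketched like this (an illustration, not Facebook's actual code):

```python
def candidate_variants(typed: str):
    """The three accepted forms: as typed, caps-lock inverted,
    and first letter case-toggled (auto-capitalize correction)."""
    yield typed
    yield typed.swapcase()
    if typed:
        yield typed[0].swapcase() + typed[1:]

def check_login(typed: str, verify) -> bool:
    # verify() checks a single candidate against the stored hash.
    return any(verify(variant) for variant in candidate_variants(typed))
```

Only three hash checks per attempt, which is why this barely changes the attacker's search space.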


>Facebook passwords are case sensitive. The only case swapping that happens is to correct for leaving your caps lock key on

That's not the definition of "case sensitive" I've seen used by non-FB engineers.


I've seen this so many times in self-rolled auditing kludges that it has become kind of a running joke. I'm now surprised if anyone does anything more sophisticated than symmetric encryption of master credentials used to guard secrets, without using the same secret for both as a first pass. Embedded master passwords abound in most legacy OSS configurations.


To Facebook's credit: I created a brand new email address and unique password to be used exclusively for my Facebook account, and 6 years later (with no password changes), I have received no spam and had no discernible security issues.

Maybe I was just fortunate, but everyone should use a burner email if they choose to use Facebook.


Oh the fun that is scanning source repositories, pretty much a guarantee at any large company you are going to find plain text passwords, as in hard coded.

As for more fun with plain text passwords: it is no worse than any system which permits an admin to extract a user's current password, and those systems still exist.


Move fast and break things, am I right?



One of the many, many reasons to not ever use the "login with your Facebook account" (or Google or whatever) on any site. And that's before you even take into consideration what sites like Facebook and Google are doing with that information.


> an ongoing investigation has so far found no indication that employees have abused access to this data

what sort of indication would they be looking for? presumably it wouldn't be hard for an employee to have made a copy for themselves without leaving a trace of evidence.


I wonder this every time I see it in a report. It’s not like every file access is recorded for 10 years. Or at all. If you’re lucky you know who accessed a machine since the last time you rotated logs. But let’s say the data was mounted and accessible to all internal machines; literally anyone could have looked at it and done whatever they wanted without anyone knowing.


Is anybody surprised? This is the same company that served its login form over unencrypted http for many years. Prior to using a password manager, I used my password for "likely incompetent" companies for my Facebook account.


Another reason to use passwordless login: just send a one-time login code via email.

That’s what my companies do and there’s no way we can leak passwords... there aren’t any!

I honestly don’t understand the point of using passwords when a website allows reset by email.
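
A minimal sketch of issuing and redeeming such a one-time code (the TTL is an assumed value):

```python
import secrets
import time

CODE_TTL_SECONDS = 600  # assumed 10-minute validity window

def issue_code():
    # 6-digit numeric code from a CSPRNG, plus its expiry timestamp.
    code = f"{secrets.randbelow(1_000_000):06d}"
    return code, time.time() + CODE_TTL_SECONDS

def redeem(submitted: str, issued: str, expires: float) -> bool:
    # Constant-time compare; the code should be single-use and short-lived.
    return time.time() < expires and secrets.compare_digest(submitted, issued)
```

There is nothing long-lived to leak into a log: the code expires in minutes, so an inadvertently logged copy is worth far less than a password.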


>In this situation what we’ve found is these passwords were inadvertently logged but that there was no actual risk that’s come from this.

Yeah, because with all those 2,000 devs who had access to them, surely nobody could ever have stolen them...


Did someone say that it happened because Facebook believes in transparency?


Is it even possible for this to be the result of negligence when Facebook has some of the best programming talent in the world? Maybe, but I don't think so. I think this was INTENTIONAL. The question is why? IDK. Some FB employees probably didn't mind, because it gave them unbelievable spying powers. Another thing to consider is that the NSA had a strong relationship with FB for years, and the odds of them not knowing about FB's clear-text practices are zero. They knew. People within law enforcement probably knew as well. Given that nothing happens at FB without Zuck's approval, there's no doubt he knew as well.


It's a shame we're still using passwords, even more so that they are sent to the server. FIDO UAF or U2F is the way this should be done.


I'm surprised more sites don't hash passwords with a per-user salt, then hash/match with a per-session salt before transmitting them.
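
A sketch of that scheme: a per-user salt produces a stored verifier, and a per-session nonce keeps the transmitted proof from being replayable. (Note the stored verifier is still password-equivalent to the server here; real protocols like SRP or OPAQUE avoid that.)

```python
import hashlib
import hmac

def derive_verifier(password: str, user_salt: bytes) -> bytes:
    # Stored server-side in place of the password (per-user salt).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), user_salt, 100_000)

def client_proof(password: str, user_salt: bytes, session_nonce: bytes) -> bytes:
    # The client recomputes the verifier and binds it to this session's
    # nonce, so the value on the wire is useless for any other session.
    verifier = derive_verifier(password, user_salt)
    return hmac.new(verifier, session_nonce, hashlib.sha256).digest()

def server_check(proof: bytes, verifier: bytes, session_nonce: bytes) -> bool:
    expected = hmac.new(verifier, session_nonce, hashlib.sha256).digest()
    return hmac.compare_digest(proof, expected)
```

If a proof ends up in a log, it cannot be replayed once the server issues a fresh nonce for the next login.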


What the Fck FB?

Why would you ever store plaintext password?


How the hell are plaintext passwords ever touching a FB server in 2019 ? oO

You should always hash the pwd client-side first... wtf ?


> You should always hash the pwd client-side first... wtf ?

This is very rare. What production systems do you know of that use SRP or similar mechanisms? As far as I know the vast majority of companies and applications use server side hashing.


How about private posts? Every time you type something on Facebook, does it all go to the logs?


This is just not acceptable. I deleted my Facebook account over a year ago and I am doing just fine. I don't miss a bit of it.

At a company of Facebook's scale, mistakes like this are not acceptable. It looks like engineers there don't have much experience in the field, but they are probably very good at flipping binary trees on a whiteboard.


Not a great look for Stamos.


I do wonder if there was a devops feature built on top of this.


Wasn't it up to 6% of worldwide annual revenue for each individual case, according to the GDPR?

In reality though, I would be very surprised if this resulted in any fine at all.


4% for each European user, theoretically. Realistically, nothing is gonna happen and they will keep happily earning a fuckton of money from unaware users.


Orwellian Cyberpunk Idiocracy


this is why it’s a good idea to delete your old logs


I'm shocked.


but can you code the DAG in 30 minutes?


Somewhat unrelated, but if you have two Facebook accounts with very similar login names/email addresses and different passwords, it doesn't matter which login name you use: it will log you into the account based on the password alone. I contacted Facebook about it and they say it is an intentional "feature".


I can actually log into Facebook with a password I used on there roughly two years ago, which to me seems like a VERY bad security issue. I'm hoping they are at least fingerprinting my devices so that if someone else tries to access my account with an old password it won't let them, but still...


What do you mean by "here"?


Criminal negligence that could invite a class action suit


[flagged]


He controls 60% of the vote, so he probably won't.

But wait a few days, until he crawls out of his hidey-hole to assure us, once again, that this mistake is totally on him and to promise betterment.

On a less sarcastic note: It troubles me that their new "privacy mindedness" is taken at face value in some of the "serious" press. That's despite Facebook's history of outright lies.


I'm skeptical that they can turn their culture around, but I do think they do see damage to their reputation as an existential threat (either via users leaving the platform or via regulatory pressure.)


Because some developer typed:

> log.info(req.body)

and no one realized that login data would get captured?

Get real.


Smart companies typically overload loggers to do tokenization over many types of requests automatically to avoid this kind of inadvertent exposure.
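
One common shape for that in Python is a logging filter that scrubs records before any handler sees them (the regex and field names here are assumptions; tune them to your log format):

```python
import logging
import re

class RedactingFilter(logging.Filter):
    """Scrub password-looking fields before any handler sees the record.
    The pattern and field names are assumptions; tune to your log format."""
    PATTERN = re.compile(r'("?(?:password|passwd|secret)"?\s*[=:]\s*)\S+', re.I)

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = self.PATTERN.sub(r"\1[REDACTED]", str(record.msg))
        return True

logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())
logger.warning("login body: password=hunter2 user=alice")
```

The point is that no individual developer has to remember to sanitize: a careless `log.info(req.body)` still goes through the filter.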


Why? Revenue is growing.


[flagged]


Could you please stop posting unsubstantive comments to Hacker News?


My facebook feed has really become stale in the past few years. It used to be a good way to see what your friends are up to, and see pictures. But now it's ads, news stories, posts from my old university, bad memes, and that's about it. Even instagram has become full of influencers, models, brands, etc. I just want a platform to stay in touch with friends and that's it.


Why is that an issue? Devs will also likely have access to the hash of a user's password, and could then access that account.

We don't care if passwords leak; it's your own problem if you use the same password on every website. Passwords are not really private from developers inside a company.


It's an issue for the same reason storing passwords in plaintext in the database is an issue. Reading the hash != access to the account. Passwords should be private to everyone except the account owner. If devs have access to all user passwords then you have a fucking issue.


On many systems, "real passwords" are just a tcpdump away (most sites aren't doing end-to-end TLS. It's TLS terminated at the load balancer / proxy level, everything else is in the clear.)


That is a much smaller issue than passwords stored in logs, because the people who can run tcpdump usually also have the ability to intercept traffic in other ways (e.g. by updating the application to log all traffic, or by inspecting RAM). On the other hand, access to logs is usually handed out to a much bigger group of people.


True, but thankfully service meshes like Istio are increasingly used which encrypt the traffic between each service.


Also true.


>Why is that an issue ?

Are you serious?

>devs will also likely have access to the hash of the password of a user

No, devs should absolutely not have access to the password hashes of users in a properly configured environment



