It wasn't actually stolen as it says in the README. It was a misconfigured Apache server which leaked raw, unprocessed PHP code. I received the code to home.php and profile.php but I didn't save it at the time (I was very very new to learning PHP and didn't realize the significance of what I was looking at).
I knew the significance the moment I saw the `<?php`, but out of force of habit I had already pressed shift + F5 by the time my brain registered it. Somehow I never got the code again; maybe some sort of load balancer was leaking something that was cached, but just barely.
I want to be clear: I don't care, and I doubt Facebook cares.
But legally, I think this code was stolen. Facebook owns the copyright to the source code, so copying and distributing is theft in the same way that copying and distributing database contents is theft.
> In many states, "theft" is an umbrella term that includes all different kinds of criminal taking. This is the case in New York. Under the New York Codes, theft can be any type of taking, like identity theft, theft of intellectual property, theft of services and theft of personal property
If you're a mathematician and not a lawyer you might think those are different operations. But lawyers, judges, and juries have a unique capacity to argue that you're guilty of subtraction even if you only multiplied.
Does it matter? How is a random person on the internet sufficiently qualified to define theft in a complex domain like digital copyright and intellectual property? You don't have to be a lawyer to be skeptical of what someone says online.
My point is if they replied “yes” where would you be? It’s not inappropriate to be skeptical but asking a person for credentials online is next to useless.
Even if they were a lawyer, lawyers are frequently wrong in their interpretation of law, which is why they argue the relevant points in court, and a judge gets to decide which of them is right.
> Hah, clearly business success isn't a function of code quality
Mind you, early Facebook wasn’t exactly a “tech business”—the code wasn’t making them any money, such that having a bug in the code would make them less money.
Really, Facebook only became a “tech business” once they got into 1. Messaging, and 2. Advertising. Then they had SLAs, and breaking those SLAs meant losing users/customers; and their ability to deliver on those SLAs became directly related to the quality of their code.
I work as a programmer in the games industry and I’ve noticed that frequently as well. Commercially successful games aren’t always a strong indicator of code quality and often it tends to be that the bad code “luckily didn’t matter” in cases of success.
Business success isn't a function of technical/artistic quality in any field. There's a loose correlation sure, but there are hundreds or thousands of examples where poorly written/made books, TV shows, music, films, games etc also did really well commercially as well as the opposite. Many would say something like Firefly is better than something like Twilight, but one made millions of dollars and the other got cancelled in two seasons.
Knowing what to make and who to market it to will usually matter a lot more than how you build it.
> is line 89 of search.php valid?? "$user 0 && ..." ? aren't we missing an comparison operator?
Good job. And perhaps that's the culprit. Everybody assumed it was a plain syntax error, but I don't think it's possible. Rather, it seems to me that:
if ($user <something else ... ) {
was written here
but we'll never know because
the browser mistook it for an HTML tag,
and the user probably copied page contents
instead of saving it.
};
if (user_was_working_at_facebook_in_2007())
possibly_confirm('?');
/* lots of missing code may follow... until */
if (who knows what > 0 && ...
Maybe. I think that's less likely than syntax error, for two reasons.
#1, if you look in index.php, there's a < on line 228 and a > on line 258, correctly rendered. Granted, the < is part of a <=, which weakens that argument.
#2, if you look at the surrounding code, $user > 0 makes logical sense given what the code is doing (and assuming 0 represents an invalid/nonexistent userid, which I believe it does given that facebook userids increase monotonically).
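For what it's worth, the "browser ate it" theory is easy to try out. Here is a rough Python sketch that approximates what happens when PHP source is rendered as HTML and the visible text is copied: anything from a `<` followed by a letter up to the next `>` is treated as a tag and disappears. The regex is a simplification of real tag tokenizing, and the sample lines (including `MAX_ID` and `$flag`) are made up for illustration, not taken from the leak.

```python
import re

# Rough approximation: a "<" immediately followed by a letter, "/", "!" or "?"
# opens a tag, and everything up to the next ">" vanishes from the rendered text.
TAG_LIKE = re.compile(r"<[A-Za-z/!?][^>]*>")


def rendered_text(source: str) -> str:
    """Return what would survive if `source` were rendered as HTML and copied."""
    return TAG_LIKE.sub("", source)


# Hypothetical original line: everything from "<MAX_ID" to the next ">" is lost.
hypothetical = "if ($user <MAX_ID && $user > 0 && $flag) {"
print(rendered_text(hypothetical))

# "<=" and a "<" followed by a space are not tag openers, so these survive intact.
print(rendered_text("if ($a <= $b && $c > 0) {"))
print(rendered_text("if ($a < $b && $c > 0) {"))
```

Under these assumptions the first line comes out looking like the `$user 0 && ...` oddity, while a `<=` survives either way, so the correctly rendered `<=` in index.php doesn't settle the question in either direction.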
Except that a syntax error isn't likely at all - this was production code.
BTW just running "php -l" would have nailed this error - PHP refuses to run a script that cannot pass the tokenization step, resulting in a blank page (and the error being logged), so such a macroscopic error would probably be short-lived even in their test environment.
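If anyone wants to wire that check into a script, a minimal sketch could look like the following. `php -l` is the real syntax-check flag; the wrapper functions are hypothetical helpers, and treating a nonzero exit status as "failed" is an assumption about how you'd want to use it.

```python
import shutil
import subprocess
from typing import List, Optional


def lint_command(path: str) -> List[str]:
    """Build the command line for PHP's syntax-only check (-l means lint)."""
    return ["php", "-l", path]


def lint(path: str) -> Optional[bool]:
    """Return True/False for pass/fail, or None if no php binary is on PATH."""
    if shutil.which("php") is None:
        return None
    result = subprocess.run(lint_command(path), capture_output=True, text=True)
    # Assumption: a nonzero exit status means the file did not pass the check.
    return result.returncode == 0


print(lint_command("search.php"))
```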
Amazing that they initially wrote this code and now to join FB you need to answer questions based on backtracking and dynamic programming. :) I wonder if they could do the questions themselves back then.
Dynamic programming questions are explicitly not used in current Facebook interviews.
From time to time, Facebook and other companies study the effectiveness of their hiring process by comparing employee performance and interview performance. I believe dynamic programming questions were removed because there was not a strong link between success in this question and future performance.
That's super cool, do you know if any of those results / methodology were ever publicly published? I've been trying to find prior art in interview analytics and data driven hiring in general to try to improve things at my current workplace.
There was a point in time (roughly the 2015-2016 timeframe) when dynamic programming interview questions were really in vogue. Every interview loop from smallish startup to leviathan corporation felt like it had at least one.
This sounds very counter-intuitive! Did you hear this from someone working at Facebook or did you read it online? If it's the latter it'd be great if you could share a link!
> It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.
I don't exactly understand why this is getting downvoted, I'm genuinely curious if companies have published anything on that.
Also for me, it _is_ counter-intuitive, since DP was one of the hardest things for me when I started with programming, and I'd at least expect a non-trivial correlation between being good at DP and job performance, that's why I was asking about where I can read more about this.
I used to know PHP, and this code is very indicative of the imperative style that was popular during that era. I believe the PHP crowd has mostly gone deeply into OOP. With that said, Facebook mostly worked remarkably well. My bank, on the other hand has their web presence written in Java. And it works about 80% of the time. Sometimes one just has to try twice. So, "good code", "bad code" will always take a back seat to "working code". (Not that I'd want to maintain this beast)
I guess what's really at play is whether or not the code is sloppy because the developers are making a conscious decision to not refactor yet, or it's sloppy because they don't know what they're doing.
I've found lots of great developers write huge, sloppy, 300 line methods, if it gets the job done. I love clean code, but too much abstraction is a liability unless there's a good reason to introduce it. Part of expertise is knowing when it's okay to break the rules.
But, no, that's not at all what I'm saying. I'm saying, one language is extremely well thought out, and offers great guard-rails for doing OOP very well. And yet, the code is broken enough that sometimes I just have to give up on getting my account balance. Yet, Facebook from that era typically "just worked"... except when it didn't, and everyone would lose their minds. My point was, code that works, even if ugly, is what the customer cares about.
I'm still on the fence if I'd recommend anyone use PHP... Swoole performs very well. Laravel has a great community. It's not my cup of tea but, I try to keep an eye on it.
It is 12 years old - these were just before the first versions that had any kind of serious OO capability, however most apps were already built in the "Imperative" style that this is.
The hate for PHP on that gist is strong - admittedly back in the early noughties it wasn't fantastic but it was the shortest route to getting a functional website put up. PHP was simple and had a very low barrier to entry.
Latest versions are much better. Speed is one of its biggest draws: since version 7 it has taken the crown as fastest interpreted language, and I believe version 8 will improve on this even more.
> It is 12 years old - these were just before the first versions that had any kind of serious OO capability
PHP5 is the release where most of the OO improvements happened (but even PHP 4.3 had a bunch) and PHP 5.0 was released 3+ years before this code snippet leaked.
I wrote a heavily OO based version of Yahoo! Address Book using PHP 5.1 in 2005-2006, so it was definitely doable and fairly common to do so by that time.
It uses global variables all over the place and includes other files, so variables can start stepping on each other. PHP had class support since 2004 (code seems to be from 2007) and they just write function names in global space with bad naming like "redirect" where you can't tell which part of the system it's from.
The whole code is written top to bottom without breaking them up into functions for clarity. You can't even tell which variable is for local use and which is meant to carry some state further down the lines.
Comments are a joke too. When you read words like "holy shit", "cool stuff", "FIXME?" and "retarded", you can guess the programmer isn't really a focused person, and the comments read like personal notes instead of trying to give clues to others.
Good code is code that you feel like maintaining on first sight, and this is apparently far from it.
Well ... this 'joke' code created a multi billion dollar business and sowed the seed for a platform that is shaping our world in ways unimaginable (in more harmful ways than good, but that is beside the point) ... you can't complain!
"relatively well organized" is the problem. It's written in a fairly unstructured imperative style that's perfectly suitable for short scripts, but even these relatively small pieces of code are outgrowing it. The extensive use of global variables is an obvious thing that I would expect to become a problem soon. It's much less of a problem in php than in most other web frameworks due to the design of handling each HTTP request in a fresh process (so things like the use of $user in both index.php and search.php can't conflict), but it's still quite fragile.
By the standards of 2007 PHP this is good code, and today there's probably lots of worse PHP being written (even if the community as a whole has moved on quite a bit).
By the standards of early 2004 (when Facebook was originally written) this is still mediocre PHP. People who cared about writing good PHP were reading magazines like this (2003):
They knew global variables and hundreds of lines of top-level code were bad PHP, they just didn't know yet that trying to ape Java wasn't good PHP. If I remember correctly, DHH came out of this period: he got so sick of enterprisey PHP that he jumped ship to Ruby and wrote Rails. That was 2005.
Fair enough that Facebook hadn't done a major rewrite by 2007, but still: not good code.
Because most of the time you don't get hired at FB for a specific team, you might end up working on PHP sure, but the interview process for SWE is general and they test you on algorithms regardless.
Your post seems to indicate that "working on PHP" and "algorithms" are disjoint. Why? If you're writing code with many users, then algorithmic knowledge helps, regardless of language.
Although I will say I think dynamic programming is not a good interview topic. I think it's basically a proxy for "got a CS degree from certain schools". But algorithms in general are valid to ask about.
I would almost say you need to understand algorithms better while programming in PHP because the language obscures certain things. Ditto for programming in JS -- for example to encode certain algorithms in JS you will run into the fact that it only has floats, no ints.
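To make the floats point concrete, here is a small sketch in Python rather than JS. It relies on the fact that a JS number and a Python `float` are both IEEE 754 doubles, so the float stands in for the JS value; `LIMIT` is just a name picked for the example.

```python
# JavaScript numbers are IEEE 754 doubles; Python's float uses the same format,
# so it can stand in for a JS number here.
LIMIT = 2 ** 53  # above this, not every integer has its own double

as_double = float(LIMIT)

# As a double, adding 1 is lost to rounding, so the comparison is True.
print(as_double + 1 == as_double)

# Python ints are arbitrary precision, so the same comparison is False.
print(LIMIT + 1 == LIMIT)
```

Anything that needs exact integers past that range (hashes, large counters, some bit tricks) has to be encoded differently when all you have is doubles.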
People used to think you couldn't write algorithms in JS either. Now there are LALR(1) parser generators in JS, etc.
Is there anybody from 2007 still working at Facebook desperately scared they'll be found out for not being good at algorithms or are these people all in management now? ;)
The way that technical learning works... if they aren't stubborn (and probably fired because of it) then they've likely adapted. Developers grow in knowledge over time as they hone their craft and every senior dev can easily call up some terrible crap they wrote when just getting started.
It's also likely that even at the time the devs working on this wanted to start refactoring more - it's one thing to recognize bad code, having the muscle within a company to allocate resources to fixing that code is a different matter.
The lack of `chroot`[1] makes me sad off the bat - for some reason that function seems like a secret, everyone actually wants to use it (or wanted to before autoloading became as easy as it is) but nobody did.
Additionally I'd love to see that file split up into smaller chunks simply to lower the scope of thought.
It looks like nearly all of those function calls are modifying variables passed by reference instead of handing the value back via `return`. This isn't bad, and it's functionally indistinguishable at a technical level, but it's a kind of horrible approach in terms of expressiveness.
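A rough illustration of the difference, in Python instead of PHP (Python has no `&$var` parameters, so a mutable list plays the role of the by-reference argument; the function names are invented for the example):

```python
from typing import List


def add_friend_in_place(friends: List[str], name: str) -> None:
    """Out-parameter style: the caller's list is changed as a side effect."""
    friends.append(name)


def with_friend(friends: List[str], name: str) -> List[str]:
    """Return style: the input is left alone and the result is explicit."""
    return friends + [name]


friends = ["alice"]
add_friend_in_place(friends, "bob")      # nothing at the call site says `friends` changed
updated = with_friend(friends, "carol")  # the data flow is visible in the assignment

print(friends)
print(updated)
```

Both get the job done; the second just tells the reader where the new value came from.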
They're doing things with datetime that are unsafe and wrong (like assuming `24*60*60` is the number of seconds in a day) but people getting datetime logic wrong is as old as... well, time.
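One common way that assumption breaks is daylight saving time. A minimal sketch, assuming US Eastern time on 2025-03-09 (the day clocks moved from UTC-5 to UTC-4), with the offsets hardcoded so it runs without a timezone database; none of this is taken from the leaked code:

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: US Eastern on the 2025 spring-forward date.
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))

midnight = datetime(2025, 3, 9, 0, 0, tzinfo=EST)
next_midnight = datetime(2025, 3, 10, 0, 0, tzinfo=EDT)

# Real elapsed time between the two local midnights: 23 hours, not 24.
seconds_in_that_day = (next_midnight - midnight).total_seconds()

# Adding a fixed 24*60*60 seconds overshoots local midnight by an hour.
naive_tomorrow = (midnight + timedelta(seconds=24 * 60 * 60)).astimezone(EDT)

print(seconds_in_that_day)  # 82800.0
print(naive_tomorrow)       # 2025-03-10 01:00:00-04:00
```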
Oh, and you've got some pretty bizarre looking function signatures - I'm sure there is a reason for this but I'd want to ask some questions about this one...
It's possible to go through this and nitpick a bunch of stuff, it looks like it's mostly just an older style though. The big problems aren't here though... I'm not seeing any reads into $_POST (and `param_get_slashed` looks like a nice function for sanitizing input) - additionally, I'm not seeing a single line of SQL nor am I seeing any memcached calls, so the data access layer may already be well isolated architecturally.
In theory, yes. But that's still bad, because it means that a nontrivial amount of your application code (as well as whatever is launching it, like the PHP-FPM server or the web server) is running as root.
It's decent as far as 2000s era PHP goes. If you want to poke your eyes out, look at the source of some of the PHP web forum software out there.
The leaked code ignores pretty much all design patterns and software architecture norms. But not having to deal with that, and being able to cobble something together in a hurry, was the appeal of PHP. You can always rewrite later. Nowadays Facebook's landing page is served by highly tuned binaries compiled from C++. If they decided in 2007 to start out with that, we probably would not know who Zuckerberg is.
It seems to me (with absolutely no experience) that architecture and design patterns are more trouble than they're worth for code in the low thousands of LoC, which this appears to be.
It's not really bad for the time it was written, but today you wouldn't want to write PHP code like this.
Lots of global variables that can clobber each other (if one bit of code, even in one of the includes, redefines $user, everything explodes), no classes, no code autoloader...
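A rough Python analogy of that failure mode: a PHP include runs in the scope of the file that includes it, so `exec()` into one shared dict stands in for `include` here. The three snippets are hypothetical, not from the leaked files.

```python
# All snippet contents below are made up for illustration.
page_setup = "user = 1234"
helper_include = "user = None  # helper reuses the name for its own lookup"
page_body = "greeting = 'profile for user %d' % user"

scope = {}    # stands in for PHP's single global scope
error = None
try:
    for snippet in (page_setup, helper_include, page_body):
        exec(snippet, scope)
except TypeError as exc:
    error = exc  # the page body fails far from the line that caused it

print(error)


def helper():
    """Same logic wrapped in a function: `user` is local and clobbers nothing."""
    user = None
    return user
```

The fix doesn't need classes as such; any scoping that keeps the helper's `user` away from the page's `user` does the job.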
How are classes a sign of superior code? How is an autoloader a sign of superior code? Keep in mind, this is old PHP and those things were new or nonexistent.
I always find this kind of stuff interesting.
Albert Gonzales broke into a bunch of my work's servers (for years) at my first job, after a customer of ours pissed him off. I had some AIM conversations and lurked/logged in one of his advertised IRC hangouts. Most of the transcripts went to the Secret Service, which was very interested in his credit card fraud activities (as advertised on IRC).
Pretty sure all their code is intensely different now - they did write the HHVM engine in 2011 and I wouldn't be surprised if they ported as much logic as possible to that and added strict typing over it all.
I'm not even sure they could do that. What would be the difference between that and some megacorp issuing a takedown-request for an internal document leaked by the press?
I know it's just two of the files, but taking into account the rest of the pages for the FB app in 2007, it does not seem like a lot of code. One or two people could have written and maintained a project that size. What were all the new hires doing from 2005-2007?
The company wasn't very large at that time. Probably less than 100 engineers in 2007. One person could understand the bulk of the codebase in a reasonable amount of time.
There was way more code than you're seeing here though, note all the includes at the top.
What always strikes me is how sloppy and poorly written commercial code is when it is leaked - esp compared to open source projects. Looks like someone’s CS200 project a lot of the time.
I was also writing some spaghetti PHP code around 2007 and remember making choices that had the benefit of reducing server stress while making my life harder. No framework was a big one.
Remember, Friendster died due to crashing and Facebook, for all its spaghetti code at the time, was remarkably solid and rarely crashed. It is easy to forget how hard the sysadmin side was for growing startups before we could spin up infinite cloud servers and VC wasn't just an open spigot for anything with growth.
Still really cool to see.