Facebook PHP Source Code from August 2007

thrusong · on Jan 30, 2020

It wasn't actually stolen as it says in the README. It was a misconfigured Apache server which leaked raw, unprocessed PHP code. I received the code to home.php and profile.php but I didn't save it at the time (I was very very new to learning PHP and didn't realize the significance of what I was looking at).

Still really cool to see.

therein · on Jan 31, 2020

Same happened to me.

I knew the significance the moment I saw the `<?php` but force of habit, I had already pressed shift + F5 by the time my brain registered. Somehow never got the code again, maybe some sort of load balancer was leaking something that was cached but just barely.

jhardy54 · on Jan 30, 2020

I want to be clear: I don't care, and I doubt Facebook cares.

But legally, I think this code was stolen. Facebook owns the copyright to the source code, so copying and distributing is theft in the same way that copying and distributing database contents is theft.

But again:

jaywalk · on Jan 30, 2020

Your definition of theft is wrong. Legally, this was not stolen.

krsdcbl · on Jan 30, 2020

Copyright infringement and theft are two completely different things, legally.

I know the film and music industries in particular have put a lot of effort into equating those terms in the media, yet they remain two separate things.

shawnz · on Jan 30, 2020

Are you a lawyer?

disconnected · on Jan 30, 2020

You don't have to be a lawyer to understand that subtraction and multiplication are two completely different operations.

shawnz · on Jan 30, 2020

Are you sure?

https://legalbeagle.com/8608294-difference-between-larceny-t...

> In many states, "theft" is an umbrella term that includes all different kinds of criminal taking. This is the case in New York. Under the New York Codes, theft can be any type of taking, like identity theft, theft of intellectual property, theft of services and theft of personal property

LeifCarrotson · on Jan 30, 2020

If you're a mathematician and not a lawyer you might think those are different operations. But lawyers, judges, and juries have a unique capacity to argue that you're guilty of subtraction even if you only multiplied.

To a lawyer, bits have color: https://ansuz.sooke.bc.ca/entry/23.

naniwaduni · on Jan 31, 2020

You can totally find a mathematician to convince you that those are the same operation! Probably easier than the lawyer, even.

webkike · on Jan 30, 2020

are you?

ampersandy · on Jan 30, 2020

Does it matter? How is a random person on the internet sufficiently qualified to define theft in a complex domain like digital copyright and intellectual property? You don't have to be a lawyer to be skeptical of what someone says online.

webkike · on Jan 31, 2020

My point is if they replied “yes” where would you be? It’s not inappropriate to be skeptical but asking a person for credentials online is next to useless.

nineteen999 · on Jan 31, 2020

Even if they were a lawyer, lawyers are frequently wrong in their interpretation of law, which is why they argue the relevant points in court, and a judge gets to decide which of them is right.

nicky0 · on Jan 30, 2020

Copyright infringement =/= Theft.

zug_zug · on Jan 30, 2020

One way to look at this is to say "Hah, how shameful."

The other way to look at this is to say "Hah, clearly business success isn't a function of code quality"

derefr · on Jan 30, 2020

> Hah, clearly business success isn't a function of code quality

Mind you, early Facebook wasn’t exactly a “tech business”—the code wasn’t making them any money, such that having a bug in the code would make them less money.

Really, Facebook only became a “tech business” once they got into 1. Messaging, and 2. Advertising. Then they had SLAs, and breaking those SLAs meant losing users/customers; and their ability to deliver on those SLAs became directly related to the quality of their code.

lxm · on Jan 30, 2020

As a counterpoint, code choices and perceived site slowness contributed to Friendster's and MySpace's decline. Both had messaging and advertising.

krapp · on Jan 30, 2020

>"Hah, clearly business success isn't a function of code quality"

They aren't nearly as strongly correlated as many developers might like to believe.

danbolt · on Jan 31, 2020

I work as a programmer in the games industry and I’ve noticed that frequently as well. Commercially successful games aren’t always a strong indicator of code quality and often it tends to be that the bad code “luckily didn’t matter” in cases of success.

seppin · on Jan 30, 2020

I prefer to think of it as "perfectly coded" projects have 0 commercial value until someone figures out a customer for it.

CM30 · on Jan 31, 2020

Business success isn't a function of technical/artistic quality in any field. There's a loose correlation sure, but there are hundreds or thousands of examples where poorly written/made books, TV shows, music, films, games etc also did really well commercially as well as the opposite. Many would say something like Firefly is better than something like Twilight, but one made millions of dollars and the other got cancelled in two seasons.

Knowing what to make and who to market it to will usually matter a lot more than how you build it.

cia-killer · on Jan 31, 2020

This code is from 2007, honestly it's not that bad for 2007 standards. I've seen worse code that was written with Node last year

aruggirello · on Jan 30, 2020

> is line 89 of search.php valid?? "$user 0 && ..." ? aren't we missing an comparison operator?

Good job. And perhaps that's the culprit. Everybody assumed it was a plain syntax error, but I don't think it's possible. Rather, it seems to me that:

    if ($user <something else ... ) {
        was written here
        but we'll never know because
        the browser mistook it for an HTML tag,
        and the user probably copied page contents
        instead of saving it.
    };

    if (user_was_working_at_facebook_in_2007())
        possibly_confirm('?');

    /* lots of missing code may follow... until */

    if (who knows what > 0 && ...
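A minimal Python sketch of that hypothesis (Python rather than PHP, purely for illustration). It assumes a crude model of the browser in which `<` followed by a letter opens a tag that runs to the next `>`. The names `$some_flag` and `$n` are invented, since nobody knows what the original line contained.

```python
import re

# Crude stand-in for a browser's tag detection (an assumption, not a real HTML
# parser): "<" followed by a letter, "/" or "!" opens a tag, and everything up
# to the next ">" is swallowed instead of being shown as text.
TAG_LIKE = re.compile(r"<[A-Za-z/!][^>]*>")


def visible_text(raw: str) -> str:
    """Return what would survive if the page text were copied from the browser."""
    return TAG_LIKE.sub("", raw)


# Hypothetical original line: the variable names are made up for illustration.
hypothetical = "if ($user <some_flag || $n > 0 && is_unregistered($user))"
print(visible_text(hypothetical))

# "<" followed by "=" or a space does not look like a tag, so it survives.
print(visible_text("if ($count <= 5)"))
```

Under this model everything between the `<` and the following `>` disappears from the first line, while the `<=` comparison in the second line is left alone.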

earenndil · on Jan 31, 2020

Maybe. I think that's less likely than syntax error, for two reasons.

#1, if you look in index.php, there's a < on line 228 and a > on line 258, correctly rendered. Granted, the < is part of a <=, which weakens that argument.

#2, if you look at the surrounding code, $user > 0 makes logical sense given what the code is doing (and assuming 0 represents an invalid/nonexistent userid, which I believe it does given that facebook userids increase monotonically).

aruggirello · on Jan 31, 2020

Except that a syntax error isn't likely at all - this was production code.

BTW just running "php -l" would have nailed this error - PHP refuses to run a script that cannot pass the tokenization step, resulting in a blank page (and the error being logged), so such a macroscopic error would probably be short-lived even in their test environment.
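A rough sketch of that check, wrapped in Python so it can be scripted. It assumes a `php` binary may or may not be installed and relies on `php -l` exiting with a non-zero status when the file does not parse. The helper name `php_lint` and the sample lines are made up for illustration and are not taken from the leaked files.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Optional


def php_lint(source: str) -> Optional[bool]:
    """Run `php -l` on the given source.

    Returns True if PHP reports no syntax errors, False if it reports one,
    and None when no `php` binary is available on PATH.
    """
    php = shutil.which("php")
    if php is None:
        return None
    # Write the source to a temporary file because `php -l` checks a file.
    with tempfile.NamedTemporaryFile("w", suffix=".php", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        result = subprocess.run([php, "-l", path], capture_output=True, text=True)
        return result.returncode == 0
    finally:
        os.unlink(path)


# Illustrative lines modelled on the snippet discussed above.
broken = php_lint("<?php\nif ($user 0 && true) { exit; }\n")
fixed = php_lint("<?php\nif ($user > 0 && true) { exit; }\n")
print(broken, fixed)
```

Since `-l` only parses the file, undefined functions or variables would not be reported; it only catches errors of the missing-operator kind.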

treyp · on Jan 31, 2020

not sure what the reason was, but it's simply a missing >. other files were leaked, and you can see the same pattern elsewhere:

  if ($user > 0 && is_unregistered($user))

searching the web for this comment might be helpful if you'd like more source code:

  $user can be < 0 in AIM mode, just use 0 there

warpech · on Jan 31, 2020

This might be due to the code having been pasted into some markdown editor at some point. That often strips the < and > characters.

growt · on Jan 30, 2020

So there's an unknown amount of Spaghetti Code still missing from this mess?

shimylining · on Jan 30, 2020

Amazing that they initially wrote this code and now to join FB you need to answer questions based on backtracking and dynamic programming. :) I wonder if they could do the questions themselves back then.

angf · on Jan 30, 2020

Dynamic programming questions are explicitly not used in current Facebook interviews.

From time to time, Facebook and other companies study the effectiveness of their hiring process by comparing employee performance and interview performance. I believe dynamic programming questions were removed because there was not a strong link between success in this question and future performance.

lordCarbonFiber · on Jan 30, 2020

That's super cool, do you know if any of those results / methodology were ever publicly published? I've been trying to find prior art in interview analytics and data driven hiring in general to try to improve things at my current workplace.

mountainofdeath · on Jan 30, 2020

There was a point in time (roughly the 2015-2016 timeframe) when dynamic programming interview questions were really in vogue. Every interview loop from smallish startup to leviathan corporation felt like it had at least one.

duderific · on Jan 30, 2020

Sadly we are still doing them at my company. I feel like we lose good candidates because of it.

akhilcacharya · on Jan 31, 2020

In the last loop I had there, there were 3 explicit DP problems, and this was in 2017/2018.

fkfaduc · on Jan 30, 2020

This sounds very counter-intuitive! Did you hear this from someone working at Facebook or did you read it online? If it's the latter it'd be great if you could share a link!

dzlobin · on Jan 30, 2020

I don't know of any online links to point to but I am an active interviewer at FB and I know we don't ask DP questions

ceejayoz · on Jan 30, 2020

Why is it counter-intuitive that people who interview well may still not function well in a specific organization?

objektif · on Jan 30, 2020

What is counterintuitive about this? That best dynamic programmers are the absolute best programmers? I mean come on man..

SegFaultx64 · on Jan 30, 2020

Seems like pretty clearly GP isn't familiar with the definition of dynamic programming in this context: https://en.wikipedia.org/wiki/Dynamic_programming
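For anyone unsure what the term means in the interview sense, here is a minimal Python sketch of a textbook dynamic programming problem: the fewest coins needed to make an amount. It is an illustrative example only, not a question anyone in this thread says Facebook asked, and the function name `min_coins` is made up.

```python
from typing import List, Optional


def min_coins(coins: List[int], amount: int) -> Optional[int]:
    """Fewest coins needed to make `amount`, or None if it cannot be made.

    best[t] holds the answer for the smaller subproblem "make total t";
    each entry is built from entries that were already solved.
    """
    unreachable = float("inf")
    best = [0] + [unreachable] * amount
    for total in range(1, amount + 1):
        for coin in coins:
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1
    return None if best[amount] == unreachable else int(best[amount])


# With coins 1, 3 and 4, always grabbing the largest coin first is not optimal
# for 6, which is why the table of subproblems is needed.
print(min_coins([1, 3, 4], 6))
print(min_coins([2], 3))
```

The "programming" in the name refers to filling in a table of subproblem answers, not to dynamic typing or anything specific to a language.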

naniwaduni · on Jan 31, 2020

Key takeaway:

> It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.

How times have changed!

objektif · on Jan 31, 2020

What's GP here? Are you referring to me?

fkfaduc · on Jan 31, 2020

I don't exactly understand why this is getting downvoted, I'm genuinely curious if companies have published anything on that.

Also for me, it _is_ counter-intuitive, since DP was one of the hardest things for me when I started with programming, and I'd at least expect a non-trivial correlation between being good at DP and job performance, that's why I was asking about where I can read more about this.

asdfman123 · on Jan 30, 2020

I didn't read it carefully and I don't know much PHP, but is the code really that bad? There's all kinds of worse code out there running everything.

As long as it's relatively well organized, you can worry about refactoring as you scale up.

nobleach · on Jan 30, 2020

I used to know PHP, and this code is very indicative of the imperative style that was popular during that era. I believe the PHP crowd has mostly gone deeply into OOP. With that said, Facebook mostly worked remarkably well. My bank, on the other hand has their web presence written in Java. And it works about 80% of the time. Sometimes one just has to try twice. So, "good code", "bad code" will always take a back seat to "working code". (Not that I'd want to maintain this beast)

asdfman123 · on Jan 30, 2020

I guess what's really at play is whether or not the code is sloppy because the developers are making a conscious decision to not refactor yet, or it's sloppy because they don't know what they're doing.

I've found lots of great developers write huge, sloppy, 300 line methods, if it gets the job done. I love clean code, but too much abstraction is a liability unless there's a good reason to introduce it. Part of expertise is knowing when it's okay to break the rules.

barbarbar · on Jan 30, 2020

So the bank would be better off with PHP?

nobleach · on Jan 31, 2020

Well.... the math may work more in my favor.

But, no, that's not at all what I'm saying. I'm saying, one language is extremely well thought out, and offers great guard-rails for doing OOP very well. And yet, the code is broken enough that sometimes I just have to give up on getting my account balance. Yet, Facebook from that era typically "just worked"... except when it didn't, and everyone would lose their minds. My point was, code that works, even if ugly, is what the customer cares about.

I'm still on the fence if I'd recommend anyone use PHP... Swoole performs very well. Laravel has a great community. It's not my cup of tea but, I try to keep an eye on it.

xs83 · on Jan 31, 2020

It is 12 years old - these were just before the first versions that had any kind of serious OO capability, however most apps were already built in the "Imperative" style that this is.

The hate for PHP on that gist is strong - admittedly back in the early noughties it wasn't fantastic but it was the shortest route to getting a functional website put up. PHP was simple and had a very low barrier to entry.

Latest versions are much better - The speed is one of its biggest draws, since version 7 it has taken the crown as fastest interpreted language, I believe version 8 will improve on this even more

jsjohnst · on Jan 31, 2020

> It is 12 years old - these were just before the first versions that had any kind of serious OO capability

PHP5 is the release where most of the OO improvements happened (but even PHP 4.3 had a bunch) and PHP 5.0 was released 3+ years before this code snippet leaked.

I wrote a heavily OO based version of Yahoo! Address Book using PHP 5.1 in 2005-2006, so it was definitely doable and fairly common to do so by that time.

mekster · on Jan 31, 2020

This code is a joke.

It uses global variables all over the place and includes other files, so variables can start stepping on each other. PHP had class support since 2004 (code seems to be from 2007) and they just write function names in global space with bad naming like "redirect" where you can't tell which part of the system it's from.

The whole code is written top to bottom without breaking them up into functions for clarity. You can't even tell which variable is for local use and which is meant to carry some state further down the lines.

The comments are a joke too: when you read words like "holy shit", "cool stuff", "FIXME?" and "retarded", you can guess the programmer isn't really a focused person, and the comments read like personal notes instead of trying to give clues to others.

Good code is code that you feel like maintaining at first sight, and this is apparently far from it.

palerdot · on Jan 31, 2020

Well ... this 'joke' code created a multi billion dollar business and sowed the seed for a platform that is shaping our world in ways unimaginable (in more harmful ways than good, but that is beside the point) ... you can't complain!

mekster · on Jan 31, 2020

I don't complain, it's the FB devs that probably complained and had to do a complete rewrite of this joke at some point.

plorkyeran · on Jan 30, 2020

"relatively well organized" is the problem. It's written in a fairly unstructured imperative style that's perfectly suitable for short scripts, but even these relatively small pieces of code are outgrowing it. The extensive use of global variables is an obvious thing that I would expect to become a problem soon. It's much less of a problem in php than in most other web frameworks due to the design of handling each HTTP request in a fresh process (so things like the use of $user in both index.php and search.php can't conflict), but it's still quite fragile.

By the standards of 2007 PHP this is good code, and today there's probably lots of worse PHP being written (even if the community as a whole has moved on quite a bit).

evunveot · on Jan 30, 2020

> By the standards of 2007 PHP this is good code

By the standards of early 2004 (when Facebook was originally written) this is still mediocre PHP. People who cared about writing good PHP were reading magazines like this (2003):

https://www.phparch.com/magazine/2003-2/june/

and having discussions like this (2004):

https://www.sitepoint.com/community/t/to-hopefully-clear-up-...

They knew global variables and hundreds of lines of top-level code were bad PHP, they just didn't know yet that trying to ape Java wasn't good PHP. If I remember correctly, DHH came out of this period: he got so sick of enterprisey PHP that he jumped ship to Ruby and wrote Rails. That was 2005.

Fair enough that Facebook hadn't done a major rewrite by 2007, but still: not good code.

chubot · on Jan 30, 2020

What do those things have to do with each other?

Code can't use algorithms because it's written in PHP? I'm not following.

shimylining · on Jan 30, 2020

Because most of the time you don't get hired at FB for a specific team, you might end up working on PHP sure, but the interview process for SWE is general and they test you on algorithms regardless.

chubot · on Jan 30, 2020

Your post seems to indicate that "working on PHP" and "algorithms" are disjoint. Why? If you're writing code with many users, then algorithmic knowledge helps, regardless of language.

Although I will say I think dynamic programming is not a good interview topic. I think it's basically proxy for "got a CS degree from certain schools". But algorithms in general are valid to ask about.

I would almost say you need to understand algorithms better while programming in PHP because the language obscures certain things. Ditto for programming in JS -- for example to encode certain algorithms in JS you will run into the fact that it only has floats, no ints.

People used to think you couldn't write algorithms in JS either. Now there are LALR(1) parser generators in JS, etc.
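A rough illustration of the float point, sketched in Python because its `float` is the same IEEE 754 double that a JS number is on essentially every platform. The helper name `increment_as_double` is made up for the example.

```python
# Every integer up to 2**53 is exactly representable as a double; above that,
# some integers are not.
LIMIT = 2 ** 53


def increment_as_double(n: int) -> int:
    """Add 1 the way a language with only doubles would, then convert back."""
    return int(float(n) + 1.0)


# Just below the limit the increment behaves as expected.
print(increment_as_double(LIMIT - 1) == LIMIT)

# At the limit the result falls between two representable doubles and rounds
# back down, so the counter silently stops advancing.
print(increment_as_double(LIMIT) == LIMIT)

# Arbitrary-precision integers keep the increment.
print(LIMIT + 1 == LIMIT)
```

Anything that depends on exact large integers, such as 64-bit IDs or hash arithmetic, needs extra care in a doubles-only setting.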

lhnz · on Jan 30, 2020

Good point.

Is there anybody from 2007 still working at Facebook desperately scared they'll be found out for not being good at algorithms or are these people all in management now? ;)

munk-a · on Jan 30, 2020

The way that technical learning works... if they aren't stubborn (and probably fired because of it) then they've likely adapted. Developers grow in knowledge over time as they hone their craft and every senior dev can easily call up some terrible crap they wrote when just getting started.

It's also likely that even at the time the devs working on this wanted to start refactoring more - it's one thing to recognize bad code, having the muscle within a company to allocate resources to fixing that code is a different matter.

smacktoward · on Jan 30, 2020

The trick in any business is to get it to a point where you can afford to hire people smarter than you are.

iudqnolq · on Jan 30, 2020

Can someone explain to a newbie why this code is so bad? Reading through it it seemed to generally make sense and not be too complicated.

munk-a · on Jan 30, 2020

The lack of `chroot`[1] makes me sad off the bat - for some reason that function seems like a secret, everyone actually wants to use it (or wanted to before autoloading became as easy as it is) but nobody did.

Additionally I'd love to see that file split up into smaller chunks simply to lower the scope of thought.

It looks like nearly all of those function calls are modifying variables passed by reference instead of resolving the value out via `return`. This isn't bad, and it's indistinguishable at a technical level in terms of functionality, but it's a kind of horrible approach in terms of expressiveness.

They're doing things with datetime that are unsafe and wrong (like assuming `24*60*60` is the number of seconds in a day) but people getting datetime logic wrong is as old as... well time.
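A minimal Python sketch of why a fixed seconds-per-day constant goes wrong around daylight saving changes. Which time zone the leaked code cared about is unknown, so US Pacific time is an assumption, and the offsets are hard-coded rather than read from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for a real time zone database so the sketch has no
# dependencies: US Pacific time moved from UTC-8 to UTC-7 on 2007-03-11.
PST = timezone(timedelta(hours=-8))
PDT = timezone(timedelta(hours=-7))

midnight = datetime(2007, 3, 11, 0, 0, tzinfo=PST)
next_midnight = datetime(2007, 3, 12, 0, 0, tzinfo=PDT)

# Real length of that calendar day, in seconds.
day_length = (next_midnight - midnight).total_seconds()

# What "add 24*60*60 seconds" produces instead, shown on the local clock.
naive_next = (midnight + timedelta(seconds=24 * 60 * 60)).astimezone(PDT)

print(day_length)
print(naive_next.isoformat())
```

On that date the local day is an hour short, so adding a fixed 86400 seconds to midnight overshoots the next local midnight.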

Oh, and you've got some pretty bizarre looking function signatures - I'm sure there is a reason for this but I'd want to ask some questions about this one...

    $permissions = privacy_get_reduced_network_permissions($user, $user);

It's possible to go through this and nitpick a bunch of stuff, it looks like it's mostly just an older style though. The big problems aren't here though... I'm not seeing any reads into $_POST (and `param_get_slashed` looks like a nice function for sanitizing input) - additionally, I'm not seeing a single line of SQL nor am I seeing any memcached calls, so the data access layer may already be well isolated architecturally.

1. https://www.php.net/manual/en/function.chroot.php

duskwuff · on Jan 30, 2020

`chroot()` has no place in a web application. The system call requires the process to be running as root.

voltagex_ · on Jan 31, 2020

Can you call that and then drop permissions?

duskwuff · on Jan 31, 2020

In theory, yes. But that's still bad, because it means that a nontrivial amount of your application code (as well as whatever is launching it, like the PHP-FPM server or the web server) is running as root.

zelly · on Jan 30, 2020

It's decent as far as 2000s era PHP goes. If you want to poke your eyes out, look at the source of some of the PHP web forum software out there.

The leaked code ignores pretty much all design patterns and software architecture norms. But not having to deal with that, and being able to cobble something together in a hurry, was the appeal of PHP. You can always rewrite later. Nowadays Facebook's landing page is served by highly tuned binaries compiled from C++. If they decided in 2007 to start out with that, we probably would not know who Zuckerberg is.

iudqnolq · on Jan 30, 2020

It seems to me (with absolutely no experience) that architecture and design patterns are more trouble than they're worth for code in the low thousands of LoC, which this appears to be.

ceejayoz · on Jan 30, 2020

It's not really bad for the time it was written, but today you wouldn't want to write PHP code like this.

Lots of global variables that can clobber each other (if one bit of code, even in one of the includes, redefines $user, everything explodes), no classes, no code autoloader...
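A small Python sketch of that failure mode. Python needs an explicit `global` statement where top-level code in PHP includes shares one scope, so this is an analogy rather than a translation, and all of the names are hypothetical.

```python
user = 1234  # module-level global, read by everything below


def render_sidebar() -> None:
    # Hypothetical helper that reuses the name as a scratch variable.
    global user
    user = 0


def page_title() -> str:
    # Depends on whatever the global happens to hold right now.
    return f"Profile of user {user}"


def page_title_explicit(user_id: int) -> str:
    # Same output, but the value arrives as an argument nobody else can rewrite.
    return f"Profile of user {user_id}"


render_sidebar()
print(page_title())
print(page_title_explicit(1234))
```

After `render_sidebar()` runs, the global version reports the wrong user while the explicit version is unaffected.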

cosmotic · on Jan 31, 2020

How are classes a sign of superior code? How is an autoloader a sign of superior code? Keep in mind, this is old PHP and those things were new or nonexistent.

jszymborski · on Jan 30, 2020

Some interesting comments:

> // Holy shit, is this the cleanest fucking frontend file you've ever seen?!

https://gist.github.com/nikcub/3833406#file-search-php-L72

wonderment · on Jan 30, 2020

I always liked Nik's comments and was wondering why he had stopped posting here.

https://news.ycombinator.com/threads?id=nikcub

https://www.zdnet.com/article/security-consultant-granted-ba...

https://www.zdnet.com/article/goget-hacker-sentenced-to-400-...

Supermancho · on Jan 30, 2020

I always find this kind of stuff interesting. Albert Gonzales broke into a bunch of my work's servers (for years) at my first job, after a customer of ours pissed him off. I had some AIM conversations and lurked/logged in one of his advertised IRC hangouts. Most of the transcripts went to the secret service, which was very interested in his Credit Card fraud activities (as advertised on IRC).

https://usa.kaspersky.com/resource-center/threats/top-ten-gr...

rahuldottech · on Jan 30, 2020

2013 HN discussion: https://news.ycombinator.com/item?id=6538270

tmpz22 · on Jan 30, 2020

From index.php, an if statement for one particular user

    // Merman's Admin profile always links to the Merman's home
    if (user_has_obj_attached($user)) {
          redirect('mhome.php', 'www');
    }

epriest · on Jan 30, 2020

"Merman" was an internal project codename, not an individual user. I think it was a very early version of the feature that eventually became "Pages".

jabyess · on Jan 30, 2020

From the article linked in the gist:

> This leak is not good news for Facebook, as it raises the question of how secure a Facebook users private data really is.

I don't even think I need to comment on how poorly this has aged.

trustfundbaby · on Jan 30, 2020

Could you elaborate? I thought it was an amazing foreboding of exactly how poor Facebook turned out to be with handling user privacy

jabyess · on Jan 30, 2020

That's basically what I meant, sorry. It's an insight into what could have been a problem back then, and clearly became a serious problem.

joeblau · on Jan 30, 2020

> Worth preserving as part of Internet history.

Can't Facebook just issue a takedown request and have these files removed?

ceejayoz · on Jan 30, 2020

It wouldn't be worth the cost of the letter.

This code's as valuable as a Blockbuster card by now.

veeralpatel979 · on Jan 30, 2020

How would it help them?

I _highly_ doubt they are running this code in production today.

riffraff · on Jan 30, 2020

what for? It's not like it would be particularly dangerous to see code which is 13 years old.

munk-a · on Jan 30, 2020

Pretty sure all their code is intensely different now - they did write the HHVM engine in 2011 and I wouldn't be surprised if they ported as much logic as possible to that and added strict typing over it all.

ocdtrekkie · on Jan 30, 2020

It’s worth noting that Windows is occasionally afflicted by newly discovered vulnerabilities that are over twenty years old.

Not sure how applicable that would be to Facebook’s codebase over this much time. But worth noting.

giarc · on Jan 30, 2020

Just look at Swift and all the Cocoa classes. Many are prefixed with NS, which comes from the NeXTSTEP days.

fkfaduc · on Jan 30, 2020

I'm not even sure they could do that. What would be the difference between that and some megacorp issuing a takedown-request for an internal document leaked by the press?

hyggemonster · on Jan 30, 2020

Reminds me that they had the poke feature.

ifaxmycodetok8s · on Jan 30, 2020

they still have the poke feature

tdevito · on Jan 30, 2020

I know it's just two of the files, but taking into account the rest of the pages for the FB app in 2007, it does not seem like a lot of code. One or two people could have written and maintained a project that size. What were all the new hires doing from 2005-2007?

ahupp · on Jan 30, 2020

The company wasn't very large at that time. Probably less than 100 engineers in 2007. One person could understand the bulk of the codebase in a reasonable amount of time.

There was way more code than you're seeing here though, note all the includes at the top.

zxcvbn4038 · on Jan 31, 2020

What always strikes me is how sloppy and poorly written commercial code is when it is leaked - esp compared to open source projects. Looks like someone’s CS200 project a lot of the time.

Lavomk · on Feb 1, 2020

Looking at this doesn't make me miss PHP, but somehow it's like going back in time.

sideproject · on Jan 31, 2020

Is this Mark Z's code? Wonder why he didn't use a framework and went with pure PHP.

lubujackson · on Jan 31, 2020

I would assume the answer was speed.

I was also writing some spaghetti PHP code around 2007 and remember making choices that had the benefit of reducing server stress while making my life harder. No framework was a big one.

Remember, Friendster died due to crashing and Facebook, for all its spaghetti code at the time, was remarkably solid and rarely crashed. It is easy to forget how hard the sysadmin side was for growing startups before we could spin up infinite cloud servers, and when VC wasn't just an open spigot for anything with growth.

Remember the fail whale on Twitter?

sandes · on Jan 30, 2020

I think NipAlert of big head had better code.