Facebook PHP Source Code from August 2007 (gist.github.com)
229 points by patrickdevivo on Jan 30, 2020 | 96 comments



It wasn't actually stolen as it says in the README. It was a misconfigured Apache server which leaked raw, unprocessed PHP code. I received the code to home.php and profile.php but I didn't save it at the time (I was very very new to learning PHP and didn't realize the significance of what I was looking at).

Still really cool to see.


Same happened to me.

I knew the significance the moment I saw the `<?php` but force of habit, I had already pressed shift + F5 by the time my brain registered. Somehow never got the code again, maybe some sort of load balancer was leaking something that was cached but just barely.


I want to be clear: I don't care, and I doubt Facebook cares.

But legally, I think this code was stolen. Facebook owns the copyright to the source code, so copying and distributing is theft in the same way that copying and distributing database contents is theft.

But again:


Your definition of theft is wrong. Legally, this was not stolen.


Copyright infringement and theft are two completely different things, legally.

I know the film and music industries in particular have put a lot of effort into equating those terms in the media, yet they remain two separate things.


Are you a lawyer?


You don't have to be a lawyer to understand that subtraction and multiplication are two completely different operations.


Are you sure?

https://legalbeagle.com/8608294-difference-between-larceny-t...

> In many states, "theft" is an umbrella term that includes all different kinds of criminal taking. This is the case in New York. Under the New York Codes, theft can be any type of taking, like identity theft, theft of intellectual property, theft of services and theft of personal property


If you're a mathematician and not a lawyer you might think those are different operations. But lawyers, judges, and juries have a unique capacity to argue that you're guilty of subtraction even if you only multiplied.

To a lawyer, bits have color: https://ansuz.sooke.bc.ca/entry/23.


You can totally find a mathematician to convince you that those are the same operation! Probably easier than the lawyer, even.


are you?


Does it matter? How is a random person on the internet sufficiently qualified to define theft in a complex domain like digital copyright and intellectual property? You don't have to be a lawyer to be skeptical of what someone says online.


My point is if they replied “yes” where would you be? It’s not inappropriate to be skeptical but asking a person for credentials online is next to useless.


Even if they were a lawyer, lawyers are frequently wrong in their interpretation of law, which is why they argue the relevant points in court, and a judge gets to decide which of them is right.


Copyright infringement =/= Theft.


One way to look at this is to say "Hah, how shameful."

The other way to look at this is to say "Hah, clearly business success isn't a function of code quality"


> Hah, clearly business success isn't a function of code quality

Mind you, early Facebook wasn’t exactly a “tech business”—the code wasn’t making them any money, such that having a bug in the code would make them less money.

Really, Facebook only became a “tech business” once they got into 1. Messaging, and 2. Advertising. Then they had SLAs, and breaking those SLAs meant losing users/customers; and their ability to deliver on those SLAs became directly related to the quality of their code.


As a counterpoint, code choices and perceived site slowness contributed to Friendster's and MySpace's decline. Both had messaging and advertising.


>"Hah, clearly business success isn't a function of code quality"

They aren't nearly as strongly correlated as many developers might like to believe.


I work as a programmer in the games industry and I’ve noticed that frequently as well. Commercially successful games aren’t always a strong indicator of code quality and often it tends to be that the bad code “luckily didn’t matter” in cases of success.


I prefer to think of it as "perfectly coded" projects have 0 commercial value until someone figures out a customer for it.


Business success isn't a function of technical/artistic quality in any field. There's a loose correlation sure, but there are hundreds or thousands of examples where poorly written/made books, TV shows, music, films, games etc also did really well commercially as well as the opposite. Many would say something like Firefly is better than something like Twilight, but one made millions of dollars and the other got cancelled in two seasons.

Knowing what to make and who to market it to will usually matter a lot more than how you build it.


This code is from 2007, honestly it's not that bad for 2007 standards. I've seen worse code that was written with Node last year


> is line 89 of search.php valid?? "$user 0 && ..." ? aren't we missing a comparison operator?

Good job. And perhaps that's the culprit. Everybody assumed it was a plain syntax error, but I don't think that's possible. Rather, it seems to me that:

    if ($user <something else ... ) {
        was written here
        but we'll never know because
        the browser mistook it for an HTML tag,
        and the user probably copied page contents
        instead of saving it.
    };

    if (user_was_working_at_facebook_in_2007())
        possibly_confirm('?');

    /* lots of missing code may follow... until */

    if (who knows what > 0 && ...
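
The theory above can be sketched in a few lines. This is only an illustration, assuming that a renderer hides anything starting with `<` plus a letter and running up to the next `>`; the PHP line, `MAX_USER_ID`, and `$count` are hypothetical stand-ins, since the real condition is unknown.

```python
import re


def as_rendered_text(source: str) -> str:
    """Approximate what survives when raw code is shown as HTML and copied.

    Assumption: anything from '<' plus a letter up to the next '>' is
    treated as an (unknown) tag and hidden, and runs of spaces collapse.
    """
    without_tags = re.sub(r"<[A-Za-z][^>]*>", "", source)
    return re.sub(r" {2,}", " ", without_tags)


# Hypothetical original line: note there is no space after the '<'.
original = "if ($user <MAX_USER_ID && $count > 0 && is_unregistered($user)) {"

print(as_rendered_text(original))
# if ($user 0 && is_unregistered($user)) {

# '<=' is not followed by a letter, so this line survives unchanged.
print(as_rendered_text("if ($a <= 1 && $b > 0) {"))
# if ($a <= 1 && $b > 0) {
```

Under these assumptions a `<` followed by `=` or a space is left alone, so correctly rendered comparisons elsewhere in the files do not rule the theory out.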


Maybe. I think that's less likely than syntax error, for two reasons.

#1, if you look in index.php, there's a < on line 228 and a > on line 258, correctly rendered. Granted, the < is part of a <=, which weakens that argument.

#2, if you look at the surrounding code, $user > 0 makes logical sense given what the code is doing (and assuming 0 represents an invalid/nonexistent userid, which I believe it does given that facebook userids increase monotonically).


Except that a syntax error isn't likely at all - this was production code.

BTW just running "php -l" would have nailed this error - PHP refuses to run a script that cannot pass the tokenization step, resulting in a blank page (and the error being logged), so such a macroscopic error would probably be short-lived even in their test environment.


not sure what the reason was, but it's simply a missing >. other files were leaked, and you can see the same pattern elsewhere:

  if ($user > 0 && is_unregistered($user))

searching the web for this comment might be helpful if you'd like more source code:

  $user can be < 0 in AIM mode, just use 0 there


This might be because the code was pasted into some Markdown editor at some point. That often strips the < and > characters.


So there's an unknown amount of Spaghetti Code still missing from this mess?


Amazing that they initially wrote this code and now to join FB you need to answer questions based on backtracking and dynamic programming. :) I wonder if they could do the questions themselves back then.


Dynamic programming questions are explicitly not used in current Facebook interviews.

From time to time, Facebook and other companies study the effectiveness of their hiring process by comparing employee performance and interview performance. I believe dynamic programming questions were removed because there was not a strong link between success in this question and future performance.


That's super cool, do you know if any of those results / methodology were ever publicly published? I've been trying to find prior art in interview analytics and data-driven hiring in general to try to improve things at my current workplace.


There was a point in time (roughly the 2015-2016 timeframe) when dynamic programming interview questions were really in vogue. Every interview loop from smallish startup to leviathan corporation felt like it had at least one.


Sadly we are still doing them at my company. I feel like we lose good candidates because of it.


Last loop I had there were 3 explicitly DP problems, and this was in 2017/2018.


This sounds very counter-intuitive! Did you hear this from someone working at Facebook or did you read it online? If it's the latter it'd be great if you could share a link!


I don't know of any online links to point to but I am an active interviewer at FB and I know we don't ask DP questions


Why is it counter-intuitive that people who interview well may still not function well in a specific organization?


What is counterintuitive about this? That best dynamic programmers are the absolute best programmers? I mean come on man..


Seems like pretty clearly GP isn't familiar with the definition of dynamic programming in this context: https://en.wikipedia.org/wiki/Dynamic_programming


Key takeaway:

> It also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.

How times have changed!


What's GP here? Are you referring to me?


I don't exactly understand why this is getting downvoted, I'm genuinely curious if companies have published anything on that.

Also for me, it _is_ counter-intuitive, since DP was one of the hardest things for me when I started with programming, and I'd at least expect a non-trivial correlation between being good at DP and job performance, that's why I was asking about where I can read more about this.


I didn't read it carefully and I don't know much PHP, but is the code really that bad? There's all kinds of worse code out there running everything.

As long as it's relatively well organized, you can worry about refactoring as you scale up.


I used to know PHP, and this code is very indicative of the imperative style that was popular during that era. I believe the PHP crowd has mostly gone deeply into OOP. With that said, Facebook mostly worked remarkably well. My bank, on the other hand has their web presence written in Java. And it works about 80% of the time. Sometimes one just has to try twice. So, "good code", "bad code" will always take a back seat to "working code". (Not that I'd want to maintain this beast)


I guess what's really at play is whether or not the code is sloppy because the developers are making a conscious decision to not refactor yet, or it's sloppy because they don't know what they're doing.

I've found lots of great developers write huge, sloppy, 300 line methods, if it gets the job done. I love clean code, but too much abstraction is a liability unless there's a good reason to introduce it. Part of expertise is knowing when it's okay to break the rules.


So the bank would be better off with PHP?


Well.... the math may work more in my favor.

But, no, that's not at all what I'm saying. I'm saying, one language is extremely well thought out, and offers great guard-rails for doing OOP very well. And yet, the code is broken enough that sometimes I just have to give up on getting my account balance. Yet, Facebook from that era typically "just worked"... except when it didn't, and everyone would lose their minds. My point was, code that works, even if ugly, is what the customer cares about.

I'm still on the fence if I'd recommend anyone use PHP... Swoole performs very well. Laravel has a great community. It's not my cup of tea but, I try to keep an eye on it.


It is 12 years old - these were just before the first versions that had any kind of serious OO capability, however most apps were already built in the "Imperative" style that this is.

The hate for PHP on that gist is strong - admittedly back in the early noughties it wasn't fantastic but it was the shortest route to getting a functional website put up. PHP was simple and had a very low barrier to entry.

Latest versions are much better - The speed is one of its biggest draws, since version 7 it has taken the crown as fastest interpreted language, I believe version 8 will improve on this even more


> It is 12 years old - these were just before the first versions that had any kind of serious OO capability

PHP5 is the release where most of the OO improvements happened (but even PHP 4.3 had a bunch) and PHP 5.0 was released 3+ years before this code snippet leaked.

I wrote a heavily OO based version of Yahoo! Address Book using PHP 5.1 in 2005-2006, so it was definitely doable and fairly common to do so by that time.


This code is a joke.

It uses global variables all over the place and includes other files, so variables can start stepping on each other. PHP has had class support since 2004 (the code seems to be from 2007), yet they just write functions in the global namespace with poor names like "redirect", where you can't tell which part of the system they come from.

The whole code is written top to bottom without being broken up into functions for clarity. You can't even tell which variable is for local use and which is meant to carry some state further down the lines.

The comments are a joke too. When you read words like "holy shit", "cool stuff", "FIXME?" and "retarded", you can guess the programmer isn't really a focused person, and the comments read like personal notes instead of trying to give clues to others.

Good code is code that you feel like maintaining at first sight, and this is apparently far from it.


Well ... this 'joke' code created a multi billion dollar business and sowed the seed for a platform that is shaping our world in ways unimaginable (in more harmful ways than good, but that is beside the point) ... you can't complain!


I don't complain, it's the FB devs that probably complained and had to do a complete rewrite of this joke at some point.


"relatively well organized" is the problem. It's written in a fairly unstructured imperative style that's perfectly suitable for short scripts, but even these relatively small pieces of code are outgrowing it. The extensive use of global variables is an obvious thing that I would expect to become a problem soon. It's much less of a problem in php than in most other web frameworks due to the design of handling each HTTP request in a fresh process (so things like the use of $user in both index.php and search.php can't conflict), but it's still quite fragile.

By the standards of 2007 PHP this is good code, and today there's probably lots of worse PHP being written (even if the community as a whole has moved on quite a bit).


> By the standards of 2007 PHP this is good code

By the standards of early 2004 (when Facebook was originally written) this is still mediocre PHP. People who cared about writing good PHP were reading magazines like this (2003):

https://www.phparch.com/magazine/2003-2/june/

and having discussions like this (2004):

https://www.sitepoint.com/community/t/to-hopefully-clear-up-...

They knew global variables and hundreds of lines of top-level code were bad PHP, they just didn't know yet that trying to ape Java wasn't good PHP. If I remember correctly, DHH came out of this period: he got so sick of enterprisey PHP that he jumped ship to Ruby and wrote Rails. That was 2005.

Fair enough that Facebook hadn't done a major rewrite by 2007, but still: not good code.


What do those things have to do with each other?

Code can't use algorithms because it's written in PHP? I'm not following.


Because most of the time you don't get hired at FB for a specific team, you might end up working on PHP sure, but the interview process for SWE is general and they test you on algorithms regardless.


Your post seems to indicate that "working on PHP" and "algorithms" are disjoint. Why? If you're writing code with many users, then algorithmic knowledge helps, regardless of language.

Although I will say I think dynamic programming is not a good interview topic. I think it's basically proxy for "got a CS degree from certain schools". But algorithms in general are valid to ask about.

I would almost say you need to understand algorithms better while programming in PHP because the language obscures certain things. Ditto for programming in JS -- for example, to encode certain algorithms in JS you will run into the fact that it only has floats, no ints.

People used to think you couldn't write algorithms in JS either. Now there are LALR(1) parser generators in JS, etc.
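
The "only floats" point can be made concrete with a small sketch. It uses Python's `float`, which is an IEEE 754 double, the same representation JavaScript uses for its Number type; the Python `int` lines are there only as an exact reference.

```python
# Largest power of two up to which a double can represent
# every integer exactly.
LIMIT = 2 ** 53  # 9007199254740992

as_double = float(LIMIT)

# With doubles, adding 1 at this magnitude is silently lost.
print(as_double + 1 == as_double)  # True

# With arbitrary-precision integers the result is exact.
print(LIMIT + 1 == LIMIT)  # False

# Below the limit, doubles still behave like exact integers.
print(float(LIMIT - 1) + 1 == as_double)  # True
```

Any algorithm that relies on exact integer arithmetic above that limit (hashes, counters, 64-bit IDs) needs extra care when numbers are doubles.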


Good point.

Is there anybody from 2007 still working at Facebook desperately scared they'll be found out for not being good at algorithms or are these people all in management now? ;)


The way that technical learning works... if they aren't stubborn (and probably fired because of it) then they've likely adapted. Developers grow in knowledge over time as they hone their craft and every senior dev can easily call up some terrible crap they wrote when just getting started.

It's also likely that even at the time the devs working on this wanted to start refactoring more - it's one thing to recognize bad code, having the muscle within a company to allocate resources to fixing that code is a different matter.


The trick in any business is to get it to a point where you can afford to hire people smarter than you are.


Can someone explain to a newbie why this code is so bad? Reading through it it seemed to generally make sense and not be too complicated.


The lack of `chroot`[1] makes me sad off the bat - for some reason that function seems like a secret, everyone actually wants to use it (or wanted to before autoloading became as easy as it is) but nobody did.

Additionally I'd love to see that file split up into smaller chunks simply to lower the scope of thought.

It looks like nearly all of those function calls are modifying variables passed by reference instead of handing the value back via `return`. This isn't bad, and is indistinguishable at a technical level in terms of functionality, but it's a kind of horrible approach in terms of expressiveness.

They're doing things with datetime that are unsafe and wrong (like assuming 24*60*60 is the number of seconds in a day) but people getting datetime logic wrong is as old as... well time.

Oh, and you've got some pretty bizarre looking function signatures - I'm sure there is a reason for this but I'd want to ask some questions about this one...

    $permissions = privacy_get_reduced_network_permissions($user, $user);

It's possible to go through this and nitpick a bunch of stuff, it looks like it's mostly just an older style though. The big problems aren't here though... I'm not seeing any reads into $_POST (and `param_get_slashed` looks like a nice function for sanitizing input) - additionally, I'm not seeing a single line of SQL nor am I seeing any memcached calls, so the data access layer may already be well isolated architecturally.

1. https://www.php.net/manual/en/function.chroot.php
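
To illustrate why treating 24*60*60 seconds as "one day" is unsafe, here is a minimal sketch. It assumes US Pacific time and uses fixed UTC offsets (`PST`, `PDT`) in place of a real time zone database; none of these names come from the leaked code.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # 86400, the constant in question

# Fixed offsets standing in for a time zone database (assumption).
PST = timezone(timedelta(hours=-8))
PDT = timezone(timedelta(hours=-7))

# US Pacific clocks moved forward on 2007-03-11, so that local
# calendar day was only 23 hours long.
midnight_before = datetime(2007, 3, 11, 0, 0, tzinfo=PST)
midnight_after = datetime(2007, 3, 12, 0, 0, tzinfo=PDT)

actual_seconds = int((midnight_after - midnight_before).total_seconds())
print(actual_seconds)  # 82800

# Adding the constant to local midnight overshoots the next midnight.
naive_next = midnight_before + timedelta(seconds=SECONDS_PER_DAY)
print(naive_next.astimezone(PDT))  # 2007-03-12 01:00:00-07:00
```

Code that steps through local days by adding 86,400 seconds drifts by an hour across such a transition, which is why calendar-aware date arithmetic is usually preferred.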


`chroot()` has no place in a web application. The system call requires the process to be running as root.


Can you call that and then drop permissions?


In theory, yes. But that's still bad, because it means that a nontrivial amount of your application code (as well as whatever is launching it, like the PHP-FPM server or the web server) is running as root.


It's decent as far as 2000s era PHP goes. If you want to poke your eyes out, look at the source of some of the PHP web forum software out there.

The leaked code ignores pretty much all design patterns and software architecture norms. But not having to deal with that, and being able to cobble something together in a hurry, was the appeal of PHP. You can always rewrite later. Nowadays Facebook's landing page is served by highly tuned binaries compiled from C++. If they decided in 2007 to start out with that, we probably would not know who Zuckerberg is.


It seems to me (with absolutely no experience) that architecture and design patterns are more trouble than they're worth for code in the low thousands of LoC, which this appears to be.


It's not really bad for the time it was written, but today you wouldn't want to write PHP code like this.

Lots of global variables that can clobber each other (if one bit of code, even in one of the includes, redefines $user, everything explodes), no classes, no code autoloader...


How are classes a sign of superior code? How is an autoloader a sign of superior code? Keep in mind, this is old PHP and those things were new or nonexistent.


Some interesting comments:

> // Holy shit, is this the cleanest fucking frontend file you've ever seen?!

https://gist.github.com/nikcub/3833406#file-search-php-L72



I always find this kind of stuff interesting. Albert Gonzalez broke into a bunch of my work's servers (for years) at my first job, after a customer of ours pissed him off. I had some AIM conversations and lurked/logged in one of his advertised IRC hangouts. Most of the transcripts went to the Secret Service, which was very interested in his credit card fraud activities (as advertised on IRC).

https://usa.kaspersky.com/resource-center/threats/top-ten-gr...



From index.php, an if statement for one particular user

    // Merman's Admin profile always links to the Merman's home
    if (user_has_obj_attached($user)) {
          redirect('mhome.php', 'www');
    }


"Merman" was an internal project codename, not an individual user. I think it was a very early version of the feature that eventually became "Pages".


From the article linked in the gist:

> This leak is not good news for Facebook, as it raises the question of how secure a Facebook users private data really is.

I don't even think I need to comment on how poorly this has aged.


Could you elaborate? I thought it was an amazing foreboding of exactly how poor Facebook turned out to be with handling user privacy


That's basically what I meant, sorry. It's an insight into what could have been a problem back then, and clearly became a serious problem.


> Worth preserving as part of Internet history.

Can't Facebook just issue a takedown request and have these files removed?


It wouldn't be worth the cost of the letter.

This code's as valuable as a Blockbuster card by now.


How would it help them?

I _highly_ doubt they are running this code in production today.


What for? It's not like it would be particularly dangerous to see code which is 13 years old.


Pretty sure all their code is intensely different now - they did write the HHVM engine in 2011 and I wouldn't be surprised if they ported as much logic as possible to that and added strict typing over it all.


It’s worth noting that Windows is occasionally afflicted by newly discovered vulnerabilities that are over twenty years old.

Not sure how applicable that would be to Facebook’s codebase over this much time. But worth noting.


Just look at Swift and all the Cocoa classes. Many are prefixed with NS, which comes from the NeXTSTEP days.


I'm not even sure they could do that. What would be the difference between that and some megacorp issuing a takedown-request for an internal document leaked by the press?


Reminds me that they had the poke feature.


they still have the poke feature


I know it's just two of the files, but taking into account the rest of the pages for the FB app in 2007, it does not seem like a lot of code. One or two people could have written and maintained a project that size. What were all the new hires doing from 2005-2007?


The company wasn't very large at that time. Probably fewer than 100 engineers in 2007. One person could understand the bulk of the codebase in a reasonable amount of time.

There was way more code than you're seeing here though, note all the includes at the top.


What always strikes me is how sloppy and poorly written commercial code is when it is leaked - esp compared to open source projects. Looks like someone’s CS200 project a lot of the time.


Looking at this doesn't make me miss PHP, but somehow it's like going back in time.


Is this Mark Z's code? Wonder why he didn't use a framework and went with pure PHP.


I would assume the answer was speed.

I was also writing some spaghetti PHP code around 2007 and remember making choices that had the benefit of reducing server stress while making my life harder. No framework was a big one.

Remember, Friendster died due to crashing and Facebook, for all its spaghetti code at the time, was remarkably solid and rarely crashed. It is easy to forget how hard the sysadmin side was for growing startups before we could spin up infinite cloud servers and VC wasn't just an open spigot for anything with growth.

Remember the fail whale on Twitter?


I think Big Head's NipAlert had better code.



