I am baffled as to why you'd build your castle atop a crumbling foundation.
I have wondered why FB didn't use a proper language with proper typing to begin with. I mean, I "understand" logistically: they already had a giant codebase in PHP, migrating a codebase is expensive, and it's difficult to hire and train 1000s of hackers in e.g., OCaml. (They do have some OCaml people, but they are outliers. OCaml was my favorite thing to write there, though it didn't afford some of the same niceties and interactivity as the PHP code they had, only because the support was down by several orders of magnitude.)
But at the same time, layering FP with a home rolled static type checking server (??) is bug prone and is certainly yak shaving (which they have time and money to do). Now they've written (1) a compiler to C++, (2) a compiler to VM byte code, (3) a corresponding runtime for each, (4) extensions to PHP, (5) a type checker, and (6) an inference engine. That's a lot of stuff. And in the end, it's still PHP, which is duly disliked. (Though Facebookers don't seem to care. The prevalent attitude toward it is that "PHP, as it's coded here, is mostly like C++, and that's OK.")
Writing correct type checkers and inference engines is kind of difficult. They seemed to take the approach of just building onto it incrementally until it just seems to work. That approach led to many bugs in many cases that just simply aren't thought of when one is trying to build inference engines by hand, as opposed to according to theory. Type checking and inference is an area rife with theory and attached formal, mathematical semantics. Standard ML's standard is perhaps the most infamous; it's a collection of mathematical statements about the language. That way, the compiler is now almost an engine to prove your code is correct. I don't see how the same guarantee can be made with something that is just cobbled together.
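To make the "according to theory" point concrete, here is a minimal sketch of first-order unification, the core step in Hindley-Milner style inference. It is not Facebook's checker and not the Standard ML definition. The type representation (strings starting with `'` for type variables, tuples for constructed types) and the names `unify`, `resolve`, `occurs`, and `fn` are assumptions made purely for illustration.

```python
# Assumed representation (illustrative only):
#   type variable    -> str starting with "'", e.g. "'a"
#   constructed type -> tuple (name, arg1, ...), e.g. ("int",) or ("->", "'a", ("int",))

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    """True if `var` appears inside `t`; binding it would create an infinite type."""
    t = resolve(t, subst)
    if is_var(t):
        return t == var
    return any(occurs(var, arg, subst) for arg in t[1:])

def unify(a, b, subst=None):
    """Return a substitution that makes `a` and `b` equal, or raise TypeError."""
    subst = dict(subst or {})
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} occurs in {b}")
        subst[a] = b
        return subst
    if is_var(b):
        return unify(b, a, subst)
    if a[0] != b[0] or len(a) != len(b):
        raise TypeError(f"cannot unify {a} with {b}")
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
    return subst

INT = ("int",)
BOOL = ("bool",)

def fn(arg, result):
    """Function type arg -> result."""
    return ("->", arg, result)
```

Unifying `fn("'a", "'b")` with `fn(INT, "'a")` binds both variables to `int`, while `unify(INT, BOOL)` and `unify("'a", fn("'a", INT))` raise `TypeError`. The occurs check is exactly the kind of corner case that tends to get missed when a checker is grown incrementally rather than derived from the rules.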
> I am baffled as to why you'd build your castle atop a crumbling foundation.
Because perfect is not the enemy of the good? Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing? Unless you have some example of perfect, now dominant, technologies that have been created ex nihilo that I'm missing? I mean we (Facebook) are still using PHP and MySQL, improving both. And when we need to break things out, we head into C++, the queen mother of castles on broken foundations.
> But at the same time, layering FP with a home rolled static type checking server (??) is bug prone and is certainly yak shaving (which they have time and money to do). Now they've written (1) a compiler to C++, (2) a compiler to VM byte code, (3) a corresponding runtime for each, (4) extensions to PHP, (5) a type checker, and (6) an inference engine. That's a lot of stuff.
All languages, runtimes, and standard libraries (and databases, and source control, and on and on) are "broken" at sufficient scale. You're going to be spending time rebuilding things other people take for granted no matter who you are and what language and technology you are working in. The underlying assumption that a "proper language" gives you these things for free is completely false.
> Writing correct type checkers and inference engines is kind of difficult.
Just so we are clear, you ask why Facebook didn't rewrite 10,000 human-years of code into a mythical unnamed "proper" language, but you consider writing a type checker to be "difficult". I think you might have vastly inaccurate pictures of what is and isn't "difficult".
> That way, the compiler is now almost an engine to prove your code is correct. I don't see how the same guarantee can be made with something that is just cobbled together.
Computing history is littered with dead projects from people who believed that anything less than perfect is unworkable or non-valuable.
> Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing?
I can't imagine where that sort of conclusion comes from. Building on a crumbling foundation seems to be just about the most proven, reliable way to ensure your software project won't survive more than a short time without needing serious effort just to maintain it and/or a big rewrite.
Much of the world still runs on C, a language that was created long before many of us here were born. Other languages have risen, peaked, and fallen into relative obscurity since then, yet C endures, because for all its faults, it is simple and predictable, the epitome of a sound foundation.
Large amounts of COBOL are still driving back office systems in large organisations. The cost of hiring people with the skills to maintain them is probably horrific, but those systems are still there, doing their jobs decades later.
You can still run applications from nearly 20 years ago on Windows today, in no small part because of Microsoft's persistent focus on compatibility and keeping the basics reliable over that time. Similar stories can be told in the *nix world.
What major accomplishments in computing that have been built atop crumbling foundations can claim anything even close to the scale of success of these examples? You surely can't be talking about the move fast and break stuff philosophy that seems to drive everything at trendy software shops like Facebook and Google, or the kind of MVP/lean start-up hype we hear about ad nauseam on HN and the lasts-just-long-enough-to-exit web apps that result.
PHP is a "crumbling foundation" in exactly the same way DOS, Windows, COBOL, and C and C++ were, are, "crumbling foundations". I don't see how you are disagreeing with me. You are proving the point.
You aren't actually going to claim that Windows was well-conceived and theoretically well-founded, are you? Windows has been a never ending refinement on exceptionally shaky ground.
It seems like you're now arguing that PHP isn't a crumbling foundation -- a point that reasonable people could debate -- rather than that building atop crumbling foundations is by far the most successful way to accomplish anything in computing, which was the claim I challenged.
Incidentally, my point about languages like C or operating systems like Windows was not that they are theoretically wonderful under the hood, merely that they provided a reliable foundation. C has been standardised for a long time and is widely portable. Code that was designed for early incarnations of Windows will often run with little modification even on today's systems because the essential underlying models and APIs have been diligently preserved over the years even as many other changes were going on around them.
I'm curious just what definition you're using for "crumbling foundation". Is it that old software doesn't break on it? In that case PHP/HHVM/Hack isn't a crumbling foundation either - Facebook is built on it, and Facebook is clearly still running.
Or is it that maintenance is difficult and programmers will run into all sorts of ugly corner cases and features that are just grafted onto each other? Because those apply to C and C++ and Win32 and Google and basically every other large software system as well. That's the point the grandparent is making - if you look at pretty much any successful, evolving software system under the hood, you'll see a byzantine mess of complexity, and it's a wonder that it ever works.
I didn't have some specific, technical definition in mind, but if I were to try and pin it down, perhaps in software terms it would be something like "a dependency that is unreliable in the long term".
Clearly this isn't an absolute scale. As our industry evolves and we develop more reliable ways to achieve our goals, something that we regarded as being a relatively stable foundation in the past may no longer be regarded as such in the future when our standards have risen. Moreover, what constitutes "long term" might vary wildly among different projects.
I suppose my basic objection to the original claim (that building atop crumbling foundations is by far the most successful way to accomplish anything in computing) is that significant achievements in computing tend not to happen overnight but rather to develop over time, and the more stable your foundations, the better chance your project has of developing far enough to achieve significant things.
If you define "crumbling foundations" as "a dependency that is unreliable in the long term", then your conclusion is circular. Of course no long-term products will be built on crumbling foundations, because if a technology stack has long-term successes then, by your definition, it is not a crumbling foundation.
I think the other posters are using a definition of "crumbling foundation" as "one which most engineers hate, which slows them down through excess complexity". And by that definition, almost all successful projects are built on crumbling foundations, because the fact that the project was successful leads you to add features to it, and adapt it in ways that the original architecture didn't anticipate. This process only ends when the software becomes so complex that all further attempts at modification fail, at which point everybody hates the codebase and it is, by pretty much any definition, a "crumbling foundation".
I think it's funny that this whole thread is based on the initial haphazard word choice of reikonomusha yet he is not even a participant in the debate.
> If you define "crumbling foundations" as "a dependency that is unreliable in the long term", then your conclusion is circular.
Not at all. I'm arguing that in computing, worthwhile results often take time to achieve, and therefore that foundations that are likely to be around for longer will improve the chances of achieving such results. Alternatively, from the opposite point of view, the odds of achieving something worthwhile go down significantly if you have only a short time to achieve it, as inevitably you will if you are building on foundations that aren't themselves going to be around for long (whatever we choose to call them).
I'd disagree if you say PHP powers about 70% of the web. Web servers (and all other kinds of servers) are not written in PHP, and neither are the protocols that drive the web. Operating systems are not written in it either. These (and many more) are what power the web, not web applications. Your definition of what "powers the web" is remarkably wrong.
Facebook doesn't use web servers or operating systems written in PHP either, right? So if the language that web applications are written in isn't very important to the overall dependability of web applications, then it stands to reason that Facebook's choice of PHP for their web applications shouldn't be a problem either, right?
I had this question in mind while reading your reply. What point are you trying to make with your comment? For my part I wanted to point out that it's unacceptable to say PHP powers 70% of the web. (Check my previous comment's parent.) It has nothing to do with Facebook. And I didn't allude to your inference that choice of language doesn't affect the overall dependability of web applications.
> I didn't have some specific, technical definition in mind, but if I were to try and pin it down, perhaps in software terms it would be something like "a dependency that is unreliable in the long term"
In this case you are mostly hand-waving, and what you say amounts to "I don't like X language".
> It seems like you're now arguing that PHP isn't a crumbling foundation
I think you need to go back and read the message you replied to. He is not arguing that at all. He's pointing to a long range of other things that he is asserting are also crumbling foundations.
"Windows has been a never
ending refinement on exceptionally shaky ground." No. Windows NT was a complete rewrite and it was " well-conceived and theoretically well-
founded"
What you say is technically correct, which is the best kind of correct. However, fewer than 1% of Windows programmers ever talk to NTOSKRNL, and probably an order of magnitude fewer do it regularly.
Most of the time, you're talking to Win32/64 or high-level services based on DCOM or .NET, where the "well-conceived" and "theoretically well-founded" stuff doesn't turn up. You can go your whole career without knowing that there's a well-designed kernel under all that cruft.
I'd guess that less than half of Windows developers could say what the object manager does.
For more than half of Windows developers, if you told them there was an object manager, they would ask how you can turn off that service to improve performance.
This isn't true. Every Linux user encounters problems with drivers. By integrating them into the kernel we have an ecosystem where drivers are out of reach of many users and OEMs. Consider the difficulty of making a desktop scanner for Linux. Additionally, kernel changes can break a driver with little recourse on the OEM's side; you must simply bend to Linus's will. At the end of the day we suffer from Linux's driver architecture.
So true. Windows' services and drivers system may not be beautiful but it full on works most of the time and it allows even non pro Windows users to change basic stuff with far greater ease than Linux which is a real shame given Linux' potential for transparent usability (a potential it has had since the late 1990s but I have personally given up waiting for it to come true)
I'm appalled and a little bit insulted that you put PHP in the same group as COBOL, C and C++ in terms of their foundation. COBOL and C were designed by some of the greatest pioneers of the field, and indeed PHP is built on C.
Just because they are remembered as being first doesn't mean they were well designed, or actually pioneers. At the same time C was being written others were working on Lisp, and ML. It has taken nearly half a century for some of their innovations to be recognized as good ideas and taken up by mainstream languages, while C was a small improvement upon existing language design.
C's designers chose deliberately to ignore the other system level programming languages at the time, which already offered more memory safe constructs, in their quest for a portable macro assembler.
PHP could have been easily done in any language with native compiler toolchains.
Why would you feel insulted? Did you personally create any of those languages? In fact, did you personally create any language? If not, maybe you shouldn't be quite so dismissive of the accomplishment it was to create PHP. Sure, it's not the best language out there, and valid criticisms can be levelled at it, but as they say: It's better to have tried and failed...
What you just said is plain stupid. So only language designers can be critical of other languages? Bullshit! Since programmers can vote with their feet and gravitate towards better languages (is it miraculous that almost all programmers have a distaste for PHP?) we should give reasons why we use C and not PHP. If they were equally crappy what would have been the cause of choice of one over the other? I can call PHP the worst language ever, and I don't need to have created a language already.
> So [only] language designers can be critical of other languages?
Yes. Or rather, other people can be critical too, but language designers' opinions have far more validity.
In other words, anybody can say whatever uninformed BS he wants (it's a free country). But that's no replacement for being an expert in what you're discussing.
> Since programmers can vote with their feet and gravitate towards better languages (is it miraculous that almost all programmers have a distaste for PHP?)
You'd be surprised. PHP is one of the most popular languages and one of the most used languages, so far from "all programmers have a distaste for PHP". So if we were to use that "voting" argument alone (which I find wrong), PHP should be considered a very good language. Not the intention you had, I guess.
While PHP has its warts, it's mainly the less pragmatic and more fad-prone programmers that have issues with PHP, those who look for silver bullets and like to feel superior by choice of programming language, editor and the like.
As for programmers "voting with their feet", well, they don't do a very good job of it. The best languages (like LISP, Smalltalk, OCaml, Haskell, to name but a few) are seldom the most popular.
> but language designers' opinions have far more validity
But in this case we are talking about the work of a language designer. By the way, not all people capable of creating a programming language have created one yet. Marc-André Cournoyer, who wrote the book that Jeremy Ashkenas learnt language design from to write CoffeeScript, has not created a language himself.
Given the widespread use of PHP for web programming, it seems obvious that in the domain of web programming, developers have voted with their feet for PHP.
Given that these supposed "energetic amateurs" have created an enormous percentage of the code that web applications across the internet run on, their votes automatically count. If PHP developers had never produced anything of value, that would be a different discussion.
I also get a bit (not terribly, just a bit) insulted if somebody says "Beethoven sucks", because the person who said it is technically the same species as I am.
A few of the complaints about PHP (the inconsistent function library, for example) exist because PHP was originally a very thin scripting language layer for using existing C libraries.
C was not built by the "greatest pioneers". Just successful pioneers.
There were far more evolved and elegant languages at the time C was created (and during the time it took for C to rise, even more were made). LISP for one, but also languages with the same performance characteristics and systems programming capabilities as C.
C definitely wasn't seen as a "great language design" -- just a very useful and pragmatic one (e.g., see also the classic "Worse is better" essay).
It can definitely be simple, but if you're writing C you're compiling directly on hardware. If you change the hardware, your program will not run. That is literally the definition of unreliable.
Reliability can be measured in successes and failures. Failing reliably is not necessarily a bad thing. Some things you build may or may not work on other hardware. That's unreliable. You assert, with confidence, that C will fail. That's reliable. You can rely on it failing.
But these days, most interesting applications run on C++, which started from the arguably crumbling foundation (from C++'s point of view, not per se) of C, and grew organically over several major revisions into something hideously complex.
This trait of C++ is not a good thing, but the amount of successful software written in it seems to prove that it's not fatal either.
> You can still run applications from nearly 20 years ago on Windows today, in no small part because of Microsoft's persistent focus on compatibility and keeping the basics reliable over that time.
Layers of hacks on hacks to keep old software running correctly is exactly "building on a crumbling foundation", and probably the reason Microsoft is trying to get rid of Win32, having severely limited its availability on ARM in favor of the WinRT APIs. But I'd say the venerable success of Win32 demonstrates that the crumbling foundation works.
> This trait of C++ is not a good thing, but the amount of successful software written in it seems to prove that it's not fatal either.
That depends. It could also be argued that our current tools are limiting what we can accomplish and at some point we'll need a revolution, otherwise we'll hit a ceiling, just like when concrete replaced the need to carve and place rocks on top of each other.
For example living organisms are way more adaptable, more self-healing than anything we've ever built. Our own body should be a text-book example of massive parallelism involving trillions of independent agents that cooperate with each other. In particular, the process of wound healing that happens when you cut yourself is quite fascinating: https://en.wikipedia.org/wiki/Wound_healing
If we wanted to simulate the human body, or at least the human brain, since that's the most interesting part, somehow I'm not seeing C++ in that picture.
C++ belongs to my list of languages that I enjoy using; sadly, it was built on quicksand foundations due to C compatibility as a way to make it mainstream.
Add JavaScript. Super crappy but its ubiquity for web scripting is really saving it some real bashing. When we finally have options, we'd relish in our freedom and say what it was like to work with badly written programming languages.
> What major accomplishments in computing that have been built atop crumbling foundations can claim anything even close to the scale of success of these examples?
Neither of those would even remotely qualify as accomplishments in computing. But they also are terrible examples. Facebook isn't using PHP anymore, that's what this discussion is all about. Wikimedia is terrible, and Wikipedia is almost exclusively static content being served by Squid.
What are we discussing? Are we discussing the tool or the things made from tools?
It's like trolling a raw cast iron hammer used to build a furniture factory and then saying the whole factory is entirely useless just because someone invented a stainless steel hammer.
We're discussing both. The original question posed was something like "what amazing stuff has been built on crumbling foundations", and the response was "Wikipedia". I don't necessarily agree with the sentiment of the question, but Wikipedia is not a good example of amazing technology.
Given a programming task, it will be written faster, easier, and in a more maintainable and less bug-prone fashion if it is not PHP.
The opposite opinion is basically indefensible. Sure, you can still dig a trench with a spoon (if you have enough money to field a crew of workers with spoons), even though a shovel would do a better job.
Let's begin.
1) PHP autocraptastically converts strings that look like numbers, into numbers, resulting in all sorts of weirdness like this: https://eval.in/111886
2) PHP 5.4's OWN TEST SUITE has 91 failures and only 70% coverage. There is NOTHING more "WTF" than that! Why even bother having a test suite?? http://gcov.php.net/viewer.php?version=PHP_5_4
3) Why the fuck are all of these different things equal, and how does this NOT result in problems? http://i.imgur.com/pyDTn2i.png
4) String increment is dumb to begin with, but why does it not even match the behavior of string decrement? https://eval.in/60631
That's a small fraction of not-thought-out PHP language features that result in REAL bugs and security holes. Which consume large swathes of programmer time. Which, apparently, Facebook can afford to swallow.
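For readers who don't want to follow the links, the kind of behaviour described in item 1 can be approximated in a few lines. This is a Python model, not PHP itself, and it does not reproduce the linked snippet. `looks_numeric` and `loose_equals` are hypothetical names, and the rule modelled (when both strings look numeric, `==` compares them as numbers) is a deliberate simplification of PHP's loose comparison that ignores whitespace handling, integer overflow, and differences between PHP versions.

```python
import re

# Rough model of a "numeric string": optional sign, digits with an optional
# fraction, optional exponent. PHP's real rules have more cases than this.
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def looks_numeric(s: str) -> bool:
    return _NUMERIC.fullmatch(s) is not None

def loose_equals(a: str, b: str) -> bool:
    """Model of string == string: numeric comparison if both look numeric."""
    if looks_numeric(a) and looks_numeric(b):
        return float(a) == float(b)
    return a == b
```

Under this model `loose_equals("1e3", "1000")` and `loose_equals("0.0", "0")` are both true even though the strings differ, which is the sort of surprise that matters when the strings are password hashes or identifiers rather than numbers.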
I'm sorry, but your position, as valiant as you are defending it, is literally indefensible. And I don't give a fuck how big Facebook is, they would STILL be better-served by switching SOME of their code to a different language. ANY modern programming language wouldn't suffer from this imbecilic, immature language design.
You're making a mistake. The question is not whether to start a company with PHP vs language X. The company is long started. The question is not whether or not to poof into existence a port from all of FB to language X. That's not possible. The question is, given that PHP is the current language, with all its faults, will it cost more (including all definitions of cost) to make the switch? How long will it take? Does it get the job done? How bad is the damage?
The question more pertinent to your argument is, did they make a mistake years ago choosing PHP? That's when they could have conceivably gone with language X.
> The question is not whether or not to poof into existence a port from all of FB to language X. That's not possible.
And yet Twitter has slowly migrated on the backend from Ruby to Scala/Java. They still use Ruby for the frontend, though it's not clear to what extent, since they've also migrated to a single-page, fat client design on the desktop. And at the very least, their choice of Ruby for prototyping was sane.
I understand that large codebases can't be migrated easily. But you can migrate individual components when needed, if you have a modular, service-oriented design.
Also - building new functionality in PHP is indefensible, unless their code-base is one big monolithic hairball, which I doubt it is.
> The question is, given that PHP is the current language, with all its faults, will it cost more (including all definitions of cost) to make the switch? How long will it take? Does it get the job done? How bad is the damage?
An excellent point. Which in fact is an argument in favor of modularizing your code as much as possible, as early as possible. There are tools now like Apache Thrift which make this easier: http://thrift.apache.org/
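A minimal sketch of what "migrate individual components" can look like at the routing layer, assuming a simple prefix-based dispatch. Everything here is hypothetical: `legacy_app` and `new_search_service` stand in for the old codebase and a rewritten service, and plain function calls stand in for whatever RPC mechanism (Thrift or otherwise) would actually sit between them.

```python
# Hypothetical handlers standing in for the old monolith and a rewritten service.
def legacy_app(path: str) -> str:
    return f"legacy handled {path}"

def new_search_service(path: str) -> str:
    return f"new service handled {path}"

# Path prefixes that have been migrated so far; everything else falls through
# to the legacy code. Migrating another component means adding one entry here.
MIGRATED = {
    "/search": new_search_service,
}

def route(path: str) -> str:
    """Send a request to the new service if its prefix was migrated, else to legacy."""
    for prefix, handler in MIGRATED.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(path)
    return legacy_app(path)
```

With this table, `route("/search/users")` goes to the new service and `route("/profile")` still hits the legacy code. The point of the design is that the switch is per component and reversible, which only works if the boundaries between components were drawn early enough.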
It's interesting how most people seem to attribute the quality and longevity of a software to the language it is written in or the frameworks it uses rather than to the amount of thought that was put into its design. Sure, the former is important, but largely overrated.
I can't fathom why you're under the impression that some code hasn't been switched to another language. Furthermore, your vitriol seems quite effective at undermining your thesis.
Fair enough. And to be honest, not too many years ago I wrote an auction site that had most of the functionality of eBay in a single night, in PHP (someone else had already done the frontend work, fortunately; I just built the backend).
I think most of you guys are missing a crucial thing here -- how decisions get made. Facebook had two options:
1.) Rewrite all the PHP code in Java/Ruby/C++/whatever
2.) Write HHVM/Hack/etc to transparently convert the existing PHP code.
Option #2 ended up getting a lot of publicity for the engineers. If they chose option #1, not only is it a lot of hard, boring, and possibly error-prone work, there is no chance for any of the engineers to get the type of publicity they are currently getting.
All in all, engineers tend to work on whatever will make them get noticed, not necessarily the better technical choice nor what is in the best interest of the company... this is especially true in companies like Facebook and Google where there are a lot of very smart engineers doing relatively mundane work.
So there you have it: Hack/HHVM are all just publicity stunts; the more you feed it, the happier their PR gets.
> Just so we are clear, you ask why Facebook didn't rewrite 10,000 human-years of code into a mythical unnamed "proper" ...
You don't need to rewrite anything (at least, not all at once.) Personally, I'd have expected you to make something akin to CoffeeScript or ClojureScript that targets PHP, and can "link with" your existing PHP modules (or rather, with their HHVM bytecode representations.) Then treat the PHP code as a constantly-dwindling Big Ball of Mud (http://laputan.org/mud/).
> Personally, I'd have expected you to make something akin to CoffeeScript or ClojureScript that targets PHP, and can "link with" your existing PHP modules.
We took a similar approach to what you described. Then we called it Hack.
This is what we mean by seamless inter-operation: the HHVM runtime understands both syntaxes and runs both <?php and <?hh code in the same process. Whether Hack integrates into the runtime at the parsing (current) or bytecode (future?) layer is an implementation detail.
But the syntax you've got now is effectively a superset of PHP, and comes with all the problems of PHP. You've effectively wrapped your Big Ball of Mud... in slightly different-colored mud. The whole point of a clean-break-targeting-interoperability like this is that you can stop using mud at all, and it'll still work with what you've got now. In fact, what other reason would you have?
If you mean that future versions of Hack will evolve to have a different syntax, while still targeting the runtime... then you'll still have code around from the intermediate era, and you'll have to interoperate with that too, won't you? You'll have PHP, PHP-looking-Hack, and actually-nice-to-code-in-Hack.
PHP's syntax is not what's preventing you from writing large maintainable systems in it. Many of the more successful languages throughout history became successful BECAUSE they used a syntax that's superficially similar to something familiar.
That's not to say that syntax doesn't matter, but semantics and pragmatics tend to matter more, despite Wadler's Law of Language Design stating that most people don't understand this.
I think you are putting way too much emphasis on syntax here. The important contribution of Hack is the type system and this is something that a syntactic-sugar translator like Coffeescript can't hope to achieve.
I also wouldn't count Clojurescript here. It's a whole different language that just happens to compile down to JS.
> Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing?
That's only because the currency for building things on top of crumbling foundations has been sweat and man-power. We aren't that far off from the Egyptians that were using hundreds of thousands of slaves per pyramid. It's a good thing that we've transcended the necessity for hundreds of thousands of slaves when raising buildings, don't you think?
And yet, here you are, claiming that building stuff with broken tools is the most successful way to accomplish anything in computing. Actually I view it as nothing short of a miracle, showing human determination in action ;-)
> All languages, runtimes, and standard libraries (and databases, and source control, and on and on) are "broken" at sufficient scale.
That's a fallacy. Just because both X and Y are broken, that doesn't mean they are equal, as some things are more broken than others and PHP is more broken than anything else mainstream (C++ at least has reasons). Also, I don't see how "at scale" changes things in PHP's favor, I really don't.
If you're trying to argue that "at scale" the level of brokenness converges to the same levels, then that's a stupid thing to say. After all, Twitter didn't have to build its own JVM and the stuff they run on top of the JVM is probably more power efficient than you'll ever be with HHVM. Probably saner too.
> We aren't that far off from the Egyptians that were using hundreds of thousands of slaves per pyramid. It's a good thing that we've transcended the necessity for hundreds of thousands of slaves when raising buildings, don't you think?
Facebook is a relatively mature company now that is in the business of making money. In fact, I would argue that [framework of the week] is more buggy and broken because it is less well understood.
Actually Twitter has been pretty successful in migrating a good chunk of their codebase from Ruby to Scala.
Facebook can keep PHP for the thin web layer that renders the page, but they could have migrated the meat of their code to a safer and more robust language.
The HN crowd seems to dislike (or despise perhaps?) PHP, but it's really not that bad. Yes it has a lot of warts, but it has a lot of things that make it nice for web development.
a) try your new code by saving in your editor, and hitting reload in your web browser.
b) it's very approachable. People who only know HTML and CSS can be expected to do a little bit of PHP work to integrate their changes. If you set up the right network mounts, they just need to edit files and reload (see a)
c) it's not super high overhead at runtime. If you're not using a framework, and you don't build up a crazy object hierarchy, it's not too hard to get your page out with about 10 ms of overhead beyond data fetching. For very simple webservices (fetch data, possibly from multiple sources, and do a little formatting for the consumer), I was able to get the overhead down to 2ms. You can certainly do better with other languages, but you can usually get better throughput improvement by working on getting data quickly. Btw, all the frameworks are terrible; many of them add 100 ms to the page just for the privilege of loading the includes; PHP is a framework for web programming thank you very much.
d) cleanup; you don't have to worry about it. If you don't do anything weird (c extensions, with non-preferred malloc), at the end of the request, everything is thrown away.
That said, there are plenty of things PHP isn't good at: I wouldn't run a long running process in PHP; and multithreaded PHP sounds like a bad idea.
Yes, PHP is its own web framework, but it's not a very good one. And, as you say, implementing a better one on top of it adds a lot of overhead due to the execution model. With other languages, where you don't throw everything away at the end of the request, you are free to implement a good web framework without suffering additional overhead.
I'm running multithreaded background processes in PHP pretty successfully. I did not see (and still do not see) a reason why I should have chosen another "proper" language for it.
P.S. The vast majority of arguments against PHP here lead me to the conclusion that most (not all) debaters don't understand how PHP works well enough.
> a) try your new code by saving in your editor, and hitting reload in your web browser.
The only language for which that works is client-side Javascript. For PHP, you forgot the part in which you have to install and run a web server, then point your browser to localhost. Plus, to get anything useful done, you'll also need some sort of database to go along with it. I remember the first time I did that and it was pretty intimidating.
> c) it's not super high overhead at runtime ... it's not too hard to get your page out with about 10 ms of overhead beyond data fetching
To me much more interesting is the total time it took for the client to receive the response, possibly when multiple concurrent requests are happening. The comparison here should be versus a static page served by Nginx of course.
The best throughput possible that I got for an otherwise complex business logic happened on the JVM. I basically rewrote a web-service built on top of Django/Python, with a redesign with emphasis towards in-memory caching, parallelism and async I/O and the result was a server that was able to process more than 10,000 requests per second with an average of about 5ms per request (actually in production the instances ended up processing about 2000 reqs/sec of real traffic, since c1.medium EC2 instances don't have enough CPU power).
Of course, people that just need to slap something together, don't need this level of throughput. If a request takes 400ms for a dumb web form on a low traffic website, that's of no consequence to most people. The problem happens of course when such a piece of software evolves to something much bigger, like Wordpress. I'm always amazed at the gimmicks that people do just to keep their Wordpress powered blog alive.
Has anyone tried to copy these good parts of PHP, except with a not shitty language?
Is there anything about PHP, the language, that lends itself to this style of development, or could you get these same benefits with, say, Ruby or Python?
There are a lot of people making money writing systems in php because it's the right tool for some jobs. There are companies making money using systems that are written in php, because it works. These people don't have as much to say as those working in other languages that may feel threatened or uncomfortable when something they think is bad seems to be used with success.
Not everyone that needs/wants to make a website wants to start a software development company. Often they just want to accomplish a few small things. To match up the rest of your examples:
He goes through why Facebook uses PHP and decided to build upon it to create Hack. I highly recommend watching the whole thing, but the main three things he points to are:
1. Frictionless programmer workflow with a short feedback cycle
2. All PHP requests start out with the same consistent state by default
1. I have never seen a framework that didn't go to great lengths to update to new changes quickly in development
2. Which means you lose resources between requests, unless you stuff them into the interpreter/httpd itself. And anyway this is only a problem for PHP, where by default everything runs in the top-level namespace, versus in separate functions.
3. That's a funny way to spin "no concurrency", but you can get that in any language by just not deploying it threaded.
On the Atari ST you didn't even have to hit reload, the webpage reloaded automatically as soon as you saved your file. No need to alt-tab even, just keep webpage open next to your editor. Now get off my lawn.
(I work at Facebook but not on Hack.) There's a lot to be said about backward compatibility, and much of Hack's virtuosity stems from its smooth interoperability with PHP - many millions of lines of it. There's nothing like working on such a large codebase to convince one how difficult disruption of any kind is.
The language definition and semantic checker are difficult, but are stereotypically tasks that cannot be distributed to many engineers; instead, a few senior engineers took that task to the benefit of all others.
That Zuck wrote the first version of thefacebook.com in PHP, the language he was the most productive at?
That the initial team didn't rewrite Facebook in Python/Perl/Ruby/Haskell during the fast growth phase? If you have ever experienced the growth phase, you understand how ludicrous the idea of rewrite would be. I've personally experienced and heard only horror stories about rewrites. We underestimate how much hidden wisdom a production code base has and that the messiness is often there for good reasons.
I once worked for a social network that was extremely popular in a particular geographic niche. It was even started before Facebook.
And it was also written in PHP. And we rewrote in Ruby (not rails, at the time it was not mature in the ways we needed it to be). We made a lot of great technical achievements in doing this and our codebase became much much better. On a technical level I don't regret us doing that one bit. We went long on schedule and we made mistakes, but some of the best work of my career went into that and I'm immensely proud of it.
But then in a matter of months of Facebook going to being open to people who weren't at a university or big organization (remember that?) our users, who got bored with our site not changing while we rebuilt all the tech, completely abandoned us and we went from profitable (having never taken any outside investment) to dead in another couple of years.
This isn't to say that this wouldn't have happened to Facebook had they done something like this, but it is always a huge risk, even if you do everything right.
Using PHP to begin with was a bad technical decision. Failing to establish a reasonable migration strategy was a bad business decision likely rooted in bad engineering management that fell out of starting with bad technical decisions.
It's much harder to hire people that can pull you out of a mess like PHP when, at the same time, you have to hire people that can keep writing PHP for you.
> That the initial team didn't rewrite Facebook in Python/Perl/Ruby/Haskell during the fast growth phase?
That would have been a good time to bring on new engineering blood as part of scaling out, which would have provided opportunities to enact mitigation and transition strategies. Imagine if the massive amount of talent currently devoted to HHVM had been devoted to Facebook's actual business?
There are migration strategies other than "rewrite everything immediately", and in fact, I'd bet that's exactly what HHVM is. It's just a shame they waited so long that the most cost-effective migration strategy was to tackle an enormously difficult computer science and engineering problem that the world's biggest software companies already invest hundreds of millions of dollars on and provide to the world largely for free.
> If you have ever experienced the growth phase, you understand how ludicrous the idea of rewrite would be.
Yes, and I've also (repeatedly) been the team brought in to rewrite the mess of a code base that was about to torpedo the growth phase.
There's not much correlation between funding, initial success, and engineering talent. Which is why you so often wind up with a mess that has to be cleaned up once you can hire people who know what the hell they're doing, instead of the ones you happened to be stuck with because you didn't know how to grow an engineering team.
> We underestimate how much hidden wisdom a production code base has and that the messiness is often there for good reasons.
Messiness is never there for good reasons other than that replacing it is more expensive than not touching it. You don't strive for hidden wisdom and inescapable messiness -- that's just what you get when you let engineering slip up.
Which language would have been a good technical decision in 2002-2003? It needs to be fast enough in terms of iteration. It needs to not require more resources than PHP. It must be easy to onboard people who don't know it onto. It needs to be easy to operate, and not be costly to deploy on the tens, hundreds, and then thousands of servers necessary. (Spending time learning a new technology that others think is cool or which seems cool, while trying to also build a product, has sunk more than a few startups...)
What was bad about the decision to keep the reasonably well-performing and reasonably suited-to-purpose PHP code for front-end code, and peel off suitable tasks into services like the feed, typeahead, messages, and so forth into languages like C++, Java, and so forth. What was bad with the decision to let hundreds of software engineers continue to build the PHP code-base while a much smaller group of people work on projects to improve the efficiency of both the execution environment but also the tooling and developer efficiency on that code-base? Their contribution there is multiplied out by the improved efficiency of those hundreds of developers and the code they wrote.
Seemingly Facebook survived its growth phase fairly well and didn't need a tiger team from outside to handle it - and without the reliability problems of others who chose languages other than PHP for their front-end and then decided to rewrite their much smaller surface area in yet other languages.
As much as people may dislike PHP (and I'm one of them), it was definitely "good enough". Many languages may not have been, even if they are nicer languages in some objective way.
(Disclaimer: I actually have to write code in Facebook's PHP code-base every once in a while. But most of the code I work with is Python, Java, or C++.)
> Which language would have been a good technical decision in 2002-2003? It must be easy to onboard people who don't know it onto. It needs to be easy to operate, and not be costly to deploy on the tens, hundreds, and then thousands of servers necessary.
You mean, like the JVM? 2003 wasn't the pliocene epoch, we had a working JVM. If you remember back to the last bubble, in the 90s, we were shipping "easy to operate, not costly to deploy" software on Java post-1998. Java 1.4 was released in 2002, and Java 1.5 -- what most people would say is modern Java -- was only 2004.
Scala 2.0 was released in 2006, Clojure in 2007 -- that's 8 and 7 years ago, respectively.
You really don't think there were alternatives during that long period?
> What was bad about the decision to keep the reasonably well-performing and reasonably suited-to-purpose PHP code for front-end code, and peel off suitable tasks into services like the feed, typeahead, messages, and so forth into languages like C++, Java, and so forth.
In 2004? Nothing. In 2005-2008? Things should have been reassessed, especially before building out a millstone of an engineering team around PHP. Instead, Facebook doubled-down on an actively bad language with HPHP, and the results were hilarious:
HPHPc required a very different push process,
requiring a bigger than 1 GB binary to be compiled
and distributed to many machines in short order.
So then in 2010, Facebook decides to embark on HHVM, and now four years later, we can run one of the most correctness-hostile programming languages around, quickly, with optional static typing.
That's a span of 6 years, and at the end of it, Facebook has functionality they could have gotten for free in 2003. On top of that, the intervening years allowed the PHP mess to become only more entrenched -- who on earth do you think the engineers are that accept a job writing PHP, for Facebook or otherwise?
If I had to hazard a guess, I'd guess that HHVM exists because of a large amount of political inertia in the organization that has everything to lose by PHP being eliminated entirely, and the lack of a strong hand by upper management.
I'd guess that lack of a strong hand by upper management came in no small part from hiring straight-out-of-college graduate Adam D'Angelo -- who had literally zero experience -- to serve as CTO from 2006-2008.
By the time FriendFeed was acquired and Bret Taylor along with it (2009), my guess would be entrenched interests made for a very difficult position for anyone wanting to change the ship's course.
"the intervening years allowed the PHP mess to become only more entrenched"
They specifically chose to do it and not move to another language. One of the reasons was: PHP programmers are cheap and plentiful and can do quick iterations.
Sounds a bit like you are complaining about other people's choices, it really is their choice. :-)
I'm not saying it is impossible to move to another language. Just look at Paypal: they moved their customer-facing code from Java to node.js and got a very large productivity increase: http://www.youtube.com/watch?v=V5yk5SZxWX4
Obviously the reasons Paypal chose node.js are similar to why Facebook chose PHP: quick iterations mean more iterations, which means more experimentation and better results.
> Using PHP to begin with was a bad technical decision. Failing to establish a reasonable migration strategy was a bad business decision likely rooted in bad engineering management that fell out of starting with bad technical decisions.
Facebook is a ten year old company with a market capitalization of $170 billion, so "bad" is probably not the most accurate word to describe their technical and business decisions.
> Facebook is a ten year old company with a market capitalization of $170 billion, so "bad" is probably not the most accurate word to describe their technical and business decisions.
How does that follow, exactly? They haven't failed, so any inefficient or sub-optimal decisions were the correct decisions?
A rewrite isn't such a ludicrous idea. Reddit is a prime example of a rewrite from Lisp to Python. I would say that's even a somewhat difficult rewrite.
The Reddit front-end is a relatively simple application. It has a page with a list of stories. It has a page with one story and a bunch of comments. It has a page to add comments. It has an endpoint to vote up or down stories, and to vote up or down comments. And maybe a few other less-interesting things, like admin interfaces.
Facebook is surprisingly easy to underestimate even as someone using it a fair bit.
Just try to find every single interface in the front-end. For your own profile. Feed is front-and-center, as is timeline. Then look at events, groups, &c. Look at messaging.
Then look at the interfaces for managing your privacy and permissions. The apps that you're using and information about when last they requested information. Security like login approvals. Then think about the flows involved in reporting content as abusive or inappropriate. For verifying your identity if you've forgotten your password. For adding more security if you log in from a new computer or from a new location.
Then look at pages, including insights and scheduled posts and so forth. Then look at advertising - boosting individual posts, creating campaigns, &c. Then look at the interfaces for developers. For translators. Interest lists.
The backends for most of this are in C++ and Java. There's a large amount of data processing happening to track hidden things like spam and scam prevention. But the front-end surface area is quite clearly an order of magnitude or maybe two larger than Reddit (at least where it was when this happened).
Okay, Reddit is a good example of a successful rewrite during a growth phase. I had forgotten that. As I've personal knowledge of several unsuccessful rewrites, it would be great to hear more anecdotes about successful rewrites in growth phase and try to analyse what made them succeed.
A rewrite is more likely to succeed if you are not in a high growth phase, but even then it's risky.
Yes and the OP is criticizing the decisions being made now as "yak shaving." Could it perhaps be that the "yak shavers" made a conscious and well-reasoned decision to go in the direction of "extend PHP" vs "throw it all out and rewrite everything in language-of-the-month?"
I'm sure it was conscious and well-reasoned, but the OP's point still stands.
Trying to build a reasonable forward-looking high-performance managed language runtime platform is hard.
Trying to build it atop of PHP is harder.
The only way I could see this as being a smart long-term strategy is if the eventual goal is to isolate and retire PHP projects entirely (and PHP usage itself) over time.
However, even then, with PHP gone, and Hack no longer necessary, you're still stuck maintaining your own incompatible VM / runtime. Is Facebook signing up to reproduce the CLR? Or do they have long term plans to somehow bridge the gap between HHVM and more established VMs/runtimes, where they can better share the maintenance load with the wider industry?
Maintaining your own VM is no big deal. Compared to Facebook, HHVM is a tiny codebase. A team made of a relatively small number of people (high quality, but low quantity) can and do maintain VMs like HotSpot and V8. LuaJIT is maintained by a single person.
As someone who has worked on a VM, I couldn't disagree more. It can take years to hash out things as simple as ideally performing primitives for a target architecture, and then things change.
Add to that the complexity of optimizing compilers, specification of byte code formats and a consistent virtual machine memory model that can be relied upon across architectures, and the art and science of highly concurrent garbage collectors, and your "no big deal" is a load of hogwash.
Hotspot alone is nearly 20 years of big deal.
> A team made of a relatively small number of people (high quality, but low quantity) can and do maintain VMs ...
The number of people doesn't matter in this equation; your small team of (expensive, rare, high quality people) can't build a world-class VM in a day. Or a month. Or a year. Maybe in 5 or 10 years, just ask Microsoft.
> ... like HotSpot and V8. LuaJIT is maintained by a single person.
LuaJIT's said "single person" has been working on it for what, 10 years? It's an extremely impressive implementation and I don't want to bag on it, but even still, it lags in certain areas, eg, its GC implementation isn't up to par with the state of the art.
The author's skillset is extremely rare, and LuaJIT itself is an anomaly in the field. Using such a one-off example doesn't really hold water to prove that it's ideal for a company to internalize maintaining a VM for their own custom language built on top of PHP.
I am not trying to belittle the effort necessary for a state-of-the-art VM or programming language implementation. I get paid to do this stuff, and I am on my third VM/PL project now. It is also true these things take time and are not very parallelizable, so while the man-months may not be that big, you can't make it faster by throwing more people at it.
On the other hand, I maintain it still is no big deal compared to rewriting Facebook. I also maintain that while the skillset is rare, Facebook apparently has had no trouble so far, and will have no trouble in the future, finding (I remind you, a small number of) people to work on the VM. I also remind you Facebook has been working on an alternative PHP implementation for 6 years now, 2 years in private (2008~2010) and 4 years in public (2010~2014). It has been profitable for them for 6 years, will be profitable in the future, and profitability does not need "sharing maintenance load with the wider industry". They can maintain it fine, thank you very much. Because, in the end, a VM is no big deal.
If they're wasting money on bad management decisions, they're wasting stockholder money.
They're also continuing to propagate an outwardly facing engineering culture that will make it even harder to hire people to help dig them out of the PHP hole -- perpetuating this further.
Your argument is simply another take on the survivorship bias fallacy.
> I get paid to do this stuff, and I am on my third VM/PL project now ... Because, in the end, a VM is no big deal.
You keep saying that, and yet, there keep being so few high quality VMs.
What do you consider to be a high quality VM? How many do you expect to see and how many do you find?
Adaptive JIT and generational GC would be a good baseline. Limiting myself to open source VMs, I think (at least) HotSpot, Mono, V8, JavaScriptCore, PyPy, SBCL, Racket qualify. J9, CLR, Chakra, Allegro CL also qualify, but are not open source. SpiderMonkey, LuaJIT, HHVM lack generational GC. All these projects are actively developed, and there are doubtlessly more, e.g. I am not familiar with Smalltalk VMs, some of which are commercial. Research VMs like JikesRVM, Maxine qualify. I believe Bartok qualifies too.
I am not sure what you are arguing for. If you are arguing for the Quercus route (PHP-on-JVM), I think it's unclear that the Quercus route is better than the HHVM route. If you are arguing for not running the existing PHP codebase, I think you are being unrealistic.
It's not survivorship bias. Facebook is an existence proof that there is no "PHP hole" that they are in, that it's largely a myth propagated by programming language nerds who have never tried scaling a site in PHP. When was the last time you heard about a site closing up because of PHP-induced technical debt? You don't. People re-write sites because of poor architecture, not because of poor programming languages, and PHP (in general) does not prevent you from building a site with good architecture, both from a software structural standpoint and an operational standpoint.
PHP's APIs are ugly. Its language semantics are a bit hairy until you get the hang of it. But there are parts of PHP that are extremely elegant and easy to reason about. Its OOP support provides all that you need to produce re-usable and easily understood code.
Facebook's work on PHP has focused on largely two dimensions: reducing CPU cycles and increasing static/runtime type checking. The former is something that only really matters at massive scale: PHP is generally fast enough since most of the time PHP processes are I/O bound reading from a database or memcached. It's only for sites like Facebook where if you squeeze out an additional 10% TPS from your boxes that you will start seeing large absolute cost reduction that this level of optimization starts to make sense. On the type checking side, this is something you might start to want in any dynamically typed language when you have millions of lines of code and want to ensure basic guarantees that it will run, and is something that you'd probably see Facebook doing if they were a Ruby or Python shop anyway. It has nothing to do with PHP but with the classic dynamic vs static typing tradeoff.
Should you be writing your chat server in PHP? No. But 90% of the code you write for a large website is HTTP response code rendering HTML or JSON. PHP excels at this and you can pretty much hire any developer off the street to start cranking out code if you give them a solid foundation to build on.
Facebook has already proven that they are able to make improvements that have substantially helped them to the point where this team is likely paying for itself many times over. It doesn't need to be perfect - it needs to offer return on investment, and it has.
It's possible that they could eventually get a total rewrite to give a better return, but frankly I don't think you have any idea of the enormity of trying to convert a multi-million line production platform from one language to another.
In any case, one does not preclude the other. Arguably, many of the changes they have made, such as gradual typing, and their ability to now slowly introduce other changes without breaking their existing codebase, mean they have a platform for slowly firming up their codebase and migrating it towards a position where a full rewrite (should they decide to do one in the future) could be made substantially less painful.
My example, "OCaml", is not a "language of the month." Its roots are >30 years old and it wasn't developed by someone in their basement over the weekend. As stated, Facebook even uses OCaml, among other languages, for good reason.
You try hiring and/or retraining enough engineers to be able to make a switch to OCaml, and see how much it'll cost you.
I detest PHP, but I've still more than once made the choice to do apps in PHP motivated by developer availability alone.
It's not a great language, but with some discipline it is also not nearly as awful as some people like to think, and you can make up for a lot of awful with the difference in ability to hire experienced engineers who know PHP vs. many of the less common languages.
I don't like how your comment implies that "throw it all out and rewrite everything in language-of-the-month" is the only option. You could also "throw it all out and rewrite everything in better-AND-mature-language" or even "throw the worst parts out and rewrite incrementally in better-AND-mature-language".
I just want to clarify something: I am not calling anyone stupid. Generally the engineers at Facebook are smart, talented, and good at getting things done.
I think PHP has some goodness that makes it the poor man's Java.
Neither Ruby nor Python provides that, for instance.
Though Ruby is in my opinion well designed, and with duck typing you might not need all that Java-like OOP, I don't know, I feel like having interfaces makes me understand code faster and better.
Want to understand what an object does? Just read the interface: no magic, no bullshit. It's self-documenting and that's important. That's why you can have these huge codebases like Zend, Symfony or Doctrine and still understand how complex elements work together, and even figure out how to use them without docs, just like Java.
I feel I can't go to Django's source code or Rails' source code and understand everything easily just by reading function signatures.
And the write/refresh cycle makes iteration pretty fast during development. But yeah, PHP's basic syntax sucks for sure.
Facebook is doing an admirable job replacing rotting pieces of their infrastructure with more robust replacements.
HHVM replaced the execution environment for their code with a more robust code generation/ runtime system.
Hack allows them to bootstrap their code base into higher degrees of reliability without a mass rewrite.
Also Bryan O'Sullivan is one of the best people on the face of the earth to be trying to find practical ways of integrating PL research into practical engineering.
Like coral reefs! And hipsters are built from the dead bodies of vintage things (that you probably never heard of because they're pretty underwater/ground/whatever).
Ok, I cede the point. PHP has the retro chic and Hack's type system is fixed-gear.
> Only way around the type system is void, explicit casts and I guess unions.
And typedefs. Given that there are no parametric types, you run into void* quite frequently as well, so saying "only" inaccurately minimizes the scale of how much C code isn't strictly type safe.
Right, void* is used quite a lot for "generic" data structures. But I'm not sure that's what he meant. The reason I said "only" was that from my own experience, most C data structures are tailored for a specific use and so I don't see void* too much in this context.
And may I ask why you say "typedef" is unsafe? It is merely a type alias, like e.g. Haskell's and ML's "type", or isn't it?
You don't need to add a disclaimer about your workplace. It doesn't contribute at all to your question.
I do not think C is a "crumbling foundation". I would not suggest that C be used for large scale engineering efforts, but it's a relatively well-defined language with semantics dictated by a formal standard. A lot of research has gone into C compilers, which are state-of-the-art.
With that said, one of the biggest complaints I hear about C++ is that it still has the legacy of C embedded in it.
Something I didn't really learn in school, that I only picked up much later in my career, is that when applicable, applications should be built on top of mathematical models. A very contrived example would be.. would you build an ad rotator by keeping counts of all banners served and picking the next banner based on those counts and your weighting rules, or would you build the rotator on a foundation of statistics and probability with a little extra logic for handling caps and edge cases?
> would you build an ad rotator by keeping counts of all banners served and picking the next banner based on those counts and your weighting rules, or would you build the rotator on a foundation of statistics and probability with a little extra logic for handling caps and edge cases
Actually, building on a foundation of solid statistics and probability will often result in an algorithm that essentially counts views and applies weighting rules.
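To make that concrete, here is a minimal sketch of what such a rotator might look like. Everything in it (the `pick_banner` name, the weights, the cap on banner "c") is made up for illustration. It samples a banner with probability proportional to its weight, and the only extra bookkeeping is a per-banner count used to enforce caps, which is more or less "counting views and applying weighting rules":

```python
import random

def pick_banner(weights, served, caps, rng):
    """Pick a banner with probability proportional to its weight,
    skipping banners that have already reached their impression cap.
    All names and numbers here are hypothetical."""
    eligible = [b for b in weights
                if served.get(b, 0) < caps.get(b, float("inf"))]
    if not eligible:
        return None
    choice = rng.choices(eligible,
                         weights=[weights[b] for b in eligible], k=1)[0]
    served[choice] = served.get(choice, 0) + 1  # the "count views" part
    return choice

rng = random.Random(0)                       # seeded for repeatability
weights = {"a": 3.0, "b": 1.0, "c": 1.0}     # assumed weighting rules
caps = {"c": 50}                             # assumed cap for banner "c"
served = {}
for _ in range(1000):
    pick_banner(weights, served, caps, rng)
```

Once "c" hits its cap it drops out of the eligible set and the remaining weights are renormalised implicitly by `random.choices`.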
As another example, Pagerank has a nice theoretical basis, but power iteration reduces to perhaps 5-10 lines of C.
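For a sense of scale, here is a rough sketch of power iteration in Python rather than C. The damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions for illustration, and the sketch assumes every node appears as a key in `links`:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on a small link graph.
    links maps each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node with no outgoing links spreads its rank evenly.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # toy graph
ranks = pagerank(graph)
```

The core loop really is only a handful of lines; the theory is what tells you it converges and what the result means.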
This seems an unequivocal improvement for Facebook, since they're unwilling to move away from PHP. The better question is why would anyone else choose to build their company on this?
As far as I can tell they are using local inference, which is basically just unification. The set of possible types seems pretty narrow[1] as well so I don't see much room to go wrong. You're right that there is a lot of mathematical theory about type systems and that inference can easily go wrong (be undecidable) but that is mostly for type systems that try to do inference for higher rank polymorphism and other things, which it doesn't seem like Hack is. Also I guess the language is supposed to be a superset of "valid" PHP, although I don't know whether this is true without modifying the PHP program much.
Local inference isn't just unification. In particular, most local inference algorithms are designed to work with subtyping, which doesn't work in ML-like type systems.
I assume that "local inference" means, in practice, "no let-polymorphism". That's what causes most of the headaches with extensions to Hindley-Milner (including the undecidability of subtyping).
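For anyone following along who hasn't seen it, this is roughly what plain first-order unification looks like. It is only a sketch of the textbook technique, not Hack's checker, and it has no subtyping at all, which is the gap pointed out above. The representation is an assumption for illustration: type variables are strings starting with `'`, base types are other strings, and constructed types are tuples like `("fn", arg, result)`.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable
    or a non-variable type."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    """True if var appears inside t (would create an infinite type)."""
    t = resolve(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst):
    """Return an extended substitution making a and b equal,
    or raise TypeError if none exists."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError("infinite type")
        return {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

result = unify(("fn", "'a", "int"), ("fn", "string", "'b"), {})
```

Here `result` binds `'a` to `string` and `'b` to `int`. Unifying `int` with `string` fails outright, whereas a system with subtyping would have to ask whether one is a subtype of the other.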
I have seen many Python programmers that are PHP haters, are you one of them? Just curious, no offence.
I like both Python and PHP, but use PHP for commercial applications.
>> I am baffled as to why you'd build your castle atop a crumbling foundation
I think you're using a wrong metaphor. Facebook foundations can't be crumbling just because they're made in PHP. There wouldn't be Facebook as we know it today otherwise. You might say they used a "low quality" material to build them. I see Hack more as a better material, that can also bind with the previous one and make it stronger.
I think the short story is that the engineers at Facebook feel pretty productive with their stack today. Making a new language that basically fixes up PHP is ideal for them because it gives them good confidence they can get the benefits of static analysis without sacrificing much productivity. That level of productivity + the sheer numbers they have make Hack a more attractive option.
If I recall correctly PHP is banned at Amazon, even for internal tools, mainly for security reasons. The team that gives security clearance won't even take a cursory look at the service if it depends on PHP in any way.
Twitter could have been written in any language and worked, but given their old architecture decisions, they'd have been just as likely to mess it up in every language.
They blamed RoR a lot because they needed an explanation for their problems, but "many to many" messaging even at their scale is a "solved" problem and has been for decades and is fairly easy to scale.
(Think of Twitter as a bunch of mailing lists and mailboxes; you scale it by decomposing it: map accounts to virtual "buckets" that become the domain part, and map tweets to messages; break apart large follower lists into smaller ones and introduce a forwarding reflector; break apart large following lists by splitting "mailboxes" and doing zipper merges on reading them; add a caching layer -- this is not rocket science, and you could do it properly in any language)
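The "split mailboxes plus zipper merge on read" step can be sketched in a few lines of Python. The data layout (each sub-mailbox as a list of `(timestamp, text)` pairs already sorted newest-first) and the name `read_timeline` are assumptions for illustration, not anything Twitter is known to have used.

```python
import heapq
from itertools import islice

def read_timeline(sub_mailboxes, limit):
    """Zipper-merge several newest-first sub-mailboxes into one timeline.

    Each sub-mailbox is a list of (timestamp, text) pairs sorted by
    descending timestamp. heapq.merge consumes its inputs lazily, so
    reading stops once `limit` items have been produced.
    """
    merged = heapq.merge(*sub_mailboxes, key=lambda m: m[0], reverse=True)
    return list(islice(merged, limit))

# Two hypothetical sub-mailboxes belonging to the same reader.
box_a = [(105, "a3"), (101, "a2"), (90, "a1")]
box_b = [(104, "b2"), (95, "b1")]
timeline = read_timeline([box_a, box_b], limit=3)
print(timeline)
```

Because the merge is lazy, the cost of a read depends on how many items are requested and how many sub-mailboxes exist, not on how long each mailbox has grown.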
Note: I think RoR was a horrible choice for them, though I love Ruby, but I also don't for a second believe RoR was their real problem.
Edit: Your overall point stands, though. Especially given that Facebook is a far more complicated application.
Because your perfect OCaml, ML, Haskell, and any other fancy, magical, fabulous, eternal, fantastical, simplest, elegant ... languages are all atop crumbling foundations implemented in ugly, stupid, out-dated, evil, chaffy ... C, C++, and assembly running on inefficient, silly and fragile digital logical CPUs.
And when Facebook uses this stupid technique to build the world's largest social network for more than one billion users, those elegant and perfect solutions are serving ... how many?
"And in the end, it's still PHP, which is duly disliked."
You might dislike it, but that doesn't mean it's disliked. PHP has a giant base of programmers, scales, is easy to learn, is extremely versatile and powerful, and as you point out, the code was already in PHP. Only an idiot would rewrite a giant working codebase simply to have it in a language that's "difficult to hire and train" in. Or any language for that matter. Perhaps they could have pulled a netscape but instead decided to serve billions.
Hi Bryan, I know most people know you from your prolific work on many great Haskell libraries ( Criterion, Attoparsec, Aeson, ...). Did Haskell have any role in the development of Hack? Looking at the code base it seems like the type system is primarily written in ML, what made the team decide to use OCaml over Haskell?
As you note, the team developed the typechecker in OCaml, as that's what the founding engineers were familiar with. Many of ML's cousin languages happen to be well suited to this kind of work.
Say I'm starting out with an entirely new project and want to leave the legacy of dynamic typing behind. Is there a flag available to enforce the use of type signatures, causing Hack to throw a compilation error when they're omitted?
Do you imagine a future where Hack will merge back into PHP (like PHP 7), in the same style that Beryl & Compiz then rejoined? Or does the team intend for the two to always be adjacent-yet-separate?
HHVM developer & PHP runtime developer here. I've got hands in both runtimes and all even I can say is: Maybe.
I think the most likely outcome is that PHP will adopt some of HHVM's additional features, but remain a separate project. IMO that's a great outcome, since we'll both likely drive the other to be better.
My impression is that Facebook mostly write their stand-alone services in Java or C++, and are using PHP only where they're "stuck" with it due to a large existing code base.
Do you think Hack is a good language to start a new project in, compared to non-PHP languages? Are you using Hack for things besides the main web page?
Yeah, I think Hack is a good language to start a new project in. For as much flak as PHP gets, there are actually a lot of good things about the language. The fast development cycle -- edit php script, refresh -- is something amazing that you don't get in a lot of statically typed languages, which usually have a compilation step. The crazy dynamic things you can do also occasionally have their place, though it's certainly easy to shoot yourself in the foot.
On the other hand, a lot of the time you want the safety that strong static typing can give you. Even just the null propagation checking can immediately find tons and tons of silly little bugs without even running the code, and ensure that the code stays consistent as a "mini unit test" if you will.
Hack hits the sweet spot of both. Wiring the Hack typechecker into vim was really revolutionary for me -- having both the immediate feedback of the type system for all the silly bugs that I was writing, along with the fast reload/test cycle from PHP, is great.
Er, `paste serve --reload` restarts small-to-medium Python projects faster than I can alt-tab, which is actually faster than my static blog engine can regenerate itself too.
Is there a statically typed variant of Python that would work with existing web servers, etc.? I am aware that there's Cython and I know that py3k technically permits type annotations (which JetBrains' Python IDE uses quite effectively), but that isn't true static type checking in the way Hack does it.
Statically-typed Python would ruin a great number of Python libraries you'd probably want to be using. It'd be a very different language.
Once you have type annotations, it wouldn't be too much of a stretch to enforce them statically with a separate tool. You could even go as far as rejecting first-party code if you can't statically determine every single value's type. Pylint's underlying astroid library has a bunch of inference tools you could perhaps build on top of.
Such a tool would be no less difficult to build than hack itself, though. Hack's "gradual typing" solves the problem of re-using existing code that you've mentioned.
FB already had a PyLint-like tool earlier that could do some static analysis, namely pfff (also open source and written in OCaml), but it did not provide a full-on static type system like Hack. (Background: I used pfff when I was in bootcamp at FB itself. This was however prior to hack, I worked solely on C++, Java, and a bit of Python at FB after bootcamp).
I am sure if FB started off with Python, a similar solution could have been found, but if you're looking for a tool that exists _right_ now, Hack is actually quite decent.
Creating a static type system, implementing local type inference, and working out "gradual typing" and its associated problems (all while being able to do type-checking at the speeds developers _expect_) is not a trivial problem.
The announcement post says the actual type-checking happens in a persistent server that watches for filesystem changes. That sounds pretty close to continuously running a linter. It also says "without breaking things", so I'm a little fuzzy on whether badly-typed code will actually execute or not.
For that matter, can you call a typed function from an untyped one? Or, worse, a typed method? If the typing is purely static, there's no way to know the method you're calling is actually typed, so there's no actual guarantee that it receives the types it's declared unless your entire program is typed. It doesn't seem like a very strong guarantee if both the caller and the callee have to opt into the typing.
If you're looking for a tool that exists right now, you either have an existing codebase and can't port it to Hack if it's not already PHP anyway (for the same reason Facebook couldn't port away from PHP), or you're starting from nothing and could just use a statically-typed language in the first place.
I don't know if I'd even be excited about the prospect of optional static typing in Python. (It hasn't gotten me interested in Dart, for example.) I'd kinda rather see the effort poured into something that could do static duck-typed analysis/inference, e.g. balk if I pass an argument that could be a non-string into a function that tries to call `.startswith` on it. (Ah, but maybe it could theoretically be a string or None, and I only know it isn't None for reasons the type system can't see, and now I hate the type system.)
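The `.startswith` case in the parenthetical above is what null-aware checkers handle with explicit narrowing. A small Python sketch using standard `typing` annotations follows; the function name `is_comment` is hypothetical, and the annotations do nothing at runtime unless an external checker such as mypy is run over the file.

```python
from typing import Optional

def is_comment(line: Optional[str]) -> bool:
    # Calling line.startswith("#") right here is what a null-aware checker
    # would reject, because `line` may still be None at this point.
    if line is None:
        return False
    # After the None test, a checker can narrow `line` to plain str.
    return line.startswith("#")

print(is_comment("# hello"), is_comment("x = 1"), is_comment(None))
```

The frustration described above shows up when the `None` case is ruled out by something the checker cannot see; the usual escape hatch is an explicit test or assertion like the one in this sketch.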
I didn't say it was a trivial problem. I just don't feel excited by the solution.
The default mode in Hack is partial: in partial mode, the code itself must be typed and must pass the typechecker, but it can call untyped code (that's in a separate compilation unit).
Another mode (you specify the mode per file/compilation unit) is "strict". In strict mode, you cannot call any untyped code (note that the standard library is typed with Hack).
(There is a bit more nuance here, but you can read that in the documentation.)
So the idea is to eventually migrate most of the code to strict, but code that relies on legacy can remain partial and you can write new code without waiting for a re-write to finish.
"Shapes" are also a neat feature specifically for parts where static typing can be frustrating for dealing with HTTP requests specifically: http://docs.hhvm.com/manual/en/hack.shapes.php
(FWIW I don't see myself using Hack, but I'm not a web developer. I'd say the ML family languages are my favourite, but for what I do day-to-day it's not really an option.)
Do you know if other languages have static null checking i.e. your null annotation/propagation (apart from Haskell's Maybe union type etc)?
I'm intrigued, because it's such a good idea (especially when null's originator claimed it was a "billion dollar mistake"), though Java doesn't have it. I'm wondering if there's some subtle problem with it...?
Also, how do you make your vim typechecker fast enough? Usually, even syntax colouring is local to ensure adequate performance - and a typechecker with inference/propagation would be very non-local.
I'm stoked. And disappointed the "elvis" operator didn't make it in.
I hope some day Java breaks backwards compatibility and eliminates null entirely. Then again, that's already happened with the proliferation of other JVM languages. But that doesn't help me at my day job, where we have a large legacy code base... which would need to be ported to a backwards-incompatible version of non-null Java anyway. Hm.
Well, with Java 8 I can at least start to grow null-safe code within our codebase.
>The fast development cycle -- edit php script, refresh -- is something amazing that you don't get in a lot of statically typed languages
You seriously think that? That's how we do haskell web development. Both yesod and snap do this out of the box. That's how every java developer I know works.
I have a little bit by way of Haskell chops, and I'll venture that the performance of the Hack typechecker is a very big deal, and it is in a different breed than the turnaround time you get from snap or yesod (or Java).
Edit: I misread what you wrote. I thought you were saying there was something fundamentally different about the type checker, but it's more that the reload experience is different than what you get with Yesod.
"Type checking is faster than other statically typed languages" is quite different from "other statically typed languages don't offer this workflow", which is what was claimed.
How do you do it in java? Compile (takes time) -> hotswap (takes time), or can't hotswap since changes to signature, will need to restart server (takes lot of time).
Yeah, that's the one we're using. Still takes a couple of seconds, though. And as I said, big changes can't be reloaded, so the whole server will have to be restarted.
100% of our web front end developers use Hack now. This has been an organic process of growth over the past year, by which I mean our engineers are using it because they like it and see value in it, not because there's someone standing over them with a big stick :-)
The biggest learning step for our engineering teams was to treat type errors from Hack as actual logic errors. We have a collection of "linters" that provide advice on code style and other nice-to-have factors. Some people (quite reasonably) initially thought of Hack errors as lint-like stuff that it was safe to ignore, when in fact they indicate real logical inconsistencies in code.
> Some people (quite reasonably) initially thought of Hack errors as lint-like stuff that it was safe to ignore, when in fact they indicate real logical inconsistencies in code.
Interesting, thanks!
One followup: what has it been like from an ops perspective? Similar to PHP, or is there a better frame of reference?
Sure. I guess I'm wondering how it compares to running plain old Apache/PHP in a production environment. Or, is it more like a Django/Rails stack? Does it use the same memory footprint as PHP, etc?
Facebook hasn't used plain old Apache/PHP in production for several years (HipHop for PHP was announced in Feb 2010), so it is hard to compare.
HHVM is its own web server (although it supports FastCGI for easier use with existing infrastructure like nginx), and that's how we use it in production. It's hard to compare memory usage, except at scale, where it benefits from not having a whole bunch of interpreters (in different processes) running at the same time and some other benefits by using more appropriate types to store values through type inference.
On the scale of a single request (doing stuff from the command line), most benchmarks I've seen are about half the memory footprint. Obviously that's going to vary program to program ($data = file_get_contents("some2gbfile.txt"); is going to take 2GB, regardless), but for "normal usage" 1/2 looks fairly common.
On a webserver, that goes further since HHVM is single-process/multi-thread, whereas PHP (in its typical setup) is multi-process/single-thread. This cuts HHVM's memory overhead much further.
Yes, PHP can run multi-threaded, but it still has a number of instability issues in that mode.
Yes and no.
Yes, because TypeScript is bringing a type-system to a dynamically typed language and so did Hack.
No: because Hack is bringing some additional language features affecting the runtime. Modest changes for now, but we intend to carry on in that direction.
The key difference is affecting the runtime. All those typescript features can be compiled down to regular JS and typescript-generated javascript can be run in a browser without extending the JS engine.
I don't know what you mean by that... stuff like classes were already in the harmony proposal phase before Typescript implemented them. Stuff like type hints won't be in JS ever.
At least to date, Microsoft stresses the "TypeScript is just JavaScript." New language features are added to ES6 and then wrapped with types in TypeScript.
A question about the type inference mechanism: in PHP, while it is possible to define object interfaces which classes may be defined against, by the dynamic nature of the language, functions don't necessarily need that interface specification to accept an object conforming to it, explicitly or implicitly. OCaml provides something "similar" with its object system, but much more powerful, with static inference of an object (super)type from its usage. Will Hack be able to infer an object interface from its usage in a function as well?
There seem to be 2 questions here:
1) Is Hack type inference total? Answer no: you must annotate parameters and return types.
It would be pretty much impossible to implement total type-inference without losing separate compilation in Hack.
PHP projects are not organized around a module system, which means that you have "spaghetti" dependencies, and even better, cyclic dependencies all over the place.
So trying to implement total type-inference would be a bad idea, you would not be able to separate the code in independent entities and the checker would not scale.
2) Does Hack support structural sub-typing? Answer: No, but not for obvious reasons.
Fun fact, the first version of the type-checker was implementing structural sub-typing. And it was not scaling, for subtle reasons.
Hack allows covariant return types, so if we implemented structural sub-typing we would have to "dive" into the return types of each method to see if they are compatible. But in turn, these objects could have covariant return types etc ... The process of checking that was too inefficient. Caching is a bad idea (or at least a non trivial idea to implement), because of constraints and type-variables.
Since disallowing covariant return types was not an option (it was crucial to make a lot of code work), we had to kill structural sub-typing.
I hope this answers your question. As a big OCaml fan myself, I like the features you just mentioned (Well, Hack is written in OCaml), but they really didn't seem to be a good fit due to the nature of the language and the kind of checking speed we were shooting for.
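A rough Python sketch of why the structural check in point 2 gets expensive: with covariant return types, deciding whether one object type fits another means recursing into every method's return type. The dictionary encoding of types and the names `TYPES` and `is_structural_subtype` are invented for illustration. This is not the Hack checker's implementation, and it ignores parameters, generics, and constraints.

```python
# Each object type maps method name -> return type name.
# Primitive types ("int", "void") have no entry. Hypothetical encoding.
TYPES = {
    "Node": {"next": "Node", "value": "int"},
    "BigNode": {"next": "BigNode", "value": "int", "extra": "int"},
    "Leaf": {"value": "int"},
}

def is_structural_subtype(sub, sup, assumed=None):
    """True if `sub` has every method of `sup` with a covariant return type."""
    if sub == sup:
        return True
    if sub not in TYPES or sup not in TYPES:
        return False  # two distinct primitives, or primitive vs object
    assumed = set() if assumed is None else assumed
    if (sub, sup) in assumed:
        return True  # pair is already being checked further up the stack
    assumed.add((sub, sup))
    for method, sup_ret in TYPES[sup].items():
        sub_ret = TYPES[sub].get(method)
        # The "dive": every return type needs its own structural check.
        if sub_ret is None or not is_structural_subtype(sub_ret, sup_ret, assumed):
            return False
    return True

print(is_structural_subtype("BigNode", "Node"))
print(is_structural_subtype("Leaf", "Node"))
```

Even in this toy version the work grows with the depth of the return-type graph, and the `assumed` set is the only thing preventing an infinite loop on recursive types. Nominal checking avoids the dive entirely by comparing declared names.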
Thanks! I was mostly thinking about point 2, and I understand your motivations in going in a different direction after trying it. Very good and enlightening answer!
We don't do any type inference across function boundaries, so we largely dodge the issue that I think you are getting at. (Please elaborate if I misunderstand!) We rely on interface and class definitions in order to know what methods are available, and even though everything is resolved at runtime, so you can call any method that happens to exist at that time ("duck typing"), we enforce statically that methods do exist where we can. So for example, although the following code will work at runtime since the method `g` does exist, our static typechecker will reject it since `g` does not exist on type `I`.
<?hh // strict
interface I { public function f(): void; }
class C implements I {
  public function f(): void {}
  public function g(): void {}
}
function f(I $i): void {
  $i->g(); // rejected statically: g() is not declared on I
}
f(new C());
Thank you! You brought an interesting point with the example you gave, it is one of the things I was thinking about. Thank you for your work, and for making it available!
Pretty star-studded cast you have on the core team, there.
What's your motivation, aside from modernizing Facebook's code base? What language niche does Hack serve which is not served by other languages? Why Facebook at all (aside from the rarity of finding a company to pay you to write a compiler :))?
The sweet spot that Hack hits is that it combines gradual typing (an idea that hasn't yet seen much real-world adoption) with an incredibly fast typechecker.
This lets you choose the pace and extent to which you want to adopt the safety of static typing, while preserving your dynamically typed code -- and without sacrificing the rapid turnaround of PHP.
How does your approach compare to Typed Racket and Typed Clojure? Could they conceivably achieve the same performance or is there a fundamental difference?
Does Typed Racket have DrRacket support for instant feedback on errors? (It didn't 4+ years ago when I used Racket.)
Typed Racket also includes more types, like ": Integer [more precisely: Negative-Fixnum]" (from docs), as well as polymorphic data structures and higher-order functions. I don't know if this is a "fundamental difference" but it might mean Facebook's type checker can optimize in certain ways that Racket's cannot.
Yes, Typed Racket does have that -- DrRacket continuously expands in the background, so Typed Racket gets it for free by integrating into the macro system.
The Typed Racket type checker is much slower than Hack, though.
Hack does seem to have polymorphic data structures and higher-order functions.
If you look at the original C syntax for function types, it is also a bit weird: part of the type (the return type) is placed before, and part (the parameters) after.
That syntax in hack seems more "functional" to me. That said, I must admit that I kind of like the C syntax, because of its oddity.
I too programmed for a long time in AS3/Haxe, and while I prefer variable:type as I think it's more readable, most static languages (java, C#) do it the other way.
Thank you for your work on this and for open sourcing it. It looks like it can make large code bases in PHP a great deal more manageable.
Do you expect Hack to be stable without large breaking changes going forward?
The documentation doesn't say much about the scheduling for asynchronous tasks. Can async functions be used to batch requests to e.g. caches and databases? Can Awaitable be used to interface with code that expects chainable Future or Promise-style interfaces?
Can HHVM/Hack use standard PHP extensions, or how much work is it to port an extension?
Is there a Hack plugin for IntelliJ?
Does the Hack project have a mailing list or forum?
Our internal plumbing is different, so the extension APIs aren't the same, thus PHP extensions can't normally be used with HHVM. Paul has a source-compatibility transformer which works for simple extensions (no fancy stuff allowed), but we recommend anyone with an extension consider porting the code.
If your extension is one of those "This was written in PHP but we wanted it to be faster so we turned it into C" types, you should consider going back to PHP for it. In practice, we find that HHVM can JIT PHP code into running about as fast as (sometimes faster than) C/C++ extension code.
If you do need to dip into C/C++ (because you're calling an external library), the API we've designed is a LOT easier to work with than PHP's. And I say that with the authority of having written THE book on writing PHP extensions: http://amazon.com/dp/067232704X .
With async functions I'm not sure if the documentation is lacking or I'm just slightly thick; I believe the latter.
(I'm not a php developer so excuse my ignorance)
It seems you only have one method, 'await', to check whether the async function has completed its job (which itself, if I'm understanding correctly, is a blocking operation in whichever thread/worker it's called from).
So is there a way to run something more similar to
Because as I see it now, what ever I run asynchronously I.E.:
gen_foo(get_user_data); //async
$x = do_something_for_a_while(); //do something while gen foo processes
do_new_thing($x, await gen_foo()); //do something with rendezvous result
I'm forced to time out my async calls so that they'll rendezvous in the same place won't I?
Again if I'm completely off the mark please let me know.
async/await allows cooperative multitasking on a single thread: stay tuned for more...
The way it works is quite different from your example:
async function gen_foo(): Awaitable<string> {
  echo "until we get to an await, eager execution\n";
  // ...
  await gen_bar();
  // this code is only executed after await
  return 'result';
}
$x = gen_foo(); // x is a handle, suspended at the point of its first 'await'
await $x; // ... and now it's resumed
The benefit comes from being able to batch these async functions together:
list($x, $y, $z) = await genva(
  gen_foo(),
  gen_bar(),
  gen_baz()
); // genva creates a wait handle for awaiting its args
// ... and when we get here $x, $y, and $z are all assigned
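For readers more at home in Python, the batching pattern looks much the same with `asyncio`, where `asyncio.gather` plays roughly the role of `genva`. This is an analogy, not a description of HHVM's scheduler; one difference is that a Python coroutine does not start running eagerly when it is called. The `fetch` function and its delays are hypothetical stand-ins for cache or database requests.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for an I/O request; sleeping yields control to the event loop.
    await asyncio.sleep(delay)
    return f"{name}-result"

async def main() -> list:
    # All three awaitables make progress concurrently on a single thread;
    # gather returns their results in argument order, not completion order.
    x, y, z = await asyncio.gather(
        fetch("foo", 0.03),
        fetch("bar", 0.01),
        fetch("baz", 0.02),
    )
    return [x, y, z]

results = asyncio.run(main())
print(results)
```

The total wait is roughly that of the slowest request rather than the sum of all three, which is the point of batching.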
As the author of pretty much THE book on writing PHP Extensions ( http://amazon.com/dp/067232704X ), I can say with a fair bit of authority that writing HHVM extensions is SO SO SO much easier.
If you use partial mode, which doesn't enforce that every function is fully typed, then it's trivial to break the type system by just using an untyped function. In order to ease conversion, we just assume the programmer knows what they are doing with untyped functions and let anything pass.
If all of your code is in strict mode, then we believe the type system to be sound. We haven't done any formal proof of this of course, and there have been plenty of bugs in the past. But that's the goal.
Some types are enforced at runtime -- just like PHP5 enforces class type annotations on parameters. However, at least for now, the runtime doesn't do a lot of the clever things with the type system that the static typechecker does. It's much more conservative with using the type information, and doesn't do things like check generics at all. (At least right now! We probably will change this in the future.) This means that we can play a little more fast and loose with the type system in the static parts; we want it to be sound, but it doesn't have to be right now; it's not going to cause a JIT crash or anything.
What was the rationale for being inconsistent in the type notation? Why not be consistent like most languages?
For example, in Java you'd write: int add(int x, int y) { ... }
And in Go you'd write: func add(x int, y int) int { ... }
But it looks like in Hack you'd write: function add(int $x, int $y): int { ... }
Both Java and Go are consistent, either the type comes before or after the name, no matter which context you're dealing with, be it return type or argument type. Hack seems to mix the two styles, which seems like it would make code harder to read as things get more complex, like when functions accept other functions as arguments.
Am I reading the documentation right? And, if so, can you shed any light on the decision making process that went into this decision?
You are reading the documentation right. I wasn't here when the syntax was codified, but my understanding is the following. For parameter types our hands were tied, since PHP actually already has this syntax for object types (unless we wanted to needlessly break compatibility here). We just extended it with primitive types, generics, etc. For return types, we wanted to preserve the greppability of "function add" in large codebases, both for the actual "grep" tool itself as well as for any other tools like ctags that look for strings like this.
> If all of your code is in strict mode, then we believe the type system to be sound. We haven't done any formal proof of this of course, and there have been plenty of bugs in the past. But that's the goal.
When I look at this from the docs, it seems unsound:
"Hack treats traits as a stand-alone entity during the type checking process. In other words, it ensures type consistency within the trait (i.e., as a black box, so to speak), but does not "copy and paste" the code into all of the classes that use the trait and check for type consistency there. The reason this is done comes down to performance."
Is there something I'm missing that does make this sound?
I'm not sure why you think this is unsound. We check traits in isolation -- we ensure that methods you call are either defined in the trait or declared abstract (and so must be defined in the including class). We also added "trait requirements" to the language, so you can say "the including class must implement this interface" ("require implements IFoo") and we'll know that in the type system too. This means that we can ensure traits are sound even in isolation, and that including classes are sound when they include the traits.
Feel free to play around with the type system in the interactive code editor on http://hacklang.org/.
From playing with the editor, it looks like it is sound.
What I thought the quoted passage was saying was that consistency between a class and a trait used by the class was not checked. That's clearly not the case, though. I don't understand what the performance point is, though, since I don't think the copy-and-paste style would produce different answers.
Finally, the online editor doesn't seem to honor // strict -- it doesn't produce an error when some methods aren't annotated.
Not Bryan, but I am an engineer who works on Hack :)
We don't (statically) protect against type errors when you cross from untyped into typed code. If you call an untyped function, we assume the programmer knows what they are doing -- just like PHP does. You might get a runtime exception if you are calling a method on null, for example.
This is actually a pretty important part of the conversion process. You don't have to convert all your code all at once. Typed code can be verified statically; untyped code is assumed to work just as before.
Parameter and return type annotations are enforced at runtime by HHVM (just like you can add a class type annotations to a parameter in PHP). This will protect against some inconsistencies -- again, just as if you weren't using Hack.
At runtime, we check and enforce parameter types at function entry and return types at function exit. The extra type information can also enable the JIT to emit more efficient code in some cases, and work is ongoing to make that even more efficient. But having these types is independent of being in Hack -- you can have fully untyped Hack code, though I wouldn't advise it, as you are leaving one of the most powerful features of the language on the table.
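As a loose illustration of "check parameter types at function entry and return types at function exit", here is a Python decorator that does the same for simple class annotations. The name `enforce_types` is made up, it only handles plain classes (no generics or optionals), and it says nothing about how HHVM implements its checks.

```python
import functools
import inspect

def _checkable(annotation):
    # Only plain classes are checked; missing annotations are skipped.
    return annotation is not inspect.Parameter.empty and isinstance(annotation, type)

def enforce_types(func):
    """Check annotated parameters on entry and the return value on exit."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if _checkable(expected) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        result = func(*args, **kwargs)
        expected = sig.return_annotation
        if _checkable(expected) and not isinstance(result, expected):
            raise TypeError(f"return value must be {expected.__name__}")
        return result

    return wrapper

@enforce_types
def add(x: int, y: int) -> int:
    return x + y

def untyped_caller(a, b):
    # Untyped code can pass anything; a mistake only surfaces at the call.
    return add(a, b)

print(untyped_caller(1, 2))
```

This mirrors the trade-off described above: the typed function is protected at its boundary, but an untyped caller finds out about a mismatch only when the call actually happens.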
What do you do with higher-order functions (which PHP has I believe)?
If I write a typed map : (a -> b) -> [a] -> [b]
and then it's called with an untyped function as input, but it returns something that's not a b. Do you place some kind of barrier around it so that that's stopped immediately?
Does this presentation match the current version of Hack fairly well? It has a nice explanation of the type system that is more concise than the one on the web site.
The idea of bolting a static checker onto a dynamic language and using an unresolved top type to make it work is cool. The presentation refers to your system as SoA gradual typing. Are there any papers or presentations which explain that approach and how it works in more detail? Particularly in how it might differ from gradual typing?
I'm a little confused as to the need for Hack unless you have a code base in PHP and need to ship tighter code (which is a problem Facebook has and my team probably has as well).
Adding lambdas makes PHP more Ruby-like, and generics and type checking are straight out of Java. I'm still unconvinced that this makes programming websites more efficient or bug-free than existing languages. Can you please elaborate on that?
If you're looking for a statically-typed version of Python/Ruby, I'd say nimrod gets fairly close. It's also fantastically fast and compiles to portable ANSI C.
Nimrod is not a statically typed version of Python in the sense Hack is a statically typed version of PHP. Not at all. Hack is gradually typed, Nimrod is not. Big difference.
I thought it was clear I wasn't trying to draw an analogy between PHP/Hack and Python/Nimrod, but rather pointing out that Nimrod is a very nice language for someone who enjoys Python but wants static typing.
Have you guys looked at the new Truffle/Graal back end in Java 8? It has seen some impressive numbers in other languages, did you explore this as a possibility?
That's coming in PHP 5.7 anyhow. It's been implemented and works, but Nikita Popov hasn't had the time to change the function definitions of the standard library to work with it, hence PHP 5.6 won't have it, sadly.
Wow, I haven't kept up with things, but gosh that sounds pretty great. Type checking + named params is a fairly difficult combination of great language features to find in the same language.
Static types are more than syntactic sugar, they allow you to make additional guarantees about the behavior of the code, like having unit tests without actually writing them.
I program in php every day and miss static typing (as opt-in mechanism). Sadly i'm on windows and use oracle, so hhvm is not applicable.
We are just beginning to use the static type information that Hack provides to yield performance optimizations, and that's definitely on the cards as something we'd like to push further.
So argue. Much armchair quarterback. Very flame. Wow.
It's entirely appropriate and reasonable to say "I don't understand why Facebook would do this; please explain it to me." But it reflects a supreme lack of humility to say "I don't understand why Facebook would do this; they must be idiots, and when they, armed with inside information I don't even know I don't have, come to defend themselves, I will angrily try to convince them of the error of their ways."
If you think what Facebook has done here is stupid, why is that something to get upset over? In fact, why is it even something worth arguing about? You're not going to convince anyone who matters, and anyway you have nothing to gain by doing so: Instead, just short Facebook stock and be smug in the knowledge that, when Facebook announces next year that they're abandoning PHP in favor of a ménage à trois of OCaml, F#, and Clojure, you'll be able to say "I told you so".
Facebook is always pushing PHP to new places. Would it be too broad to say PHP is the worst thing that ever happened to Facebook's engineering but Facebook's engineering is the best thing that ever happened to PHP?
As much as I'm not a PHP fan these days, you are completely correct. In 2003, you had CFM (which MySpace actually used at first), Java (and probably some overly complex framework like Jakarta Struts to go with it), CGI, or the new ASP.net. For a college student to do a quick HTML doc with some code interspersed here and there, PHP sounds like the ONLY viable option that didn't have you spending money on licensing, or setting up some huge infrastructure.
I used PHP/MySQL too as it was just dead simple for most things.
Any. Really, any. Facebook is throwing a lot of engineering effort into fixing things that now matter to them, but didn't when they started (and were small).
They could have picked Perl CGIs and it'd still have happened (perhaps Perl 6 would be a success these days).
I don't think PHP has played any role in bringing FB to the size it is today (programming language does not translate to number of users signing up). So yes, any mainstream language available in 2003 would do it.
Perl was a possibility, but CGI is/was a pain to use. To get started you had to understand things like headers. You frequently had scripts refusing to run for whitespace issues relating to CRLF vs LF. You'd need to come up with your own templating language... It was just much more difficult.
Also Perl libraries typically require superuser rights to install. That's a big deal if you're a poor student using shared hosting on your fun hacking projects.
Python really wasn't an option. mod_python by default required you to restart Apache whenever a Python file changed. It was designed to be configured by sysadmins on dedicated servers... if you wanted something you could hack together and put on a shared host, it just wasn't an option.
Also the hot Python framework in 2003 was Zope. No one uses Zope anymore. For good reason.
Ruby was much smaller, and this was pre-rails so it didn't have any significant web framework. I'm not even sure if there was a production ready way of hosting ruby scripts.
I believe Python might have worked. It's grown well as a web development language since then by itself. It may have forced several rewrites before WSGI came about. It is a relatively easy language to learn, to hire for, and has few massive warts. (Ruby is roughly in this category too, in my opinion.)
I believe Perl probably would not have worked. It hasn't grown much as a web development language since then (although it has some great web frameworks). The language itself has languished in both development and community growth, is fairly esoteric and certainly harder to learn than Python and PHP, and would have been harder to hire for. It seems unlikely that pre-2010 Facebook would have meaningfully changed the trajectory of the language (i.e., made Perl 6 a success).
I don't think Java, C++ (or C), Cold Fusion, ASP, Haskell, Ocaml, and many other languages would have worked, for a bunch of reasons. Harder to hire for, harder to iterate with, more costly in terms of infrastructure, harder to teach/learn, and so forth.
PHP isn't a totally insignificant part of Facebook being as successful as it has been. A good-enough language choice was necessary, at least, and PHP it seems was a good-enough choice.
The point you "Anything but PHP" guys seem to conveniently ignore (or perhaps, not comprehend), is that at Facebook scale, you are going to have issues with many of these things regardless of what language you chose. They have specifically made this point many times. You and others point to some hand-wavy magical language which somehow doesn't have any issues when serving hundreds of millions of users. That doesn't exist.
Do you think perl would have been much better? Either way, the point that he was making was that there were pretty slim pickings back then. Certainly the dominant languages/frameworks of WD these days (python, ruby, node) didn't exist. Or by another token, the only dominant one that was also strong back then is PHP, the one they chose.
Python was over ten years old in 2003. In Python web development there had already been CGIs, mod_python and two major releases of Zope, with hope of Zope 3 on the horizon.
I'm aware that they existed, but clearly they were very immature compared to how they are now. PHP on the other hand was in its heyday, and (clearly) fulfilled all of the needs of Zuckerberg and the other developers. A suggestion at the time that they use a language like Ruby or Python would probably never have been made, and for good reason, and it almost certainly wouldn't have been adopted. It's just not realistic to suggest that early FB had this vast array of choices for which language to use.
I got my first job in 2003 and it involved PHP development. I remember well the state of affairs then. Yes, PHP was the most pragmatic option for getting things out the door in those days. Beginners getting started today don't know how lucky they are in terms of tools, languages, libraries, communities and documentation available.
Engineering decisions made in 2003 need not dictate the future of the company more than a decade later. Keep the systems that are written in PHP around for their natural life. Write new systems in whatever is more sensible at the time. Lazy-load your way out of shitsville.
Of course this is vastly complicated if they've been neglecting modularity and SOA for the past decade and most of Facebook is still a monolith...
Pretty much only the front-end code is written in PHP (or Hack, I guess).
Services (things like the feed, the typeahead, graph search, messages, &c.) are written in C++ and other languages - ie, the things it is sensible to write them in...
Do you know if Facebook is still trying to operate out of a single mercurial repository? Or is that only for the front-end code with services having their own repositories?
About the 'new places' - I wonder why they're first abandoning PHP the project and now also PHP the language.
I don't see HHVM that much as a 'different runtime' like Rubinius/JRuby/Topaz or Jython/Unladen Swallow/Pypy - it's more a fork in my book, and Hack adds a lot to that by also greatly changing the language.
This is not a criticism or even positive/negative - this is my point of view.
However you see it, the HHVM team sees it as a runtime. They make mention of this even in this post. They want to fully support PHP. They aren't abandoning anything. You don't try to build on top of what you've abandoned.
My favourite comments are the ones where some random geeks tell the guys who have built a multi-billion dollar business with hundreds of millions of daily active users how they're doing it wrong.
Congrats to Facebook on taking PHP forward. It powers a vast amount of the web and it's great to see that it's getting some engineering love!
>> where some random geeks tell the guys who have built a multi-billion dollar business with hundreds of millions of daily active users how they're doing it wrong.
Reminds me of the words of Steve Jobs:
"By the way, what have you done that’s so great? Do you create anything, or just criticize others' work and belittle their motivations?"
As a frequent user of R, this is one of my biggest pet peeves. There are a practically infinite number of unique, interesting, pronounceable, google-able names for a new programming language, yet we continue to get stuck with this shit.
I'm not sure what to think of the language (other than the fact that it looks very similar to TypeScript); but, as for your comment, suffix the word lang and you're golden: golang, dartlang, hacklang.
Looks like Facebook might have inadvertently turned PHP trendy again.
Expect "Why I migrated from Go to Hack" articles soon enough.
Either way, the name is very fitting. I have no use for this, but good for Facebook that they've managed to (at least to some extent) evade some of the many PHP pitfalls.
I started web programming using PHP back in 2000, but quickly moved to Python a couple of years later and never looked back. That is, until I joined FB this year as a data engineer. Programming in PHP with Hack/XHP is awesome.
Historically XHP was developed before Hack. It's available as an add-on for PHP but never became a formal part of the PHP language. When Hack was being designed it needed to work well with XHP (as XHP is used a fair amount in Facebook's code base), so over time it made more sense to start thinking of XHP as one of the language features that Hack offers that is not available in stock PHP.
Is Facebook's code so monolithic that they can't deploy new, decoupled services written in new languages? Twitter did this with Ruby, Java, and Scala. Didn't Facebook create Thrift RPC for exactly this purpose?
Facebook already has several decoupled services written in other languages. The PHP codebase is mostly the front-end code which, beyond some database work, dispatches heavy lifting to back-end services written in C++, Java, and other languages.
As you say, Thrift connects this all. You can look into our C++ core library, Folly, and other C++, C, Python, and Java applications and libraries like Rocksdb, Presto, watchman, Buck, flint, scribe, and so forth on github.
I haven't looked this over too much but I'm curious as to why they did this:
<?hh
class MyClass {
  const int MyConst = 0;
  private string $x = '';
  public function increment(int $x): int {
    $y = $x + 1;
    return $y;
  }
}
instead of this:
<?hh
class MyClass {
  const MyConst: int = 0;
  private $x: string = '';
  public function increment(int $x): int {
    $y = $x + 1;
    return $y;
  }
}
The first seems inconsistent to me. Especially coming from AS3/Haxe where the function return value is indicated in the same manner.
For property and constant declarations, we chose to be consistent with the parameter typehint syntax (int $x) instead of the function return type annotation (float).
A nice effect of this decision is that this allows us to change:
class C {
  private int $x = 20;
  public function __construct(int $x,) {
    $this->x = $x;
  }
}
into the shorter and cleaner:
class C {
  public function __construct(
    private int $x = 20,
  ) {}
}
Well, it's been pooh-poohed for obvious reasons (like JavaScript), but those are the languages that are getting all the love in the end, so maybe I should stop programming in a language that I find beautiful (Ruby) and go with the flow.
On Linux, we use the inotify subsystem to be informed every time you save a source file or switch branches. The hh_server process can then update its data structures immediately, without being explicitly asked.
Because this is tremendously important to the usability of the language.
One of the reasons that PHP has been such a success is that you can simply save a source file and reload a web page to see what's going on.
The server-based typechecker that runs instantaneously is what makes it possible for Hack to replicate this experience: you save a file and reload a web page, but you have the safety net of a typechecker that told you about your type errors as soon as you saved the file.
I can't overstate how important this instant feedback is: it's really the thing that distinguishes Hack from working in a more traditional compile-based static language.
Was it debated whether this should be a tooling feature? I'm thinking of an IDE plugin that parses and type-checks in the background rather than a dedicated service that watches text files on disk.
Are there any plans for a Hack -> PHP transpiler? Or is that impossible? It would be nice to develop in Hack and transpile back to vanilla PHP where we still have to use the PHP.net runtime (shared hosts, Google App Engine, Engine Yard, etc.).
Hack is more than just a type system. At the type level, it would be trivial to erase the types and run the code on vanilla PHP. Dealing with everything else is much harder (xhp, containers, lambda, etc.), at which point it just makes more sense to use HHVM.
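As a rough illustration of the "trivial to erase the types" part (the function name is invented, and this is a hand-written sketch, not the output of any real tool): the annotated function below differs from its plain-PHP counterpart only in the annotations, so stripping them is mechanical. The erased form is shown as a comment so the snippet stays a single Hack file.

```hack
<?hh
// Hack source: parameter and return type annotations.
function add_one(int $x): int {
  return $x + 1;
}

// What type erasure alone would leave behind (plain PHP):
//
//   function add_one($x) {
//     return $x + 1;
//   }
//
// Features such as XHP, the collection classes, or ==> lambdas have no
// annotation to strip; they would need actual rewriting, which is the
// harder part described above.
```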
As said on reddit, that is really a poor choice of name. Good luck looking for something related to this language on Google by putting "hack" in the search box...
It is even worse than "Go"
I don't understand why there is so much fuss about PHP in 2014. Hasn't most serious web development moved beyond scripting languages? Java and C# have been mature languages for statically-typed web development for years and are not difficult to achieve competence in. And for more fluid yet terse server-side languages you could always go with Ruby, or Smalltalk, or Racket, or F#, and their associated frameworks. Defending PHP in 2014 is almost as surreal as defending Classic ASP in 2014.
> Thus, Hack was born. We believe that it offers the best of both dynamically typed and statically typed languages, and that it will be valuable to projects of all sizes.
In which way does it offer the benefits of dynamic typing? The entire point seems to be to abandon dynamic typing, which is fine, but not what that sentence says.
I'm guessing, for example, you can't really do meta-programming with Hack in the way you can with dynamic languages, is that correct?
It depends on what you mean by "with Hack". You cannot do this in Hack code -- we disallow the dynamic features of the language you need to do this; they are impossible to statically typecheck and verify. However, Hack code interoperates seamlessly with PHP code, so there's nothing stopping you from having some PHP files in your codebase that do anything that standard PHP can do.
So.. my question remains unanswered... if you cannot do meta-programming in hack, how can you say it "offers the best of both dynamically typed and statically typed languages"?
It's more accurate to say that the point is to let you use dynamic typing where that makes sense, and static typing in the many cases where it helps. Hence "best of both".
The benefits of dynamic typing come from not being forced to convert all your code to the statically typed version. You can still keep some parts dynamically typed if you want.
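A small sketch of what that mix can look like, assuming the default (non-strict) mode of Hack as described around this release; the function names are hypothetical. Annotated code gets checked, unannotated code is left alone, and the two can call each other.

```hack
<?hh
// Annotated: the typechecker verifies that callers pass an int
// and that an int comes back.
function add_one(int $x): int {
  return $x + 1;
}

// Unannotated: accepted as-is and treated dynamically, the way
// existing PHP code would be. Nothing is checked about $x here.
function legacy_add_one($x) {
  return $x + 1;
}

// Typed code may still call the untyped function; the checker simply
// has less to say about the result until someone adds annotations.
function caller(): int {
  return add_one(legacy_add_one(1));
}
```

Annotations can then be added one function at a time, which is the gradual part.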
I wish these extensions came by way of the PHP core rather than a language that is superimposed on PHP. However, this is awesome.
Is this a layer superimposed on PHP that falls back to the default interpreter for unimplemented features, or is this a fresh implementation? I suppose my question is, how reliable is this? Are the core PHP bugs going to manifest here? If a bug gets fixed in core PHP, will Hack lag behind?
Hack runs on the HHVM engine. HHVM is fairly stable and is used by Facebook to serve over a billion web requests per day. In the past year or so developers outside of Facebook have started using HHVM to run other PHP codebases, and at present 20 of the top PHP frameworks are able to run correctly on HHVM (hhvm.com/frameworks), with 9 of the frameworks' test suites fully passing.
If you encounter behavioral differences or bugs when running HHVM, you can report them at our GitHub site (hhvm.com/repo) and we will help get them resolved.
> I wish these extensions came by way of the PHP core
Well, where is PHP core then? To my knowledge, while PHP 5 was a huge step forward, PHP core did little to fix PHP's stupid design flaws, for the sake of backward compatibility.
We are in a new era. Libraries are more and more decoupled, so they are easy to write or rewrite. Backward compatibility shouldn't be an issue when libraries are well versioned and tested across different versions.
And WordPress folks don't care about PHP.next. Why bother with them?
>I wish these extensions came by way of the PHP core rather than a language that is superimposed on PHP
Someone has to do it, and the PHP developers have proven time and again that they aren't going to. At some point the only options are to leave PHP altogether, or make your own PHP that sucks less.
Odd, just two days ago I was remembering a college class where we had to create a new programming language, and I named mine "Hack". The cover of the paper was a copy of K&R only "C" was replaced by "Hack" and of course I was the author.
I'm really curious to know if any new project will start using this language. I really wouldn't understand why they would.
I think that would be an argument for why some consider this whole project a waste of time in the long run (although probably something really great from a purely CS point of view).
Edit: I just forgot that I still haven't found any server-side language I'm satisfied with. I'm still waiting for a type-annotated variant of Python to catch on... So maybe Hack would be a good choice after all. It's such a pity they based this work on PHP rather than on some cleaner language.
It was very important to us to have a gradual and smooth upgrade path, such that you can start using Hack by just tweaking the "<?php" header at the top of a source file.
Obviously switching to a syntactically incompatible programming language would be at odds with that goal.
Haxe is much more structured in how it's organized. It requires an explicit class for code, it has very few methods or variables in the global namespace, it has block-level vs. function-level scoping, etc. Those are pretty typical language features for languages designed for large code bases, but they're probably unfamiliar territory for some php developers.
It's cool to see the same idea implemented in Hack. I don't know if anyone at FB has even heard of Obvious, but it's cool that they had the same idea in their language.
This is great and exactly what I need in my PHP. Glad to see types making sort of a comeback (see also: TypeScript), as I think they're, in one way or another, a necessity in large-scale applications.
One note though - the success of this depends on the success of HHVM. Hopefully FB guys understand that and will push even more to make HHVM the best platform to have for running PHP on.
Everyone has their own <opinion> on whether Facebook should have used another language rather than PHP. On the other side there is the fact that Facebook managed to build one of the most visited websites on top of PHP, period. This is a <fact>, and we will never know what Facebook would be if it had been built on top of another language. Perhaps they would have had to shut down early just because they were not able to hire enough developers to keep pace with new stuff and scale.
At the same time, they haven't suffered the same amount of issues Twitter had to deal with (remember the fail whale?) with far fewer users and much less traffic, just because Twitter was using RoR, and that led them to move to something completely different with Scala and a different architecture. But still, Facebook has about 4x the users and much more data to deal with (more text, images, video and so on).
Nevertheless, let's remember that web applications are no longer single monoliths.
Firstly, Awesome!
I definitely need to dig into it more.
But as the documentation says primarily "HHVM can run both your PHP and Hack code, either separately or when both are part of one project." -- does that mean we can imagine a framework or something like that, where parts of it are in Hack and parts are in old PHP?
Yes, we are still very active users of MySQL. Most of the primary portions of facebook.com are still served from a backend MySQL system (with lots of caching and many other services involved). Some data is not stored in MySQL, but in other systems such as HBase (messages being a big one).
It looks promising, I highly dislike writing PHP and this seems like it might ease the pain, but they could've gotten rid of the damn dollar sign in front of variables, how ugly is this? "return ($y) ==> { return $y + 1; }"
Edit: My previous comment was referring to the second occurrence of "return" in your example (after the "==>" arrow). Just noticed I mistakenly dropped the first occurrence of "return"; the first occurrence of "return" in your example is needed.
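For anyone following along, the two forms being discussed look roughly like this. It is a sketch with made-up wrapper function names: the outer "return" hands the lambda back to the caller, and the inner "return" only exists in the block-bodied form.

```hack
<?hh
// Block body: braces, so the lambda needs its own explicit return.
function block_form(): (function(int): int) {
  return ($y) ==> { return $y + 1; };
}

// Expression body: no braces, the expression's value is returned implicitly.
function expression_form(): (function(int): int) {
  return ($y) ==> $y + 1;
}
```

Both should produce a function that maps 1 to 2; the second just drops the inner braces and "return".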
I realize it must be shocking to see functional programming language geeks solving real-world problems in a pragmatic way. :-)
Seriously, though, they wanted to move a large PHP codebase over to a better language, at the scale of a multi-billion-dollar tech company, in a reasonable amount of time. They accomplished this. What's the criticism here?