Hack: a new programming language for HHVM

reikonomusha · on March 20, 2014

I am baffled as to why you'd build your castle atop a crumbling foundation.

I have wondered why FB didn't use a proper language with proper typing to begin with. I mean, I "understand" logistically: they already had a giant codebase in PHP, migrating a codebase is expensive, and it's difficult to hire and train 1000s of hackers in e.g., OCaml. (They do have some OCaml people, but they are outliers. OCaml was my favorite thing to write there, though it didn't afford some of the same niceties and interactivity as the PHP code they had, only because the support was down by several orders of magnitude.)

But at the same time, layering FP with a home rolled static type checking server (??) is bug prone and is certainly yak shaving (which they have time and money to do). Now they've written (1) a compiler to C++, (2) a compiler to VM byte code, (3) a corresponding runtime for each, (4) extensions to PHP, (5) a type checker, and (6) an inference engine. That's a lot of stuff. And in the end, it's still PHP, which is duly disliked. (Though Facebookers don't seem to care. The prevalent attitude toward it is that "PHP, as it's coded here, is mostly like C++, and that's OK.")

Writing correct type checkers and inference engines is kind of difficult. They seemed to take the approach of just building onto it incrementally until it just seems to work. That approach led to many bugs in many cases that just simply aren't thought of when one is trying to build inference engines by hand, as opposed to according to theory. Type checking and inference is an area rife with theory and attached formal, mathematical semantics. Standard ML's standard is perhaps the most infamous; it's a collection of mathematical statements about the language. That way, the compiler is now almost an engine to prove your code is correct. I don't see how the same guarantee can be made with something that is just cobbled together.
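To make the "according to theory" point concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Hack's checker is actually built. Each branch of `type_of` corresponds to one typing rule for a toy language of literals, addition, and conditionals. The names `Lit`, `Add`, `If`, and `type_of` are hypothetical, invented for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: Union[int, bool]


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Lit, Add, If]


def type_of(e: Expr) -> str:
    """Return "int" or "bool" for a well-typed expression, else raise TypeError."""
    if isinstance(e, Lit):
        # Rule for literals. bool is a subclass of int in Python, so test it first.
        return "bool" if isinstance(e.value, bool) else "int"
    if isinstance(e, Add):
        # Rule for Add: both operands must be int, and the result is int.
        if type_of(e.left) == "int" and type_of(e.right) == "int":
            return "int"
        raise TypeError("Add expects int operands")
    if isinstance(e, If):
        # Rule for If: the condition is bool and both branches share one type.
        if type_of(e.cond) != "bool":
            raise TypeError("If condition must be bool")
        branch_type = type_of(e.then)
        if branch_type != type_of(e.other):
            raise TypeError("If branches must have the same type")
        return branch_type
    raise TypeError(f"unknown expression: {e!r}")
```

Because every case maps to a written rule, a missing or inconsistent case shows up as a gap in the rules rather than as a surprise in production, which is roughly the guarantee I'm arguing for.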

lbrandy · on March 20, 2014

> I am baffled as to why you'd build your castle atop a crumbling foundation.

Because perfect is not the enemy of the good? Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing? Unless you have some example of perfect, now dominant, technologies that have been created ex nihilo that I'm missing? I mean we (Facebook) are still using PHP and MySQL, improving both. And when we need to break things out, we head into C++, the queen mother of castles on broken foundations.

> But at the same time, layering FP with a home rolled static type checking server (??) is bug prone and is certainly yak shaving (which they have time and money to do). Now they've written (1) a compiler to C++, (2) a compiler to VM byte code, (3) a corresponding runtime for each, (4) extensions to PHP, (5) a type checker, and (6) an inference engine. That's a lot of stuff.

All languages, runtimes, and standard libraries (and databases, and source control, and on and on) are "broken" at sufficient scale. You're going to be spending time rebuilding things other people take for granted no matter who you are and what language and technology you are working in. The underlying assumption that a "proper language" gives you these things for free is completely false.

> Writing correct type checkers and inference engines is kind of difficult.

Just so we are clear, you ask why Facebook didn't rewrite 10,000 human-years of code into a mythical unnamed "proper" language, but you consider writing a type checker to be "difficult". I think you might have vastly inaccurate pictures of what is and isn't "difficult".

> That way, the compiler is now almost an engine to prove your code is correct. I don't see how the same guarantee can be made with something that is just cobbled together.

Computing history is littered with dead projects from people who believed that anything less than perfect is unworkable or non-valuable.

Silhouette · on March 20, 2014

> Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing?

I can't imagine where that sort of conclusion comes from. Building on a crumbling foundation seems to be just about the most proven, reliable way to ensure your software project won't survive more than a short time without needing serious effort just to maintain it and/or a big rewrite.

Much of the world still runs on C, a language that was created long before many of us here were born. Other languages have risen, peaked, and fallen into relative obscurity since then, yet C endures, because for all its faults, it is simple and predictable, the epitome of a sound foundation.

Large amounts of COBOL are still driving back office systems in large organisations. The cost of hiring people with the skills to maintain them is probably horrific, but those systems are still there, doing their jobs decades later.

You can still run applications from nearly 20 years ago on Windows today, in no small part because of Microsoft's persistent focus on compatibility and keeping the basics reliable over that time. Similar stories can be told in *nix world.

What major accomplishments in computing that have been built atop crumbling foundations can claim anything even close to the scale of success of these examples? You surely can't be talking about the move fast and break stuff philosophy that seems to drive everything at trendy software shops like Facebook and Google, or the kind of MVP/lean start-up hype we hear about ad nauseam on HN and the lasts-just-long-enough-to-exit web apps that result.

lbrandy · on March 20, 2014

PHP is a "crumbling foundation" in exactly the same way DOS, Windows, COBOL, and C and C++ were, are, "crumbling foundations". I don't see how you are disagreeing with me. You are proving the point.

You aren't actually going to claim that Windows was well-conceived and theoretically well-founded, are you? Windows has been a never ending refinement on exceptionally shaky ground.

Silhouette · on March 20, 2014

It seems like you're now arguing that PHP isn't a crumbling foundation -- a point that reasonable people could debate -- rather than that building atop crumbling foundations is by far the most successful way to accomplish anything in computing, which was the claim I challenged.

Incidentally, my point about languages like C or operating systems like Windows was not that they are theoretically wonderful under the hood, merely that they provided a reliable foundation. C has been standardised for a long time and is widely portable. Code that was designed for early incarnations of Windows will often run with little modification even on today's systems because the essential underlying models and APIs have been diligently preserved over the years even as many other changes were going on around them.

nostrademons · on March 20, 2014

I'm curious just what definition you're using for "crumbling foundation". Is it that old software doesn't break on it, in which case PHP/HHVM/Hack isn't a crumbling foundation either - Facebook is built on it, and Facebook is clearly still running.

Or is it that maintenance is difficult and programmers will run into all sorts of ugly corner cases and features that are just grafted onto each other? Because those apply to C and C++ and Win32 and Google and basically every other large software system as well. That's the point the grandparent is making - if you look at pretty much any successful, evolving software system under the hood, you'll see a byzantine mess of complexity, and it's a wonder that it ever works.

Silhouette · on March 20, 2014

I didn't have some specific, technical definition in mind, but if I were to try and pin it down, perhaps in software terms it would be something like "a dependency that is unreliable in the long term".

Clearly this isn't an absolute scale. As our industry evolves and we develop more reliable ways to achieve our goals, something that we regarded as being a relatively stable foundation in the past may no longer be regarded as such in the future when our standards have risen. Moreover, what constitutes "long term" might vary wildly among different projects.

I suppose my basic objection to the original claim (that building atop crumbling foundations is by far the most successful way to accomplish anything in computing) is that significant achievements in computing tend not to happen overnight but rather to develop over time, and the more stable your foundations, the better chance your project has of developing far enough to achieve significant things.

nostrademons · on March 20, 2014

If you define "crumbling foundations" as "a dependency that is unreliable in the long term", then your conclusion is circular. Of course no long-term products will be built on crumbling foundations, because if a technology stack has long-term successes then, by your definition, it is not a crumbling foundation.

I think the other posters are using a definition of "crumbling foundation" as "one which most engineers hate, which slows them down through excess complexity". And by that definition, almost all successful projects are built on crumbling foundations, because the fact that the project was successful leads you to add features to it, and adapt it in ways that the original architecture didn't anticipate. This process only ends when the software becomes so complex that all further attempts at modification fail, at which point everybody hates the codebase and it is, by pretty much any definition, a "crumbling foundation".

dasil003 · on March 20, 2014

I think it's funny that this whole thread is based on the initial haphazard word choice of reikonomusha yet he is not even a participant in the debate.

Silhouette · on March 21, 2014

> If you define "crumbling foundations" as "a dependency that is unreliable in the long term", then your conclusion is circular.

Not at all. I'm arguing that in computing, worthwhile results often take time to achieve, and therefore that foundations that are likely to be around for longer will improve the chances of achieving such results. Alternatively, from the opposite point of view, the odds of achieving something worthwhile go down significantly if you have only a short time to achieve it, as inevitably you will if you are building on foundations that aren't themselves going to be around for long (whatever we choose to call them).

jaequery · on March 21, 2014

hm interestingly, PHP still powers about 70% of the web! the web should have certainly crumbled by now ...

yawboakye · on March 21, 2014

I'd disagree if you say PHP powers about 70% of the web. Web servers (and all other kinds of servers) are not written in PHP, and neither are the protocols that drive the web. Operating systems are not written in it either. These (and many more) are what power the web, not web applications. Your definition of what "powers the web" is remarkably wrong.

tesseractive · on March 21, 2014

Facebook doesn't use web servers or operating systems written in PHP either, right? So if the language that web applications are written in isn't very important to the overall dependability of web applications, then it stands to reason that Facebook's choice of PHP for their web applications shouldn't be a problem either, right?

What point are you trying to make?

yawboakye · on March 21, 2014

> What point are you trying to make?

I had this question in mind while reading your reply. What point are you trying to make with your comment? For my part I wanted to point out that it's unacceptable to say PHP powers 70% of the web. (Check my previous comment's parent.) It has nothing to do with Facebook. And I didn't allude to your inference that choice of language doesn't affect the overall dependability of web applications.

coldtea · on March 22, 2014

>I didn't have some some specific, technical definition in mind, but if I were to try and pin it down, perhaps in software terms it would be something like "a dependency that is unreliable in the long term"

In this case you are mostly hand-waving, and what you say amounts to "I don't like X language".

vidarh · on March 21, 2014

> It seems like you're now arguing that PHP isn't a crumbling foundation

I think you need to go back and read the message you replied to. He is not arguing that at all. He's pointing to a long range of other things that he is asserting are also crumbling foundations.

discreteevent · on March 20, 2014

"Windows has been a never ending refinement on exceptionally shaky ground." No. Windows NT was a complete rewrite and it was " well-conceived and theoretically well- founded"

DeGuerre · on March 21, 2014

What you say is technically correct, which is the best kind of correct. However, fewer than 1% of Windows programmers ever talk to NTOSKRNL, and probably an order of magnitude fewer do it regularly.

Most of the time, you're talking to Win32/64 or high-level services based on DCOM or .NET, where the "well-conceived" and "theoretically well-founded" stuff doesn't turn up. You can go your whole career without knowing that there's a well-designed kernel under all that cruft.

I'd guess that less than half of Windows developers could say what the object manager does.

AaronFriel · on March 21, 2014

For more than half of Windows developers, if you told them there was an object manager, they would ask how you can turn off that service to improve performance.

frozenport · on March 21, 2014

This isn't true. Every Linux user encounters problems with drivers. By integrating them into the kernel we have an ecosystem where drivers are out of reach of many users and OEMs. Consider the difficulty in making a desktop scanner for Linux. Additionally, kernel changes can break a driver with little recourse on the OEM's side; you must simply bend to Linus's will. At the end of the day we suffer from Linux's driver architecture.

dingdingdang · on March 21, 2014

So true. Windows' services and drivers system may not be beautiful but it full on works most of the time and it allows even non pro Windows users to change basic stuff with far greater ease than Linux which is a real shame given Linux' potential for transparent usability (a potential it has had since the late 1990s but I have personally given up waiting for it to come true)

dscrd · on March 20, 2014

I'm appalled and a little bit insulted that you group PHP in the same group with COBOL, C and C++ in terms of their foundation. COBOL and C were designed by some of the greatest pioneers of the field, and indeed PHP is built on C.

jroesch · on March 20, 2014

Just because they are remembered as being first doesn't mean they were well designed, or actually pioneers. At the same time C was being written, others were working on Lisp and ML. It has taken nearly half a century for some of their innovations to be recognized as good ideas and taken up by mainstream languages, while C was a small improvement upon existing language design.

pjmlp · on March 21, 2014

C's designers deliberately chose to ignore the other system-level programming languages of the time, which already offered more memory-safe constructs, in their quest for a portable macro assembler.

PHP could have been easily done in any language with native compiler toolchains.

adamlett · on March 21, 2014

Why would you feel insulted? Did you personally create any of those languages? In fact, did you personally create any language? If not, maybe you shouldn't be quite so dismissive of the accomplishment it was to create PHP. Sure, it's not the best language out there, and valid criticisms can be levelled at it, but as they say: It's better to have tried and failed...

yawboakye · on March 21, 2014

What you just said is plain stupid. So only language designers can be critical of other languages? Bullshit! Since programmers can vote with their feet and gravitate towards better languages (is it miraculous that almost all programmers have a distaste for PHP?) we should give reasons why we use C and not PHP. If they were equally crappy what would have been the cause of choice of one over the other? I can call PHP the worst language ever, and I don't need to have created a language already.

coldtea · on March 22, 2014

>So [only] language designers can be critical of other languages?

Yes. Or rather, other people can be critical too, but language designers' opinions have far more validity.

In other words, anybody can say whatever uninformed BS he wants (it's a free country). But that's no replacement for being an expert in what you're discussing.

>Since programmers can vote with their feet and gravitate towards better languages (is it miraculous that almost all programmers have a distaste for PHP?)

You'd be surprised. PHP is one of the most popular languages and one of the most used languages, so far from "all programmers have a distaste for PHP". So if we were to use that "voting" argument alone (which I find wrong), PHP should be considered a very good language. Not the intention you had, I guess.

While PHP has its warts, it's mainly the less pragmatic and more fad prone programmers that have issues with PHP, those who look for silver bullets and like to feel superior by choice of programming language, editor and the like.

As for programmers "voting with their feet", well, they don't do quite a good job at it. The best languages (like LISP, Smalltalk, OCaml, Haskell, to name but a few) are seldom the most popular too.

yawboakye · on March 24, 2014

> but language designers' opinions have far more validity

In this case we're talking about the work of a language designer. By the way, not all people capable of creating a programming language have created one yet. Marc-André Cournoyer, who wrote the book that Jeremy Ashkenas learnt language design from to write CoffeeScript, has not created a language himself.

tesseractive · on March 21, 2014

Given the widespread use of PHP for web programming, it seems obvious that in the domain of web programming, developers have voted with their feet for PHP.

yawboakye · on March 21, 2014

Energetic amateurs. They've not reached maturity so that their votes should be counted.

tesseractive · on March 21, 2014

Given that these supposed "energetic amateurs" have created an enormous percentage of the code that web applications across the internet run on, their votes automatically count. If PHP developers had never produced anything of value, that would be a different discussion.

dscrd · on March 22, 2014

I also get a bit (not terribly, just a bit) insulted if somebody says "Beethoven sucks", because the person who said it is technically the same species as I am.

pekk · on March 20, 2014

In the sense that any software written in C is built on C.

dscrd · on March 21, 2014

True, but a lot of PHP's ... design also reflects C, only that they warped it.

wvenable · on March 21, 2014

A few of the complaints about PHP (the inconsistent function library, for example) exist because PHP was originally a very thin scripting language layer for using existing C libraries.

coldtea · on March 22, 2014

C was not built by the "greatest pioneers". Just successful pioneers.

There were far more evolved and elegant languages at the time C was created (and during the time it took for C to rise, even more were made). LISP for one, but also languages with the same performance characteristics and systems programming capabilities as C.

C definitely wasn't seen as a "great language design" -- just a very useful and pragmatic one (e.g., see also the classic "Worse is better" essay).

FooBarWidget · on March 20, 2014

C, simple and predictable? You've got to be kidding me: http://lwn.net/Articles/586838/

Ygg2 · on March 21, 2014

Compared to C++

Compared to C++ a language made of explosions and guttural sounds is an elegant way of communication.

tomphoolery · on March 20, 2014

It can definitely be simple, but if you're writing C you're compiling directly on hardware. If you change the hardware, your program will not run. That is literally the definition of unreliable.

solipsism · on March 21, 2014

It's not, actually. The Mars Exploration Rover won't work underwater. That doesn't mean it's unreliable.

When your compiled C program sometimes runs on x86 and sometimes doesn't, that is the definition of unreliable.

serge2k · on March 21, 2014

That's a stupid definition for unreliable.

randomdrake · on March 21, 2014

Reliability can be measured in successes and failures. Failing reliably is not necessarily a bad thing. Some things you build may or may not work on other hardware. That's unreliable. You assert, with confidence, that C will fail. That's reliable. You can rely on it failing.

comex · on March 20, 2014

> Much of the world still runs on C

But these days, most interesting applications run on C++, which started from the arguably crumbling foundation (from C++'s point of view, not per se) of C, and grew organically over several major revisions into something hideously complex.

This trait of C++ is not a good thing, but the amount of successful software written in it seems to prove that it's not fatal either.

> You can still run applications from nearly 20 years ago on Windows today, in no small part because of Microsoft's persistent focus on compatibility and keeping the basics reliable over that time.

Layers of hacks on hacks to keep old software running correctly is exactly "building on a crumbling foundation", and probably the reason Microsoft is trying to get rid of Win32, having severely limited its availability on ARM in favor of the WinRT APIs. But I'd say the venerable success of Win32 demonstrates that the crumbling foundation works.

bad_user · on March 21, 2014

> This trait of C++ is not a good thing, but the amount of successful software written in it seems to prove that it's not fatal either.

That depends. It could also be argued that our current tools are limiting what we can accomplish and at some point we'll need a revolution, otherwise we'll hit a ceiling, just like when concrete replaced the need to carve and place rocks on top of each other.

For example living organisms are way more adaptable, more self-healing than anything we've ever built. Our own body should be a text-book example of massive parallelism involving trillions of independent agents that cooperate with each other. In particular, the process of wound healing that happens when you cut yourself is quite fascinating: https://en.wikipedia.org/wiki/Wound_healing

If we wanted to simulate the human body, or at least the human brain, since that's the most interesting part, somehow I'm not seeing C++ in that picture.

pjmlp · on March 21, 2014

C++ belongs to my list of languages that I enjoy using; sadly, it was built on quicksand foundations due to C compatibility as a way to make it mainstream.

yawboakye · on March 21, 2014

Add JavaScript. Super crappy but its ubiquity for web scripting is really saving it some real bashing. When we finally have options, we'd relish in our freedom and say what it was like to work with badly written programming languages.

munificent · on March 20, 2014

> What major accomplishments in computing that have been built atop crumbling foundations can claim anything even close to the scale of success of these examples?

Wikipedia and Facebook?

huherto · on March 21, 2014

A fair question is whether they are successful thanks to PHP or in spite of PHP.

lxm · on March 20, 2014

Yahoo!, too, is a heavy PHP user.

jmgtan · on March 21, 2014

They've been slowly moving to Node and Java for quite some time. I interviewed there a few months ago and they drilled me on Java stuff.

copergi · on March 20, 2014

Neither of those would even remotely qualify as accomplishments in computing. But they also are terrible examples. Facebook isn't using PHP anymore, that's what this discussion is all about. Wikimedia is terrible, and Wikipedia is almost exclusively static content being served by squid.

kamaal · on March 21, 2014

What are we discussing? Are we discussing about the tool or the things made from tools?

It's like trolling a raw cast iron hammer used to build a furniture factory and then saying the whole factory is entirely useless just because someone invented a stainless steel hammer.

copergi · on March 21, 2014

We're discussing both. The original question posed was something like "what amazing stuff has been built on crumbling foundations", and the response was "wikipedia". I don't necessarily agree with the sentiment of the question, but wikipedia is not a good example of amazing technology.

lectrick · on March 20, 2014

Given a programming task, it will be written faster, easier, and in a more maintainable and less bug-prone fashion if it is not PHP.

The opposite opinion is basically indefensible. Sure, you can still dig a trench with a spoon (if you have enough money to hire a bunch of workers with spoons), even if a shovel would do a better job.

Let's begin.

1) PHP autocraptastically converts strings that look like numbers, into numbers, resulting in all sorts of weirdness like this: https://eval.in/111886

2) PHP 5.4's OWN TEST SUITE has 91 failures and only 70% coverage. There is NOTHING more "WTF" than that! Why even bother having a test suite?? http://gcov.php.net/viewer.php?version=PHP_5_4

3) Why the fuck are all of these different things equal, and how does this NOT result in problems? http://i.imgur.com/pyDTn2i.png

4) String increment is dumb to begin with, but why does it not even match the behavior of string decrement? https://eval.in/60631

5) Why the hell can you jump back into a try block from a catch block? Recipe for disaster: http://phpmanualmasterpieces.tumblr.com/post/33091353115/the...

6) PHP comparison operators. I'm sorry, but this level of complexity might make you feel smart once you master all its idiotsyncrasies [sic], but it's actually dumb: http://stackoverflow.com/questions/15813490/php-type-jugglin...
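As a rough illustration of item 1, here is a simplified Python model of loose comparison between two numeric-looking strings. It is a sketch under assumptions, not PHP's actual algorithm and not a reproduction of the linked snippets: `looks_numeric` and `loose_eq` are hypothetical names, and the regular expression only approximates what PHP treats as a numeric string.

```python
import re

# Approximation of a "numeric string": optional sign, digits with an optional
# fraction, and an optional exponent. Real PHP rules differ in the details.
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def looks_numeric(s: str) -> bool:
    """Return True if the whole string matches the numeric pattern above."""
    return _NUMERIC.fullmatch(s) is not None


def loose_eq(a: str, b: str) -> bool:
    """Model of loose equality: numeric-looking strings compare as numbers."""
    if looks_numeric(a) and looks_numeric(b):
        return float(a) == float(b)
    return a == b


print(loose_eq("1e3", "1000"))       # True: both parse to 1000.0
print(loose_eq("0e1234", "0e5678"))  # True: both parse to 0.0
print(loose_eq("abc", "ABC"))        # False: plain string comparison
```

The second case is the kind that bites in practice: two different strings that happen to look like "zero times ten to some power" compare as equal, which is bad news if the strings are supposed to be hashes or tokens.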

That's a small fraction of not-thought-out PHP language features that result in REAL bugs and security holes. Which consume large swathes of programmer time. Which, apparently, Facebook can afford to swallow.

I'm sorry, but your position, as valiant as you are defending it, is literally indefensible. And I don't give a fuck how big Facebook is, they would STILL be better-served by switching SOME of their code to a different language. ANY modern programming language wouldn't suffer from this imbecilic, immature language design.

orblivion · on March 20, 2014

You're making a mistake. The question is not whether to start a company with PHP vs language X. The company is long started. The question is not whether or not to poof into existence a port from all of FB to language X. That's not possible. The question is, given that PHP is the current language, with all its faults, will it cost more (including all definitions of cost) to make the switch? How long will it take? Does it get the job done? How bad is the damage?

The question more pertinent to your argument is, did they make a mistake years ago choosing PHP? That's when they could have conceivably gone with language X.

BTW the types of stuff you're listing are documented here. So thorough it's amusing to read: http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-de...

bad_user · on March 21, 2014

> The question is not whether or not to poof into existence a port from all of FB to language X. That's not possible.

And yet Twitter has slowly migrated on the backend from Ruby to Scala/Java. They still use Ruby for the frontend, though it's not clear to what extent, since they've also migrated to a single-page, fat client design on the desktop. And at the very least, their choice of using Ruby when prototyping, was at least sane.

I understand that large codebases can't be migrated easily. But you can migrate individual components when needed, if you have a modular, service-oriented design.
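A minimal sketch of what I mean, in Python for brevity. Everything here is hypothetical (the `Router` class, the `profile` service name, and both handler functions are invented for illustration); the only point is that callers depend on a service name, so one component can be swapped without touching the rest.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_profile(user: str) -> str:
    """Stand-in for a component still served by the old codebase."""
    return f"legacy profile for {user}"


def new_profile(user: str) -> str:
    """Stand-in for the same component after it has been rewritten."""
    return f"new profile for {user}"


class Router:
    """Maps service names to whichever implementation is currently live."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Registering an existing name replaces the previous implementation.
        self._routes[name] = handler

    def call(self, name: str, arg: str) -> str:
        return self._routes[name](arg)


router = Router()
router.register("profile", legacy_profile)
before = router.call("profile", "alice")

# Migrate just this one component; callers keep using the same name.
router.register("profile", new_profile)
after = router.call("profile", "alice")
```

In a real system the boundary would be a network service or an RPC interface rather than an in-process dictionary, but the migration story is the same: one route at a time.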

Also - building new functionality in PHP is indefensible, unless their code-base is one big monolithic hairball, which I doubt it is.

lectrick · on March 21, 2014

> The question is, given that PHP is the current language, with all its faults, will it it cost more (including all definitions of cost) to make the switch? How long will it take? Does it get the job done? How bad is the damage?

An excellent point. Which in fact is an argument in favor of modularizing your code as much as possible, as early as possible. There are tools now like Apache Thrift which make this easier: http://thrift.apache.org/

eevee · on March 21, 2014

Why thank you :)

juchem · on March 20, 2014

It's interesting how most people seem to attribute the quality and longevity of a software to the language it is written in or the frameworks it uses rather than to the amount of thought that was put into its design. Sure, the former is important, but largely overrated.

ghotli · on March 20, 2014

I can't fathom why you're under the impression that some code hasn't been switched to another language. Furthermore, your vitriol seems quite effective at undermining your thesis.

lectrick · on March 21, 2014

Fair enough. And to be honest, there was a single night not too many years ago where I wrote up an auction site that had most of the functionality of eBay, in a single night, in PHP (someone else had already done the frontend work, fortunately- I just built the backend).

cybernytrix · on March 21, 2014

I think most of you guys are missing a crucial thing here -- how decisions get made. Facebook had two options: 1.) Rewrite all the PHP code in Java/Ruby/C++/whatever 2.) Write HHVM/Hack/etc to transparently convert the existing PHP code.

Option #2 ended up getting a lot of publicity for the engineers. If they had chosen option #1, not only would it have been a lot of hard, boring, and possibly error-prone work, there would have been no chance for any of the engineers to get the type of publicity they are currently getting. All in all, engineers tend to work on whatever will make them get noticed, not necessarily the better technical choice nor what is in the best interest of the company... this is especially true in companies like Facebook and Google where there are a lot of very smart engineers doing relatively mundane work. So there you have it: Hack/HHVM are all just publicity stunts; the more you feed it, the happier their PR gets.

derefr · on March 20, 2014

> Just so we are clear, you ask why Facebook didn't rewrite 10,000 human-years of code into a mythical unnamed "proper" ...

You don't need to rewrite anything (at least, not all at once.) Personally, I'd have expected you to make something akin to CoffeeScript or ClojureScript that targets PHP, and can "link with" your existing PHP modules (or rather, with their HHVM bytecode representations.) Then treat the PHP code as a constantly-dwindling Big Ball of Mud (http://laputan.org/mud/).

elgenie · on March 20, 2014

> Personally, I'd have expected you to make something akin to CoffeeScript or ClojureScript that targets PHP, and can "link with" your existing PHP modules.

We took a similar approach to what you described. Then we called it Hack.

This is what we mean by seamless inter-operation: the HHVM runtime understands both syntaxes and runs both <?php and <?hh code in the same process. Whether Hack integrates into the runtime at the parsing (current) or bytecode (future?) layer is an implementation detail.

derefr · on March 21, 2014

But the syntax you've got now is effectively a superset of PHP, and comes with all the problems of PHP. You've effectively wrapped your Big Ball of Mud... in slightly different-colored mud. The whole point of a clean-break-targeting-interoperability like this is that you can stop using mud at all, and it'll still work with what you've got now. In fact, what other reason would you have?

If you mean that future versions of Hack will evolve to have a different syntax, while still targeting the runtime... then you'll still have code around from the intermediate era, and you'll have to interoperate with that too, won't you? You'll have PHP, PHP-looking-Hack, and actually-nice-to-code-in-Hack.

DeGuerre · on March 21, 2014

PHP's syntax is not what's preventing you from writing large maintainable systems in it. Many of the more successful languages throughout history became successful BECAUSE they used a syntax that's superficially similar to something familiar.

That's not to say that syntax doesn't matter, but semantics and pragmatics tend to matter more, despite Wadler's Law of Language Design stating that most people don't understand this.

ufo · on March 21, 2014

I think you are putting way too much emphasis on syntax here. The important contribution of Hack is the type system, and this is something that a syntactic-sugar translator like CoffeeScript can't hope to achieve.

I also wouldn't count ClojureScript here. It's a whole different language that just happens to compile down to JS.

vhata · on March 20, 2014

That is essentially what Hack is.

GhostHardware · on March 20, 2014

Very suitable name for a language then.

copergi · on March 20, 2014

That's exactly what they did, that's what they are announcing here.

bad_user · on March 21, 2014

> Because "build atop a crumbling foundation" has demonstrated time and again to be, by far, the most successful way to accomplish anything in computing?

That's only because the currency for building things on top of crumbling foundations has been sweat and man-power. We aren't that far off from the Egyptians that were using hundreds of thousands of slaves per pyramid. It's a good thing that we've transcended the necessity for hundreds of thousands of slaves when raising buildings, don't you think?

And yet, here you are, claiming that building stuff with broken tools is the most successful way to accomplish anything in computing. Actually I view it as nothing short of a miracle, showing human determination in action ;-)

> All languages, runtimes, and standard libraries (and databases, and source control, and on and on) are "broken" at sufficient scale.

That's a fallacy. Just because both X and Y are broken, that doesn't mean they are equal, as some things are more broken than others and PHP is more broken than anything else mainstream (C++ at least has reasons). Also, I don't see how "at scale" changes things in PHP's favor, I really don't.

If you're trying to argue that "at scale" the level of brokenness converges to the same levels, then that's a stupid thing to say. After all, Twitter didn't have to build its own JVM and the stuff they run on top of the JVM is probably more power efficient than you'll ever be with HHVM. Probably saner too.

gruntmaster9000 · on March 21, 2014

> We aren't that far off from the Egyptians that were using hundreds of thousands of slaves per pyramid. It's a good thing that we've transcended the necessity for hundreds of thousands of slaves when raising buildings, don't you think?

False, they were skilled, paid laborers: http://en.wikipedia.org/wiki/Egyptian_pyramid_construction_t...

Apologies for being irrelevant to the main point, but this is a tired myth.

bad_user · on March 21, 2014

Interesting. Thanks for the link. Is Wikipedia great or what? :-)

goalieca · on March 20, 2014

Facebook is a relatively mature company now that is in the business of making money. In fact, I would argue that [framework of the week] is more buggy and broken because it is less well understood.

eloisant · on March 21, 2014

Actually Twitter has been pretty successful in migrating a good chunk of their codebase from Ruby to Scala.

Facebook can keep PHP for the thin web layer that renders the page, but they could have migrated the meat of their code to a safer and more robust language.

toast0 · on March 20, 2014

The HN crowd seems to dislike (or despise perhaps?) PHP, but it's really not that bad. Yes it has a lot of warts, but it has a lot of things that make it nice for web development.

a) try your new code by saving in your editor, and hitting reload in your web browser.

b) it's very approachable. People who only know HTML and CSS can be expected to do a little bit of PHP work to integrate their changes. If you set up the right network mounts, they just need to edit files and reload (see a).

c) it's not super high overhead at runtime. If you're not using a framework, and you don't build up a crazy object hierarchy, it's not too hard to get your page out with about 10 ms of overhead beyond data fetching. For very simple webservices (fetch data, possibly from multiple sources, and do a little formatting for the consumer), I was able to get the overhead down to 2ms. You can certainly do better with other languages, but you can usually get better throughput improvement by working on getting data quickly. Btw, all the frameworks are terrible; many of them add 100 ms to the page just for the privilege of loading the includes; PHP is a framework for web programming thank you very much.

d) cleanup; you don't have to worry about it. If you don't do anything weird (c extensions, with non-preferred malloc), at the end of the request, everything is thrown away.

That said, there are plenty of things PHP isn't good at: I wouldn't run a long running process in PHP; and multithreaded PHP sounds like a bad idea.

ptx · on March 20, 2014

Yes, PHP is its own web framework, but it's not a very good one. And, as you say, implementing a better one on top of it adds a lot of overhead due to the execution model. With other languages, where you don't throw everything away at the end of the request, you are free to implement a good web framework without suffering additional overhead.

As for a), many (most?) frameworks are able to monitor the source files and reload the application when they change, in order to enable that workflow. For example: http://cherrypy.readthedocs.org/en/latest/refman/process/plu...
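
The file-monitoring workflow mentioned above can be sketched roughly as follows. This is an illustrative mtime-polling watcher in Python, not CherryPy's (or any framework's) actual implementation; `ChangeWatcher` and `demo` are hypothetical names, and a real reloader would restart or re-import the application where this sketch only reports the changed paths.

```python
import os
import tempfile


class ChangeWatcher:
    """Polls modification times; a hypothetical helper, not a framework API."""

    def __init__(self, paths):
        # Remember the mtime of every watched file at startup.
        self._mtimes = {p: os.stat(p).st_mtime_ns for p in paths}

    def changed(self):
        """Return the paths whose mtime differs from the previous check."""
        dirty = []
        for path, old in self._mtimes.items():
            new = os.stat(path).st_mtime_ns
            if new != old:
                self._mtimes[path] = new
                dirty.append(path)
        return dirty


def demo():
    """Watch a temporary file, simulate a save, and report what was seen."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.py")
        with open(path, "w") as f:
            f.write("print('v1')\n")
        watcher = ChangeWatcher([path])
        before = watcher.changed()
        # Simulate "save in your editor" by pushing the mtime forward 1 second.
        stamp = os.stat(path).st_mtime_ns + 1_000_000_000
        os.utime(path, ns=(stamp, stamp))
        after_save = watcher.changed()
        after_reload = watcher.changed()
        return path, before, after_save, after_reload
```

A development server would call `changed()` on a timer (or before each request) and reload whenever the list is non-empty, which is what gives the "save, then refresh" feel outside PHP.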

ochekurishvili · on March 21, 2014

I'm running multithreaded background processes in PHP pretty successfully. I did not see (and still do not see) a reason why I should have chosen another "proper" language for it.

P.S. The vast majority of arguments against PHP here leads me to the conclusion that most (not all) debaters don't understand how PHP works well enough.

toast0 · on March 21, 2014

Multithreaded using the pthreads API or something else? Any chance there's code I could take a look at? (I'm curious!)

ochekurishvili · on March 21, 2014

I use curl_multi_* functions.

toast0 · on March 21, 2014

Oh... I don't count curl_multi as multithreaded (because it isn't really threads), but it's very useful, and I definitely use it.
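
The distinction here (several transfers in flight at once, but no threads) can be illustrated with a small sketch. It is written in Python with asyncio rather than PHP, it does not use curl_multi, and `fake_transfer` only simulates network waits with a sleep; all names are made up for the illustration.

```python
import asyncio
import threading
import time


async def fake_transfer(name, delay):
    """Stand-in for one HTTP transfer; the sleep simulates waiting on the network."""
    await asyncio.sleep(delay)
    # Report which OS thread ran this transfer.
    return name, threading.get_ident()


async def fetch_all(jobs):
    # gather() keeps every transfer in flight at once on a single event loop.
    return await asyncio.gather(*(fake_transfer(n, d) for n, d in jobs))


def run_demo():
    jobs = [("a", 0.05), ("b", 0.05), ("c", 0.05)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all(jobs))
    elapsed = time.perf_counter() - start
    names = [name for name, _ in results]
    thread_ids = {tid for _, tid in results}
    return names, thread_ids, elapsed
```

All three simulated transfers report the same thread id, and the elapsed time is typically close to one delay rather than the sum of the three: the waits overlap, but nothing runs in parallel.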

Lennie · on March 21, 2014

If you want to know: just checked the one project where I did a long running PHP-process.

It is a job scheduler which takes jobs from a Redis queue and executes them.

It does fork a new process for 1 type of task (image resizing in this case) to make sure it doesn't leak memory.

It also reuses these forked processes so it doesn't need to fork for every task.

It was last started in Jan. 2013 (because of maintenance) and is still running fine.

The reason? I didn't want to introduce an extra language for a project for which all the server code was already written in PHP.

So if it fits the task, you can do it.
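
For the curious, the shape of such a scheduler can be sketched like this. It is a Python illustration of the design described above, not the original PHP code: an in-memory `deque` stands in for the Redis queue, `resize_image` is a hypothetical placeholder for the real work, and `os.fork` makes it POSIX-only.

```python
import os
from collections import deque


def resize_image(name):
    # Hypothetical stand-in for the real image work; the result is tagged with
    # the worker's pid so the caller can see which process handled it.
    return f"{os.getpid()} resized:{name}"


class ForkedWorker:
    """A child process forked once and then reused for many tasks."""

    def __init__(self, handler):
        job_r, job_w = os.pipe()   # parent -> child
        res_r, res_w = os.pipe()   # child -> parent
        self.pid = os.fork()
        if self.pid == 0:          # child: serve jobs until the pipe closes
            try:
                os.close(job_w)
                os.close(res_r)
                with os.fdopen(job_r) as jobs, os.fdopen(res_w, "w") as out:
                    for line in jobs:
                        out.write(handler(line.rstrip("\n")) + "\n")
                        out.flush()
            finally:
                os._exit(0)        # never fall back into the parent's code
        os.close(job_r)
        os.close(res_w)
        self._jobs = os.fdopen(job_w, "w")
        self._results = os.fdopen(res_r)

    def run(self, job):
        """Send one job to the child and wait for its one-line reply."""
        self._jobs.write(job + "\n")
        self._jobs.flush()
        return self._results.readline().rstrip("\n")

    def close(self):
        self._jobs.close()         # EOF tells the child to exit
        self._results.close()
        os.waitpid(self.pid, 0)


def run_scheduler(queue):
    """Drain the queue; only 'resize' jobs go to the forked worker."""
    worker = ForkedWorker(resize_image)
    results = []
    try:
        while queue:
            kind, payload = queue.popleft()
            if kind == "resize":
                results.append(worker.run(payload))
            else:
                results.append(f"{os.getpid()} inline:{payload}")
    finally:
        worker.close()
    return results
```

Any memory the resizing code leaks stays in the child, so the long-running parent is unaffected; a fuller version could replace the child after some number of tasks to cap its growth.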

toast0 · on March 21, 2014

Duly noted, will not be so negative on long running processes. :)

bad_user · on March 21, 2014

> a) try your new code by saving in your editor, and hitting reload in your web browser.

The only language for which that works is client-side Javascript. For PHP, you forgot the part in which you have to install and run a web server, then point your browser to localhost. Plus, to get anything useful done, you'll also need some sort of database to go along with it. I remember the first time I did that and it was pretty intimidating.

> c) it's not super high overhead at runtime ... it's not too hard to get your page out with about 10 ms of overhead beyond data fetching

To me much more interesting is the total time it took for the client to receive the response, possibly when multiple concurrent requests are happening. The comparison here should be versus a static page served by Nginx of course.

The best throughput possible that I got for an otherwise complex business logic happened on the JVM. I basically rewrote a web-service built on top of Django/Python, with a redesign with emphasis towards in-memory caching, parallelism and async I/O and the result was a server that was able to process more than 10,000 requests per second with an average of about 5ms per request (actually in production the instances ended up processing about 2000 reqs/sec of real traffic, since c1.medium EC2 instances don't have enough CPU power).

Of course, people that just need to slap something together, don't need this level of throughput. If a request takes 400ms for a dumb web form on a low traffic website, that's of no consequence to most people. The problem happens of course when such a piece of software evolves to something much bigger, like Wordpress. I'm always amazed at the gimmicks that people do just to keep their Wordpress powered blog alive.

jimbokun · on March 21, 2014

Has anyone tried to copy these good parts of PHP, except with a not shitty language?

Is there anything about PHP, the language, that lends itself to this style of development, or could you get these same benefits with, say, Ruby or Python?

DeGuerre · on March 21, 2014

Yes, someone has indeed tried this. It's called "Hack".

justinhj · on March 20, 2014

There are a lot of people making money writing systems in php because it's the right tool for some jobs. There are companies making money using systems that are written in php, because it works. These people don't have as much to say as those working in other languages that may feel threatened or uncomfortable when something they think is bad seems to be used with success.

1ris · on March 20, 2014

a) WTH? I hope you do not test your code in production. If you don't, that feature (an IDE that constantly compiles your code) is a common one.

b) Theorem: Letting your web designers program is way worse than letting your programmers do web design.

c) Every simple project can become complex at some point. Facebook certainly is.

sanswork · on March 20, 2014

a) You don't have a development server/VM?

b) Not everyone has programmers. That was his point

1ris · on March 20, 2014

Oh, for b) i know a bonmot. Sorry for the image. http://fettemama.org:6502/ed2499a7aa73df6dcefe95fbb649ece3

sanswork · on March 20, 2014

Not everyone that needs/wants to make a website wants to start a software development company. Often they just want to accomplish a few small things. To match up the rest of your examples:

I want to defend myself in small claims court

I want to learn some first aid

I want to build a shed behind my house

itafroma · on March 20, 2014

> I am baffled as to why you'd build your castle atop a crumbling foundation.

> I have wondered why FB didn't use a proper language with proper typing to begin with.

I would suggest watching Keith Adams's talk, "Taking PHP Seriously": http://www.infoq.com/presentations/php-history

He goes through why Facebook uses PHP and decided to build upon it to create Hack. I highly recommend watching the whole thing, but the main three things he points to are:

1. Frictionless programmer workflow with a short feedback cycle

2. All PHP requests start out with the same consistent state by default

3. Rigid style of concurrency

eevee · on March 20, 2014

Which is nonsense, because:

1. I have never seen a framework that didn't go to great lengths to update to new changes quickly in development

2. Which means you lose resources between requests, unless you stuff them into the interpreter/httpd itself. And anyway this is only a problem for PHP, where by default everything runs in the top-level namespace, versus in separate functions.

3. That's a funny way to spin "no concurrency", but you can get that in any language by just not deploying it threaded.

m3mnoch · on March 20, 2014

blink blink

um. have you used java?

Terr_ · on March 20, 2014

What, you mean with "hot deployment" of your changes?

eloisant · on March 21, 2014

Well, even in Play Framework 1.0 in 2009 there was hot reloading. Edit a file, refresh your browser => it's there.

Moru · on March 21, 2014

On the Atari ST you didn't even have to hit reload, the webpage reloaded automatically as soon as you saved your file. No need to alt-tab even, just keep webpage open next to your editor. Now get off my lawn.

eevee · on March 20, 2014

andralex · on March 20, 2014

(I work at Facebook but not on Hack.) There's a lot to be said about backward compatibility, and much of Hack's virtuosity stems from its smooth interoperability with PHP - many millions of lines of it. There's nothing like working on such a large codebase to convince one how difficult disruption of any kind is.

The language definition and semantic checker are difficult, but are stereotypically tasks that cannot be distributed to many engineers; instead, a few senior engineers took that task to the benefit of all others.

dcc1 · on March 21, 2014

What editor does Facebook recommend for Hack? PhpStorm doesn't support it yet, unfortunately.

gfodor · on March 20, 2014

"I am baffled as to why you'd build your castle atop a crumbling foundation."

Congratulations for admitting your ignorance, and lending open ears to experts as to why they made certain engineering decisions.

Oh wait, you weren't doing that, you were just warming up to go on a diatribe about how stupid Facebook engineers must be.

teacup50 · on March 20, 2014

> Congratulations for admitting your ignorance, and lending open ears to experts as to why they made certain engineering decisions.

Yes, they made certain engineering decisions now because the decisions they made back then were stupid, and they have to dig themselves out.

dirtyaura · on March 20, 2014

What was the stupid decision "made back then"?

That Zuck wrote the first version of thefacebook.com in PHP, the language he was the most productive at?

That the initial team didn't rewrite Facebook in Python/Perl/Ruby/Haskell during the fast growth phase? If you have ever experienced the growth phase, you understand how ludicrous the idea of a rewrite would be. I've personally experienced and heard only horror stories about rewrites. We underestimate how much hidden wisdom a production code base has and that the messiness is often there for good reasons.

stormbrew · on March 21, 2014

I once worked for a social network that was extremely popular in a particular geographic niche. It was even started before Facebook.

And it was also written in PHP. And we rewrote in Ruby (not rails, at the time it was not mature in the ways we needed it to be). We made a lot of great technical achievements in doing this and our codebase became much much better. On a technical level I don't regret us doing that one bit. We went long on schedule and we made mistakes, but some of the best work of my career went into that and I'm immensely proud of it.

But then, within a matter of months of Facebook opening up to people who weren't at a university or big organization (remember that?), our users, who had gotten bored with our site not changing while we rebuilt all the tech, completely abandoned us, and we went from profitable (having never taken any outside investment) to dead in another couple of years.

This isn't to say that this wouldn't have happened to Facebook had they done something like this, but it is always a huge risk, even if you do everything right.

teacup50 · on March 20, 2014

> What was the stupid decision "made back then"?

Using PHP to begin with was a bad technical decision. Failing to establish a reasonable migration strategy was a bad business decision likely rooted in bad engineering management that fell out of starting with bad technical decisions.

It's much harder to hire people that can pull you out of a mess like PHP when, at the same time, you have to hire people that can keep writing PHP for you.

> That the initial team didn't rewrite Facebook in Python/Perl/Ruby/Haskell during the fast growth phase?

That would have been a good time to bring on new engineering blood as part of scaling out, which would have provided opportunities to enact mitigation and transition strategies. Imagine if the massive amount of talent currently devoted to HHVM had been devoted to Facebook's actual business?

There are migration strategies other than "rewrite everything immediately", and in fact, I'd bet that's exactly what HHVM is. It's just a shame they waited so long that the most cost-effective migration strategy was to tackle an enormously difficult computer science and engineering problem that the world's biggest software companies already invest hundreds of millions of dollars on and provide to the world largely for free.

> If you have ever experienced the growth phase, you understand how ludicrous the idea of a rewrite would be.

Yes, and I've also (repeatedly) been the team brought in to rewrite the mess of a code base that was about to torpedo the growth phase.

There's not much correlation between funding, initial success, and engineering talent. Which is why you so often wind up with a mess that has to be cleaned up once you can hire people who know what the hell they're doing, instead of the ones you happened to be stuck with because you didn't know how to grow an engineering team.

> We underestimate how much hidden wisdom a production code base has and that the messiness is often there for good reasons.

Messiness is never there for good reasons other than that replacing it is more expensive than not touching it. You don't strive for hidden wisdom and inescapable messiness -- that's just what you get when you let engineering slip up.

nbm · on March 20, 2014

As much as I prefer not to engage trolls...

Which language would have been a good technical decision in 2002-2003? It needs to be fast enough in terms of iteration. It needs to not require more resources than PHP. It must be easy to onboard people who don't know it onto. It needs to be easy to operate, and not be costly to deploy on the tens, hundreds, and then thousands of servers necessary. (Spending time learning a new technology that others think is cool or which seems cool, while trying to also build a product, has sunk more than a few startups...)

What was bad about the decision to keep the reasonably well-performing and reasonably suited-to-purpose PHP code for front-end code, and peel off suitable tasks like the feed, typeahead, messages, and so forth into services written in languages like C++ and Java? What was bad about the decision to let hundreds of software engineers continue to build the PHP code-base while a much smaller group of people worked on projects to improve both the efficiency of the execution environment and the tooling and developer efficiency on that code-base? Their contribution there is multiplied out by the improved efficiency of those hundreds of developers and the code they wrote.

Seemingly Facebook survived its growth phase fairly well and didn't need a tiger team from outside to handle it - and without the reliability problems of others who chose to use languages other than PHP for their front-end and decided to rewrite their much smaller surface area in other languages.

As much as people may dislike PHP (and I'm one of them), it was definitely "good enough". Many languages may not have been, even if they are nicer languages in some objective way.

(Disclaimer: I actually have to write code in Facebook's PHP code-base every once in a while. But most of the code I work with is Python, Java, or C++.)

teacup50 · on March 21, 2014

> Which language would have been a good technical decision in 2002-2003? It must be easy to onboard people who don't know it onto. It needs to be easy to operate, and not be costly to deploy on the tens, hundreds, and then thousands of servers necessary.

You mean, like the JVM? 2003 wasn't the pliocene epoch, we had a working JVM. If you remember back to the last bubble, in the 90s, we were shipping "easy to operate, not costly to deploy" software on Java post-1998. Java 1.4 was released in 2002, and Java 1.5 -- what most people would say is modern Java -- was only 2004.

Scala 2.0 was released in 2006, Clojure in 2007 -- that's 8 and 7 years ago, respectively.

You really don't think there were alternatives during that long period?

> What was bad about the decision to keep the reasonably well-performing and reasonably suited-to-purpose PHP code for front-end code, and peel off suitable tasks into services like the feed, typeahead, messages, and so forth into languages like C++, Java, and so forth?

In 2004? Nothing. In 2005-2008? Things should have been reassessed, especially before building out a millstone of an engineering team around PHP. Instead, Facebook doubled-down on an actively bad language with HPHP, and the results were hilarious:

   HPHPc required a very different push process,
   requiring a bigger than 1 GB binary to be compiled
   and distributed to many machines in short order.

So then in 2010, Facebook decides to embark on HHVM, and now four years later, we can run one of the most correctness-hostile programming languages around, quickly, with optional static typing.

That's a span of 6 years, and at the end of it, Facebook has functionality they could have gotten for free in 2003. On top of that, the intervening years allowed the PHP mess to become only more entrenched -- who on earth do you think the engineers are that accept a job writing PHP, for Facebook or otherwise?

If I had to hazard a guess, I'd guess that HHVM exists because of a large amount of political inertia in the organization that has everything to lose by PHP being eliminated entirely, and the lack of a strong hand by upper management.

I'd guess that lack of a strong hand by upper management came in no small part from hiring straight-out-of-college graduate Adam D'Angelo -- who had literally zero experience -- to serve as CTO from 2006-2008.

By the time FriendFeed was acquired and Bret Taylor along with it (2009), my guess would be entrenched interests made for a very difficult position for anyone wanting to change the ship's course.

Lennie · on March 21, 2014

"the intervening years allowed the PHP mess to become only more entrenched"

They specifically chose to do it and not move to another language. One of the reasons was: PHP programmers are cheap and plentiful and can do quick iterations.

Sounds a bit like you are complaining about other people's choices; it really is their choice. :-)

I'm not saying it is impossible to move to another language. Just look at PayPal: they moved their customer-facing code from Java to node.js and got a very large productivity increase: http://www.youtube.com/watch?v=V5yk5SZxWX4

Obviously the reasons PayPal chose node.js are similar to why Facebook chose PHP: quick iterations mean more iterations, which means more experimentation and better results.

vidarh · on March 21, 2014

> You mean, like the JVM?

In 2002-2003 I'd have quit in disgust if anyone tried to introduce the JVM anywhere I worked.... I still probably would, frankly.

bluesnowmonkey · on March 21, 2014

> Using PHP to begin with was a bad technical decision. Failing to establish a reasonable migration strategy was a bad business decision likely rooted in bad engineering management that fell out of starting with bad technical decisions.

Facebook is a ten year old company with a market capitalization of $170 billion, so "bad" is probably not the most accurate word to describe their technical and business decisions.

teacup50 · on March 21, 2014

> Facebook is a ten year old company with a market capitalization of $170 billion, so "bad" is probably not the most accurate word to describe their technical and business decisions.

How does that follow, exactly? They haven't failed, so any inefficient or sub-optimal decisions were the correct decisions?

curiousquestion · on March 27, 2014

I guess we should consider your decisions as the "optimal and correct decisions"? Are you a billionaire too?

reikonomusha · on March 20, 2014

A rewrite isn't such a ludicrous idea. Reddit is a prime example of a rewrite from Lisp to Python. I would say that's even a somewhat difficult rewrite.

nbm · on March 20, 2014

The Reddit front-end is a relatively simple application. It has a page with a list of stories. It has a page with one story and a bunch of comments. It has a page to add comments. It has an endpoint to vote up or down stories, and to vote up or down comments. And maybe a few other less-interesting things, like admin interfaces.

Facebook is surprisingly easy to underestimate even as someone using it a fair bit.

Just try to find every single interface in the front-end. For your own profile. Feed is front-and-center, as is timeline. Then look at events, groups, &c.. Look at messaging.

Then look at the interfaces for managing your privacy and permissions. The apps that you're using and information about when last they requested information. Security like login approvals. Then think about the flows involved in reporting content as abusive or inappropriate. For verifying your identity if you've forgotten your password. For adding more security if you log in from a new computer or from a new location.

Then look at pages, including insights and scheduled posts and so forth. Then look at advertising - boosting individual posts, creating campaigns, &c.. Then look at the interfaces for developers. For translators. Interest lists.

The backends for most of this are in C++ and Java. There's a large amount of data processing happening to track hidden things like spam and scam prevention. But the front-end surface area is quite clearly an order of magnitude or maybe two larger than Reddit (at least where it was when this happened).

dirtyaura · on March 21, 2014

Okay, Reddit is a good example of a successful rewrite during a growth phase. I had forgotten that. As I've personal knowledge of several unsuccessful rewrites, it would be great to hear more anecdotes about successful rewrites in growth phase and try to analyse what made them succeed.

A rewrite is more likely to succeed if you are not in a high growth phase, but even then it's risky.

gfodor · on March 20, 2014

Yes and the OP is criticizing the decisions being made now as "yak shaving." Could it perhaps be that the "yak shavers" made a conscious and well-reasoned decision to go in the direction of "extend PHP" vs "throw it all out and rewrite everything in language-of-the-month?"

teacup50 · on March 20, 2014

I'm sure it was conscious and well-reasoned, but the OP's point still stands.

Trying to build a reasonable forward-looking high-performance managed language runtime platform is hard.

Trying to build it atop PHP is harder.

The only way I could see this as being a smart long-term strategy is if the eventual goal is to isolate and retire PHP projects entirely (and PHP usage itself) over time.

However, even then, with PHP gone, and Hack no longer necessary, you're still stuck maintaining your own incompatible VM / runtime. Is Facebook signing up to reproduce the CLR? Or do they have long term plans to somehow bridge the gap between HHVM and more established VMs/runtimes, where they can better share the maintenance load with the wider industry?

sanxiyn · on March 21, 2014

Maintaining your own VM is no big deal. Compared to Facebook, HHVM is a tiny codebase. A team made of a relatively small number of people (high quality, but low quantity) can and do maintain VMs like HotSpot and V8. LuaJIT is maintained by a single person.

teacup50 · on March 21, 2014

> Maintaining your own VM is no big deal.

As someone who has worked on a VM, I couldn't disagree more. It can take years to hash out things as simple as ideally performing primitives for a target architecture, and then things change.

Add to that the complexity of optimizing compilers, specification of byte code formats and a consistent virtual machine memory model that can be relied upon across architectures, and the art and science of highly concurrent garbage collectors, and your "no big deal" is a load of hogwash.

Hotspot alone is nearly 20 years of big deal.

> A team made of relatively small number of people (high quality, but low quantity) can and do maintain VMs ...

The number of people doesn't matter in this equation; your small team of (expensive, rare, high quality people) can't build a world-class VM in a day. Or a month. Or a year. Maybe in 5 or 10 years, just ask Microsoft.

> ... like HotSpot and V8. LuaJIT is maintained by a single person.

LuaJIT's said "single person" has been working on it for what, 10 years? It's an extremely impressive implementation and I don't want to bag on it, but even still, it lags in certain areas, eg, its GC implementation isn't up to par with the state of the art.

The author's skillset is extremely rare, and LuaJIT itself is an anomaly in the field. Using such a one-off example doesn't really hold water to prove that it's ideal for a company to internalize maintaining a VM for their own custom language built on top of PHP.

sanxiyn · on March 21, 2014

I am not trying to belittle the effort necessary for a state of the art VM or programming language implementation. I get paid to do this stuff, and I am on my third VM/PL project now. It is also true that these things take time and are not very parallelizable, so while the man-months may not be that big, you can't make it faster by throwing more people at it.

On the other hand, I maintain it still is no big deal compared to rewriting Facebook. I also maintain that while the skillset is rare, Facebook apparently has had no trouble so far, and will have no trouble in the future, finding (I remind you, a small number of) people to work on the VM. I also remind you Facebook has been working on an alternative PHP implementation for 6 years now, 2 years in private (2008~2010) and 4 years in public (2010~2014). It has been profitable for them for 6 years, will be profitable in the future, and profitability does not need "sharing maintenance load with the wider industry". They can maintain it fine, thank you very much. Because, in the end, VM is no big deal.

teacup50 · on March 21, 2014

If they're wasting money on bad management decisions, they're wasting stockholder money.

They're also continuing to propagate an outwardly facing engineering culture that will make it even harder to hire people to help dig them out of the PHP hole -- perpetuating this further.

Your argument is simply another take on survivor bias fallacy.

> I get paid to do this stuff, and I am on my third VM/PL project now ... Because, in the end, VM is no big deal.

You keep saying that, and yet, there keep being so few high quality VMs.

sanxiyn · on March 21, 2014

What do you consider to be high quality VM? How many do you expect to see and how many do you find?

Adaptive JIT and generational GC would be a good baseline. Limiting myself to open source VMs, I think (at least) HotSpot, Mono, V8, JavaScriptCore, PyPy, SBCL, Racket qualify. J9, CLR, Chakra, Allegro CL also qualify, but are not open source. SpiderMonkey, LuaJIT, HHVM lack generational GC. All these projects are actively developed, and there are doubtlessly more, e.g. I am not familiar with Smalltalk VMs, some of which are commercial. Research VMs like JikesRVM, Maxine qualify. I believe Bartok qualifies too.

I am not sure what you are arguing for. If you are arguing for the Quercus route (PHP-on-JVM), I think it's unclear that the Quercus route is better than the HHVM route. If you are arguing for not running the existing PHP codebase, I think you are being unrealistic.

gfodor · on March 21, 2014

It's not survivorship bias. Facebook is an existence proof that there is no "PHP hole" that they are in, that it's largely a myth propagated by programming language nerds who have never tried scaling a site in PHP. When was the last time you heard about a site closing up because of PHP-induced technical debt? You don't. People re-write sites because of poor architecture, not because of poor programming languages, and PHP (in general) does not prevent you from building a site with good architecture, both from a software structural standpoint and an operational standpoint.

PHP's APIs are ugly. Its language semantics are a bit hairy until you get the hang of it. But there are parts of PHP that are extremely elegant and easy to reason about. Its OOP support provides all that you need to produce re-usable and easily understood code.

Facebook's work on PHP has focused on largely two dimensions: reducing CPU cycles and increasing static/runtime type checking. The former is something that only really matters at massive scale: PHP is generally fast enough since most of the time PHP processes are I/O bound reading from a database or memcached. It's only for sites like Facebook where if you squeeze out an additional 10% TPS from your boxes that you will start seeing large absolute cost reduction that this level of optimization starts to make sense. On the type checking side, this is something you might start to want in any dynamically typed language when you have millions of lines of code and want to ensure basic guarantees that it will run, and is something that you'd probably see Facebook doing if they were a Ruby or Python shop anyway. It has nothing to do with PHP but with the classic dynamic vs static typing tradeoff.

Should you be writing your chat server in PHP? No. But 90% of the code you write for a large website is HTTP response code rendering HTML or JSON. PHP excels at this and you can pretty much hire any developer off the street to start cranking out code if you give them a solid foundation to build on.

vidarh · on March 21, 2014

Facebook has already proven that they are able to make improvements that have substantially helped them to the point where this team is likely paying for itself many times over. It doesn't need to be perfect - it needs to offer return on investment, and it has.

It's possible that they could eventually get a total rewrite to give a better return, but frankly I don't think you have any idea of the enormity of trying to convert a multi-million line production platform from one language to another.

In any case, one does not preclude the other. Arguably, many of the changes they have made, such as gradual typing, and their ability to now slowly introduce other changes without breaking their existing codebase, mean they have a platform for slowly firming up their codebase and migrating it towards a position where a full rewrite (should they decide to do one in the future) could be made substantially less painful.

reikonomusha · on March 20, 2014

My example, "OCaml", is not a "language of the month." Its roots are >30 years old and it wasn't developed by someone in their basement over the weekend. As stated, Facebook even uses OCaml, among other languages, for good reason.

vidarh · on March 21, 2014

You try hiring and/or retraining enough engineers to be able to make a switch to OCaml, and see how much it'll cost you.

I detest PHP, but I've still more than once made the choice to do apps in PHP motivated by developer availability alone.

It's not a great language, but with some discipline it is also not nearly as awful as some people like to think, and you can make up for a lot of awful with the difference in ability to hire experienced engineers who know PHP vs. many of the less common languages.

codygman · on March 22, 2014

I don't like how your comment implies that "throw it all out and rewrite everything in language-of-the-month" is the only option. You could also "throw it all out and rewrite everything in better-AND-mature-language" or even "throw the worst parts out and rewrite incrementally in better-AND-mature-language".

reikonomusha · on March 20, 2014

I just want to clarify something: I am not calling anyone stupid. Generally the engineers at Facebook are smart, talented, and good at getting things done.

camus2 · on March 20, 2014

I think PHP has some goodness that makes it the poor man's Java.

Neither Ruby nor Python provides that, for instance.

Though Ruby is in my opinion well designed, and with duck typing you might not need all that Java-like OOP, I don't know, I feel like having interfaces makes me understand code faster and better.

Want to understand what an object does? Just read the interface: no magic, no bullshit. It's self-documenting, and that's important; that's why you can have these huge codebases like Zend, Symfony or Doctrine and still understand how complex elements work together. And even figure out how to use them without docs, just like Java.

I feel I can't go to Django's source code or Rails' source code and understand everything easily just by reading function signatures.

And the write/refresh cycle makes iteration pretty fast during development. But yeah, PHP's basic syntax sucks for sure.

zmanian · on March 20, 2014

Facebook is doing an admirable job replacing rotting pieces of their infrastructure with more robust replacements.

HHVM replaced the execution environment for their code with a more robust code generation/ runtime system.

Hack allows them to bootstrap their code base into higher degrees of reliability without a mass rewrite.

Also Bryan O'Sullivan is one of the best people on the face of the earth to be trying to find practical ways of integrating PL research into practical engineering.

ChuckMcM · on March 20, 2014

Are not coral reefs both beautiful and built upon the dead bodies of those who came before?

gtirloni · on March 20, 2014

People also cut themselves easily with them.

tripzilch · on March 20, 2014

Yes, but unlike Facebook, they are largely submerged in seawater.

chenelson · on March 21, 2014

Facebook is literally 60% water, whence they came. Pretty wild when you think about it.

solipsism · on March 21, 2014

That 60% figure pertains to average humans. Those skinny hipsters are mostly calcium :)

tripzilch · on March 21, 2014

Like coral reefs! And hipsters are built from the dead bodies of vintage things (that you probably never heard of because they're pretty underwater/ground/whatever).

Ok, I cede the point. PHP has the retro chic and Hack's type system is fixed-gear.

coldtea · on March 20, 2014

>I have wondered why FB didn't use a proper language with proper typing to begin with.

Because there are factors like an existing codebase that works that trump BS idealism.

United857 · on March 20, 2014

(Disclaimer: FB employee.)

Do you also argue that C++ is built on top of a crumbling foundation, because it's based on C, an ancient language with almost no type-safety?

bluetech · on March 20, 2014

Why do you say C has no type safety? Only way around the type system is void*, explicit casts and I guess unions.

It admittedly doesn't have a very advanced type system, that's true.

munificent · on March 20, 2014

> Only way around the type system is void*, explicit casts and I guess unions.

And typedefs. Given that there are no parametric types, you run into void* quite frequently as well, so saying "only" inaccurately minimizes the scale of how much C code isn't strictly type safe.

bluetech · on March 20, 2014

Right, void* is used quite a lot for "generic" data structures. But I'm not sure that's what he meant. The reason I said "only" was that from my own experience, most C data structures are tailored for a specific use and so I don't see void* too much in this context.

And may I ask why you say "typedef" is unsafe? It is merely a type alias, like e.g. Haskell's and ML's "type", or isn't it?

munificent · on March 20, 2014

> It is merely a type alias, like e.g. Haskell's and ML's "type", or isn't it?

It is, but it freely allows conversion between the aliased types.

    typedef int Feet;
    typedef int Meters;

    int main() {
      Feet height = 6;
      Meters inEngland = height; // <-- OK. :(
    }

anaphor · on March 20, 2014

I'm pretty sure you and the parent were thinking of Haskell's `newtype` not `type`

    λ> :{
    Prelude| type Feet = Int
    Prelude| type Metres = Int
    Prelude| :}
    λ> let a = 5 :: Feet
    λ> let b = 4 :: Metres
    λ> :t a
    a :: Feet
    λ> :t b
    b :: Metres
    λ> a + b
    9
    λ> let f = id :: Feet -> Feet
    λ> f b
    4
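For comparison, here is a rough Python sketch of the same distinction. Nothing in it comes from the thread beyond the unit names: `FeetAlias`, `MetresAlias`, and the `Feet` and `Metres` wrapper classes are made up for illustration. An alias is interchangeable with the underlying type, while a wrapper is a genuinely separate type, which is roughly what `newtype` buys you in Haskell. The difference is that Python only rejects the mix at runtime.

```python
from dataclasses import dataclass

# Plain aliases: interchangeable with int, like C's typedef or Haskell's `type`.
FeetAlias = int
MetresAlias = int


@dataclass(frozen=True)
class Feet:
    """Distinct wrapper type, loosely analogous to Haskell's `newtype`."""
    value: int

    def __add__(self, other):
        # Only Feet + Feet is allowed; anything else is rejected.
        if not isinstance(other, Feet):
            return NotImplemented
        return Feet(self.value + other.value)


@dataclass(frozen=True)
class Metres:
    """Second wrapper type, deliberately incompatible with Feet."""
    value: int

    def __add__(self, other):
        if not isinstance(other, Metres):
            return NotImplemented
        return Metres(self.value + other.value)


def mixing_is_rejected() -> bool:
    """Return True if adding Feet to Metres raises TypeError."""
    try:
        Feet(5) + Metres(4)
    except TypeError:
        return True
    return False
```

With the aliases, `FeetAlias(5) + MetresAlias(4)` is ordinary integer addition, just like `a + b` in the GHCi session above. With the wrappers, the same mix raises a `TypeError`.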

azth · on March 20, 2014

C++ attempted to fix type safety issues in C.

reikonomusha · on March 20, 2014

You don't need to add a disclaimer about your workplace. It doesn't contribute at all to your question.

I do not think C is a "crumbling foundation". I would not suggest that C be used for large scale engineering efforts, but it's a relatively well-defined language with semantics dictated by a formal standard. A lot of research has gone into C compilers, which are state-of-the-art.

With that said, one of the biggest complaints I hear about C++ is that it still has the legacy of C embedded in it.

encoderer · on March 20, 2014

Something I didn't really learn in school, that I only picked up much later in my career, is that when applicable, applications should be built on top of mathematical models. A very contrived example would be.. would you build an ad rotator by keeping counts of all banners served and picking the next banner based on those counts and your weighting rules, or would you build the rotator on a foundation of statistics and probability with a little extra logic for handling caps and edge cases?
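To make the second option a bit more concrete, here is a minimal Python sketch of a probability-based rotator. The function name `pick_banner`, the weights, and the cap layout are all invented for illustration. It draws a banner at random in proportion to its weight and only falls back on counts to enforce caps.

```python
import random


def pick_banner(weights, served, caps, rng):
    """Pick a banner at random in proportion to its weight, skipping capped ones.

    weights: banner name -> relative weight (hypothetical campaign settings)
    served:  banner name -> impressions served so far (updated in place)
    caps:    banner name -> maximum impressions (missing means uncapped)
    rng:     a random.Random instance, passed in so runs can be reproduced
    """
    eligible = [b for b in weights
                if served.get(b, 0) < caps.get(b, float("inf"))]
    if not eligible:
        return None  # every banner has hit its cap
    choice = rng.choices(eligible, weights=[weights[b] for b in eligible], k=1)[0]
    served[choice] = served.get(choice, 0) + 1
    return choice
```

Note that the "extra logic for handling caps" is still a count of impressions served; only the selection step itself is probabilistic.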

yid · on March 20, 2014

> would you build an ad rotator by keeping counts of all banners served and picking the next banner based on those counts and your weighting rules, or would you build the rotator on a foundation of statistics and probability with a little extra logic for handling caps and edge cases

Actually, building on a foundation of solid statistics and probability will often result in an algorithm that essentially counts views and applies weighting rules.

As another example, Pagerank has a nice theoretical basis, but power iteration reduces to perhaps 5-10 lines of C.
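A rough Python sketch of that power iteration, to show how little code it takes. This is an illustration, not anyone's production implementation: the damping factor of 0.85, the fixed iteration count, and the requirement that every link target also appears as a key in `links` are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
           Assumes every link target is itself a key of the dict.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

Most of the length here is the dangling-page case and the docstring; the core update is the three lines inside `if outgoing:`.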

linc01n · on March 20, 2014

I built an ad engine based on probability. But I am not sure which way is better.

wes-exp · on March 20, 2014

Because http://en.wikipedia.org/wiki/Worse_is_better

jimbrusstar · on March 20, 2014

This seems an unequivocal improvement for Facebook, since they're unwilling to move away from PHP. The better question is why would anyone else choose to build their company on this?

hub_ · on March 20, 2014

For the same reason there are companies writing software for Windows. Or companies putting Windows on all their workstations.

anaphor · on March 20, 2014

As far as I can tell they are using local inference, which is basically just unification. The set of possible types seems pretty narrow[1] as well so I don't see much room to go wrong. You're right that there is a lot of mathematical theory about type systems and that inference can easily go wrong (be undecidable) but that is mostly for type systems that try to do inference for higher rank polymorphism and other things, which it doesn't seem like Hack is. Also I guess the language is supposed to be a superset of "valid" PHP, although I don't know whether this is true without modifying the PHP program much.

[1] http://hacklang.org/manual/en/hack.annotations.types.php
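To make "unification" concrete, here is a toy Python sketch. It is not a description of how Hack's checker works, and the representation is invented for illustration: atoms are strings like `"int"`, type variables are strings starting with `'`, and constructors are tuples such as `("fn", arg, ret)`. It has no notion of subtyping at all.

```python
def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while isinstance(t, str) and t.startswith("'") and t in subst:
        t = subst[t]
    return t


def occurs(var, t, subst):
    """True if type variable `var` appears inside type `t`."""
    t = resolve(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst=None):
    """Return a substitution making `a` and `b` equal, or raise TypeError."""
    subst = dict(subst or {})
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    # If either side is a type variable, bind it to the other side.
    for var, other in ((a, b), (b, a)):
        if isinstance(var, str) and var.startswith("'"):
            if occurs(var, other, subst):
                raise TypeError(f"infinite type: {var} in {other}")
            subst[var] = other
            return subst
    # Same constructor and arity: unify the arguments pairwise.
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")
```

For example, `unify(("fn", "'a", "int"), ("fn", "string", "'b"))` binds `'a` to `"string"` and `'b` to `"int"`, while `unify("int", "string")` raises.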

samth · on March 20, 2014

Local inference isn't just unification. In particular, most local inference algorithms are designed to work with subtyping, which doesn't work in ML-like type systems.

DeGuerre · on March 21, 2014

I assume that "local inference" means, in practice, "no let-polymorphism". That's what causes most of the headaches with extensions to Hindley-Milner (including the undecidability of subtyping).

anaphor · on March 20, 2014

True, I didn't see any mention of subtyping on there but since it's PHP I guess they have to deal with it somehow.

ausjke · on March 20, 2014

I have seen many Python programmers who are PHP haters; are you one of them? Just curious, no offence. I like both Python and PHP, but use PHP for commercial applications.

reikonomusha · on March 20, 2014

I very occasionally write Python code if my job calls for it, but I am not a fan.

reikonomusha · on March 20, 2014

Spelling correction: I said "layering FP" but I meant "layering PHP". The edit window has since come and gone.

cyberneticcook · on March 21, 2014

>> I am baffled as to why you'd build your castle atop a crumbling foundation

I think you're using a wrong metaphor. Facebook foundations can't be crumbling just because they're made in PHP. There wouldn't be Facebook as we know it today otherwise. You might say they used a "low quality" material to build them. I see Hack more as a better material, that can also bind with the previous one and make it stronger.

rafekett · on March 21, 2014

I think the short story is that the engineers at Facebook feel pretty productive with their stack today. Making a new language that basically fixes up PHP is ideal for them because it gives them good confidence they can get the benefits of static analysis without sacrificing much productivity. That level of productivity + the sheer numbers they have make Hack a more attractive option.

ahomescu1 · on March 20, 2014

> (3) a corresponding runtime for each

That's not really true, HPHP and HHVM share the runtime (mostly).

vishnugupta · on March 21, 2014

If I recall correctly PHP is banned at Amazon, even for internal tools, mainly for security reasons. The team that gives security clearance won't even take a cursory look at the service if it depends on PHP in any way.

Disclaimer: I have zero experience with PHP.

zaidf · on March 20, 2014

> But at the same time, layering FP with a home rolled static type checking server (??) is bug prone

Clearly the home rolled stuff is working out for facebook. Twitter was/is on RoR and continues(!!) to have major downtime issues.

vidarh · on March 21, 2014

Twitter could have been written in any language and worked, but given their old architecture decisions, they'd have been just as likely to mess it up in every language.

They blamed RoR a lot because they needed an explanation for their problems, but "many to many" messaging even at their scale is a "solved" problem and has been for decades and is fairly easy to scale.

(Think of Twitter as a bunch of mailing lists and mailboxes; you scale it by decomposing it: map accounts to virtual "buckets" that becomes the domain part, and map tweets to messages; break apart large follower lists into smaller ones and introduce a forwarding reflector; break apart large following lists by splitting "mailboxes" and doing zipper merges on reading it; add a caching layer -- this is not rocket science, and you could do it properly in any language)
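A minimal Python sketch of the "zipper merge on read" step, to show how small that piece is. The name `read_timeline` and the storage layout (each split mailbox is a list of `(timestamp, tweet_id)` tuples already sorted newest-first) are assumptions made for illustration, not a description of Twitter's actual system.

```python
import heapq
from itertools import islice


def read_timeline(mailboxes, limit):
    """Merge per-shard mailboxes into one newest-first timeline.

    mailboxes: list of lists of (timestamp, tweet_id) tuples, each list
               already sorted newest-first (hypothetical storage layout).
    limit:     maximum number of entries to return.
    """
    # heapq.merge is lazy, so only `limit` items are pulled from the shards.
    merged = heapq.merge(*mailboxes, key=lambda item: item[0], reverse=True)
    return list(islice(merged, limit))
```

Because the merge is lazy, reading the first page of a timeline touches only the head of each mailbox, however long the mailboxes are.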

Note: I think RoR was a horrible choice for them, though I love Ruby, but I also don't for a second believe RoR was their real problem.

Edit: Your overall point stands, though. Especially given that Facebook is a far more complicated application.

Xorlev · on March 20, 2014

Twitter might have some legacy Rails hanging around, but it's a beastly Scala system these days.

wbsun · on March 21, 2014

Because your perfect OCaml, ML, Haskell, and any other fancy, magical, fabulous, eternal, fantastical, simplest, elegant ... languages are all atop crumbling foundations implemented in ugly, stupid, out-dated, evil, chaffy ... C, C++, and assembly running on inefficient, silly and fragile digital logical CPUs.

And when Facebook uses this stupid technique to build the world's largest social network for more than one billion users, those elegant and perfect solutions are serving ... how many?

joesmo · on March 21, 2014

"And in the end, it's still PHP, which is duly disliked."

You might dislike it, but that doesn't mean it's disliked. PHP has a giant base of programmers, scales, is easy to learn, is extremely versatile and powerful, and as you point out, the code was already in PHP. Only an idiot would rewrite a giant working codebase simply to have it in a language that's "difficult to hire and train" in. Or any language for that matter. Perhaps they could have pulled a netscape but instead decided to serve billions.

Fasebook · on March 21, 2014

What if you could build a castle on top of a highway?

bos · on March 20, 2014

I'm the manager of the team that developed Hack, and I'm sitting here with some of the language designers. Happy to answer your questions.

freyrs3 · on March 20, 2014

Hi Bryan, I know most people know you from your prolific work on many great Haskell libraries ( Criterion, Attoparsec, Aeson, ...). Did Haskell have any role in the development of Hack? Looking at the code base it seems like the type system is primarily written in ML, what made the team decide to use OCaml over Haskell?

bos · on March 20, 2014

As you note, the team developed the typechecker in OCaml, as that's what the founding engineers were familiar with. Many of ML's cousin languages happen to be well suited to this kind of work.

more_original · on March 20, 2014

Have you considered using some standard library replacement like Core?

mintplant · on March 20, 2014

Say I'm starting out with an entirely new project and want to leave the legacy of dynamic typing behind. Is there a flag available to enforce the use of type signatures, causing Hack to throw a compilation error when they're omitted?

bos · on March 20, 2014

Yes, "<?hh // strict" at the top of a file will do the trick.

mintplant · on March 20, 2014

Great! Thanks.

ecaron · on March 20, 2014

Do you imagine a future where Hack will merge back into PHP (like PHP 7), in the same style that Beryl & Compiz then rejoined? Or does the team intend for the two to always be adjacent-yet-separate?

SaraMG · on March 21, 2014

HHVM developer & PHP runtime developer here. I've got hands in both runtimes and all even I can say is: Maybe.

I think the most likely outcome is that PHP will adopt some of HHVM's additional features, but remain a separate project. IMO that's a great outcome, since we'll both likely drive the other to be better.

Joeri · on March 20, 2014

Hhvm is not a fork of php. To my knowledge it's a radically different codebase that reimplements php's api.

tveita · on March 20, 2014

My impression is that Facebook mostly write their stand-alone services in Java or C++, and are using PHP only where they're "stuck" with it due to a large existing code base.

Do you think Hack is a good language to start a new project in, compared to non-PHP languages? Are you using Hack for things besides the main web page?

jwatzman · on March 20, 2014

Engineer working on Hack here.

Yeah, I think Hack is a good language to start a new project in. For as much flak as PHP gets, there are actually a lot of good things about the language. The fast development cycle -- edit php script, refresh -- is something amazing that you don't get in a lot of statically typed languages, which usually have a compilation step. The crazy dynamic things you can do also occasionally have their place, though it's certainly easy to shoot yourself in the foot.

On the other hand, a lot of the time you want the safety that strong static typing can give you. Even just the null propagation checking can immediately find tons and tons of silly little bugs without even running the code, and ensure that the code stays consistent as a "mini unit test" if you will.

Hack hits the sweet spot of both. Wiring the Hack typechecker into vim was really revolutionary for me -- having both the immediate feedback of the type system for all the silly bugs that I was writing, along with the fast reload/test cycle from PHP, is great.

eevee · on March 20, 2014

Er, `paste serve --reload` restarts small-to-medium Python projects faster than I can alt-tab, which is actually faster than my static blog engine can regenerate itself too.

eugenez · on March 20, 2014

The Facebook codebase is not exactly a small-to-medium project, so there is a fair bit of value in edit-and-reload.

eevee · on March 20, 2014

The parent comment was about Hack's being good for a new project.

strlen · on March 20, 2014

Is there a statically typed variant of Python that would work with existing web servers, etc..? I am aware that there's Cython and I know that py3k technically permits type annotations (which Jetbrain's Python IDE uses quite effectively), but that isn't true static type checking in the same way as Hack does this.

eevee · on March 20, 2014

Statically-typed Python would ruin a great number of Python libraries you'd probably want to be using. It'd be a very different language.

Once you have type annotations, it wouldn't be too much of a stretch to enforce them statically with a separate tool. You could even go as far as rejecting first-party code if you can't statically determine every single value's type. Pylint's underlying astroid library has a bunch of inference tools you could perhaps build on top of.

strlen · on March 21, 2014

Such a tool would be no less difficult to build than hack itself, though. Hack's "gradual typing" solves the problem of re-using existing code that you've mentioned.

FB already had a PyLint-like tool earlier that could do some static analysis, namely pfff (also open source and written in OCaml), but it did not provide a full-on static type system like Hack. (Background: I used pfff when I was in bootcamp at FB itself. This was however prior to hack, I worked solely on C++, Java, and a bit of Python at FB after bootcamp).

I am sure if FB started off with Python, a similar solution could have been found, but if you're looking for a tool that exists _right_ now, Hack is actually quite decent.

Creating a static type system, implementing local type inference, and working out "gradual typing" and its associated problems (all while being able to do type-checking at the speeds developers _expect_) is not a trivial problem.

eevee · on March 21, 2014

The announcement post says the actual type-checking happens in a persistent server that watches for filesystem changes. That sounds pretty close to continuously running a linter. It also says "without breaking things", so I'm a little fuzzy on whether badly-typed code will actually execute or not.

For that matter, can you call a typed function from an untyped one? Or, worse, a typed method? If the typing is purely static, there's no way to know the method you're calling is actually typed, so there's no actual guarantee that it receives the types it's declared unless your entire program is typed. It doesn't seem like a very strong guarantee if both the caller and the callee have to opt into the typing.
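Python's annotations show the failure mode I mean, since the interpreter doesn't enforce them. This is only an analogy and says nothing about what HHVM does at runtime; the function names are made up.

```python
def shout(text: str) -> str:
    """Annotated: a static checker would expect `text` to be a str."""
    return text.upper() + "!"


def untyped_caller(value):
    # No annotations here, so a purely static checker has nothing to verify
    # about `value`; whatever arrives is passed straight through.
    return shout(value)


def call_with_bytes():
    """The annotation says str, yet bytes get in and only fail at `+ "!"`."""
    try:
        return untyped_caller(b"hi")
    except TypeError as exc:
        return type(exc).__name__
```

The bad value gets past the annotated signature and fails partway through the function body, so the declared type guaranteed nothing by itself.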

If you're looking for a tool that exists right now, you either have an existing codebase and can't port it to Hack if it's not already PHP anyway (for the same reason Facebook couldn't port away from PHP), or you're starting from nothing and could just use a statically-typed language in the first place.

I don't know if I'd even be excited about the prospect of optional static typing in Python. (It hasn't gotten me interested in Dart, for example.) I'd kinda rather see the effort poured into something that could do static duck-typed analysis/inference, e.g. balk if I pass an argument that could be a non-string into a function that tries to call `.startswith` on it. (Ah, but maybe it could theoretically be a string or None, and I only know it isn't None for reasons the type system can't see, and now I hate the type system.)
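A small sketch of the `.startswith` case, using Python's `Optional` annotation. The function `is_comment` is hypothetical, and which tools would actually flag the unguarded version is not something I'm asserting here.

```python
from typing import Optional


def is_comment(line: Optional[str]) -> bool:
    """Return True if `line` is a string starting with '#'."""
    # Without this guard, a checker that tracks None would have grounds to
    # reject the call below, because `line` might not be a str.
    if line is None:
        return False
    return line.startswith("#")
```

The annoying case is the one where I already know the value can't be None, and the guard exists only to satisfy the checker.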

I didn't say it was a trivial problem. I just don't feel excited by the solution.

strlen · on March 21, 2014

The default mode in Hack is partial: in partial mode, the code itself must be typed and must pass the typechecker, but it can call untyped code (that's in a separate compilation unit).

Another mode (you specify the mode per file/compilation unit) is "strict". In strict mode, you cannot call any untyped code (note, the standard library is typed with Hack).

(There is a bit more nuance here, but you can read that in the documentation.)

So the idea is to eventually migrate most of the code to strict, but code that relies on legacy can remain partial and you can write new code without waiting for a re-write to finish.

See http://docs.hhvm.com/manual/en/hack.modes.strict.php http://docs.hhvm.com/manual/en/hack.modes.partial.php

"Shapes" are also a neat feature specifically for parts where static typing can be frustrating for dealing with HTTP requests specifically: http://docs.hhvm.com/manual/en/hack.shapes.php

(FWIW I don't see myself using Hack, but I'm not a web developer. I'd say the ML family languages are my favourite, but for what I do day-to-day it's not really an option.)

hyp0 · on March 21, 2014

Do you know if other languages have static null checking i.e. your null annotation/propagation (apart from Haskell's Maybe union type etc)?

I'm intrigued, because it's such a good idea (especially when null's originator claimed it was a "billion dollar mistake"), though Java doesn't have it. I'm wondering if there's some subtle problem with it...?

Also, how do you make your vim typechecker fast enough? Usually, even syntax colouring is local to ensure adequate performance - and a typechecker with inference/propagation would be very non-local.

wffurr · on March 21, 2014

Java 8 adds this exact feature through extending annotation capability and adding hooks for pluggable type checkers, including a null propagation checker: http://docs.oracle.com/javase/tutorial/java/annotations/type...

and an Optional<T> class (references to which can still be null for maximum hilarity): http://download.java.net/jdk8/docs/api/java/util/Optional.ht...

I'm stoked. And disappointed the "elvis" operator didn't make it in.

I hope some day Java breaks backwards compatibility and eliminates null entirely. Then again, that's already happened with the proliferation of other JVM languages. But that doesn't help me at my day job, where we have a large legacy code base... which would need to be ported to a backwards-incompatible version of non-null Java anyway. Hm.

Well, with Java 8 I can at least start to grow null-safe code within our codebase.

copergi · on March 20, 2014

>The fast development cycle -- edit php script, refresh -- is something amazing that you don't get in a lot of statically typed languages

You seriously think that? That's how we do haskell web development. Both yesod and snap do this out of the box. That's how every java developer I know works.

bos · on March 20, 2014

I have a little bit by way of Haskell chops, and I'll venture that the performance of the Hack typechecker is a very big deal, and it is in a different breed than the turnaround time you get from snap or yesod (or Java).

jfischoff · on March 20, 2014

what makes it a different breed?

Edit: I misread what you wrote. I thought you were saying there was something fundamentally different about the type checker, but it's more that the reload experience is different than what you get with Yesod.

copergi · on March 20, 2014

"Type checking is faster than other statically typed languages" is quite different from "other statically typed languages don't offer this workflow", which is what was claimed.

matsemann · on March 20, 2014

How do you do it in java? Compile (takes time) -> hotswap (takes time), or can't hotswap since changes to signature, will need to restart server (takes lot of time).

copergi · on March 21, 2014

http://docs.codehaus.org/display/JETTY/Maven+Jetty+Plugin

matsemann · on March 21, 2014

Yeah, that's the one we're using. Still takes a couple of seconds, though. And as I said, big changes can't be reloaded, so the whole server will have to be restarted.

copergi · on March 21, 2014

>Still takes a couple of seconds, though

How big is your code base, and is it all in one huge file or something? Our stuff is reloaded and ready before I've alt+tabbed back and hit refresh.

wiremine · on March 20, 2014

Thanks for taking the questions!

How extensively is Facebook using Hack at this point? Is it in production?

What has been the biggest learning/unlearning you've needed to do going from PHP to Hack?

bos · on March 20, 2014

100% of our web front end developers use Hack now. This has been an organic process of growth over the past year, by which I mean our engineers are using it because they like it and see value in it, not because there's someone standing over them with a big stick :-)

The biggest learning step for our engineering teams was to treat type errors from Hack as actual logic errors. We have a collection of "linters" that provide advice on code style and other nice-to-have factors. Some people (quite reasonably) initially thought of Hack errors as lint-like stuff that it was safe to ignore, when in fact they indicate real logical inconsistencies in code.