How PHP Opcache Works (npopov.com)
151 points by orangepanda on Oct 18, 2021 | 82 comments



Could CPython take a page out of this playbook and become faster?

Whatever PHP does (I am not a language expert), they are doing it right. Whenever I write projects in PHP and think "If this approach is too slow, I will optimize it later", I get surprised like "WOW... it takes just a few milliseconds!".

Whenever I write projects in Python and think "This should not result in a noticeable workload", I get surprised like "Uhmm.. this takes seconds?".

Python the language is so much better than PHP. But the low performance of CPython is hard to swallow. In the context of answering web requests, there is no room for execution times of more than a few milliseconds. So as things are, Python development is way more costly because you constantly have to optimize performance bottlenecks.


The part that is relevant for CPython has been implemented by it since at least 2.0. The rest is a horrible hack that involves abusing shared memory to share what effectively are parts of the GC'd heap between unrelated interpreter instances. The usefulness of that hack is completely predicated on PHP's execution model, where there is no persistent application instance and everything gets torn down after each request.
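For readers less familiar with the CPython side: the part that CPython already does is presumably its on-disk bytecode cache (the `.pyc` files under `__pycache__`). A minimal sketch using only the standard library; the module name `hello.py` and the helper `compile_to_cache` are made up for illustration:

```python
import importlib.util
import os
import py_compile
import tempfile


def compile_to_cache(source_text):
    """Write source_text to a temporary module and compile it to a cached .pyc.

    Returns (source_path, pyc_path). The file name hello.py is hypothetical.
    """
    workdir = tempfile.mkdtemp()
    source_path = os.path.join(workdir, "hello.py")
    with open(source_path, "w", encoding="utf-8") as fh:
        fh.write(source_text)
    # py_compile.compile writes the bytecode file and returns its path.
    # With no explicit cfile it uses the same location the import system uses.
    pyc_path = py_compile.compile(source_path, doraise=True)
    return source_path, pyc_path


source_path, pyc_path = compile_to_cache("GREETING = 'hi'\n")
# cache_from_source reports where the import system looks for cached bytecode.
expected_path = importlib.util.cache_from_source(source_path)
```

Unlike opcache, this cache lives on disk and each interpreter process still loads it into its own heap; nothing is shared between processes.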


I actually love the approach to tear down everything after each request.

It makes reasoning about the system much easier.


I first came across the term 'shared nothing' in... 2000 or 2001? (John Lim from adodb project, but I'm sure he didn't coin it). There is a difference in thinking required for shared nothing, but as you say, it does typically make reasoning easier - fewer moving parts to potentially complicate your processing.


What I love about this approach is that if I fuck up somewhere specific and a user triggers that part then the whole app doesn't go down.


This actually reminds me a bit of CGI ( https://en.wikipedia.org/wiki/Common_Gateway_Interface ) or its later variants.

Personally, I think that PHP's request-response workflow is infinitely easier to deal with than that of servlets in Java or whatever abstractions other technologies out there have, even if only when scaling isn't a concern (which is perfectly passable for some systems out there).

My ideal language for getting things out the door would be something a bit like PHP, but with the standard library of .NET/Java (LINQ or Streams in particular), the wider ecosystem of Python, simplicity of Go and a decent type system on the top of it all, that'd process requests much like PHP does (or at least would only make you think it does, though abstractions that are leaky are arguably worse than none at all).


PHP's programming model definitely should remind you of CGI, because it was a CGI module: https://en.wikipedia.org/wiki/PHP#Early_history


Until you need to talk to a DB, where you may want a persistent connection, so hacks are required in php-fpm to support it.


PHP with the Java std lib is called JSP, with .NET it's called ASP; the request model is similar and the packaging can be quite minimal, even if good practice favors complexity.

Also I don't think that you can have a simple language like Go and a decent type system like Typescript, to me those properties are in opposition.


> would be something a bit like PHP, but with the standard library of .NET

Try the PeachPie compiler then ;)


Indeed. But the difference between PHP and modern Perl/Ruby/Python/Java/whatever frameworks that use some kind of application container is that in the latter case you know what is shared and work in a world with shared things, while in PHP nothing is shared except for implementation-defined random things that you have no idea might be shared state.


To do this in Python, couldn't you just fork the Python process after initializing the web framework etc. and handle the request in the forked process? Optionally keep a pool of pre-forked worker processes so that the request doesn't have to wait for the forking. Sort of like Android's zygote process.
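A rough sketch of the idea, assuming a Unix system where `os.fork()` is available. `expensive_init` and `handle_request` are hypothetical stand-ins for the framework setup and the request handler; a real server would also need sockets, error handling and a worker pool:

```python
import os


def expensive_init():
    """Stand-in for importing the framework and building routes (hypothetical)."""
    return {"routes": {"/": "hello"}}


def handle_request(app, path):
    """Stand-in request handler; returns the response body as bytes."""
    return app["routes"].get(path, "not found").encode()


def serve_in_child(app, path):
    """Handle one request in a forked child and return its output to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits the already-initialized app via copy-on-write memory.
        os.close(read_fd)
        try:
            os.write(write_fd, handle_request(app, path))
        finally:
            os.close(write_fd)
            os._exit(0)  # throw the whole process away after one request
    # Parent: collect the response and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        body = reader.read()
    os.waitpid(pid, 0)
    return body


app = expensive_init()  # paid once, before any fork
response = serve_in_child(app, "/")
```

Whatever the child changes in memory disappears when it exits, which is the shared-nothing property discussed above.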


In principle, yes. But forking an application server gets tricky -- some resources (like database connections) aren't fork-safe, and need to be recreated after forking.
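One way to handle the "recreate after forking" part in Python is `os.register_at_fork` (Python 3.7+, Unix only). The sketch below uses a hypothetical `FakeConnection` class in place of a real database driver, so it only shows the hook mechanics:

```python
import os


class FakeConnection:
    """Hypothetical stand-in for a database connection; records its owner PID."""

    def __init__(self):
        self.owner_pid = os.getpid()


# Holder for the process-wide connection, created before any fork.
state = {"connection": FakeConnection()}


def reconnect_in_child():
    """Replace the inherited connection with one owned by the child process."""
    state["connection"] = FakeConnection()


# Run the hook in every child created through os.fork().
os.register_at_fork(after_in_child=reconnect_in_child)


def child_sees_fresh_connection():
    """Fork once and report whether the child's connection belongs to the child."""
    pid = os.fork()
    if pid == 0:
        ok = state["connection"].owner_pid == os.getpid()
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The parent keeps its original connection; only the forked children get a fresh one.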


Fork is kind of expensive on most OSes. Even though it's relatively cheap compared to starting a new process from scratch, it's still much slower than a few function calls.


Well, yes, but the point was to get a process per request just like PHP, so you'll have to start those processes one way or another.

Anyway, it appears uwsgi can be configured to do this, using fork-server[1] and maybe max-requests[2], although it only works with Perl at the moment.

[1] https://uwsgi-docs.readthedocs.io/en/latest/ForkServer.html

[2] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#max...


The whole PHP opcache hack works reliably only on platforms that have fork(2).


Me too, it's amazingly simple. It's a radically different language but if you really like that, you should check out Elixir/Phoenix.


Seconds? What libraries are you using?

I develop professionally in PHP but use Python for most of my personal projects, and for scripting text files. Sometimes I'll feel a few hundred milliseconds' delay and be surprised, but unless we're talking about a loop that probably should not loop, seconds is a long time.

If you could share more of your workflow I'll be happy to advise.


Seconds are extreme examples of course, where heavy data processing is taking place.

My guess is that CPython is something like 4x slower for typical code and even slower if most of the code is control structures like branching, loops, function calls etc.


https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

There's a small set of benchmarks between php8 and python 3.9. Your 'guess' isn't too far off.


I have tried to move some stuff from PHP to Python (using Flask for example - I don't want to imagine the performance of Django) and the difference in performance is pathetic. Especially given the reputation of both languages. Python is still easier so if you can throw money at AWS it is a better option.


Were you maybe using Python on AWS Lambdas? That would be relatively slow, since it takes away the advantage Python/Flask would have vs PHP...that it's already warm.


In PHP 7.4 PHP can be configured to precache (typically framework) files. Thus with mod_php (as opposed to FPM), Apache and the Zend engine are already warm.


Can you be more specific and list out why 'Python the language is so much better than PHP'?



This article is horribly outdated, and generally very ignorant. This site was the response to that blog, https://phpbestpractices.org/


It's not saying anything about why python is 'good' or 'easy' or whatever, which was the original claim.

Personally, I find very little attractive or easy or 'good' about Python. I've used it on a few occasions, and can get by, but pretty much nothing that people find attractive about it resonates with me. Couple that with answers to "why do you like python?" answered with "php sucks!" or "ruby is slow" or similar, and nothing much makes me want to dip my toes in that particular pool.


Note the year on that post.


I think that's part of the plan to accelerate Python that the Microsoft team has.


I haven't followed this. What plans does Microsoft have? All I know is that they pulled out of PHP recently after supporting its development for quite a few years.



Thanks! (And oh, /. always has this historic vibe ...)


They didn't pull out. They used to provide a PHP Windows build, which is now handled by the community instead, on Azure resources donated by Microsoft.


Sure they pulled out. They had different developers over time on their payroll and stopped that.


> the low performance of CPython is hard to swallow

African or European swallow?


The restart behavior can be a problem when there's high concurrency, long-running requests, or even worse, both.

First, the "opcache.force_restart_timeout" setting kills scripts upon reaching a timeout, independently of the well-known "max_execution_time" php.ini directive, but only from time to time. It is unpredictable and doesn't leave a comparable message in the error_log.

Also, for a long time in the PHP 7.x days there was (still is?) a race condition in the SHM-to-file-and-back-to-SHM sequence that caused weird errors in programs that were executed at an unfortunate moment during the restart. A single bit would flip in a class name, function name or filename, and the script with the bit-flipped name would be cached instead of the original. Imagine the_post() in WordPress suddenly changing to the_qost(). The entire site would grind to a halt until the corrupted entry was evicted from the cache, and in the meantime, error messages would refer to nonexistent functions in nonexistent files. Moreover, because the bit flip can happen to any name, it's hard to search for a clue -- a real head-scratcher! As far as I'm aware, none of the related issues in the PHP bugtracker has been conclusively fixed.
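To see why the_post() turning into the_qost() is a single-bit error: in ASCII, 'p' is 0x70 and 'q' is 0x71, so they differ only in the lowest bit. A small illustration (not a reproduction of the opcache bug) that enumerates every one-bit corruption of a name; `single_bit_flips` is a made-up helper:

```python
def single_bit_flips(name):
    """Return every string obtained by flipping exactly one bit of an ASCII name."""
    data = name.encode("ascii")
    variants = set()
    for index in range(len(data)):
        for bit in range(8):
            flipped = bytearray(data)
            flipped[index] ^= 1 << bit  # toggle one bit of one byte
            # latin-1 maps every byte value to a character, so decoding never fails
            variants.add(bytes(flipped).decode("latin-1"))
    return variants


variants = single_bit_flips("the_post")
```

An eight-character name already has 64 possible one-bit corruptions, which is part of why searching logs for the broken name is so unrewarding.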


It's hard to check whether old opcache bug reports have been resolved, because these issues tend to be non-reproducible. It's quite likely that they are, as opcache is tested much more extensively now than it was a few years ago. Back then, the file cache had essentially no automated tests, while nowadays we check our test suite with all four file cache combinations (written with/without SHM, read with/without SHM).

Restarts are an area that still doesn't have much in terms of automated tests, but at least it's covered by fuzz testing.


You mitigate this by taking boxes out of your response pool, deploying, then putting them back in. I generally do 50% deploys at a time.

Solves this issue elegantly.


Been doing that for a while, but it's a workaround at best insofar as the opcache bug is concerned. I wouldn't call it elegant. A thing that calls itself a cache should be able to handle evictions without throwing up.

It's also the kind of workaround that isn't applicable to the vast majority of PHP sites out there: beginners running WordPress on a single box. One day, you update a few plugins like a good security-minded person should do, and this causes a bunch of cache invalidations. Bam! The site is down and the error messages aren't helpful. It really doesn't help bolster PHP's reputation as a stable, beginner-friendly language.


Shared hosting is kinda bottom barrel these days. Even one off static sites I'd isolate into $5/month digitalocean droplets over shared hosting precisely for this reason. PHP is optimizing more for large sites that use multiple servers because frankly that use case is more important.


The opcache bug is actually more likely to hit VPS and dedicated servers, because most sites on shared hosting don't have enough concurrency to trigger the race condition. The last time I helped a customer fix it, they had a cluster of EPYC servers. Their app needed scratch space for generated code, so even offline deployments wouldn't prevent garbage from accumulating in the local opcache, eventually causing a restart.

Even up here in the clouds, there are many times more people who rely on one-click WordPress-preinstalled droplets than there are people who understand how to correctly juggle multiple droplets to deploy updates without downtime. Besides, a bug in the PHP runtime is a bug in the PHP runtime, regardless of whether the user is following today's best practices.


Yup, it's a really really bad bug that Vimeo ran into frequently in 2019-early 2020.

Migrating to K8s-based deployments was the effective workaround, but before that it was not fun to have that sort of unreliability.


Have you links to the bugs?


Serious question, more out of curiosity on the state of PHP in 2021. Because of the advent of such things as Python (which I see a lot of comments in this thread about) - is it still relevant as a tool? I use it for minor projects internally to prop up a website on a Pi here and there, for ease of use - but curious what the state of the language is.

Anyone have stats on this?


Are you asking "is PHP still relevant as a tool"? (The 'it' was slightly ambiguous to me.)

PHP is absolutely relevant, I'd think, by many peoples' measurements. But... we're on a 'hacker' site and people love to dump on PHP. PHP powers a large portion of internet sites. One estimate from w3techs is ~80% of sites with a known server side language are PHP.

Yes, much is WordPress. If you estimate that half of that is Wordpress, that still would leave... ~35%-40% of sites using PHP. Maybe even that is too high. What if it's 30%? That's a large number, both in terms of percent and absolute numbers.

Yes, PHP is 'relevant'. The ecosystems (Symfony, Laravel, etc) are pragmatically iterating, evolving good ideas from a wide spectrum of web platforms. The language itself is similarly pragmatic, adopting new features, deprecating legacy stuff, and moving forward. The language is roughly 3x faster than it was 10 years ago, while retaining a lot of backwards compatibility.

The language itself along with various frameworks offers a good combination of speed of development, flexibility and rigor. The 'shared nothing' default makes it easier to reason about for many problems. Need to write a small shell script? Easy in PHP. Need to scale up to large loads? PHP can do it. There will be scaling concerns in any large setup - a PHP platform will likely have similar problems that other language/stacks will encounter at similar load.

So again, yes, it's relevant. There will be some other response saying "no", and then ... we'd have to define what you mean by 'relevant'... :)

Popular? Growing? Evolving? Uptick in usage for many problem spaces? Easy to scale?

I'd say 'yes' to all of those.


A lot of people talk about serverless hosting like it's a new paradigm. But I will say the success of PHP was the huge number of hosting providers where you send the files through FTP, link a domain, and that's it, it's working.

No Linux terminal knowledge needed, and no worrying about Nginx, SQL, or iptables configuration.

Yeah, most of those cPanel hosts are bad for scale, but they allowed a lot of new devs to deploy without having to be a sysadmin.


Without those cpanel hosts I wouldn't be a web developer today. It was so easy for me to start messing around and teaching myself to code using them.


On WordPress: it's important to remember it's a starting point for most of those uses. It's what people build entire platforms on when they don't want to/can't afford to start from scratch. People sometimes dismiss the number because "it's all WordPress," but that would be like dismissing Rails or Django. WP isn't exactly like Rails/Django, but it fills the same niche where you can pretty much assemble anything you can think of from pre-made components.


Definitely. I write PHP at work, everything from 10 year plus legacy cruft to brand new projects, and it still works well.

I'm not a Python developer, but it is probably true that Python is a better designed language and has more bells and whistles as language, like decorators, operator overloading and modules. Python is also used in more domains than PHP.

But if you look past some of the weird design choices, bad reputation and the packaging as web only language, you will find a JIT-ed & typed dynamic language with good performance, FFI for easy C glue, good threading performance that is not suffering from a GIL like Python, a large & mature standard library, can be used everywhere from a small script to a large monolith, a true diamond in the rough.

Today I worked on an importing system that takes an HTTP stream, reads a CSV file from it line by line, mangles it, and simultaneously rewrites it to an FTP stream, never keeping the entire file in memory nor downloading it entirely to disk first, and using only functions from the standard library.
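For comparison, the same row-at-a-time pattern can be sketched in Python with generators. This is not the commenter's PHP code: the in-memory `io.StringIO` objects below stand in for the HTTP source and the FTP sink, and the uppercase transform is an arbitrary placeholder for the "mangling" step:

```python
import csv
import io


def transform_rows(lines, transform):
    """Lazily parse CSV lines, apply transform to each row, yield CSV text lines."""
    for row in csv.reader(lines):
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(transform(row))
        yield buffer.getvalue()


def pipe(source, sink, transform):
    """Copy source to sink one row at a time; returns the number of rows written."""
    count = 0
    for line in transform_rows(source, transform):
        sink.write(line)
        count += 1
    return count


# In-memory stand-ins for the network streams (assumption for this sketch).
source = io.StringIO("id,name\n1,alice\n2,bob\n")
sink = io.StringIO()
rows_written = pipe(source, sink, lambda row: [cell.upper() for cell in row])
```

Only one row is held in memory at any time, regardless of how large the input is.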

I'm not saying you can't do this with Python, you probably can, or any other language, I just want to dismiss the idea that PHP is only suited for typical web tasks, PHP is a versatile tool that has every potential to expand outside of its traditional role.


Correct. PyPy even provides a JIT, but CPython lacks one. Performance is almost always Python's pain point compared to any web language except Ruby.


What I miss the most in PHP is support for threading. Krakjoe did wonderful, unappreciated work with pthreads (the PHP extension). Sadly, that's no longer maintained.


PHP has evolved leaps and bounds as a language, and as a community and ecosystem. It's still ubiquitous too. There seems to be an entirely theoretical delineation between "serious work" and PHP applications, but it doesn't exist in reality.

I still find it has one of the best developer experiences for web applications, out of all the tools I've had to learn through agency work.


It still has by far the lowest barrier to entry for basic deployments (pick any random cheapo shared web host; FTP your files to it; done) - that counts for a lot in the real world, and I’m sad that nobody else is even trying to compete :(


I'm surprised too. Every other language ecosystem seems to be competing for the Most Enterprise Tooling Chain award. Can you deploy it without an AWS expert? Not enterprise enough!


It seems to me that all that "serverless" stuff is competition


Yeah, I’d love to see some serverless php support since the language is made for that paradigm.


It is... and that's potentially part of the problem. The value proposition at the low-demand end of serverless is less clear for PHP, since it's pretty much had that value (throw files on a cheap host and forget about the vanishingly small cost) for two decades.

Still compelling for people who see spiky to high traffic and are happy to have scale reduced to an accounting/budgeting problem, of course, and I'm pretty sure Laravel has something like this.


That's exactly what Laravel Vapor does on top of AWS Lambda: https://vapor.laravel.com/


You should have a look at Bref (https://bref.sh/)


It is supported on [1] AWS Lambda with a PHP Runtime.

[1] https://aws.amazon.com/blogs/compute/introducing-the-new-ser...


I think AWS lambda supports php


There’s Bref for the Serverless framework but it only supports AWS and they have no desire to support any other frameworks.


This may get me hung on HN for saying, but I honestly love working with PHP.


Seconded. My brother got into frontend dev (JS mainly) so sometimes I probe him for what's being used. Each time I do this there's a new thing. And the JS tooling? my fucking god, how I despise it. The opaqueness of it all, the mountains of boilerplate code and files I need to download to do things "by the book". Whenever I dip my toes there I always long for the simplicity of my backend PHP setup and almost always try to ditch what's cool and new and do things as vanilla as possible.

Or maybe I'm just getting fucking old & grumpy ;)


I've been writing personal frontend projects mostly in vanilla JS for over a year now. "Mostly vanilla JS", because I occasionally pull in a routing library (you'd probably do the same in PHP).

Modern JS (modules, web components, fetch, ...) is rather expressive and powerful, and you absolutely do not need to pull in a framework and twelve third-party hype technologies to get a small project up and running.


The language still has plenty of footguns but most serious development is done through well-organized application frameworks and CMS starters. Performance is fine and can be parallelized as well as any application server language. There is a lot of documentation and community support for good proxy and server config, and plenty of hardened images or container configs, so the days of relying on pokey Apache on shared hosting are over for anyone who cares.


It runs (most?) of the Web considering WordPress usage, but more interesting is the modern ecosystem with excellent projects like Laravel.


Most of the web by traffic or number of websites?


Largely both I'd imagine. Facebook, Slack, and Etsy all still run PHP (or Hack which is a PHP flavor) as far as I know. I'm sure there are a ton of other large web properties doing the same for their web tier.


All of the main adult industry properties run on php and that's a lot of traffic.


Hack is not a PHP flavour. It's a separate language that's inspired by PHP, with its own separate interpreter (HHVM).

Etsy still uses PHP, as does Vimeo and Wikimedia. A few other launched-in-early-oughts web properties rely heavily on it too (e.g. Flickr).


As others have mentioned, WordPress is still huge. I believe Facebook is still mostly php, and I'm sure there are tons of sites started 2005-2010 that are still php.

PHP is easy to deploy, at least from the start, but was difficult to use compared to Python and Ruby. Its deployment model also encouraged people to edit directly on the server, which led to all kinds of problems. So it got a bad reputation.

I have heard that it has fixed many of its inconsistencies, but many have already moved on. The ones still using it are either stuck, or happily quiet.


I assume the addition of a local testing server to bring it up to speed with other languages improved things.


Framework X, Yoyo, Revolt, Swoole

Only to mention a few of the newest and hottest PHP tools. Then you have all the heavyweight, widely tested and used like Symfony, the Laravel ecosystem, Yii Framework, etc.

PHP has never been better and healthier.


Apcu is useful also. It's basically the same idea, but exposed to your own code. That is, an mmap() based cache that you can use to store state between requests. Somewhat like how people use redis or memcached, but typically faster since there's no networked api call.
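As a toy illustration of the mmap() idea (not APCu's actual implementation, which also handles locking, keys, TTLs and eviction), here is a counter kept in an anonymous shared mapping that survives across short-lived forked "requests". It assumes a Unix system, where `mmap.mmap(-1, n)` is shared with forked children by default:

```python
import mmap
import os
import struct

# Anonymous mapping: on Unix it is MAP_SHARED by default, so forked children
# read and write the same bytes as the parent.
shared = mmap.mmap(-1, 8)
shared[:8] = struct.pack("<q", 0)


def handle_request_in_child():
    """Fork a short-lived 'request' that bumps the shared hit counter."""
    pid = os.fork()
    if pid == 0:
        (hits,) = struct.unpack("<q", shared[:8])
        shared[:8] = struct.pack("<q", hits + 1)
        os._exit(0)
    os.waitpid(pid, 0)  # requests run one at a time here, so no locking needed


def current_hits():
    """Read the counter from the shared mapping in the parent."""
    (hits,) = struct.unpack("<q", shared[:8])
    return hits


for _ in range(3):
    handle_request_in_child()
```

Each child exits after its request, yet the parent still sees the updated count, with no network round trip involved.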


Yes, just keep in mind that the data is wiped if you restart PHP or Nginx. Not the case with Redis.


Assuming PHP-FPM is left running, you should be able to restart Nginx without wiping the cache.


Is that the default now? I thought it used to be just in memory with an option to store to disk. But honestly I never set it up, only used it.


Genuine question, given that PHP has evolved in massive ways since 7.0 - what is the next "big thing" people wish PHP will "fix"?


I want Go/TypeScript style interfaces. There was an RFC a while back that didn’t get super far. I’m hopeful it’ll come back.


Looks like async and Coroutines are something being developed by a lot of teams. There are several PHP reactive frameworks at different levels of maturity.


Anyone using PeachPie for projects? What do they look like? Do you notice any benefits over pure PHP?



