Hack and HHVM solve what is, IMO, the worst feature of the default PHP runtime environment[0] - and that is the superglobals.
It wasn't mentioned in the post from Slack, but default superglobals and the earlier register_globals design decisions are the worst and most impactful wart in PHP.
Because it was designed as a templating language, the default web server interface, which is CGI - will auto-expose all variables in global scope, ex.
echo $_POST['user_id']
PHP has a horrible reputation with security for this reason - we all know that somehow, somewhere, in almost every project someone is pulling in a user-controlled variable from a superglobal and they aren't escaping or checking it properly (since you can't be warned about it but the feature will work).
Worse - and I've seen this a lot, even with Laravel, CodeIgniter, Cake, Symfony/Silex etc. - you end up with these well structured projects that declare request classes, methods and variables etc. etc. but then sometime down the road a developer takes a shortcut in a method and pulls in a $_GET or $_POST inside a controller (usually because they don't know how to change all the related classes, or can't be bothered to) - running around the default exec stack.
I've seen this so often - because it's so easy to do it. The most common place is where a designer has built a frontend AJAX form. They now need to build a quick backend check, so they Google "php backend ajax username check" and they'll likely get a result like this one:
$username=$_POST['username'];
$query="SELECT * FROM username_list WHERE username='$username' ";
they copy that into a file called ajax_username_check.php and save it to the server - and they've now destroyed all that previous good work by opening up a very blatant and easy to find SQLi vulnerability. Their database will be on pastebin within a month.
You can spot this type of vulnerability from the frontend because the URLs used in the AJAX calls don't match the URL router patterns for the rest of the app (ex. GET /user/username_check_ajax.php vs /user/check_user).
In other languages you can't get that without using a standard library that will escape the values by default. Any solution you search for will always be a safe method to obtain the variable values by default.
Some good news: Hack doesn't expose superglobals in strict mode:
I'd strongly recommend that this is used in all PHP projects, since it strictly enforces variable access - even in cases where you're using a framework that is supposed to enforce it.
IMO PHP missed a big opportunity with not removing superglobals in version 7 and enforcing an explicit safe request object much like other languages do. They likely wanted to avoid it because of the cluster of register_globals and magic_quotes from earlier versions.
[0] I think it is important to distinguish PHP the language and PHP the runtime. PHP the language is now decent - having caught up with a lot of features (although I find it very verbose and harder to read) while PHP the runtime is undoubtedly still a horrible runtime - hence HHVM
I'm not a huge fan of the superglobals - but the correct fix here is to use a parameterized query!
I'm not sure which languages you are used to, that will somehow magically 'escape' an input string so that it's safe to inject directly into your query in all circumstances.
I know I don't want strings from the frontend pre-quoted in any way. I want the string the way the user typed it in!
I'm sad that I had to scroll that much down to get to this response. Any other approach is still not secure, might output strange characters for specific inputs or both.
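For reference, a minimal sketch of the parameterized version of that username check, assuming PDO with the SQLite driver is installed. The in-memory table and the usernameExists() helper are made up for illustration; only the username_list table name comes from the snippet above.

```php
<?php
// Hypothetical in-memory table standing in for username_list.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE username_list (username TEXT)');
$pdo->exec("INSERT INTO username_list (username) VALUES ('alice')");

// Hypothetical helper: the value is bound as a parameter and is never
// concatenated into the SQL text, so quotes in it cannot change the query.
function usernameExists(PDO $pdo, string $username): bool
{
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM username_list WHERE username = :username'
    );
    $stmt->execute([':username' => $username]);
    return (int) $stmt->fetchColumn() > 0;
}
```

The string reaches the database exactly as the user typed it, which is the point made above: no pre-quoting, no escaping step to forget.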
The fact that their "recipe" relies on one file being run in "non-strict" mode for it to work at all is very telling about how short-sighted this "vision" to remove superglobals is.
Even if you went the whole hog, and removed $_GET, $_POST etc. You force users to use filter_input() to get variables. (Why doesn't the Hack "recipe" do this anyway?)
Now tell me how you access a structured POST body. e.g.
foo[bar]=baz&foo[baz]=bar
Oh. Right, you can't. Because filter_input only returns scalars.
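Worth noting, with some hedging: the filter extension does have a FILTER_REQUIRE_ARRAY flag, so nested input is not completely out of reach. A minimal sketch, using parse_str() to stand in for the POST body (filter_input() has nothing to read outside a real request) and filter_var() to apply the same flag; the variable names are just for illustration:

```php
<?php
// Simulate the structured body from the example above.
parse_str('foo[bar]=baz&foo[baz]=bar', $body);

// With FILTER_REQUIRE_ARRAY the filter is applied to each element and an
// array comes back; without the flag, an array input fails and yields false.
$asArray  = filter_var($body['foo'], FILTER_DEFAULT, FILTER_REQUIRE_ARRAY);
$asScalar = filter_var($body['foo'], FILTER_DEFAULT);
```

In a real request the equivalent call would be filter_input(INPUT_POST, 'foo', FILTER_DEFAULT, FILTER_REQUIRE_ARRAY).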
If you remove $_GET and $_POST people will just do the equivalent in the new construct:
$query = "SELECT * FROM username_list WHERE username='" . filter_input(INPUT_POST, 'username') . "'";
$query = "SELECT * FROM username_list WHERE username='" . $myFancyPostObject->getString('username') . "'";
The PHP developers who understand why using raw untrusted input is dangerous are already using the facilities provided to make the input safe for use, in some cases built around access to $_GET and $_POST.
The PHP developers who are already using raw untrusted input in dangerous ways will simply find new dangerous ways to use the data.
> I've seen this so often - because it's so easy to do it.
That's the very point. PHP was/is _easy_ to pick up. You get something working quickly. Then you meet problems. And hopefully, you learn along the way and fix that (and see, this whole thread is from people that learned how not to use $_GET/$_POST).
This "build/make it work/fail/fix" loop was an ingredient that made PHP so popular. The "happy code path" was not a pain to set up.
While I agree that directly pulling things out of superglobals is dangerous, I disagree that it should be removed, lest you end up with a python2/python3 situation.
You can't just run around breaking BC of the language every time something is unideal.
Yes, there are a lot of ways to easily create security holes. This is what code review is for. I'm also not going to advocate abandoning C/C++ because "it's easy to create security holes" i.e. overflows.
Agreed, at Amazon you are allowed to use any programming language except for PHP for this very reason, it is the least secure. PHP is officially banned as a language.
I don't think I've used a superglobal in 10 years, because of all the reasons you list. This is why you have a senior developer on a project doing code reviews.
Hack is a technical solution that brings lots of problems of its own. By all means use it, but not because of that feature.
Re: the SO answer, is it possible to report that code for being inherently unsafe? I think SO should take responsibility and edit or at the very least flag unsafe code.
Pro tip - first thing when you start a PHP project:
- move all superglobals to your private vars
- only allow access to these vars through your special functions which REQUIRE the programmer to specify type and validation / sanitization (regex for strings, min/max for numbers,...)
Make it difficult for programmers to use unsanitized vars and you will have much more secure code.
Not sure why no frameworks (that I know of) do this, but fortunately it is easy to add this.
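A rough sketch of what that tip could look like, assuming PHP 7.1 or later. The RequestInput class and its method names are hypothetical, not taken from any framework:

```php
<?php
// Hypothetical wrapper: raw input is kept private, and every accessor
// forces the caller to state the type and the validation rule.
final class RequestInput
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Returns an int within [$min, $max], or null if missing or invalid.
    public function getInt(string $key, int $min, int $max): ?int
    {
        $raw = $this->data[$key] ?? null;
        if (!is_string($raw)) {
            return null; // rejects arrays such as id[]=1
        }
        $value = filter_var($raw, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => $min, 'max_range' => $max],
        ]);
        return $value === false ? null : $value;
    }

    // Returns the string only if it matches $pattern, otherwise null.
    public function getString(string $key, string $pattern): ?string
    {
        $raw = $this->data[$key] ?? null;
        if (!is_string($raw) || preg_match($pattern, $raw) !== 1) {
            return null;
        }
        return $raw;
    }
}
```

In an actual project you would presumably build this once from $_POST (or $_GET) at the entry point and pass the object around instead of touching the superglobals again.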
Python does support block scoping- so no- it doesn't work there. Also, if you need to use variables in outer scopes, it's ok to declare them there. There is literally, and I mean this, literally- no possible justification for using inner-scoped variables in an outer scope that they weren't declared in.
I don't get why people keep harping on superglobals as being inherently bad. The variables are there. You can use them or ignore them. A variable definition harms you in no way other than a tiny bit of memory usage which is capped by the HTTP limit on POST and GET limits anyway. What? You think you're gonna get hacked because $_POST['ihaxyou'] is set to 'w00ts'?
No one does this anymore:
mysql_query("SELECT * FROM `table` WHERE `id`=".$_POST['ID']);
There's absolutely NOTHING wrong with having $_POST['whatever'] inside a controller as long as you're doing proper checks.
Are you expecting it to be an integer? Easy
if(!ctype_digit($_POST['ID'])) {
// throw exception here
}
Contrary to the hive mind you don't need some special encapsulation class to pull your post and get variables.
HHVM does not in any way solve this issue. You still have to write proper validation into your code or you'll get hacked. What's with people expecting frameworks to do everything for them these days?
> There's absolutely NOTHING wrong with having $_POST['whatever'] inside a controller as long as you're doing proper checks.
The last part is why this is a problem.
The truth is that programming is simply too difficult a task for human beings. Software is so complicated with so many moving parts that it is impossible for anyone to understand all the details of even the simplest piece of code. This is why we have operating systems, programming languages, and frameworks. Or more generally: this is why we have abstractions and tools: to make it possible for humans to actually write somewhat functional software.
This is also why you not only want languages, frameworks and tools that let you do the right thing; you want them to prevent you from doing the wrong thing. You want to reduce the number of things you have to think about to the absolute minimum, simply because no one is smart enough to fully understand everything that's going on. That is why you want things like strongly typed languages.
Sure, you can make something that works and is safe, if you make no mistakes. The point is that we want to take the human ability to make mistakes out of the equation as much as possible. Sometimes it is inevitable because of what a language, framework or tool is used for. In the case of PHP there are a lot of ways to shoot yourself in the foot that do not need to exist in a high-level language such as PHP, and that is the reason why PHP sucks and is a bad idea. It is way more fragile than it has a need to be.
$_POST isn't the problem. The scope of the variable isn't the problem. You've realized this, too, and that's why you're shifting the argument to one about typing instead of superglobals-are-bad (typing and scope are obviously independent features).
> You want to reduce the number of things you have to think about to the absolute minimum
The net effect of type systems seems to actually be to force you to think about a larger number of things very very carefully. The only thing that ends up being reduced is having to think about having to think about them.
No, it isn't. That's why I specifically said "The last part is the problem", referring to "as long as you're doing proper checks."
Specifically, my point is that you should not rely on people 'doing the proper checks' because people make mistakes, thus you want to reduce the amount of situations where people are given the opportunity to make those mistakes.
There are a lot of situations in PHP where you need to 'do the proper checks' for no reason other than bad language design, and that is what makes PHP a bad language.
> What's with people expecting frameworks to do everything for them these days?
I don't think that's the expectation. PHP's reputation seems to surround the fact that it tends to be (or was) the first language amateur coders dabbled with. That crowd is especially susceptible (at least before mysqli, etc) to making mistakes that amount to serious security vulnerabilities. Historically, these seemingly benign things that can only go bad if you don't know better, do go bad [1]. It seems that this led to rampant conflation of whether PHP is a bad language and whether PHP made it easier to do things in an unsafe way. Frameworks that create safer environments for devs (especially newer ones) are certainly a good thing and the good frameworks often get out of your way when you need them to.
>PHPs reputation seems to surround the fact that it tends to be (or was) the first language amateur coders dabbled with.
That's certainly part of it. JavaScript suffers the same hate today -- amateur and junior developers produce thousands of lines of crap per year, and people blame it on the language.
But PHP itself is just a mess. I'm an experienced developer (about 30 years at this point), and about 12 years ago I decided to do a volunteer project for a nonprofit in PHP. Finished the project and the nonprofit used it for at least 10 years -- they may still be using it for all I know.
Never. Again. I can't stand PHP and I will avoid its language and ecosystem like the plague. Even if they've fixed some of the problems in the core language and added types, the "standard" libraries were a random pile of mismatched garbage where the mysql_xxxx functions could have parameter signatures in different orders than the pg_xxxx versions. Maybe they've fixed that as well, but they'd have to break backward compatibility in pretty awkward ways to achieve that.
I don't even remember all the other things that tortured me, but it wasn't fun.
And it's not asynchronous. There's not even a good reason to use a synchronous language for web development today. Not to mention the ease of running NodeJS code in a debugger, or running tests in a browser and debugging it there...
I'm using TypeScript and Go for all my web related code moving forward. Something better comes along, and I'll consider it. But PHP was just a nightmare. (Elm on client? Maybe, under the right circumstances?)
> I'm an experienced developer (about 30 years at this point), and about 12 years ago I decided to do a volunteer project for a nonprofit in PHP
> mysql_xxxx functions
So, you used PHP 12 years ago and comment based on that.
You have a point though. I used this internet thing about 17 years ago, it was terrible. Only dial-up. SLOW! And don't talk to me about browsers. Netscape? Internet Explorer? Ugh. Forget it. I don't care what they might have changed, or replaced or completely removed, it's always terrible.
The language/runtime have problems, just like every other language/runtime that exists, but complaining about a version from 12 years ago and comparing functions that aren't even part of the language anymore seems a little odd to me.
>You have a point though. I used this internet thing about 17 years ago, it was terrible. Only dial-up. SLOW! And don't talk to me about browsers. Netscape? Internet Explorer? Ugh. Forget it. I don't care what they might have changed, or replaced or completely removed, it's always terrible.
We have only one Internet. But we have a lot of alternatives to PHP to choose from.
If you have an Internet provider that gave you horrible service, will you give them another try even when your current provider is great in every way, just because the only good thing with your past provider was they gave you a connection in a day, instead of 3 days as with your current provider....
I'll be honest I don't really care what languages people use, not my place to convince you back. Just saying there's quite high chances PHP has changed at least a tiny bit over the past 12 years
I started using PHP when it was in version 3 and still use it occasionally. It has changed, but not considerably. The function naming is still a mess (backwards compatibility), arrays are completely inappropriate (naming them "bags" would be better), some decisions were so baffling (safe mode and magic quotes, superglobals, square brackets for arrays...) that I simply don't trust PHP to ever get better.
But there was a reason for experienced developers to use it - hosting. You could write a web app, ftp it somewhere and it would just work. There was no other platform aside from asp which would offer that. Nowadays this doesn't matter much, but 10 years ago it was great.
>And it's not asynchronous. There's not even a good reason to use a synchronous language for web development today. Not to mention the ease of running NodeJS code in a debugger, or running tests in a browser and debugging it there...
This is still true today, and this fact alone makes it not worth trying out as a server language.
Go is asynchronous. [1] It's a huge point of the language, and why it's working its way to the top of the new TechEmpower benchmarks [2].
> "callback hell"
It can be ugly to look at, but it's fast.
And I thought we weren't using references to 10 years ago to judge a language? Promises minimize callback hell with a far better (more functional) interface, and async/await (usable today with transpilation) banish it entirely.
>If using threads means the language is asynchronous to you,
Goroutines are not threads. It's named after a "coroutine," which is an async function (like a generator) where you can pause execution of a task (to wait for IO, for instance) and then resume it at a later time. Coroutines give you implicit async/await style programming, without having to use extra keywords. The language Lua supports coroutines natively, for instance, including explicit yields to use them as generators, if you need. Goroutines give you coroutines with a bonus: Actual thread hopping (and multiple channels of communication possible, so you can effectively have more than one "yield" channel).
If you have 400,000 Goroutines running on 4 CPUs, your Goroutines can task switch between those 4 CPUs using 4 OS threads (or 8, or whatever number you determine is best for your app) as their IO events queue up. It is async, and each Goroutine has around 10k of RAM overhead, not counting data you're storing yourself (it varies by architecture, but that's a reasonable estimate; I've seen between 6k and 16k on different architectures).
Go is better async than JavaScript, because a Goroutine started on one thread can get processed on another, so a few thread-hogs won't block a task. A single Goroutine could cycle between all your worker threads, in fact, though I think it tends to stick with a single worker thread ("thread affinity"). It's probably the most asynchronous you can get in a language (only the Erlang VM/Elixir is in the same class of language, as far as I know).
Do you think PHP could handle 400,000 concurrent WebSocket connections on one server? Go can. How much RAM would you need to handle 400,000 CPU threads? Maybe 64Gb instead of ~8Gb in Go? Wouldn't CPU switching alone be starving most of the threads and redlining the CPU at that point? In my experience you start seeing serious slowdown around 5,000 threads. I'd guess you'd need 50x as many servers to handle that many connections on PHP. Maybe 100x. You need to support a million users? Is it better to have 3 servers or 300?
> The things you complained about in another language were fixed/removed years ago but you say it's still just as bad.
PHP is architecturally single-thread-per-request. Full stop. Even PHP 7 and HHVM. That simply doesn't scale as well as async, as I described above. The complaints I made were why I bailed on it years ago. I'm sure PHP has gotten a lot better than it was when I used it, but given that it still misses the mark architecturally, I have zero motivation to give it a second chance.
But given that NodeJS has far surpassed it in community support (as well as by many performance measures -- HHVM is quite fast at raw processing speed now, though not as high throughput because of the lack of true async), why is PHP still relevant except to continue to maintain existing code bases? (I've worked with several of those as well -- Joomla and Drupal in particular -- and neither was something I'd consider worth using, having looked at their internals and terrible performance. Joomla was a complete disaster; Drupal only slightly better.)
>Every failure or less than great situation in your favoured languages has just been fixed in the last few years.
Your point? Even if they all were only better as of yesterday, they're still better. Or are you defending your choice of a few years ago?
And actually, Promises have been available using polyfills for years. The proposal was made at least six years ago; I can't find the exact date, but I found a reference added to the CommonJS wiki in 2010 [1]. The Q library dates back to 2010 as well. [2]
I concede that there may have been a period of time where PHP was better than JavaScript, by the criteria I'm using today, because PHP did improve a lot after I first encountered it, while JavaScript only more recently developed its compelling advantages. It looks like Node was released in 2009? In 2009 I was using OpenResty (Nginx+Lua) to do my (small amount of) server work, which can still be more performant than both Node and PHP. But NodeJS has shot past in terms of community support, which is critical, TypeScript is awesome, sharing code on client and server is awesome, isomorphic rendering is awesome, and performance is good enough (compared to Lua) that NodeJS just wins for me, big time. Except when I need the extra speed, and for that I use Go.
Go is a brand new language, relatively speaking. So of course they're still making major improvements. Performance has already surpassed HHVM, even for raw compute tasks. It's a better fundamental architecture, giving it an edge, and performance will likely improve still more, though they're already hitting diminishing returns: Some of the worst case performance they see now may get 2x-4x faster, but typical app performance is probably only 10-20% short of optimal. I mean, they're already beating C++ on the server. How much better can they go?
Use whatever language you want; if your background and/or current job is PHP, so be it. It's not as bad as it was 10 years ago, for certain. Just don't expect anyone to pick it up based on its merits. There are too many other, stronger options available today.
Sounds like you just had a very bad experience with PHP, but it could have happened with most languages, honestly. I have (only) 20 years of experience, but I have come across quite bad C codebases, poor Perl applications and terrible Java code, and I have thought 'never again' more than once, too.
In fairness, PHP was in a different place 12 years ago. The language has improved in that time. There are still lots of warts but the language and ecosystem have worked together to push it forward.
I'm going to be honest, I only use it because of company lock in. I hated it for many years but some of the stuff released and some of the stuff on the way in the internals is quite exciting.
The difference between Php and Javascript is that Javascript has competent people driving it forward. Php is still developed by college students with no real world programming experience, in their spare time....
Downvotes? Don't think this is true? See the following. These are a couple of the most prominent people working on the language.
Not impressed by your comment at all. If you think that the age of Nikita Popov or Andrea Faulds is any indicator of their "competence" I suggest you actually take a look at their contributions.
My comment just said that they are just college students. And the links are to prove that. Those are just facts. I don't know why people are pissed.
Edit: It is not just their age. I have seen their contributions (RFC's and implementations) and had conversations with them. And my opinion is also based on that...
> Contrary to the hive mind you don't need some special encapsulation class to pull your post and get variables.
The hive mind is like that for a reason. Making it easy to do the right thing and hard to do the wrong thing has massive effects, the magnitude of which, I'd argue, scales exponentially with the growth of an engineering team. Really really talented engineers make mistakes all the time. To the extent that we can systematically limit those with little to no downside we absolutely should, especially when it comes to security.
I think people are also confusing an old issue from PHP 4.x where if you had $_POST['somevar'] it would actually have an alias automatically set as $somevar in the global userspace. This was turned off by default a long time ago and is the main real security issue when it comes to super globals. $_POST and $_GET are just the normal way to access POST and GET vars. There's nothing inherently insecure about it.
Exactly, this is where the real problem was and thankfully it was fixed. An attacker could insert any variable in a script just by adding it to the URL. Additionally, the PHP configuration could change the order in which variables from different sources were given precedence, so a script essentially didn't know where it was getting the information from. On the other hand the superglobals are just a utility making things easier for the developer, they don't directly make code insecure.
I don't think people are confusing those things at all. The comment you're replying to is quite literally saying that using $_POST['somevar'] is too easy.
> I don't think people are confusing those things at all. The comment you're replying to is quite literally saying that using $_POST['somevar'] is too easy.
Are you guys, then, saying that $_POST['id'], by itself, is less secure than a getPostVar('id') would be, by itself?
In all seriousness, shouldn't all the frameworks just have some validation built in? Being that this is such a "global" WTF problem.
I would love to be able to say ini_set('sanitize_rest', true) and deal with errors that might result from that knowing at least the strings are safe. Or have functions like sanitize_string($str) and have the documentation encourage it everywhere. I mean, aren't we all just implementing those on our own anyway?
I know that obviously there are already functions for type checks etc, but the idea is to make it even easier, more obvious, and functions directly targeting the problem, even if they are mere aliases.
When a mistake is easy, the solution should be made easier.
PHP has already tried automatic sanitization with magic_quotes_gpc, and the results were far from secure.
It is not possible to have a single sanitize() function that renders a string safe for every possible context, and to try to provide one results in nothing but complacency and false sense of security.
Escaping for SQL is different from escaping for HTML, and even if you escaped a string for both, some idiot is going to echo it inside an inline script or a CSS attribute. Sanitize for all of them, and you begin to seriously mangle those strings. Oh, and it might still be ineffective against directory traversal. I've seen plenty of PHP users who think they're clever because they wrote a function that applies every single escaping function in a row, not realizing that some of them undo one another's work. Selectively unescaping after the fact is even more fun.
The only thing I can think of that could render a string absolutely safe in every context is intval(), but then you don't have a string anymore, and I suspect that even that can be abused with unexpected zeroes and negative values.
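To make the "different context, different escaping" point concrete, a small sketch using only standard functions; the sample input and variable names are arbitrary:

```php
<?php
$input = "<b>O'Reilly</b>";

// Escaped for an HTML text node or a quoted attribute value.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Encoded as a JS string literal for use inside an inline <script> block.
$forScript = json_encode(
    $input,
    JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP
);

// Stacking escapes does not make the value "extra safe", it mangles it:
// the ampersands produced by the first pass get escaped again.
$stacked = htmlspecialchars($forHtml, ENT_QUOTES, 'UTF-8');
```

Neither result is usable in the other context, and neither one does anything for SQL or file paths, which is why a single sanitize() cannot exist.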
Right. I absolutely get this argument, except, the alternative to idiots misusing the tools is for those same idiots to start with no tools! And they're going to somehow implement and upload what they're building anyway...
What does this nonsense reply even mean? Nobody is talking about one-size-fits-all. People are talking about mitigating some stupid default behavior in a language. The suggestion "You're doing it wrong" just feeds into his point (that you can even get something this straightforward wrong in the first place indicates, maybe, you should put a fence there to warn people).
Programmers have so much stockholm syndrome it's unbelievable.
> I would love to be able to say ini_set('sanitize_rest', true) and deal with errors that might result from that knowing at least the strings are safe.
How is a magic ini setting to "make strings safe" not a one-size-fits-all?
> People are talking about mitigating some stupid default behavior in a language.
What stupid default behaviour? Giving you data as its received and tools to validate/sanitize it as required?
> if(!ctype_digit($_POST['ID'])) { // throw exception here }
ctype_digit is broken. Try passing integer values. ctype_digit(50) === true, but ctype_digit(100) === false (integers from -128 to 255 are treated as ASCII codes, so 50 is "2" and 100 is "d"). And "0000001" passes as true, which most people in the majority of scenarios would prefer not to pass. I can't remember what the even-worse bug with ctype_digit is, but even if you cast the value to string [ex: ctype_digit((string)$var)], there is some value that still passes for true when it shouldn't - do not use ctype_digit. is_numeric() is also unusable for validation [is_numeric("123e4") === true]. is_int() is a strict type-check so can't solely be used to validate request variables which are always strings (...or arrays, more below).
The only correct ways to verify that a variable contains either a valid numeric string or integer is by comparing type, and then using a regex or a double string-then-int cast.
ex: unsigned database ids: if ((is_int($var) || is_string($var)) && preg_match('/^[1-9]\d*\z/', $var)) { // definitely an int > 0 }
ex: signed integer: if (is_int($var) || (is_string($var) && (string)(int)$var === $var)) { // valid int (including negative values) }
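The two one-liners above, wrapped into functions so the edge cases are easier to poke at. The function names are made up; the checks themselves are the ones from the examples, with an explicit string cast added before the regex:

```php
<?php
// Unsigned database id: an int or string consisting of digits only,
// no leading zero, no trailing newline (\z instead of $).
function isPositiveId($var): bool
{
    return (is_int($var) || is_string($var))
        && preg_match('/^[1-9]\d*\z/', (string) $var) === 1;
}

// Signed integer: either a real int, or a string that survives a
// string -> int -> string round trip unchanged.
function isSignedInt($var): bool
{
    return is_int($var)
        || (is_string($var) && (string) (int) $var === $var);
}
```

Both reject arrays, "0000001" and "123e4", which is exactly where ctype_digit() and is_numeric() fall over.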
Frankly, developers who don't understand how request variables are handled in PHP have zero chance of properly validating input. Find any site/app written in php, even if built on any of the major frameworks. You can instantly break 30-50% of them by passing an array where a string is expected.
Find an app that takes "?query=hello+world". Instead pass in "?query[]=hello+world". Want an example? Log in to Facebook, then visit this search page[1]. Look at the query string and then what was searched for - and the contents of the search box. Bam, even Facebook gets it wrong! Same thing with Symfony's search[2]. Or Packagist (composer's package manager repository)[3]. More seriously at Yii[4], which exposes an internal error to users as they try to string-trim an array ("Error - trim() expects parameter 1 to be string, array given").
Most developers - including many seniors who have been exclusively coding in php for years - have no clue. You will either cause a 500 Internal Server Error, or your input array will result in an output string of "Array" if they typecast your array to the string they expected. Even the major frameworks, when you pull user-submitted values, simply pass through the value submitted. Your app expects a string (or a string that contains a numeric value), and instead any user who knows the "[]" syntax can pass in an array.
Really reflect on this fact. Most applications start handling a submitted array value as if it's a string. The bugs this produces are astronomical in some cases.
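A quick way to see this without a web server, using parse_str(), which as far as I know follows the same bracket convention PHP uses when filling $_GET. The stringParam() helper is hypothetical, just one way to refuse the array up front:

```php
<?php
// What the app expects, and what an attacker can send instead.
parse_str('query=hello+world', $expected);
parse_str('query[]=hello+world', $tampered);

// Hypothetical accessor: returns the value only if it really is a string.
function stringParam(array $params, string $key): ?string
{
    $value = $params[$key] ?? null;
    return is_string($value) ? $value : null;
}
```

With the tampered query string, $tampered['query'] is an array, so anything downstream that calls trim() or string-casts it gets the errors or the literal "Array" described above.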
If you think your framework protects you, think again. The frameworks' request objects also do not have strict type checking. The same goes for their form and model validation classes; if you're using the built-in "integer" or "numeric" validators, you're probably doing things wrong.
It's a nightmare. You could try to blame PHP, but really it's the developers - including the developers of every major well-known framework I've ever touched - that have absolutely no clue.
Related tangent: comparing password and password confirmation fields. Many developers do if ($password == $passwordConfirm) {}. In PHP 5.x, "10" == "0xA" (so type "10" in password field and "0xA" in the confirmation field, and it passes validation). This changed in PHP 7 though. There are only two correct ways to verify that two strings are exact: $password === $passwordConfirm (triple equals), or strcmp($password, $passwordConfirm) === 0.
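The hex case is gone in PHP 7, but to my knowledge the same trap still exists for other numeric strings, since == compares two numeric strings as numbers. A small sketch with made-up values:

```php
<?php
$password        = '1000';
$passwordConfirm = '1e3';

// Loose comparison: both sides are numeric strings, so they are compared
// as numbers and 1000 == 1e3 holds.
$looseMatch = ($password == $passwordConfirm);

// Strict comparison and strcmp() compare the actual characters.
$strictMatch = ($password === $passwordConfirm);
$cmpMatch    = (strcmp($password, $passwordConfirm) === 0);
```

Here $looseMatch is true while the other two are false, so the confirmation check passes for two different passwords if == is used.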
> The only correct ways to verify that a variable contains either a valid numeric string or integer is by comparing type, and then using a regex or a double string-then-int cast.
You know there is an entire extension dedicated to validating and sanitising inputs right?
All your type checking and regexes and double cast comparisons could be replaced with:
if (($value = filter_var($value, FILTER_VALIDATE_INT)) !== false) { doStuff(); }
> You could try to blame PHP, but really it's the developers
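For comparison with the regex and double-cast versions, a sketch of how FILTER_VALIDATE_INT treats the awkward inputs mentioned earlier. The parseIntOrNull() wrapper is hypothetical; it only exists to turn the false-on-failure convention into null so that a valid 0 cannot be confused with a failure:

```php
<?php
// Hypothetical wrapper around filter_var(): returns the validated int,
// or null when the input is not a plain decimal integer.
function parseIntOrNull($value): ?int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    return $result === false ? null : $result;
}
```

This is also why the `!== false` in the one-liner above matters: "0" validates to the integer 0, which a loose truthiness check would treat as a failure.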
> ctype_digit is broken. Try passing integer values.
Well, ctype_digit takes strings, not integers. So don't be surprised if you pass the wrong type to a function and it doesn't work as you expected.
Some of your criticism is valid, but you can't go around talking about how PHP isn't rigorous enough, and then complain that some functions don't work as you'd like when you give them a wrong argument type.
Your other arguments are more about bad developers as you say it yourself, anyone who actually cares about what he does knows you have to check equality with ===, while the array argument problem is less well known, but actually almost unrelated to PHP: POST or GET is user data that can be any type and should be checked. Only the last of your examples is actually a problem to me.
The problem is that the behavior is not consistant. Php is some parts c, some parts java and some part perl. That is the problem. It takes a encyclopedic knowledge of the documentation to know what part you are dealing with. And even that might not help you sometimes, because the documentation can be plain wrong at places...
>anyone who actually cares about what he does knows you have to check equality with ===
Can you write PHP code to store some string-to-string mapping in a PHP array and, further down, check if a particular key exists in that array?
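One possible reading of that challenge (an assumption on my part, the comment doesn't spell it out) is that PHP silently casts decimal-integer string keys to integers, so a strict === check against the keys can fail. A quick sketch:

```php
<?php
// All three keys are written as strings.
$map = ['10' => 'ten', '010' => 'padded', 'abc' => 'letters'];

// '10' was cast to int(10); '010' and 'abc' stay strings.
$keys = array_keys($map);
var_dump($keys);

// array_key_exists() applies the same cast, so the lookup still works...
var_dump(array_key_exists('10', $map));

// ...but a strict search for the original string key does not find it.
var_dump(in_array('10', $keys, true));
```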
If you know C, you can identify pretty well what are the (thin) PHP wrappers around the C routines. Java influenced the OO design, so you know where to find it, and Perl is mostly, well, PCRE. It's not consistent, but it's not _that_ hard to navigate.
Well, it is not that easy. For example, take the function strlen(). You can see that it is a wrapper for the C function.
So can you expect that it will behave like the C function, accepting strings only? No! It accepts both strings and integers. So you have part Perl there.
Now take another function: ctype_digit(). I don't know where the name comes from. You expect it to behave like strlen(), accepting both strings and numbers. But no!
If you pass it a number, it won't even bat an eye (throw an exception or error), but it will just return gibberish...
Hope my point, that these influences are mixed together in a haphazard fashion, is a bit more clear now...
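A defensive sketch for the ctype_digit() case, assuming the goal is "digits only, whatever type arrives". Both helper names are hypothetical. The manual documents that integers from -128 to 255 are read as a single ASCII character code, and that passing non-strings is deprecated as of PHP 8.1, which is why the helpers never hand an integer to ctype_digit().

```php
<?php
// Only call ctype_digit() on real strings.
function isDigitString($value)
{
    return is_string($value) && ctype_digit($value);
}

// Accept either a non-negative int or a digits-only string.
function isNonNegativeIntLike($value)
{
    if (is_int($value)) {
        return $value >= 0;
    }
    return isDigitString($value);
}

var_dump(isDigitString('123'));        // digits-only string passes
var_dump(isDigitString(123));          // an int is not a string, rejected
var_dump(isNonNegativeIntLike(5));     // int handled without ctype_digit()
var_dump(isNonNegativeIntLike('12a')); // mixed content rejected
```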
It's more about automatic type conversion than the API, though; numerics magically get converted to strings and vice versa. This is convenient in some cases, especially for beginners who don't have to think about types, but it eventually bites you if you never realize what happens behind your back.
This is not so much a PHP thing as it is a different way of thinking about things. If you know something is supposed to be an integer, you can simply force it to an integer before you do anything with it:
$id = intval(@$_POST['id']);
if (!empty($id)) { ... }
Note: 0 shouldn't be a valid value for something called 'id', since it's likely a db id; if it is, use something other than an empty() check.
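Worth noting that intval() is forgiving in ways that can surprise: it keeps a leading numeric prefix and turns any non-empty array into 1. A stricter sketch, assuming PHP 7+ for the ?? operator; positiveId is a hypothetical helper and $post stands in for $_POST.

```php
<?php
// Hypothetical helper: accept only whole numbers >= 1, else null.
function positiveId($raw)
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

$post = ['id' => '12abc'];        // stand-in for $_POST
$raw  = $post['id'] ?? null;      // no @ suppression needed

var_dump(intval($raw));           // intval() keeps the numeric prefix
var_dump(intval(['x']));          // a non-empty array coerces to 1
var_dump(positiveId($raw));       // the strict version rejects the input
```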
That said, input is a problem. In a dynamically typed language, it's easy for beginners to expect HTTP request values to behave like any other PHP value. In reality, you will be coercing from a string to whatever you are working with, and the input could also be an array of strings.
Input rules would be nice. For example, we want id to always be an unsigned integer in this context, email to always be... and so on.
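Something close to input rules can be sketched with filter_var_array() from the filter extension. The rule set and the sample inputs below are invented for illustration.

```php
<?php
// Declarative rules: id must be a non-negative int, email a valid address.
$rules = [
    'id'    => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 0]],
    'email' => FILTER_VALIDATE_EMAIL,
];

// Stand-ins for $_POST; the second one abuses the "[]" syntax.
$good = ['id' => '7', 'email' => 'a@example.com'];
$bad  = ['id' => ['7'], 'email' => 'nope'];

// Each key comes back as the validated value, false if it failed
// validation, or null if it was missing from the input.
$cleanGood = filter_var_array($good, $rules);
$cleanBad  = filter_var_array($bad, $rules);

var_dump($cleanGood);
var_dump($cleanBad);
```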
In this case, dynamic typing either makes the two types look like plain old dynamic typing at work, or leads you to believe input has a homogeneous type.
In any case, I'm going to take a mag glass to some of our code today. Thanks!
It is in fact documented; what I didn't explain is that ctype_digit treats small integers (-128 to 255, per the manual) as chr() equivalents. It's designed to juggle both strings and integers, which indeed works against PHP's usual method of type juggling. This is because ctype is a port of, or wrapper around, the C lib, which behaves as such.
Saying that superglobals are the worst thing about PHP is like saying "the worst thing about x86 Assembly is the mnemonics". It misses the point entirely. The worst thing about PHP is that it is fundamentally not well designed and therefore makes developing high-quality software much harder than it needs to be. It also makes developing extremely low-quality software easy, which could be good or bad depending on your perspective.
By all means, but having the language nudge people in the right direction makes a world of difference.
PHP, much like JavaScript, is terrible for new developers for this very reason.
Learning a "good" language, for lack of a better term, is no more difficult than learning PHP/JS and is always worth the effort. If anything, learning "good" languages is usually much easier because they are usually internally consistent.
If all newbs picked up Java as language #1.. would their apps be better? Or would the really bad devs writing copy paste stack overflow code just be unable to understand it, so they would quit?
Like is it safer because it keeps out knuckle-draggers, or safer because it is actually safer? Cuz I can write some horrible Java code that will rival anything you can do in PHP
As I said, by all means you can write bad code in good languages. I'm not saying choosing a good language excludes all possible bad code, only that they provide some guidance on better practices.
So you mention Java. Java enforces OOP. Now OOP may not be the best paradigm always, however it's a vast improvement on inline procedural PHP.
That isn't to say you can't write some horribly modelled Java code, but the fact that modelling tools are so explicit and forced on the user makes the user at least think about how to use them better.
Other people's opinions may differ from mine, but I maintain this is incredibly important in speeding up new programmers towards writing good code.
> by all means you can write bad code in good languages
I think the main criticism of the GP was the fact that you use the expression "good languages" without defining what makes a language "good".
> not be the best paradigm always
same as above, what makes a paradigm "best"?
> vast improvement on inline procedural PHP.
But why do you assume that the majority of PHP codebases are written in an "inline procedural" style? Do you have any evidence? Regarding the "procedural" part, the only large project that is not OOP-based is WordPress, and even there spaghetti code (which I assume is what you mean by "inline") is AFAIK frowned upon by the community.
> the fact that modelling tools are so explicit and forced on the user
You need to accept the fact that many people may not like the "opinionated" nature of some languages (in fact, the inflexibility you mentioned is something I dislike about Java); often, a language may or may not be the right tool for a specific job precisely because of those opinionated bits.
The statement I made is that more consistent and "opinionated" languages encourage better code. They don't enforce it, just encourage it.
It is my opinion that this is valuable.
I did define "good": internally consistent languages with strong guidelines for developers.
I made no statements about mature PHP codebases as they are irrelevant to my argument.
I do accept that people prefer less "opinionated" languages, I too fall into this camp, but I am no longer a new developer, as such this point is entirely irrelevant to what I was saying.
Nitpicking individual points whilst misconstruing what I said is neither useful nor appreciated.
> Nitpicking individual points whilst misconstruing what I said is neither useful nor appreciated.
It wasn't my intention, I'm sorry if my comment came off as nit-picky. I wasn't trying to misconstrue your comment, I genuinely did not get your argument (I think I now get it, thanks to your reply).
It's rare that I see good Java code, especially from junior developers.
I think OO is a hard concept to get right. I know it took me years to master, and one of my epiphanies about OO design is that it's not always appropriate. Yes I can tell you the best OO approach to a problem, but I can also often tell you a better approach that isn't OO.
I often see people say that Java has pretty much been designed as (or at least evolved into) a way to let large numbers of mediocre programmers develop acceptable-quality software.
This is the most pernicious and annoying technicality that advocates of low-quality languages invoke. PHP does not actively work against bad or just plain wrong code, and its construction actively encourages bad code. It's missing aspects that we know to be tremendously useful for writing high-quality correct code.
You can write low-quality software in e.g. Rust, but you're going to work a lot harder at it. Rust (again, just as an example) also makes it easier to write high-quality software.
This is really the only metric by which you can judge the quality of a language, since in the end they're all (mostly) Turing complete.
So, I generally don't wade into this argument. I've been programming for 27 years, 12 of that professionally. In that time I've used a lot of languages for a lot of projects. Every language is capable of being used to shoot yourself in the foot TBH. The hate that PHP gets is, IMHO, mostly from the fact that it's a gateway language and as such often has a higher WTF per minute rate for the code you find than many other languages. Anyway, on to what made me post this.
> its construction actively encourages bad code
That's a statement that needs a reference to back it up.
It wasn't mentioned in the post from Slack, but default superglobals and the earlier register_globals design decisions are the worst and most impactful wart in PHP.
Because it was designed as a templating language, the default web server interface, which is CGI - will auto-expose all variables in global scope, ex.
echo $_POST['user_id']
PHP has a horrible reputation with security for this reason - we all know that somehow, somewhere, in almost every project someone is pulling in a user-controlled variable from a superglobal and they aren't escaping or checking it properly (since you can't be warned about it but the feature will work).
Worse - and I've seen this a lot, even with Laravel, CodeIgniter, Cake, Symfony/Silex etc. you end up with these well structured projects that declare request classes, methods and variables etc. etc. but then sometime down the road a developer takes a shortcut in a method and pulls in a $_GET or $_POST inside a controller (usually because they don't know how, or can't be bothered, to change all the related classes) - running around the default exec stack.
I've seen this so often - because it's so easy to do it. The most common place is where a designer has built a frontend AJAX form. They now need to build a quick backend check, so they Google "php backend ajax username check" and they'll likely get a result like this one:
http://stackoverflow.com/questions/29459183/check-username-a...
where the 4th and 5th lines of the solution are:
$username=$_POST['username'];
$query="SELECT * FROM username_list WHERE username='$username' ";
they copy that into a file called ajax_username_check.php and save it to the server - and they've now destroyed all that previous good work by opening up a very blatant and easy to find SQLi vulnerability. Their database will be on pastebin within a month.
You can spot this type of vulnerability from the frontend because the URLs used in the AJAX calls don't match the URL router patterns for the rest of the app (ex. GET /user/username_check_ajax.php vs /user/check_user).
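For contrast, a minimal sketch of the same username check with a bound parameter instead of string interpolation. It assumes PDO with the SQLite driver and an in-memory database purely so the snippet runs on its own; the table name username_list comes from the snippet above, and the function name usernameTaken is hypothetical.

```php
<?php
// Hypothetical helper: the username travels as a bound parameter,
// never as part of the SQL text.
function usernameTaken(PDO $pdo, $username)
{
    if (!is_string($username)) {
        // Arrays submitted via "[]" end up here instead of in the query.
        throw new InvalidArgumentException('username must be a string');
    }
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM username_list WHERE username = :username');
    $stmt->execute([':username' => $username]);
    return (int) $stmt->fetchColumn() > 0;
}

// Throwaway in-memory database so the example is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE username_list (username TEXT PRIMARY KEY)');
$pdo->exec("INSERT INTO username_list (username) VALUES ('alice')");

var_dump(usernameTaken($pdo, 'alice'));          // existing user is found
var_dump(usernameTaken($pdo, "' OR '1'='1"));    // injection attempt is just an unknown name
```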
In other languages you can't get that without using a standard library that will escape the values by default. Any solution you search for will always be a safe method to obtain the variable values by default.
Some good news: Hack doesn't expose superglobals in strict mode:
http://cookbook.hacklang.org/recipes/get-and-post/
I'd strongly recommend that this is used in all PHP projects, since it strictly enforces variable access - even in cases where you're using a framework that is supposed to enforce it.
IMO PHP missed a big opportunity by not removing superglobals in version 7 and enforcing an explicit safe request object much like other languages do. They likely wanted to avoid it because of the mess around register_globals and magic_quotes in earlier versions.
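To give a rough idea of what an "explicit safe request object" could look like, here is a sketch in plain PHP. The class and method names are invented for illustration; this is not Hack's API or any framework's.

```php
<?php
// Hypothetical request wrapper: raw input goes in once, and callers can
// only read it back through typed getters.
final class SafeRequest
{
    private $params;

    public function __construct(array $params)
    {
        $this->params = $params;
    }

    // Returns the value only if it is a real string, otherwise null.
    public function getString($key)
    {
        $value = isset($this->params[$key]) ? $this->params[$key] : null;
        return is_string($value) ? $value : null;
    }

    // Returns a validated int, or null for anything else.
    public function getInt($key)
    {
        $value = $this->getString($key);
        if ($value === null) {
            return null;
        }
        return filter_var($value, FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
    }
}

// Stand-in for $_POST, including an array submitted via "[]".
$request = new SafeRequest(['user_id' => '42', 'name' => ['x']]);
var_dump($request->getInt('user_id'));
var_dump($request->getString('name'));
```

In a real app the constructor would be fed from the superglobals in exactly one place, so a stray $_GET or $_POST elsewhere stands out in review.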
[0] I think it is important to distinguish PHP the language and PHP the runtime. PHP the language is now decent - having caught up with a lot of features (although I find it very verbose and harder to read) while PHP the runtime is undoubtedly still a horrible runtime - hence HHVM