The mysql_real_escape_string part is what PHP did with its "magic_quotes" feature. This feature has been removed for good reasons :
- All your variables are cluttered with \' everywhere, so you have to unescape them before doing anything other than using the variable in a SQL query. And re-escape them after that. You will forget to do it.
- It doesn't protect you if you expect the variable to be numeric, and use it in a query without enclosing it in quotes:
// do not do this
mysql_query("SELECT do_not_do_this WHERE x = $var");
// $var could be `0 OR 1 = 1`, even after escaping
- Also, you never know if a variable is actually escaped or not. e.g. in the following code, is "$var" escaped ?
$var = get_data_from_db_or_other_source();
// do not do this
mysql_query("SELECT do_not_do_this WHERE x = '$var'");
- You've "protected" yourself from sql injection and html injection (very wrongly, btw). What about shell injection, css injection, javascript injection, protocol injection ? Are you going to also apply escapeshellarg() on all $_GET variables ?
---
Also, you shouldn't be using strip_tags at all:
- It doesn't strips quote characters, so you would be vulnerable to html parameter injection, if you used a variable "sanitized" like this in a html parameter.
- It alters the data is some non-reversible way. What if you genuinely want to display (not render or "execute") some html code in a blog post on your web site ?
You should be using htmlspecialchars() instead.
---
What to do:
- escape, do not "filter" or "sanitize"
- escape data just in time, not ahead of time (e.g. escape your variable just before using it in a mysql query, or just before outputing it in a html document)
- prefer prepared statements
- use the right escape function depending for the context in which the variable is used: mysql_real_escape_string for mysql query, htmlspecialchars for html, escapeshellarg for command line arguments, etc.
The strength of using a parameterised query (i.e. $dbh->prepare) is that you are letting your database's type system do the work for you. You tell it where your variable should land in your query, and before it even looks at the variable the database works out what type to expect. Either the data you give it fits into the hole you tried to put it in, or the query fails gracefully and your existing data is safe.
Without trying to be too much of an arrogant arsehole, surely they should be using a proper language. That is the real core cause of the problem here isn't it?
It's true that you can be an idiot in any language, but I have written production code in every 'high level' language of this type and only PHP seems to suffer from this problem. Take a random mail form, even has its own domain: http://jemsmailform.com/
Finding the obvious problems I leave as an exercise.
I'm trying very hard to think of a language that won't, by default, allow you to throw user-generated HTML onto an HTML page if it allows you to create an HTML page at all that incorporated user-generated (or, in fact, program- or programmer-generated) strings. Angle brackets, etc., are only situationally bad. (And let us separate language from framework here; the languages used in templating frameworks will almost always allow you to emit HTML should you choose to do so.)
It's true a turing complete language has no way of stopping you, but there are ways around this that many frameworks and languages have adopted. Taint mode, for example.
This is often said, but these problems are practically endemic in the PHP 'community'. I can't help but think that the lack of proper documentation or basic explanations is responsible.
There really is no 'community'. The official documentation is fine (the included comments actually make it pretty good).
The problem is mostly the widespread support and low barrier to entry. PHP is supported on every crappy $2/m hosting plan.
It's also the language of a lot of the net's more popular CMSs. (WordPress?).
People have a site and want to make it do stuff. The site is already either using PHP or that's all the host supports.
PHP gets most of the users who aren't trying to get into programming (and may have no interest in it whatsoever) - they just wanna get something done. Some of those people develop an interest and stick around and learn the language, but aren't really forced to learn any software development methodologies.
PHP acts as a filter, collecting all of the non-programmers and people who have stumbled into programming accidentally or unintentionally. People who write Ruby or Python are generally people who set out to be a programmer.
The language itself has its warts, but I see the above, not the warts, as the primary driver of all the crappy code out there.
The mysql_real_escape_string part is what PHP did with its "magic_quotes" feature. This feature has been removed for good reasons :
- All your variables are cluttered with \' everywhere, so you have to unescape them before doing anything other than using the variable in a SQL query. And re-escape them after that. You will forget to do it.
- It doesn't protect you if you expect the variable to be numeric, and use it in a query without enclosing it in quotes:
- Also, you never know if a variable is actually escaped or not. e.g. in the following code, is "$var" escaped ? - You've "protected" yourself from sql injection and html injection (very wrongly, btw). What about shell injection, css injection, javascript injection, protocol injection ? Are you going to also apply escapeshellarg() on all $_GET variables ?---
Also, you shouldn't be using strip_tags at all:
- It doesn't strips quote characters, so you would be vulnerable to html parameter injection, if you used a variable "sanitized" like this in a html parameter.
- It alters the data is some non-reversible way. What if you genuinely want to display (not render or "execute") some html code in a blog post on your web site ?
You should be using htmlspecialchars() instead.
---
What to do:
- escape, do not "filter" or "sanitize"
- escape data just in time, not ahead of time (e.g. escape your variable just before using it in a mysql query, or just before outputing it in a html document)
- prefer prepared statements
- use the right escape function depending for the context in which the variable is used: mysql_real_escape_string for mysql query, htmlspecialchars for html, escapeshellarg for command line arguments, etc.