I find it incredible that there are 86,453 results for this, and 58,123 results for execution of a parameter as a command!
Github should set up a little warning for people when they log in to their account if their repos match any of a set of obvious security anti-patterns, because this would be fertile ground for exploits. They could expand it later to some kind of code-review bot which does automatic code reviews looking for common antipatterns, off-by-one errors, and anything else that is easy to detect automatically. There are quite a few tools like this for analysis of desktop apps, but code for web apps are not checked as much in a systematic way.
Are there any external services which do this already for public github repos?
I think this is a great idea, and an area where someone as big as Github could make a MASSIVE difference. A CodeSmell metric! Allow contributions to add extra smells to langauges, and then it looks after itself.
A modern, open, polyglot lint would be of great benefit to the world. Unfortunately, it's something that gets way too little attention. Open-source static analysis projects always seem to flounder.
That would be great, but I would hope it could be turned off. I have plenty of code on GitHub that's clearly marked as just something I put up in a few hours for personal use only, where I don't care if there's vulnerabilities or exploits. The code was written in the true spirit of hacking, fast and dirty in order to solve an immediate need.
Sure, it could be on by default but easy to turn off if you wanted to.
Thinking about it again probably it'd be best as a per-repro flag that made a little button appear somewhere saying 'security', 'code police' or something similar, leading to more info. Obviously it'd take a bit of work and generate some false-positives, but I could see a system like this being really handy if they accepted open contributions for filters, even if it was opt-in it would probably still help a lot of people.
Re dirty hacks, I guess I wouldn't publish things like that to github, I'd keep them in a local or private repo, as they're not for sharing. If they are for sharing in spite of vulnerabilities, turning off a warning shouldn't be too much hardship for you, and it could actually help others looking at the code quickly tell it wasn't to be used in production as it would mark it as unsafe.
Lots of generic hosting providers can be used for it too, if you have a hosting account somewhere. As long as the server has git and you have SSH access, you can do this on the server side:
$ mkdir project_name.git; cd project_name.git; git init --bare
That's what I do (on Webfaction). Of course, then you don't get all the fancy issue tracking and all that from github, but it works if all you need is a few kb of storage for a remote. That also makes it easy to deploy PHP projects with a single post-receive hook, since it's all in the same box.
Your example looks much like AR and AR will sanitize the given parameter at least in this case so that's a non-issue here. Same would be true if you'd call a prepared statement in PHP and hand it a GET parameter. The problem with the given PHP samples is that they do neither: The take the unsafe user-provided parameter and use string concatenation to build a query - the textbook example of an SQL injection vulnerability.
He's citing this as a potential cross-authorization vulnerability, not SQL injection.
The form he's demonstrating is a common misstep in Rails. Instead of writing something like:
current_user.apartments.destroy(params[:id])
The programmer wrote:
Apartment.destroy(params[:id])
Meaning that the apartment lookup is not being done within the context of the current user, it's global. This means an attacker can delete other users' apartments with crafted URLs.
When I'm testing Rails applications, I always grep the source for something like:
All the code is already public on Github, so everyone can see these gaping security holes, right?
What if some honest, global actor could mass-commit a fix for all these repos in one fell swoop? For example, replace all references to:
$_GET
with:
some_safe_sanitizer($_GET)
All affected repos win a free fix, and the world becomes a better place due to having less security bugs. Essentially, what would the next abstraction layer, over all Github repository objects, look like? Ponder that.
To do it properly you'd want to use prepared statements[1] which requires a non-trivial, though not particularly complicated, syntax change. So your global actor would have to parse the PHP in a rather more intelligent way that just a string replace.
You could just use mysql_real_escape_string [2] but that's less secure than prepared statements, and may break some things (eg if the code is relying on certain things not being escaped, etc). Also that extension is deprecated in favour of prepared statements.
Also, some of the dodgy code is using $_GET vars for things other than values, like field and table names (!!!!) which would need a more significant refactoring in order to fix.
However it would be a great little problem to work on, and you could probably write a bot to fix 80% of the code pretty easily.
The other issue is how many people will understand and accept the pull requests...
The pull request could be done "stupidly", which may fix most problems (your 80% value), and for cases where it wouldn't "just work" the author could just reject the pull request but still be made aware of the problem.
I think what I'd do is write a bot that issues pull requests along with a commit message explaining what was fixed, why it needed fixing, how the automatic fix isn't perfect and that they should really consider rewriting it to use prepared statements.
Include a check to make sure multiple bugs in a single repo are handled by just one pull request too.
Quoting/escaping the table name in this case would break the code. (Backticks will work, but that's another story.) The variable has already been sanitized immediately before, so there's no injection possibility.
The case with $set_to is still pretty bad form, but it has been sanitized above by casting to an int.
Not saying this is great code, just pointing out the fact that there's no possible way to do a mass-commit fix.
Instead of policing every line of code you can also mitigate SQL injection by restricting the access of the database handle the user-facing queries are using
* All read-only queries use a read-only (SELECT only) database account. Injecting; DROP TABLES or INSERT, UPDATE, etc (DML commands) just error out.
* User accounts and logins are stored in a different database, so only the code responsible for login and registration can access those tables/database.
* All queries should use prepared statements which effectively treats all injected text as just text rather than database commands.
Oh. And it's for wordpress. Isn't that just fucking wonderful. I would guess looking at the age of the account and the complete lack of documentation that it's a personal project he never really intended to get much scrutiny. I'm sure if someone looked at my github they could find some bad code too. Not that bad though.
I second this. I manage a couple of teams and one is dealing with legacy a PHP stack where they've been refactoring things like SQL injection risks and whatnot. Having something as an automated second set of eyes would be brilliant.
Not really at the moment. I've been slowly refactoring it into a publishable chunk of code but progress is slow. I will publish it on github when it is done.
I am not sure that is enough, I mean if you look at the supposed results many of them can't actually be actacked (like the on that tests that Id is a number before it is used) but a semi stupid script wouldn't catch those.
To be clear, this list is not all exploitable. It only shows php files that use both GET variables and the mysql_query function. Not all of these variable are being fed unescaped into a query, nor are they necessarily used in a query at all.
Agreed. I find posts like these disingenuous. What do learn from this list? I think we all know that SQL injection is a serious problem and a very common mistake already. What this list seems to do is just serve as a high horse we can all get on and proceed to shame and look down our nose at people who:
- Use raw $_GET variables in their code
- Use $_GET in MySQL queries
- Use mysql_real_escape_string instead of prepared statements or otherwise properly sanitizing input
- Use PHP in general
We already know we should all be using prepared statements or combining data sanitizing methods with an ORM and to not trust raw user input and that its cool to hate on PHP. So I ask again, what are we learning here? That there's a ton of programmers who are doing sloppy incompetent work? Not news.
Furthermore, I think context is important. I'm not ashamed at all to say out loud that I've done these sorts of things knowingly and purposefully despite knowing it's not safe or correct. Why? Maybe I was creating simple examples to teach others the basics of getting user input and working with it in a database. Of course you'd warn people not to do things like that in a production environment but you show it that way anyway to get across some basic ideas without throwing too much at people. Or what if I were writing a simple series or scripts or small app that would only be used by myself or a select group of people? What if that application needed to be done in a hurry and was only for internal use and not accessible to the outside world? Lots of things can always go wrong but at a certain point you have to be able to trust someone and its not always better to be right than happy. I'm sure if anyone wants to be pedantic they can poke holes in my examples but the point is that these sorts of sins aren't always so awful depending on the context. I don't see any purpose in this other than to either look for open source applications to exploit in the wild or to just pick on a really easy target to make us all feel superior.
I totally get what you're saying, but this right here is kind of the problem:
> Maybe I was creating simple examples to teach others the basics of getting user input and working with it in a database.
I'm one of the countless people who learned PHP off of examples that do just that, for clarity, and it was years before I learned that it was not meant to be a real-world example. Sites that offer such examples (I'm looking at you, w3schools) are probably responsible for 90% of all the bad code that shows up on lists like this, because the exchange amounts to:
Student: "How do I do XYZ?"
Teacher: "It's easy! Just do 123."
Student: "Thanks!" <leaves, closes door>
Teacher (to empty room): "Haha, just kidding. You should really forget that entirely and do 456 instead."
I don't think it's valuable to learn about mysql_query("delete from `tbl` where id = {$_GET['id']}") at all in the first place, if you just have to unlearn it and start doing it the right way later. It doesn't actually simplify anything.
I don't see any purpose in this other than to either look for open source applications to exploit in the wild or to just pick on a really easy target to make us all feel superior.
You could do a similar search for common anti-patterns in almost every language you can think of, so this is not specific to PHP, just perhaps a bit more common due to the low-bar for picking up PHP.
A useful aim of this sort of post to me is to underline that people should not do this in a script exposed to the internet, and if people read comments on searches like this, they'll find out why. I don't think it's appropriate to use variables in query strings even for learners as it's teaching bad habits - I have personally seen scripts go into production based on bad examples at a former place of work (in perl, not php), allowing spammers to use a contact form. You can guarantee that for each of these common vulnerabilities there are scripts in the wild scanning websites and trying to exploit them.
A surprising number of people still do make mistakes like this, so it does more good than harm to point that out I think, and certainly doesn't make me feel superior, it makes me wonder which vulnerabilities I've missed in my own code.
We already know we should all be using prepared statements or combining data sanitizing methods with an ORM and to not trust raw user input and that its cool to hate on PHP. So I ask again, what are we learning here? That there's a ton of programmers who are doing sloppy incompetent work? Not news.
We are not the problem, but we should consider trying harder to be the solution. Maybe new PHP programmers do need to be educated, and there's a lot of outdated source material out there giving them bad advice and spreading bad practices. Maybe people putting their stuff on Github don't entirely realize how networked and public it is... and they need a polite reminder that other people might end up paying for their haste or negligence. We can do more than just point and laugh. PHP might be bad but a lot of us know it doesn't have to be as bad as most of it still is.
We actually clean the GET and POST arrays in an include when the page gets requested, if you need anything unescaped you specifically have to request it from a different array.
The mysql_real_escape_string part is what PHP did with its "magic_quotes" feature. This feature has been removed for good reasons :
- All your variables are cluttered with \' everywhere, so you have to unescape them before doing anything other than using the variable in a SQL query. And re-escape them after that. You will forget to do it.
- It doesn't protect you if you expect the variable to be numeric, and use it in a query without enclosing it in quotes:
// do not do this
mysql_query("SELECT do_not_do_this WHERE x = $var");
// $var could be `0 OR 1 = 1`, even after escaping
- Also, you never know if a variable is actually escaped or not. e.g. in the following code, is "$var" escaped ?
$var = get_data_from_db_or_other_source();
// do not do this
mysql_query("SELECT do_not_do_this WHERE x = '$var'");
- You've "protected" yourself from sql injection and html injection (very wrongly, btw). What about shell injection, css injection, javascript injection, protocol injection ? Are you going to also apply escapeshellarg() on all $_GET variables ?
---
Also, you shouldn't be using strip_tags at all:
- It doesn't strips quote characters, so you would be vulnerable to html parameter injection, if you used a variable "sanitized" like this in a html parameter.
- It alters the data is some non-reversible way. What if you genuinely want to display (not render or "execute") some html code in a blog post on your web site ?
You should be using htmlspecialchars() instead.
---
What to do:
- escape, do not "filter" or "sanitize"
- escape data just in time, not ahead of time (e.g. escape your variable just before using it in a mysql query, or just before outputing it in a html document)
- prefer prepared statements
- use the right escape function depending for the context in which the variable is used: mysql_real_escape_string for mysql query, htmlspecialchars for html, escapeshellarg for command line arguments, etc.
The strength of using a parameterised query (i.e. $dbh->prepare) is that you are letting your database's type system do the work for you. You tell it where your variable should land in your query, and before it even looks at the variable the database works out what type to expect. Either the data you give it fits into the hole you tried to put it in, or the query fails gracefully and your existing data is safe.
Without trying to be too much of an arrogant arsehole, surely they should be using a proper language. That is the real core cause of the problem here isn't it?
It's true that you can be an idiot in any language, but I have written production code in every 'high level' language of this type and only PHP seems to suffer from this problem. Take a random mail form, even has its own domain: http://jemsmailform.com/
Finding the obvious problems I leave as an exercise.
I'm trying very hard to think of a language that won't, by default, allow you to throw user-generated HTML onto an HTML page if it allows you to create an HTML page at all that incorporated user-generated (or, in fact, program- or programmer-generated) strings. Angle brackets, etc., are only situationally bad. (And let us separate language from framework here; the languages used in templating frameworks will almost always allow you to emit HTML should you choose to do so.)
It's true a turing complete language has no way of stopping you, but there are ways around this that many frameworks and languages have adopted. Taint mode, for example.
This is often said, but these problems are practically endemic in the PHP 'community'. I can't help but think that the lack of proper documentation or basic explanations is responsible.
There really is no 'community'. The official documentation is fine (the included comments actually make it pretty good).
The problem is mostly the widespread support and low barrier to entry. PHP is supported on every crappy $2/m hosting plan.
It's also the language of a lot of the net's more popular CMSs. (WordPress?).
People have a site and want to make it do stuff. The site is already either using PHP or that's all the host supports.
PHP gets most of the users who aren't trying to get into programming (and may have no interest in it whatsoever) - they just wanna get something done. Some of those people develop an interest and stick around and learn the language, but aren't really forced to learn any software development methodologies.
PHP acts as a filter, collecting all of the non-programmers and people who have stumbled into programming accidentally or unintentionally. People who write Ruby or Python are generally people who set out to be a programmer.
The language itself has its warts, but I see the above, not the warts, as the primary driver of all the crappy code out there.
Your devs will learn to trust $_GET "because it's cleaned by our include".
They'll never sanitise input themselves "because it's cleaned by our include".
Code that they write for other projects will trust $_GET "because it's cleaned by our include" - except it's not because this is a 3rd party script.
Code you import from other projects will be double-escaping.
Also mysql_real_escape_string is deprecated.
Stop treating SQL queries as strings. They're code. You wouldn't write code by concatenating strings with user input would you?
Use prepared statements.
edit: missed the part when you said that you clean $_POST as well. I was wondering "What do you do when you need to submit markup?" Now I know that it's magic. The $_REQUEST array is actually the unsanitised array, whereas $_POST is the sanitised one. Of course! Isn't it obvious!
This could be in part because the procedural interface of mysqli is deliberately designed to be vyer similar (if not identical) to the mysql one. So maybe they just continued writing their code that way but replaced mysql by mysqli because someone told them mysqli was better and supported (which it is, but probably for other reasons).
I wonder whether a similar search for the mysqli OO interface yields the same amount of unsanitised queries. (Assumption: People using the OO interface looked through the documentation more closely and maybe understand the ways in how mysqli is better than mysql. But that's just an unfounded assumption.)
Wow. This has just turned into a great example for one of my classes. I don't think there's a better way (other than hands-on examples) to show the severity and widespread of such security holes.
my 2 cents:
1) At the first place, yes, it does look like these are sureshot SQL injections.
2) However, we are looking through just a tiny window. There could be filter chains executed long before this code that would sanitize the request parameters before they are consumed anywhere else in the codebase.
It is a small window, however, wouldn't it be better to filter the values into an easily identifiable 'clean' variable? The code is still using $_GET, and while it may have been filtered above me, I have no indication of that - versus - $foo->cleaned('var') - where I can reasonably assume it is clean.
Github should set up a little warning for people when they log in to their account if their repos match any of a set of obvious security anti-patterns, because this would be fertile ground for exploits. They could expand it later to some kind of code-review bot which does automatic code reviews looking for common antipatterns, off-by-one errors, and anything else that is easy to detect automatically. There are quite a few tools like this for analysis of desktop apps, but code for web apps are not checked as much in a systematic way.
Are there any external services which do this already for public github repos?