Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A guide to preventing SQL injection (bobby-tables.com)
52 points by fogus on Sept 15, 2009 | hide | past | favorite | 28 comments


Both the guide and the strip are inaccurate. Parameterized SQL statements are also vulnerable to injection: you can't parameterize identifiers, only values. So any dynamic column references, dynamic sort orders, or dynamic limit statements remain injectable.

To see why this matters, consider every data table view ever implemented in any web app: table columns are selectable with checkboxes, you can click the columns to change sort order, and the data is paginated.

You should obviously be using parameterized queries. They are clearly more secure than concatenated SQL queries. But they aren't a magic totem against injection, and we have much better resources for developers than XKCD. Start with (yes really) Microsoft:

http://msdn.microsoft.com/en-us/library/ms161953.aspx


Do people really use variables based on raw userdata to select table/columns? Seems like an incredibly rare case, and bad design. I've never seen/used it.

Parameterized queries are a magic totem against injection, unless you mix them with concatenation of unclean userdata. Which would be silly.


Of course they do. Sort columns send "ASC" and "DESC", pagination limits and offsets are sent in offset= parameters, etc. How else do you think people do it, other than the most obvious way?

Don't make the problem even worse than it is by trying to downplay it just because you know what you're doing.


OK you got me on those 2 cases. For sort order:

$sort_order=="ASC"?"ASC":"DESC"

For offsets and limits convert/parse it to a number first and ensure it's within sane bounds.

You're right though. Parameterization isn't a magic bullet that allows you to forget about all other unclean data if you're using other unclean data in an SQL query.

Surely we all know this though? :/


My advice is, pass page= as your pagination and pass 1/0 in for ASC or DESC. If you have selectable columns, number them in the parameters, don't name them.

Point being, when you compose the query, you should never be inserting text from parameters; you should be converting parameters into known-good values.


I think it might be a bit more helpful to explain to people why or how parameterized queries are better than "doing it yourself" for this sort of thing.


I'll take a stab at it.

If you think you are good enough to write code resistant to SQL attacks, you are wrong. If you think the best coder in your company is good enough to write code resistant to SQL attacks, you are wrong. Now realize that it is neither you nor the best coder in your company that you have to worry about, but the performance of the worst guy employed by the team in Bangalore on his first day back at work after his mother's funeral.

All code by necessity has a weakest link. The best you can do is make sure it is the library you are using.

(Incidentally, this also describes manual memory allocation.)


Sigh.

You're still "doing it yourself" even if you use parameterized statements.

My take on this article has changed in the past hour; before I thought it was cute but inaccurate, but its state is transitioning to "actively evil".

The reality is that companies that have a lot of SQL but never have SQLI vulns do all of the following:

* Use an ORM like Hibernate (or AR or Django) for "front-line" database queries

* Standardize on parameterized statements for the complex stuff

* Design and implement a "house style" for query builders and "modular" SQL statements

* Factor as much as possible into stored procedures in the database

* Run databases in least-privilege mode, so that code that only needs read access to a few tables can "revoke" the unneeded privileges

* Sweep their codebases for SQL statements and audit them with a team signoff

By "do all of the following", I mean "A-L-L of the following". The ones that don't are the ones that tell us "there's no way you're going to find SQLI in your audit; you'd get fired for having concatenated SQL here", and then lose their entire database in the first week.

I don't mean to sound obnoxious. I just hate the idea of people walking around thinking their code is safe because they switched to ? instead of "'" + x + "'".


Could you elaborate on "Design and implement a "house style" for query builders and "modular" SQL statements"? How would you go about doing that, and what would it look like?


Edit: errr, nevermind. I read your posts further down.

Well to be fair, the article isn't claiming that parameterized queries will make all your DB transactions secure, just that they will prevent injection attacks. Which is true isn't it?

Unless you have dynamic code generation in your sql, parameterized queries make injection attacks impossible, don't they?


No, statictype. It isn't true.


Right. This explains why doing it yourself isn't the best strategy.

But I still feel like what is missing from this discussion is why parameterized queries are not vulnerable to injection. Are they just better implemented and better tested? Is there some technological reason why they are superior, something they are capable of doing that a concatenated string cannot?

I feel like having this discussion without telling the readers why the solution is the solution is analogous to just doing a bunch of hand-waving. "Don't do that, do this" doesn't really teach people why "this" is better.


Parameterized queries effectively pre-compile the syntax of the query, and pass the parameters out of band (you can see that on the wire in the TSQL or MySQL protocols --- the parameters are actually seperate messages).

The net effect is, there's no opportunity to mix query structure and parameter, because by the time the database looks at user input, it's already fixed the query structure in place.

You do not get the same benefit by aggressively quoting; among other things:

* Quoting can fall to character set attacks

* You still may need to handle truncation

* Different quoting regimes are required for different parameter types

It's possible to safely do concatenated sanitized SQL (ActiveRecord does it), it's just hard. It's also possible to have injectable parameterized queries, or even injectable stored procedures (for instance, if the procedure uses dynamic SQL). There's no panacea.


Part of the problem is the terrible name, "SQL Injection". I've heard it referred to by another term, something like "subverting the structure of the sql statement" (no, it's not as catchy). Even just calling it that gives a much more fundamental insight into the problem - it's not about always escaping apostrophes, it's about making sure the query parses in exactly the way you intend. Looking at the problem that way, you see that 1) there are obviously similar attacks possible against most query languages (LDAP, XPath, whatever) and 2) separating the query structure from the arguments (as tptacek describes) is key to avoiding the problem.


So why can't I use mysql_real_escape_string? Let's assume that I have to use PHP and there's no way in hell this app is ever going to need to be ported to a different db. (Or wrap it in a "quote" method.)

Edit: Instead of a downmod, how about an answer?


I think, escaping the input from the user in some way is a way which might work, but basically does not address the core issue of sql-injection attacks, pretty much like putting transparent tape over a crack in a window. It sort of works against most known cracks, but there might (and in security-terms, that means: there will) be a crack which cannot be taped well.

But, analogies are no good, let's get to the core of this.

Executing an sql-statement basically means that the server parses, optimizes and executes the statement. Whenever you type something like

  "Select name from dogs where tag=" + mysql_real_escape_string($user_tag)
you assume: the server will parse this query as a select-statement, with the field 'name' selected, whereas the table to examine is 'dogs' and there is some predicate on the tag, and some properly secured value goes into the predicate.

If an injection attack is successful, the assumption that the parsed SQL-statement and the SQL-statement in the code match fails, because a successful SQL-injection infiltrates the actual structure of an SQL-statement and modifies it pretty much arbitrarily.

Now, it is true that it might be possible to escape all known possible user inputs (which might be possible, as SQL should not be turing complete, but I am not sure about the newest standards), so the user input cannot infiltrate the structure of the SQL-Query, because in the user input, no control characters are working. But this is, as I said, mostly like saying: Well, there is a hedgedog on your chair, so use a pillow before you sit down or it might hurt both of you. It does not address the core of the problem.

The core of the problem is: you are doing two things at once. You are transmitting the actual (static) query you want to execute -- the structure of the query -- and the actual values in a single go. The real way -- parametrized queries -- first transmit the structure and after this, they transmit the value. Thus, it is guaranteed that the parsed structure in the database will be exactly the structure you specified in your code, and thus, an attacker cannot modify the structure of your query by inserting values into the structure, because at this point, it is clear that the values are values, and not actual structure code. And this is the reason, why parametrized queries are better than assuming that a certain function (or multiple functions, who knows) can handle ALL possible injection attacks any hacker (or cracker :/) genius might ever think of. Transferring the structure first and the values separately is like picking the hedgehog up, sitting down and having a happy hedgehog sitting on your lap ;)

Note that I am aware of the problem that this just guarantees that the structure of the query is transmitted properly. If the structure of your query allows arbitrary behaviour of your query via appropiate values ('if param1 == "admin" then eval param2' sort of rings), then attacks are possible again. However, these attacks are different from the sql structure injections, because these attacks do not aim to modify the structure of your query, but they rather aim to abuse the behaviour of your query. If you like to think in analogies, a structure injection would be like patching the linux kerlen somehow to get a backdoor going, while a value attack against a query with dynamic behaviour is more like replacing the 0-page with arbitray code doing nasty things.

Also note that not using parametrized queries with dynamic behaviour opens up two attack vectors: Attacking the structure of your query (bobby tables) and abusing the behaviour of your queries.

And also note, that securing a single query (or, all queries in your application) is also no substitute for actually securing your database server, because overall, your data is the goal of an attacker. The data is the money, not the application. Thus, your application is just another attack vector against your actual data base and your actual data, and your application consists of more or less attack vectors (which might be injection attacks, behaviour attacks, attacks via session hijacking, ...). Thus, you might use good queries, but if every user has weak passwords and also maximum privileges in the database, an attacker will ignore the application and probably try to attack the database directly.

Sorry for the wall of text, I just wanted to answer this and did not have that much time to make it shorter, HTH, Tetha


Also remember that parameterized queries are likely to be faster. They play better with the optimizer cache.


Hate to sound pedantic, but the Java example is a little off. java.sql.PreparedStatements have their parameters start at index 1, not 0. So the first "?" is parameter 1 and the second is parameter 2.

It's one of the few places in Java where indexing starts at 1 instead of 0.


... which probably is the reason he got it wrong. Consistency is a blessing.


This is similiar to the advice to NOT write your own encryption code: Odds are, somebody already wrote it 10 times better than you could.


Why did they not include PHP?


[deleted]


That's exactly what the page says NOT to do - attempt to escape input data. Instead, use MySQLi_STMT http://de2.php.net/manual/en/class.mysqli-stmt.php


also, if you're using PDO:

$stmt = $dbh->prepare( 'SELECT foo, bar from fooBar where foo = :fooValue' ); $stmt->bind( ':fooValue', $foo ); $stmt->execute();


If you're using PDO, it's usually going to be easier to do:

    $stmt = $dbh->prepare( 'SELECT foo, bar from fooBar where foo = ?' );
    $stmt->execute($foo);
The only case where using named parameters is really useful is when there are so many parameters you're reduced to counting question marks.


Someday every xkcd strip will have its own website.

  (that wouldn't be so bad actually)


So if you want comic-strip security, just go to this site. Perfect.


Why the downvotes? It's a superficial glance at SQL injection fixes with little explanation or education. Just like you can't put a novel in a comic-strip, you can't explain SQL injection security with a single block of code.


I was expecting some kind of PHP bullshit about replacing quotes.

The article turned out to be actually good. Cheers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: