Hacker News new | past | comments | ask | show | jobs | submit login

At some point I realized sanitizing your input for SQL is one of the dumbest ideas I've ever heard: you can trivially avoid the issue by parameterizing the variable inputs, and explicitly telling the DB that this chunk ain't SQL. Sanitizing is an absurd hack in any situation, in the fashion that parsing html with regex is; But especially with generating SQL, any notion of "sanitizing your input" should be considered a bug (perhaps a wetware one)[1]

Afaict the only reason this idea ever became widespread was because PHP's native MySQL driver lacked support for it for ages (I guess till PHP5? Been years since I looked into this)

[1] Obviously I'm not talking about validation eg validating a phone number; anytime you're trying to embed strings within a program but have to be worry that your string doesn't get read as part of the program, something has gone horribly wrong




There’s a whole class of actions and values that you can’t parameterize, like anything resembling DDL queries. I have worked on several products where these types of queries (like creating tables dynamically) have been an absolute requirement. Sanitizing input is still very much a necessity there.


I'm not sure I see anything in DDLs that denies parameterization of user input, except that support might be currently non-existent; but if the support is missing, it doesn't make sanitization a correct solution... it makes it a necessary hack to work around a lacking communication protocol

Ofc if the user is putting in full custom queries, parameterization doesn't help anything, but I'm not sure what you're even sanitizing at that point (semantics? Afaik sanitization refers to syntax cleanup; It'd be even dumber to avoid people dropping your important tables by parsing a SQL string, instead of making use of the much more reliable DB permission systems...)

It's absurd: A constant source of vulnerability in websites across the world is a syntax error.

It's the definition of accidental complexity -- A program with a valid semantic understanding of what it has is somehow totally incapable of passing on that wisdom except by code generation; and this is when it is free of the burden of any real constraints beyond correctness -- its certainly not a fast method of serializing/deserializing, it certainly doesn't minimize network-size, it certainly isn't a safe or composable API... it's just a dumb, horrible thing and we pretend its all ok because you just need to sanitize your input.

I wouldn't really care if sanitization was just considered a necessary hack, especially in certain corner-cases; but its not. It's thought of as a good thing to do. In fact, a best practice.


MySQL didn't support prepared statements until the end of 2004 with version 4.1. The LAMP craze was well underway by then. While it wouldn't be surprising to hear that it took driver developers some time to catch up to this, it's not really fair to lay the blame at their feet.


Tbh I mostly blame the users

SQL sanitization should never have been recommended to anyone else with a straight face (tears would be appropriate, for the sorry state of the world)

that it became normalized to the extent of becoming a happily provided best practice is horrific

But I doubt anyone even took notice when support properly landed, content in their ways of string-building and worrying about " replaced with \";




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: