Are we really talking about type checking or the larger circle of validation (of which type checking is just a small part)?
( Bugs found by unit tests ( ) Bugs found by input validation )
Or in other words...
String s = "lastname'; drop table user--";
...is still a perfectly acceptable string.
It seems to me that type checking is the simplest form of validation (are you an int, are you a String) and nothing more. It won't tell you if that int is positive or negative or if that string is an email.
Whether the language is static or dynamic, I think more unit tests should be spent on validation.
No, this is just common ignorance of static typing. That string is a perfectly acceptable String. But it isn't a perfectly acceptable Query, and you can't pass a String to the database, only a Query. In order to turn a String into a Query, it has to be passed to a function that escapes problem characters safely. You need to use such a function regardless of dynamic vs static typing, but static typing enforces that you always use that function, and can't forget and accidentally submit an unescaped string to the database.
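A minimal Java sketch of that idea, under stated assumptions: `Query`, `usersByName` and `Database.run` are hypothetical names, not a real library, and doubling single quotes is only a simplistic stand-in for proper escaping. The point it tries to show is that the private constructor leaves the escaping factory as the only route from a String to a Query.

```java
// Hypothetical sketch: Query and Database are illustrative names, not a real library.
final class Query {
    private final String sql;

    // Private: the only way to obtain a Query is through the factory below.
    private Query(String sql) { this.sql = sql; }

    // Builds "select * from users where name = '<escaped>'".
    // Doubling single quotes is a simplistic stand-in for real escaping.
    static Query usersByName(String untrustedName) {
        String escaped = untrustedName.replace("'", "''");
        return new Query("select * from users where name = '" + escaped + "'");
    }

    String sql() { return sql; }
}

final class Database {
    // Accepts only Query; Database.run("select ...") is a compile-time type error.
    static String run(Query q) {
        return "would execute: " + q.sql();
    }
}
```

With this shape, `Database.run(Query.usersByName(input))` compiles, while handing `Database.run` a raw String does not.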
I don't think anyone is making that claim. You can obviously do runtime inspections of types before building the query, or rely on the "runtime inspection" of getting an exception when the unescaped String doesn't support the Query method being used.
It's entirely possible to do this without static typing. It's impossible to guarantee that all database calls use a Query instead of a String without running the code in some form.
That is why you make them separate. You only really start taking advantage of the type system after you learn to encode system invariants and rules into the type system.
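As a rough illustration of encoding an invariant in a type, here is a Java sketch. `PositiveInt` and `Pagination` are hypothetical names chosen for the example; the assumption is that the invariant is "this int is greater than zero", checked once where the value enters the system.

```java
// Hypothetical sketch: PositiveInt and Pagination are illustrative names.
final class PositiveInt {
    private final int value;

    private PositiveInt(int value) { this.value = value; }

    // The single entry point checks the invariant once, at the boundary.
    static PositiveInt of(int candidate) {
        if (candidate <= 0) {
            throw new IllegalArgumentException("expected a positive int, got " + candidate);
        }
        return new PositiveInt(candidate);
    }

    int value() { return value; }
}

final class Pagination {
    // Callers cannot pass a raw int here, so pageSize is known to be positive
    // and the division below cannot divide by zero.
    static int pagesNeeded(int totalRows, PositiveInt pageSize) {
        return (totalRows + pageSize.value() - 1) / pageSize.value();
    }
}
```

The check itself still runs at run time; what the compiler adds is the guarantee that no call to `pagesNeeded` skips it.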
He is making the point that you can create a separation between Query and string just as easily in a dynamic language; it just gets caught at runtime (preferably during testing) rather than compile time.
So when the Query is sent to the database MySQL actually receives a Query object and then parses that Query object?
...oh wait, right before it is sent to mysql it is turned back into a string again.
My point is that static typing doesn't help you do anything other than verify that the objects being passed are of a particular type. I'm not saying static typing is bad or good; I'm just saying that type checking itself is NEARLY USELESS unless you include some sort of validation.
Query q = new Query("select * from users where id = (id)");
QueryParam qp = new QueryParam("(id)",25);
q.addParam(qp);
ResultSet rs = q.execute();
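public class Query {
    public ResultSet execute() {
        // Naive substitution: values are spliced into the SQL text unescaped.
        String sql = this.getSql();
        for (QueryParam qp : this.getQueryParams()) {
            // String.replace returns a new String, so keep the result.
            sql = sql.replace(qp.getId(), String.valueOf(qp.getValue()));
        }
        return super.execute(sql);
    }
}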
Yes, you can write a Query type that is vulnerable to SQL injection, if you want to.
But if you write a secure version, you only have to write it once. You only have to maintain it in one place. You only need to test it in one place. And if you forget to use your secure Query type, anywhere else in your code, the compiler will yell at you. It's a significant advantage.
This is easier to see in a language with a rich, flexible and expressive type system than it is in Java. The writer of the original article used Haskell for a reason.
> But if you write a secure version, you only have to write it once.
> You only have to maintain it in one place.
> You only need to test it in one place.
Again, so this cannot be done in a dynamic language? If it can be done, why bring them up?
> And if you forget to use your secure Query type, anywhere else in your code, the compiler will yell at you. It's a significant advantage.
The only thing the compiler will yell at you is if you passed a type that is not of a Query type. The compiler will not yell at you for getting the current session directly or creating your own jdbc driver for that matter.
> "The only thing the compiler will yell at you is if you passed a type that is not of a Query type. The compiler will not yell at you for getting the current session directly or creating your own jdbc driver for that matter."
In Haskell, I'd have a module, Database, that held all my db code. That module would export functions something like
query :: Query -> DBResult
(read those as "query is a function that takes a Query and returns a DBResult.")
In the rest of my program, those functions would be the only way to talk to the database. There's your guarantee.
Could I, rather than using my nice database module, instead drop into IO and write code to do something vicious? Surely. But now we've moved beyond bugs and into active malice.
> "Again, so this cannot be done in a dynamic language? If it can be done, why bring them up?"
It's harder. With duck typing, if it looks like a Query it is a Query, no? Even if it drops your table. I'm no expert on dynamic languages, and I'd believe that there are sophisticated object hierarchies that can do these things (at runtime...), but the original article is empirical evidence that real projects get this wrong.
Really, though, try a language with a modern type system and see for yourself. I know we Haskell users sound like zealots, but the difference between the Java and Haskell type systems truly is night and day.
> In the rest of my program, those functions would be the only way to talk to the database. There's your guarantee.
Honest question. Take these pseudo sql calls:
//Bad Person
username = "lastname'; drop table user--"
//Good Programmer
query = "select * from users where name like %[username]%";
input = {"username":"frank"};
result = execute(query,input);
//Bad Programmer
query = "select * from users where name like '%"+username+"%'";
result = execute(query, {});
vs
//Bad Person
String username = "lastname'; drop table user--";
//Good Programmer
Query q = new Query("select * from users where name like %[username]%");
Input input = new Input(username);
q.addInput(input);
Result r = q.execute();
//Bad Programmer
Query q = new Query("select * from users where name like '%"+username+"%'");
Result r = q.execute();
Could you solve this better using a static system? Right now I see no difference between the good and bad
> "Right now I see no difference between the good and bad"
You're building a new query string each time you create a Query object, and concatenating the string onto that. With that approach, each time you build a Query object you have a fresh opportunity to mess up. So you're right that there's no difference between your two cases.
Let's drop my off-the-cuff example and look at how a real library, postgresql-simple, handles the issue:
query conn "select x from users where name like ?" (Only username)
Do you see the difference? Instead of sticking the username into the SQL query by hand, we use a query function that takes three parameters: a database handle, a Query with a '?' character, and a thing you want to use in the query. The function takes care of properly escaping the username during interpolation. (The "Only" is just a wrapper to make sure we're handing in a datatype we can query with.)
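The binding step described above can be sketched in Java roughly as follows. `Template` and `bind` are hypothetical names, not postgresql-simple's API, and quote doubling is a simplistic stand-in for what a database driver does; in real Java code, JDBC's `PreparedStatement` with `setString` plays this role. The template here is still an ordinary String, so the sketch illustrates only the placeholder substitution.

```java
// Hypothetical sketch: Template and bind are illustrative names, not a real API.
final class Template {
    private final String sql;

    Template(String sqlWithPlaceholder) { this.sql = sqlWithPlaceholder; }

    // Substitutes the first '?' with a quoted, escaped literal.
    // Doubling single quotes is a simplistic stand-in for driver-level binding.
    String bind(String untrustedValue) {
        int at = sql.indexOf('?');
        if (at < 0) {
            throw new IllegalArgumentException("template has no '?' placeholder");
        }
        String literal = "'" + untrustedValue.replace("'", "''") + "'";
        return sql.substring(0, at) + literal + sql.substring(at + 1);
    }
}
```

The caller never writes the quotes around the value, so there is no place to forget the escaping.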
Notice that because Query is a distinct type from String, just doing
query conn ("select x from users where name like " ++ username)
doesn't typecheck. Bad Programmer would have a hard time screwing this up.
Query isn't a String. String interpolation[^1] would de-sugar to something like this:
query conn ("select x from users where name like " ++ username)
++ is a function that expects two Strings. The "select..." stuff isn't a String, quotation marks notwithstanding. When we try to hand a Query to ++, the compiler screams bloody murder.
Longer explanation: I suspect the syntax is a bit confusing, since while I keep saying "select ..." is a Query, it looks an awful lot like a String. Here's what's going on. Haskell has a typeclass called IsString. Query is an instance of IsString, as is String.[^2]
Quoted text can represent any instance of IsString. So the compiler sees a function that expects a Query and an IsString of some sort, and through the magic of type inference, it decides that the IsString must be a Query.[^3] And when you try to use a function that concatenates Strings on that Query, it knows that Something's Not Right.
[^1]: Haskell doesn't have string interpolation. But if it did, this is how it would work.
[^2]: And other instances as well. postgresql-simple actually uses ByteStrings, not Strings, for performance.
[^3]: I've fuzzed the evaluation order a bit, for simplicity. In practice the first error reported might be that you've passed 2 arguments to a function that expects 3.
In your static example, "Bad Programmer" would be fine, because the Query constructor does escaping. You could do this in a dynamically typed language too, but notice that you don't, you just use strings. The difference between static and dynamic is that with static typing, you can't compile your incorrect program. With dynamic typing, you find out at run time that you forgot to escape the string (turning it into a Query), when that code actually runs.
I'm admittedly ignorant of any type system newer than C++. In a modern static language, how would you design Query such that any SQL injection is caught at compile-time?
On the dynamic side, Rails (in Ruby) doesn't currently catch SQL injections, but it does catch HTML-escaping injections. It (roughly) tags all strings as tainted by default, and when you send them to the browser, it escapes them. If you want to send literal ampersands, angle brackets, etc., you have to mark them as explicitly safe. Since most of your literal HTML is generated by templates (which themselves distinguish variables from static HTML), you end up with run-time safety unless you actively try to break out of it.
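The escape-by-default idea can be sketched in Java as below. This is not how Rails is implemented: `SafeHtml` and `Page` are hypothetical names, and the sketch uses compile-time overloading where Rails tags strings at run time. It only shows the policy that plain strings get escaped and explicitly marked values pass through.

```java
// Hypothetical sketch: SafeHtml and Page are illustrative names.
final class SafeHtml {
    final String markup;

    private SafeHtml(String markup) { this.markup = markup; }

    // Explicit opt-out: the caller vouches that this markup is trusted.
    static SafeHtml trusted(String markup) { return new SafeHtml(markup); }
}

final class Page {
    private final StringBuilder out = new StringBuilder();

    // Plain strings are escaped by default ('&' first, so entities are not double-escaped).
    Page append(String untrusted) {
        out.append(untrusted
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;"));
        return this;
    }

    // Only values marked as SafeHtml are written verbatim.
    Page append(SafeHtml safe) {
        out.append(safe.markup);
        return this;
    }

    String render() { return out.toString(); }
}
```

Forgetting to mark something as trusted produces over-escaped output, which is a visible but harmless failure.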
If he builds the final query string before giving it to Query, his valid query parts that rely on not being escaped would also be escaped.
To make a safe query type you'd have to provide non-string primitives to build one, if I understand correctly. You can't allow just a full query string (with all of the injections already in place) to be converted to a Query type (as in his Bad Programmer example).
No, I am saying your strawman is a strawman. You were claiming static typing doesn't help since a string can contain a bad query. Now you are suggesting that you wouldn't write such code in a dynamically typed language anyways? Then why did you offer it as an example of how static typing doesn't help?
Of course you can make sure you never actually run the bad query with dynamic typing. I assumed it was obvious when talking about static typing that the difference would be compile time vs run time. With a statically typed language, when you make the error, you get told about it by the compiler. With a dynamically typed language, you find out about the error later, when that code actually runs.
> Now you are suggesting that you wouldn't write such code in a dynamically typed language anyways?
I never suggested that...?
> With a statically typed language, when you make the error, you get told about it by the compiler. With a dynamically typed language, you find out about the error later, when that code actually runs.
> static typing enforces that you always use that function, and can't forget and accidently submit an unescaped string to the database.
So you are really just saying "static typing requires you to use static typing". This has nothing to do with actually writing good code or having any sort of validation. Just that the compiler tells you that you are sending the wrong type... that's what we are arguing about?
Look, my whole point is that static typing by itself gives you next to nothing (see my code example below) without some form of validation beyond static typing. That obviously holds true for dynamic typing as well... I'm not even sure what we are arguing about.
> static typing by itself gives you next to nothing ... without some form of validation beyond static typing
You mean like the validation you get when you compile a program?
Yes, a statically typed program that never gets checked is strictly worse than a dynamic program, but that's the whole point of the type system: you can check it. This argument is a strawman because nearly every language with a static type system includes a validation step (maybe Dart is an up-and-coming counterexample).
Yes, you did suggest that. I am not sure how this level of cognitive dissonance is possible. What possible purpose does your example serve then if it doesn't impart any sort of meaning at all?
> I'm not even sure what we are arguing about
Clearly. Please, take the time to think through the subject and present a clear point that you will not later pretend you didn't make.
> What possible purpose does your example serve then if it doesn't impart any sort of meaning at all?
The example shows that static typing doesn't do anything more than what it says. It doesn't solve problems/fix bugs or provide some magical insight to the system as you seem to believe.
I'm genuinely curious as to your position and why you are so... clearly opinionated. I'll take the "idiot banner" for today. Please provide me with your insight as to what the fundamental argument (and why you feel so strongly about it) really is.
I think there are two points that are being mostly missed in this discussion:
A) How to define the Query type such that it would convert injection bugs to type-checking errors (No, it cannot simply be a function from a full String containing a query to a Query type, as you demonstrated).
B) Sure, you could define the same Query primitives to do the same in a dynamically typed language. The main difference is that the type-checking errors due to incorrect use of the query primitives would be caught at runtime.
As for A, you would want to define primitives that build query strings safely. That is: Query(unsafe_string_here) wouldn't work. Either because it allows too much (still can inject the original string) or disallows too much (escapes everything, makes the query invalid).
Instead, you would define "select", "update" and other querying primitives as non-string primitives you can use to build queries. You would basically mirror SQL or the query language you use into non-string primitives that allow constructing safe queries.
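A small Java sketch of such non-string primitives, under stated assumptions: `Select`, `from` and `whereEquals` are hypothetical names, identifiers are limited to a conservative character set, and quote doubling stands in for real escaping. It covers one tiny corner of SQL only.

```java
// Hypothetical sketch: Select and its methods are illustrative names.
final class Select {
    private final String table;
    private final String condition; // null when there is no WHERE clause

    private Select(String table, String condition) {
        this.table = table;
        this.condition = condition;
    }

    // Identifiers are restricted to a conservative character set.
    private static String identifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("bad identifier: " + name);
        }
        return name;
    }

    static Select from(String table) {
        return new Select(identifier(table), null);
    }

    // Values are always quoted and escaped; callers never write the quotes.
    Select whereEquals(String column, String untrustedValue) {
        String literal = "'" + untrustedValue.replace("'", "''") + "'";
        return new Select(table, identifier(column) + " = " + literal);
    }

    String toSql() {
        String base = "select * from " + table;
        return condition == null ? base : base + " where " + condition;
    }
}
```

Usage would look like `Select.from("users").whereEquals("name", username)`; there is no method that accepts a complete query string, so there is nowhere to concatenate one in.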
B) Yes, you could do this with dynamically typed languages. Right before executing your query you would need to do an isinstance() check or some other way to validate that the query was generated using the safe machinery. This means no duck typing. If you allow other, unsafe implementations of the query type here, you get the unsafety back.
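To keep to one language, here is the run-time version of that check written in Java, with an `Object` parameter standing in for a dynamically typed call site. `CheckedQuery` and `DynamicDb` are hypothetical names, and the public constructor is a simplification: a real version would only be constructible through the safe query machinery.

```java
// Hypothetical sketch: CheckedQuery and DynamicDb are illustrative names.
// The class is final, so no other implementation can pose as a CheckedQuery.
final class CheckedQuery {
    final String sql;

    CheckedQuery(String sql) { this.sql = sql; }
}

final class DynamicDb {
    // Accepting Object mimics a dynamically typed call site:
    // a wrong argument is only detected when this line actually runs.
    static String execute(Object candidate) {
        if (!(candidate instanceof CheckedQuery)) {
            String actual = candidate == null ? "null" : candidate.getClass().getSimpleName();
            throw new IllegalArgumentException("expected CheckedQuery, got " + actual);
        }
        return "would execute: " + ((CheckedQuery) candidate).sql;
    }
}
```

A call such as `DynamicDb.execute("select 1")` compiles and fails only when executed, which is the run-time half of the trade-off.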
Again, provide a clear point if you want me to argue against it. Don't just say "I am going to keep making weird nonsensical posts and then pretend I didn't say what I clearly said and then blame you for replying" and expect me to grace you with some magical "insight".