It's a nice Quality of Life addition, that I would like to see with C++ as well.
I hope they don't implement f-strings and s-strings, at least for now. Even if they are more ergonomic than `format!` and `String::from`, they hide a memory allocation, which is not really appropriate in a language like Rust and would be really weird in a context without an allocator. The only solution to this would be `const` evaluation, but that would restrict their use to `const` contexts, so they would be mostly unusable.
As a person involved in this space, if there's ever an RFC for full-on f-strings there's a good chance that I'd be the one who writes it, and I can tell you for certain that I would not make `f"foo"` incur an allocation; it would return a std::fmt::Arguments rather than a String.
I mean, that ship's basically sailed. People are too lazy even to take `std::string_view` instead of `std::string&`; trying to avoid pointless allocations and clones with any API but your own is almost a fool's errand at this point.
I think the issue with string_view is less laziness and more recency: most libraries still need to support C++ versions prior to C++17. Rust had the advantage that slices were there from day one.
I don't think it's just recency either; it's incredibly easy to shoot yourself in the foot with string_view when the language has no way of checking that the pointed-to memory is actually valid. People moved to smart pointers for good reasons and string_view just undoes all of that.
While using C++, adding something like gsl to the toolbox is worth gold, and also enabling bounds checking even in release (really, most of the time it hardly matters to the application users).
I think it would be possible in C++, but it would be something that is more generic than f-strings in Python. For example, `f"Name: {name}, age: {age}"` could maybe expand to `"Name: {}, age: {}", name, age`, so you still need to pass it to some variadic function that then can decide what to do with it, instead of only being able to format strings.
In theory f-strings could act as the format_args! macro, not the format! macro. While the latter produces a String, the former produces a std::fmt::Arguments, and does not need to allocate memory (it lives in core).
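To make the `format_args!` point concrete, here is a minimal sketch of formatting without a `String`. `StackBuf` and `render_into` are hypothetical names made up for this illustration; the only standard pieces are `std::fmt::Write`, `std::fmt::Arguments` and `format_args!`. As far as I can tell nothing here touches the heap, but treat that as an assumption rather than a guarantee.

```rust
use std::fmt::{self, Write};

/// Hypothetical fixed-capacity buffer that lives entirely on the stack.
struct StackBuf {
    data: [u8; 64],
    len: usize,
}

impl StackBuf {
    fn new() -> Self {
        StackBuf { data: [0; 64], len: 0 }
    }

    fn as_str(&self) -> &str {
        // `write_str` only ever copies whole `&str` values, so the bytes are valid UTF-8.
        std::str::from_utf8(&self.data[..self.len]).expect("buffer holds valid UTF-8")
    }
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.data.len() {
            // Out of space: report an error instead of allocating.
            return Err(fmt::Error);
        }
        self.data[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Accepts pre-built `fmt::Arguments`; the callee decides where the text goes.
fn render_into(buf: &mut StackBuf, args: fmt::Arguments<'_>) -> fmt::Result {
    buf.write_fmt(args)
}
```

A caller would write something like `render_into(&mut buf, format_args!("Name: {name}, age: {age}"))`, which is roughly what a non-allocating `f"..."` could desugar to.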
I know it would need to allocate memory, but from a user perspective that allocation isn't less visible when using f strings instead of the format macro.
In your example I would return a String and generate a warning, because the f of the string isn't used.
If it was part of Rust the language it can't create Strings because String isn't part of the language, nor even a "langitem". The core Rust language has no idea there's such a thing as a String
str (and thus &str) is part of the language, it's a built-in primitive type like i64 or bool, but String is just a struct the alloc crate brings into existence and so it may not be available.
Nice, I hope it will be improved to print the variable names as in Python f-strings, something like "{x=}" generating "x=<value of x>", much better than having to write "x={x}" everywhere.
Unfortunately the `dbg!` macro doesn't play nice with format strings. There is no Rust equivalent of Python's (say) `print(f"Coordinates: {x=}, {y=}, {z=}")`. In other words `dbg!` can only print `{x=}`; it can't intersperse that with other text the way `format!` can.
This should not be difficult to do yourself. That is, you should be able to provide an improved format! with this feature as say renox::format! and provide popular format-using macros like renox::println! and renox::print! too based on the existing code, simply using the same license as Rust's standard library.
Only the macro knows the name of the variable, so you can't do this inside the formatter itself.
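A rough sketch of the "do it yourself in a macro" idea. The `show!` macro and the `coordinates` function are hypothetical names for this example, not an existing crate; the macro uses `stringify!` to get at the variable names, which is exactly the information the formatter never sees.

```rust
/// Hypothetical helper macro: expands `show!(x, y)` to a `String` like "x=1, y=2".
/// The names are captured with `stringify!` at compile time; the values are
/// printed with their `Debug` implementation.
macro_rules! show {
    ($($var:ident),+ $(,)?) => {
        [$(format!("{}={:?}", stringify!($var), $var)),+].join(", ")
    };
}

/// Intersperses the `name=value` pairs with other text, which `dbg!` cannot do.
fn coordinates(x: i32, y: i32, z: i32) -> String {
    format!("Coordinates: {}", show!(x, y, z))
}
```

Note that this version allocates intermediate strings and always uses `Debug`; a more careful implementation would have to parse the format string itself.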
Generally if you want that, the derived Debug impls are good enough, and then you can print the debug form with: "{x:?}" and that can be made to pretty print (newlines, etc) with "{x:#?}". That will print the entire structure of a type and its associated values (can be quite verbose, though).
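For reference, a small example of the two debug forms mentioned here. The `Point` type is made up for the illustration.

```rust
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

/// Single-line debug output via `{:?}`.
fn compact(p: &Point) -> String {
    format!("{p:?}") // "Point { x: 1, y: 2 }" for Point { x: 1, y: 2 }
}

/// Multi-line, indented debug output via `{:#?}`.
fn pretty(p: &Point) -> String {
    format!("{p:#?}")
}
```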
Rust's implementation apparently only looks up variables from the enclosing scope. By contrast, Python's f-strings allow arbitrary expressions, and could potentially fall for the LDAP trick, or something similar. Same with ES6's backtick-strings.
I hope Rust will keep it simple and reliable, and won't allow calling functions in format strings.
Furthermore, format strings in Rust can't just be any `String` or `&str`, they must specifically be string literals, which means they must necessarily be fully determined at compile-time and there's no chance that user input can influence the format string.
> I hope Rust will keep it simple and reliable, and won't allow calling functions in format strings.
Yes, I think people are wary of allowing full expressions in format strings as in Python. That said, I might like to see a small extension to the current rules, so that in addition to identifiers you could also access struct members. I agree that format string captures shouldn't present the chance to run arbitrary code, so I wouldn't even extend this to array indexing (which is overloadable via the Index trait).
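To illustrate the current rule (plain identifiers only), here is a sketch of the two usual workarounds for struct fields. The `User` type and function names are hypothetical.

```rust
struct User {
    name: String,
    age: u32,
}

fn describe(user: &User) -> String {
    // `format!("{user.name}")` is rejected by the compiler:
    // only plain identifiers can be captured implicitly.
    // Workaround 1: bind the fields to local names first.
    let name = &user.name;
    let age = user.age;
    format!("{name} is {age}")
}

fn describe_named(user: &User) -> String {
    // Workaround 2: explicit named arguments may be arbitrary expressions.
    format!("{name} is {age}", name = user.name, age = user.age)
}
```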
Python's f-strings are still static formatting; you can just move part of the code into the f-string. (Ignoring eval and similar.)
The scary part about the vulnerability was that string >inputs< could dynamically delegate their content to some magic information-fetching system which by default allows accessing remote content in a way which by design can lead to remote code execution.
I have no idea who thought non-static format strings were a good idea.
Or an information-fetching system which can trigger remote code execution.
Or that this system doesn't require strict whitelisting.
I see. Imagine that a LdapAddress class helpfully overrides __str__() in the way it happened in Log4j. Then a completely innocuous f-string that just prints the value of such an object in a log line could trigger execution of that code. No need to even have an expression that calls anything explicitly.
If I saw __str__ making network requests, I’d consider it a strong anti-pattern too. Sure, if a library was doing it then I might not find out, but that still puts the issue squarely on the library doing questionable things, rather than the entire logging framework itself.
Technically Rust calls functions on the variable being printed. Those are defined as "{}" uses the fmt::Display trait and "{:?}" uses the fmt::Debug trait.
That depends on what Self contains. Using the Cell types and some of the other inner mutating types, it is possible to change the interior state of self even behind a shared reference.
Additionally, with unsafe it’s also possible. This is in response to “the only possible” statement.
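A small sketch of both points: `{}` really does call user code (`Display::fmt`), and that code can change state through `&self` if the type uses interior mutability. The `Counted` type is invented for this example.

```rust
use std::cell::Cell;
use std::fmt;

/// Hypothetical type that counts how often it has been formatted.
struct Counted {
    value: i32,
    times_formatted: Cell<u32>,
}

impl fmt::Display for Counted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `fmt` only receives `&self`, but `Cell` still lets us mutate.
        self.times_formatted.set(self.times_formatted.get() + 1);
        write!(f, "{}", self.value)
    }
}
```

So a format string cannot call arbitrary functions by itself, but whatever `Display` or `Debug` implementation the value has will run, side effects included.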
Nice post. For some reason I can never remember the syntax for formatting and I have to look it up each time. I’ll bookmark your post for the next time I need this.
It might help you to remember that the syntax is `{identifier:flags}` where identifier being left blank signifies a positional argument to the formatting macro. Although if it's the specific flags you're having trouble remembering, I can't help you as I can't remember them either except `?` (debug print) and `#?` (pretty debug print).
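For what it's worth, here is a short cheat sheet of the `{identifier:flags}` shape with a few of the more common flags. The function name is arbitrary; the format specifiers are standard `std::fmt` syntax.

```rust
/// Returns a few formatted samples; the expected text is in the comments.
fn flag_samples() -> Vec<String> {
    let n = 42;
    let pi = 3.14159;
    vec![
        format!("{n:>6}"),     // "    42"  right-aligned, width 6
        format!("{n:<6}|"),    // "42    |" left-aligned, width 6
        format!("{n:^6}"),     // "  42  "  centered, width 6
        format!("{n:06}"),     // "000042"  zero-padded
        format!("{n:#x}"),     // "0x2a"    hex with prefix
        format!("{n:+}"),      // "+42"     always show the sign
        format!("{pi:.2}"),    // "3.14"    two digits after the point
        format!("{:?}", "hi"), // "\"hi\""  positional argument, debug print
    ]
}
```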
A little hack: if you know the C-style format string you want to use, use that. The build will fail, of course, but Rust will tell you what to use instead.
Even more: those suggestions that you see when calling cargo/rustc? They are also applicable through a tooltip with rust-analyzer. rustc has json output where the substitutions are represented in a machine-friendly format. :)
No, this is functionality of format strings (interpreted by the formatting macros like println!, format! etc), not strings in general. {} was already special in format strings, there's no backward compatibility break.
You can't really pass untrusted input as a format string because they have to be available at compile time.
So, that rules out using format strings read from configuration text files. Not a big loss, IMO (you can have a DE.rs file with format strings for German, an IT.rs one for Italian, etc. and compile those in the binary), but I think some organizations will find that inconvenient.
Also, FTA: “Remember that Rust doesn't use any localization, so these outputs will always look the same.”
So, what do rust programs do for localization, e.g. to print the thousands separators users expect? Is there a library that gives you similar string interpolation, taking locale into account?
It’s a tough call between catering for computers by ignoring locale and for humans by applying it, but I think I would have chosen for this to be locale-sensitive, with support for forcing a standard locale.
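Since `std::fmt` has no locale support, thousands separators have to come from a crate or be done by hand. A minimal hand-rolled sketch, assuming groups of three digits and a caller-supplied separator (the function name is hypothetical, and real locales have more rules than this):

```rust
/// Inserts `sep` between groups of three digits, counting from the right.
fn group_thousands(n: u64, sep: char) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // A separator goes before every digit that starts a full group of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(sep);
        }
        out.push(ch);
    }
    out
}
```

The result can then be passed to the formatting macros as an ordinary argument, e.g. `format!("Total: {total}")` with `total` bound to the grouped string.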
- As pointed out by GP, `format!` is a macro and only works for literal format strings available at compile time. This allows the compiler to convert the format string to code _at compile time_.
- What you are talking about is a general string template/formatting engine to execute at runtime. Such a feature can easily be provided by external crates, because it would work at runtime and not require any particular interactions with the compiler.
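A rough sketch of what such a runtime template engine could look like, just to show that it needs no compiler support. `render_template` is a hypothetical function, not the API of any particular crate; it only substitutes `{name}` placeholders from a map and leaves unknown placeholders untouched, with no escaping, plurals or format flags.

```rust
use std::collections::HashMap;

/// Replaces each `{key}` in `template` with the matching value from `vars`.
fn render_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        match after.find('}') {
            Some(end) => {
                let key = &after[..end];
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown placeholder: keep it verbatim.
                    None => {
                        out.push('{');
                        out.push_str(key);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            // Unclosed brace: keep it and carry on.
            None => {
                out.push('{');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the template is an ordinary runtime string, a translation is free to reorder the placeholders. The flip side is that mistakes in a template only show up at run time.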
Using a generic printf/format-like function for localization is a bad idea anyway. Proper localization libraries have features like handling of plurals. Nothing prevents a localization library from creating its own formatting function, which it would have to do anyway, since the formatting syntax for each of the commonly used localization file formats (po, xliff, ts, ...) likely differs from the format function in programming language X.
> So, that rules out using format strings read from configuration text files
Rust already used curlies to indicate places where values would be inserted in format! (including print!/println!) strings, so you couldn't use them from config files with unescaped curlies already, AFAIK.
There are a few libraries that can do some work around localization and formatting, though IIRC most commonly if you need something translated, you just turn it into a format argument as well. So instead of say "hello {name}", you write "{greeting} {name}" (though this doesn't cover all languages anyway).
edit: There is also the issue that on some platforms (Linux) locale has some soundness issues (not a fault of rust) and yet on others it can be hard to use locales properly (Windows can introduce some hard locale weirdness)
That doesn’t work if you have to change word order. A format string
{a} foo bar {b}
Might have to be translated as
Baz {b} {a} quux
(Also notice that, in the first sentence, you might want to capitalize {a}. That’s complicated in itself. Localization is a rat’s nest. I don’t think you can expect any automated system to do it perfectly)
I appreciate the effort that went into the blog post, but for this forum a link to the release notes [0] and/or the updated docs [1] would have been more relevant.
How did they stabilize this so fast? I feel like we first heard about this feature only a few months ago. Meanwhile I've been waiting for `let_chains` since what feels like the mid 90s...
The link below is the pull request for the feature. The RFC dates to 2019, and the tracking issue is slightly more than two years old. I'm guessing we only started hearing about it when it became close to stabilization (around November last year?).
This feature has been in the works for quite a while. I suspect you're thinking of a different but possibly-related change, where the 2021 edition has reserved the syntax for new string literals forms (beyond the existing b and r forms). In theory those new forms could be used to permit f-strings. But the new capability mentioned in the OP is the ability to implicitly capture names in format strings, which works with the existing formatting macros (such as println).
That said, Rust isn't a company, it's a volunteer organization, and features advance at the speed of enthusiasm. If there's a feature that someone wants to see, there's no use waiting around for it, someone's got to be the one to push it forward. :)
Really amazing summary! I wasn't aware of some of these specifiers. I was always a pretty basic fmt user due to my ignorance. I should go revisit some manual formatting code. :)
Rust has a pretty strict backwards compatibility policy. Generally speaking breaking changes are not allowed, unless they fix a specific vulnerability or other critical issue. Language-level breaking changes can be done with the Edition mechanism. Three editions have been released so far, with three years between each of them.
Additionally, new releases are frequently tested with a tool called Crater, which basically builds and runs the tests for all publicly available Rust code. This helps massively in upholding backwards compatibility guarantees and in evaluating if a compatibility break is worth it.
You might be missing the fact that this applies only to format strings, not strings in general. The interpolation into curly brackets is done by the format! (or println!) macros, not a language feature of strings in general.
`{}` has always been special in format strings, so no real change there. If you've got a string that happens to contain "Hello {x}!" and you want to print it, you must template it in, as the format string needs to be a literal:
print!("{}", hello_string)
Or as others have pointed out, you can double-up the braces:
print!("Hello {{x}}!")
Then if you want to be cute, you could do something like this:
print!("Hello {x}!", x="{x}")
Note that the named parameter syntax isn't new -- what's new is capturing named parameters from a scope that's larger than the format macro:
let x = "{x}";
print!("Hello {x}!");
And also note that it's a compile-time error to have an unused format parameter in a string, so that last code example wouldn't have compiled at all in older versions of Rust.
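Putting the variants above into one compilable sketch (using `format!` instead of `print!` so the results can be compared; the function name is arbitrary, and the last variant assumes Rust 1.58 or newer):

```rust
/// Four ways of producing the literal text "Hello {x}!".
fn four_ways() -> [String; 4] {
    let hello_string = "Hello {x}!";
    // 1. Pass the text as an argument; it is not interpreted as a format string.
    let a = format!("{}", hello_string);
    // 2. Escape the braces by doubling them.
    let b = format!("Hello {{x}}!");
    // 3. Explicit named argument whose value happens to be "{x}".
    let c = format!("Hello {x}!", x = "{x}");
    // 4. Implicit capture of a local variable named `x`.
    let x = "{x}";
    let d = format!("Hello {x}!");
    [a, b, c, d]
}
```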
Like many Rust features, things that were not necessary for an initial release were left to future releases. I agree that struct fields would be a natural extension, although I would want it to end strictly there.
The guidelines for Show HN [1] say: "Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread." Blog posts are specifically off topic.
You seem to have posted this link ten hours ago and then re-posted the same link as a Show HN two hours ago - perhaps to get around the duplicate link detection. Please don't do that.
This "modern" style of string formatting might seem pretty convenient and concise, but in my opinion, it has quite a few drawbacks.
I'd prefer something that would maybe be less concise but easier to read and maintain, using the host language instead of a mini script using in-band signaling and its weird syntax.
I'm not sure what kind of alternative you are imagining. The styles I'm aware of are
"Hello {username}!"
"Hello $username!"
"Hello " . username . "!"
"Hello " + username + "!"
"Hello " << username << "!"
Of those, I find the first and second one by far the easiest to read, and the first one is easier to extend (as Rust has done, e.g. "USD{total:>6}" for a left-padded number).
These are all forms of string interpolation and concatenation, but the 'go to' way is using string formatting, e.g. `printf("Hello %s", username)` (or `sprintf` to just return the resulting string instead of print it to stdout).
It doesn't add additional syntax (and therefore complexity, compiler steps, build time, etc) to the language; the only convenience applied here is varargs for the arguments besides the first.
A lot of languages started with basic (s)printf and added string interpolation later on, but that brings its own headaches. I'm thinking of PHP, where on the one hand you can do "hello $username", but if it's a property in an object you need to add additional syntax already - "hello {$user->name}".
Rust already does that. Just that sprintf("Hello %s", username) is written format!("Hello {}", username), which made it easy to now extend it to allow format!("Hello {username}"). Being a late comer made it easy to plan ahead for the future.
But really, I didn't mention it because printf syntax seems to me like exactly the "mini script using in-band signaling and its weird syntax" that GP complained about.
And it's important to note that format (and its relatives) is a macro and not a function. That means that whatever the input syntax is for format, whether you write format!("Hello {}", username) or format!("Hello {username}") it will compile as (something like) "Hello ".to_string() + username (as an aside, I really appreciate the fact that Rust macros are syntactically distinct entities so that you can tell at a glance whether you're doing a function call or something potentially strange is happening by way of a macro).
In contrast, the old-school stdio sprintf (and relatives) will interpret the format string at run time and then read a varargs list to do the interpolation which can lead to run-time errors and buffer-overflow vulnerabilities and so forth.
Definitely concur. In a well designed language, I'd expect it to be possible to express something a bit closer to C++'s ios but without the crazy verbosity, and without switching mode. It's not that general language features like this aren't possible, it's often that they simply haven't been discovered yet. User-defined literals are a recent concept that helps eliminate some crap in a related area.
Meanwhile, can't complain all that much if zero-cost features can be added to Rust to make it easier to market it to scripting folk. I think that can only be a good thing, even if the feature design is far from ideal.
Code injection is currently the #1 language-related security vulnerability [1][2] in memory-safe languages, which is why languages should be very careful when adding string interpolation as it may well be their most security-sensitive feature: "Templated string injection attack prevention will be of primary concern. The result of template processing can be used in sensitive applications, such as database queries. Validation of templates and expression values prior to use can prevent catastrophic outcomes." [3].
The Rust println! macros interpret the format string at compile time. You cannot do:
let f = "{:.3}";
println!(f, 3.145);
because then the macro cannot be sure what the string will be at runtime and thus cannot be expanded.
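If the goal is only to pick the precision at run time, the format string can stay a literal and the precision can be passed as an argument. A short sketch of the two spellings I know of (the function names are made up):

```rust
/// Precision taken from a captured variable via the `.name$` syntax.
fn with_precision(value: f64, prec: usize) -> String {
    format!("{value:.prec$}")
}

/// Precision taken from the argument list via the `.*` syntax:
/// the precision comes first, then the value.
fn with_star(value: f64, prec: usize) -> String {
    format!("{:.*}", prec, value)
}
```

The format string itself is still fixed at compile time; only the number of digits varies.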
Personally I dislike the whole idea of embedding sub-languages in strings inside host languages, but this is a lost cause frankly, and if one must do it, this is a pretty good way.
The issue is that the format string does not require the variables to be sanitised. If the resulting string is then used as the input to some sensitive operation, the operation no longer knows which parts of the string originated in the format string and which came from variables.
Sure, Rust does not require that your arbitrary string is actually safe HTML, a valid SQL string, XML element, DNS name and IPv4 address.
As a result, if you take arbitrary input, and then run it as an SQL query who knows what will happen, despite it being a memory safe language.
On the other hand, (safe) Rust has strong type checking, so you can make types named SafeHTML, ValidSQLString, XMLelement and so on, with the properties you desire enforced. This does not prevent the same idiots who try to make an SQL query using format!() from doing so, as they won't use ValidSQLString anyway.
If all strings fill you with such fear, probably General Purpose Programming languages aren't for you, maybe you will feel safer in WUFFS. It looks at first glance as though WUFFS has strings, but it actually doesn't, they're just a human readable label for WUFFS non-OK statuses (e.g. errors like "#bad Huffman code"). You can rest assured that your WUFFS code for processing a PDF can't have SQL injection for two reasons: 1. WUFFS doesn't have any strings to inject SQL into, and 2. WUFFS can't talk to an SQL database at all. In exchange for this safety you give up the ability to do anything outside the tiny sphere of interest of the language.
Strings should fill you with fear, because the problem is not that they're supposed to be a major cause of vulnerabilities, or that they could be a major cause of vulnerabilities, but that they are a major cause of vulnerabilities. And the danger with this kind of string interpolation is that it makes using format! easier and so more attractive without making the creation of ValidSQLString easier.
I see where you are coming from but I disagree with your conclusions.
I think the number of developers who would not have insecurely built a SQL string from user input but for the language adding format strings is approximately zero.
By analogy, to me this argument seems like saying, "If a standard library contains a StringBuilder class that minimizes unnecessary allocations when constructing a String, it encourages developers to construct SQL strings manually and therefore makes them more likely to fall victim to SQL injection attacks, therefore we must not provide StringBuilder classes."
I don't imagine there are very many developers who are properly using parameterized queries only because it is inconvenient to concatenate strings.
> I think the number of developers who would not have insecurely built a SQL string from user input but for the language adding format strings is approximately zero.
I'm not saying that format strings necessarily make injection vulnerabilities more common, but rather that, when added nowadays, they should be designed in a way that makes such vulnerabilities less common. If you're adding a feature that makes a potentially dangerous operation more convenient, you should also make it safer so that the convenience will help pull developers toward the safe option.
IMHO, your point is valid: currently, it's not possible to enforce a constraint on `format!()` arguments, AFAIK, so we cannot say that all arguments to that format string, which will be used in that API, must implement SafeHTML trait or SafeSQL trait. You should create a ticket or RFC for the problem.
Exactly like, say, string concatenation. String concatenation does not sanitize its arguments and can be misused to generate YAML, SQL, HTML, TOML in exactly the same way std::fmt can be misused. std::fmt is the wrong place for domain specific logic, as it is the sort of tool you need to write those domain specific APIs. Which you should be using instead of 'print' for your sensitive operations.
But this feature need not necessarily be in std::fmt. It could be a more general language feature, with a pluggable formatting policy, which would then help improve the security of more specific libraries. That you shouldn't use unsafe string manipulation to do sensitive operations is obvious; the problem is that people do it anyway, which is why it is one of the most dangerous and common vulnerabilities.
Isn't that just an input sanitization problem? If you chuck raw user input into some sensitive piece of code, you've lost; it doesn't matter whether it's passed as a string or not. I don't see how Rust's compile-time string formatting is a pitfall here.
But that would be true of any string operation, right? The correct solution is not to use strings with unknown content for sensitive operations (for example by having sensitive operations take a specially blessed string type).
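A minimal sketch of what a "specially blessed string type" could look like in Rust. `SafeHtml` and `make_header` are hypothetical names, and the escaping table is deliberately small; in a real crate the type would live in its own module so the inner field stays private and `escape` is the only way in from untrusted text.

```rust
/// Hypothetical newtype for HTML text that has already been escaped.
pub struct SafeHtml(String);

impl SafeHtml {
    /// Escapes the characters that are significant in HTML text and attributes.
    pub fn escape(raw: &str) -> SafeHtml {
        let mut out = String::with_capacity(raw.len());
        for ch in raw.chars() {
            match ch {
                '&' => out.push_str("&amp;"),
                '<' => out.push_str("&lt;"),
                '>' => out.push_str("&gt;"),
                '"' => out.push_str("&quot;"),
                '\'' => out.push_str("&#39;"),
                _ => out.push(ch),
            }
        }
        SafeHtml(out)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// The sensitive operation only accepts `SafeHtml`, never a bare `String`.
pub fn make_header(title: &SafeHtml) -> SafeHtml {
    SafeHtml(format!("<h1>{}</h1>", title.0))
}
```

Passing the output of `format!` straight to `make_header` is then a type error, which is the property being argued for; it does nothing for code that never uses the type.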
In my opinion that's a library problem, not a language problem. Creating code snippets at runtime by haphazardly concatenating strings is always going to be error-prone. Some SQL libraries use a builder pattern like, say:
let result = Query::select().field("id").from("sometable").exec()?;
I think that's superior to adding the concept of "sanitized" vs "unsanitized" string to the language, given that keeping track of this attribute robustly is going to be a pain IMO.
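To show the shape of the builder idea, here is a toy sketch. This `Query` type is invented for the example and is not the API of any real SQL crate; it assumes `?` placeholders and takes identifiers as `&'static str` so that they would normally be literals, while values are kept out of the SQL text entirely.

```rust
/// Toy query builder: identifiers are static strings, values become parameters.
#[derive(Default)]
struct Query {
    fields: Vec<&'static str>,
    table: &'static str,
    filters: Vec<(&'static str, String)>,
}

impl Query {
    fn select() -> Self {
        Query::default()
    }

    fn field(mut self, name: &'static str) -> Self {
        self.fields.push(name);
        self
    }

    fn from(mut self, table: &'static str) -> Self {
        self.table = table;
        self
    }

    fn where_eq(mut self, column: &'static str, value: impl Into<String>) -> Self {
        self.filters.push((column, value.into()));
        self
    }

    /// Returns the SQL text with `?` placeholders plus the values to bind.
    fn build(self) -> (String, Vec<String>) {
        let mut sql = format!("SELECT {} FROM {}", self.fields.join(", "), self.table);
        let mut params = Vec::new();
        for (i, (column, value)) in self.filters.into_iter().enumerate() {
            sql.push_str(if i == 0 { " WHERE " } else { " AND " });
            sql.push_str(column);
            sql.push_str(" = ?");
            params.push(value);
        }
        (sql, params)
    }
}
```

User input only ever ends up in the parameter list, so there is nothing to sanitize in the SQL string itself.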
IMHO, this is a language problem. Rust can enforce correct types for all arguments to a function, except when the type is erased by use of a generic container, like `String`. It's possible to enforce a `ValidHtmlString` as argument to a function, with automatic conversion of a `String` into `ValidHtmlString` at runtime, but it doesn't protect from unsafe HTML, so `makeHeader(title: ValidHtml) -> ValidHtml` will happily accept `format!("<h1>{unsafeHtml}</h1>")` as argument.
Maybe, we should create a specialized `format!()` macro, for example: `formatValidHtml!()`, `formatSafeHtml!()`, `formatAccessibleHtml!()`, or just a `formatRestricted!(ValidHtml + SafeHtml + AccessibleHtml, "<h1 role=\"banner\">{safeTitle}</h1>");`
> Constructing SQL queries or JSON expressions with templates is convenient, but is at risk for injection attacks. Improving mechanisms for constructing composite strings without similarly improving or enabling safer mechanisms for constructing queries would surely widen the attack surface.
The fact that in many languages (and ecosystems actually, because it's not directly a language issue) building insecure queries (or HTML, or anything) is the simple way, and doing things right requires specific thought from the developer[1], is what leads to so many injection attacks in the wild.
I think this is mostly a cultural thing, and having been developed recently, well after injection attacks became ubiquitous, the Rust ecosystem has focused on providing a better developer experience for the safe path than for the vulnerable one. Diesel and SQLx use prepared statements by default, Serde serializes JSON without exposing strings at all to the developer, HTML templating libraries have sanitization built in, etc.
[1]: this example is also taken from your link:
String query = "SELECT * FROM Person p where p.last_name = '$name'";
ResultSet rs = connection.createStatement().executeQuery(query);
vs
PreparedStatement ps = connection.prepareStatement("SELECT * FROM Person p where p.last_name = ?");
ps.setString(1, name);
ResultSet rs = ps.executeQuery();
That's a valid point, but what you're saying is that the feature is only intended to help create log messages and the like, which raises the question of why add a feature with so little utility when it could have much greater utility? There's a missed opportunity here.
But let me push on it a little more. I think that what you're seeing isn't so much a culture thing but a small ecosystem thing. Imagine that Rust takes off and in ten years there are 1M professional Rust programmers who use the language because that's the one chosen by their employer. You'll not have one JSON library (or whatever other format will be used then) but 50 and so on, and most programmers will not be experienced ones but relative novices (to programming in general). How likely would it be for them to generate JSON with format! ? So this feature provides a better user experience for the less safe path. A feature should be designed with the next 20 years in mind.
So in terms of weight pulled by a feature it is not competing with say, the add-and-assign operator +=, or even with the AddAssign operator overload trait (which is a langitem), but only with some library feature like euclidean remainder on integers, which I hope you will agree is unlikely to be more commonly used than format interpolation.
I don't know why you think that serde_json isn't good enough and so 49 other JSON libraries will spring up, but I also don't know why you're sure a programmer will decide they ought to write format!("\"{json_string}\"") but you're confident they would never write format!("\"{}\"", json_string). People determined to shoot themselves in the foot are going to do it. We provide much better, simpler, clearer ways to do what they wanted to do, but in general-purpose languages it will always be possible for them to point the gun at their feet, dismiss the warning "CAUTION! Do not shoot yourself in the foot", click the safety and pull the trigger.
Finally, Rust doesn't have to design all its features with 20 years of unknowable future implausibly considered in advance, because it has Editions. If you're correct and we regret providing format!() the Rust 2040 edition needn't provide this, and old code still works.
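For anyone wondering what actually goes wrong with `format!("\"{json_string}\"")`, here is a dependency-free sketch. Both functions are hypothetical; in real code you would reach for serde_json instead of writing `json_string` by hand, and this escaper only covers the basic cases.

```rust
/// The tempting version: breaks as soon as `s` contains a quote or backslash.
fn naive_json_string(s: &str) -> String {
    format!("\"{s}\"")
}

/// Hand-written escaper, for illustration only.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

Note that the naive version is equally broken with `{}` and with `{json_string}`; the captured-identifier syntax changes nothing about the output.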
> I don't know why you think that serde_json isn't good enough and so 49 other JSON libraries will spring up
Because I have some decades of experience.
> I also don't know why you're sure a programmer will decide they ought to write format!("\"{json_string}\"") but you're confident they would never write format!("\"{}\"", json_string).
That's not my argument. I am not saying string formatting will make injection vulnerabilities more likely, but that it's a missed opportunity to make them less likely. You add a new feature because it's more convenient and attractive, and so you expect people to use it. If you know that feature touches on what's known to be one of the most dangerous aspects of programming, you might as well make it more convenient and safer, so that you attract programmers away from the less safe options and toward the safer ones.
Is this decades of experience with Rust (from 2015) or decades of experience with JSON (from 2001) ? Or just decades of experience making implausible predictions?
Here's how you make a JSON string in serde_json here in 2022:
let s = Value::String(myString);
Here's how you propose programmers will erroneously try to make a JSON string in 2042 abusing the format macro:
let s = format!("\"{myString}\"");
Here's how I think programmers will successfully make JSON strings in 2042 using serde_json which is obviously the right tool for the job:
Other typed languages, including those far more popular than Rust, have had libraries like serde — which are obviously the right tool for the job — for many years, and yet if 0.1% of their programmers make a mistake, that's enough to make it one of the most common and dangerous vulnerabilities. We know some programmers make that mistake because that's what security research shows, which is why the language and type system should reduce its chances. Are you saying that you're not worried about that 0.1% because you can be certain even one programmer in a thousand won't make such mistakes, or because you don't expect Rust to be popular enough for that number to matter?
So, what would you have them do? Do you even have an example of how you think general purpose formatters should be "safe" under your model of the world?
You insist it should "validate" the strings but it's a general purpose formatter, there isn't anything to validate that isn't already mandatory in the language.
Yes, if I take the SQL formatter and I use it to make email addresses that's more likely to incur dangerous vulnerabilities. This is not a defect in the SQL formatter, I am using the wrong tools.
> Are you saying that you're not worried about that 0.1% because you can be certain even one programmer in a thousand won't make such mistakes, or because you don't expect Rust to be popular enough for that number to matter?
I can't do anything about the fact that people will make grave logical errors when programming, beyond advocate for testing and code review which might catch those errors. I suspect your 0.1% figure is pulled out of your backside, but, sure, somebody will get it wrong.
General purpose languages shouldn't be riddled with foot guns, but there's a difference between a language not having foot guns and not having any guns at all out of fear that somebody might shoot themselves despite the fact the gun was locked away, the ammo was locked away, and they had taken training in "How to use guns safely" before being given the keys.
Again, if you fear strings you can use special-purpose languages which don't have any strings so that you can't possibly make this mistake. WUFFS is not a language for babies, with training wheels, it's a language for experts who know they don't need stuff like strings in the domain they're working on.
> So, what would you have them do? Do you even have an example of how you think general purpose formatters should be "safe" under your model of the world?
Yes. In my very first comment I posted a link to Java's upcoming feature, which uses the type system to ensure proper validation in a general-purpose string templating mechanism. Here it is again: https://openjdk.java.net/jeps/8273943. As I also said in my first comment, our security experts were so concerned about this problem, which was empirically found to be one of the most dangerous programming operations in recent years (famously so among those interested in security), that they wouldn't let us add it to Java without a solution to the security problem.
> This is not a defect in the SQL formatter, I am using the wrong tools.
But 1. people do use the wrong tools, which is why it's such a common and dangerous vulnerability, and 2. a typed language certainly can prevent that, as in my link.
> I can't do anything about the fact that people will make grave logical errors when programming
That's a strange position from a Rust user, especially as in this case there certainly is something the language can do.
> I suspect your 0.1% figure is pulled out of your backside, but, sure, somebody will get it wrong.
Pretty much all top security vulnerabilities lists list this problem near the top, and some languages are so popular (many millions of professional devs) that even 0.1% of their users are sufficient to make this problem a common one, deserving of its spot. So I figure that 0.1% is about the right number to make this a very common problem.
> Again, if you fear strings you can use special-purpose languages which don't have any strings so that you can't possibly make this mistake.
Or use Java's upcoming string templates. Or, if you prefer less popular languages, Scala, which uses a similar technique.
But that JEP doesn't actually provide the safe general purpose formatter you're insisting Rust should have built here.
The exact same programmer who you insist will write
format!("\"{myString}\""); // in Rust
will also write:
CONCAT."\"\{myString}\""; // in Java with this JEP
The JEP argues that it'll be all OK when Steiner attacks^W^W^W so long as you only use the potentially tainted "strings" via an API which doesn't allow general string objects. But, that's also the exact situation you've dismissed as unable to prevent abuse in Rust.
Java can't stop you writing your supposedly "JSON" data made with CONCAT to a file or over a network socket, and Rust can't stop you writing some made with format! either, it's just data, who knows why you wanted to write this or that gibberish?
According to you we should expect 0.1% of the "safe" Java using these templates to be vulnerable.
The key is understanding why programmers make security mistakes. As you can imagine, this is an active area of research that, for obvious reasons, is of much interest for language and API designers, but one of the causes that is currently believed to be a significant one is that programmers reach for the wrong tool for the job -- security wise -- not because they're idiots, but because many programmers don't understand the security implications of something that they believe is completely innocent, and so they reach for the more convenient option, which might be less safe.
So while preventing someone from constructing JSON (which can be output as a string) is not always as fool-proof as preventing someone doing the same with SQL (as the driver API will simply not offer an option that takes a string) the reason such a feature is added in the first place is because it is easier, and that is what makes it more attractive. A more convenient mechanism attracts people to use it.
Some things, like SQL and JSON, are often convenient to create using templates. Using something like CONCAT."""{"x": "\{x}", "y": "\{y}"}""" is certainly no more convenient, and so no more attractive, than writing JSON."{x: \{x}, y: \{y}}", but it is more convenient in some situations than using an API that requires defining or generating a type for the object in the source language. So while safe JSON libraries in Java and Rust exist today without a built-in template mechanism, being able to offer them with that mechanism will make unsafe options less tempting by comparison.
That is why experienced security experts recommend that languages do not add a string templating mechanism that is more convenient, and so potentially more attractive, than safe options for security-sensitive uses. Their general rule is, "when possible, don't make programmers jump through more hoops to do something secure than something insecure." You're free to find this advice misguided, but I wonder if the people who added this feature to Rust consulted with security experts before adding it. I don't think I would have been aware how sensitive this feature is if it weren't for the advice of security experts.
Maybe you just aren't familiar with serde_json again:
json!({"x": x, "y": y})
... is valid and even idiomatic Rust today to express a JSON object with two entries named x and y containing whatever is in the variables x and y, or, if that doesn't make any sense (e.g. variable x is an operating system Mutex) it's a compile time type mismatch.
That's much easier than the complicated dance envisioned in the JEP, which you insist will be "convenient" and dissuade Java programmers from choosing the easy option, yet of course you always get the intended JSON out, even if, say, x is a malicious input intended to trip up naive JSON encoding.
That json! macro does the right thing. What the Java feature does is give that exact capability to any library that can benefit from templates.
The entire point of my comment was that, surprising as it might seem to some, templating is now known to be a particularly dangerous area -- quite possibly the most sensitive aspect of language and API design after buffer overflows -- and that's why templating features require a security review.
If Rust's designers' answer is that their security analysis has led them to the conclusion that the right stance against code injection is for template APIs to roll their own templating macros from scratch rather than use a higher-level templating mechanism -- then they're doing what I suggested, and macros are their mechanism that corresponds to Java's pluggable templating design. If they did not consult with security experts on their string formatting feature, I suggest they do so. Perhaps all that's needed for Rust is to include components in the standard library that would help library authors write correct and secure templating macros.
Now that it has been demonstrated that your original argument has no legs to stand on you retreat to some hypothetical what-if involving popularity and two extra decades. So what if someone manages to argue against that? Will you just add 100+ million users and 50 years?
Of course culture matters. It isn’t just a side-effect of popularity. Would Scala have the same culture as Java if it became similarly popular? I doubt it.
Rust has a culture—and it comes from a wider culture of the same ilk—where such messy shortcuts are not taken; instead “fancy language features” (much to some people’s chagrin…) like compile-time evaluation are used to make safer user interfaces for programmers. And that makes for less catastrophically buggy software.
But here both Scala and Java have opted for a similar compile-time evaluation strategy that makes use of the type system to reduce injection vulnerabilities, whereas Rust has opted for the untyped (or "stringly-typed") messy shortcut of "people shouldn't do that."
> That's a valid point, but what you're saying is that the feature is only intended to help create log messages and the like,
Yes exactly.
> which raises the question of why add a feature with so little utility
It's indeed not the most important feature ever, but after running a quick `rg "format!" | wc -l` in my Rust directories (for both work and hobby projects), it found 2797 occurrences, which isn't nothing.
> when it could have much greater utility? There's a missed opportunity here.
I bet most people in the Rust team are simply not aware of the recent Java developments on that front. Hopefully having an openjdk developer sharing insight and documentation on that topic on public forums can foster cross-pollination on that topic :).
The Rust formatting macros depend on string literals because the macro expansion necessarily happens at compile-time, which means that it's not possible to override the actual format string at runtime, nor is recursive resolution possible.
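A minimal sketch of what that means in practice, using only the standard library (the function names `greet` and `echo` are made up for illustration; inline captured identifiers need Rust 1.58 or later):

```rust
/// Formats a greeting using captured identifiers.
fn greet(name: &str, age: u32) -> String {
    // The format string is a literal, so `{name}` and `{age}` are
    // resolved by the compiler, not looked up at runtime.
    format!("Name: {name}, age: {age}")
}

/// Shows that runtime data is never re-interpreted as a format string.
fn echo(user_input: &str) -> String {
    // `format!(user_input)` would be rejected at compile time because the
    // first argument has to be a string literal. The value is only ever
    // substituted, so braces inside it come out unchanged.
    format!("{user_input}")
}
```

Calling `echo("{secret}")` just gives back the text `{secret}`; there is no second round of placeholder resolution, which is the "no recursive resolution" part.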
It's obviously still possible to implement the Display trait for a type in a way which makes it susceptible to code execution attacks, but that doesn't really have anything to do with string formatting.
I don’t see why this development makes this more likely to happen compared to string concatenation.
The log4j fiasco didn’t happen because of Java (wink) but because of a plainly vulnerable feature. I don’t think Rust makes that more or less likely to happen.
In reality, sqlx [1], probably the most popular SQL library for Rust, has a query! format string that ensures that all parameters are properly escaped. As far as I can tell, you can't use the new format string support to create SQL queries with that macro yet, so there is no security problem. When that's fixed and query! is updated for the new format string support, I'm certain that they will escape their parameters, so there will be no security problem then either.
Because all format strings are in macro context, where the macro has full control over what to do with all substituted parameters, Rust already has sanitized string interpolation. In terms of that JEP, the macro invocation is the policy object.
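To make the "macro invocation is the policy object" idea a bit more concrete, here is a rough sketch. `BoundQuery` and `bound_query!` are hypothetical names, not part of sqlx or any other crate. It uses `?` placeholders because a `macro_rules!` macro cannot look inside a string literal; parsing `{name}` placeholders would need a procedural macro.

```rust
/// A query whose SQL text and values are kept apart (hypothetical type).
#[derive(Debug, PartialEq)]
struct BoundQuery {
    sql: &'static str,
    params: Vec<String>,
}

/// Hypothetical policy macro: the literal becomes the SQL text and every
/// argument becomes a bind parameter instead of being spliced into the text.
macro_rules! bound_query {
    ($sql:literal $(, $arg:expr)* $(,)?) => {
        BoundQuery {
            sql: $sql,
            params: vec![$($arg.to_string()),*],
        }
    };
}
```

With this, `bound_query!("SELECT * FROM users WHERE id = ?", input)` leaves the SQL text untouched whatever `input` contains; the value only travels in `params`, and it is the macro, not the caller, that decided so.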
How is that related? The new format strings must be static literals because they're decoded at compile time. It is impossible to use them for any kind of code injection.
It's not enough that they're static literals. The replaced variables aren't. The language should provide a way for the format string to enforce that the expressions are sanitised.
The compiler doesn’t know if the string being templated is SQL, HTML, YAML, JSON, or something else, so it doesn’t know how to sanitize the string.
IMHO generating code of any sort through string manipulation is a code smell, even if you ignore the security issues. There are better options like parameterized queries for SQL, macros for HTML, serde_json::json! for JSON, etc.
> The compiler doesn’t know if the string being templated is SQL, HTML, YAML, JSON, or something else, so it doesn’t know how to sanitize the string.
That is the problem, and why security experts wouldn't let us add string interpolation to Java until we had a way to require the format string to say how it will be used and enforce proper validation. [1]
Even if programmers should be more careful, the fact remains that this is the #1 security vulnerability caused by language features. The language should, if possible, make it easier for developers to avoid the mistake rather than make it easier for them to make it.
Looks like Java wants to skip format strings and jump straight to domain specific templating. I'm dubious doing this at the language level is a good idea. Picking a domain I'm familiar with, how do we handle this?
SQL."SELECT * FROM orders WHERE \{mcol} LIKE '%' || \{qry} || '%'"
How does the templating engine know that 'mcol' needs to be quoted as an SQL identifier using ", and that qry either needs to be quoted as an SQL string using ' or a replacement made to use a bind variable? And since 'qry' is being used in a LIKE expression, do any % or ? characters in it need to be escaped, or should they be passed through? I guess you need to force hints in the template (force, because we are trying to stop people being able to write buggy code):
SQL."SELECT * FROM orders WHERE \{mcol:sql_id} LIKE '%' || \{qry:sql_likestr} || '%'"
But none of that stops someone just doing this, which is the most common form of the bug:
"SELECT * FROM orders WHERE " + mcol + " LIKE '%" + qry + "%'"
To stop people throwing arbitrary strings at a database connection, risking SQL injection attacks and other buggy behavior, you just need to stop people throwing strings at a database connection and instead have the driver accept only an object, which does not rely on a templating engine at all. std::fmt does not encourage or make it easier to write code with SQL injection bugs, and if you want to stop them, the only way is to stop the driver accepting arbitrary strings and instead force developers to use an API to correctly generate their SQL. Which is exactly what most ORMs do, although they generally allow people to force arbitrary strings in for convenience or non-standard SQL stanzas.
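For what it's worth, the two hypothetical hints in the template above (`sql_id` and `sql_likestr`) really do correspond to different escaping rules. A rough sketch, with made-up helper names and assuming the query declares `ESCAPE '\'` for its LIKE pattern (dialects differ on the details):

```rust
/// Quotes an SQL identifier with double quotes, doubling any embedded quote.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Escapes LIKE wildcards so the value matches literally.
/// Assumption: the surrounding query uses `ESCAPE '\'`.
fn escape_like(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        // `%` and `_` are the LIKE wildcards; the escape character
        // itself also has to be escaped.
        if matches!(ch, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

The escaped LIKE value would still be best passed as a bind variable rather than spliced into the text; the point is only that one template needs two different policies depending on position.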
Interesting, so to make sure I understand, if I were writing an SQL library I could require (at the type check level) that a query I was passed was constructed with a specific formatter, and that formatter would encode any substrings that I templated into it in a way appropriate for SQL?
I wonder how this would handle something like HTML injection, where the desired encoding can change within the same string depending on whether you’re in an attribute or a text block?
> Interesting, so to make sure I understand, if I were writing an SQL library I could require (at the type check level) that a query I was passed was constructed with a specific formatter, and that formatter would encode any substrings that I templated into it in a way appropriate for SQL?
Yes. That's what Java's upcoming templated strings feature does. The library provides a templating policy as the way to create the appropriate type it requires.
> I wonder how this would handle something like HTML injection, where the desired encoding can change within the same string depending on whether you’re in an attribute or a text block?
The client library decides how to interpret and format the string in its policy.
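As a loose Rust analogue (not the Java API), a library can express one such policy as a wrapper type whose `Display` impl does the encoding. The names `HtmlText` and `render_paragraph` below are hypothetical, and this sketch covers only the text-node context; an attribute context would need its own wrapper with its own rules.

```rust
use std::fmt::{self, Write};

/// Wrapper marking a string as untrusted text for an HTML text node.
struct HtmlText<'a>(&'a str);

impl fmt::Display for HtmlText<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Replace the characters that are significant in HTML with entities.
        for ch in self.0.chars() {
            match ch {
                '&' => f.write_str("&amp;")?,
                '<' => f.write_str("&lt;")?,
                '>' => f.write_str("&gt;")?,
                '"' => f.write_str("&quot;")?,
                '\'' => f.write_str("&#39;")?,
                other => f.write_char(other)?,
            }
        }
        Ok(())
    }
}

/// Builds a paragraph; the user text is encoded while it is formatted.
fn render_paragraph(user_text: &str) -> String {
    format!("<p>{}</p>", HtmlText(user_text))
}
```

Note that nothing forces a caller to wrap the value in `HtmlText`, which is exactly the enforcement gap being argued about in this thread.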
But there is nothing enforcing the use of the correct template policy. I can still use the wrong CONCAT to build a SQL query. It certainly makes sense to make string interpolation extensible but doesn't actually eliminate the potential for doing things in an insecure way.
We can't change the API of an existing library that already accepts strings for compatibility reasons (although we can make the safer API more attractive by making it more pleasant to use), but a new API can certainly enforce use of the correct policy by not accepting string parameters, only types that are constructed with the right policy, as the policy determines the type of the template and so the type system enforces the correct usage.
That is exactly my point. The way you can eliminate potential injection attacks is by not having libraries which accept a String and treat it as a sanitized input. But once you do that then there is no real reason to be wary of format strings. You can use format strings to build strings, but anything that needs a string that is sanitized against some potential injection attack only accepts a type that represents that invariant.
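A small sketch of "a type that represents that invariant", with every name (`sql::Query`, `bind`, `execute`) made up for illustration rather than taken from a real driver:

```rust
mod sql {
    /// Query text plus bound values. The fields are private, so the only
    /// way to get a `Query` is through the constructor below.
    pub struct Query {
        text: &'static str,
        params: Vec<String>,
    }

    impl Query {
        /// Takes a `&'static str`, which in practice means a literal
        /// (short of someone deliberately leaking a `String`).
        pub fn new(text: &'static str) -> Self {
            Query { text, params: Vec::new() }
        }

        /// Attaches a value as a bind parameter; it never touches the text.
        pub fn bind(mut self, value: impl ToString) -> Self {
            self.params.push(value.to_string());
            self
        }

        pub fn text(&self) -> &str {
            self.text
        }

        pub fn params(&self) -> &[String] {
            &self.params
        }
    }

    /// Hypothetical driver entry point: it accepts `Query`, never `String`.
    /// Here it only describes what would be sent.
    pub fn execute(query: &Query) -> String {
        format!("{} [{} bound value(s)]", query.text, query.params.len())
    }
}
```

Passing the result of `format!` to `sql::execute` is then a type error, while `format!` itself stays available for everything that is not a query.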
> But once you do that then there is no real reason to be wary of format strings.
But then format strings don't help you much as you can't use them to create the sanitised types. You can't sanitise the string after it's been constructed.
Right, but there are many, many use cases for creating strings in which you don't need some specific sanitization. Most of the time when I am using a format string I am not worried about sanitization.
You can construct a SQL client which doesn’t accept normal strings in either language but how would Haskell be different otherwise? It can’t magically tell what you’re going to do with a string after formatting it.
Well, you can use strong types that separate e.g. String/SQLQuery/XMLBlob from one another, and then have very narrow conversion routines that are heavily audited to handle them safely. You can then use type directed programming to make conversions a little simpler (sometimes). This can give the appearance of things being handled "by the compiler", e.g. because typeclass resolution can be used to pick an appropriate instance. But it's not magical or anything. Maybe that's basically what you meant anyway.
Any statically typed language can do the first part, the second part is a tad less trivial (and arguably a footgun in some scenarios, but that's another discussion.)
I think the point is that they may well be used like that. Maybe not explicitly, but I bet you that any language has a user that uses string formatting to build, say, an SQL query that is then passed as a string to the SQL engine.
I'm not feeling "attacked". I'm trying to help you understand that the argument "feature X might be used in a way that could lead to an incorrect behavior" on its own isn't enough to warrant exclusion of that feature, because that same argument can be used against basically all features of a language.
There is a legitimate point that some programmers might misuse format strings and that by using named capturing in the format string, this misuse would be slightly more ergonomic. Nevertheless, legitimate uses of format strings also become more ergonomic, and legitimate uses would seem to far out-strip illegitimate uses.
Furthermore, within the Rust ecosystem, the tools to build SQL queries, or serialize/deserialize JSON generally provide interfaces that are more ergonomic than manually using format strings, so a programmer has little incentive to do so as it would constitute more work on their part.
What I particularly like about Python's new(ish) “f strings” is that, firstly, it’s opt-in: you have to mark your strings with the “f” prefix. Secondly, it is converted to opcodes at compile time, so f strings can’t be modified at runtime, unlike standard “old fashioned” Python string interpolation.
As someone noted in another comment, this is effectively opt-in in Rust with the println! and format! macros. Is it converted to machine code at compile time?
Given that the logic for handling these strings is in the println macro, does that not limit these strings to 'being displayed'?
I don't know enough about rust but I would imagine it is hard to println into a SQL statement. Hence it is not 'natural' to use this feature to build commands to be executed elsewhere.
Although I strongly disagree with pron's point, format strings aren't limited to printing them to stdout. There's format!(), for example, which returns a String for further use, and you could even write a sql_query!() macro, which creates an SQL query from a format string:
let query = sql_query!("SELECT * FROM users WHERE id = {}", user_id);
query.execute();
Of course, this isn't a valid argument against format strings, because you could just as well write a function which does the same thing without language-level format strings (although it would be a lot less flexible than format strings):
let query = sql_query("SELECT * FROM users WHERE id = @PLACEHOLDER@", user_id);
query.execute();
EDIT: Actually, it would be possible to write a Rust macro sql_query! which takes a format string and creates a sanitized SQL query from it. That's because the actual string interpolation isn't built into the language, only the format_args! helper macro [1]. This macro doesn't insert the arguments into the format string, it only associates the placeholders with the arguments and returns a struct which can be used to create the output string – or a sanitized SQL query.
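A sketch of that last point using only std: the receiver of the `std::fmt::Arguments` value decides where the output goes, and no `String` needs to exist at all. `ByteCounter` and `formatted_len` are names invented for this example.

```rust
use std::fmt::{self, Write};

/// A sink that only counts bytes, so nothing is allocated.
struct ByteCounter(usize);

impl Write for ByteCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Receives the deferred formatting job and decides where the output goes.
fn formatted_len(args: fmt::Arguments<'_>) -> usize {
    let mut counter = ByteCounter(0);
    // The pieces and arguments are only rendered here, into our sink.
    counter
        .write_fmt(args)
        .expect("ByteCounter itself never returns an error");
    counter.0
}
```

It would be called as `formatted_len(format_args!("id = {id}"))`. Stable Rust does not expose the individual pieces and arguments of `Arguments`, so a sanitizing SQL macro would more realistically do its own parsing, but the split between "describe the formatting" and "produce the output" is the same.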
> Of course, this isn't a valid argument against format strings, because you could just as well write a function which does the same thing without language-level format strings
In that case, you should ask yourself what's the point of format strings at all? The answer is that they're more attractive because using them is more convenient. You want to make the more attractive feature the safer feature to draw people away from danger, not toward it.
It seems perverse to make a ubiquitous operation (formatting a string) intentionally hard just because some people may use it for inappropriate purposes such as building SQL queries from untrusted input. Sure, we should have ergonomic, easy ways of building prepared statements and such, but that doesn't seem like something that belongs in the standard library. Or maybe it does. But either way there are many, many legitimate use cases for string formatting.
But you don't need to make it harder in order to make it safer. You just need to think more about the security concerns and come up with solutions, as others have.
> You want to make the more attractive feature the safer feature to draw people away from danger, not toward it.
There's an entire world of software which does something else than creating SQL queries or HTML pages. Rust is a general purpose language, not some niche DSL. I use format strings all the time in my Rust code, yet I've never been in a situation where those strings should or could have been sanitized.
Implementing string formatting the right way can actually solve this problem, of course. In modern JS/TS, for example, string formatting isn't just used for constructing strings, and it can limit input types as well.
This means you can construct some kind of prepared statement or similar object with a string format, and it will force the correct handling of potential injections.
Of course you can still just format a string and shove it in, but I'd argue that a good SQL library should simply not accept pure string queries and require the format that guarantees injection-safety.