"You may not use data derived from outside your program to affect something else outside your program--at least, not by accident. All command line arguments, environment variables, locale information (see perllocale), results of certain system calls (readdir(), readlink(), the variable of shmread(), the messages returned by msgrcv(), the password, gcos and shell fields returned by the getpwxxx() calls), and all file input are marked as "tainted"."
Unrelated rant: at some point recently, mobile Chrome started omitting everything after the # in a URL when you copy or share it. Grrr.
I'm not aware of a similar feature built into other languages, but most of it could be easily achieved with almost any type system.
Just have two separate types, e.g. UnsafeString and regular String, and some kind of `convert` function that takes a validation function as an argument. You'd get compile-time checking that way.
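A minimal sketch of that idea in Python, assuming a static checker such as mypy provides the "compile-time" part (Python itself won't reject the bad call at runtime). Only `UnsafeString` and `convert` come from the comment above; `ValidationError`, `report_path` and the choice of validator are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UnsafeString:
    """Wraps text that came from outside the program."""
    raw: str


class ValidationError(ValueError):
    """Raised when the validator rejects the raw text (hypothetical name)."""


def convert(value: UnsafeString, is_valid: Callable[[str], bool]) -> str:
    """Return a plain str only if the validator accepts the raw text."""
    if not is_valid(value.raw):
        raise ValidationError(f"rejected input: {value.raw!r}")
    return value.raw


def report_path(name: str) -> str:
    # Hypothetical sink. It is annotated to take str, so a type checker
    # flags any call that passes an UnsafeString directly.
    return f"reports/{name}.txt"


user_input = UnsafeString("alice")
path = report_path(convert(user_input, str.isalnum))
```

Calling `report_path(user_input)` directly is what the type checker would flag; the only way to get a plain `str` out of an `UnsafeString` is through `convert`.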
People don't tend to use such things in practice though, and you would also have to ban a portion of most languages' standard libraries to enforce it (because they already return regular strings for inputs).
This can be a useful pattern, though I've never seen it used for this use case. A similar one is templating languages (like Jinja), where you need to wrap strings if you want to send HTML to your templates without it being escaped on render.
We use something similar where we have a BadNumber class in our code (python). Any operation with another number will also create a BadNumber. It allows us to make sure that these tainted numbers are always obvious.
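For anyone curious what such a class might look like, here is a rough sketch. It is not the commenter's actual code; the operator set and the `_apply` helper are assumptions, and a real version would likely cover more operators.

```python
import operator


class BadNumber:
    """A number of doubtful origin; arithmetic with it yields another BadNumber."""

    def __init__(self, value):
        self.value = value

    def _apply(self, op, other, reflected=False):
        # Unwrap the other operand if it is also a BadNumber.
        other_value = other.value if isinstance(other, BadNumber) else other
        if reflected:
            return BadNumber(op(other_value, self.value))
        return BadNumber(op(self.value, other_value))

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __radd__(self, other):
        return self._apply(operator.add, other, reflected=True)

    def __sub__(self, other):
        return self._apply(operator.sub, other)

    def __rsub__(self, other):
        return self._apply(operator.sub, other, reflected=True)

    def __mul__(self, other):
        return self._apply(operator.mul, other)

    def __rmul__(self, other):
        return self._apply(operator.mul, other, reflected=True)

    def __repr__(self):
        return f"BadNumber({self.value!r})"


# The taint survives mixing with ordinary numbers on either side.
total = 10 + BadNumber(2.5) * 4
```

Because the result is never silently turned back into a plain `float`, the doubtful value stays visible wherever it ends up.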
You could use an import hook in Python, then create a whitelist of APIs that will be mapped to return UnsafeString and that will only accept SafeString as an argument.
How about implementing it as an access modifier like "trusted" and enforce that only values from other trusted members can be assigned to a trusted member?
Wow that's interesting. I've never heard of this in Perl or Ruby, and I always thought of taint analysis as static rather than dynamic.
Though I don't have experience with it, maybe one reason it isn't used is false positives?
For efficiency reasons, Perl takes a conservative view of whether data is tainted. If an expression contains tainted data, any subexpression may be considered tainted, even if the value of the subexpression is not itself affected by the tainted data.
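To make the dynamic flavour concrete, here is a toy sketch of taint propagation in Python. It is not how Perl implements it (Perl tracks taint inside the interpreter); `TaintedStr`, `untaint` and `run_command` are hypothetical names. The one part borrowed from Perl is that data is untainted by matching it against a pattern.

```python
import re


class TaintedStr(str):
    """A str subclass marking data that came from outside the program."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Concatenating anything onto tainted data stays tainted.
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Clean data concatenated with tainted data becomes tainted too.
        return TaintedStr(str.__add__(other, self))


def untaint(value: str, pattern: str) -> str:
    """Return a plain str only if the whole value matches the pattern."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"value does not match {pattern!r}")
    return str(match.group(0))


def run_command(arg: str) -> str:
    # Hypothetical sink that affects the outside world: refuse tainted data.
    if isinstance(arg, TaintedStr):
        raise PermissionError("refusing to use tainted data")
    return f"would run: ls {arg}"


name = TaintedStr("report.txt")      # pretend this came from argv
path = "/tmp/" + name                # still tainted after concatenation
safe = untaint(path, r"[\w./-]+")    # explicit, pattern-based untainting
```

This toy version leaks: other `str` methods (slicing, `upper()`, f-strings and so on) return plain strings and silently drop the mark, which is part of why real taint tracking needs support from the whole runtime.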
Has anyone used this? Is the runtime overhead always there, or only when you turn the taint mode on? It seems like it would have to occupy some extra space in the string objects all the time? (Although I guess if it's literally a single bit, it can come for "free" because of padding)
-----
FWIW here are some references on static taint analysis:
Perhaps not much on purpose, but it kicks in automatically if the Perl script is setuid. So you'll find questions about it where people are struggling with it.
Dynamic taint analysis is a really common technique in academic work, but largely has unacceptable performance costs for interesting applications. Typical costs range from 10% to 100% overhead or more. The other problem is that the entire system needs to track it. If you just own part of a system, instrumenting to add dynamic taint tracking can be really difficult.
It's used in Rails to reduce the likelihood of un-sanitized user input in SQL fragments [1]. I think it would see a lot more use if additional input sources were marked as tainted [2].
This can be done with a special type representing "trusted" data. For example, Go has template.HTML representing data that's safe to render without escaping. Everything else gets escaped.
> Security engineers in general, very much including Chrome Security Team, would like to advance the state of engineering to where memory safety issues are much more rare. Then, we could focus more attention on the application-semantic vulnerabilities. That would be a big improvement.
> Unsafe implementation languages are languages that lack memory safety, including at least C, C++, and assembly language. Memory-safe languages include Go, Rust, Python, Java, JavaScript, Kotlin, and Swift
Very nice. At the end of the process, Google might adopt Rust in Chromium. As much as I use and love Firefox, it's only realistic to say that Chrome has a higher chance of being around in 10 years.
I wonder why the list doesn't include their wuffs language.
I work at Google and as far as I know Rust is not really “in house” at Google, at least not any more than C#. Both languages exist in some form: Google does have some plugins for Unity and I believe Fuchsia has some Rust code. It is indeed unclear what criteria were used to select languages, though it’s really not very relevant to the primary point anyway.
(Legal line noise: my opinions are not those of my employer.)
If you compare where Firefox was 10 years ago with where it is now, you can see the trend. It still continues. In the last year, Firefox lost more than 10% of its market share. A component of this is probably Firefox not being able to capture growth of the entire market, but the trend also holds for the absolute number of users: 890 million YAUs in Jul 2018 vs 809 million YAUs in Jul 2019. In the long-term view, Firefox is dying.
Neither of those charts goes back far enough, because if they did you'd see this has all happened before, and even worse at one point. When IE was taking over the world FF fell to <5% of the market, yet it survived. It's not dead until it's dead, and with Chrome killing off ad-blockers I bet we'll see some reversal of the current trends when that ships.
By the same reasoning, Internet Explorer should have killed off Firefox ages ago. Except that never happened and instead it is IE that died. Firefox has plenty of supporters and a rich development community; it won't go away any time soon.
There are enough of them to keep Mozilla going indefinitely and they are doing some truly amazing stuff like using Rust to get massive performance boosts. Mozilla and Firefox have set the agenda technically for close to two decades. Everybody does tabs now. I remember when that was a Mozilla-only thing. Extensions were a Mozilla-only thing for a long time. Even Safari has extensions now. The new focus on security and privacy started at Mozilla and is now being copied by others (Brave, Edge, Safari) while Google is moving to kill ad blockers and continues to sell users out to their advertisers.
Well it isn't meant to be an exhaustive list of languages that are memory-safe. I could complain that Ruby isn't in there, but it isn't very popular in the Google world right now. I think we all get the idea though.
> But if you transform the image into a format that doesn‘t have PNG’s complexity (in a low-privilege process, of course), the malicious nature of the PNG ‘should’ be eliminated and then safe for parsing at a higher privilege level. Even if the attacker manages to compromise the low-privilege process with a malicious PNG, the high-privilege process will only parse the compromised process' output with a simple, plausibly-safe parser.
It's interesting to get a sense of how deeply unrealistic they think it is, to write a safe parser for a typical data format in an unsafe language.
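To give a feel for what a "simple, plausibly-safe parser" could mean, here is a sketch for an invented raw bitmap format: a 4-byte magic, two big-endian 32-bit dimensions, then RGBA pixels. The format, `MAGIC`, `MAX_DIMENSION` and `parse_raw_image` are all assumptions for illustration, not anything Chrome uses. It is written in Python, so it only shows how little there is to validate in such a format, not memory safety in an unsafe language.

```python
import struct
from typing import Tuple

MAGIC = b"RAW1"                      # hypothetical format tag
HEADER = struct.Struct(">4sII")      # magic, width, height (big-endian)
MAX_DIMENSION = 16384                # arbitrary sanity limit
BYTES_PER_PIXEL = 4                  # RGBA


def parse_raw_image(data: bytes) -> Tuple[int, int, bytes]:
    """Parse the invented raw format, rejecting anything inconsistent."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, width, height = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise ValueError("dimensions out of range")
    pixels = data[HEADER.size:]
    # The declared size must match the data exactly: no optional chunks,
    # no compression, no nested lengths to get wrong.
    if len(pixels) != width * height * BYTES_PER_PIXEL:
        raise ValueError("pixel data does not match declared size")
    return width, height, pixels


blob = HEADER.pack(MAGIC, 2, 1) + bytes(8)
width, height, pixels = parse_raw_image(blob)
```

The whole input is covered by four checks, which is the contrast with PNG that the quoted passage is drawing.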
Because as of 2019 the same errors as in the early 1980s keep being repeated, regardless of how many tools have been developed to tame C and its derivatives.
It is so unrealistic that Android is following in Solaris's footsteps.
Google has announced that ARM memory tagging extensions will be required in future Android versions.
The PSP was hacked by pirates when a bug was discovered in its image app (I think for TIFF files). It makes me wonder if security should be baked into a data format.
Two actually seems like a lot here. Why would you angle for two and not one? It seems like the latter two (unsafe implementation language and high privilege) are both within the purview of developers. Is it just a case of resource management?
It's practical advice for Chrome developers wanting to get a patch accepted. Deciding to rewrite the high-privilege parts of Chrome in Rust (say) is too big a project to be in scope.
"unsafe implementation language" is a pretty moot point.
Looking at all the trivial exploits against web applications which are basically never written in memory-unsafe languages (Ruby, Python, PHP, ...) shows that it doesn't really matter much. While having the same implementation in a memory unsafe language would be slightly less safe, it's very unlikely that a heap corruption could be exploited remotely.
The Morris Worm happened back when security was not really a big concern.
Memory corruption is much harder to exploit beyond a DoS (in most cases it realistically can't be exploited at all), and a DoS is what you would get with "safe" languages such as Rust or Python as well.
Heartbleed, Shellshock, Dirty COW etc. would all happen exactly the same way in different programming languages.
Yes, there is clearly a benefit in using something which makes it much harder to introduce memory safety issues, but it's not nearly as big as many here on HN think.
The most recent (and the first public) Chrome ITW attack used memory corruption: one bug to exploit the renderer, another to exploit the underlying kernel and escape the sandbox.
I believe the same is true (roughly) with the Coinbase attack that went after Firefox.
In short, memory safety is not only responsible for the majority of reported vulns, but also the exploited ones, at least in the case of browsers.
> Heartbleed, Shellshock, Dirty COW etc. would all happen exactly the same way in different programming languages.
Heartbleed is impossible in a memory safe language, at the least. Same with cloudbleed for that matter.
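Roughly why: in Heartbleed the peer claimed a payload length larger than the payload it actually sent, and the C code copied that many bytes anyway, reading past the buffer into adjacent memory. A simplified sketch (function names are made up, and this is not the TLS heartbeat wire format) of what the same mistake looks like in a bounds-checked language:

```python
def echo_by_slice(payload: bytes, claimed_length: int) -> bytes:
    # Slicing clamps to the real length, so nothing beyond the payload
    # can ever be returned, however large the claimed length is.
    return payload[:claimed_length]


def echo_by_index(payload: bytes, claimed_length: int) -> bytes:
    # Indexing past the end raises IndexError instead of reading
    # whatever happens to sit next to the buffer in memory.
    return bytes(payload[i] for i in range(claimed_length))


payload = b"bird"
reply = echo_by_slice(payload, 64)   # attacker claims 64 bytes, sent 4
```

The missing length check is still a bug (a correct implementation should reject the mismatched request), but it turns into a short reply or an exception rather than a memory disclosure.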
DirtyCow and shellshock, sure.
It's a bit of a moot point though - human energy is finite, consider if we could spend energy on problems like DirtyCow and shellshock instead of memory safety issues that simply don't exist in many languages.
> Security engineers in general, very much including Chrome Security Team, would like to advance the state of engineering to where memory safety issues are much more rare. Then, we could focus more attention on the application-semantic vulnerabilities. That would be a big improvement.
Perl's "taint"[1] capability is pretty interesting in this space. Do other languages have something similar?
[1] https://perldoc.perl.org/perlsec.html#Taint-mode