The fundamental bug here is really slick. The static analyzer in the JIT incorrectly believes
Math.expm1(x)
can't return -0. But if x is -0, it can. That, in turn, means it believes
Object.is(Math.expm1(x), -0)
must always be false. But if x is -0, it's true, not false. That, in turn, means that the JIT believes
array[Object.is(Math.expm1(x), -0) * INDEX]
must be array[0], no matter what INDEX is. But if x is -0, it'll be array[INDEX]. That's a big problem, because the optimizer, guided by the static analyzer, removes the bounds check if the bounds check can't ever fail.
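A minimal sketch of that pattern (the function and variable names here are illustrative, and the miscompile only manifests once the JIT optimizes the function; interpreted, the code behaves correctly):

```javascript
// The typer believes Math.expm1 never returns -0, so it types the
// index expression as always 0 -- but at runtime Math.expm1(-0) is
// -0, making the comparison true and the index equal to `index`.
function oob(x, index) {
  const array = [1.1, 2.2, 3.3, 4.4];
  return array[Object.is(Math.expm1(x), -0) * index];
}

console.log(oob(1, 2));   // 1.1 -- comparison false, index is 0
console.log(oob(-0, 2));  // 3.3 -- comparison true, index is 2
```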
But it's not quite that simple, because another consequence of the static analyzer's misperception is that
Object.is(Math.expm1(x), -0) * INDEX
is the constant 0, no matter what value x takes and no matter what INDEX is, and the JIT will simply constant-fold 0 in for the whole expression, so the bounds check really won't matter.
But constant folding is an optimization, not a security boundary, and easily "defeated": replace the -0 constant in that expression with an object reference:
obj = { o: -0 }
Object.is(Math.expm1(x), obj.o)
You can read that code and see that it's the constant -0, but the static analyzer needs to do escape analysis to make sure the object reference really does just evaluate to a constant, and escape analysis happens after the typing analysis, so the constant folding doesn't happen.
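Putting the two pieces together (again with illustrative names, and again only meaningful under JIT optimization):

```javascript
// Same index expression, but with -0 loaded through an object
// property. Proving obj.o is the constant -0 requires escape
// analysis, which runs after typing, so the typer's wrong
// "always false" judgment survives without being constant-folded
// away -- leaving the mistyped index expression live in the code.
const obj = { o: -0 };
function oobIndex(x, index) {
  return Object.is(Math.expm1(x), obj.o) * index;
}

console.log(oobIndex(1, 7));   // 0 -- the only outcome the typer admits
console.log(oobIndex(-0, 7));  // 7 -- what actually happens for x = -0
```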
The end result of the bug is a sort of "black magic zero/one chimera" that, mixed into an expression, tricks the JIT into stripping bounds checks. It's an extremely cool exploit primitive. It's doubly cool because if you don't have Andrea's writeup or a really nuanced understanding of v8 internals, you could stare at that exploit for eternity and never understand why it works.
> The fundamental bug here is really slick. The static analyzer in the JIT incorrectly believes Math.expm1(x) can't return -0.
I can't get past this part.
When a Googler "believes" Union(PlainNumber, NaN) represents the set of possible return values for a math function and commits that in the code, how is there not an automated set of tests that use one out of every other IEEE754-associated type as input to then check whether it generates output which falls outside the assumed set of types?
I mean in this case you've even got NegativeZero as its own type. What's even the point of having a type with a single value if you don't hurl it at every manually-entered optimization that it could conceivably invalidate?
I don't know, to me, this sounds like one of the more subtle examples of the kinds of mistakes that lead to security failures. Like, it might be an almost archetypical example of the "all bugs are security vulnerabilities" hypothesis. They got code execution from expm1!
But if you believe that this is an example of wanton abuse at Google, you can trade on that belief, and in a sense put your money where your mouth is, because this is a whole class of potential bugs, not just one bug; the same pattern will recur for other places where the v8 typer is wrong about the possible results of functions. Go do a sweep! If you find one, the value of the resulting bug might be pretty decent.
> it might be an almost archetypical example of the "all bugs are security vulnerabilities" hypothesis
This article will be my new go-to example when someone handwaves a bug away with a complacent “it’ll never happen” and “it’s not that big of a deal”. Yes it will, and yes it is.
Certainly V8 tries to be correct, and this bug will be fixed. Chromium's position is "defense in depth": they know it's impossible to have zero bugs, so the entire architecture assumes there will be bugs and tries to prevent them from causing any harm. This is also why there are roughly 10x fewer code execution bugs in Chrome vs other browsers: same number of bugs overall, but most lead nowhere.
I wouldn't be surprised if NegativeZero got introduced to fix some other bug along the same lines involving a different function, and when fixing that bug they neglected to notice that it applies to expm1 as well. Math.expm1 is... a bit obscure. I could certainly see myself making the same mistake.
> how is there not an automated set of tests that use one out of every other IEEE754-associated type as input to then check whether it generates output which falls outside the assumed set of types?
It's slightly worse than that. Math.expm1(-0) = -0 is explicitly called out as part of the standard[1]. You don't need to throw every IEEE754 value at your functions to make sure they're all correct, but validating the explicitly defined corner cases is table stakes.
Hmmm! Wonder how hard it might be to make a build of v8 available that replaces assumes with asserts, and perhaps replaces code believed to be unreachable with assert(false)? If it's something like LLVM IR, where assume is just an intrinsic taking a Boolean expression, it would seem relatively straightforward - would probably need to run before every pass in the optimizer... but should be doable.
Here's what I don't get. Why isn't Math.expm1 just implemented in JavaScript?
Math.expm1 = (x) => Math.pow(Math.E, x) - 1;
With this monkey-patched version, Math.expm1(-0) now returns 0. I've been writing JavaScript for a long time and I didn't even know Math.expm1 was a thing. Was it really necessary to implement this inside V8?
expm1 avoids catastrophic cancellation (due to finite precision floating point), and so is more accurate for x close to zero than naive(x) = exp(x) - 1, e.g. expm1(1e-100) != 0, but naive(1e-100) = 0.
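The cancellation is easy to see directly (the `naive` name is just for illustration):

```javascript
// exp(1e-100) rounds to exactly 1.0 in double precision, so
// subtracting 1 leaves nothing, while the dedicated expm1
// returns the result to full precision.
const naive = (x) => Math.exp(x) - 1;
console.log(naive(1e-100));       // 0 -- every significant digit lost
console.log(Math.expm1(1e-100));  // 1e-100 -- exact
```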
Even good old C includes several functions that seem "redundant" at first glance [0] but are necessary for edge cases. I suppose this is similar to mechanics like fused multiply-add [1].
The reason expm1 exists is because that naive approach is numerically poor when x is small (then e^x is almost 1, and subtracting two almost equal values throws away significant digits and gives you an inaccurate result).
I don't know what the rules for JavaScript semantics are - this version will break if someone redefines `Math.pow`. Is that ok or does that break JavaScript semantics?
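A quick illustration of that hazard (names here are illustrative): the JS-level expm1 looks up Math.pow on every call, so a later redefinition silently changes its results, whereas a native implementation is immune to this.

```javascript
const patched = (x) => Math.pow(Math.E, x) - 1;
const realPow = Math.pow;
Math.pow = () => 42;         // hostile or accidental redefinition
const broken = patched(1);   // 42 - 1 = 41 instead of e - 1
Math.pow = realPow;          // restore
console.log(broken);         // 41
```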
That's the wrong answer and your algorithm is bad. The point of expm1 is to be accurate for small arguments using the Taylor series or Chebyshev polynomial representation.
I know what a Padé approximant is but I am not sure how it is relevant. expm1 is available as part of the math library in JavaScript, just as it is also available for C. You do not have to implement it yourself.
The exact implementation depends on your math library, but in glibc it appears to reduce the argument modulo log 2, and then uses an approximation found in part by the Remez algorithm for the exponential function on the range [0, 0.5 log2]. My assumption here is that the Remez algorithm will often be preferred over Padé, but I am not an expert on these things. The Padé approximant gives an approximation “near” a point but the Remez algorithm finds an approximation over an entire interval. In either case, the Remez algorithm is only used to construct one part of the function, and there is also some effort to propagate error from the argument reduction. The resulting algorithm claims <1 ULP error.
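For intuition only, here's a toy Taylor-series version (`toyExpm1` is not how any real libm does it; production libraries use argument reduction plus a Remez-fit polynomial, as described above):

```javascript
// For small |x|, summing x + x^2/2! + x^3/3! + ... directly
// sidesteps the cancellation in exp(x) - 1.
function toyExpm1(x) {
  let term = x, sum = x;
  for (let n = 2; n < 20; n++) {
    term *= x / n;   // term now holds x^n / n!
    sum += term;
  }
  return sum;
}

console.log(toyExpm1(1e-100));  // 1e-100, where exp(x) - 1 gives 0
```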
For matrix exponentiation you would need a library.
Interesting how at the end, after acquiring out-of-bounds write access, it was easier to leverage the WebAssembly infrastructure to execute code than to build a ROP chain.
Apparently WebAssembly heap memory storing generated code is not write protected at all. I guess whatever architecture they have for managing typed memory chunks doesn't make it sufficiently easy to manipulate protection bits dynamically, and the WebAssembly folks were content to leave compiled WebAssembly chunks as RWX.
It's unfortunate that this thread hasn't been upvoted. It really drives home the futility of expecting something as complex as V8 to ever be safe enough to sandbox arbitrary code. And it has little to do with the language of the implementation--the same engine re-written in Rust would have been just as susceptible to the two exploits.
I guess it's just too inconvenient to accept reality. I'm still surprised that Cloudflare and AWS have actually convinced themselves they can make such architectures secure[1]. Less shocked that everybody else is so credulous.
Though, I am shocked that Netflix is so credulous. A recent blog post described how they use AWS Lambda functions to manage their CA private key. But, again, because of how limited KMS is I guess it's too inconvenient to not trust Lambda. KMS really should support asymmetric key operations; it's ridiculous it's still not supported. I guess AWS's KMS "cloud HSM" solution just wouldn't scale (in terms of CPU) if people could do that.
[1] Absolute security is impossible, but as complex as this exploit was it's obviously still far too trivial. It's a totally unwarranted and unreasonable expectation that bad actors are incapable of reading (if not manipulating) co-hosted Lambda or Cloudflare Worker projects. Writing a non-JIT'd engine with a simplified memory architecture is not only feasible, it would be an obvious candidate for formal verification. But the demands for performance are just too strong and so, as always, security takes a back-seat to performance and speed-to-market.
The reason that WebAssembly JIT code memory is still RWX (for now) is actually really unfortunate. As you might know, V8's JIT code memory for JS is only writable when the application is quiesced (i.e. JS is not running) and the JIT is either finishing a function or the garbage collector is moving JITted code. It's read-execute otherwise. It's never both writable and executable at the same time. We generally refer to this as WX protection (i.e. writable/executable exclusive).
In the case of WebAssembly, it's asm.js that's the real culprit here. Internally in V8, asm.js code is translated to WebAssembly by a very quick custom-built parser that validates asm.js's type system while parsing and emits WebAssembly bytecode along the way. The WebAssembly bytecode is then fed to the WebAssembly engine for compilation.
Well...not so fast. In order to meet our performance goals for fast asm.js validation and startup, the WebAssembly engine does not do up-front compilation of Wasm code coming from asm.js. Instead, it does on-demand compilation of asm.js code, compiling the Wasm bytecode corresponding to each asm.js function upon first execution. We call this "lazy compilation".
We originally shipped lazy compilation cooperating with the WX mechanism used for JS code. That is, upon every (lazy) compilation of asm.js code, we'd flip the permission bits on the code space in order to write in the little bit of machine code generated for each function. Problem is, that permission flip is expensive--like really expensive. So expensive that we had to unship WX protection because it made asm.js code unusably slow.
We're working on fixing this, as we are keenly aware of the risk exposure here.
Author here - thanks for this! I was wondering why Wasm was RWX.
My 2c: I think that there will be risk of code injection as long as write_protect_code_memory in Heap is writable. Changing that flag will usually mess things up and crash writing to RX memory (CodeSpaceMemoryModificationScope won't switch to RW), but a well-crafted exploit might be able to get a fresh executable MemoryChunk (which will now be RWX). It's likely complex to exploit, but the incentive is that code injection is more reliable than ROP when targeting multiple builds (unless you build the chain dynamically, which is slow and often painful).
I'm not sure how asm.js is relevant here. The issue is that function-level lazy compilation makes compilation too hot to allow for an mprotect call, no? Wouldn't a function-level lazy Web Assembly implementation have the same problem?
It seems to me that the solution is the same for both wasm and asm.js: make the unit of compilation larger than the function, so as to amortize the cost of mprotect.
Also, I should have mentioned, but concurrent compilation of Wasm (for incremental tier-up) essentially puts the final nail in the WX exclusion coffin (though we deployed that long after the first reason, lazy compilation for asm.js). The only solution in the long run afaict is out-of-process compilation, which we will explore this year.
> In order to meet our performance goals for fast asm.js validation and startup, the WebAssembly engine does not do up-front compilation of Wasm code coming from asm.js.
It sounds like the fault is how you reached your performance goals rather than asm.js itself.
It's very obvious from context the reference is to the asm.js implementation in V8, I'm not sure how you could read it otherwise ("strongest plausible interpretation" and all that).
Although it would be nice for AWS to explicitly say that there's one customer per hosted Lambda environment, and that this is distinct from language-level or process-based sandboxing. From the article above you have to assume that, because v1 of Lambda put one customer per EC2 instance as an expediency, it remains the case.
Nonetheless, regarding the Netflix case, nobody should be hosting long-term CA keys in Lambda or any co-hosted virtualized environment framework. These days few reputable companies except maybe Intel even bother arguing that secret keys are safe from side-channel attacks in such environments. The evidence is just too overwhelming, even for the self-deluding "if it's too complex for me to understand then it must be impossible" crowd.
I believe AWS does share customer information within a device, but with a lot of sandboxing below that. You can watch this talk to learn more:
https://www.youtube.com/watch?v=QdzV04T_kec
I think the threat model is basically that you'd need a KVM kernel 0-day + the ability to exploit it, so a point of privilege such as outside of the firecracker sandbox.
I would love to know how v8 is properly secured in the instance of things like Cloudflare Workers. I'm curious to know if this goes through v8 isolates, and if something like lxd containment would be a viable alternative.