The WA stack machine was added late in the game. Before that it was an AST but they got pushback from browser backend people and pivoted to a stack machine.
AST was a good idea and parsing as validation was excellent. Having a general stack machine makes validation much more difficult.
This is a bad decision (stacks vs AST or register) made for the wrong reason (code size). That's a 1990s design assumption when memory was expensive and bandwidth was dear. Now memory is cheap; bandwidth is phenomenal; and latency is expensive.
I've implemented the stack-machine validation rules, and they're very similar to the postorder AST validation rules. In some areas, they're actually simpler, because the stack-machine rules accept roughly a superset of the AST rules, so there are fewer constraints to enforce.
The stack-machine rules were indeed added late, and it's reasonable to ask whether they might have been improved if there had been more time to iterate. It's also reasonable to ask whether a simple register-machine design with a compression layer on top would have been a better overall design.
However, code size is important for wasm. Smaller code size means less to download between a user clicking a link and viewing content. Networks have gotten faster on average, but bandwidth still matters in many contexts.
I am talking about the problem of trust that this binary format creates.
Here on Linux, we solve the problem of trust with distributions, maintainers, signed packages, signed repositories, and releases. That's why I will trust binary packages from my distribution but will not trust webasm binaries.
Instead of having everything go through central trusted authorities, as Linux distros do, wasm (as the Web does in general) relies on sandboxing untrusted content on the user side.
A binary encoding does not contribute significantly to obfuscation when it can be trivially undone. WebAssembly is an open standard, and browsers supporting wasm have builtin support for converting it to text and displaying it.
Compiled code can be much harder to read than human-written code, though this is mainly because of lowering and optimization, rather than the final encoding.
It looks like you have very limited experience with Linux distros. Nobody forces Linux users to use repositories. We use repositories because we trust them much more than random binary blobs from the Internet.
A binary encoding contributes significantly to obfuscation, because it cannot be formatted, refactored, commented, or modified (e.g. to add an assertion or print debugging information). Nobody programs in binary.
WebAsm creates a problem: the same problem as Java, Flash, Unity, PNaCl, and dozens of other platforms for executing binary blobs from untrusted sources. And the only solution is to add trust, e.g. by publishing heavyweight libraries (SDL, game engines, GUIs, databases, etc.) for review and patching by third-party maintainers. Otherwise we will end up in the same situation as with other libraries, e.g. jQuery, where sites keep using an old version of a common library with known security problems for ages, even though a fixed version is freely available.
> That's a 1990s design assumption when memory was expensive
Memory is still expensive:
* Spreading things out in memory more causes more cache misses, which lowers performance.
* Using more memory increases page faults, which lowers performance.
* I believe using more memory drains batteries faster on mobile devices, but I'm not sure exactly why. Maybe the effort spent shuttling stuff from virtual memory into real memory on page faults?
* On embedded devices, using more memory means you need to have more RAM, which increases the per-device manufacturing cost.
* Using more memory to represent code increases memory pressure, which leads to more GC cycles.
I agree. If you want things to be fast on a current Intel machine, keeping the code and data in cache means you can do ten times as much work before you have to do the next memory access.
There's also the issue of additional latency from tertiary caches, like the disk, when you are under memory pressure.
constructs must leave the stack at the same height with the same types
That's easy to 'document' and harder to do. It's what the JVM did, again, back in the 90s. Today the verifier chapter in the JVM spec is about 160 pages long.
We learned our lessons from the JVM, and the verification algorithm for WASM is vastly simpler. It's even been formalized and fits in about 3/4 of a page.
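The core of that algorithm is just tracking a stack of types as you walk the instructions. Here's a minimal sketch with a hypothetical opcode set (not the spec's actual algorithm, which also has to handle blocks, branches, and unreachable code):

```python
# Sketch of stack-machine type checking for straight-line code.
# Opcode signatures are hypothetical stand-ins, not the real wasm set.

SIGS = {
    # opcode: (types popped, types pushed)
    "i32.const": ([], ["i32"]),
    "i32.add":   (["i32", "i32"], ["i32"]),
    "f64.const": ([], ["f64"]),
    "f64.add":   (["f64", "f64"], ["f64"]),
}

def validate(code):
    """Return True if every instruction finds its operand types on the stack."""
    stack = []
    for op in code:
        pops, pushes = SIGS[op]
        for want in reversed(pops):
            # Underflow or a type mismatch means the code is invalid.
            if not stack or stack.pop() != want:
                return False
        stack.extend(pushes)
    return True

assert validate(["i32.const", "i32.const", "i32.add"])      # well-typed
assert not validate(["i32.const", "f64.const", "i32.add"])  # type mismatch
assert not validate(["i32.add"])                            # stack underflow
```

The real algorithm layers block types and control flow on top of this, but the per-instruction check is exactly this pop/push discipline.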
Sadly it is more complicated - but the document in the link was written before a bunch of problems were found (with things like unreachable code, which is indeed trickier in stack machines than ASTs).
(Those problems and their solutions haven't been documented yet AFAIK.)
I don't think it makes a big difference. It's not that difficult to reconstruct an AST from a stack machine that has the same stack types on every code path. It's mostly just a post-order serialization of the AST.
push x
push y
add
Evaluate this symbolically and you get (add x y) naturally.
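That symbolic evaluation can be sketched directly: run the postorder code with a stack of expressions instead of values, so operands push themselves and operators pop and regroup (instruction encoding here is hypothetical):

```python
def to_ast(code):
    """Rebuild an expression tree from postorder stack code by
    evaluating it symbolically."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])           # a leaf: variable or constant
        elif instr[0] == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(("add", lhs, rhs))  # an interior node
    return stack.pop()

# push x; push y; add  ==>  (add x y)
assert to_ast([("push", "x"), ("push", "y"), ("add",)]) == ("add", "x", "y")
```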
Writing a simple interpreter, I prefer a register or stack machine over an AST walker. It'll be faster, for one. And there's a chance of interpreting it directly with a loop and switch, without a deserialization step.
Many simple analyses of an AST have an equivalent stack machine form. If the analysis can be done as a post-order traversal (like type-checking, constant folding etc.) you're good to go.
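For illustration, here's that loop-and-switch shape over postorder stack code; opcodes are hypothetical, and Python's if/elif stands in for the switch:

```python
def run(code):
    """Interpret postorder stack code directly: one loop, one dispatch
    per opcode, no tree to deserialize or walk."""
    stack = []
    for op, *args in code:
        if op == "const":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4, written in postorder
assert run([("const", 2), ("const", 3), ("add",), ("const", 4), ("mul",)]) == 20
```

The same loop structure works for post-order analyses like constant folding: swap the value stack for a stack of partial results.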
WA doesn't get interpreted. It gets validated and passed over to the JIT (TurboFan). It comes over the wire so that validation has to be thorough (read that as expensive).
Hmm. I didn't say it was interpreted in any given browser implementation. I was judging the serialization format for what it is, vs reasonable alternatives. Analysis, transformation and interpretation are what you want out of this kind of encoding.
I also don't agree re expensive validation; I think you're wrong. Interfacing with APIs is much more problematic than pure computation, which is not hard to validate. Interface safety is largely the same problem as with plain JS, as I see it. I don't see stack machine vs register machine making much of a difference at all.
If WASM runs on a stack machine, I wonder how hard it would be to map the instructions to JVM bytecodes? From the example in the document they look similar...
It would be interesting to be able to take code compiled with Emscripten for example and run it as part of JVM applications, similar to what NestedVM can do.
What are the details that you mean? I just skimmed the spec and I don't see anything significant about exceptions or GC... it mostly seems focused on the WASM memory layout and the instruction set. The instruction set seems at least superficially similar to the Java instruction set, and operates on a "linear memory" which could easily be implemented in Java as a large int[] array for 32-bit or long[] array for 64-bit (although that scheme may need to be more complicated if it is to be shared across threads). Any instruction that doesn't have a direct analog in the JVM could be implemented as a static Java method. Also, WASM seems to use a static set of labels as branch targets rather than reading its instructions from program memory (is that called a Harvard architecture?), which suggests that you could translate WASM functions into Java classes/methods to be loaded into the JVM.
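That linear-memory scheme can be sketched generically. The following is in Python, with a flat word list standing in for Java's int[]; it assumes 4-byte-aligned 32-bit accesses only, and glosses over signedness, byte addressing, and endianness:

```python
class LinearMemory:
    """Toy linear memory backed by an array of 32-bit words, in the
    spirit of implementing wasm memory as a Java int[]. Aligned
    accesses only; unaligned and sub-word loads are left out."""

    def __init__(self, n_words):
        self.words = [0] * n_words  # one int-sized cell per word

    def load_i32(self, addr):
        assert addr % 4 == 0, "sketch only handles aligned access"
        return self.words[addr // 4]

    def store_i32(self, addr, value):
        assert addr % 4 == 0, "sketch only handles aligned access"
        self.words[addr // 4] = value & 0xFFFFFFFF  # truncate to 32 bits

mem = LinearMemory(1024)
mem.store_i32(8, 42)
assert mem.load_i32(8) == 42
```

A real port would also need bounds trapping, memory growth, and byte-granular loads/stores, but the flat-array core is as simple as the comment suggests.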
In any case I don't have time to implement all of that, but it seems like it could be an interesting way to run non-Java libraries on the JVM.
The horrifying thing about WASM is that you just know like 5 years from now someone will be posting some link here about how they ported the JVM to WASM and are using it to run old Java applets…
There is already PyPy.js using asm.js, doppiojvm which is written in JavaScript, and some effort on getting OpenJDK and/or JamVM to work on asm.js. I don't think you'll have to wait 5 years.
I imagine people will use it for desktop apps, and soon everything will just be running on WASM! It sounds like it has the potential to unify web development with development on all platforms, if it is efficient enough.
EDIT: I didn't think I would be excited about WebAssembly but I'm now really curious to see what happens with it.
Yes, the s-expression format has become awkward. There are differences between the s-expression format in the test suite, what browsers will show, and what tools support.
You can think of the old s-expression format as a language that compiles into wasm, a language that is an AST and that happens to have the same types and operations etc. as wasm.
I just wanted to say I'm really excited about this project. I hope it comes together and makes software as universal as the web we have today (if I understand it right).
Anyways, how can I contribute?
I believe that switching to a stack machine is short-sighted and a big mistake:
The AST format could open up new possibilities for software, some of which are observable in Lispy languages like Scheme (I won't list them here). Instead, we're looking at locking the software world back into this 1960's model for another 50 years out of a misguided concern for optimization over power.
It's like forgoing the arch because it's more work to craft, and instead coming up with a REALLY efficient way to fit square blocks together. Congratulations, we can build better pyramids, but we will never grasp the concept of a cathedral.
To really grasp my point, I BEG you all to watch the following two videos in full and think hard about what Alan Kay & Douglass Crockford have to say about new ideas, building complex structures, and leaving something better for the next generation:
As Alan Kay states, what is simpler: something that's easier to process, but for which the software written on top of it is massive; or one that takes a bit more overhead, but allows for powerful new ways to model software and reduce complexity?
I believe that an AST model is a major start in inventing "the arch" that's been missing in software, and with something that will proliferate the whole web ... how short-sighted it would be to give that up in favor of "optimizing" the old thing.
Imagine if, instead of JavaScript, the language of the web had been Java. Lambdas would not be mainstream; new ways of doing OOP would not be thought of; and all the amazing libraries that have been written because of the ad-hoc object modeling JavaScript offers would not exist. JavaScript is probably one of the messiest and most inefficient languages ever written, yet one of the most powerful ever given to us. C'mon, let's take it a step further by making it binary, homoiconic, and self-modifying.
...and if you're brave enough, think about how Christopher Alexander's philosophy of "unfolding wholeness" applies so much more to an AST than to the stack-machines of the 1960's:
I'm confused about all of this - is WebAssembly related to the Google Native Client stuff they did and are now deprecating? Is there a write-up of how it works?
First there was NaCl which was a subset of x86 code leveraging x86 features to sandbox the execution. This x86 subset was produced by a special toolchain and could be verified before running.
Then PNaCl came along with a platform-independent bitcode format based on LLVM IR, which was translated to the host's native code in the browser.
Then WASM (also platform-independent) came along, striving to be a multi-vendor solution. Unlike the other two, WASM directly targets the JavaScript engine. It started out as a serialization format for a JavaScript AST.
Adding to this, PNaCl relies heavily on Chrome's sandbox, so any third-party implementation of PNaCl would involve re-implementing large swaths of Chrome. It's not as portable as you'd think. WASM was designed from the ground up to not depend on any browser's implementation.
https://github.com/WebAssembly/design/issues/755