The WA stack machine was added late in the game. Before that it was an AST but they got pushback from browser backend people and pivoted to a stack machine.
AST was a good idea and parsing as validation was excellent. Having a general stack machine makes validation much more difficult.
This is a bad decision (stacks vs AST or register) made for the wrong reason (code size). That's a 1990s design assumption when memory was expensive and bandwidth was dear. Now memory is cheap; bandwidth is phenomenal; and latency is expensive.
I've implemented the stack-machine validation rules, and they're very similar to the postorder AST validation rules. In some areas, they're actually simpler, because the stack-machine rules accept roughly a superset of the AST rules, so there are fewer constraints to enforce.
The stack-machine rules were indeed added late, and it's reasonable to ask whether they might have been improved if there had been more time to iterate. It's also reasonable to ask whether a simple register-machine design with a compression layer on top would have been a better overall design.
However, code size is important for wasm. Smaller code size means less to download between a user clicking a link and viewing content. Networks have gotten faster on average, but bandwidth still matters in many contexts.
I am talking about the problem of trust that this binary format creates.
Here on Linux, we solve the problem of trust with distributions, maintainers, signed packages, signed repositories, and releases. That's why I will trust binary packages from my distribution but will not trust webasm binaries.
Instead of having everything go through central trusted authorities, as Linux distros do, wasm (as the Web does in general) relies on sandboxing untrusted content on the user side.
A binary encoding does not contribute significantly to obfuscation when it can be trivially undone. WebAssembly is an open standard, and browsers supporting wasm have builtin support for converting it to text and displaying it.
Compiled code can be much harder to read than human-written code, though this is mainly because of lowering and optimization, rather than the final encoding.
It looks like you have very limited experience with Linux distros. Nobody forces Linux users to use repositories. We use repositories because we trust them much more than random binary blobs from the Internet.
A binary encoding contributes significantly to obfuscation, because it cannot be formatted, refactored, commented, or modified (e.g. to add an assertion or print debugging information). Nobody programs in binary.
WebAsm creates a problem: the same problem as Java, Flash, Unity, PNaCl, and dozens of other platforms for executing binary blobs from untrusted sources. And the only solution is to add trust, e.g. by publishing heavyweight libraries (SDL, game engines, GUIs, databases, etc.) for review and patching by third-party maintainers. Otherwise we will end up in the same situation as with other libraries, e.g. jQuery, where sites keep using an old version of a common library with known security problems for ages, even though a fixed version is freely available.
> That's a 1990s design assumption when memory was expensive
Memory is still expensive:
* Spreading things out in memory more causes more cache misses, which lowers performance.
* Using more memory increases page faults, which lowers performance.
* I believe using more memory drains batteries faster on mobile devices, but I'm not sure exactly why. Maybe the effort spent shuttling stuff from virtual memory into real memory on page faults?
* On embedded devices, using more memory means you need to have more RAM, which increases the per-device manufacturing cost.
* Using more memory to represent code increases memory pressure, which leads to more GC cycles.
I agree. If you want things to be fast on a current Intel machine, keeping the code and data in cache means you can do ten times as much work before you have to do the next memory access.
There's also the issue of additional latency from tertiary caches, like the disk, when you are under memory pressure.
constructs must leave the stack at the same height with the same types
That's easy to 'document' and harder to do. It's what the JVM did, again, back in the 90s. Today the verifier chapter in the JVM spec is about 160 pages long.
We learned our lessons from the JVM, and the verification algorithm for WASM is vastly simpler. It's even been formalized and fits in about 3/4 of a page.
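The core of that algorithm is just tracking a stack of types as you walk the instructions. Here's a minimal sketch with a hypothetical opcode set (not the spec's actual algorithm, which also has to handle blocks, branches, and unreachable code):

```python
# Sketch of stack-machine type checking for straight-line code.
# Opcode signatures are hypothetical stand-ins, not the real wasm set.

SIGS = {
    # opcode: (types popped, types pushed)
    "i32.const": ([], ["i32"]),
    "i32.add":   (["i32", "i32"], ["i32"]),
    "f64.const": ([], ["f64"]),
    "f64.add":   (["f64", "f64"], ["f64"]),
}

def validate(code):
    """Return True if every instruction finds its operand types on the stack."""
    stack = []
    for op in code:
        pops, pushes = SIGS[op]
        for want in reversed(pops):
            # Underflow or a type mismatch means the code is invalid.
            if not stack or stack.pop() != want:
                return False
        stack.extend(pushes)
    return True

assert validate(["i32.const", "i32.const", "i32.add"])      # well-typed
assert not validate(["i32.const", "f64.const", "i32.add"])  # type mismatch
assert not validate(["i32.add"])                            # stack underflow
```

The real algorithm layers block types and control flow on top of this, but the per-instruction check is exactly this pop/push discipline.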
Sadly it is more complicated - but the document in the link was written before a bunch of problems were found (with things like unreachable code, which is indeed trickier in stack machines than ASTs).
(Those problems and their solutions haven't been documented yet AFAIK.)
I don't think it makes a big difference. It's not that difficult to reconstruct an AST from a stack machine that has the same stack types on every code path. It's mostly just a post-order serialization of the AST.
push x
push y
add
Evaluate this symbolically and you get (add x y) naturally.
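That symbolic evaluation can be sketched directly: run the postorder code with a stack of expressions instead of values, so operands push themselves and operators pop and regroup (instruction encoding here is hypothetical):

```python
def to_ast(code):
    """Rebuild an expression tree from postorder stack code by
    evaluating it symbolically."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])           # a leaf: variable or constant
        elif instr[0] == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(("add", lhs, rhs))  # an interior node
    return stack.pop()

# push x; push y; add  ==>  (add x y)
assert to_ast([("push", "x"), ("push", "y"), ("add",)]) == ("add", "x", "y")
```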
Writing a simple interpreter, I prefer a register or stack machine over an AST walker. It'll be faster, for one. And there's a chance of interpreting it directly with a loop and switch, without a deserialization step.
Many simple analyses of an AST have an equivalent stack machine form. If the analysis can be done as a post-order traversal (like type-checking, constant folding etc.) you're good to go.
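For illustration, here's that loop-and-switch shape over postorder stack code; opcodes are hypothetical, and Python's if/elif stands in for the switch:

```python
def run(code):
    """Interpret postorder stack code directly: one loop, one dispatch
    per opcode, no tree to deserialize or walk."""
    stack = []
    for op, *args in code:
        if op == "const":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4, written in postorder
assert run([("const", 2), ("const", 3), ("add",), ("const", 4), ("mul",)]) == 20
```

The same loop structure works for post-order analyses like constant folding: swap the value stack for a stack of partial results.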
WA doesn't get interpreted. It gets validated and passed over to the JIT (TurboFan). It comes over the wire so that validation has to be thorough (read that as expensive).
Hmm. I didn't say it was interpreted in any given browser implementation. I was judging the serialization format for what it is, vs reasonable alternatives. Analysis, transformation and interpretation are what you want out of this kind of encoding.
I also don't agree re expensive validation; I think you're wrong. Interfacing with APIs is much more problematic than pure computation, which is not hard to validate. Interface safety is largely the same problem as with plain JS, as I see it. I don't see stack machine vs register machine making much of a difference at all.
If WASM runs on a stack machine, I wonder how hard it would be to map the instructions to JVM bytecodes? From the example in the document they look similar...
It would be interesting to be able to take code compiled with Emscripten for example and run it as part of JVM applications, similar to what NestedVM can do.
What are the details that you mean? I just skimmed the spec and I don't see anything significant about exceptions or GC... it mostly seems focused on the WASM memory layout and the instruction set. The instruction set seems at least superficially similar to the Java instruction set, and operates on a "linear memory" which could easily be implemented in Java as a large int[] array for 32-bit or long[] array for 64-bit (although that scheme may need to be more complicated if it is to be shared across threads). Any instruction that doesn't have a direct analog in the JVM could be implemented as a static Java method. Also, WASM seems to use a static set of labels as branch targets rather than reading its instructions from program memory (is that called a Harvard architecture?), which suggests that you could translate WASM functions into Java classes/methods to be loaded into the JVM.
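That linear-memory scheme can be sketched generically. The following is in Python, with a flat word list standing in for Java's int[]; it assumes 4-byte-aligned 32-bit accesses only, and glosses over signedness, byte addressing, and endianness:

```python
class LinearMemory:
    """Toy linear memory backed by an array of 32-bit words, in the
    spirit of implementing wasm memory as a Java int[]. Aligned
    accesses only; unaligned and sub-word loads are left out."""

    def __init__(self, n_words):
        self.words = [0] * n_words  # one int-sized cell per word

    def load_i32(self, addr):
        assert addr % 4 == 0, "sketch only handles aligned access"
        return self.words[addr // 4]

    def store_i32(self, addr, value):
        assert addr % 4 == 0, "sketch only handles aligned access"
        self.words[addr // 4] = value & 0xFFFFFFFF  # truncate to 32 bits

mem = LinearMemory(1024)
mem.store_i32(8, 42)
assert mem.load_i32(8) == 42
```

A real port would also need bounds trapping, memory growth, and byte-granular loads/stores, but the flat-array core is as simple as the comment suggests.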
In any case I don't have time to implement all of that, but it seems like it could be an interesting way to run non-Java libraries on the JVM.
The horrifying thing about WASM is that you just know like 5 years from now someone will be posting some link here about how they ported the JVM to WASM and are using it to run old Java applets…
There is already PyPy.js using asm.js, doppiojvm which is written in JavaScript, and some effort on getting OpenJDK and/or JamVM to work on asm.js. I don't think you'll have to wait 5 years.
I imagine people will use it for desktop apps, and soon everything will just be running on WASM! It sounds like it has the potential to unify web development with development on all platforms, if it is efficient enough.
EDIT: I didn't think I would be excited about WebAssembly but I'm now really curious to see what happens with it.
Yes, the s-expression format has become awkward. There are differences between the s-expression format in the test suite, what browsers will show, and what tools support.
You can think of the old s-expression format as a language that compiles into wasm, a language that is an AST and that happens to have the same types and operations etc. as wasm.
I just wanted to say I'm really excited about this project. I hope it comes together and makes software as universal as the web we have today (if I understand it right).
Anyways, how can I contribute?
I believe that switching to a stack machine is short-sighted and a big mistake:
The AST format could open up new possibilities for software, some of which are observable in Lispy languages like Scheme (I won't list them here). Instead, we're looking at locking the software world back into this 1960's model for another 50 years out of a misguided concern for optimization over power.
It's like forgoing the arch because it's more work to craft, and instead coming up with a REALLY efficient way to fit square blocks together. Congratulations, we can build better pyramids, but we will never grasp the concept of a cathedral.
To really grasp my point, I BEG you all to watch the following two videos in full and think hard about what Alan Kay & Douglass Crockford have to say about new ideas, building complex structures, and leaving something better for the next generation:
As Alan Kay states, what is simpler: something that's easier to process, but for which the software written on top of it is massive; or one that takes a bit more overhead, but allows for powerful new ways to model software and reduce complexity?
I believe that an AST model is a major start in inventing "the arch" that's been missing in software, and with something that will proliferate the whole web ... how short-sighted it would be to give that up in favor of "optimizing" the old thing.
Imagine if, instead of JavaScript, the language of the web had been Java. Lambdas would not be mainstream; new ways of doing OOP would not be thought of; and all the amazing libraries that have been written because of the ad-hoc object modeling JavaScript offers would not exist. JavaScript is probably one of the messiest and most inefficient languages ever written, yet one of the most powerful ever given to us. C'mon, let's take it a step further by making it binary, homoiconic, and self-modifying.
...and if you're brave enough, think about how Christopher Alexander's philosophy of "unfolding wholeness" applies so much more to an AST than to the stack-machines of the 1960's:
I'm confused about all of this - is WebAssembly related to the Google Native Client stuff they did and are now deprecating? Is there a write-up of how it works?
First there was NaCl which was a subset of x86 code leveraging x86 features to sandbox the execution. This x86 subset was produced by a special toolchain and could be verified before running.
Then PNaCl came along with a platform-independent bitcode format based on LLVM IR, which was translated to the host's native code in the browser.
Then WASM (also platform-independent) came along, striving to be a multi-vendor solution. Unlike the other two, WASM directly targets the JavaScript engine. It started out as a serialization format for a JavaScript AST.
Adding to this, PNaCl relies heavily on Chrome's sandbox, so any third-party implementation of PNaCl would involve re-implementing large swaths of Chrome. It's not as portable as you'd think. WASM was designed from the ground up to not depend on any browser's implementation.
https://github.com/WebAssembly/design/issues/755