
The WA stack machine was added late in the game. Before that the design was an AST, but they got pushback from browser backend people and pivoted to a stack machine.

https://github.com/WebAssembly/design/issues/755

AST was a good idea and parsing as validation was excellent. Having a general stack machine makes validation much more difficult.

This is a bad decision (stacks vs AST or register) made for the wrong reason (code size). That's a 1990s design assumption when memory was expensive and bandwidth was dear. Now memory is cheap; bandwidth is phenomenal; and latency is expensive.



I've implemented the stack-machine validation rules, and they're very similar to the postorder AST validation rules. In some areas, they're actually simpler, because the stack-machine rules accept roughly a superset of the AST rules, so there are fewer constraints to enforce.
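
As a rough sketch of what those rules look like in practice, here is a minimal operand-stack type check in Python. The instruction set and signatures are invented for illustration; real wasm validation also has to handle immediates, blocks, branches, and unreachable code:

    # Minimal sketch: validate a straight-line instruction sequence by
    # tracking operand types on a virtual stack. The opcodes and
    # signatures below are invented for illustration only.
    SIGS = {
        "i32.const": ([], ["i32"]),              # pushes one i32
        "i32.add":   (["i32", "i32"], ["i32"]),  # pops two i32, pushes one
        "f64.add":   (["f64", "f64"], ["f64"]),
    }

    def validate(body, result_types):
        stack = []
        for op in body:
            params, results = SIGS[op]
            for expected in reversed(params):
                if not stack or stack.pop() != expected:
                    raise TypeError(f"type mismatch at {op}")
            stack.extend(results)
        if stack != result_types:
            raise TypeError(f"body leaves {stack}, expected {result_types}")

    validate(["i32.const", "i32.const", "i32.add"], ["i32"])  # ok
    # validate(["i32.const", "f64.add"], ["f64"])             # raises TypeError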

The stack-machine rules were indeed added late, and it's reasonable to ask whether they might have been improved if there had been more time to iterate. It's also reasonable to ask whether a simple register-machine design with a compression layer on top would have been a better overall design.

However, code size is important for wasm. Smaller code size means less to download between a user clicking a link and viewing content. Networks have gotten faster on average, but bandwidth still matters in many contexts.


It's also great for obfuscation.

PS.

I am talking about the problem of trust that this binary format creates. On Linux, we solve the problem of trust using distributions, maintainers, signed packages, signed repositories, and releases. That's why I will trust binary packages from my distribution but will not trust webasm binaries.


Instead of having everything go through central trusted authorities, as Linux distros do, wasm (as the Web does in general) relies on sandboxing untrusted content on the user side.

A binary encoding does not contribute significantly to obfuscation when it can be trivially undone. WebAssembly is an open standard, and browsers supporting wasm have built-in support for converting it to text and displaying it.

Compiled code can be much harder to read than human-written code, though this is mainly because of lowering and optimization, rather than the final encoding.


It looks like you have very limited experience with Linux distros. Nobody forces Linux users to use repositories. We use repositories because we trust them much more than random binary blobs from the Internet. Binary encoding contributes significantly to obfuscation, because it cannot be formatted, refactored, commented, or modified (e.g. to add an assertion or print debugging information). Nobody programs in binary.

WebAsm creates a problem: the same problem as Java, Flash, Unity, PNaCl, and dozens of other platforms for executing binary blobs from untrusted sources. The only solution is to add trust, e.g. by publishing heavyweight libraries (SDL, game engines, GUI toolkits, databases, etc.) for review and patching by third-party maintainers. Otherwise we will have the same situation as with other libraries, e.g. jQuery, where sites keep using an old version of a common library with known security problems for ages, even though a fixed version is freely available.


> That's a 1990s design assumption when memory was expensive

Memory is still expensive:

* Spreading things out in memory more causes more cache misses, which lowers performance.

* Using more memory increases page faults, which lowers performance.

* I believe using more memory drains batteries faster on mobile devices, but I'm not sure exactly why. Maybe the effort spent shuttling stuff from virtual memory into real memory on page faults?

* On embedded devices, using more memory means you need to have more RAM, which increases the per-device manufacturing cost.

* Using more memory to represent code increases memory pressure, which leads to more GC cycles.


I agree. If you want things to be fast on a current Intel machine, keeping the code and data in cache means you can do ten times as much work before you have to do the next memory access.

There's also the issue of additional latency from tertiary caches, like the disk, when you are under memory pressure.


Mobile devices probably turn off DRAM refresh for memory that isn't allocated. RAM at rest still takes work to refresh.


> AST was a good idea and parsing as validation was excellent. Having a general stack machine makes validation much more difficult.

Why? This document suggests that it's not that different.

"The main new changes to verification are:

- All branches to the end of a block must have the same arity and same types, including the implicit fall-through to end.

- The true block of if-end constructs must leave the stack at the same height as when the if was entered.

- The true and false blocks of if-else-end constructs must leave the stack at the same height with the same types."
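
As a hedged sketch, those block rules amount to comparing the type stack at each exit point. Simplified Python, assuming each arm or branch has already been reduced to the list of types it leaves on the stack, and ignoring branch arity details and unreachable code:

    # Sketch of the quoted rules. Each arm/branch is represented by the
    # list of types it leaves on the stack; real validation also tracks
    # labels, branch arity, and unreachable code.
    def check_block_exits(exit_stacks):
        # All branches to the end of a block, including the fall-through,
        # must leave the same types (and therefore the same height).
        first = exit_stacks[0]
        for s in exit_stacks[1:]:
            if s != first:
                raise TypeError(f"block exits disagree: {s} vs {first}")
        return first

    def check_if_end(entry_stack, true_exit):
        # No else arm: the implicit false path is a no-op, so the true
        # arm must leave the stack exactly as it was on entry.
        return check_block_exits([true_exit, list(entry_stack)])

    def check_if_else(true_exit, false_exit):
        # Both arms must leave the stack at the same height, same types.
        return check_block_exits([true_exit, false_exit])

    check_if_else(["i32", "f64"], ["i32", "f64"])  # ok
    # check_if_end(["i32"], ["i32", "f64"])        # raises TypeError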


> constructs must leave the stack at the same height with the same types

That's easy to 'document' and harder to do. It's what the JVM did, again, back in the 90s. The verifier chapter in the JVM spec is now about 160 pages long.


We learned our lessons from the JVM, and the verification algorithm for WASM is vastly simpler. It's even been formalized and fits in about 3/4 of a page.


Sadly it is more complicated - but the document in the link was written before a bunch of problems were found (with things like unreachable code, which is indeed trickier in stack machines than ASTs).

(Those problems and their solutions haven't been documented yet AFAIK.)


> Why?

Because you can also easily transform an AST back into code to reverse engineer?

Reverse engineering code for a stack machine is quite a bit more annoying.


I don't think it makes a big difference. It's not that difficult to reconstruct an AST from a stack machine that has the same stack types on every code path. It's mostly just a post-order serialization of the AST.

    push x
    push y
    add
Evaluate this symbolically and you get (add x y) naturally.
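
For example, here's a hedged sketch of that symbolic evaluation in Python, using invented opcode names rather than real wasm mnemonics:

    # Sketch: run the stack code with symbolic values to recover the AST.
    def to_ast(code):
        stack = []
        for op, *args in code:
            if op == "push":
                stack.append(args[0])
            elif op == "add":
                rhs, lhs = stack.pop(), stack.pop()
                stack.append(("add", lhs, rhs))
        return stack.pop()

    print(to_ast([("push", "x"), ("push", "y"), ("add",)]))
    # -> ('add', 'x', 'y')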

When writing a simple interpreter, I prefer a register or stack machine over an AST walker. It'll be faster, for one. And there's a chance of interpreting the code directly with a loop and switch, without a deserialization step.
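
A minimal sketch of such a loop-and-switch interpreter, again in Python with the same invented opcodes (a real engine would dispatch on the decoded binary directly):

    # Sketch: interpret the stack code with one loop and a dispatch on the
    # opcode, without building a tree first.
    def run(code, env):
        stack, pc = [], 0
        while pc < len(code):
            op, *args = code[pc]
            if op == "push":
                stack.append(env[args[0]])
            elif op == "add":
                rhs, lhs = stack.pop(), stack.pop()
                stack.append(lhs + rhs)
            pc += 1
        return stack.pop()

    print(run([("push", "x"), ("push", "y"), ("add",)], {"x": 2, "y": 3}))
    # -> 5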

Many simple analyses of an AST have an equivalent stack-machine form. If the analysis can be done as a post-order traversal (like type checking, constant folding, etc.), you're good to go.
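
For instance, a hedged sketch of constant folding done directly on the stack code (with a similar invented opcode set), which is essentially the same post-order pass you'd write over an AST:

    # Sketch: fold constant adds in stack code by tracking, for each stack
    # slot, whether its value is a known constant. Invented opcodes; the
    # `del out[-2:]` step assumes the two constants being folded are the
    # two most recent emitted instructions, which holds for this tiny set.
    def fold_constants(code):
        out, stack = [], []  # stack holds (is_const, value_or_None)
        for op, *args in code:
            if op == "const":
                stack.append((True, args[0]))
                out.append((op, *args))
            elif op == "add":
                rhs, lhs = stack.pop(), stack.pop()
                if lhs[0] and rhs[0]:
                    # Replace the two consts and the add with one const.
                    del out[-2:]
                    folded = lhs[1] + rhs[1]
                    out.append(("const", folded))
                    stack.append((True, folded))
                else:
                    out.append((op, *args))
                    stack.append((False, None))
            else:  # e.g. "push" of a variable
                stack.append((False, None))
                out.append((op, *args))
        return out

    print(fold_constants([("const", 1), ("const", 2), ("add",),
                          ("push", "x"), ("add",)]))
    # -> [('const', 3), ('push', 'x'), ('add',)]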


WA doesn't get interpreted. It gets validated and handed over to the JIT (TurboFan). Because it comes over the wire, validation has to be thorough (read: expensive).


Hmm. I didn't say it was interpreted in any given browser implementation. I was judging the serialization format for what it is, vs reasonable alternatives. Analysis, transformation and interpretation are what you want out of this kind of encoding.

I also don't agree that validation is expensive; I think you're wrong there. Interfacing with APIs is much more problematic than pure computation, which is not hard to validate. Interface safety is largely the same problem as with plain JS, as I see it. I don't see stack machine vs register machine making much of a difference at all.


I always liked the simplicity of register-based machines.



