A VM with standardized input and output is exactly what I meant. The underlying implementations of these VMs could still differ, but they would all implement the same bytecode standard (much as HotSpot and other JVMs all implement the JVM bytecode standard; Dalvik is a looser analogy, since it runs its own DEX bytecode converted from JVM class files).
But JavaScript VMs already have standardized input and output. The input is not a bytecode, but it is standardized (ECMAScript source text), and so is the output. So we already have what you want, except that the input format is different from what you envisioned. It seems to me that's the whole idea behind asm.js: they're defining a subset of the language with precise low-level semantics that can be implemented very efficiently, so that we get most of the benefit of a bytecode without throwing out all that we have now.
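To make the asm.js point concrete, here is a minimal sketch of what such a module can look like. The names `AddModule` and `add` are made up for illustration, and whether a given engine actually validates and precompiles the module is an assumption; any engine can still run it as ordinary JavaScript.

```javascript
// Hypothetical asm.js-style module. It is plain JavaScript, so it runs
// everywhere; an engine that recognizes "use asm" may additionally
// validate it and compile it ahead of time.
function AddModule(stdlib, foreign, heap) {
  "use asm";

  // The `| 0` coercions act as type annotations: parameters and the
  // return value are treated as 32-bit integers.
  function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  return { add: add };
}

// Instantiate with the global object, an empty foreign-function table,
// and a 64 KiB heap (unused by this tiny example).
const mod = AddModule(globalThis, {}, new ArrayBuffer(0x10000));

console.log(mod.add(2, 3));          // 5
console.log(mod.add(0x7fffffff, 1)); // -2147483648 (wraps like an int32)
```

The second call shows the "precise low-level semantics" part: the addition wraps around exactly as a 32-bit integer add would, which is what lets an engine map it to a machine instruction.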