How so? Interested in your elaboration. I would have surmised it'd be very smoot...

justinpombrio · on Oct 23, 2017

> free-form records, dynamically-sized arrays, no types

AFAIK, free-form records and lack of types do not an easier compilation target make. It's pretty easy to compile an untyped language to a typed language.

> How so?

- There are a ton of implicit casts that are almost certainly not what you want. So, for example, if you're going to use addition in the compiled code, it's a 10-step process [1, section 12.8.3.1], and you should be careful not to trigger the 9 steps that do things other than addition.

- There is, last I heard, no way to determine the size of the stack.

- No integers.

- Say you have a demo page for some language, and someone using the page writes an infinite loop. Don't want the page to crash? Welcome to advanced compilation techniques like CPS transformations and trampolining.
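Here is a minimal sketch of the trampolining half of that. Compiled code returns a thunk for each step instead of calling onward, and a driver loop runs the thunks and can stop after a step budget. The CPS transformation that would produce such code is not shown. `runWithBudget`, `countdown`, and `spin` are hypothetical names, and a real demo page would more likely yield to the event loop (e.g. with `setTimeout`) than give up outright.

```javascript
// A step is either a thunk (a zero-argument function producing the next step)
// or a finished result of the form { value }.
function runWithBudget(step, maxSteps) {
  let steps = 0;
  while (typeof step === "function") {
    if (steps === maxSteps) {
      // Out of budget: report it instead of hanging the page.
      return { finished: false, steps };
    }
    step = step(); // one bounce; the call stack does not grow between bounces
    steps += 1;
  }
  return { finished: true, value: step.value, steps };
}

// Hypothetical compiler output for `while (n > 0) n = n - 1; return n;`
function countdown(n) {
  return n > 0 ? () => countdown(n - 1) : { value: n };
}

// Hypothetical compiler output for `while (true) {}`
function spin() {
  return () => spin();
}
```

With this shape, `runWithBudget(countdown(1000000), 10000000)` finishes without a stack overflow, while `runWithBudget(spin(), 1000)` returns `{ finished: false, steps: 1000 }` instead of freezing the tab.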

Overall, JS has a ~850-page spec, and any part of the language you target for which you don't fully understand the spec is a potential bug. Instead, you want your target language to be dumb, tiny, and explicit.
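The implicit-cast point is easy to make concrete. The same `+` takes very different paths through the spec depending on its operands' types. The `numAdd` guard below is a hypothetical helper a compiler could emit when the source language's `+` means numeric addition only. It is not part of JS or the spec.

```javascript
// Each of these takes a different path through the addition algorithm.
const examples = {
  numberPlusNumber: 1 + 2, // 3: plain numeric addition
  numberPlusString: 1 + "2", // "12": one operand is a string, so concatenate
  boolPlusNumber: true + 1, // 2: true is converted with ToNumber
  arrayPlusArray: [] + [], // "": both arrays become "" via toString
  arrayPlusObject: [] + {}, // "[object Object]"
  nullPlusNumber: null + 1, // 1: null becomes 0
  undefinedPlusNumber: undefined + 1, // NaN: undefined becomes NaN
};

// Hypothetical emitted helper: refuse anything that is not already a number
// instead of silently taking one of the non-addition paths.
function numAdd(a, b) {
  if (typeof a !== "number" || typeof b !== "number") {
    throw new TypeError("numAdd expects two numbers");
  }
  return a + b;
}
```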

[1] http://www.ecma-international.org/publications/files/ECMA-ST...

Disclaimer: I have done research on JS, but I do not study or develop compilers.

DiThi · on Oct 23, 2017

> There are a ton of implicit casts

Very easy to avoid if you're compiling a statically typed language to JS.

> No integers

Not in the JS specification (outside typed arrays), but every JS JIT works in a way that lets you effectively declare variables as 32-bit signed integers, and that idiom made its way into the asm.js specification. They are declared like this:

    var a = value|0;

Here the |0 is a no-op, so it isn't actually performed; it's just a type hint.

If what worries you is precision and not performance, doubles represent 53-bit integers (not counting the sign bit) with full precision.
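Here is a short sketch of both points, assuming an ES2015+ engine (`Math.imul`, `Number.MAX_SAFE_INTEGER`, and `**` are standard there). `|0` gives wrapping 32-bit addition. Multiplication needs `Math.imul`, because the double product can lose low bits before `|0` sees it. Plain doubles are exact up to 2^53. `addInt32` and `mulInt32` are illustrative names.

```javascript
// asm.js-style int32 addition: coerce inputs and result with |0, so overflow
// wraps modulo 2^32 (two's complement).
function addInt32(a, b) {
  a = a | 0;
  b = b | 0;
  return (a + b) | 0; // exact: the sum of two int32 values fits in a double
}

// For multiplication, (a * b) | 0 is not enough: the exact product can need
// more than 53 bits, the double rounds away low bits, and |0 then keeps the
// wrong low 32 bits. Math.imul does true 32-bit multiplication.
function mulInt32(a, b) {
  return Math.imul(a, b);
}

const wrapped = addInt32(0x7fffffff, 1); // -2147483648
const naiveProduct = (0x7fffffff * 0x7fffffff) | 0; // 0: low bits lost to rounding
const imulProduct = mulInt32(0x7fffffff, 0x7fffffff); // 1: correct low 32 bits

// Without bitwise ops, doubles are exact for every integer up to 2^53 - 1.
const safeLimitOk = Number.MAX_SAFE_INTEGER === 2 ** 53 - 1; // true
const precisionLost = 2 ** 53 + 1 === 2 ** 53; // true: 2^53 + 1 is not representable
```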

> Say you have a demo page for some language, and someone using the page writes an infinite loop. Don't want the page to crash? Welcome to advanced compilation techniques like CPS transformations and trampolining.

Or just use a web worker, which you can terminate if you haven't heard back in a while (pun intended).
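Here is a minimal sketch of that approach. It assumes the compiled program runs inside a worker whose script (a hypothetical `runner.js`) posts one message back when it finishes. `runWithTimeout` is an illustrative name. `Worker`, `postMessage`, `onmessage`, `onerror`, and `terminate()` are the standard Web Worker API. The worker is created through an injected factory so the logic can also be exercised with a stub.

```javascript
// Run `source` in a fresh worker and settle with its first reply, or kill the
// worker and reject if nothing comes back within `timeoutMs`.
// In a browser: runWithTimeout(() => new Worker("runner.js"), code, 2000)
// where runner.js is a hypothetical script that runs `code` and posts a result.
function runWithTimeout(createWorker, source, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = createWorker();
    const timer = setTimeout(() => {
      worker.terminate(); // stops the worker even if it is stuck in a loop
      reject(new Error(`no reply after ${timeoutMs} ms`));
    }, timeoutMs);
    worker.onmessage = (event) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(event.data);
    };
    worker.onerror = (event) => {
      clearTimeout(timer);
      worker.terminate();
      reject(new Error(event.message));
    };
    worker.postMessage(source);
  });
}
```

Because the infinite loop runs on the worker's thread, the page's own event loop keeps running, so the timer can still fire and the UI stays responsive.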

> and any part of the language you target for which you don't fully understand the spec is a potential bug

It heavily depends on what kind of language you're compiling. If your compiler tracks the types and doesn't mix them, edge cases are much, much easier to avoid.
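Here is a toy sketch of what "tracking the types" can look like in a code generator. Every expression carries a source-level type. `int` operations are wrapped so they stay 32-bit, and mixing types is rejected at compile time instead of being left to JS's implicit conversions. The AST shape and the `typeOf`/`emit` functions are hypothetical, not taken from any particular compiler.

```javascript
// Hypothetical typed AST produced by the front end:
//   { kind: "lit", type: "int" | "float", value }
//   { kind: "var", type: "int" | "float", name }
//   { kind: "add" | "mul", left, right }
function typeOf(node) {
  if (node.kind === "lit" || node.kind === "var") return node.type;
  const lt = typeOf(node.left);
  const rt = typeOf(node.right);
  // Refuse to mix types at compile time, so JS never has to coerce anything.
  if (lt !== rt) throw new TypeError(`cannot mix ${lt} and ${rt}`);
  return lt;
}

function emit(node) {
  switch (node.kind) {
    case "lit":
      return String(node.value);
    case "var":
      return node.name;
    case "add":
    case "mul": {
      const type = typeOf(node);
      const l = emit(node.left);
      const r = emit(node.right);
      if (type === "int") {
        // int arithmetic stays 32-bit: |0 after addition, Math.imul for products.
        return node.kind === "add" ? `((${l} + ${r}) | 0)` : `Math.imul(${l}, ${r})`;
      }
      const op = node.kind === "add" ? "+" : "*";
      return `(${l} ${op} ${r})`;
    }
    default:
      throw new Error(`unknown node kind: ${node.kind}`);
  }
}
```

For an `int` variable `x`, `emit({ kind: "add", left: x, right: one })` yields `((x + 1) | 0)`. An `int` plus a `float` is rejected before any JS is generated.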

Source: I did make a compiler for my own language that targeted JS.

flavio81 · on Oct 23, 2017

>Not in the JS specification (outside typed arrays), but every JS JIT works in a way you can actually declare variables as 32 bit signed integers, and made its way into the asm.js specification. They are declared like this:

The "|0" trick you mention is not for JavaScript; it is for asm.js. To be able to declare such a "true integer" variable, your code would need to be asm.js, not plain JavaScript.

JavaScript has no integers, only floating-point numbers. This is a very strong limitation.

DiThi · on Oct 23, 2017

What would it entail to declare a "true integer"?

Correctness? The |0 after each operation makes it correct. All bitwise operations in JS operate on signed 32-bit integers.

Speed? All JS JITs have optimizations for integers, and it's the reason asm.js uses that trick, not the other way around.

Precision? If you don't use bitwise operations, you have up to 53 bits of perfect integer precision, plus the sign bit. Guaranteed by the standard.

merb · on Oct 23, 2017

> Where the |0 is a no-op so not actually done, just a type hint.

If value is null, undefined, an empty object, an empty array, an empty string..., then |0 wouldn't be a no-op; it would actually assign 0 to a.
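That point is easy to check. `v | 0` converts its operand to a number and then applies the spec's ToInt32. That turns anything that becomes NaN into 0, truncates fractions toward zero, and wraps large values modulo 2^32. The input list is just an illustrative sample.

```javascript
// A sample of inputs that are not already int32 values.
const inputs = [null, undefined, {}, [], "", "abc", 3.7, -3.7, 2 ** 32 + 5, true];

// `v | 0` converts each input to a number, then to a 32-bit integer (ToInt32).
const coerced = inputs.map((v) => v | 0);
// coerced is [0, 0, 0, 0, 0, 0, 3, -3, 5, 1]
```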

dbaupp · on Oct 23, 2017

When JS is a compiler target, that trick will generally only be used if the source language rules out all those edge cases, i.e. the value is guaranteed to be an integer.

michaelmior · on Oct 23, 2017

> It's pretty easy to compile an untyped language to a typed language

Did you mean the other way around? Certainly it's possible to compile an untyped language to a typed language, but it's nontrivial, especially in the presence of duck typing.

comex · on Oct 23, 2017

I think they mean doing it by using a single generic type for everything, like you'd see in an interpreter for an untyped language. It's pretty easy, but it's also very slow...

michaelmior · on Oct 23, 2017

Fair point. That approach is pretty easy :)

qualitytime · on Oct 23, 2017

"want your target language to be dumb, tiny, and explicit."

What would be your choices?

fasquoika · on Oct 23, 2017

Not the parent, but I've been reading the WebAssembly spec, and that sounds like a perfect description. It's honestly probably one of the more ideal compiler targets.

dualogy · on Oct 23, 2017

Any one language of your choosing, from which you extract a dumb, tiny, and explicit subset to target =)

TazeTSchnitzel · on Oct 22, 2017

For one thing, JS has no unstructured control-flow. So your compilation process involves breaking down ifs, loops and so on into branches, then… trying to messily reconstruct them.

comex · on Oct 23, 2017

> For one thing, JS has no unstructured control-flow. So your compilation process involves breaking down ifs, loops and so on into branches, then… trying to messily reconstruct them.

An omission which, incidentally, is also intentionally present in WebAssembly. There are supposedly good reasons for it, but I still find it really disappointing.

see also: https://github.com/WebAssembly/design/issues/796

dualogy · on Oct 22, 2017

Well, ouch, a transpiler writer has to code up a bit of boilerplate, most of it just once, early in the project's lifetime... "too bad"! IMHO reconstructing stuff may be a somewhat tricky challenge, but there's no intrinsic need for it to be "messy" regardless of the target language, is there? I must still be missing something here... =)

I'm transpiling to Go right now, and I often think "much of this would have been much simpler to get done if I transpiled to an anything-goes scripting language instead". (The reason, of course, is that I want to emit idiomatic, human-written-looking code and, working with mostly incomplete type information coming in, still reconstruct types rather than messily passing and returning typeless boxed values around.) A lot of this is pretty "messy" right now, but I place all of the blame for that on myself (I guess I prefer rodeo-ing into it over "sitting down and writing a formal paper on it first"), not on the target or the source language.

TazeTSchnitzel · on Oct 22, 2017

> there's no intrinsic need for it to be "messy" regardless of the target language

The generated code with “re”constructed control flow is going to be a mess, and you can't really help it. In the worst case it'll be a `while(1) { switch(i) {` type of thing.
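To make that worst case concrete, here is a sketch of the `while(1) { switch(i) {` pattern for a tiny hypothetical control-flow graph, next to a structured version of the same graph. Both functions and the block numbering are illustrative. The dispatch version treats `block` as a program counter, so any goto-style edge can be expressed. The structured version is what a smarter reconstruction would aim to recover.

```javascript
// Hypothetical lowered IR for "sum of 0..n-1", written as basic blocks:
//   B0: i = 0; sum = 0;            goto B1
//   B1: if (i < n) goto B2 else goto B3
//   B2: sum = sum + i; i = i + 1;  goto B1
//   B3: return sum
function sumBelowDispatch(n) {
  let i, sum;
  let block = 0; // which basic block runs next
  while (true) {
    switch (block) {
      case 0: i = 0; sum = 0; block = 1; break; // `break` leaves the switch only
      case 1: block = i < n ? 2 : 3; break;
      case 2: sum += i; i += 1; block = 1; break;
      case 3: return sum;
      default: throw new Error(`bad block ${block}`);
    }
  }
}

// The same graph with control flow reconstructed: B1/B2 become a while loop.
function sumBelowStructured(n) {
  let i = 0;
  let sum = 0;
  while (i < n) {
    sum += i;
    i += 1;
  }
  return sum;
}
```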

DiThi · on Oct 23, 2017

Unstructured control flow is easily emulated with switch statements. In fact, that's what Emscripten did long before asm.js existed, IIRC.

TazeTSchnitzel · on Oct 23, 2017

Emscripten has a specially-designed algorithm called relooper for reconstructing control flow (there’s a paper on it!). Switch statements are its last resort.

sjrd · on Oct 23, 2017

Or... you don't break down ifs, loops and so on into branches to begin with. Scala.js has an optimizing compiler that doesn't do that (and I've heard from a bunch of compiler people who were really impressed by the level of optimization we can do without breaking down control flow into branches).