
How so? I'm interested in your elaboration. I would have guessed it'd be very smooth and fun to target, given all the freedoms it gives you (free-form records, dynamically sized arrays, no types). All the things that are "foot guns" when hand-writing JS should make generating it a lot easier, because when a code generator "shoots you in the foot", you fix the transpiler once instead of fixing each of your individual JS code bases time and again.



> free-form records, dynamically-sized arrays, no types

AFAIK, free-form records and lack of types do not an easier compilation target make. It's pretty easy to compile an untyped language to a typed language.

> How so?

- There are a ton of implicit casts that are almost certainly not what you want. So, for example, if you're going to use addition in the compiled code, it's a 10-step process [1, section 12.8.3.1], and you should be careful not to trigger the 9 steps that do things other than addition.

- There is, last I heard, no way to determine the size of the stack.

- No integers.

- Say you have a demo page for some language, and someone using the page writes an infinite loop. Don't want the page to crash? Welcome to advanced compilation techniques like CPS transformations and trampolining.
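To make the trampolining idea concrete, here is a minimal sketch, assuming the compiler turns every source-level call into a returned `Bounce` object (a hypothetical name) instead of a direct JS call. The driver loop keeps the JS stack flat, which sidesteps the unknown stack size, and it can stop after a step budget, which is how a demo page could regain control from a runaway program. This shows trampolining only; a full CPS transform, which also handles non-tail calls, is more involved.

```javascript
// Hypothetical marker meaning "call this next", returned by compiled code.
class Bounce {
  constructor(next) {
    this.next = next; // zero-argument function producing the next step
  }
}

// Runs at most `maxSteps` bounces. The JS stack never grows, however deep
// the source-level recursion is.
function runTrampoline(step, maxSteps) {
  let steps = 0;
  while (step instanceof Bounce) {
    if (steps >= maxSteps) return { done: false, resume: step };
    step = step.next();
    steps += 1;
  }
  return { done: true, value: step };
}

// Drives a program in slices, yielding to the event loop between them so
// the page stays responsive even if the program never terminates.
function runInSlices(step, sliceSize, onDone) {
  const result = runTrampoline(step, sliceSize);
  if (result.done) onDone(result.value);
  else setTimeout(() => runInSlices(result.resume, sliceSize, onDone), 0);
}

// Possible compiled output for a hypothetical source function
//   sum(n, acc) = if n == 0 then acc else sum(n - 1, acc + n)
function sum(n, acc) {
  return n === 0 ? acc : new Bounce(() => sum(n - 1, acc + n));
}

// Possible compiled output for a source-level infinite loop: spin() = spin()
function spin() {
  return new Bounce(spin);
}
```

A page would call something like `runInSlices(spin(), 10000, show)`: `show` never fires, but the UI keeps responding between slices, and a "Stop" button could simply stop scheduling further slices.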

Overall, JS has a ~850-page spec, and any part of the language you target for which you don't fully understand the spec is a potential bug. Instead, you want your target language to be dumb, tiny, and explicit.

[1] http://www.ecma-international.org/publications/files/ECMA-ST...

Disclaimer: I have done research on JS, but I do not study or develop compilers.


> There are a ton of implicit casts

Very easy to avoid if you're compiling a statically typed language to JS.
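A minimal sketch of why static types make this easy, assuming a hypothetical typed IR (the node shapes and the `emit` function are made up for illustration): the generator only emits `+` on operands the front end has already proven to share a type, so the surprising coercion paths of JS addition never come into play.

```javascript
// Hypothetical typed IR produced by a statically typed front end:
//   { kind: "lit", type, value } | { kind: "add", type, left, right }
function emit(node) {
  switch (node.kind) {
    case "lit":
      // JSON.stringify yields a valid JS literal for numbers and strings.
      return JSON.stringify(node.value);
    case "add": {
      if (node.left.type !== node.type || node.right.type !== node.type) {
        // The source type checker should already have rejected this.
        throw new TypeError(`mixed operands for ${node.type} addition`);
      }
      const l = emit(node.left);
      const r = emit(node.right);
      switch (node.type) {
        case "int":
          return `((${l} + ${r}) | 0)`; // wrap to 32 bits
        case "float":
          return `(${l} + ${r})`; // both sides are numbers
        case "string":
          return `(${l} + ${r})`; // both sides are strings
        default:
          throw new TypeError(`no + defined for type ${node.type}`);
      }
    }
    default:
      throw new Error(`unknown node kind ${node.kind}`);
  }
}
```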

> No integers

Not in the JS specification (outside typed arrays), but every JS JIT works in such a way that you can effectively declare variables as 32-bit signed integers, and this made its way into the asm.js specification. They are declared like this:

    var a = value|0;
where the |0 is a no-op, so it's not actually performed; it's just a type hint.

If what worries you is precision rather than performance, doubles represent integers of up to 53 bits (not counting the sign bit) exactly.
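To make the 53-bit figure concrete: `Number.MAX_SAFE_INTEGER` is 2**53 - 1. The sketch below uses a hypothetical `checkedAdd` helper to show how compiled code could rely on exact double arithmetic for integers while trapping overflow instead of silently losing precision; it assumes the source language wants an error on overflow rather than wrapping.

```javascript
// Doubles have a 53-bit significand, so every integer n with
// |n| <= Number.MAX_SAFE_INTEGER (2**53 - 1) is represented exactly.
// Hypothetical runtime helper for a source language with 53-bit integers:
// adding two safe integers is exact whenever the true sum is itself safe,
// and otherwise rounds to a value of magnitude >= 2**53, which we reject.
function checkedAdd(a, b) {
  if (!Number.isSafeInteger(a) || !Number.isSafeInteger(b)) {
    throw new TypeError("operands must be safe integers");
  }
  const sum = a + b;
  if (!Number.isSafeInteger(sum)) {
    throw new RangeError("integer overflow: result exceeds 2**53 - 1 in magnitude");
  }
  return sum;
}
```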

> Say you have a demo page for some language, and someone using the page writes an infinite loop. Don't want the page to crash? Welcome to advanced compilation techniques like CPS transformations and trampolining.

Or just use a web worker, which you can terminate if you haven't heard back in a while (pun intended).
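A minimal sketch of the worker approach. It uses Node's `worker_threads` so it can run outside a browser; in a page, the analogous calls are `new Worker(url)`, `worker.onmessage` and `worker.terminate()`. The `runWithTimeout` name is hypothetical, and the sketch assumes the compiled program reports its result with a single `postMessage`.

```javascript
const { Worker } = require("worker_threads");

// Runs `source` (e.g. code emitted by the language's compiler) in a worker
// thread and kills it if no result arrives within `ms` milliseconds.
function runWithTimeout(source, ms) {
  return new Promise((resolve) => {
    const worker = new Worker(source, { eval: true });
    const timer = setTimeout(() => {
      // terminate() stops the thread even if it is stuck in a tight loop.
      worker.terminate().then(() => resolve({ status: "timeout" }));
    }, ms);
    worker.once("message", (value) => {
      clearTimeout(timer);
      worker.terminate().then(() => resolve({ status: "ok", value }));
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      resolve({ status: "error", message: err.message });
    });
  });
}
```

Unlike trampolining, this needs no change to the generated code; the trade-off is that the program runs off the main thread, so it cannot touch the DOM directly.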

> and any part of the language you target for which you don't fully understand the spec is a potential bug

It heavily depends on what type of language you compile. If your compiler tracks the types and doesn't mix them, edge cases are much, much easier to avoid.

Source: I did make a compiler for my own language that targeted JS.


>Not in the JS specification (outside typed arrays), but every JS JIT works in a way you can actually declare variables as 32 bit signed integers, and made its way into the asm.js specification. They are declared like this:

The "|0" trick you mention is not for JavaScript, it is for asm.js: to be able to declare such a "true integer" variable, your code would need to be asm.js, not plain JavaScript.

JavaScript has no integers, only floating-point numbers. This is a very strong limitation.


What would it entail to declare a "true integer"?

Correctness? The |0 after each operation makes it correct. All bitwise operations in JS operate on signed 32-bit integers.

Speed? All JS JITs have optimizations for integers, and that's the reason asm.js uses this trick, not the other way around.

Precision? If you don't use bitwise operations, you have up to 53 bits of exact integer precision, plus the sign bit, guaranteed by the standard.
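A sketch of how compiled code might emulate wrapping 32-bit arithmetic with these tricks; the `i32` helper object is hypothetical. One caveat on "the |0 after each operation makes it correct": it holds for `+` and `-`, whose exact results fit comfortably in a double, but not for `*`, because a 32×32-bit product can need up to 62 bits and its low bits are rounded away before `|0` runs. `Math.imul` (standardized in ES2015, and what asm.js uses for integer multiplication) performs a true 32-bit multiply.

```javascript
// Hypothetical helpers for a source language with wrapping 32-bit ints.
const i32 = {
  // Sums and differences of int32 values are exact in a double,
  // so |0 only has to wrap the result back into 32 bits.
  add: (a, b) => (a + b) | 0,
  sub: (a, b) => (a - b) | 0,
  // (a * b) | 0 can be wrong for large operands: the double product loses
  // low bits first. Math.imul multiplies modulo 2**32 exactly.
  mul: (a, b) => Math.imul(a, b),
  // Truncating division; the source language must define x / 0 itself
  // (here Infinity | 0 would silently give 0).
  div: (a, b) => (a / b) | 0,
};
```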


> Where the |0 is a no-op so not actually done, just a type hint.

If value is null, undefined, an empty object, an empty array, an empty string, etc., then |0 wouldn't be a no-op; it would actually assign 0 to a.


When JS is a compiler target, that trick will generally only be used if the source language excludes all those edge cases, i.e. the value is guaranteed to be an integer.
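The difference is easy to check: `x | 0` applies ToNumber and then ToInt32 to its operand, so non-integers are silently coerced rather than rejected. In the sketch below, `expectInt32` is a hypothetical guard a compiler might emit only at boundaries its type system can't vouch for (for example, values parsed from JSON); everywhere else the value is already known to be an int32 and the bare `|0` suffices.

```javascript
// `v | 0` runs ToNumber(v), then ToInt32: NaN and ±Infinity become 0,
// fractions truncate toward zero, and results wrap modulo 2**32.
const coerced = [null, undefined, {}, [], "", "42", 3.7, -3.7, NaN, 2 ** 32 + 5].map((v) => v | 0);
// coerced is [0, 0, 0, 0, 0, 42, 3, -3, 0, 5]

// Hypothetical boundary check: accept only values that are already int32.
function expectInt32(v) {
  if (typeof v !== "number" || (v | 0) !== v) {
    throw new TypeError(`expected a 32-bit integer, got ${String(v)}`);
  }
  return v;
}
```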


> It's pretty easy to compile an untyped language to a typed language

Did you mean the other way around? Certainly it's possible to compile an untyped language to a typed language, but it's nontrivial especially in the presence of duck typing.


I think they mean by using a single generic type for everything, like you'd see in an interpreter for an untyped language. It's pretty easy, but it's also very slow...
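A minimal sketch of that "single generic type" approach, written in JavaScript to match the rest of the thread; in a typed target it would be a sum type or a `Value` base class. The tags, constructors and `dynAdd` are hypothetical, and the sketch assumes the source language rejects mixed-type `+` rather than coercing.

```javascript
// Every source value becomes one uniform, tagged representation.
const num = (n) => ({ tag: "num", n });
const str = (s) => ({ tag: "str", s });

// What a source-level `a + b` compiles to: a runtime dispatch on tags that
// runs on every single operation. This is the "easy but slow" part.
function dynAdd(a, b) {
  if (a.tag === "num" && b.tag === "num") return num(a.n + b.n);
  if (a.tag === "str" && b.tag === "str") return str(a.s + b.s);
  throw new TypeError(`cannot add ${a.tag} and ${b.tag}`);
}
```

Every intermediate result is also a fresh allocation, which is the other half of why this approach is slow compared to recovering real types.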


Fair point. That approach is pretty easy :)


"want your target language to be dumb, tiny, and explicit."

What would be your choices?


Not the parent, but I've been reading the WebAssembly spec and that sounds like a perfect description. It's honestly probably one of the more ideal compiler targets.


Any one language of your choosing, from which you extract a dumb, tiny, and explicit subset to target =)


For one thing, JS has no unstructured control-flow. So your compilation process involves breaking down ifs, loops and so on into branches, then… trying to messily reconstruct them.


> For one thing, JS has no unstructured control-flow. So your compilation process involves breaking down ifs, loops and so on into branches, then… trying to messily reconstruct them.

An omission which, incidentally, is also intentionally present in WebAssembly. There are supposedly good reasons for it, but I still find it really disappointing.

see also: https://github.com/WebAssembly/design/issues/796


Well, ouch, a transpiler writer has to code up a bit of boilerplate, most of it just once early on in the project's lifetime... "too bad"! IMHO reconstructing stuff may be a somewhat tricky challenge, but there's no intrinsic need for it to be "messy", regardless of the target language? I must still be missing something here =)

I'm doing transpilation to Go right now, so I often think "much of this would have been much simpler to get done if I'd transpiled to an anything-goes scripting language instead". (The reason, of course, being that I want to emit idiomatic, human-written-like code and, despite working with mostly incomplete incoming type information, still reconstruct types rather than messily passing and returning typeless boxed values around.) A lot of this is pretty "messy" right now, but I place all of the blame for that on me (I guess I prefer rodeo-ing into it rather than "sitting down and writing a formal paper on it first"), not on the target or source language.


> there's no intrinsic need for it to be "messy" regardless of the target language

The generated code with “re”constructed control flow is going to be a mess and you can't really help it. Worst-case it'll be a `while(1) { switch(i) {` type of thing.


Unstructured control flow is easily emulated with switch statements. In fact, that's what Emscripten did long before asm.js was made, IIRC.
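A minimal sketch of that emulation for a hypothetical source program with gotos; the block numbering and the `label` variable are illustrative choices, not Emscripten's actual output. Each basic block becomes a `case`, and a jump is just an assignment to `label` followed by `break`, which leaves the `switch` and loops back to dispatch.

```javascript
// Hypothetical source, with unstructured jumps:
//        i = 0; total = 0
//   top: if i >= n goto done
//        total = total + i; i = i + 1
//        goto top
//   done: return total
function sumBelow(n) {
  let i = 0;
  let total = 0;
  let label = 0; // number of the basic block to run next
  while (true) {
    switch (label) {
      case 0: // entry block
        i = 0;
        total = 0;
        label = 1;
        break;
      case 1: // "top": conditional branch
        label = i >= n ? 3 : 2;
        break;
      case 2: // loop body, then "goto top"
        total += i;
        i += 1;
        label = 1;
        break;
      case 3: // "done"
        return total;
    }
  }
}
```

This is correct but opaque to both readers and JITs; a relooper-style pass would aim to recover a plain `while (i < n)` loop from the same control-flow graph.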


Emscripten has a specially-designed algorithm called relooper for reconstructing control flow (there’s a paper on it!). Switch statements are its last resort.


Or ... you don't break down ifs, loops and so on into branches to begin with. Scala.js has an optimizing compiler that doesn't do that (and I've heard a bunch of compiler people who were really impressed at the level of optimizations we can do without breaking down control flow into branches).



