There is actually a way to produce CPS without lots of administrative redexes, w...

upghost · 2024-12-26T15:57:54 1735228674

Sounds like this is for a syntactic or metaprogramming transformation over source code? I assume this implies you are either working with a homoiconic language or you have already converted the source to AST?

However it sounds like a very portable technique, would be interested to see it.

tekknolagi · 2024-12-26T15:48:29 1735228109

Please show this gist. I would enjoy linking to it

Joker_vD · 2024-12-27T00:14:12 1735258452

Okay, there it is: [0]. It mostly implements a version of CPS transform from "Compiling with Continuations, Continued" by A. Kennedy [1], figure 8. Two main differences in the output language from the one in your post is that, first, this version binds everything to temporaries (including literals and continuations), and second, it uses "let" (with multiple definitions, so it's actually "let*") as the primary name-binding feature instead of immediate application. And I use name "$ret" instead of "$call-cont".

The main point of interest is the use of CpsExprBuilder as an accumulator and explicit looping over sub-expressions instead of using meta-continuations, as is popular in most papers on CPS transformations. This is not the only possible way to implement such CpsExprBuilder: another option would be to have a mutable CPS expression with a pointer into it, and gradually grow it. But since you can't just take a pointer to a list's element in Python, you would probably have to keep a whole path (as, say, an array of indexes), and re-traverse the tree every time you append to it, like this:

    result = {
        'expr': ['let', [['#t0', 2], ['@k0', ['cont', ['#t1'], None]]], ['f', '#t0', '@k0']],
        'path': [1, 1, 1, 2],
    }

    cursor = result['expr']
    for index in result['path'][:-1]:
        cursor = cursor[index]
    cursor[result['path'][-1]] = ['g', '@halt', '#t1']

I've tried this approach, and even when it's encapsulated in a class, it's way too easy to accidentally end up with a random None somewhere deep inside your resulting expression. So instead, the expression's tree is turned inside out, so to speak. It's similar to using an explicit stack instead of using recursion.

[0] https://gist.github.com/Joker-vD/7cddcfada042ed486dd74690ae9...

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

tekknolagi · 2024-12-27T15:31:56 1735313516

Taking a look now. A couple of questions:

* Why give literal ints their own temporaries? * Why not support lambda expressions? Is it tricky in this approach or just accidentally missing?

Joker_vD · 2024-12-28T13:08:40 1735391320

> Why give literal ints their own temporaries?

Just following the paper's convention (their justification being that now they only need to substitute variables for variables which is simpler), plus this gist is a trimmed down version of my hobby project which had full-blown algebraic data types: the integers were handled by that case as well. You absolutely can merge "int" case with the "var" case above, and use ints as themselves:

    if isinstance(expr, str) or isinstance(expr, int):
        result = CpsExprBuilder()
        return maybe_tail(result, expr)

The second argument in maybe_tail would need a better name though...

> Why not support lambda expressions?

They were missing in your write-up as well :) I just went by your text, and look at the code snippets in it. But adding them looks like this:

    if what == 'lambda':
        args, body = rest
        result = CpsExprBuilder()
        tmp_name, ret = gen_tmp(), gen_cont()
        fun = ['fun', [*args, ret], cps(body, ret)]
        result.add_let_without_body([[tmp_name, fun]])
        return maybe_tail(result, tmp_name)

Some test cases I've tried look like this then:

    ['lambda', ['x'], 'x'] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
    in $ret @halt(#t0)
    
    [['lambda', ['x'], 'x'], 2] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
      #t2 = 2
    in #t0(#t2, @halt)
    
    ['let', ['f', ['lambda', ['x'], 'x']], ['f', 2]] =>
    let
      @k0 = cont(f):
        let
          #t3 = 2
        in f(#t3, @halt)
      #t1 = fun(x, @k2):
        $ret @k2(x)
    in $ret @k0(#t1)

By the way, if you don't like creating contiunations just to immediately invoke them, you can adjust make_LET adjusted to flatten that pattern as well: check if the body is '$ret' with the continuation name that's in the defns list, replace body with this continuation's body (with variables substituted!) and throw its definition away.

    [['lambda', ['x'], 'x'], 2] =>
    let
      #t0 = fun(x, @k1):
        $ret @k1(x)
      #t2 = 2
    in #t0(#t2, @halt)
    
    ['let', ['f', ['lambda', ['x'], 'x']], ['f', 2]] =>
    let
      #t1 = fun(x, @k2):
        $ret @k2(x)
      #t3 = 2
    in #t1(#t3, @halt)

But this also throws the programmer-supplied names away during the substitution, sadly. Preferring to keep programmer-supplied names is possible, of course, but it requires both tracking them and making sure they don't collide with temporaries or other programmer-supplied names so... this simplification step is better performed in the optimizer after the CPS-conversion.

tekknolagi · 2024-12-31T12:34:12 1735648452

I'll get back to this in more detail soon, but I did have lambda expressions in my blog post...