Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> There is no reason for a compiler to limit itself to the system ABI for completely internal functions.

As far as I understand it, this limitation is also the only thing preventing tail-call elimination in C/C++. Tail-recursive functions are essentially the same as 'while' loops (execute some instructions then conditionally go back to the start); they just have different scope and calling conventions. In particular, 'while' loops are always inline and can't be referenced from other functions, but tail-recursive functions can. Hence compilers can apply more aggressive optimisations to 'while' loops, e.g. re-using the same stack frame and iterating with a conditional jump.

However, as you say, functions-as-subroutines don't necessarily need to coincide with functions-as-interface. Tail-recursive functions which are only used internally can also be optimised to re-use the same stack frame and iterate with a conditional jump. If such functions are also exposed in the interface, it's pretty trivial to expose a dummy entry point which invokes the optimised version (if we want to); although cross-unit tail calls wouldn't be eliminated in that case.

Of course, this is more useful than just writing 'while' loops in a different style (although I personally prefer recursion https://news.ycombinator.com/item?id=11150346 ), since tail calls don't have to be recursive: we can split our code up into modular, composable, self-contained functions without having to worry about stack usage (or code bloat caused by inlining into multiple locations).



> As far as I understand it, this limitation is also the only thing preventing tail-call elimination in C/C++.

Nothing "prevents" tail-call elimination in C. Compilers do it all the time.

> Tail-recursive functions which are only used internally can also be optimised to re-use the same stack frame and iterate with a conditional jump. If such functions are also exposed in the interface, it's pretty trivial to expose a dummy entry point which invokes the optimised version (if we want to)

Can you illustrate with a concrete example what you mean? If the function doesn't need a stack frame, there will be no stack frame either way. If the function needs a stack frame, that stack frame must be allocated in the function's start block either way, and tail recursive calls will of course not jump to that start block but to its successor. I don't see how being "exposed in the interface" (in C speak, static vs. non-static) would make a difference here.


The issue is basically that tail-recursive functions need to have this information recorded in their signature (for external interfaces), but this is not recorded in C and thus you can't provide this guarantee at the interface boundaries.


This is not concrete in any way. What guarantee do you mean? What "this information" should be "recorded in their signature"? What does this murky "information" have to do with enabling tail recursion?

Here's a tail recursive function C function that is compiled to tail recursive machine code without any problems: https://gcc.godbolt.org/z/dc886a

This function can be called from external compilation units without any problems. Whether or not it is compiled to tail recursive code is an implementation detail that the caller doesn't need to know about.


> As far as I understand it, this limitation is also the only thing preventing tail-call elimination in C/C++.

That is not the case. Guaranteed TCE requires deallocating the stack frame before jumping to the target, but that is not possible when an object is allocated in that frame and its address passed to the target function.

In C++ there is also the issue of destructors needing to run after the tail-call returns (in which case it is not really a tail-call).

C/C++ compilers can and do eliminate tail calls where possible, but there's no guarantee like you get from a compiler for a functional language.


> that is not possible when an object is allocated in that frame and its address passed to the target function

Ah of course, I hadn't thought about live aliases to stack-allocated data.

You're right about destructors; I agree that they're not really tail-calls (since we're processing results).

> C/C++ compilers can and do eliminate tail calls where possible, but there's no guarantee like you get from a compiler for a functional language.

Yes, I tend to say "tail-call optimisation" when it's opportunistic (e.g. GCC/Clang), and "tail-call elimination" when it's guaranteed (e.g. Scheme/ML).


>As far as I understand it, this limitation is also the only thing preventing tail-call elimination in C/C++.

At the risk of introducing a tangent, can someone explain how compilers detect tail recursion? This is still a complete mystery to me.


What part of the detection is giving you trouble? Detecting whether a call is recursive, or whether it is a tail call that can be optimized? The first part is relatively easy, you typically have some sort of data structure representing calls and their targets, and you know what function you're currently compiling. If the function you're compiling is the same as a call's target, you have recursion.

LLVM's code is at https://llvm.org/doxygen/TailRecursionElimination_8cpp_sourc...

The core of this detection is in findTRECandidate. It iterates backwards over a basic block, looking for call instructions whose call target is the current function F:

    CallInst *CI = nullptr;
    BasicBlock::iterator BBI(TI);
    while (true) {
      CI = dyn_cast<CallInst>(BBI);
      if (CI && CI->getCalledFunction() == &F)
        break;
  
      if (BBI == BB->begin())
        return nullptr;          // Didn't find a potential tail call.
      --BBI;
    }
The harder part is detecting whether calls are in fact tail calls that you can replace by jumps. There are all sorts of special cases, like whether alloca() might be called by the function.


Sorry, I should have been more precise: I was referring to detecting that a function is tail recursive.

(Yours is still a helpful explanation, though -- thanks!)


I can't say how LLVM does it but in a compiler I built I detected tail recursion by scanning the IR and if the value of a recursive call (ie: call to self) was immediately returned it was considered a candidate for TCE. If the value was consumed by the parent, it wasn't.


I'd recommend "Lambda, the ultimate GOTO" https://dspace.mit.edu/handle/1721.1/5753


Reading this now, cheers! Is there anything in particular I should be looking for, or will it jump out at me (bearing in mind that I've never written anything more complex than an unoptimized Lisp interpreter) ?


As I said, tail-recursion is just a particular use of tail calls. Tail calls are very simple: anything of the form 'return foo(...)'; or, in languages without 'return', a function call which is the at the end of the definition (i.e. the last step).

All we require is that the current function is calling some other function, and passing on its return value unchanged. For example, let's take a simple function:

    int foo(int x) {
      return bar(x);
    }
'return bar(x);' is a tail-call, since 'foo' isn't doing anything with the result; calling 'foo' is equivalent to calling 'bar'. In this case the compiler could actually replace every occurrence of 'foo' with 'bar' (e.g. inlining it), but we can't do that in general. However, when we call 'bar' we certainly don't need to keep any of 'foo's machinery around (stack frames, local variables, register contents, etc.); we can just start executing the code for 'bar' (i.e. a jump instruction).

More realistically, 'foo' and 'bar' might not be equivalent, e.g.

    int foo(int x) {
      int y = baz(x);
      return bar(x + 3 * y);
    }
Here we can't just replace calls to 'foo' with calls to 'bar', since they do different things. However, notice the order that things will run in: (1) 'baz(x)' will be called, (2) 'x + 3 * y' will be calculated, (3) 'bar' will be called. Even though these 'foo' and 'bar' functions are very different to begin with, once we reach step three (the call to 'bar') the remaining instructions are the same (since 'foo' is calling 'bar' and returning its result unchanged; i.e. the definition of a tail-call!). Hence we can still discard 'foo's machinery and just jump to 'bar' (e.g. we can throw away 'y'; and the compiler can ensure the result of (2) is put where 'bar' expects its argument).

If 'foo' and 'bar' are the same function then we happen to have tail recursion. For example:

    uint fac(uint n) {
      if (n) return fac(n-1);
      return 1;
    }
Here 'return fac(n-1)' is a tail-call; it also happens to be tail-recursion, since it's calling the 'fac' function from within the 'fac' function. We don't need to keep the stack frame for 'fac(n)' around, we can just overwrite it with the stack frame of 'fac(n-1)', and so on.

We can also have mutually tail-recursive functions, e.g.

    uint even(uint n) {
      if (n) odd(n-1);
      return 1;
    }

    uint odd(uint n) {
      if (n) even(n-1);
      return 0;
    }
This is harder to spot than direct tail-recursion, but it doesn't matter since tail-recursion isn't particularly special. As long as we implement tail-calls efficiently, then the benefits apply to tail-recursive functions (like 'fac') and mutually tail-recursive functions (like 'even' and 'odd'), as well as many other places which just-so-happen to use tail-calls.

Intuitively, any time we find ourselves using a "main loop", or "driver loop", etc. which just calls out to a sequence of functions, possibly passing the return values of one into arguments of the next, that's equivalent to having each of those functions make a tail-call to the next one (i.e. such loops are "trampolines").


> Tail calls are very simple: anything of the form 'return foo(...)'…

It's a bit more complicated than that since there may be destructors (in C++, or C with certain extensions) or stack-frame cleanup which occur between 'foo' returning and the 'return' statement itself. Tail call elimination is only possible when the stack frame can be freed before the final function call; if the address of any local variable escapes the current function then it might not be an option even if the source otherwise appears tail-recursive. TCE is simpler in garbage-collected languages which allocate their stack frames from the heap (and typically lack precise destructors) since these cleanup operations can be left to the GC.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: