His opposition to recursion is well founded on the "principle of least surprise" that also underlies Python. Consider, e.g.
def len1(list, acc=0):
    if not list: return acc
    else: return len1(list[1:], acc + 1)
def len2(list, acc=0):
    if not list: return acc
    else: return len2(list[1:], acc + 1) + 0
The first one uses only tail calls and, with proper TRE, will run on a list of any size. The second one will blow up the stack, despite being equivalent in almost every sense - especially in those senses ("ease of proof") touted as the main reason to use tail recursion in the first place.
And while this toy example is easy to reason about, real-world examples require more experience to understand - and Python was always aimed at non-CS people (and is hugely successful in that mission).
Personally, I think the whole discussion is misguided. TRE should be explicit, not some kind of implicit optimization - that is, no TRE/TCO unless you explicitly use e.g. a "chain" return keyword:
def len3(list, acc=0):
    if not list: return acc
    else: chain len1(list[1:], acc + 1)
This way, a compiler could flag a syntax error if you rely on TRE where it is impossible or unavailable. The same could (and should) be done in Scheme/Lisp etc.; it's easy to shoot yourself in the foot with macros and tail calls in Scheme.
"chain" might not be the right keyword for this, but I haven't found a better one ("tail-return", "return-only", "continue-with" don't seem any better); maybe reuse "pass"? E.g. "return pass len1(..)" or "pass return len1(...)".
Actually, if you have a compiler with both TRE and constant folding, len2 will run on any sized list, if the optimizer runs constant folding before TRE or interleaves them. Which I guess is your point, that your program's behavior becomes unreasonably dependent on implementation details.
However, I'll point out that this problem occurs in many situations already. In GC languages, a single allocation can trigger a GC that locks up the CPU for 50-100ms; this is a particular problem with Javascript animations. In reference-counted or manually managed languages (like Python, Objective-C, or C++), you can get release cascades, where the destruction of one object causes the reference counts of many others to drop to zero, which causes the same problem. A single blocking call will make your server sit idle. Using a library function with O(N) behavior (like, say, string concatenation in many languages) in a loop will make your algorithm accidentally quadratic.
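(A minimal sketch of that last point; note that CPython happens to special-case the naive string pattern in some situations, which is itself an example of behavior depending on implementation details:)

def join_slow(items):
    out = ""
    for s in items:
        out = out + s        # copies the whole accumulator each time: O(N^2) overall
    return out

def join_fast(items):
    return "".join(items)    # single pass over the data: O(N)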
Practically speaking, the way we deal with these performance gotchas is to instrument the code heavily, profile, and measure the working system as a whole. I've got some sympathy for the "languages shouldn't have dangerous gotchas" viewpoint, but I think that most of the time it ultimately results in premature micro-optimizations.
> Which I guess is your point, that your program's behavior becomes unreasonably dependent on implementation details.
Exactly.
> Practically speaking, the way we deal with these performance gotchas is to instrument the code heavily, profile, and measure the working system as a whole.
Yes, but that's the wrong way to deal with it. Especially with today's giant memories, you might run your tests on a 512GB machine with a billion-element list, not realize there's no TRE, and then have it fail on a machine with "only" 64GB of memory - one that would be perfectly capable of running the same computation if TRE/TCO were done (but wasn't).
I want simple-to-remember-and-use semantics, and implicitly relying on tail recursion semantics does not provide it. O() guarantees are enough (and should be properly documented) - I don't need deterministic time or space guarantees beyond order-of-magnitude precision.
If you need deterministic timing in the 100ms range, most programming languages are not for you. If you need them within 10ms, most operating systems aren't. And if you need them within 100us, most CPUs aren't.
Summing up my 23 years of Python experience: noticeable pauses due to release cascades are extremely uncommon. Unlike Java/Javascript, with the exception of free() and a few minor loops, everything that runs is of _your_ making, which makes it easier to reason about and control.
And back to the subject:
My "chain" (or whatever) suggestion above, which I (and others) have made numerous times is basically a static-vs-dynamic typing assertion about how a subroutine uses the call stack at that point: "need it later", "don't need it later". Even for a dynamic language like Python, it makes sense to make this static because it cannot change with argument types. The incoming call stack can either be dropped at a location, or not, independent of inputs. I am surprised it is often countered (as you did here) with "yes, but we can test it so it's not a big deal". It's not a big deal, but it is also comes at basically no cost, AND makes a requirement self-describing and easily verifiable by a static compiler.
return f(a, b, c) # tailcall on f
return f(g(b)) # tailcall on f
return f(a, b, c) + 0 # tail call on +
The last function called before a return is a tail call. That's it. Nothing more. It doesn't matter that the return "expression" can be more complex; only the last call is concerned. It's a simple rule.
Now, regarding your concern about static analysis of stack allocation: this is not a feature you have for any other aspect of the language... you have no way of specifying your expected memory consumption. But you do have your experience: you know the traps, you have learned the patterns to look for. Sure, the annotation could be used by a static linter, but built-in static analysis is not a feature of Python. Why should tail calls be a syntactic exception?
(Note: tail recursion is a special case of tail calls! The annotation you are thinking about would be more interesting as a function annotation, since you really don't want to mark every recursive return just to state your global expectation.)
What you are asking for is common for new language features: you want a red flag, something that highlights the semantics prominently, so that you don't forget and are forced to think about something you aren't used to. Please have a look at your Python code: how many returns end with a call to a function? Are you going to decorate each of them with "tailreturn"? It's really a free optimization; your program will run faster and consume less memory once it is there.
(Is Python even specifying that its implementation should use a stack?)
In the presence of try (also with, and perhaps a few others I can't think of right now) even the examples you gave cannot be TCEd.
Personally, I consider a language broken if enclosing a construct in "try: ... finally: pass" switches it between constant and unbounded memory requirements.
That is already true for scheme in a way. Guido's refusal to break python in this sense is wholly supported by me, even if that's not his reason to refuse.
And "return from" suggested by another poster is the perfect pythonic syntax for such a thing.
And I definitely think it is a property of the specific call site, NOT the entire function, whether it should be checked for tail-ness. Decorators are the wrong solution here.
What's so invisible about your examples? I agree that an optimizing interpreter makes the situation harder to analyse, because we can't remember every simplification it handles. Without it, the tailcalls are very explicit.
Previously, you argued that Python semantics make it very clear that all costs are visible. That seems to argue against any rewrite rule, as the visible cost ("or False", "+ 0", "finally: pass") suddenly disappears. You are using your own rebuttal as an argument to make tailcalls look harder to analyse! They are not.
> And I definitely think it is a property of the specific call site, NOT the entire function, whether it should be checked for tail-ness. Decorators are the wrong solution here.
As a caller of a function, the property you care about is "does it consume a fixed amount of stack?". As the writer of that function, that's also what matters in the end: sure, it means checking each tail call, but you really want the global guarantee that you didn't forget one. The local explicit annotation becomes redundant.
> But tailcalls already have a syntax:
>
> return f(a, b, c) # tailcall on f
However, this tail call cannot be eliminated in the context of a try: or with: scope. Ergo, it is NOT tail call syntax. Using it does not guarantee that the tail call can be eliminated:
def write_list_to_file(file, list):
    "Looks weird, but it has reason to do this based on overall system architecture"
    if not list:
        return  # base case, so the recursion terminates
    try:
        item = list[0]
        list[0] = None
        file.write(item)
        return write_list_to_file(file, list[1:])
    finally:
        list[0] = item
Despite being compatible with your "tailcall syntax", this is not actually a tailcall. Ergo: no tailcall syntax.
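(For contrast, a loop in the same spirit - a sketch, not an exact equivalent of the deliberately weird version above - runs in constant stack without needing any elimination at all:)

def write_list_to_file_iter(file, lst):
    for i in range(len(lst)):
        item = lst[i]
        try:
            lst[i] = None
            file.write(item)
        finally:
            lst[i] = item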
Proponents of tail calls almost always ignore a very simple practical consideration: If you rely on a tail call being eliminated and it isn't, then your program becomes incorrect, in the sense that it goes from needing constant memory to needing infinite memory.
I simply cannot fathom what argument one might have against an annotation saying "this tail call must be eliminated for this program to be correct", which can easily be verified by the compiler (is the compiled call site a CALL instruction immediately followed by a RET opcode? If so, it's a tail call; if not, it's not!). Not having it means that an enclosing context can change the meaning of the (recursive equivalent of) "while 1:" into "for x in range(MEMORY): ...; raise OutOfMemoryException()", depending on the compiler's optimization capability.
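(For what it's worth, roughly that check is already observable at the CPython bytecode level. A version-dependent sketch - opcode names differ across CPython releases, so treat this as an illustration rather than a robust tool:)

import dis

def has_tail_call(func):
    # True if some call instruction is immediately followed by RETURN_VALUE.
    ops = list(dis.get_instructions(func))
    return any(a.opname.startswith("CALL") and b.opname == "RETURN_VALUE"
               for a, b in zip(ops, ops[1:]))

def g(x): return x
def f(x): return g(x)        # the call is in tail position
def h(x): return g(x) + 0    # an add sits between the call and the return

print(has_tail_call(f))      # True
print(has_tail_call(h))      # False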
> You are using your own rebuttal as an argument to make tailcalls look harder to analyse! They are not.
No, I claim that they are extremely easy to analyze for a compiler; they are not as easy for a reader (a with/try context 50 lines away converts a tail call into a non-tail call; you have to be aware both of the context and of this possibility - or you will unwittingly replace a while loop with a bounded for loop). If you claim that it is easy to analyze for a reader, then you are not considering Python's original and still significant audience: non-programmers.
> As a caller of a function, the property you care about is "does it consume a fixed amount of stack?".
Yes, you do. And as a caller of a function, you don't see or control its attributes in any way, so that isn't an argument for or against decorators.
> As the writer of that function, that's also what matters in the end: sure, it means checking each tail call, but you really want the global guarantee that you didn't forget one. The local explicit annotation becomes redundant.
You have two wrong assumptions here:
1) That if all returns are tail calls, then constant stack use is guaranteed.
2) That I actually want every return to be a tail call
Neither of these holds. Therefore, while such a decorator may be useful for your style, it does not make a local explicit annotation redundant.
Consider a very sparse tree data structure with n nodes and 0-500 children per node, but with only log(n) of the nodes having more than one child. (It arises in practice, e.g. when building a suffix array on some kinds of inputs; the Patricia tree is an optimization of this.) Computing on this structure can be done as follows, assuming some special case for 2-child nodes:
def collect(node, acc):
    if node.count_children() == 0:
        return acc + node.value
    if node.count_children() == 1:
        return collect(node.get_child(0), acc + node.value)  # this call is key to memory use
    if node.count_children() == 2:
        return acc + PAIR_START + collect(node.get_child(0), 0) + collect(node.get_child(1), 0) + PAIR_END
    acc = acc + MULTI_START
    for child in node.get_children():
        acc = acc + collect(child, 0)
    acc = acc + MULTI_END
    return acc
Now, the key collect() call determines overall memory use: with TCE, it is O(log n) overall; without it, O(n). If you add a try/finally (with actual code, not finally: pass), it becomes O(n) regardless.
The example is realistic, although somewhat contrived to be compact.
If your argument isn't "well, you don't need new syntax and help from the compiler if you just pay attention and remember not to use with/try if you ever need guaranteed tail calls", then please state it more clearly because I haven't understood it.
And if that's your argument, then I think it is technically right but practically wrong (in the same sense that saying "python is useless, you can do everything in assembly" is technically right and practically wrong)
> If your argument isn't "well, you don't need new syntax and help from the compiler if you just pay attention and remember not to use with/try if you ever need guaranteed tail calls", then please state it more clearly because I haven't understood it.
To clarify: yes, I believe that whichever stack-release semantics is chosen (tailcall optimization or the current behaviour), programmers come to understand its limitations. Languages that support tailcalls natively yield programmers who "see" tailcalls (so they don't think about them more than necessary), and languages without tailcall support yield programmers who "see" stack allocation on recursion (but they don't think about the stack any more than tailcall programmers do.)
Now, we both like the "default" behaviour that we have practised the most, because we really understand its limitations. When confronted with the other semantics, we can make it work, but it's slower to trigger our reflexes, and it causes frustration:
A tailcall programmer forced to use a non-tailcall language will have to "compile" his tail recursion into a loop. A non-tailcall programmer forced to use a tailcall language will gain free stack, and won't have to rewrite his algorithm.
Sure, finally breaks the tailcall syntax, just like it breaks the syntax rule "the function ends on a return".
You are right that I haven't stated my motivation for arguing about tailcalls:
- I really believe that tailcalls should be supported by Python. Recursion is too useful, and it's hard enough to teach without suffering the second-class behaviour that Python currently offers.
- A new syntax is harder to sell to programmers and to the Python team, because it says "tailcalls are not the norm". I claim that "return f()" is used far too often to justify the cost of explicitly writing "return from f()" for a free optimization.
- A beginner doesn't need to know about tailcalls to benefit from them, just as we don't teach them about the stack until they hit a stack overflow. If you have two semantics with respect to stack release, it becomes harder to teach than if you choose only one (it triggers the question "why would you ever NOT write 'return from' when you can?")
You seem to disagree that tailcalls should be optimized everywhere possible, but I'm not sure if it's because you don't care about writing "return from" in general (and just want to enforce it where it's actually observable), or think that "return from" isn't always an optimization (but then I'm really missing a counter-example?), or really believe that tailcalls make the semantics of the stack harder because of things like "finally" (in which case I think we both suffer from a bias towards the stack-release semantics we are used to.)
If Python had tailcalls by default, would we have an argument for a "return after" syntax?
I like static analysis. I just don't think tailcalls require a more explicit annotation than any other language feature: writing "return from collect()" in your example doesn't clarify your intent, it just triggers a warning when someone breaks the tailcall (but they can still break the stack complexity through other means.) You do have to state your global property somewhere: the local tailcall annotation allows you to prove it today, but it doesn't guarantee its survival as the software evolves. And as a user of your library, my static analyser could benefit from a function annotation on your side, but it won't care about your usage of "return from".
> You seem to disagree that tailcalls should be optimized everywhere possible
I don't disagree, assuming there's a way to turn that off for debugging. I love it when things run faster and take less memory.
I disagree that "there's already a syntax for tail calls" (one that could be relied upon if Python did TCE), because in Python there isn't. Therefore, even if Python had TCE, it would be brittle (a small change 50 lines away could turn it into a non-tail call) and it would be bad style to rely on it as a programming mechanism.
I've used common lisp with tailcalls. They work way better there than they would in Python, although there are still interaction problems with conditions/restarts/continuations/macros, mostly as invisible as they would be in Python. (Though, it's been something like 20 years, my memory is hazy).
My claim can be summarized as follows:
1. TCO is nice and awesome. But it is an OPTIMIZATION, and shouldn't be a central language feature you rely on, unless it is easy to guarantee it happens.
2. If you want to rely on TCE as a general programming technique, you should be able to say "and if this thing isn't a tail call, don't compile" or "and if this thing isn't a tail call, give me a warning".
3. A "return from" in Python could be an excellent way to implement (2) for Python
4. I think a special form for Scheme that says "I rely on the following tail call being eliminated" would have been useful, as it is easy to shoot yourself in the foot relying on it while e.g. using macros - where something that looks like a simple call expands to a non-trivial expression. Luckily, in CL (and I assume in Scheme as well), you could do that at the library level with a relatively simple code walker.
The Python library-level implementations that I've seen use a decorator to add the trampoline and require you to replace "return f(a,b,c)" with "raise TailCall(f, a, b, c)" (or something to that effect - you can get a much nicer syntax with some limitations). Ugly, inefficient, but available if you really need it.
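(Something along these lines is a minimal sketch of that pattern - the names are illustrative, not taken from any particular package:)

import functools

class TailCall(Exception):
    # Carries "call this next" up to the trampoline instead of growing the stack.
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

def trampoline(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        f = func
        while True:
            try:
                return f(*args, **kwargs)
            except TailCall as tc:
                # functools.wraps exposes the raw function as __wrapped__,
                # so we keep looping here instead of nesting wrapper frames.
                f = getattr(tc.func, "__wrapped__", tc.func)
                args, kwargs = tc.args, tc.kwargs
    return wrapper

@trampoline
def length(lst, acc=0):
    if not lst:
        return acc
    raise TailCall(length, lst[1:], acc + 1)   # instead of: return length(lst[1:], acc + 1)

print(length(list(range(10000))))   # 10000, well past the default recursion limit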
'If your argument isn't "well, you don't need new syntax and help from the compiler if you just pay attention and remember not to use with/try if you ever need guaranteed tail calls", then please state it more clearly because I haven't understood it.'
I don't have much of a horse in this race... I've just been watching the debate going on in the comment threads... but I'll point out three analogous situations:
In C++, we have the keyword 'virtual' because we needed a syntactic marker to indicate whether a function may be overridden. There are two main pragmatic reasons for this: it lets the compiler keep the existing default of being able to jump directly to a known address in the common case (i.e. you don't pay for functionality you don't use), and it lets you know whenever the code that is executed may not be the code that you're staring at. If you rely on a particular method being called and it is overridden, then your program becomes incorrect, in the sense that it may do something completely arbitrary that you never expected.
In Java, we have the keyword 'new' because we needed a syntactic marker to indicate whenever heap allocation may occur. Without it, any object construction may trigger a long-running garbage collection. If you rely on object construction being a fast operation and it triggers a GC, then your program becomes incorrect, in the sense that it may not return results in the timeframe necessary. This is why Java faced significant barriers to entering the hard real-time space, and that market is still dominated by C and Ada.
In Scheme, we have the special form 'delay' to indicate that an expression should be lazily evaluated. Without it, side-effects become unpredictable, and you may also get unexpected space-leaks or execution cascades. If you rely on an operation being strict and instead it's lazy, then your program becomes incorrect, in the sense that it may execute side-effects at unknown times or have non-local execution effects.
Java decided that the 'virtual' keyword was not necessary, and omitted it in the interests of reducing language complexity. Python decided that the 'new' keyword was not necessary, and omitted it in the interests of reducing language complexity. Scheme decided that something like 'chain' is not necessary, and omitted it in the interests of reducing language complexity. Haskell decided that 'delay' is not necessary, and omitted it in the interests of reducing language complexity.
In all cases, there are situations where that omission comes back to bite users of the language. Java re-introduced @Override to indicate overridden methods. Python isn't used in performance-critical software where you want object creation to be fast. Scheme code is often brittle, where a simple change to a method results in very different stack consumption. Haskell code often has unpredictable space leaks and execution times.
And yet all of those languages have users, some more niche than others.
My argument isn't really about what Python should or shouldn't do, because realistically Python is not getting any form of TRE, and I'm fine with that because I don't use Python in a manner or situation that would require it. Rather, it's that "practically wrong" is a relative term - relative to who's using the language and what they're doing with it. Languages have complexity budgets as well; for all the people who don't use a feature, its existence makes the language less useful to them.
I agree that some language features have a binary nature, like static/overridable or lazy/strict, but maybe we shouldn't classify all of them like that?
The "new" keyword in Java is very much like the proposed "return from": if you know that something is an object, and that object allocation is always done on the heap, then Foo() is as clear as new Foo(). The "new" keyword is a residual from C++, where you actually had a choice. Is it possible that people don't dismiss Python for Java because of the lack of "new", but because its GC isn't as good?
If we had an hypothetical IDE that used colors to transmit static analysis to the user, then we could categorise the keywords between "this is something the IDE can infer" and "this is something the user should specify". Object allocation falls into the inferable (so rather than writing "new", the IDE would highlight the expression in red); virtual is not inferable; delay is not inferable; tail calls are inferable, and can be shown in pink or whatever.
Anyway, TCE is an optimization: we can run more programs faster when we have it. When we don't use tailrec, we still benefit from it as our program consumes less memory overall. The cost of tailcall elimination is purely cognitive, for people who learned the old C-like stack behaviour (but even C compilers handle TCE now.) If TCE/no-TCE matched the static/virtual and lazy/strict binary choice, then there would be situations were TCE has a negative impact on the code runtime.
Tcl added this relatively recently. They use the "tailcall" command, which immediately replaces the stack frame of the current function with one for the new function. If I recall correctly, it doesn't do any checks for whether or not the call is actually a tail-call, so if you do something like
return [expr [tailcall foo] + 1]
the return and expr simply never get executed, producing incorrect results (which I guess is slightly better than a potential stack overflow, since even the most basic of tests will pick up on the missing +1).
I'd use "return from" for tail recursion, to be analogous to "yield from" for iterators. It doesn't add new keywords and "return from foo(...)" seems to me to explain what it does well.
That being said, this is one of the better explanations/arguments for why iterations are more 'Pythonic' that I've heard. Thanks!