How Recursion Got into Programming (2014)

gumby · on May 3, 2020

It was hard for a moment to get my head around the idea of static activation records. That makes “procedure call” simply into a goto. I guess I tend to write assembly code this way though.

Ironic that almost 20 years later the idea of dynamic activation had been so well absorbed that it was possible to write an influential paper all about how procedure call (and lambda) could be considered a goto (via tail recursion)!

Stratoscope · on May 4, 2020

Yes, imagine that a function looks like this in memory:

  +-----------------+
  | return address  |
  +-----------------+
  | executable code |
  +-----------------+

The way you call a function is to store your desired return address in the first slot, and jump to the code in the second slot. The function jumps to the saved return address when it's done.

We did it this way for years.

Now given that mindset, how could you even contemplate recursion? Where would you put all the return addresses?

Someone · on May 4, 2020

Nitpick: you also stored the function’s arguments there (if the language supported function arguments. I think most did, but am not sure about COBOL)

On the plus side, languages at the time (FORTRAN, COBOL) also required array sizes to be known at compile time, and memory allocators didn’t exist, so programs never ran out of memory (but would abort if an input file was larger than expected, even if the machine had plenty of memory available)

Consequently, CPU support for stacks was weak, because languages had no good use for them.

In theory, optimizers could do away with (part of) the activation record in cases where a function was called from only one place, or where arguments always were the same. I don’t know whether compilers that did that existed.

mark-r · on May 4, 2020

That's the way the Control Data Cybers worked, except that it wasn't just a return address - it was a complete jump instruction. You needed to start your function with a no-op of the correct length, and it was overwritten by the caller. Then to return from your function you simply jumped back to the beginning.

zozbot234 · on May 4, 2020

Nice. Basically, the caller uses self-modifying code to set up the correct continuation ("self-modifying code" and "continuation" being somewhat closely-related concepts, after all), and then has the callee jump to it. Makes a lot of sense.

Sniffnoy · on May 4, 2020

What's a little confusing here is that they talk about "static allocation" vs "dynamic allocation"; but these days, use of the stack is referred to as static allocation! Even if it's not as fully static as the old way. The shift in terminology makes the old debate opaque for the modern reader, and without your comment (and some similar ones) I wouldn't have known what was going on.

kragen · on May 4, 2020

I haven't seen that usage anywhere.

gowld · on May 5, 2020

Sniffjoy is referring to static allocation of local variables on the stack (vs dynamically on the heap), not static allocation of the stack itself.

kragen · on May 5, 2020

Referring to stack allocation as "static allocation" is just not something I've seen before. I suspect you're simply in error; maybe you misheard someone saying the phrase "stack allocation", or you've been talking to somebody who misheard it that way. You can statically allocate a local variable in C, for example:

    static char *err = NULL;

But that specifically does not allocate it on the stack. Both heap allocation and stack allocation are dynamic, not static.

There are things that are a bit more of a gray area in C and Unix and Win32/64 — are "statically allocated" variables of a shared library really statically allocated if they aren't there when the program starts up and they may be at some unexpected address? — but this is not one of them.

gumby · on May 4, 2020

I have never seen that usage — in what community is it common? Thanks!

dreamcompiler · on May 4, 2020

Even without tail recursion, conventional procedure calls are still just gotos plus an automatically-managed stack. Which fits the author's thesis that higher-level memory models are what distinguishes higher-level programming.

The aspect of this that bothers me is that automatically-managed stacks are not the only way to call procedures. Continuations for example are another way; actors yet another. But because automatically-managed stacks are present in virtually every CPU at the ISA level, many programmers never venture outside that model and have trouble understanding that others even exist, much less what advantages they might bring.

Avi-D-coder · on May 4, 2020

If it makes you feel any better, I'd bet you all 3 UFOs still use call stacks. Even aliens are stuck on 8086 esque CPUs ;)

CyberDildonics · on May 4, 2020

And what are those advantages? Function call ABIs are a standard way to change what is executing and come back when it is done. Does that really need to change? Other software architecture can be built on top of that, you don't need to change the way you call a function with assembly to structure software differently.

kragen · on May 4, 2020

The advantages are that you can do threads, exceptions, lazy sequences, backtracking, tail-call elimination for state machines, and so on. (The disadvantages are the same.) Yes, you can emulate these on top of function-call-based ABIs, but you incur the usual emulation overhead.

gumby · on May 4, 2020

There are significantly more calling semantics beyond “push a frame and branch” as Kragen enumerates. Some, like coroutines, are visible to the high-level programmer; others, such as tail call optimization, might be invisible.

cat199 · on May 4, 2020

this sub-discussion about tail calls, etc. is in reference to:

https://dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443....

that series of papers was released adjacent to the development of the scheme language which will generally optimize out the need for a call stack and replace it with essentially goto in certain cases.

as for 'changing the way you call a function' this subtopic is primarily of importance for language/compiler implementers; while you don't "need" to change the way you call a function, often doing so can speed up the program hugely, or allow things that were not otherwise possible (e.g. infinite recursion without stack overflow)

dreamcompiler · on May 4, 2020

Tail call elimination isn't always possible, but it's the gateway drug to continuations which are always possible, and continuations are the gateway drug to actors, and once you have actors you're free to move any element of your program anywhere, break it into pieces and put them back together, go nuts, it just works--and now you're able to write code on asynchronous million-node networks without having to worry about locks, mutating global data, shared stuff, and everything else that gets in the way of using modern hardware.

tsimionescu · on May 4, 2020

If you have code that doesn't need to mutate data, it is already easy-ish to write on whatever scale you want. The problem is that there are very few algorithms and business problems that don't require shared state to work.

I'm not sure how actors or continuations could solve the problem of creating a distributed database of clients, or even of computing histograms on data in parallel, significantly more easily than more traditional programming paradigms.

toast0 · on May 4, 2020

Rather than accessing the shared data directly, you send a message to the actor that owns that piece of shared data, that message could be a request for some data, or a request to modify it in some way.

The actor that owns the data reads its mailbox of requests and processes them without worrying about locks and things, because it has exclusive access.

The actor that needs to access the data doesn't worry about locks and things because it's not directly accessing it, it's just sending a message.

The system worries about locks and things, because the message mailboxes are shared state to some extent. If you can cleanly separate ownership of different pieces of data into different actors, you get parallelism. If there's too much dependency between the pieces of data, so that separating ownership makes things more complex, maybe that becomes more visible.

tsimionescu · on May 4, 2020

That sounds nice, but it means that all reads and writes are serialized. You can't scale a system like this to, say, work for sign-up and retrieval of all Gmail users.

toast0 · on May 4, 2020

To scale up, you need to partition/shard the space.

For gmail users, you could hash on the (normalized) username, and all users with hash between A and B would go to one actor, etc. You don't need (or want) parallelism for signup on a single user id, but you could hash or otherwise divide that work to that level.

You can also do a partitioned work queue with a (smallish) worker pool per partition; and have the queue keep track of any keys with a current worker, to assign further work to that worker (to avoiding parallel work on the same key and the locking that requires)

tsimionescu · on May 4, 2020

These are elegant solutions, but they are also rather complex, and I'm sure that if we dig in deeper there are further complexities to explore.

I have no doubt that the actor model or continuations can be used to solve any problem. I do however doubt that it would turn out to be simpler than using shared state and explicit locking; or using ledger-like approaches. I also don't think it would be more complex!

I believe that problems of distributed computing are inherently hard enough that the incidental complexity from your chosen modeling approach is not likely to be a significant portion of the overall complexity.

toast0 · on May 5, 2020

The nice thing, if you do it right, is that each piece is going to be simple.

There may well be emergent complexity in how the simple parts go together, but that's going to happen anyway. Might as well make the parts simple where you can.

Incidentally, I think email is a great example of something with clear divisions of work --- an actor could easily be responsible for a single user's mailbox, and there should be no overlap in responsibility. Possibly, you could have one actor per folder, but gmail doesn't really have folders, so...

tsimionescu · on May 5, 2020

I think that's a good example of how these kinds of static allocations are tricky to get right. With your scheme, you would have hundreds of millions or even billions of actors being idle all day long, and probably a few thousand actors potentially overwhelmed with processing new email every second. You would probably have to go back and decide a new partitioning scheme to avoid this, with maybe cascading consequences in other places.

The pieces are simpler, but you might pay a greater cost in system-level design. In shared state models, you could have a more dynamic approach with a number of worker threads scaling with the number of requests (emails), with the only problem being contention when accessing the same piece of the shared storage, but at least no massive overhead from idle workers. The shared-state layer would be responsible for the locking strategy, and could even potentially optimize the required locking without necessarily changing the worker code, depending on the level of abstraction it offers workers to begin with.

toast0 · on May 5, 2020

We're getting into the weeds here, so don't feel the need to reply. I agree there's different complexities, and both ways will be complex in some places, and simple in others. I just want to address this one thing.

> you would have hundreds of millions or even billions of actors being idle all day long

> but at least no massive overhead from idle workers.

You could certainly work to activate the actor for a user only when it's needed (while the user is online, or while an email is delivered); but the level of overhead for an actor is quite low --- it would be fine to have the actor simply always running, and doing nothing when there's nothing to do. You can run a million Erlang processes on a server, and if they're mostly idle, there's no big deal; if you need to run a billion processes, you would need a thousand servers; but my wild guess is storage is going to be a bigger factor than CPU, assuming that the gmail style large quotas do get significantly used, I could easily be wrong about storage requirements though.

tsimionescu · on May 5, 2020

Agreed, we're getting in too many vague details.

I didn't know that Erlang can go up to such large numbers of processes, I knew they were very lightweight, but would have guessed maybe at most thousands to tens of thousands. Very nice that they can do that!

eru · on May 4, 2020

Well, you will only have to worry about mutable global data if that's what your solution calls for. Essential complexity vs accidental.

gumby · on May 4, 2020

Not sure what you mean by “automatically managed” since the function preamble (before user code is run) saves registers etc. well, the abi specifies if the caller or callee does it but it’s a much of a muchness.

dreamcompiler · on May 4, 2020

I mean saving the address for control to return to so that the subroutine never has to know or care; it just executes the RET instruction. And RET is just POP+JMP. This is super handy--so handy that alternative control strategies aren't explored as much as they might have been otherwise.

kragen · on May 4, 2020

I'm guessing that everyone here explaining to you how procedure calls work has, at some point, used a procedure call and return sequence that you implemented or debugged in GCC :)

gumby · on May 4, 2020

Ha ha true, though there’s always something to learn.

Even the first machine I ever programmed, the KA-10, had stack manipulation instructions though any location in memory could be a stack (which came with base and bounds for free) so threading was easy to write in assembly code.

There was no difference between a call stack and a stack data structure — we called them all “push down list”

ookdatnog · on May 4, 2020

I took a History of CS course in college, where this controversy was discussed. The way I understood it, the two camps approached the question of what ALGOL should be slightly differently.

One camp (against recursion) was trying to design a language that would be productive for programming in the (then) here and now, with the machines they had at the time; in other words, the language is molded after the machine. Since efficiently implementing recursion wasn't well understood at the time, adding it to the language was a no-go for them.

The Dijkstra side of the argument was that the language should be designed to be as elegant as possible, without much concern for the limitations of the machines of the day. The idea was that, once a broadly accepted programming language was designed, computer designs would follow to support it.

I think both are valid viewpoints (especially in context), although with perfect hindsight Dijkstra et al seem to have been the more prescient side.

dang · on May 3, 2020

Discussed at the time: https://news.ycombinator.com/item?id=8073361

also 2015: https://news.ycombinator.com/item?id=10131664

_bxg1 · on May 5, 2020

It's strange to me that recursion wouldn't just naturally fall out of any language design that has a call stack. Even if you forbid the current procedure from calling itself, it could achieve the same thing by having two functions call each other.

ken · on May 5, 2020

As the other comments here said, languages at the time did not generally have call stacks. It'd probably be more accurate to say that languages like Algol got call stacks because they were needed in order to implement recursion.

mtreis86 · on May 4, 2020

Turing's paper in 1939 describes a machine whose operation is recursive.

tuyiown · on May 4, 2020

Well, the infinite memory tape was quite a concern for actual programming

bArray · on May 4, 2020

How Recursion Got into Programming (2014)

https://vanemden.wordpress.com/2014/06/18/how-recursion-got-...

gowld · on May 5, 2020

It's more accurate to say that recursion got into recursion.

woliveirajr · on May 4, 2020

There's some discussion here: https://news.ycombinator.com/item?id=23061881

chadlavi · on May 4, 2020

it did it by getting into programming

Aperocky · on May 4, 2020

but how

taneq · on May 4, 2020

Well, it all started when they used recursion.

eru · on May 4, 2020

It's useful if your recursion does some work before recursing. (Just like your loop bodies should do something.)

smabie · on May 4, 2020

What's interesting is that church numerals don't do anything except "recurse." So calling a function that does nothing is useful after all!

eru · on May 5, 2020

Church Numerals apply the function you give them, n-times. That's hardly nothing.

bregma · on May 4, 2020

Very poorly worded title, but I guess it's more click-baity than "A History of the Personal Politics Involved with the Promotion of Alternative High-level Computer Programming Language Designs Among European and American Academics in the 1950s and 1960s". Maybe a little too dry?