I’ve taken some graduate courses in programming languages where I’ve learned abo...

yaseer · on Sept 3, 2019

It's an interesting question that poses interesting problems.

The main problem is that beta reduction is an awkward operation from the perspective of computational complexity, which is a fairly fundamental concept in algorithm analysis. There's a discussion of that here:

https://cstheory.stackexchange.com/q/41776

P.S: Purely Functional Data Structures is a great book. It's focus is more on the implementation details of algorithms in a functional language (ML), and the necessary abstractions.

It's not an analysis of algorithms within a formal model, like lambda calculus.

olooney · on Sept 3, 2019

Right. The naive way to do complexity analysis of a lambda calculus algorithm is to count β-reductions; but it's immediately clear that the cost of a β-reduction (which is basically a string substitution) is proportional to the length of the program.

Here is a simple lambda calculus program:

    ((λ x . (x x)) (λ y . y))

If we apply the β-reduction rule once, we obtain:

    ((λ y . y) (λ y . y))

However, in order to perform this operation, we have to imagine the program as a string and stepping through the entire program symbol by symbol, either copying it unchanged (if it's not "x") or replacing it by "(λ y . y)" (if it is "x".) This in an O(m) algorithm, where m is the symbol length of the program. That is how a Turing machine emulating the lambda calculus would have to do it, for example. Surely we should prefer to analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine, no?

Space complexity is also an issue; if our program expands to m symbols, surely space complexity is at least O(m) as well?

Worse yet, β-"reductions" don't necessarily make the program shorter; and it can expand to arbitrary size before it starts to get smaller. It's not usually obvious how m (length of the program) is related to n (number of elements acted on, in the usual O(n) sense.)

An "awkward operation from the perspective of computational complexity" indeed!

vilhelm_s · on Sept 3, 2019

That's not immediately clear at all!

A β-reduction is basically the same as a function call, we know how to implement those, and they take constant time. Your example could be directly interpreted as a program in e.g. Lisp or ML, and there the function call is definitely not implemented by stepping through the entire program symbol by symbol.

I think the issue in the stackoverflow post above is that the lambda calculus is nondeterministic, so if you try to define the cost as the smallest number of β-reductions to reduce a term to a value, then you're in trouble because it's in general very hard to figure out what the optimal way of reducing a program would be. But that also shows that it's a completely unrealistic model. Surely what we should do is pick an evaluation order, e.g. call-by-value, and then count the number of reductions there. This corresponds to what actually happens when you run your program on a real computer.

In fact, CBV evaluation is really well-behaved, and you can given an extremely intuitive cost semantics to such a lambda calculus: see section 7.4 in [1], and also see [2] to show that this semantics is sound in the sense that you can compile it down to an abstract machine respecting the cost. (Incidentally, this is something that Bob Harper has been talking about for a long time, see e.g. his blog post [3] for some polemics.)

If anything, I think what this argument shows is that Turing machines are bad for complexity-theoretic purposes. We do not want to "analyze our algorithms in terms of atomic operations that can be implemented in O(1) on a Turing machine", we want to count atomic operations that can be executed in O(1) time on the actual computers we run stuff on. So Turing machines are ok if you just want to distinguish polynomial from non-polynomial time, but for finer-grained distinctions you need a more accurate model, like a RAM, or... like the lambda calculus!

[1] https://www.cs.cmu.edu/~rwh/pfpl/2nded.pdf [2] http://www.cs.cmu.edu/~blelloch/papers/BG95.pdf [3] https://existentialtype.wordpress.com/2011/03/16/languages-a...

olooney · on Sept 4, 2019

Consider the following lambda calculus expression:

    ((λ x . ((x x) (x x))) (λ x . ((x x) (x x))))

Under β-reduction, this blows up: doubling with every reduction, never repeating itself, growing infinitely long. Let's name this expression "BlowsUp". Recall that the Church encoding for "false" is "(λ x . (λ y . y))". Recall that this acts like an "if/else" statement, taking two arguments and always returning the second, discarding the first. Then its clear that this expression:

    ((false BlowsUp) (λ y . y))

or more explicitly,

    (((λ x . (λ y . y)) ((λ x . ((x x) (x x))) (λ x . ((x x) (x x)))) ) (λ y . y))

should evaluate to "(λ y . y)", the identity function. However, under a call-by-value evaluation strategy, which is eager, the value BlowsUp will need to be evaluated first. Only a lazy evaluation strategy will which first the "false" expression (thereby discarding the part of the program containing BlowsUp entirely, without evaluating it) will succeed.

This is an extraordinarily important case, because recursive functions are typically written with two branches: a base case that terminates, and a recursive case that goes on for ever. For example, consider everyone's favorite recursive function, factorial:

    def factorial(n):
        if n <= 1:
            return 1
        else:
            return n * factorial(n-1)

Translated into the lambda calculus via the Y combinator, this (and pretty much every other non-trivial recursive function) will exhibit the same property of the above, and will never terminate if evaluated under a strictly eager, call-by-value strategy. Therefore, specifying the call-by-value evaluation strategy is not useful for the purposes of evaluating non-trivial algorithms, which almost always involve recursion. (In the lambda calculus even unbounded iteration must be implemented via recursion so this excludes basically all algorithms of interest to complexity theorists.) Note that languages which use call-by-value for function calls usually have a separate, built-in construct for if/else or && (short-circuiting boolean AND) which allows the programmer to decide when they want to use the "lazy" strategy; the lambda calculus unfortunately is too elegant to admit of such a pragmatic distinction. :)

There is of course a so-called "normal" evaluation strategy for lambda calculus (which is guaranteed to work) but it is basically "call-by-name" (to contrast it with the call-by-value nomenclature) and it has the problems I describe above and is hard to analyze.

https://sookocheff.com/post/fp/evaluating-lambda-expressions...

vilhelm_s · on Sept 4, 2019

It's true that some constructions that work under normal/CBN order will go into an infinite loop under CBV order, but this is easy to work around, typically you just need to add an extra lambda to prevent it from evaluating too eagerly. For example, instead of translating

    if b then e1 else e2

as

    b e1 e2

you can translate it to

    (b (λx. e1) (λx. e2)) tt

where tt is any value (it doesn't matter what).

Similarly, you can make a CBV-version of the Y combinator by eta-expanding some subexpressions to prevent it from evaluating it too eagerly [0].

The exact details of the encoding don't matter too much, because when you actually write programs you want to use some syntactic sugar like "if then else" or "let rec", so you just care that it can be translated into lambda calculus in some way which preserves the operational semantics.

[0] https://en.wikipedia.org/wiki/Fixed-point_combinator#Strict_...

whatshisface · on Sept 3, 2019

What if there is a way to write lambda calculus programs that makes beta reduction "easier" in that sense? We are used to reading things like ((λ y . y) (λ y . y)), but there may be another notation with better properties. If there was, we wouldn't even have to use it, because simply the fact that it exists would justify using beta reduction counts as a measure of complexity.

jcora · on Sept 3, 2019

But this is clearly not a property of the particular notation, but lambda-calculus itself (and the way beta-reduction is defined). If you were to make it a constant-time operation in length it literally wouldn't be lambda-calculus.

Which actually makes me curious: what sort of change did you have in mind?

whatshisface · on Sept 3, 2019

If you could quickly translate a lambda-calculus expression into a program written in some other model of computation (that was easier to simulate on a Turing machine) and back, then you could do the complete reduction in the other model, only using lambda-calculus as your input and output. To perform the entire reduction you would still have to scan the expression in O(n), but it wouldn't necessarily be O(n * number of reduction steps). There could potentially be a relationship between the number of beta reductions and the number of steps required to execute the reduction in the other model, in which case beta reduction count would be justified as a measure of time.

jdyb · on Sept 3, 2019

Perhaps the work on binary lambda calculus and Kolmogorov complexity by John Tromp would be interesting to you?

http://tromp.github.io/cl/cl.html

"Functional Bits: Lambda Calculus based Algorithmic Information Theory" http://tromp.github.io/cl/LC.pdf

bodhi · on Sept 3, 2019

It’s not exactly what you were after, but there’s a paper called “The Expressive Power of Programming Languages” that might be worth a look:

https://www.sciencedirect.com/science/article/pii/0167642391...

reuben364 · on Sept 3, 2019

I found the OPLSS lectures on parallel cost semantics using lambda calculus quite enjoyable, but I don't really know much about it.

https://www.youtube.com/watch?v=lSc0EtF6_Zk

pwm · on Sept 3, 2019

(On my phone walking so very quick and dirty post) purely functional DSs in general will cost you an extra O(log n) cause of their underlying tree representation for immutability (immutability achieved by copying paths in the tree to retain old data). Also some DSs are very difficult to express in a purely functional way.

adjkant · on Sept 3, 2019

While true on the first part, these days many cases don't have insane primary efficiency needs that would be affected by that log(n). As always though, you choose the tools based on the job at hand, not the tools. In my experience, functional philosophy often is about the gained mental and consistency advantages achieved by the features offered.

pwm · on Sept 3, 2019

Don’t get me wrong, you are preaching to the choir (I write Haskell at work), just thought it’s worth mentioning this to the OP. In practice sophisticated compilers like GHC produce insanely fast binaries and is getting better and better.

continuational · on Sept 3, 2019

Array indexing is not O(1) on any modern machine due to the cache hierachy.