Notes on Haskell: What's Wrong with the For Loop

colanderman · on March 9, 2011

That word, closure. You keep using it, but I do not think it means what you think it means.

A closure is an runtime structure used to implement static scope of locally-defined first-class functions. Closures allow locally-defined functions to "remember" variable bindings in their enclosing scope.

Beside that you can implement this feature WITHOUT closures (e.g. source rewriting), none of the examples presented actually require static scope!!

It boggles my mind that programmers can have such strong opinions on language features without actually knowing what they're called.

ScottBurson · on March 9, 2011

Yes, this has been one of my pet peeves for a while too. Everybody: a closure, as the parent says, is an implementation construct. It is not something you can find in your source code. The syntactic construct -- the thing you write in your code -- is called a lambda expression. Not a "lambda function", and not a "closure"!

Lambda expressions are to closures as `new' expressions are to instances: a lambda expression evaluates to a closure, and a `new' expression evaluates to an instance. Since any expression can be evaluated multiple times, in general the relationship is one-to-many: a lambda expression can be evaluated N times to produce N closures, just as a `new' expression can be evaluated N times to produce N instances.

(Instances and closures are closely related: an instance is a piece of state with several operations that can be invoked on it, while a closure is a piece of state with one operation that can be invoked on it.)

Confusion · on March 9, 2011

Your insistence on a specific meaning for the words ignores how words are actually used: overloaded based on context. If I call these things 'closures', you understand I'm talking about the 'lambda expressions' and not the implementation construct.

  class Foo
  end

There, a class object. Oh sorry, an object is an implementation construct; what you have there is an expression that produces an object... <- that is unhelpful semantic squabbling. It's highly inconvenient to use such indirect language. This is the view of a class object that I have most of the time and a 'lambda expression' is the view I have of a closure most of the time.

If I see the Eiffel tower, I say: look, the Eiffel tower. Not: look, an image of the Eiffel tower, with some details obscured by clouds and smoke and without considering any of the construction details and history that are an essential part of the Eiffel tower. For all practical purposes of communication, it's the Eiffel tower.

gsg · on March 9, 2011

That's not a fair example. There are languages that have lambda expressions but not the lexical scope that requires closures, whereas there are no languages that have class definition expressions but not classes.

Given there is a meaningful distinction to be made, insistence on accurate terminology becomes a lot more reasonable.

derefr · on March 9, 2011

> Instances and closures are closely related: an instance is a piece of state with several operations that can be invoked on it, while a closure is a piece of state with one operation that can be invoked on it.

I would go further, and say that they're equivalent—that "one operation" can be a dispatch function:

    def make_object
      x = 5
      lambda do |m|
        case m
        when 'increment'
          x += 1
        when 'decrement'
          x -= 1
        when 'get'
          x
        end
      end
    end

    o = make_object
    o.call('get')       # => 5
    o.call('increment') # => 6
    o.call('decrement') # => 5

dkersten · on March 9, 2011

In fact, in SICP, they build their object system in this way, if my memory serves me correctly.

khafra · on March 9, 2011

Of course. Closures are, after all, a poor man's objects[1].

[1] http://codermonk.blogspot.com/2007/01/venerable-master-qc-na...

msmith · on March 9, 2011

That's a really good explanation.

bretthoerner · on March 9, 2011

I'm not the author, and I think this was kind of a cop out, but:

http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-wit...

"A few people have commented that none of the examples use closures, but in Haskell closures are used not only to represent functions but also unevaluated values (i.e., thunks). For example, in the simple expression

total = sum array

the variable array is free and refers to some value, possibly unevaluated, in the enclosing environment. Until the value of total is evaluated it exists as a thunk that closes over array (and sum, for that matter).

So the examples do indeed use closures.

Cheers! ---Tom"

colanderman · on March 9, 2011

You know, he's totally correct (Haskell being lazy and all) but I think he missed the point entirely :P

zohebv · on March 9, 2011

I know I am repeating a previous comment but the author correctly explains it here

http://notes-on-haskell.blogspot.com/2007/02/whats-wrong-wit...

In Haskell all parameters are lazy by default, which effectively turns every parameter into a closure.

for e.g., consider the code

boost x = map (x) [1..10]

total = sum (boost 2)

On invocation of the sum function, the (boost 2) is not necessarily evaluated. Instead a reference to ((x -> map (x) [1..10]) 2) is passed in to sum. So sum will now end up actually accessing x which has been bound to the value 2. When written this way, the binding of x to 2 is more explicit, but this is the way Haskell evaluates expressions and function calls by default. So (boost 2) might not actually look like a closure, but it effectively is.

Thanks to Haskell's purity, GHC can actually turn this expression into a plain for loop with no lists being constructed. At least in the context of Haskell, this makes discussions of the "runtime structure that represents a closure" meaningless as there may be no run time structures at all, not even a list, much less a closure.

Any value, which doesn't even look like a function/closure could be referencing an unevaluated function with any kind of "variable"[1] bindings.

[1] - Haskell is pure and does not have variables, only constants and functions.

jerf · on March 9, 2011

"But it does highlight the key failing of for loops: they conflate three separate kinds of operations -- filtering, reduction and transformation."

There's actually four kinds of operations: filtering, reduction, transformation, and good Lord man what the hell are you doing with the loop index? did it just go negative? It did! Why? And it still works‽, which is actually quite hard to simulate with functional programming. Fortunately.

sqrt17 · on March 9, 2011

For the WTF kind of operation, 'while' loops are much more idiomatic - generally, you'd want to use a for loop for things that are minimally recursive (i.e., where you can see, before starting the loop, that the loop body will be executed N, or len(L), or whateverNumberItIs times).

rtaycher · on March 9, 2011

I can top that mine went up to 17 and then jumped to 200 for no obvious reason(some sort of memory bug where memory in a different part of the program overlapped with the loop variable).

dkersten · on March 9, 2011

Did i still work? A bug isn't a fundamental kind of loop operation.

Stormbringer · on March 9, 2011

Colour me stupid, but in the article it gives this code as having some horrible hard to find bug:

  String s = "";
  for (int i = 0; i < args.length; i++) {  
    s += array[i];
  }

Now, as soon as I looked at that I immediately thought "well, they're not putting anything in between the params when they concatenate them", e.g. I would expect to see:

    s += array[i] + "\t";

(or a comma or newline instead of a tab)

My next thought was "what happens if there are no args?" E.g. args.length returns 0, and they try to access the first element in the list (due to heritage from c pointers we start counting at 0) which is the 'zeroth' one, which doesn't exist... it should throw an array out of bounds exception.

-----

The reason I ask, is that based on the comments, the people reading the blog think the problem is in the operator overloading. Are they expecting this iteration through the args list to sum the args, and then assign the sum to the string?

If so, then I don't see how the bug could remain undetected:

"it could take weeks to get a bug report, and weeks more to get a fix ready"

Whereas in the unlikey event that I'm right, I still don't see how it would be hard to fix?

What am I missing?

baguasquirrel · on March 9, 2011

Why are you looping over array, when you are checking over the bounds of args? Shouldn't it be s += args[i];

Normally, you loop over the thing you're checking the bounds for.

The fact that this has been sitting on HN front page all day and there's no single, satisfactory answer (I mean, is this a satisfactory answer?) is to me a Key Failure of the for loop. For loops just don't capture very much in terms of semantics.

Why are we befuddled by what this piece of code does? You can get into stupid pedantry about what a closure is (yes, looking at the top comment here), and various other details, but that misses the point of the article, which is that for-loop are practically an anti-pattern.

As a community, we've long realized that functions like fold, filter and map are good things, better than our equivalent of the for-loop, which is recursing over the data-structure. You better have a solid reason to write your own pattern match over a list. In all my years of writing functional code, I've only once had to write a function that recursed over a list, and boy was it a regrettable function.

GrooveStomp · on March 9, 2011

Well put.

I see the comments here which argue about what a closure really is (and there is a lot of conflicting info out there. Even Douglas Crockford explains it differently than the comments I've seen on this article) when the point of the article is that for loop are an anti-pattern - exactly as you say.

This realization really hit home for me when I started using Clojure and kept trying to figure out how to do a for-loop. Once you wrap your head around map, reduce and filter, all of a sudden you realize how expressive a programming language really can be.

ddfisher · on March 9, 2011

"E.g. args.length returns 0, and they try to access the first element in the list (due to heritage from c pointers we start counting at 0) which is the 'zeroth' one, which doesn't exist"

I'm not sure if you're making some subtle point here that I'm missing, but it is not the case that any of the elements of array will be accessed by that code if args.length is 0. If args.length is 0, then the exit condition (which is evaluated before entering the loop body for the first time) will be false, and the loop body will not be executed at all.

jawn · on March 9, 2011

There is no check that array[i] actually exists. If array[] has 2 items but args[] has 5, you will overflow the array.

fergal_reid · on March 9, 2011

I've been trying to figure out what he was getting at, as well. The original article, by Elliotte, seems to have updated the code snippet to:

' String s = ""; for (int i = 0; i < args.length; i++) { s += args[i]; } '

I deduce from this that the error was in the copy/paste. The line that was supposed to be changed to 's+= args[i]' was accidentally left as 's+= array[i]'

As such, its a simple failure to replace all instances of 'array' in the pasted code, rather than anything to do with 'operator overloading', or even to do with checking if 'array[i]' was out of bounds - there is simply no variable 'array' supposed to be in that context.

I wish Turoff had been less obtuse about it, especially as the original Elliotte post seems to have been silently updated.

Stormbringer · on March 9, 2011

In the interest of being fair, it is (by internet standards) quite an old blog post, which was probably a bit of a toss off effort in the first place.

If 'the bug' is just a copy/paste problem of a variable that doesn't exist in that scope, then compilation will fail at that point, so again, I don't see how that could be the one that Turoff said would survive for weeks in the wild.

kscaldef · on March 9, 2011

But, if the bug is a copy/paste of a variable that does exist in that scope, he's exactly correct that it could survive for weeks and be very difficult to actually track down (because of the sort of blindness to details that happens when you read "idioms" like this by visual pattern matching rather than actually reading the code).

Stormbringer · on March 9, 2011

D'oh!!! You're right!

I was mentally substituting args[i] for array[i]

Talk about code blindness. What a colossal blunder.

artsrc · on March 9, 2011

I up voted you showed so wonderfully that forcing an fallible human to correctly specify the same thing twice can result in errors.

But I don't think errors are the biggest issue. The biggest issue is that this kind of stuff should be getting out of our way so we can focus on other things.

Stormbringer · on March 9, 2011

"What am I missing?"

What a noob!

  for (int i = 0; i < args.length; i++) {

If args.length is zero, the body of the loop won't run, since 0 < 0 is false. Duh.

T-hawk · on March 9, 2011

Parent didn't need the downmods - it's the same poster replying to himself with "what a noob". I was about to post this same answer till I saw this at the bottom of the replies.

Stormbringer · on March 9, 2011

"Parent didn't need the downmods"

It's okay. I continue to be baffled and amazed at what gets voted up and what gets voted down, usually it balances out, so if something gets unexpectedly voted down, something else will get unexpectedly voted up.

Dn_Ab · on March 8, 2011

I feel this post is more an argument for first class functions than closures. It is true that much of the usefulness of first class functions is lost without closures, so having useful first class function almost necessitates (though does not guarantee) closures. But in none of the examples is the key attribute of closing over free variables emphasized, which is the powerful and dangerous part.

A simple example showing the usefulness of inner functions might have been more pertinent to the term closure IMO.

Stormbringer · on March 9, 2011

It is a pretty convincing argument for adding transformation (do this thing to every item in the list), and filtering (give me a sub list of everything that meets this condition)

It is a much weaker argument for adding summation, since you'd have to get into the down and dirty details of whatever operation your summation represents, e.g. is it associative? You can see how it get ugly fast in the discussion about the differences between foldl and foldr (presumably fold-left and fold-right respectively)

----

Now, Java had filtering in one of the libraries, I remember using filtering with folder operations many moons ago, so it's not like you can't roll your own. Likewise with a "do this thing to everything in the list" operation.

For loops have a use, specifically when you want to do things in a particular order. Reading lines from a file for instance you might want the lines you're putting into your data structure to be in the same order as they are in the file (as a specific counter-example, I once wrote a 'unique line' app where it didn't care about the ordering, of course if I'd been on unix there would have been some command line tool to do it for me, but it was fun and fast to write and hand tuning the grep would have been a PITA)

BUT, there are a lot of tiems I've used for loops where the sequential nature of the loop is irrelevant, and here I could see someone making an argument that the for loop is clumsy and random (from the article they object because it uses too many tokens! The horror! :D )

I could also see someone making the counter-argument that what that actually means is that the for loop is powerful (never mind that we have no objective standard for what that means. Perhaps 'powerful' in the sense of 'easy to blow your own foot off with' would fit here :D )

joe_the_user · on March 9, 2011

OK,

Let's imagine there are two kinds of language/program bugs.

1) A given expression does not mean what you think it means. 2) A given series of expression all mean what you think they mean but the combination of them isn't what you think it should be.

I would submit that bug type #2 is the main sort of bug that's going to bite you and bug type #1 is relatively benign (if occasionally infuriating). Especially, even in language with obscure syntax and hairy ambiguity, you can pepper your code with asserts and track down your problem reasonably quickly.

I have been assure that functional programming does help with bug #2. I would love some detailed blog post akin to this which says how.

This article seems to deal entirely with bug type #1. Show me how functional programming deals with bug type #2.

mapleoin · on March 8, 2011

Heh, all of these points could easily be applied to Python as well. All the concepts are there in addition to the for loop. It's nice how you can do a lot of functional programming in Python with these simple tools (map, filter and reduce (Haskell's foldl).

I wonder why there aren't more people advocating the use of these functions in Python programs instead of for loops.

xi · on March 9, 2011

I, for one, prefer fors and ifs to high order functions when reading and writing Python code. I use the latter very sparingly and only when it does not violate the overall "pseudocode" feel of the code. For that, generator expressions and functions like `any`, `all`, `sum` are of great help.

For example, I have nothing against the code like this:

    if any(foo.is_cond for foo in foos):
        do(bar)

but I cringe when I see `import functools, operator`, `reduce` or complex nested list comprehensions.

It may work better in ML-derived functional languages because function definitions there, both anonymous and named, are extremely lightweight. This part of the language is highly optimized by necessity as high order functions is the only way to express iteration and other control flow constructs. But in Python, though not particularly heavyweight, function definitions don't mix well with regular code.

Also I'm not sure if the author is completely fair with his loop example. Add a few local variables, `break` or `continue`, and the high order form may actually become less intuitive.

tel · on March 9, 2011

Guido doesn't like them at all and has been trying to get them out of Python for a long while now.

http://www.artima.com/weblogs/viewpost.jsp?thread=98196

RyanMcGreal · on March 9, 2011

I rarely use for loops in Python. Most of the time I use list comprehensions, which are much more terse and expressive than loops and, at least to me, are easier to parse than map/filter/reduce functions.

KirinDave · on March 9, 2011

Funny, that. List comprehensions are actually sytactically and semantically more difficult than folds, modulo your logic.

Honestly, I suspect the Haskell style and the curious Python cultural disdain for real lambadas have far more to do with your confort than any absolute metric of comprehension.

skm · on March 9, 2011

Anyone who has disdain for real lambadas just doesn't know what they're missing: http://www.youtube.com/watch?v=5AfTl5Vg73A

pjscott · on March 9, 2011

Technically, Python's reduce function is equivalent to a duck-typed version of Haskell's foldl1' function. It's strict, and it has no starting value argument.

(Nitpicking forever!)

rwl · on March 9, 2011

Well, as long as we're nitpicking...

Python's reduce function has an optional starting value argument: http://docs.python.org/library/functions.html#reduce

T-R · on March 9, 2011

I've gotten the impression that there are, in a sense - in python, list comprehensions are usually preferred over for-loops in places where you would use map and filter, and the most common use cases for fold/reduce are covered by functions like sum and join. List Comprehensions also tend to be preferred over higher order functions, perhaps for readability, but also, in python 3, the higher order functions return generators, which don't always play nicely with mutable data.

For-loops are still suitable for actions (which mutate state or for some other reason need to be evaluated in order) - you see this in Haskell, too, though, with Control.Monad's forM_ and the like.

athom · on March 9, 2011

After reading this, I got to wondering if I could hijack sum into joining lists. My first attempt didn't work:

  sum([ range(n,n+5) for n in range(5) ])

just gave me "TypeError: unsupported operand type(s) for +: 'int' and 'list'", which didn't make a lot of sense. Where did I pass an 'int' by itself? Well, according to the help, sum takes a second argument: a starting value, which defaults to 0. So, I wondered, what if I were to pass it an empty list...?

  sum([ range(n,n+5) for n in range(5) ], [])

came back with:

  [0, 1, 2, 3, 4, 1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8]

Holy $#!7

I'd been wanting a way to join multiple lists without having to write another ugly little function! Anyone know if this particular trick is common practice?

Erwin · on March 9, 2011

I've seen it used for a long time. The only problem with it is the quadratic time complexity: it's equivalent to ((N0 + N1) + N2) + N3 ... so you copy the same elements over and over again as the sum builtin doesn't know to use anything but the + operator to build each element. (I vaguely recall some discussion about making it more efficient on sequences of sequences).

A more efficient version would be: def sumseq(l): r = []; map(r.extend,l); return r

which would also work on any type of sequence as input, even if mixed. A quick test shows the quadratic time complexity being noticeable at about 128 items, where sumseq is 10x as fast and 1751x as fast at 16384 items.

tkahn6 · on March 9, 2011

That's actually quite clever.

athom · on March 9, 2011

Thanks, but I got a little update: it won't work on strings.

The help says as much, in fact. Actually, what it says is, "Returns the sum of a sequence of numbers (NOT strings)," which I think is meant to imply it won't parse numbers out of strings, but in theory, sum should be able to concatenate strings as easily as it does lists, with the right start argument. Instead:

  >>> sum([['room315'], ['room2']], [])
  ['room315', 'room2']

but:

  >>> sum(['room315', 'room2'], '')
  TypeError: sum() can't sum strings [use ''.join(seq) instead]

Say, what???

I'm not sure, but I think sum is specifically watching for strings, and throwing a TypeError if it finds one. Otherwise, if it's simply using the + operator internally, as its list-handling behabior seems to imply, it should mash strings together just as happily! Oh, well. The join function is the right one for that job. It just seems a bit of wasted effort, to me.

Dn_Ab · on March 9, 2011

Java's younger cousin C# also has this with Select, Where and Aggregate. But sometimes the foreach and even the for loop actually makes more sense, in particular when dealing with very imperative algorithms or are not acting over a (full) sequence at all.

I'm usually only reading and not writing python but I think list comprehensions are more idiomatic. Based on some comments I've seen here I also kind of get the impression comprhensions are abused.

Fargren · on March 9, 2011

Well, if you don't want to work with the whole sequence, you can filter it. But yeah, in those cases and some others, the for loop may be more idiomatic and better performing.

Dn_Ab · on March 9, 2011

Yes I'm talking about being able to stop early for one example. Or if the thing you are dealing with is not actually a sequence for another.

darklajid · on March 9, 2011

The former can probably be solved for lots of cases by TakeWhile(), which evaluates before consuming more elements and could be used to replace "abort the loop" patterns.

The latter is a different thing..

1: http://msdn.microsoft.com/en-us/library/bb534804.aspx

Stormbringer · on March 9, 2011

"Java's younger cousin C#"

We prefer the terms:

"brother by a different mother"

or if they're being naughty:

"red headed step-child".

genera · on March 9, 2011

s/naughty/stupid

scotty79 · on March 9, 2011

You can execute loop in your head just by reading it line by line. Token by token. With other solutions you need to know other things to determine what goes inside and believe it is actually what you want it to be. Loops ar flat, explicit and versatile. I think that is the reason behind their popularity.

Symmetry · on March 9, 2011

If you're familiar with foldr or its ilk you can execute them in your head token by token too, to the same extent you can in a for loop. In the case of a for loop you have to be familiar with the syntax and semantics of "for" in your language so you know that "for(i = 0; i<n; i++)" is different from "for(i = 0; i++; i < n)" and likewise you have to be familiar with the syntax and semantics of any functional construct you use.

scotty79 · on March 9, 2011

To understand loop you just need to know in which order to execute its parts. And you see how it reduces, maps an filters your data. You don't need to keep stack in your head. To understand how functional constructs work you have to know a lot more because they are specialised and it's not that easy to track what they exactly do because of recursion. You have to know them and trust them. With loop all internal logic is linear and in plain sight.

Symmetry · on March 9, 2011

Its certainly true that to understand a fold at the abstraction level that C uses you have to know more, but that isn't the same as you having to know it in order to program effectively. The point of abstraction is that you don't need to know the fine details of whats happening under the hood. You might talk about how, in a C program you can watch what happens to your loop variable as you execute, but really "variables" are just abstractions over what the machine is really doing. If you were to look at the assembly spit out by your compiler that loop variable might be in a register or on the stack or in some combination thereof, and if your compiler does loop unrolling it might have different values at the same time even in the assembly. And once you get into the reorder buffer or a modern CPU I guarantee you that what actually happens will bear very little resemblance to what you would naively think if you just looked at the C code.

So you can execute a loop C-token by C-token and you can execute a fold Haskel-token by Haskel-token, but both are abstractions high above whats happening at the machine level and I don't see why we should prefer one over the other.

mathgladiator · on March 9, 2011

The real reason: it's much work to prove loop invariants that use structural induction.