Hacker News new | past | comments | ask | show | jobs | submit login
Functional programming with Python (pycon.org)
97 points by Swizec on Oct 27, 2012 | hide | past | favorite | 49 comments



It seems to me if you want to write code like this, you shouldn't write it in Python. Python has much simpler, built-in ways to do most of the things he's suggesting. For instance, he says "good" for this code:

    reduce(operator.add, map(len, ss))
No! The simpler (and more efficient, as it doesn't have to build a list of results!) way to do that is using the built-in sum() with a generator expression:

    sum(len(s) for s in ss)
The rest of the slides are full of similarly-complex idioms. Another one:

    def square_sum(a, b):
        return sum(map(lambda x: x**2, range(a, b+1)))
Please, no! Generator expressions are there for a reason. Much simpler and much more efficient would be:

    def square_sum(a, b):
        return sum(x**2 for x in range(a, b+1))
In fact, most places you see map() in Python can be rewritten as generator expressions.

For stuff like functools.partial, in simple cases, isn't it clearer to just write this?

    def debug(message):
        log('debug', message)
I like his use of operator.itemgetter/operator.methodcaller, though. They're useful for sort keys and stuff, and faster and often clearer than lambdas.

Also: namedtuple is great instead of "using classes as attribute containers".


> ...faster and often clearer than lambdas.

    $ python -m timeit 'import string;from operator import itemgetter;[itemgetter('slice') for x in string.letters]'
    100000 loops, best of 3: 14.2 usec per loop
    $ python -m timeit 'import string;from operator import itemgetter;[lambda x: x.slice for x in string.letters]'
    100000 loops, best of 3: 10.4 usec per loop
> Also: namedtuple is great instead of "using classes as attribute containers".

Er, OK. Just don't look at the implementation[1]. Cannot be unseen.

[1]: http://hg.python.org/cpython/file/2.7/Lib/collections.py#l23...


You're measuring the wrong thing. itemgetter() is supposed to be called just once and it's the returned function that is used in the loop/map. It should be:

    python -m timeit 'import string;from operator import itemgetter; getter=itemgetter('slice'); [getter(x) for x in string.letters]'


True. But you should also put the setup in a -s option, because you're not timing the imports. And you don't really need the loop, because timeit does a bunch of loops for you:

    >python -m timeit -s "from operator import itemgetter; elem0 = itemgetter(0)" "elem0('s')"
    10000000 loops, best of 3: 0.095 usec per loop

    >python -m timeit -s "from operator import itemgetter; elem0 = lambda x: x[0]" "elem0('s')"
    10000000 loops, best of 3: 0.138 usec per loop


My eyes... So a named tuple is just a class, nothing special about it really


I may be going out on a limb here, but I don't think he's actually suggesting using `reduce(operator.add, ...)` instead of `sum()`. He argues himself a few slides later that short functions are always better than long ones. If you look at the title of the slides, I think he's simply presenting what FP is: a chain of maps, filters and folds.

It's true that Python will use a list comprehension or a generator expression where you would normally do this, but Python 3 still includes map() and filter(). (Despite Guido not being a big fan of these functions - http://www.artima.com/weblogs/viewpost.jsp?thread=98196).

My point is simply that the presentation does a good job at exposing what FP is and how to implement it in Python. Moving from maps to comprehensions comes naturally, once you get the hang of the idea.

PS: I have the same question about functools.partial. Can somebody answer this?


This example was given to describe map/reduce pattern, not advice to use it instead of sum:

    reduce(operator.add, map(len, ss))
You know, it's hard to understand new concept. And it's twice harder to explain new concept with sophisticated code examples. So I used as simple code and situations as possible. The same for square_sum - it's easier "to see" functional pattern in map(f,l) than in (f(x) for x in l).

Regarding to debug function and partial function application, - it depends. As for me

    debug = partial(log, "debug")
is easier and meaningfully describe what's actually going on. But here we have only one function, what if you have to create 5-6 same functions? Would you prefer "def" syntax instead of partial? I have one more example of using this pattern in more sophisticated situation - code listing for dictionary binding to list of functions, "partial" looks really naturally.


i started learning functional programming with python, it falls apart at ~300 loc which is absurdly low, which makes python basically useless for production FP.

python's data structures are mutable, and mutable data structures necessarily expose a different interface than immutable data structures. using mutable datastructures in immutable style necessarily has severe performance penalties; and to do meaningful stuff you need to use a statement, and you can't put a statement in a python lambda. And Guido thinks lambda's are un-pythonic and a misfeature which he seems to regret [1].

python is a great sandbox for imperative programmers interested in FP, and these are great slides. it's awesome to play with these concepts in a familiar language - thats how i started - but its important to note that FP is just not pythonic, and thus it's probably a mistake to put functional-style python into production.

[1] http://news.ycombinator.com/item?id=3962316

if you are interested in functional python, i wrote about it a bit this year in my blog, because despite all this, python is a nice language for evangelism aimed at imperative developers.

[2] implement generators without yield http://www.dustingetz.com/2012/10/03/implement-python-genera...

[3] easy monads http://www.dustingetz.com/2012/10/07/monads-in-python-identi...

[4] hard monads http://www.dustingetz.com/2012/10/02/reader-writer-state-mon...

[5] peter norvig's Lispy refactored using monads https://github.com/dustingetz/monadic-interpreter


I agree. I've written a fair bit of python and I lean fairly far towards the FP style. However it feels like ice skating uphill compared to using a language that has FP baked in.


why doesn't your blog have an RSS feed?


Forgot one big con: Python lambda functions aren't "proper" functions (ie. arbitrary multi-line code blocks). Guido van Rossum has addressed this many times, saying (in effect) "it adds undue complexity and would be un-Pythonic", eg:

http://www.artima.com/weblogs/viewpost.jsp?thread=147358

This is one of my (very few!) gripes with FP in Python.


And you can't pass them as simple functions to do stuff via the multiprocessing module because they can't be pickled. I don't understand this infatuation with FP in Python. To me the essence of functional programming is immutable state. Python has mutable state to the core. List comprehensions are novel syntax for many FP-esque operations (so much so that map, reduce, & filter were almost removed from Python at some point) yet, to their core they're full of mutable-state disgustingness. For example:

    >>> a = range(12)
    >>> [a.pop(i) for i in range(3)]
    [0, 2, 4]
    >>> a
    [1, 3, 5, 6, 7, 8, 9, 10, 11]
As long as list methods have methods that change state of the object, even when processing said object, Python handles FP at a very superficial syntactic level.


Sorry, what's your complaint about list comprehensions? That code does exactly what I would have expected it to. The mutable-state disgustingness is only there because you're explicitly calling a function with (nonsurprising) destructive properties; you could do the same in any lisp.


You could always use tuples; they're essentially immutable lists.

  >>> a = tuple(3*x for x in range(5))
  >>> a[2]
  6
  >>> map(lambda x: str(x), a) #can be iterated
  ['0', '3', '6', '9', '12']
  >>> a[2] = 9
  TypeError: 'tuple' object does not support item assignment
  >>> a.pop()
  AttributeError: 'tuple' object has no attribute 'pop'


Yea, but still gross. map() isn't even an endomorphism.


I'm not sure what you mean. map is not supposed to be an endomorphism, rather it takes a function f and returns a list homomorphism.


I think it'd be nice if it were an endomorphism, though; Making map(f, tuple) -> tuple, map(f, list) -> list, etc. would save users from having to write tuple(map(f, tuple)) (which is something I've done a bunch).


Oh, you just meant you wanted lambda x: map(f,x) to be an endomorphism for any f (you said map itself should be an endomorphism which confused me). Well, it already does what you want: lambda x: map(f,x) takes an iterable and returns an iterable. :)


Fair enough (: I'm not the parent, by the way -- I was just taking a stab at what the parent might mean.


I don't know if this is a problem if you're going hardcore with your FP -- one line is one expression, and multiple expressions are probably going to be stateful, which should probably be factored out into something with a name.

In Haskell, the only places (that occur to me right now) where you'll have multiple lines are:

* Pattern matching function arguments

* Giving things names with `let` or `where`

* In a monadic `do` block, which is just sugar that reduces to single-expression chain of lambdas

And all of these are language design decisions that require more than just multi-statement lambdas.


...or anywhere you're using a 'let' statement. I write a lot of OCaml, and I write a lot of multi-line anonymous functions that aren't stateful. I think it's less common to do this in Haskell, but that's more because it's Haskell than because there aren't any side effects.

Random other tidbit that doesn't deserve its own comment:

In Python, instance.method is the same as doing ClassName.method(instance). This can be helpful when using map/filter, because you get to do map(ClassName.method,lst) instead of map(lambda x: x.method,lst).


Or you can do operator.methodgetter("method") and not need to know the class name.


operator.methodcaller('method') #ftfy


NOTE: Ooops, it seems the 'edit' window expired on my comment. Expanding, unfortunately, in a separate comment:

----

I don't know if this is a problem if you're going hardcore with your FP. When we're talking about "lines" in Python, one line is exactly one statement, and if you're doing FP, that statement is an expression. Multiple statements are going to be either stateful, or declaring a temporary name for an intermediate value.

I am probably not being sufficiently imaginative right now, but I can't think of any fully-fledged functional pattern which would require multiple statements in a lambda, which would not be better abstracted out and given a name, and I would genuinely like to hear some examples of cases where a multi-statement lambda make for "better" code (or to hear how my presentation of this problem might be flawed, because I haven't done any serious Python programming recently enough to be very close to the issue).

----

In Haskell, a language where the syntax has been exhaustively designed to facilitate FP, the only places where you'll see multiple lines (I'm pretty sure this is exhaustive, without using GHC extensions, eg. Arrow `proc` blocks) are:

* Pattern matching function arguments

* Giving things names with `let` or `where`

* In a monadic `do` block -- which is just sugar that reduces to single-expression chain of lambdas, and are not equivalent to multi-statement lambdas


I fail to see how this makes them non-proper functions, a better way to put it is they are a subset of possible functions.


Yes, single-expression functions are a subset of all possible functions... but that's not exactly what I was getting at. I put "proper" in quotes just to signify that they have subtle differences that, from a purely type-based abstraction, are different from named functions. For example, you can't serialize (pickle) a lambda function.


Note that `reduce` has been, if not deprecated, then at least discouraged (not that I agree with this, but reduce gets taken away in Python 3, so it's probably good form to stop using it). The author does `reduce(operators.add, lst)` a lot; you can do that using `sum(lst)` instead.


I was writing a really long response to your post, explaining how reduce/inject/fold is abstracting out an absurdly common iteration pattern, and using it can expose structural similarities between many superficially different computations, and how (unlike map and filter) it can't be expressed as a list comprehension, so removing it was super lame.

But in the process of translating a comment I wrote in another forum from Ruby to Python, it became apparent that, as long as it's pythonic to implement `__add__` for any monoidal type, the resulting polymorphic `sum`, together with list comprehensions, will cover pretty much everything[0].

So I'm actually coming around to Guido's side here.

----

[0] Assuming there's a consistent __mul__/product for alternative monoid instances, and with `all` and `any` to handle numerically-cast booleans.


If __add__ is taken to be any associative operation of a monoid, then Python's sum is merely equivalent to Haskell's mconcat, which is not nearly as general as reduce/foldl. Plus, unless you implement something akin to newtype wrappers for alternative monoid instances (hardly Pythonic), you're going to be restricted to a single foldable function for any given type.

I think you were right the first time; removing reduce from Python is very lame.


> If __add__ is taken to be any associative operation of a monoid, then Python's sum is merely equivalent to Haskell's mconcat, which is not nearly as general as reduce/foldl

I don't have the math to back this up, so please correct me if you know otherwise (and especially if you can demonstrate otherwise, though I know this being wrong is not contingent on that), but: I'm pretty sure `mconcat` with a `map` (or a list comprehension) is exactly as general as `fold`.

> Plus, unless you implement something akin to newtype wrappers for alternative monoid instances (hardly Pythonic), you're going to be restricted to a single foldable function for any given type.

Again, I know this isn't an argument and demonstrates shitty memory on my part, but a single solid example here would be key to me understanding this debate:

What is an example of a type with more than 2 monoids on it, whose fold could not be cleanly expressed with a sum/product and a map/comprehension, which is not already included in the standard library (any/all/gcd/lcm/max/min)?

[I suspect there either won't be an easy answer, or it will be so basic that I'll be brutally embarrassed... this is all kinda talking out of my ass, I have no exposure to category theory outside of reading haskell papers]


I don't know much category theory either; my argument is based solely on my knowledge of Haskell.

Maybe I'm being obtuse, but I don't quite understand what you mean when you say "mconcat with a map". But to see why mconcat isn't as general as a fold, you need only to take a look at the types:

    mconcat :: Monoid a => [a] -> a
    foldl :: (a -> b -> a) -> a -> [b] -> a
mconcat is constrained to only operating on monoidal types, whereas foldl has no such constraint. Likewise, mconcat must always yield a result of the same type as the list elements, while foldl is capable of accumulating a result of any type.

mconcat can be implemented in terms of a fold, e.g.

    mconcat = foldl mappend mempty
Such a partially applied foldl is obviously less general than an unapplied foldl, as its first 2 arguments are fixed. There is no way to implement foldl (or even foldl1) in terms of mconcat.

There are plenty of uses for folds that don't involve monoids, and therefore can't be implemented with mconcat plus a Monoid instance. As a trivial example, take

    Prelude> foldl (/) 400 [4, 4, 5]
    5.0
There is no "quotient monoid", because a monoid is defined as a type with an associative binary operation and an identity element with respect to that operation. Division has a right identity (1), but it isn't associative, so it can't be used as a monoidal operation like addition and multiplication can. But foldl still works fine in this case, as the above example shows, where mconcat would not (unless you write a Monoid instance that violates the monoid laws).

Associativity notwithstanding, it's also not adequate to say that 2 monoid instances should be enough for any type. Maybe that is true in some or even most cases, but it is certainly not true in the general case. Haskell has newtypes to mitigate this problem, essentially enabling an arbitrary number of Monoid instances to be defined per type, but Python only lets you write one __add__ and one __mul__ for a given class. On top of that, I don't believe Python even has a product function that folds __mul__ over a list the way sum folds __add__, so really the limit is 1, not 2.

Anyway it's all moot because, as someone pointed out, reduce hasn't been removed from Python, just relegated to a library. :) I think it's a fundamental enough operation that it should be built in, but obviously Guido and I have differing opinions on FP.


On the off chance you ever check this again, byorgey discusses this here:

http://byorgey.wordpress.com/2012/11/05/foldr-is-made-of-mon...



In this article[0], Guido says:

> We already have sum(); I'd happily trade reduce() for product(), so that takes care of the two most common uses.

product() never made it in as a builtin; anyone know why?

[0]: http://www.artima.com/weblogs/viewpost.jsp?thread=98196


reduce() is just moved to a library instead of being a builtin, it's not really a big deal. And of course sum() is better, addition is just the typical example for exposition (he later notes that shorter forms are always better). Whether it's good form to stop using it only depends on whether you prefer more explicit for-loops.


Yup. Just pointing out that "idiomatic Python" is to prefer sum() to reduce().


I'm not sure I agree with "don't write classes". This is Python: classes are the state container. That's the structure Python provides for grouping related pieces of information together, and that's the structure all other Python libraries and programmers expect you to use.

Now if you don't want to type the code every time, that's fine: use collections.namedtuple.


A class without methods is just syntactic sugar wrapping around a dictionary of attributes, so in that sense it could be simpler to use the dictionary directly.


The example from the slides was a __str__ method for a data structure, which makes a class the perfect fit in Python. (They did not call it __str__, of course.)


Great job (some ruby code comparisons would have been welcomed, though).

A little mistake on page 48: the Clojure code will fail. Use (map #(* % 2) '(1 2 3)) or (map (partial * 2) '(1 2 3)) instead of: (map #(%*2) '(1 2 3))


Great slides. I've been toying casually with a FP project for Python. It's an attempt to bring common FP abstractions and lazy evaluation of sequences using generators.

It's a work in progress, and I often think to myself, "Is this a good idea".

Here's the latest iteration: I'm in the middle of documenting things for an alpha release: https://github.com/ericmoritz/fp/blob/feature/documentation/...


I'm curious, on the Currying (Standard Library) slide, the author uses `attrgetter` from the `operator` module -- I've always used `getattr` to achieve the same thing. Am I missing something by not using `attrgetter`?


Short answer: not really.

Long answer:

  In [1]: from operator import attrgetter

  In [2]: class Person:
     ...:     def __init__(self, name):
     ...:         self.name = name
     ...:         

  In [3]: me = Person('walrus')

  In [4]: getattr(me, 'name')
  Out[4]: 'walrus'

  In [5]: attrgetter('name')(me)
  Out[5]: 'walrus'
This is potentially useful if you're using `map`:

  In [6]: people = [me, Person('aschwo')]

  In [7]: list(map(lambda obj: getattr(obj, 'name'), people))
  Out[7]: ['walrus', 'aschwo']

  In [8]: list(map(attrgetter('name'), people))
  Out[8]: ['walrus', 'aschwo']
Really, though, the functional programming idioms don't meld too well with Python. I think this is a lot nicer:

  In [9]: [person.name for person in people]
  Out[9]: ['walrus', 'aschwo']


On the last point, a list comprehension is just a much a functional programming idiom--Python got them from Haskell! I use both in my code, whichever is shortest.

PS: map returns a list, no need to call list() on it.


In Python 3, map returns a iterator, not a list:

  In [1]: sq = lambda x: x * x
  
  In [2]: map(sq, [1,2,3,4])
  Out[2]: <builtins.map at 0x2135050>
  
  In [3]: list(_)
  Out[3]: [1, 4, 9, 16]


attrgetter takes an attribute name and returns a function that, when passed an object, returns that attribute from that object. So it's a curried getattr. These are equivalent:

f = attrgetter("some_property")

f = lambda obj: getattr(obj, "some_property")


beautiful html slides, anyone knows what framework this is?



Wow, this presentation is very easy to step through in mobile Safari! I like the style a lot.

I wish I had the speaker to give a little more with each slide though.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: