Show HN: Python library for functional programming (github.com/entilzha)
129 points by allenleein on Dec 14, 2017 | 95 comments



Sorry to digress - but the comment in the first code example:

> # or if you don't like backslash continuation

got me thinking. I've always felt backslash continuation is a Python wart - but using brackets in this way only feels like a minor improvement.

It's nearly always this scenario that makes me want to break a line - i.e. splitting a long chained method call on a "."

Now - starting a line of code with a "." is not valid Python as far as I'm aware, and therefore allowing it would not introduce any ambiguity or significantly complicate Python's parsing rules: "If a line begins with whitespace followed by ".", then consider it a continuation of the previous line."

Can anyone comment on whether this is a terrible idea or not?


This is a peeve of mine, and one of the reasons that I find myself preferring to write JavaScript these days rather than Python. Never would have thought I'd say that.

I'm not up to snuff on my compiler theory, but I think at the least this would require upgrading to a two-token lookahead parser, and it might also turn the Python grammar from context-free into context-sensitive. Accordingly, I wouldn't expect this to get implemented; IIRC, ease of parsing was/is one of the big design goals.


In Python, line continuations are a preprocessing step as far as I'm aware (together with inserting INDENT and DEDENT tokens). In the same vein, you could just do a simple loop over the token list and remove the appropriate NEWLINE tokens before going on to the actual parsing.
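A rough sketch of that token-level idea, using the standard library's `tokenize` module. Rather than deleting NEWLINE tokens and rebuilding source from the token stream, this hypothetical `join_leading_dot_lines` helper finds lines whose first token is the `.` operator and appends a backslash to the line directly above, so the ordinary parser can take it from there. It is an illustration under assumptions, not a proposal for CPython: it ignores blank or comment lines between the two lines, and it does not handle a previous line that ends in a comment.

```python
import io
import tokenize


def join_leading_dot_lines(source: str) -> str:
    """Hypothetical preprocessor: a line whose first token is the "." operator
    continues the previous line. Implemented by appending a backslash to that
    previous line so that standard Python can parse the result."""
    lines = source.splitlines(keepends=True)
    needs_backslash = set()   # 0-based indices of lines that should end with "\"
    last_newline_row = None   # 1-based row of the most recent NEWLINE token
    at_line_start = True      # no real token seen yet on the current logical line
    layout = {tokenize.INDENT, tokenize.DEDENT, tokenize.NL, tokenize.COMMENT}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NEWLINE:
            last_newline_row = tok.start[0]
            at_line_start = True
        elif tok.type in layout:
            continue  # indentation, blank lines and comments don't start a statement
        else:
            # ".5" tokenizes as a NUMBER, not OP ".", so float literals are left alone.
            if (at_line_start and tok.type == tokenize.OP and tok.string == "."
                    and last_newline_row == tok.start[0] - 1):
                needs_backslash.add(last_newline_row - 1)
            at_line_start = False
    out = []
    for i, line in enumerate(lines):
        if i in needs_backslash:
            body = line.rstrip("\r\n")
            line = body + " \\" + line[len(body):]
        out.append(line)
    return "".join(out)


src = 'text = "  Hello World  "\nresult = text.strip()\n    .lower()\n    .replace(" ", "_")\n'
print(join_leading_dot_lines(src))
```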


Whitespace-sensitive blocks aren't easily parsed either.


Line continuations could be handled by the lexer.


You can omit the leading 0 before the dot when writing float literals, so '.1' (or any other sub-1 float literal written without the leading 0) is technically a valid but useless line of code starting with a dot (you could use it instead of pass, I guess). I can't think of a real use of a line starting with a dot off the top of my head.
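Both halves of that claim are easy to check with the standard `ast` module; `parses` below is just a small hypothetical helper.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass, so it is caught too
        return False
    return True


print(parses(".1"))                      # True: a float literal used as a statement
print(parses(".lower()"))                # False: a line can't start with attribute access
print(parses("s = 'Ab'\n    .lower()"))  # False: unexpected indent
```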


Probably nothing remotely resembling this belongs in anything remotely resembling production code, but:

  In [1]: class Foo:
     ...:     def __rmul__(self, other):
     ...:         print("%s foos!" % other)
     ...:
  
  In [2]: .42 * Foo()
  0.42 foos!


That's very nice. I tried to go for such an example originally, but I only tried __mul__ because I thought it worked like Lua, where __mul gets used even if only the second operand has a custom one.
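For anyone else caught out by this: Python first tries the left operand's `__mul__`, and only if that is missing or returns `NotImplemented` does it fall back to the right operand's `__rmul__`. A small illustration with two hypothetical classes:

```python
class OnlyMul:
    def __mul__(self, other):
        return "OnlyMul.__mul__"


class WithRmul:
    def __rmul__(self, other):
        return f"WithRmul.__rmul__({other})"


print(OnlyMul() * 2)      # OnlyMul.__mul__ (left operand handles it)
print(0.42 * WithRmul())  # WithRmul.__rmul__(0.42) (float gives up, right operand steps in)
try:
    2 * OnlyMul()  # int.__mul__ returns NotImplemented and OnlyMul has no __rmul__
except TypeError as exc:
    print("TypeError:", exc)
```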


Identifiers can't start with a number, so this should still be unambiguous probably?


Sure, that's the only example of a line starting with a dot (I think; I'm not super good at Python, just okay), so anything else that isn't a float should be fine.


Pythonistas usually recommend you not use a backslash for continuation. The approach shown on the page is the recommended way, and I find it about as readable as what you suggest. Does using parentheses for continuation trouble anyone?
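For comparison, here are both styles on the same made-up chained call. PEP 8 prefers implied line continuation inside parentheses, brackets and braces; one practical reason is that a backslash must be the very last character on its line, so a stray trailing space after it is a syntax error.

```python
text = "  Hello, Functional World  "

# Backslash continuation: "\" must be the last character on each continued line.
with_backslash = text.strip() \
    .lower() \
    .replace(" ", "_")

# Implied continuation: newlines inside the parentheses are simply ignored.
with_parens = (
    text.strip()
    .lower()
    .replace(" ", "_")
)

print(with_backslash == with_parens)  # True
```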


> Now - starting a line of code with a "." is not valid Python

Absolutely agree. I really like multi-character pipe operators at the end of lines, which would get rid of the leading "."

R has %>%

F# uses |> for piping values and >> for composing functions

I would love to see R and all other languages just pick up F#'s |>


I think the %x% style is baked into R to differentiate built-in and custom infix operators - %>% comes from Tidyverse IIRC.


Originated in the 'magrittr' package. The early tidyverse used a different pipe before depending on magrittr's version.


I don't think so. `%in%` is built into R (or at least in the base package).


I mean, I don't think there is a way to implement |> in R


Am I the only one who thinks Python's syntax is not really suited for functional programming? Creating an ad-hoc closure that spans multiple lines, for example, seems awkward.

PS: I'm not aware of any syntax changes in this area since Python 2.7, so I could be wrong.
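The usual workaround, for what it's worth, is to give the multi-line closure a name with a nested `def` and pass that around; it works, but it can't be written inline the way a multi-line lambda could. A small example (`make_scaler` is just for illustration):

```python
def make_scaler(factor, offset):
    # A closure needing more than one expression has to be a named inner def;
    # a lambda body is limited to a single expression.
    def scale(x):
        scaled = x * factor
        return scaled + offset
    return scale


print(list(map(make_scaler(2, 1), [1, 2, 3])))  # [3, 5, 7]
```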


I’ll say it: Python sucks for Functional Programming. It’s just not the right tool for the job (or job for the tool, depending on your point of view.)

It’s a great language for most programmers to get things done in, but it belongs to the school of ‘limited abstraction is for your own good’. For example, there are no multi-line lambda expressions. The argument given is that they would mess up the whitespace syntax. Haskell handles this situation, but it does indeed introduce a bit more syntax.

If you need to do FP in a general-purpose scripting language, I would be looking at Elixir or JavaScript right now.


True, but for better or worse Python has support for Spark/sklearn/pandas/NumPy/plotting/deep learning/web, which makes it a good jack-of-all-trades language for data/research science. Its strength is that it can do all of those at levels ranging from proficient to best-in-class.


> Prepare for 1.0 next release

So this will soon be a stable API? Great work. The documentation is extensive.

I am personally a big fan of toolz[0], which is Clojure-inspired. This Scala-inspired method-chaining approach is different, and interesting.

It might be fun to make a comparison matrix of the differences in approach between this library, toolz, and the venerable fn[1], which also has a Scala influence (and apparently already implements the '_' lambda form).

[0] http://toolz.readthedocs.io/en/latest/

[1] https://github.com/kachayev/fn.py


From https://github.com/kachayev/fn.py :

> There are theoretical and practical advantages to the functional style:

> - Formal provability

This is bollocks. To prove things about your programs, you need a formal semantics for the language you're using.


> This [Formal provability] is bollocks.

How precisely do modularity, composability and ease of debugging improve because of this module? Not at all. It may have benefits, sure. But to claim these as general attributes is just sophistry, in my personal opinion.


Agreed. I just felt a stronger need to comment on the zeroth bullet point in the list.


Functional style makes it easier to have a natural semantics that corresponds directly to the language proper: you can use a small-step semantics where evaluation is just reduction, and you then have a very simple and natural denotational semantics (the denotation of an expression is just the value it evaluates to) that obviously corresponds directly to your operational semantics. All that is much harder to do with programming styles that make use of pervasive, implicit state.


A semantics is a property of a programming language, not of the style you happen to program in.

---

@lmm

> In practice no practical language provides formal semantics for the entire language

Programming languages are mathematical objects and always have a formal semantics, regardless of whether someone has taken the trouble to write it down or not.

What I'm saying is - the formal semantics that most languages have preclude the existence of compound values. Without those, you can't define interesting functions, and thus you can't program in a functional style.


Responding to edits:

> Programming languages are mathematical objects and always have a formal semantics, regardless of whether someone has taken the trouble to write it down or not.

Not true for the usual plain-English meaning of "programming language". Plenty of things we call programming languages take a "the implementation is the spec" view, or simply can't be assigned any useful semantics in a way that's at all aligned with the defining characteristics of the language in question. Even in, say, Haskell, IO explicitly lacks formal semantics.

> What I'm saying is - the formal semantics that most languages have preclude the existence of compound values. Without those, you can't define interesting functions, and thus you can't program in a functional style.

Python tuples are perfectly cromulent compound values. Python has a bunch of builtin classes for which the equality operators that are available in the language are all kind of useless, sure, but that's true in almost all languages.


> Plenty of things we call programming languages take a "the implementation is the spec" view,

And do you think the implementation isn't a mathematical object itself?

> or simply can't be assigned any useful semantics in a way that's at all aligned with the defining characteristics of the language in question.

This is a contradiction in terms. By definition a semantics is a mathematical description of the meaning of a programming language, i.e., the behavior of its programs. Sometimes a semantics exposes significant differences between the mental model of programmers and what a programming language really is, but the solution is not to turn a blind eye to reality.

> Python tuples are perfectly cromulent compound values.

So long as they have object identities, they are not values. Values are merely represented in memory - they truly exist in the language's semantics.
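To make the object-identity point concrete: two tuples built separately compare equal with `==`, yet `is` can still tell them apart. The sketch forces two distinct tuple objects via `tuple([...])`, since CPython may reuse a single constant for identical tuple literals; the `is` result shown is CPython behavior.

```python
a = tuple([1, 2, 3])
b = tuple([1, 2, 3])

print(a == b)          # True: structurally equal
print(a is b)          # False in CPython: two distinct objects
print(id(a) == id(b))  # False while both objects are alive
```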


> And do you think the implementation isn't a mathematical object itself?

Only if you're taking the Tegmark "mathematical universe" view. The implementation is not necessarily any more mathematical than, say, a table.

> This is a contradiction in terms. By definition a semantics is a mathematical description of the meaning of a programming language, i.e., the behavior of its programs. Sometimes a semantics exposes significant differences between the mental model of programmers and what a programming language really is, but the solution is not to turn a blind eye to reality.

The meaning expressed in a language is some alignment of what the writer intended and what the reader understood. If different readers don't reach a shared understanding then there is no meaning. It would be desirable for our programs to always have meanings that aligned with their behaviour, but we shouldn't turn a blind eye to the fact that many programs don't.

> So long as they have object identities, they are not values. Values are merely represented in memory - they truly exist in the language's semantics.

Any practical programming language will give you some way to find the memory address of a particular value, so values that we usually consider as semantically identical are not actually indistinguishable in practice. We deal with this by only assigning semantics to a subset of the language that excludes those operations. Python without "is" may be more obviously a subset than, say, Haskell without unsafeCoerce, but both are subsets.


> It would be desirable for our programs to always have meanings that aligned with their behaviour, but we shouldn't turn a blind eye to the fact that many programs don't.

The meaning of a program is its behavior, period. If reality doesn't agree with your views, it's your fault, not reality's.

> Any practical programming language will give you some way to find the memory address of a particular value, so values that we usually consider as semantically identical are not actually indistinguishable in practice.

Standard ML doesn't give you any way to get the memory address of a particular value, so values that are semantically identical are actually indistinguishable in practice.

> We deal with this by only assigning semantics to a subset of the language that excludes those operations.

That doesn't make sense. Semantics is a property of the real programming language you are using, not the fictitious one you wish you were using.


> The meaning of a program is its behavior, period. If reality doesn't agree with your views, it's your fault, not reality's.

That's not the usual meaning of, well, meaning.

> Standard ML doesn't give you any way to get the memory address of a particular value, so values that are semantically identical are actually indistinguishable in practice.

Just tried it to be sure:

    fun address i = Unsafe.cast(ref i) : Int32.int ;;
    val a = 4 :: nil ;;
    val b = 4 :: nil ;;
    address (a) ;;
    address (b) ;;
    address (a) ;;
And yep, a and b are semantically identical values but distinguishable in practice:

    val address = fn : 'a -> Int32.int
    val a = [4] : int list
    val b = [4] : int list
    val it = ~175003712 : Int32.int
    val it = ~174998232 : Int32.int
    val it = ~175003712 : Int32.int


SML/NJ is not an implementation of Standard ML. It's known to be non-conformant in several ways: http://mlton.org/SMLNJDeviations


If SML/NJ doesn't count as Standard ML then I'd say Standard ML doesn't count as a practical programming language.


I use MLton and Poly/ML on a daily basis. No problems with them at all.


In practice no practical language provides formal semantics for the entire language, only for restricted subsets of it i.e. particular styles within the language.


Author here. 1.0 has been out for a while now, and I'm very committed to API stability. 1.1 (or 1.0.1), with mostly minor fixes here and there, should be out as soon as I have some time.

I hadn't considered doing a comparison matrix, but I haven't done a '_' lambda since it seems redundant with the one in fn.


... and functools/itertools for completeness?


Shameless plug: https://github.com/sfermigier/awesome-functional-python (a list of documents and projects related to FP in Python, including, already, the OP's project).

I'm personally using Toolz so far, probably because it's the most popular of the maintained ones.

Would be happy to see both some standardisation (a PEP?) and consolidation between the various projects.


It would be great to have more native support for FP (flat map is a glaring example I run into a lot). It would help if there were 1) a less verbose anonymous closure syntax and 2) support for multi-line closures.
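Until then, a flat map is short to write with the standard library: `itertools.chain.from_iterable` flattens one level, and a nested comprehension does the same job. `flat_map` here is a hypothetical helper name, not a built-in.

```python
from itertools import chain


def flat_map(func, iterable):
    """Map func over iterable and flatten the results by one level (lazily)."""
    return chain.from_iterable(map(func, iterable))


lines = ["a b", "c d e"]
print(list(flat_map(str.split, lines)))                    # ['a', 'b', 'c', 'd', 'e']
# The same thing as a nested comprehension:
print([word for line in lines for word in line.split()])   # ['a', 'b', 'c', 'd', 'e']
```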


First off, well done.

My big issue with method chaining is exactly the one of requiring the line-continuation-escape, or enclosing the whole thing in parens. You also need to wrap the initial value, then possibly unwrap it at the end, and you're limited to the methods you define (mind you, this seems like a pretty comprehensive list).

For this reason I decided to emulate Clojure's threading macros instead, as a pair of functions:

    pipe_{first,last}(
        init_val,
        some_unary_callable, # so far so familiar
        (some_n_ary_callable, *args), # puts the piped value first or last, depending on {first/last}
        ...
    )
Unfinished is the version that makes `(callable, ...args..., _, ...more_args...)` work. Is that what you mean by the _ lambda operator?
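One possible implementation of the `pipe_first`/`pipe_last` pair described above, assuming each step is either a unary callable or a `(callable, *args)` tuple. It mirrors Clojure's `->` (value inserted as the first argument) and `->>` (value inserted as the last); the shared `_thread` helper is an illustrative name.

```python
def _thread(value, steps, first):
    """Pass value through each step; tuple steps get it as first or last argument."""
    for step in steps:
        if callable(step):
            value = step(value)
        else:
            func, *args = step
            value = func(value, *args) if first else func(*args, value)
    return value


def pipe_first(init_val, *steps):
    """Like Clojure's ->: the piped value becomes the FIRST argument of tuple steps."""
    return _thread(init_val, steps, first=True)


def pipe_last(init_val, *steps):
    """Like Clojure's ->>: the piped value becomes the LAST argument of tuple steps."""
    return _thread(init_val, steps, first=False)


print(pipe_first(10, (pow, 2), str))                           # 100
print(pipe_last([1, 2, 3], (map, lambda x: x * 2), list))      # [2, 4, 6]
print(pipe_last(range(5), (filter, lambda x: x % 2 == 0), sum))  # 6
```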


Not OP, but author. Line continuation is pretty annoying. Adding custom methods wouldn't be hard (https://github.com/EntilZha/PyFunctional/issues/113); I just don't have time to add it right now.

With respect to _, this would make these two equivalent:

    a = lambda x: x + 1
    b = _ + 1
    a(1) == b(1)

Is there an expanded example I could look at for the pipe stuff? I haven't used Clojure before.


Are you aware of the thread_first and thread_last functions from toolz?


I wasn't, thanks for the pointer!


I'm no expert in functional programming, but this feels a bit inside-out and backwards to me. It's writing f(g(h(x))) as x.h().g().f(), which doesn't read like function composition to me. If anything, the comprehensions already built into Python look more natural and more functional:

   (f(v) for v in (g(u) for u in (h(w) for w in x)))
The dots in the x.h().g().f() notation also suggest mutation (since we're calling methods on objects here) in a way that makes me uncomfortable.

But like I said, I'm no FP expert. My objections are basically aesthetic. Maybe if I used it, I'd learn to like it.


The way you read x.h().g().f() is essentially the same way you read it in F# (with |>), Haskell (with ., although it goes in the opposite direction) and R (with %>%).

It is one of the most aesthetically pleasing aspects of using functional languages.


So the thing that looks less like the function composition I know from algebra is more idiomatic in FP and more pleasing to the FP cognoscenti? I fear my familiarity with FP may be even more deficient than I thought. I could understand writing f . g . h, but what we're talking about here is written more like Unix pipelines. In which case, I could forgive overloading the bitwise or operator, which is already left-associative. You could then do x | h | g | f.

Again, this is a purely aesthetic argument -- the intent of the library seems unclear to me from its syntax. If it's beautiful and clear to you, I'm not going to object to your using it. Aesthetics aside, I don't think there's anything wrong with this library.


Here's how you could hack up a pipe operator in Python:

    class Funk:
        def __init__(self, f):
            self._f = f
        def __or__(self, other):
            return Funk(lambda *args, **kwargs: other(self(*args, **kwargs)))
        def __call__(self, *args, **kwargs):
            return self._f(*args, **kwargs)

This lets you do fun stuff like:

    pipeline = Funk(h) | Funk(g) | Funk(f)
    pipeline(x)
    pipeline(y)
And that makes me happy that I participated in this discussion.

I'm kind of surprised nobody's done this before. Unless they have. Any pointers to more fully fleshed out implementations of this idea?


You could actually simplify even more with the trick used in a comment farther down (https://github.com/0101/pipetools). That way you would implement __or__ on, for example, a `pipe` object and not have to wrap each function.

As it turns out, implementing that different syntax in PyFunctional wouldn't be too hard or API-breaking, I think. Mainly it would require 1) wrapping/exporting functions to a module you could bring into scope (e.g. `from functional import functions as F` or `from functional.functions import *`) and 2) writing wrapper code to provide something functionally similar to `pipe`.
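A minimal sketch of that trick, independent of any library: only the starting `pipe` object overloads `|`, so the functions to its right can stay plain callables. `Pipe` and `pipe` are hypothetical names here, and this is not pipetools' actual code.

```python
class Pipe:
    """Hypothetical pipetools-style starting point: `pipe | f | g` builds x -> g(f(x))."""

    def __init__(self, func=None):
        self._func = func if func is not None else (lambda x: x)  # identity by default

    def __or__(self, other):
        # `other` may be any plain callable; only the leftmost object needs to be a Pipe.
        return Pipe(lambda x: other(self._func(x)))

    def __call__(self, x):
        return self._func(x)


pipe = Pipe()
count_words = pipe | str.split | len
print(count_words("no wrapping needed here"))  # 4
```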

On libraries, I swear I saw something a while back, but my googlefu just now didn't help me find it.


I'm not into functional programming in Python, but pipetools[0] looks way more handy.

[0] https://github.com/0101/pipetools


It looks like there's not a huge amount of overlap.

Pipetools is all about implementing a "clever" syntax for data flows, whereas PyFunctional looks like it's much more about providing a library of functions for manipulating data in functional ways.

I would expect that PyFunctional could implement the pipe syntax if they wanted, as just a layer of syntactic sugar on top of their core. However, that would override the usual meaning of | (OR) in places, so I'm not convinced that syntax is a good choice, as "clever" as it may be.


Also Coconut looks interesting: http://coconut-lang.org/


Is there a reason to use this over Coconut? http://coconut-lang.org/


Coconut is a compile-to-Python (2/3) language, not just a library, so if you use it you have to be OK with using a transpiler in your project.

Here's one way you might write one of the examples in Coconut:

  ([1, 2, 3, 4]
      |> map$(-> _ * 2)
      |> filter$(-> _ > 4)
      |> reduce$((x, y) -> x + y)
  )
You may freely mix Coconut and Python in a project, even in the same file, but any Coconut code still needs to be run through the compiler.
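For comparison, roughly the same computation in plain Python with `map`, `filter` and `functools.reduce` (written by hand for illustration; this is not Coconut's compiled output):

```python
from functools import reduce

result = reduce(
    lambda x, y: x + y,
    filter(lambda v: v > 4,
           map(lambda v: v * 2, [1, 2, 3, 4])),
)
print(result)  # 14
```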


Shameless plug: if you like this, you may also like my library tryme, which ports the Either and Maybe monads to Python: http://tryme.readthedocs.io/en/latest/


Without exhaustiveness checks for all your case analyses, what exactly is the point?


The point is that you can indicate whether an operation was successful or not by wrapping the value with a Success or Failure, rather than returning a boolean or status code plus some subset of information about the operation.


And how is that useful in the absence of pattern matching? If you write:

    if foo.is_success():
        ...
    else: # foo.is_failure()
        ...
Then you're back to branching on booleans. And we haven't even addressed the “lack of exhaustiveness checks” objection.


These libraries in languages like Python remind me of when Windows 95 was new and a bunch of Win 3.1-compatible clones of the Win95 taskbar appeared.

People are often satisfied by the veneer of something better, because, they were told it was better but don't truly understand why. This is super common in the programming world, I'm afraid.


Agreed. While algebraic data types are obviously a good idea, the best way to use Python is to accept it for what it is: a dynamically typed object-oriented language that's not very good at manipulating user-defined values (or even defining them to begin with), but has some pretty cool tricks of its own, like runtime metaprogramming with decorators and metaclasses.


I think you place Python in a much smaller box than it deserves. Before it was object-oriented, it was functional and procedural. For certain things I use it in a purely functional way, but I have never felt the need for one of these libraries to do so.


Python has always had procedures, of course, but was it ever a functional language? Let's do a bit of fact-checking:

(0) Functional programming is expressing computation as the evaluation of functions. A language is “functional” to the extent it allows you to do functional programming.

(1) A function is a mapping from values to values. To express computation as the evaluation of functions, you need a rich universe of values: not just primitive ones like integers and object references, but also tuples, lists, trees, etc.

(2) Values are timeless entities that are distinguished from each other structurally. For example, it makes no sense to distinguish “this list [1,2,3]” from “that list [1,2,3]”, because both have the same structure - the same parts. Thus, what Python calls “tuples” and “lists” are not really tuple values and list values.

---

@_9jgl

Let's see how equal they remain in a timeless fashion:

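    import threading

    def send_to_another_thread(*lists):
        # Hypothetical stand-in: hands the lists to code that may mutate them concurrently
        threading.Thread(target=lists[0].append, args=(4,)).start()

    xs = [1,2,3]
    ys = [1,2,3]
    zs = ys
    send_to_another_thread(xs,ys,zs)
    print(xs == ys) # who knows
    print(ys == zs) # True
    print(xs is ys) # False
    print(ys is zs) # True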
Turns out, the real equality testing operator in Python is `is`, not `==`. The only values Python gives you are object references.

---

@lmm

> This is by no means the only way to write Python

This is not the problem. The problem is that Python doesn't have compound values. Values are a matter of semantics, and semantics is a matter of how the language is defined, not how you choose to program in it.


Words are defined by consensus and use. To a lot of people "functional" means having first-class functions (in particular: functions can be defined anywhere (i.e. the language has closures) and functions are first-class values (i.e. functions can be passed as function parameters and returned from functions; function values can be expressed separately from assignment (i.e. the language has lambdas))) and programming in a style that makes use of map, reduce, and filter; more generally programming with immutable values and functions that return transformations of those values (as distinct from programming with pervasive state that is mutated). This is by no means the only way to write Python, but it's practical in Python to a much greater extent than it is in, say, Java.


> (2) Values are timeless entities that are distinguished from each other structurally. For example, it makes no sense to distinguish “this list [1,2,3]” from “that list [1,2,3]”, because both have the same structure - the same parts. Thus, what Python calls “tuples” and “lists” are not really tuple values and list values.

Would you mind expanding on this? I'm not sure I really understand what you mean – if we have two lists a = [1,2,3] and b = [1,2,3], then a == b, so I don't see how Python is distinguishing them from each other.


Sum types make very little sense in dynamically typed languages, even with good manipulation of user-defined values. For instance, Erlang does not have "formal" sum types: they wouldn't really bring anything useful to the table, since the compiler does not typecheck by default (so the sum type would just be a complication with no benefits), and Dialyzer can simulate them by using type/value unions, e.g.

    {ok, integer()} | {err, string()}


While I agree that there would be no point to user-defined sum types, constructor classes would be a strict improvement over what Erlang has.


I don't use this, but you could argue that it's useful to have a standard encapsulation of failure/missing that isn't null? Where some situation means that null already has a meaning, the alternative is to create a wrapper class for your situation (not general), or a general wrapper class, which is basically what this is. So it's not completely useless, even if it's kind of crippled by the lack of pattern matching?

Edit: although I guess just using a tuple might be better for this.


> I don't use this but you could argue that it's useful to have a standard encapsulation of failure/missing that isn't null?

I don't see what encapsulation you're talking about. Maybe and Either are very much concrete, not abstract, data types.

> Where some situation means that null already has a meaning, the alternative is to create a wrapper class for your situation (not general), or a general wrapper class which is basically what this is.

Unless you want to take advantage of existing plumbing infrastructure such as a monad transformer library, custom wrapper classes for your particular situation are precisely the right tool for the job.


It's useful if it's a monad (and functor) because then you have tools to avoid this check until the end of a series of possibly failing functions. It's only really useful with these tools though.

By itself you might as well use nulls or throw errors although you could argue that it's clearer having a maybe or an either.
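To illustrate what "avoiding this check until the end" means, here is a minimal, hypothetical Maybe with a `bind` function (plain Python, not tryme's API): each step returns either `Just(value)` or `Nothing`, and `bind` skips the remaining steps once a `Nothing` appears, so only the final result needs inspecting.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Just:
    value: Any


class _NothingType:
    def __repr__(self):
        return "Nothing"


Nothing = _NothingType()


def bind(m, func):
    """Feed the wrapped value to func; a Nothing passes through without calling func."""
    return func(m.value) if isinstance(m, Just) else m


def parse_int(s):
    try:
        return Just(int(s))
    except ValueError:
        return Nothing


def reciprocal(n):
    return Nothing if n == 0 else Just(1 / n)


def parse_reciprocal(s):
    # One check at the end instead of one after every step.
    return bind(bind(Just(s), parse_int), reciprocal)


print(parse_reciprocal("4"))    # Just(value=0.25)
print(parse_reciprocal("0"))    # Nothing
print(parse_reciprocal("abc"))  # Nothing
```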


> It's useful if it's a monad (and functor) because then you have tools to avoid this check until the end of a series of possibly failing functions.

The monad laws already fail miserably in Haskell when you account for nontermination. Haskellers can only cope with this by eschewing partial functions as a matter of coding style. Now, eschewing partial functions, not to mention effectful procedures, is essentially impossible in a unityped language like Python. Are you sure you still want to play the monad game?

Of course, the decision to tolerate leaky abstractions is very much subjective. But constantly dealing with abstraction leaks is, in my experience, far more burdensome than not using abstractions to begin with. This is especially the case with mathematical abstractions whose meaning is given by equational / rewriting rules.


Which monad laws fail miserably in Haskell with non-termination, and for which monads?


All three monad laws and all monads. Just use `seq`.


Python has lambdas so you can still write:

    foo.fold(lambda successResult: ..., lambda failResult: ...)
Even without that, returning a "foo" that has to be explicitly unwrapped makes for a safer API than having "foo" be sometimes a valid result and sometimes silently an error instead.
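A sketch of what such a `fold` could look like on two hypothetical result classes (`Ok` and `Err` are illustrative names, not tryme's API): each class knows which of the two callbacks to invoke, so the caller supplies both branches in a single expression instead of testing a flag.

```python
class Ok:
    def __init__(self, value):
        self.value = value

    def fold(self, on_success, on_failure):
        return on_success(self.value)


class Err:
    def __init__(self, error):
        self.error = error

    def fold(self, on_success, on_failure):
        return on_failure(self.error)


def describe(result):
    # fold requires both branches up front; the object picks which one runs.
    return result.fold(lambda v: f"got {v}", lambda e: f"failed: {e}")


print(describe(Ok(42)))       # got 42
print(describe(Err("boom")))  # failed: boom
```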


> Python has lambdas so you can still write: (snippet)

Do you seriously consider this less of a hassle than branching on a frigging bit?


I think single entry/single exit is very valuable, and worth a bit of clunkiness for. And the branch version is also pretty clunky: either the long-winded "return ", abuse of "and"/"or", or the weirdly backwards expression version of "if" that Python has.


I don't disagree that branching on bits is clunky. That it's still less clunky than using Church or Scott-encodings speaks volumes of the practicality of the latter.


Also the link has a big article about why it's useful


Funny idea :)


This is quite cool, though it seems similar to RxPy (https://github.com/ReactiveX/RxPY)


Curious question: Is there also something like Mobx [1] for Python?

I'd love to have a reactive state library like Mobx on the server side.

Mobx is also reactive, but has a different focus: it is not so much about low-level event dispatching, but about reacting to data changes and recalculating derived values (but only those affected, and lazily for non-observed dependent values). And Mobx determines the dependencies of calculated values automatically.

Also, Vue [2] has Mobx functionality built-in, with an even better syntax, but closely tied to the framework.

Alas, both are JavaScript and targeted at client-side / user interface. Is there something similar for Python? (See also my question on SO [3].)

[1] https://mobx.js.org/

[2] https://vuejs.org/

[3] https://stackoverflow.com/q/47808562/19163


I'm not deeply familiar with Mobx, but there are a few Redux implementations for Python. See pydux[0] and aioredux[1].

Not the same thing, of course, since Mobx is much more sophisticated than Redux, but in my team we're currently experimenting with aioredux for state management of client data, which drives the React+Redux frontend. It's working surprisingly well so far, although the biggest hurdle is team familiarity with functional aspects of Redux, and general structuring and responsibility of reducers.

The library is no longer maintained, but considering the simplicity of the implementation and of Redux itself, we felt it wasn't a show stopper since we could always take over maintenance ourselves.

I'd also be interested to know of Mobx Python implementations.

[0]: https://github.com/usrlocalben/pydux

[1]: https://github.com/ariddell/aioredux
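For readers unfamiliar with the Redux model: all state-transition logic lives in a pure reducer `(state, action) -> new_state`, so replaying a list of actions is just a fold. A stdlib-only sketch (the action names and dict layout are made up; this is not pydux's or aioredux's API):

```python
from functools import reduce


def counter_reducer(state, action):
    """Pure reducer: returns a new state dict instead of mutating the old one."""
    kind = action["type"]
    if kind == "increment":
        return {**state, "count": state["count"] + action.get("by", 1)}
    if kind == "reset":
        return {**state, "count": 0}
    return state  # unknown actions leave the state unchanged


initial_state = {"count": 0}
actions = [{"type": "increment"}, {"type": "increment", "by": 5}, {"type": "noop"}]
final_state = reduce(counter_reducer, actions, initial_state)
print(final_state)  # {'count': 6}
```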


Thanks for the pointer, although that's not what I'm looking for.

The fundamental difference between reactive state management (Mobx) and the Redux approach is that Mobx has some automatisms that are vital to what I'm looking for:

- automatic recognition of actual value dependencies (i.e. minimal update chains)

- automatic determination of the order of updates (so you don't have to think about and design your reducers along their dependencies ... because this is an annoying task which can and hence should be automated ... it is prone to inefficiencies, and sometimes errors, when done by hand.)
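Those two automatisms can be sketched in a few dozen lines of Python. This is a hypothetical, pull-based toy, not how MobX is implemented: a `Computed` records every `Observable` or `Computed` it reads while evaluating (automatic dependency discovery), a change only marks its actual dependents stale, and stale values are recomputed lazily on the next read, so the update order falls out of the dependency graph. It never prunes dependencies that a later evaluation stops reading, and it is not thread-safe.

```python
class Observable:
    """A mutable cell that remembers which computed values read it."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self):
        if Computed.active is not None:  # read during some Computed's evaluation
            self._dependents.add(Computed.active)
        return self._value

    def set(self, value):
        if value != self._value:
            self._value = value
            for dependent in list(self._dependents):
                dependent.invalidate()


class Computed:
    """A lazily cached derived value whose dependencies are discovered by running it."""
    active = None  # the Computed currently being evaluated, if any

    def __init__(self, func):
        self._func = func
        self._cache = None
        self._stale = True
        self._dependents = set()
        self.runs = 0  # how many times func has actually been evaluated

    def invalidate(self):
        if not self._stale:
            self._stale = True
            for dependent in list(self._dependents):
                dependent.invalidate()

    def get(self):
        if Computed.active is not None:
            self._dependents.add(Computed.active)
        if self._stale:
            previous, Computed.active = Computed.active, self
            try:
                self._cache = self._func()
                self.runs += 1
            finally:
                Computed.active = previous
            self._stale = False
        return self._cache


price = Observable(10)
quantity = Observable(3)
total = Computed(lambda: price.get() * quantity.get())
label = Computed(lambda: f"total={total.get()}")

print(label.get())  # total=30
price.set(12)       # marks total and label stale; nothing is recomputed yet
print(label.get())  # total=36
```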


Very nice. I just wish Python had a proper lambda operator. As far as I know, it still has that "one-line" lambda keyword, which is very restrictive for functional programming.


Definitely approve of the name EntilZha for your GitHub user name.

I was interested to see that coexisting with Pandas is supported, but the documentation seems either scant or unfinished, I can't tell which:

https://github.com/EntilZha/PyFunctional/blob/master/example...


Thanks on name!

Fair enough on the docs being a bit scarce. The main intent is to make it easy to turn a pandas dataframe into a sequence of tuples/namedtuples, and to convert a sequence of tuples/namedtuples into a pandas dataframe.


There seems to be a lot of similar functionality (although in a very different paradigm) to Pandas, I'd be very interested to see performance benchmarks between the two on some common pipelines (csv -> filter -> aggregate -> SQL)


For operations where vectorization applies, pandas will be much faster since it's based on NumPy. That being said, there are things that are very awkward to do in pandas but very easy in PyFunctional.


This looks a lot like Clojure's standard library, and you could probably steal more ideas from it, e.g transducers https://clojure.org/reference/transducers
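For anyone who hasn't met transducers: a transducer transforms one reducing function into another, so a whole map/filter pipeline runs as a single reduction with no intermediate sequences. A stripped-down Python sketch (the names mirror Clojure's, but it omits the completion step and early termination that Clojure's transducers support):

```python
import operator
from functools import reduce


def mapping(func):
    """Transducer: each input is passed through func before reaching the reducer."""
    def xform(step):
        return lambda acc, x: step(acc, func(x))
    return xform


def filtering(pred):
    """Transducer: only inputs satisfying pred reach the reducer."""
    def xform(step):
        return lambda acc, x: step(acc, x) if pred(x) else acc
    return xform


def compose(*xforms):
    """Combine transducers; as with Clojure's comp, the leftmost sees each input first."""
    def combined(step):
        for xform in reversed(xforms):
            step = xform(step)
        return step
    return combined


def transduce(xform, step, init, iterable):
    """Reduce iterable with the transformed step function in a single pass."""
    return reduce(xform(step), iterable, init)


double_then_keep_big = compose(mapping(lambda x: x * 2), filtering(lambda x: x > 4))
print(transduce(double_then_keep_big, operator.add, 0, [1, 2, 3, 4]))               # 14
print(transduce(double_then_keep_big, lambda acc, x: acc + [x], [], [1, 2, 3, 4]))  # [6, 8]
```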


Python should provide some way of chaining these iterators. Coming from a Ruby background, I really miss the ability to chain iterators.

And the backslashes shouldn't be mandatory for wrapping method chains.



Trailing backslashes are so ugly


Reminds me of dplyr. How well does this work with pandas?


Data interchange between the two is relatively seamless. seq(df) converts a pandas dataframe into a sequence of tuples/namedtuples. Calling .to_pandas(columns=cols) will convert a sequence of tuples/namedtuples to a dataframe, with cols optionally giving the column names.


Is it just me, or does this seem too much like Ruby?


How does its parallelization feature work with lazy generators? Or does its seq function not return generators?


Seq doesn't return a generator. At its core is a concept of lineage taken from Apache Spark. Essentially, when you do something like seq(data).map(func), it builds up a list of operations to execute and holds a copy of the base data. Only when asked for a value will it compute the result.

As for parallelization, operations that are embarrassingly parallel (map/filter, etc.) are parallelized with multiprocessing. If the data is heavy and serialization is expensive it might be slow, but for operations where the bulk of the work is done in parallel it can help a lot.
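A toy version of the lineage idea described above, assuming only `map` and `filter` steps and none of the parallel execution: each call returns a new object with one more recorded step, and nothing runs until `to_list()` is called. `LazySeq` and `lazy_seq` are hypothetical names, not PyFunctional's classes.

```python
class LazySeq:
    """Hypothetical lineage-style sequence: map/filter only record steps."""

    def __init__(self, data, lineage=()):
        self._data = data        # base data, shared by every derived LazySeq
        self._lineage = lineage  # tuple of ("map" | "filter", function) steps

    def map(self, func):
        return LazySeq(self._data, self._lineage + (("map", func),))

    def filter(self, pred):
        return LazySeq(self._data, self._lineage + (("filter", pred),))

    def to_list(self):
        # Only now are the recorded steps replayed over the base data.
        items = iter(self._data)
        for kind, func in self._lineage:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


def lazy_seq(data):
    return LazySeq(list(data))  # copy the base data once, when the chain starts


calls = []


def times_ten(x):
    calls.append(x)
    return x * 10


s = lazy_seq([1, 2, 3]).map(times_ten).filter(lambda x: x > 10)
print(calls)        # [] (nothing has run yet)
print(s.to_list())  # [20, 30]
print(calls)        # [1, 2, 3]
```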


Thank you to the author: another day another. Wat library to tinker with.



