Yes, it's horrible.



I don't agree that it's horrible, but it really doesn't work particularly well in the context of the S4 objects in, for example, Bioconductor.


Why?


Because combining an argument with a function call doesn't make sense. They have to do some voodoo under the hood to make it work, and this reduces code understandability. The analogy with Unix pipes doesn't work either: one is passing an argument to a function, while the other involves writing and reading a file. Finally, it's plain ugly and un-Lispy.


This particular pipe comes from F#.

Let's say I have a table called dat.

  dat %>%
    filter(col_a == 'Good') %>%
    group_by(col_b) %>%
    summarize(n = n(), sum_c = sum(col_c))

To do this in traditional R, I would have to:

  dat <- dat[dat$col_a == 'Good', ]
  dat_n <- aggregate(col_a ~ col_b, dat, length)
  dat_sum <- aggregate(col_c ~ col_b, dat, sum)
  merge(dat_n, dat_sum, by = "col_b")

I think the piped version is more readable. At least there are fewer variables to track.


That's because you're comparing to base R. The data.table way would be:

  dat[col_a == 'Good', .(Length = .N, Sum = sum(col_c)), col_b]


That's all well and good for operations that are built into data.table; pipes can be used with anything that's a function call (like ggpage).
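
For example (a rough sketch, reusing the dat table from above), you can pipe straight into a base function that knows nothing about dplyr or data.table:

    dat %>%
      lm(col_c ~ col_b, data = .) %>%   # '.' stands for the piped-in data
      summary()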


The problem with data.table is that in practice your data gets converted to something else when you pass it through packages -- many functions will return a data.frame or matrix, others in the Hadleyverse will return a tibble, and so on. So you have to constantly force your data back into a data.table. R has so many data types that basically represent a spreadsheet/database table.
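
A rough sketch of that round trip (assuming the data.table package and the dat table from above):

    library(data.table)
    dt  <- as.data.table(dat)                  # coerce once
    res <- aggregate(col_c ~ col_b, dt, sum)   # base function hands back a plain data.frame
    setDT(res)                                 # coerce again (by reference) before using data.table syntax
    res[, .N]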


Data.table is great for SQL-style code but it imposes some annoying limitations, namely that the output is always coerced to a data.table.


Unix pipes don't read/write files.

There also isn't any reason why other implementations of pipes have to do buffered byte reads/writes; passing objects is perfectly acceptable.

The structurally distinguishing aspect of a pipe-and-filter style is that the individual processing elements don't "return" to their "caller", but rather pass their result on to the next processing element. Without involving the caller.


> reduces code understandability.

Actually, it makes R scripts much easier to grasp. It avoids abusive nesting of brackets, and it makes the data transformation flow more obvious.


Un-Lispy? What about threading macros? The voodoo you're referring to is simply computing on the language, which is quite Lispy as well.
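
Here's a toy sketch of that kind of computing on the language; the %p% operator is made up for illustration and isn't magrittr's actual implementation:

    `%p%` <- function(lhs, rhs) {
      # take the unevaluated right-hand call, splice the left-hand side in
      # as its first argument, and evaluate the rebuilt call
      call <- as.list(substitute(rhs))
      eval(as.call(append(call, list(substitute(lhs)), after = 1)), parent.frame())
    }

    c(2, 1, 3) %p% sort() %p% head(1)   # same as head(sort(c(2, 1, 3)), 1)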


Which is more legible and understandable?

    f(a, g(h(x), b))
    h(x) %>% g(b) %>% f(a, .)
"Thus, programs must be written for people to read, and only incidentally for machines to execute."


I've never hit a bug merely because of using pipes. Is this a theoretical concern or can you give an example?


Maybe not a direct answer to your question, but using dynamic variable names is kind of tedious with dplyr. You have to work around it by using paste() statements before passing the argument to a dplyr function, so it's not always elegant either.


The most recent version of dplyr adopted quasiquotation to make that easier:

https://cran.r-project.org/web/packages/rlang/vignettes/tidy...
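
For example, a minimal sketch (reusing dat and col_b from above) of grouping by a column whose name is only known at runtime:

    col <- "col_b"                        # column name held in an ordinary string
    dat %>%
      group_by(!!rlang::sym(col)) %>%     # turn the string into a symbol and unquote it
      summarize(n = n())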


Thanks! I was not aware of that!


That problem (solved in recent versions) has nothing to do with pipes though.


What is a better alternative for code understandability?


I like it, but it makes it hard to take R seriously as a programming language (thankfully it's not in standard R), because where else would you actually use a coding pattern like:

    a = "foo"
    a = "bar"

Reusing variables is silly.


The alternative is to use a different variable for each data transformation, which has costs for both system memory and code readability. And modern data analysis has a lot of transformations.



