Yes, it's horrible.



I don't agree that it's horrible, but it really doesn't work particularly well in the context of the S4 objects in, for example, Bioconductor.


Why?


Because combining an argument with a function call doesn't make sense. They have to do some voodoo under the hood to make it work, and this reduces code understandability. The analogy with Unix pipes doesn't work either: one is passing an argument to a function, while the other involves writing and reading a file. Finally, it's plain ugly and un-Lispy.


This particular pipe comes from F#.

Let's say I have a table called dat.

  dat %>%
    filter(col_a == 'Good') %>%
    group_by(col_b) %>%
    summarize(n = n(), sum_c = sum(col_c))

To do this in traditional R, I would have to:

  dat <- dat[dat$col_a == 'Good', ]
  dat_n <- aggregate(col_a ~ col_b, dat, length)
  dat_sum <- aggregate(col_c ~ col_b, dat, sum)
  merge(dat_n, dat_sum, by = "col_b")

I think the piped version is more readable. At least there are fewer variables to track.


That's because you're comparing to base R. The data.table way would be:

  dat[col_a == 'Good', .(Length = .N, Sum = sum(col_c)), col_b]


That's all well and good for operations that are built into data.table; pipes can be used with anything that's a function call (like ggpage).
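
For example (a rough sketch, reusing the dat table from above), you can pipe straight into a base function that knows nothing about dplyr or data.table:

    dat %>%
      lm(col_c ~ col_b, data = .) %>%   # '.' stands for the piped-in data
      summary()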


The problem with data.table is that in practice your data gets converted to something else when you pass it through packages -- many functions will return a data.frame or matrix, others in the Hadleyverse will return a tibble, and so on. So you have to constantly force your data back into a data.table. R has so many data types that basically represent a spreadsheet/database table.
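
A rough sketch of that round trip (assuming the data.table package and the dat table from above):

    library(data.table)
    dt  <- as.data.table(dat)                  # coerce once
    res <- aggregate(col_c ~ col_b, dt, sum)   # base function hands back a plain data.frame
    setDT(res)                                 # coerce again (by reference) before using data.table syntax
    res[, .N]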


Data.table is great for SQL-style code but it imposes some annoying limitations, namely that the output is always coerced to a data.table.


Unix pipes don't read/write files.

There also isn't any reason why other implementations of pipes have to do buffered byte reads/writes; passing objects is perfectly acceptable.

The structurally distinguishing aspect of a pipe-and-filter style is that the individual processing elements don't "return" to their "caller", but rather pass their result on to the next processing element. Without involving the caller.


> reduces code understandability.

Actually, it makes R scripts much easier to grasp. It avoids abusive nesting of brackets, and it makes the data transformation flow more obvious.


Un-Lispy? What about threading macros? The voodoo you're referring to is simply computing on the language, which is quite Lispy as well.
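
Here's a toy sketch of that kind of computing on the language; the %p% operator is made up for illustration and isn't magrittr's actual implementation:

    `%p%` <- function(lhs, rhs) {
      # take the unevaluated right-hand call, splice the left-hand side in
      # as its first argument, and evaluate the rebuilt call
      call <- as.list(substitute(rhs))
      eval(as.call(append(call, list(substitute(lhs)), after = 1)), parent.frame())
    }

    c(2, 1, 3) %p% sort() %p% head(1)   # same as head(sort(c(2, 1, 3)), 1)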


Which is more legible and understandable?

    f(a, g(h(x), b))
    h(x) %>% g(b) %>% f(a, .)
"Thus, programs must be written for people to read, and only incidentally for machines to execute."


I've never hit a bug merely because of using pipes. Is this a theoretical concern or can you give an example?


Maybe not a direct answer to your question, but using dynamic variable names is kind of tedious with dplyr. You have to work around it by using paste() statements before passing the argument to a dplyr function, so it's not always elegant either.


The most recent version of dplyr adopted quasiquotation to make that easier:

https://cran.r-project.org/web/packages/rlang/vignettes/tidy...
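
For example, a minimal sketch (reusing dat and col_b from above) of grouping by a column whose name is only known at runtime:

    col <- "col_b"                        # column name held in an ordinary string
    dat %>%
      group_by(!!rlang::sym(col)) %>%     # turn the string into a symbol and unquote it
      summarize(n = n())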


Thanks! I was not aware of that!


That problem (solved in recent versions) has nothing to do with pipes though.


What is a better alternative for code understandability?


I like it, but it makes it hard to take R seriously as a programming language (thankfully it's not in standard R), because where else would you actually use a coding pattern like:

    a = "foo"
    a = "bar"

Reusing variables is silly.


The alternative is to use a different variable for each data transformation, which has costs for both system memory and code readability. And modern data analysis has a lot of transformations.



