Personally I've never liked pipes outside of scripting one-offs because I've found them difficult to read. This is because in languages like R, argument order can have meaning.
Even if the piped argument goes in a certain location by default, it still creates an exception to all the other arguments' syntax, which adds another layer of complexity.
I guess at heart Lisp syntax has always made the most intuitive sense to me personally. Algol/C does as well, but the more you mix them the more I tend to dislike it.
I guess pipes always feel like you're "proceduralizing" functional constructs, which makes things more murky (although maybe that reflects some deep equivalence between the two?)
This is just my personal preference, and in an R context; I could be convinced otherwise. Part of it, too, is that I feel like R is fragmenting into lots of de facto standards, and not in desirable ways. I increasingly find myself wanting to use other languages, even more so than in the past.
I agree. The goal of these efforts seems to be to make R less idiomatic, more "tidy".
They do this by introducing new idioms. Granted, they are idioms that, if every R user adopted them 100% and all older code were rewritten, would in fact make the language more tidy. But that will never happen. So it is ultimately counterproductive, IMO. If I want to write concise shell-style pipelines for data analysis, I could use awk. If I wanted readability, I could use Python. Don't try to cram all of those things into Posit/R.
According to the tidyverse style guide, magrittr's pipe-assignment operator %<>% (also mentioned in the blog post) should actually be avoided. I guess because it makes code harder to read.
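For anyone who hasn't seen it, the magrittr form is roughly:

library(magrittr)
x <- c(9, 1, 5)
x %<>% sort()   # same as: x <- x %>% sort()

so the reassignment of x happens as a side effect of the operator rather than as an explicit `<-`.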
It also makes testing the right-hand side of the pipe rather difficult.
a <- a |> mean()
This lets me test `a |> mean()` to check that it works before the result replaces `a`. Very useful when you have very long pipe commands. `<|>` wouldn't allow this.
It still annoys me that they had to invent another pipe instead of just recreating magrittr's in C to gain the speed it apparently lacks. And then they missed out on one of the nicer functions of magrittr…
I agree with the post that such an operator should exist, but I think it should simply be the magrittr one.
I use magrittr pipes pretty much exclusively because of the extra features relative to the native pipe, but I also respect the R devs' decision not to add those extra features to the core of the language. The magrittr pipe adds additional syntax beyond the pipe itself, namely the '.' variable to refer to the piped value (essentially it's just Perl's '$_'). It's incredibly useful, but it's not particularly intuitive the way the basic pipe is.
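To illustrate the difference (lm here is just a stand-in):

library(magrittr)
# magrittr: '.' can appear anywhere in the call
mtcars %>% lm(mpg ~ wt, data = .)
# native pipe: first-argument insertion only, or the '_' placeholder
# with a named argument (R >= 4.2)
mtcars |> lm(mpg ~ wt, data = _)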
In Common Lisp you put the print statement inline; it returns the result of the form you pass it, so it doesn't change the results. If you want to know what f is returning in the form (h (f (g x))), you'd write (h (print (f (g x)))) and it will print what f returns to the REPL.
Or use Stickers provided by Sly, which sort of log things as they change: https://joaotavora.github.io/sly/#Stickers In that case you'd highlight the form starting at f and turn a sticker on. Then run the code, and the sticker outputs into a window.
I always thought R was a vestigial artifact of the bad old days of computing that had no place in the Real World (tm), but then I had to use it for a graduate-level data analytics course for business students.
And then it all made sense. Non-coders equate R with RStudio to the point that they don't really get that R and its editor are separate things, and therein lies the power.
Everybody uses the same IDE and it always works. No more figuring out why the language server isn't offering completions or wondering why ESS has such awful defaults.
R (and the tight integration with RStudio) makes it very easy for non-programmer types to conduct their own sophisticated exploratory analysis and to produce good looking plots without mucking about in Excel.
I actually love that R uses '<-' for assignment. Assignment is an inherently asymmetric operation: it takes the RHS and assigns it to the LHS, and the arrow visually indicates this. It feels more correct to have a visually asymmetric operator for it.
When you do `f(x=5)`, x doesn't get set to 5 anywhere you can see (it happens inside the body of f, which you are treating as a black box). So logically, in the context of the code that's calling f, it's not an assignment operation.
Admittedly this kind of goes out the window with a lot of tidyverse non-standard eval stuff. E.g. if you do `df %>% mutate(x = 5)`, then x does get set to 5 in the data frame, so that is logically an assignment of x in a visible scope using '='. Going by my logic, functions like mutate should ideally use a different symbol for operations that alter the input like this, but we run up against syntactic limitations of the language.
I think the argument is that, to the uninitiated, x = x + 2 looks more like an invalid equality than "replace the value stored in x with its current value plus 2".
Personally, it seems to me that = for assignment is so ubiquitous that this argument has become stale.
I still use the arrow because it's the prevailing recommendation in R style guides.
I use R quite a lot for data extraction, manipulation, and visualisation; it is the most used tool in my area of academia. You are correct that RStudio seems to be the only IDE, but I'd say most users know that R is the language and RStudio the IDE.
That being said, many decent GUIs have been developed for a more traditional analysis feel but still use R under the hood; I like JASP the most out of the current options.
Doesn't 3..5 mean elements 3 and 4: the 3rd and the 4th? 0-based indexing might actually feel less intuitive here (element 3 is the 4th, and element 4 is the 5th). I don't know much R; I'm assuming that in the n..m notation, n is included and m is excluded (edit: m is also included, see replies).
I don't know about economics, but I think mathematicians use 0-indexed and 1-indexed sequences interchangeably, and they usually (have to) specify which is the first term (u0 or u1).
R's indexing is not only 1-based, it's inclusive on both ends. So indexing from 3 to 5 actually gets you elements 3, 4, and 5. I've programmed a bunch with both R and 0-based languages and I greatly prefer R's way of doing indexing.
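To make that concrete:

x <- c(10, 20, 30, 40, 50, 60)
x[3:5]   # 1-based and inclusive on both ends: 30 40 50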
I'm pretty sure it's like this in Pascal, too. Which makes sense, since it's the only way to reasonably index 1-based arrays. For example, you'd expect the following code to set arr to the set "1, 2, 3, 4, 5", not to leave the last element uninitialized.
var
  arr: array [1 .. 5] of integer;
  i: integer;
begin
  for i := 1 to 5 do
    arr[i] := i
end;
Normally you would write "for i := low(arr) to high(arr)". I just wrote the numbers "1 to 5" to be explicit here.
Oh, right! I parsed the last sentence in a weird way, I guess. Also, thanks rcthompson for the clarification about intervals being inclusive on both ends in R.
Because it has five assignment operators, a tendency to fail silently and keep going after something errors (producing bad results for people who aren't programmers and maybe aren't good at debugging), and it is only really usable if you install a third-party package (tidyverse).
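For anyone counting, the five are:

x <- 1    # leftward assignment
1 -> x    # rightward assignment
x = 1     # equals assignment
x <<- 1   # superassignment (walks up enclosing environments)
1 ->> x   # rightward superassignment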
I realise I'm in the minority, but I personally like using the right assignment operator (->) with pipes because it maintains the flow of the code when reading.
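Something like this (a toy example):

mtcars |>
  subset(cyl == 4) |>
  transform(kpl = mpg * 0.425) -> small_engines

The pipeline reads top to bottom, and the result lands at the end, where your eye already is.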
I like R a lot, but I am not a fan of this sort of operator. It seems idiosyncratic and also somewhat ugly. I think code should be readable above all else, and the examples with this operator do not look very nice.
Interesting. I am happy assigning the traditional way but don't mind the idea of an assignment pipe. Personally, I still use the magrittr pipe because I prefer the syntax (and love the keyboard shortcut) to the base pipe.
The evidence section kinda lost me (what is `wardmap@data` supposed to be? I'm not sure it's valid R code, and I can't spot it on Stack Overflow as the article suggests).
Small correction: under the bit about 'A simple syntax transformation converts .. into ..' the two should be the other way around.
An assignment pipe should be a relatively easy sell since it produces more concise code, and lets the developer work left to right, top to bottom (how we all want to work), rather than having to backtrack to place `df <- ` at the start.
I'd try to demo it on super common use cases (so the average R user can easily see how coding patterns are improved by it).
EDIT: came across a nice explanation of the S4 system [1] under the 'Overview' and 'Slots and accessor functions' sections. Seems it's commonly used in genetics and geospatial work.
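For what it's worth, the `@` in `wardmap@data` is S4 slot access; a minimal sketch (class and slot names invented here):

setClass("WardMap", slots = c(data = "data.frame"))
wardmap <- new("WardMap", data = data.frame(ward = 1:3, votes = c(120, 98, 143)))
wardmap@data   # '@' extracts an S4 slot, roughly what '$' does for lists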
I agree with you that this operator simplifies the specific (but not uncommon!) situation you describe. But I don't think it exposes a problem to be solved so much as an antipattern to be avoided. Consider the following example, in which mytransform is some function I want to apply to a part of the dataset:
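# apply mytransform to part of mydata, then write it back
mydata['score'] <- mydata['score'] |> mytransform()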
This can be simplified with your proposed operator:
mydata['score'] <|> mytransform()
An upgrade in elegance; I like it. But R is a language commonly used by switching between scripts and the REPL, with an IDE that by default captures your workspace so all your variables etc. are restored the next time you resume the session. In this environment I feel like variables should be used as constants as much as possible. Mutating a variable (as in my example) creates room for confusion: Does mydata contain the raw csv data or the transformed data? Did I already evaluate this line in the REPL or did I not? What happens if I evaluate this line twice? Many people I know tend to "jump around" in their scripts, not following the written order of operations. This creates potentially irreproducible environments.
My proposed solution is treating all variables (or at least as many as possible) like constants:
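# rough sketch: mydata keeps holding the raw data, the transformed version gets its own name
scored_data <- mydata
scored_data['score'] <- mydata['score'] |> mytransform()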
Now I always know what a variable contains, because each variable contains exactly the same value, independent of when I evaluate.
But nevertheless, I kinda went on a tangent here, and strictly speaking the problem I describe only arises from careless user behavior (which is quite prevalent in statistics, though). Aside from this kind of behavior, I think this operator is an elegant idea!
R has got great parts to it. It has the two best libraries for general data manipulation (dplyr and data.table), which blow equivalents in other languages away. It has the best plotting library (ggplot2). And the RMarkdown and Shiny ecosystems are incredibly powerful and useful tools.
R is used because it remains the single best language for data manipulation and reporting.
R scratches an itch no other language quite manages to reach, which is why it is used despite its peculiarities. Lots of languages exist[ed], more or less, for the same reason (php and javascript come to mind).
A visual approach with nodes and arrows gets around language syntax issues. However, it comes with its own challenges, e.g. descending into a spaghetti mess if you aren't careful.
I used magrittr's %<>% a lot back when I used R on a regular basis. Given that |> made it into the language, a pipe-assignment operator doesn't seem too outlandish.
I wouldn't be opposed to the addition of this new pipe. The author has a decent argument for it being helpful in some complex special cases. But I don't know if I would want to see this used extensively. IMHO the proliferation of new notation in a quest for ever-more concise code just leads to APL, which most people don't like. Shorter isn't always more readable.
The comparison in this example is misleading:
# before
names(data)[1:2] <- paste0(names(data)[1:2], "_suffix")
# after
names(data)[1:2] <|> paste0("_suffix")
Since we're talking about pipes, the first option should use pipes:
# before
names(data)[1:2] <- names(data)[1:2] |> paste0("_suffix")
# after
names(data)[1:2] <|> paste0("_suffix")
For pipe enthusiasts this is already pretty clear. The thing on the left and right side of the assignment is the same. Compressing this saves characters and avoiding repetition of the 1:2 part is nice, but I don't know if the cost in familiarity is worth it. In any case involving a data frame, I would prefer using the .cols parameter of the rename family of verbs to either of these base R approaches.
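I.e., something along the lines of:

library(dplyr)
data |> rename_with(\(nm) paste0(nm, "_suffix"), .cols = 1:2)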
Also, a linked post by this author [1] is materially outdated on the use of case_when and across (of course I sympathize - the tidyverse has moved very fast in the past few years). Thanks to the new backslash notation for anonymous functions and various other dplyr upgrades, it is very elegant to do assignment across multiple columns based on conditions evaluated against the entire data frame. Behold:
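# (illustrative sketch; column names are made up, needs dplyr >= 1.1 and R >= 4.1)
library(dplyr)
df |>
  mutate(across(
    c(score_a, score_b),
    \(x) case_when(
      group == "control" ~ x - baseline,   # condition and offset come from other columns
      .default = x
    )
  ))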
Over the past decade, one thing I have learned is to never, ever count Hadley Wickham and the tidyverse team out when it comes to optimizing their API. If there is a lack of expressiveness or orthogonality, they will fix it. The R community will take a while to absorb their ideas because there are so many idioms flying around (partly the fault of previous versions of the tidyverse), but the model they've landed on recently is incredible and should be a model for tabular data manipulation in any language.
The posited DRY-related benefit ultimately also comes back to conciseness, because (as the author does note) you can always avoid repeating yourself by assigning expressions to an intermediate variable instead of repeating expressions.
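E.g., reusing the names() example from upthread:

first_two <- names(data)[1:2]
names(data)[1:2] <- paste0(first_two, "_suffix")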
Not super related, but R also needs some way to handle lists in the pipe chain. Maybe something like `&>` to apply over all elements (and `1>`, `2>`,..., `N>`, for piping on individual elements), and a collector operator like `]>` (or `1,2-4,N]>` to collect subsets), so that one can do:
data <- data |>
  split_by(field) &>
  # mutate over all
  mutate(calc = ...) 1-5>
  # filter the first 5 elems
  filter(...) &]>
  # control back to parent
  collect() ]>
  # collect all elements and
  # implicit rbind them
  ...
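Today the closest I can get, base R only (R >= 4.1; column names are stand-ins), is something like:

data <- data |>
  split(~ field) |>                                    # list of data frames, one per level
  lapply(\(d) transform(d, calc = cumsum(value))) |>   # "mutate" over every element
  do.call(what = rbind)                                # collect + implicit rbind

which covers the "over all elements" part but not piping into subsets of the elements.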
I get it, but the syntax is surprising. This is more like += etc., so I would expect a notation like ()= (i.e. apply-equal) that is not related to piping, I guess.