civilized on Dec 30, 2023 | on: The case for a pipe assignment operator in R

I wouldn't be opposed to the addition of this new pipe. The author has a decent argument for it being helpful in some complex special cases. But I don't know if I would want to see this used extensively. IMHO the proliferation of new notation in a quest for ever-more concise code just leads to APL, which most people don't like. Shorter isn't always more readable.

The comparison in this example is misleading:

  # before
  names(data)[1:2] <- paste0(names(data)[1:2], "_suffix")
  # after
  names(data)[1:2] <|> paste0("_suffix")

Since we're talking about pipes, the first option should use pipes:

  # before
  names(data)[1:2] <- names(data)[1:2] |> paste0("_suffix")
  # after
  names(data)[1:2] <|> paste0("_suffix")

For pipe enthusiasts this is already pretty clear: the thing on the left and right side of the assignment is the same. Compressing this saves characters, and avoiding repetition of the 1:2 part is nice, but I don't know if the cost in familiarity is worth it. In any case involving a data frame, I would prefer using the .cols parameter of dplyr's rename_with() to either of these base R approaches.
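For what it's worth, here is a minimal sketch of the rename_with() route. It assumes dplyr 1.0 or later and R 4.1 or later (for |> and the \(x) lambda); the data frame and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical example data; the column names are invented for this sketch.
data <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Append "_suffix" to the first two column names only.
# .cols takes a tidyselect specification, so positions 1:2 work here.
renamed <- data |>
  rename_with(\(nm) paste0(nm, "_suffix"), .cols = 1:2)

names(renamed)
# "a_suffix" "b_suffix" "c"
```

The selection (1:2) appears exactly once, so this gets the no-repetition benefit without any new operator.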

Also, a linked post by this author [1] is materially outdated on the use of case_when and across (of course I sympathize - the tidyverse has moved very fast in the past few years). Thanks to the new backslash notation for anonymous functions and various other dplyr upgrades, it is very elegant to do assignment across multiple columns based on conditions evaluated against the entire data frame. Behold:

  library(dplyr)  # provides %>%, mutate(), across(), case_when()

  mtcars %>%
    head()

      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
  2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
  3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
  4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
  5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
  6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
  
  mtcars %>%
    mutate(across(mpg:hp, \(x) case_when(wt < 3 ~ x * 10, wt >= 3 ~ x))) %>%
    head()

      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  1 210      60  1600  1100  3.9   2.62  16.5     0     1     4     4
  2 210      60  1600  1100  3.9   2.88  17.0     0     1     4     4
  3 228      40  1080   930  3.85  2.32  18.6     1     1     4     1
  4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
  5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
  6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1

Over the past decade, one thing I have learned is to never, ever count Hadley Wickham and the tidyverse team out when it comes to optimizing their API. If there is a lack of expressiveness or orthogonality, they will fix it. The R community will take a while to absorb their ideas because there are so many idioms flying around (partly the fault of previous versions of the tidyverse), but the design they've landed on recently is incredible and should be a template for tabular data manipulation in any language.

[1] https://davidhughjones.medium.com/ae364da6a46f

plagiarist on Dec 30, 2023

Readability wasn't the sole benefit in their mind.

civilized on Dec 30, 2023

The posited DRY-related benefit ultimately also comes back to conciseness, because (as the author does note) you can always avoid repeating yourself by assigning the repeated expression to an intermediate variable.
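A rough base R sketch of that intermediate-variable approach, using the same renaming example. The data frame and the names idx and old are hypothetical, chosen only for illustration.

```r
# Hypothetical example data; the column names are invented for this sketch.
data <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Name the selection once instead of writing 1:2 on both sides.
idx <- 1:2
old <- names(data)[idx]

# Assign the suffixed names back to the same positions.
names(data)[idx] <- paste0(old, "_suffix")

names(data)
# "a_suffix" "b_suffix" "c"
```

It costs two extra lines, but nothing is repeated, which is why the remaining advantage of the new operator is brevity rather than DRY.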
