
I was expecting a rant, but the OP's article is actually very thoughtful. He definitely knows what he's talking about.

The thing about R, for me and many others, is that it's very much an everyday grind language. Especially with RStudio, its natural domain is alongside the "notebook" languages like Python, Julia, MATLAB, and Mathematica, but with a clearer focus on data-analysis tasks. I just tell the BI-tool people that R is Excel on 'roids.

R frustrates me a lot, however. I think the frustration comes from the fact that whenever I get stuck in R, I am in the middle of something I need to get done, and I don't feel like diving into a long "vignette". Moreover, the documentation is usually too terse and generalized for me to understand immediately. Even though I've been using R for years (albeit in fits and starts rather than continuously), there are things about it that I've just never picked up. I just DON'T KNOW (or care) what the F S3 and S4 mean. Unlike the OP, who clearly knows more R than I do, I grit my teeth when I look at docs and see the "..." in the arg list.
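For readers in the same boat, here is a minimal sketch of what S3, S4, and "..." refer to. All the names (describe, dog, Cat, speak, my_mean) are hypothetical and made up for illustration; only UseMethod(), setClass(), setGeneric(), setMethod(), and new() are real base R / methods functions.

```r
# S3: a "class" is just an attribute on an ordinary object, and a generic
# function dispatches on it with UseMethod().
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something"
describe.dog <- function(x, ...) paste("a dog named", x$name)

rex <- structure(list(name = "Rex"), class = "dog")
describe(rex)  # "a dog named Rex"
describe(42)   # no describe.numeric, so describe.default: "something"

# S4: classes declare typed slots up front, objects are built with new(),
# and methods are registered against a generic with setMethod().
setClass("Cat", slots = c(name = "character"))
setGeneric("speak", function(obj, ...) standardGeneric("speak"))
setMethod("speak", "Cat", function(obj, ...) paste(obj@name, "says meow"))

tom <- new("Cat", name = "Tom")
speak(tom)  # "Tom says meow"

# `...` collects any extra arguments and passes them along unchanged,
# here forwarding na.rm = TRUE to mean().
my_mean <- function(x, ...) mean(x, ...)
my_mean(c(1, NA, 3), na.rm = TRUE)  # 2
```

The "..." in an arg list is usually this last pattern: the function doesn't use those arguments itself, so you have to look at whatever it forwards them to.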

I suspect this is part of the heritage of R's beginnings. I once tried to read John Chambers's book but found the presentation completely ass-backwards and impractical for my immediate needs. The Tidyverse has been great: it's far more consistent, and ggplot is a kick-ass tool to have in your box. The drawback is that it makes base R seem really alien, and if you want to be good at R, you have to know more than just the Tidyverse, IMHO.




The tidyverse docs are the only ones I know of with that super frustrating "..." of impenetrable, gnostic "documentation". In general the tidyverse documentation is horrible, almost as bad as typical Python docs, IMHO. Most of base R, by contrast, is wonderfully documented in my opinion.


Do you have any specific examples that illustrate the general problem? I'd love to better understand what you're looking for in docs.


Thanks for taking my aggressive comment in such good spirit; it really speaks to a good community. (Sleep-training an infant has me a bit frazzled.)

I should have been more specific: the ... frustration comes up mostly in ggplot, which usually directs you to layer(), whose parameters get documentation strings like:

* geom - The geometric object to use display the data

* stat - The statistical transformation to use on the data for this layer, as a string.

These are two hugely important parameters, with really big concepts and abstractions under them, but the documentation is of the style "foobar(): this is a function that foos the bar", documentation that restates the information in the name, but with more words, and no insight on where to go next.

So now a person is two pages deep into documentation, and it's actually circular, because layer() has a ... argument that gets passed back to what? The function documentation you came from? For a newcomer it's a completely twisty series of passages, and even for an experienced user like me, who reaches for ggplot before any other tool, it's confusing.
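To make the geom/stat/... relationship concrete, here is a rough sketch (assuming ggplot2 is installed and using the built-in mtcars data) of a geom_point() call next to an approximately equivalent hand-written layer() call. The exact internals of what geom_point() passes to layer() vary between ggplot2 versions, so treat the second form as an approximation rather than a literal expansion.

```r
library(ggplot2)

# The usual way: geom_point() builds a layer for you. Arguments it does not
# name itself, like size = 3 here, are swept up by its `...` and handed to
# layer() inside `params`.
p1 <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3)

# Roughly the same layer spelled out by hand. `geom` and `stat` accept
# strings naming ggproto objects: "point" -> GeomPoint, "identity" -> StatIdentity.
p2 <- ggplot(mtcars, aes(wt, mpg)) +
  layer(
    geom = "point", stat = "identity", position = "identity",
    params = list(size = 3, na.rm = FALSE)
  )

# Inside the layer, a fixed aesthetic like size ends up in aes_params.
p2$layers[[1]]$aes_params$size
```

So the "..." on a geom_*() function is mostly a pass-through for fixed aesthetics and geom/stat parameters that land in layer()'s `params`.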

The other function-level confusion is that the list of aesthetics is not connected well enough to the mapping argument from aes(). Which aesthetics a function understands is probably one of the most important things to learn when you look it up. But the parameter documentation doesn't make it clear that an entire section much further down the page describes that crucial material, and on a long, long page it's easy to skip over that section by accident when skimming.
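One way to get at the aesthetics list without scrolling is to ask the geom's ggproto object directly. This sketch assumes ggplot2 is installed and uses mtcars; the mapped/fixed variable names are just for illustration. It also shows the mapping-versus-setting distinction that the aes() documentation is about.

```r
library(ggplot2)

# A geom's ggproto object can list every aesthetic it understands:
# required ones (x, y), ones with defaults (colour, size, ...), and "group".
GeomPoint$aesthetics()

# Mapping: inside aes(), colour is tied to a data column and gets a legend.
mapped <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Setting: outside aes(), colour is one fixed value passed through `...`.
fixed <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(colour = "blue")
```

The same aesthetic name can show up in either place; what changes is whether it is driven by the data or held constant.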

(These are the same sorts of frustrations I have with typical Python documentation, btw, so maybe my brain is just different from typical engineers'.)


Ah yeah, connecting the dots in ggplot2 docs is hard. It's hard for us to document because, under the hood, the pieces are quite decoupled, and different pieces are responsible for different arguments. But since we last took a deep dive on the ggplot2 docs, we've gotten much better at generating docs with code, so maybe it's time to have another look. I've filed an issue (https://github.com/tidyverse/ggplot2/issues/4770) so we don't forget about it, but no guarantees about when it might get done.


Is there a tutorial someplace that explains how ggplot actually manages plotting? Or the architecture and layers between the high-level code and how a plot is drawn? I love being able to express what I want and have ggplot figure out a good plot for me. I know there are many layers that can be manipulated, but I just don't understand them.

One of the best compliments I can think of is that with ggplot, easy things are easy and hard things are possible. But I haven’t been able to figure out how to fully work the system.

(Thanks for all of the work!)


That should be covered in the original "A Layered Grammar of Graphics" paper: https://vita.had.co.nz/papers/layered-grammar.html

And then there is an entire ggplot2 book (there are many, but this one was written by Hadley): https://ggplot2-book.org/


That's very helpful. I think this chapter was what I was looking for:

https://ggplot2-book.org/internals.html
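If you want to poke at that pipeline yourself, ggplot_build() and ggplot_gtable() expose its two main stages. This is a minimal sketch assuming ggplot2 is installed and using its bundled mpg data; the column names mentioned in the comments are what a point layer and a default geom_smooth() layer typically produce, not an exhaustive list.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Stage 1: ggplot_build() runs each layer's stat, position adjustment and
# scale training, returning one computed data frame per layer.
built <- ggplot_build(p)
head(built$data[[1]])  # point layer: x, y, colour, size, ...
head(built$data[[2]])  # smooth layer: fitted x, y plus ymin, ymax, se

# Stage 2: ggplot_gtable() turns the built plot into a grid gtable,
# the object that is finally drawn.
gt <- ggplot_gtable(built)
```

Looking at built$data is often the quickest way to see what a stat actually computed before anything is drawn.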


I think the issue with some of this documentation is that for other packages, the function documentation is largely self-contained. If I look up glm(), it tells me how to use glm(). For ggplot2, however, there is an assumption that you already know how the pieces should be strung together. So when I know I want a boxplot and find the geom_boxplot() documentation, it wonderfully describes its own options and gives examples of its use. But sometimes it doesn't give a good idea of how the other pieces might interact with it. It makes complete sense if you've read the book and just want to refresh your memory, but if you are coming in as a new user, it can be really difficult to rely on the documentation alone.
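As an illustration of that missing context, here is a sketch (assuming ggplot2 and its bundled mpg data) of a boxplot with the comments spelling out which piece each call contributes. The particular combination of layers is just an example, not a recommended recipe.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +  # data + default mapping for all layers
  geom_boxplot(outlier.colour = "red") +    # layer 1: geom_boxplot is paired with stat_boxplot
  geom_jitter(width = 0.2, alpha = 0.3) +   # layer 2: inherits the same x/y mapping
  coord_flip() +                             # coordinate system, applies to every layer
  labs(x = NULL, y = "Highway MPG")          # labels are set on the plot, not the geom
```

Only the second and third lines are documented on the geom pages. The data/mapping, coordinate system, and labels each live on their own help pages.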


Having read an online book of some sort on ggplot2, on one of the tidyverse sites, I found the per-function documentation difficult to use and difficult to match to the concepts I had learned. This may be because I'm used to using the parameters section of a function as the primary resource for understanding the inputs. But with ggplot it's scattered in other places, and the holes are not apparent unless you know the specific terminology (not concepts) to match up.

All that said, I find the documentation to be saying a lot more than it did in the past, and it sounds like it has been continually improving.


He may be right in specific instances, but I think he's way wrong in general. Tidyverse is generally a triumph of documentation, and part of that is that it doesn't tell you too much. Lots of how-to, not too much implementation detail. It's appreciated.


After 15+ years of shipping open source stuff, I've rather concluded that any given piece of documentation is going to be either too terse or too verbose for any given user, and all you can really do is combine judgement with balancing how many of each type of complaint you receive.

It's probably possible at least in theory to structure docs so you have a terse section followed by a verbose section for each thing, but I've yet to develop the discipline or the competence to pull that off remotely regularly.

Maybe in another 15 years.


> almost as bad as typical Python docs

I found numpy, scipy, pandas, and plotly docs to be quite clear and extensive. The only docs I have found to be confusing are matplotlib's and the Python standard library's. Not sure what packages you are referring to?


On the other hand, I'm curious what you've found to be lacking about the standard library documentation. I've found it to be generally very thorough, in some cases fantastic, though there's an occasional weak point.


For me, the stdlib docs aren't lacking in material, but they are hard to navigate. I think that is partly because they mix different types of documentation together (tutorial, reference, and changelog). They should split each module's docs into two pages: a tutorial with examples of the module's most common use cases, and a reference listing all of its classes, methods, etc.

Also, the built-in types are documented in one page, going from boolean to sequence types and even type annotations. Every Ctrl + F gives me 20 different results, which is annoying as hell.


I've used R for 19 years and do not have any other programming language ability. I am curious: What is frustrating to you about R relative to other languages?


In my experience:

- It has a bunch of different types of classes, and they all behave differently. Debugging isn't awful, but it's harder than it should be. Also, the documentation isn't clear about which classes to use.

- A lot of the workhorse functions suffer from parameter glut. Despite having different kinds of classes, almost all functions expect plain vectors. Packages like survival show how objects make it easier to read code, reuse data, and validate data. Without the base packages doing it more, everyone's chosen their own systems. The community's been gravitating to organizing "objects" as rows in tables (i.e. tidy).

- The way a function uses an argument might surprisingly change based on other arguments given (e.g., `binom.test`). And then the documentation won't have examples for the different use cases.

- Most users don't have the time or desire to become better R programmers. They have other work to do. For my own work, I write packages with custom classes, functions, and template documents. For collaboration, I keep things very plain and rarely go beyond dplyr; very often, the script goes between two steps executed in a GUI software.
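The binom.test point above can be shown in a few lines of base R. The variable names are arbitrary; the behavior shown (a length-2 x is read as successes and failures, and n is then computed from it) is what the binom.test help page describes for x and n.

```r
# x as a single count of successes: n (number of trials) is required.
a <- binom.test(x = 7, n = 10)

# x as a length-2 vector c(successes, failures): n is computed as sum(x),
# and any n you pass is silently ignored.
b <- binom.test(x = c(7, 3))
b_big_n <- binom.test(x = c(7, 3), n = 1000)

# All three describe 7 successes in 10 trials, so the p-values agree.
c(a$p.value, b$p.value, b_big_n$p.value)
```

Nothing warns you that n = 1000 was discarded in the last call, which is exactly the kind of surprise that's easy to miss without an example.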


I'm not OP.

To me, it's the tooling around it. Everything is done in R-studio, and its focus is generating statistical documents.

The result is a suboptimal solution. It lacks good tooling for installing and running R programs. R programs don't import, they include, which doesn't make them more readable. R-studio is very Emacs-like in the sense that it just lacks a decent editor. Because R-studio is the default, there's not much support for other editors.


> Due to R-studio being the default, there's not much support for other editors.

But RStudio is amazing... Easily best environment I've used for any programming language (well, except Pharo).


I use Vim with the R command line wrapped around Makefiles. I don't even have RStudio installed. Works great.

I can even pop up an interactive R command prompt session and do whatever I want in it, even quick ggplot2 graphs. Help shows up just as you would expect, and plots pop up in new windows. RStudio is much less advanced than people think; it's really just managing R's windows for you and doing generic IDE work. R is doing all the heavy lifting.

If you're wanting a more "import" like thing you might want to look into making R packages instead of scripts. You don't have to submit them to CRAN, and you can execute them pretty simply on the R command line as well.
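Short of building a package, one lightweight approximation of an import (a swapped-in technique, not what the comment above suggests) is to source a file into its own environment rather than the global one. This is a sketch in base R; the helper file, double_it, and scratch_value are hypothetical names created just for the demo.

```r
# Create a throwaway "module" file for the demo (a stand-in for your own script).
helper <- tempfile(fileext = ".R")
writeLines(c(
  "double_it <- function(x) x * 2",
  "scratch_value <- 42  # a helper variable we don't want leaking out"
), helper)

# Plain source(helper) would drop both names into the global environment.
# Evaluating into a fresh environment keeps them contained instead.
mod <- new.env()
source(helper, local = mod)

mod$double_it(21)                                              # 42
exists("scratch_value", envir = globalenv(), inherits = FALSE) # FALSE
```

A real package still gives you more (namespaces, exports, documentation, tests), but this avoids the "everything lands in one global pile" effect of a bare source().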

That being said, it's really not an OOP or software development tool. It's definitely geared toward data science, but you can automate it very well to generate graphs and reports automatically for whatever you need.


Do you have notes or documentation on how to get this working without RStudio? Preferably on a MacBook?


I use Linux, but it should be pretty straightforward on Mac as well. Just type "R" in the command prompt and check the manpage for R itself ("man R").

I'd also look into the "knitr" package, which all of R Markdown is built on. For instance, most of my Makefiles are built around a simple command like:

    R -e "library(knitr); knit2html('index.Rmd')"
Then I just code in Vim on index.Rmd. You can probably set this up however you like with the R command line.

For interactive use, it literally is just typing "R" at the command prompt. For help, use things like "?ggplot" or "??knitr". You can open multiple interactive sessions as if you were using IPython or something. When you print a plot, it just pops up in a new window.

You can also use "R" to run plain R code if you want to skip R Markdown; I just prefer the HTML output. I'm pretty sure all the RStudio R Markdown stuff just calls knitr as well.
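For a self-contained version of that workflow, here is a sketch that writes a tiny R Markdown file and knits it from plain R, assuming knitr is installed. The file names are hypothetical, and the commented-out rmarkdown::render() line (the function RStudio's Knit button uses, which runs knitr and then pandoc) needs the rmarkdown package and pandoc, so it is left out of the runnable part.

```r
library(knitr)

# Write a tiny R Markdown file to knit (a stand-in for index.Rmd).
# strrep() builds the chunk fence so this example has no literal backtick runs.
fence <- strrep("`", 3)
rmd <- file.path(tempdir(), "demo.Rmd")
writeLines(c("# Demo", "", paste0(fence, "{r}"), "1 + 1", fence), rmd)

# knit() runs the R chunks and writes plain Markdown; it does not need pandoc.
md <- knit(rmd, output = file.path(tempdir(), "demo.md"), quiet = TRUE)
readLines(md)

# Higher-level route that knits and then calls pandoc to produce HTML:
# rmarkdown::render("index.Rmd", output_format = "html_document")
```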

The output looks the same as anything on RPubs (and there is a way to publish to RPubs, I used to have to do that at one point), random one from the first page:

https://rpubs.com/mnguy1019/881028


Emacs has had excellent R support since long before R-studio came around. In fact, to me R-studio has always felt like a standalone implementation of ESS (minus Emacs).


FWIW, I do prefer R to all other "notebook" languages. It has everything I need for answering questions about data, working with files, text, and visualization. Libraries that do everything I could imagine are easily available.

For me it's a pragmatic workhorse tool that I use and aside from frequently getting frustrated with the task at hand, it has never failed me in the end.

I think R is much like a handheld power tool. I have no interest in diving deep into the workings of the tool because when I need to use a drill, for instance, I just need to drill holes and anything else is an annoying distraction (I realize that sounds bad!).

I've also worked with Mathematica and JMP in the past. They're very capable, but not as good at general-purpose data wrangling as R is today (given RStudio, knitr, shiny, all the specialized libraries, and most especially the tidyverse).


In decades, I have not encountered a development environment whose error messages come anywhere close to R's for obscurity. In other languages and build environments, the error messages can frequently be used to diagnose the issue.

In R they default to just vomiting up some internal exception, often with no context, and I can count the times I have encountered helpful, or even seemingly deliberately constructed, error messages in single-place base five. Even kernel development is in some sense better, because at least there you are in context and the layers are traversable.

Seemingly one of the primary skills of an R programmer is serving as an informal database of “what the fuck does this mean?” when the issue is something trivially detectable like passing a list vs a vector.
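The list-versus-vector case is easy to reproduce in base R. This sketch shows the terse built-in message, the usual unlist() fix, and a hypothetical total() helper (not a base R function) that checks its input up front so the error says what actually went wrong.

```r
x <- list(1, 2, 3)

# sum() on a list fails with a terse, context-free message.
msg <- tryCatch(sum(x), error = function(e) conditionMessage(e))
msg  # "invalid 'type' (list) of argument"

# unlist() flattens the list into the atomic vector sum() expects.
sum(unlist(x))  # 6

# Hypothetical helper: validate the input and name the problem explicitly.
total <- function(v) {
  if (!is.numeric(v)) {
    stop("`v` must be a numeric vector, not a ", class(v)[1], call. = FALSE)
  }
  sum(v)
}
tryCatch(total(x), error = function(e) conditionMessage(e))
# "`v` must be a numeric vector, not a list"
```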



