Create Page Layout Visualizations in R

claytonjy · on Aug 28, 2017

This is really neat; I'm working with OCR output and hand-rolled something very similar to `ggpage_plot` about two weeks ago to recreate the original layouts with colored labels. Super helpful to get a sense of how to engineer some spatial features to feed into classification models. Having this around might have saved me some time!

gghh · on Aug 29, 2017

Interesting project, and surely a great showcase of R graphic capabilities. For somebody like me coming from python/matplotlib, which gives almost unlimited freedom in creating complex visualizations, when approaching R graphing should I focus on basic R ("graphics" package) or learn some ggplot2?

At first it looked like basic graphics is quite limited, but after I learned that I can draw rectangles and polygons I re-evaluated it a lot. OTOH I can't avoid the feeling that all ggplot2 gives me is some canned styles and a very uncomfortable syntax, like "ggplot(...) + geom_bar(...) + theme(...)" where `+` means something I can't fully comprehend (because "The Grammar of Graphics").

Please help me change my mind if I'm being ill-informed, I do want to take the most I can from R graphing. ggplot2 is hugely popular so it must be doing something right.

Tarq0n · on Aug 29, 2017

For most everyday plotting tasks freedom is overrated. What ggplot gives you is unparalleled productivity, combined with a 'grammar of graphics' API that makes it very pleasant to specify and experiment with visualizations. It's also easy to make ggplot look publication ready with little effort.

If you need something highly customized it can be quite a bit of work, but at least in my experience you almost never do.

nerdponx · on Aug 29, 2017

The + should just be taken to mean "add this layer to your plot". It's building up a graph object, whose print() method happens to also be its plot() method.

Basic graphics are actually tremendously powerful and can basically give you pixel level control. However the documentation is trash and many of the higher-level functions have hideous defaults.

Ggplot is great for fast iteration and exploratory data analysis, especially when "coloring by group" and "faceting" are involved.

jhbadger · on Aug 29, 2017

The problem in basic R graphics is that you basically spend all your time messing with margins and font sizes and the like. Not that there's no fiddling with ggplot2, but it's much better at not generating graphs with all your labels overlapping. Literally the only time I use basic graphics is for heatmaps (all the major packages still seem to use basic graphics for those for some reason). Ggplot2 may not be "intuitive", but it definitely is worth learning. And it isn't just an R thing anymore -- ggplot2-inspired graphics systems are beginning to be created for lots of other programming languages these days.

stared · on Aug 29, 2017

For practical plots ggplot2 is awesome.

For having full freedom of drawing anything, it's impossible to beat D3.js (neither graphics in R nor matplotlib in Python gives is comparable).

pgroth · on Aug 29, 2017

Is there anything like this for python and or javascript?

lottin · on Aug 28, 2017

I really wish this trend of using pseudo-pipes in R would stop.

mdlincoln · on Aug 28, 2017

I find that a fascinating reaction given how rapidly %>% have been taken up across a large segment of the R universe, to great excitement! Personally, I find it far MORE legible than endlessly-nested function calls.

It results in code that more closely resembles executed order of operations (e.g. filter -> mutate -> group -> summarize). Context is also key: it's most often used for data processing pipelines in specific analytical scripts or literate-code documents - less so used when defining generalizable/testable functions in packages (again, just a personal perspective - YMMV of course)

extr · on Aug 29, 2017

you nailed it. dplyr is better the further you are from doing heavy duty data analysis or creating production code. if you're writing some simple transforms to put data into a report, fine. someone is probably going to want to look at that at some point and it's much, much easier to understand. but for anything else i stick with data.table.

kermatt · on Aug 29, 2017

https://github.com/hrbrmstr/rstudioconf2017

jgalt212 · on Aug 29, 2017

Maybe R needs transducers.

https://clojure.org/reference/transducers

https://stackoverflow.com/questions/26317325/can-someone-exp...

kermatt · on Aug 28, 2017

"pseudo-pipes"? As in %>% ?

lottin · on Aug 28, 2017

Yes, it's horrible.

coetzeesg · on Aug 29, 2017

I don't agree that it's horrible, but it really doesn't work particularly well in the context of the S4 objects in, for example Bioconductor.

minimaxir · on Aug 28, 2017

lottin · on Aug 28, 2017

Because combining an argument with a function call doesn't make sense. They have to do some voodoo under the hood to make it work and this reduces code understandability. The analogy with Unix pipes doesn't work either: one is passing an argument to a function while the other involves writing & reading a file. Finally, it's plain ugly and un-Lispy.

sin7 · on Aug 28, 2017

This particular pipe comes from F#.

Let's say I have a table called dat.

  dat %>%

    filter(col_a == 'Good') %>%  

    group_by(col_b) %>%  

    summarize(n = n(), sum_c = sum(col_c))

To do this in traditional R, I would have to:

  dat <- dat[dat$col_a == 'Good']

  dat_n <- aggregate(col_a ~ col_b, dat, length)

  dat_sum <- aggregate(col_c ~ col_b, dat, sum)

  merge(dat_n, dat_sum, by = "col_b")

I think the piped version is more readable. At least there are less variable to track.

extr · on Aug 29, 2017

That's because you're comparing to base-r. The data.table way would be:

  dat[col_a == 'Good', .(Length = .N, Sum = sum(col_c)), col_b]

jcheng · on Aug 29, 2017

That's all well and good for operations that are built into data.table; pipes can be used with anything that's a function call (like ggpage).

jhbadger · on Aug 29, 2017

The problem with data.table is that in practice your data gets converted to something else when you pass it through packages -- many functions will return a data.frame or matrix, others in the Hadleyverse will return a tibble, and so on. So you have to constantly force your data back into a data table. R has so many datatypes that basically represent a spreadsheet/database table.

nerdponx · on Aug 29, 2017

Data.table is great for SQL-style code but it imposes some annoying limitations, namely that the output is always coerced to a data.table.

mpweiher · on Aug 28, 2017

Unix pipes don't read/write files.

There also isn't any reason why other implementations of pipes have to do buffered byte read/writes, passing objects is perfectly acceptable.

The structurally distinguishing aspect of a pipe-and-filter style is that the individual processing elements don't "return" to their "caller", but rather pass their result on to the next processing element. Without involving the caller.

ekianjo · on Aug 29, 2017

> reduces code understandability.

Actually it makes R scripts much easier to grasp. It prevents the abusive usage of brackets, and it makes the data transformation flow more obvious.

lionel- · on Aug 29, 2017

Unlispy? What about threading macros? The voodoo you're referring to is simply computing on the language, which is quite lispy as well.

dandermotj · on Aug 29, 2017

Which is more legible and understandable?

    f(a, g(h(x), b))
    h(x) %>% g(b) %>% f(a, .)

"Thus, programs must be written for people to read, and only incidentally for machines to execute."

Gatsky · on Aug 28, 2017

I've never hit a bug merely because of using pipes. Is this a theoretical concern or can you give an example?

ekianjo · on Aug 29, 2017

Maybe not a direct answer to your question but using dynamic variable names is kind of tedious with dplyr. You have to go around it by using paste() statements before passing the argument to a dplyr function, so it's not always elegant either.

cwyers · on Aug 29, 2017

The most recent version of dplyr adopted quasiquotation to make that easier:

https://cran.r-project.org/web/packages/rlang/vignettes/tidy...

ekianjo · on Aug 29, 2017

Thanks! I was not aware of that!

lionel- · on Aug 29, 2017

That problem (solved in recent versions) has nothing to do with pipes though.

hobolord · on Aug 28, 2017

what is a better alternative for code understandability ?

curiousgal · on Aug 28, 2017

I like it but it makes it hard to take R as a serious programming language (thankfully it's not in standard R) because where else would you actually use a coding pattern like:

    a = "foo"
    a = "bar"

reusing variables is silly.

minimaxir · on Aug 28, 2017

The alternative is to use a different variable for each data transformation, which has costs for both system memory and code readability. And modern data analysis has a lot of transformations.

confounded · on Aug 28, 2017

Death to the Pyramid of Doom!