Create Page Layout Visualizations in R (github.com/emilhvitfeldt)
82 points by mpweiher on Aug 28, 2017 | 35 comments



This is really neat; I'm working with OCR output and hand-rolled something very similar to `ggpage_plot` about two weeks ago to recreate the original layouts with colored labels. Super helpful to get a sense of how to engineer some spatial features to feed into classification models. Having this around might have saved me some time!


Interesting project, and surely a great showcase of R graphic capabilities. For somebody like me coming from python/matplotlib, which gives almost unlimited freedom in creating complex visualizations, when approaching R graphing should I focus on basic R ("graphics" package) or learn some ggplot2?

At first it looked like basic graphics is quite limited, but after I learned that I can draw rectangles and polygons I re-evaluated it a lot. OTOH I can't avoid the feeling that all ggplot2 gives me is some canned styles and a very uncomfortable syntax, like "ggplot(...) + geom_bar(...) + theme(...)" where `+` means something I can't fully comprehend (because "The Grammar of Graphics").

Please help me change my mind if I'm being ill-informed, I do want to take the most I can from R graphing. ggplot2 is hugely popular so it must be doing something right.


For most everyday plotting tasks freedom is overrated. What ggplot gives you is unparalleled productivity, combined with a 'grammar of graphics' API that makes it very pleasant to specify and experiment with visualizations. It's also easy to make ggplot look publication ready with little effort.

If you need something highly customized it can be quite a bit of work, but at least in my experience you almost never do.


The + should just be taken to mean "add this layer to your plot". It's building up a graph object, whose print() method happens to also be its plot() method.
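To make the "add this layer" reading concrete, here is a minimal sketch in base R. The names `toy_plot` and `toy_layer` are made up for illustration, and this is not how ggplot2 is implemented internally; it only shows the general pattern of an S3 `+` method that accumulates layers and a `print()` method that does the rendering.

```r
# Toy S3 classes (hypothetical names; NOT ggplot2's internals).
toy_plot <- function(data) {
  structure(list(data = data, layers = list()), class = "toy_plot")
}

toy_layer <- function(name) {
  structure(list(name = name), class = "toy_layer")
}

# `+` dispatches on the class of the left operand and returns a new object
# with one more layer attached; nothing is drawn at this point.
"+.toy_plot" <- function(e1, e2) {
  e1$layers <- c(e1$layers, list(e2))
  e1
}

# Typing the object at the console calls print(), which is where the
# "rendering" happens (here it only writes a one-line text summary).
print.toy_plot <- function(x, ...) {
  layer_names <- vapply(x$layers, function(l) l$name, character(1))
  cat("plot of", nrow(x$data), "rows with layers:",
      paste(layer_names, collapse = ", "), "\n")
  invisible(x)
}

p <- toy_plot(mtcars) + toy_layer("bars") + toy_layer("theme")
p  # auto-printing triggers print.toy_plot()
```

Because the object is only rendered when printed, it can be stored in a variable, extended with further `+` calls, and displayed later.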

Basic graphics are actually tremendously powerful and can basically give you pixel level control. However the documentation is trash and many of the higher-level functions have hideous defaults.

Ggplot is great for fast iteration and exploratory data analysis, especially when "coloring by group" and "faceting" are involved.


The problem in basic R graphics is that you basically spend all your time messing with margins and font sizes and the like. Not that there's no fiddling with ggplot2, but it's much better at not generating graphs with all your labels overlapping. Literally the only time I use basic graphics is for heatmaps (all the major packages still seem to use basic graphics for those for some reason). Ggplot2 may not be "intuitive", but it definitely is worth learning. And it isn't just an R thing anymore -- ggplot2-inspired graphics systems are beginning to be created for lots of other programming languages these days.


For practical plots ggplot2 is awesome.

For having full freedom of drawing anything, it's impossible to beat D3.js (neither graphics in R nor matplotlib in Python is comparable).


Is there anything like this for Python and/or JavaScript?


I really wish this trend of using pseudo-pipes in R would stop.


I find that a fascinating reaction given how rapidly %>% has been taken up across a large segment of the R universe, to great excitement! Personally, I find it far MORE legible than endlessly-nested function calls.

It results in code that more closely resembles executed order of operations (e.g. filter -> mutate -> group -> summarize). Context is also key: it's most often used for data processing pipelines in specific analytical scripts or literate-code documents - less so used when defining generalizable/testable functions in packages (again, just a personal perspective - YMMV of course)


you nailed it. dplyr is better the further you are from doing heavy duty data analysis or creating production code. if you're writing some simple transforms to put data into a report, fine. someone is probably going to want to look at that at some point and it's much, much easier to understand. but for anything else i stick with data.table.




"pseudo-pipes"? As in %>% ?


Yes, it's horrible.


I don't agree that it's horrible, but it really doesn't work particularly well in the context of the S4 objects in, for example Bioconductor.


Why?


Because combining an argument with a function call doesn't make sense. They have to do some voodoo under the hood to make it work and this reduces code understandability. The analogy with Unix pipes doesn't work either: one is passing an argument to a function while the other involves writing & reading a file. Finally, it's plain ugly and un-Lispy.


This particular pipe comes from F#.

Let's say I have a table called dat.

  dat %>%
    filter(col_a == 'Good') %>%
    group_by(col_b) %>%
    summarize(n = n(), sum_c = sum(col_c))

To do this in traditional R, I would have to:

  dat <- dat[dat$col_a == 'Good', ]  # the trailing comma selects rows, not columns
  dat_n <- aggregate(col_a ~ col_b, dat, length)
  dat_sum <- aggregate(col_c ~ col_b, dat, sum)
  merge(dat_n, dat_sum, by = "col_b")

I think the piped version is more readable. At least there are fewer variables to track.


That's because you're comparing to base-r. The data.table way would be:

  dat[col_a == 'Good', .(Length = .N, Sum = sum(col_c)), col_b]


That's all well and good for operations that are built into data.table; pipes can be used with anything that's a function call (like ggpage).


The problem with data.table is that in practice your data gets converted to something else when you pass it through packages -- many functions will return a data.frame or matrix, others in the Hadleyverse will return a tibble, and so on. So you have to constantly force your data back into a data table. R has so many datatypes that basically represent a spreadsheet/database table.


Data.table is great for SQL-style code but it imposes some annoying limitations, namely that the output is always coerced to a data.table.


Unix pipes don't read/write files.

There also isn't any reason why other implementations of pipes have to do buffered byte read/writes, passing objects is perfectly acceptable.

The structurally distinguishing aspect of a pipe-and-filter style is that the individual processing elements don't "return" to their "caller", but rather pass their result on to the next processing element. Without involving the caller.


> reduces code understandability.

Actually it makes R scripts much easier to grasp. It prevents the abusive usage of brackets, and it makes the data transformation flow more obvious.


Unlispy? What about threading macros? The voodoo you're referring to is simply computing on the language, which is quite lispy as well.
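As a rough illustration of "computing on the language", a toy pipe can be written in a few lines of base R. The operator name `%then%` is hypothetical and this is not magrittr's actual implementation (it has no `.` placeholder, for one); it only sketches the idea of capturing the right-hand call unevaluated and splicing the left-hand value in as its first argument.

```r
# A toy pipe operator, for illustration only (NOT magrittr's implementation).
`%then%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)  # capture the right-hand side unevaluated

  # Allow a bare function name on the right, e.g. `x %then% sqrt`
  if (is.name(rhs_call)) {
    rhs_call <- as.call(list(rhs_call))
  }

  # Rebuild the call with the left-hand value as the first argument,
  # keeping any other (possibly named) arguments in place.
  new_call <- as.call(c(list(rhs_call[[1]], lhs), as.list(rhs_call)[-1]))

  # Evaluate the rewritten call where the pipe was used
  eval(new_call, parent.frame())
}

total  <- c(1, 4, 9) %then% sqrt() %then% sum()   # same as sum(sqrt(c(1, 4, 9)))
root   <- 16 %then% sqrt                          # same as sqrt(16)
evens  <- 4 %then% seq(10, by = 2)                # same as seq(4, 10, by = 2)
```

The rewriting step is ordinary manipulation of call objects, which is the same kind of thing a Lisp threading macro does at expansion time.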


Which is more legible and understandable?

    f(a, g(h(x), b))
    h(x) %>% g(b) %>% f(a, .)

"Thus, programs must be written for people to read, and only incidentally for machines to execute."


I've never hit a bug merely because of using pipes. Is this a theoretical concern or can you give an example?


Maybe not a direct answer to your question but using dynamic variable names is kind of tedious with dplyr. You have to go around it by using paste() statements before passing the argument to a dplyr function, so it's not always elegant either.


The most recent version of dplyr adopted quasiquotation to make that easier:

https://cran.r-project.org/web/packages/rlang/vignettes/tidy...


Thanks! I was not aware of that!


That problem (solved in recent versions) has nothing to do with pipes though.


what is a better alternative for code understandability ?


I like it but it makes it hard to take R as a serious programming language (thankfully it's not in standard R) because where else would you actually use a coding pattern like:

    a = "foo"
    a = "bar"

Reusing variables is silly.


The alternative is to use a different variable for each data transformation, which has costs for both system memory and code readability. And modern data analysis has a lot of transformations.


Death to the Pyramid of Doom!



