Renjin: a JVM-based interpreter for the R language (renjin.org)
118 points by phillc73 on Nov 3, 2017 | 34 comments



FastR [1], an R implementation for the JVM built on Truffle, aims to be both fast and compatible, including support for C and Fortran calls from R via Truffle's polyglot framework.

The thing that makes this even more awesome is inter-language optimization: your C code can be inlined into your R code at runtime, which can result in zero-overhead inter-language calls.

[1] https://github.com/graalvm/fastr


One of the interesting things I noted in the readme is that they have an implementation of Grid graphics. Grid is pretty awesome for lower-level graphics, and is part of the reason ggplot is able to be as good as it is.

Looking forward to poking around at their implementation. If it’s a JVM-friendly implementation, that could open up some nicer visualization options from other languages like Clojure, Scala, etc.


Indeed, they presented this at the last RIOT workshop in July [1] and it's a great way to support graphics without dealing with the grDevices package from GNU R.

[1] https://youtu.be/otXTGBTb-3w?t=985


Oh cheers! Looking forward to watching that.


It's been a while since I've looked at Renjin. Last I looked, it was a worthy replacement for tasks that you'd do in base R, but the library story was incomplete.

Nowadays, my R usage has gone full tidyverse. Anyone know if dplyr and the gang work here? I know dplyr has a lot of C++ code, and I don't know how good the "Renjin-specific" Rcpp interop story is.


The first thing I did was look at the packages to see if tidyverse, purrr, lubridate, stringr and feather were listed. They are all there.

http://packages.renjin.org/packages/search?q=tidyverse


Yes, but some of them aren't available yet, specifically dplyr:

http://packages.renjin.org/package/org.renjin.cran/tidyverse


There is a Java/Scala project providing R-like statistical computation:

http://haifengl.github.io/smile/linear-algebra.html


I build and maintain the R stack in my organization. A good portion of the R core is written in Fortran for maximum performance and efficiency; the rest is in C and C++. Why would I want an interpreted, just-in-time, garbage collected version of R in a JVM when I can run at full speed on the real thing?


You certainly don't have to use it. One of the goals we had when we created Renjin was to keep it pure Java. Even the packages with native code are transpiled to pure Java bytecode so you can use an R interpreter in a JVM without interfacing to native code (e.g. using JNI) and this allows you to use the interpreter on an instance with one of the many Platform-as-a-Service providers like we do with Google App Engine.
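For anyone wondering what "an R interpreter in a JVM" might look like from the Java side, here is a minimal sketch using the standard javax.script (JSR-223) API. It assumes the Renjin script engine jar is on the classpath and registers itself under the name "Renjin"; the class and method names (RenjinEmbedSketch, findRenjin, evalR) are made up for illustration. If no such engine is found, the sketch reports that instead of failing.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class RenjinEmbedSketch {

    /**
     * Looks up a JSR-223 engine by the name "Renjin".
     * Assumption: the Renjin jar is on the classpath; returns null otherwise.
     */
    public static ScriptEngine findRenjin() {
        return new ScriptEngineManager().getEngineByName("Renjin");
    }

    /**
     * Evaluates a snippet of R source inside the JVM.
     * Returns null when no Renjin engine was found.
     */
    public static Object evalR(String rSource) {
        ScriptEngine engine = findRenjin();
        if (engine == null) {
            return null;
        }
        try {
            return engine.eval(rSource);
        } catch (ScriptException e) {
            // Wrap the checked exception so callers can stay simple.
            throw new IllegalStateException("R evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Object result = evalR("mean(c(1, 2, 3, 4))");
        if (result == null) {
            System.out.println("No Renjin engine found on the classpath");
        } else {
            System.out.println("R returned: " + result);
        }
    }
}
```

Nothing here touches JNI or a native R installation, which is the point being made above about running on a plain JVM.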

Oh, GNU R is also garbage collected, of course, and its collector is pretty primitive compared to the JVM's. This is an area where TIBCO have also improved their own R interpreter called TERR [1].

[1] http://www.edii.uclm.es/~useR-2013/slides/65.pdf


While it is true that S was originally a macro layer around FORTRAN code, R has always been primarily written in C. Very little of R core is in FORTRAN, and essentially none of it is C++.

The language statistics show a good deal of FORTRAN (24.5%) [0]; however, that is largely skewed by the included LAPACK code [1], which accounts for 221,921 of the 259,773 lines of FORTRAN in R (about 85%).

[0]: https://github.com/wch/r-source
[1]: https://github.com/wch/r-source/tree/trunk/src/modules/lapac...


> however that is largely skewed by the included LAPACK code

which is where most of your number crunching will be happening..


> interpreted, just-in-time,

For clarification, the JVM JIT-compiles the most frequently used blocks of bytecode to machine code, so the bulk of the heavy lifting is not interpreted.

The benefit of doing it this way is that compilation can be optimised based on the runtime characteristics of the code, not compiler judgement. Cases of JVM code running faster than equivalent C/C++ code are not uncommon.
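A rough way to see this for yourself is to time the same hot method over several rounds. The sketch below is a naive illustration, not a proper benchmark (a harness like JMH would be the right tool for that); the class and method names are hypothetical. On a typical HotSpot JVM the first rounds tend to be slower than the later ones, once the loop has been compiled, but the exact numbers depend on the JVM and the machine.

```java
public class JitWarmupSketch {

    // Written to on every round so the JIT cannot discard the computation.
    static volatile long sink;

    /** Hot loop: sum of i*i for i in [0, n). */
    public static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Calls sumOfSquares(n) `rounds` times and returns the elapsed nanoseconds of each call. */
    public static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = sumOfSquares(n);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 1_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] + " ns");
        }
    }
}
```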

Aside from that, access to Java libraries (e.g. for integration) may be important to some shops.


I suppose their reasons are here: http://www.renjin.org/about.html

I've only used R for a short time, so I can't comment on it myself.


“R has been traditionally limited by the need to fit data sets into memory, and working with even modest sets of data can quickly exhaust memory due to historical limitations in GNU R interpreter’s implementation.”

I see that the propaganda machine never rests, no matter which language. Either that, or they haven’t heard of Vertica’s R, which is excellent and counters these claims.

The rest with the “fragmentation of R” is a poor marketing tack: people who run R are perfectly capable of thinking with their own brains, thank you very much; Renjin has no place patronizing. We’re not sheep.


That's a fairly toxic and silly attitude towards a free software project. Don't like it? Don't use it.

I personally think it's great that there are people who are working on making R faster/better/popular, even though I may disagree with the methods they've chosen to achieve those goals.


I wish it were toxic but it isn’t - Java is still around, so software like Renjin gets written. If my attitude were toxic to Java, Renjin wouldn’t exist.


IMHO the crucial question is whether users can use libraries that load native libraries and/or make use of Rcpp. The docs make me think it's not possible.


Regarding Rcpp, their package library says "A Renjin-specific version of this package is available."[1]

Looks like Renjin can also be used from GNU R as an R package[2]. I'd assume any installed R package should also work in conjunction with it.

Admittedly, I haven't tried Renjin; I only just found it and thought it was quite interesting.

[1] http://packages.renjin.org/package/org.renjin.cran/Rcpp

[2] http://docs.renjin.org/en/latest/package/index.html#using-re...


Even just using Renjin to be able to access Java code within R seems super useful.


Wasn’t that already possible via the rJava library?


True. Although Renjin seems like it provides a bit of a nicer interface in some cases. (Disclaimer: not an R developer)


From the documentation [1]:

"As a service, BeDataDriven provides a repository with all CRAN (the Comprehensive R Archive Network) and BioConductor packages at http://packages.renjin.org. The packages in this repository are built and packaged for use with Renjin. Not all packages can be built for Renjin so please consult the repository to see if your favorite package is available for Renjin."

[1] http://docs.renjin.org/en/latest/interactive/index.html#prer...


Yes. Renjin isn’t even a complete implementation of R and I don’t believe that a Java JIT will ever beat Fortran in performance. I’d sure like to see some evidence of JIT R beating the Fortran implementation. And then there is the memory inefficiency of Java. So again I ask: why?


Because there is value in having a complete JVM stack in some places. Sometimes performance takes a back seat to consistency. It may not be right, but that's the way it is in some orgs.


Hadley's book, "Advanced R" has a specific performance related chapter[1]. As well as techniques to boost performance within R, there's a section on "Alternative R implementations," which says in part, "There are some exciting new implementations of R. While they all try to stick as closely as possible to the existing language definition, they improve speed by using ideas from modern interpreter design." Renjin is listed in this section.

While I don't have direct experience of Renjin, or any other alternative R interpreters, I'm inclined to believe Hadley has and if he believes they improve speed, I'll take him at his word.

[1] http://adv-r.had.co.nz/Performance.html


When I was implementing Renjin's PRNG based on GNU R's, I read a fair amount of GNU R code. I think it goes without saying that C and Fortran perform well. However, I do believe that R's flexibility as a language comes at a performance cost. That flexibility is not always implemented efficiently, and performance suffers as a result. That is, R is not particularly performant in certain circumstances despite being largely implemented in C and Fortran. I only have anecdote to offer in this regard, but it is worth considering that GNU R's good performance is not a foregone conclusion.


R's wealth of libraries and its syntax have always appealed to me, so maybe now I will look at how compatible and how much faster Renjin might be.


> Renjin allows the user to plugin best-of-class implementations of BLAS, LAPACK, and FFT,

Makes sense. Why, if you want numerics that were traditionally written in Fortran, of course you have to go to JVM. :)


How does Renjin run the BLAS and LAPACK Fortran code, and the C code in R?


We transpile all C, C++, and Fortran code directly to Java-bytecode using our open source tool called gcc-bridge (https://github.com/bedatadriven/renjin/tree/master/tools/gcc...). This is included as part of Renjin, but you can use it independently as part of your project as well.


This is nice.

Does anyone know of a similar tool for C#?


Tried a few basic things; some of them did not work:

1. x = rnorm(1000)
2. plot(density(x)) --> does not work
3. stem(x) --> does not work
4. summary(x) --> works

For R data handling, I always use data.table for its efficiency and power.

1. library(data.table)
2. x = data.table(x=rnorm(1000)) --> does not work

> x = data.table(x=rnorm(1000))
ERROR: Exception calling Calloccolwrapper : Unimplemented GNU R API function 'DUPLICATE_ATTRIB'


Well, supposedly you'd use native data.frames if you're using a faster R implementation.

However, I would agree with you that at this point in time it's fair to assume that R is no longer a language in and of itself. It's more akin to a collection of APIs around years of optimized C++ code, somewhat separated into a few universes (bioinformatics, time series analysis with xts, data.table aficionados and tidyverse acolytes).

And thus making a faster R alternative should focus on 100% package compatibility along with speeding up native R code. Renjin, pqR [1], et al. all seem to be compromises that work for some workflows but not all of them.

The project that I'm excited for is fastR [2] - bringing R, C, C++ and FORTRAN code together into one Truffle/Graal environment would be absolutely huge for this ecosystem.

However, I've tested fastR on my employer's [3] workflows and it does not work yet.

[1] http://www.pqr-project.org/
[2] https://github.com/graalvm/fastr
[3] http://syberia.io/



