Announcing pqR: A faster version of R (radfordneal.wordpress.com)
113 points by tmoertel on June 23, 2013 | 37 comments


For those who don't know, Radford Neal is a highly respected statistician and machine learning researcher.

His most well-known contribution is showing the community how to train neural networks in a Bayesian way.

He is at Toronto, where he received his Ph.D. under Geoff Hinton (the father of deep learning).


Also, he invented tons of kickass sampling techniques and wrote what I consider to be the best review of MCMC of all time: http://www.cs.toronto.edu/~radford/ftp/review.pdf


I don't understand why this wouldn't just be merged into the upcoming versions of R. Creating a new project is strange; does anyone know why they might have done this? I'm sure there's a good reason that I'm just unaware of.


R core developers are notoriously resistant to change. A number of glaring inefficiencies have gone unfixed for years, despite people submitting patches etc.


R core is generally motivated by ensuring that R continues to work as is, not by improving performance. You can argue whether or not this is a good idea, but in the absence of a comprehensive unit test suite, it's pretty hard to improve performance without breaking behaviour.


It's difficult to see how this rationale can possibly justify ignoring a 10x speed up in vector-matrix multiplies (and similar speedups for some other matrix multiplies) that can be achieved with a modification affecting a dozen or so lines of easily-checked code.


Radford Neal: oh, let me take a break from ground-breaking stats work to double the speed of R.

I'm very thankful.


Looks great!

One thing R really needs is some sort of dead simple pass-by-reference mechanism for functions. Creating a copy of an object every time you call a function on it is a real performance killer.


If a function doesn't modify a passed data frame, R doesn't actually copy it. It's formally pass-by-value, but the implementation uses a copy-on-write approach, so no copy is made when the function only reads the parameter's values. That at least covers the common case of passing a bunch of data to a function that builds a statistical model.

Of course that doesn't help in the case where you do want the called function to modify the data, but in-place rather than by making a copy.
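
To make the copy-on-write point concrete, here is a toy sketch in Python. It is not how R implements this internally; the `CowVector` class, its `share` method, and the shared owner counter are all made up for illustration. The idea is only that handing a value to a function shares the buffer, and a copy happens at the first write.

```python
class CowVector:
    """Toy copy-on-write vector (hypothetical, for illustration only)."""

    def __init__(self, values):
        self._buf = list(values)
        self._owners = [1]  # shared counter: how many handles use _buf

    def share(self):
        """Make a 'pass-by-value' handle without copying any data."""
        other = CowVector.__new__(CowVector)
        other._buf = self._buf
        other._owners = self._owners
        self._owners[0] += 1
        return other

    def __getitem__(self, i):
        return self._buf[i]

    def __setitem__(self, i, value):
        if self._owners[0] > 1:
            # Another handle still sees this buffer: copy before writing.
            self._owners[0] -= 1
            self._buf = list(self._buf)
            self._owners = [1]
        self._buf[i] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf


def total(v):
    # Read-only use of the argument: no copy is triggered.
    return v[0] + v[1] + v[2]


def zero_first(v):
    # Writing to the argument triggers the copy.
    v[0] = 0
    return v


x = CowVector([1, 2, 3])

arg = x.share()
total(arg)
read_only_shared = arg.shares_buffer_with(x)

arg2 = x.share()
zero_first(arg2)
after_write_shared = arg2.shares_buffer_with(x)
```

After the read-only call the argument still shares the caller's buffer; after the write it has its own copy and `x` is unchanged, which is the case where you pay for the copy even if you wanted in-place modification.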


If you look at the "future directions" section of pqR's version of the "R Internals" manual, you'll see a brief mention of a plan to implement "call by name" parameter passing in the style of Algol 60, which should address this issue. Before that happens, however, pqR will improve the tracking of references to reduce the number of unnecessary copies made when parameters are passed by value; in past versions of R there may be more of these copies than you realize.


It was my understanding that R basically already does implement call-by-name: arguments to a function are passed by name and looked up in the calling environment until you first modify them. Is my understanding incorrect, or do Algol 60's call-by-name semantics mean something different?


Sort of. That's why it shouldn't be too hard a modification to implement. For a call-by-name argument, you just have to evaluate the "promise" every time, rather than just the first time. (Assignment to a call-by-name argument will be a bit trickier, but not impossible.)
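
The difference between evaluating a promise once and evaluating it every time can be sketched in a few lines of Python. This is only an analogy, not R's or pqR's implementation; the `Promise` and `ByName` classes and the `env` dict standing in for the calling environment are hypothetical names chosen for the example.

```python
class Promise:
    """Call-by-need: evaluate the argument expression once, then cache it."""

    def __init__(self, expr):
        self._expr = expr
        self._forced = False
        self._value = None

    def value(self):
        if not self._forced:
            self._value = self._expr()
            self._forced = True
        return self._value


class ByName:
    """Call-by-name: re-evaluate the argument expression on every use."""

    def __init__(self, expr):
        self._expr = expr

    def value(self):
        return self._expr()


def use_twice(arg, bump):
    first = arg.value()
    bump()  # the caller's variable changes between the two uses
    second = arg.value()
    return first, second


def demo(wrapper):
    env = {"i": 1}  # stands in for the calling environment

    def bump():
        env["i"] += 1

    # The argument expression is "i * 10", evaluated in the caller's env.
    return use_twice(wrapper(lambda: env["i"] * 10), bump)


by_need = demo(Promise)
by_name = demo(ByName)
```

With the cached promise both uses see the value from the first evaluation; with call-by-name the second use sees the updated variable. Assignment through a by-name argument is not covered by this sketch.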


Is the pqR version of that manual available online?


Yes, it's part of the release, and is also directly linked at radfordneal.github.io/pqR


How does the "helper threads" mechanism interact with existing code explicitly using multicore operations like mclapply? Inadvertently spawning extra "helper threads" from each of the explicit processes per core would not be pretty.


At present, pqR waits for all helper threads to be idle before doing a fork in the "parallel" package, and disables use of helper threads in the child processes (and temporarily in the parent, since it will wait for the child processes before doing anything more).


Sounds great, thank you. I'm off to build this thing.


Would this work with StatET?


Are there any plans to merge that back into the main R implementation? And if not, what are the reasons for keeping a separate fork? Backwards compatibility?


Off topic:

Maybe it's just me, but the log scale for the relative program times is pretty confusing. While it probably makes the improvement more obvious, it doesn't help me understand the actual magnitude of improvement visually, without having to look at the scale and figure out the actual numbers.


I think what would have worked better is normalizing the interpreted results to 1.0 and then setting the pqR results against that. That'd make the graph far less noisy and much easier to interpret for making the case that pqR is faster. Right now, with every different version of both being represented, there's much more information there than is needed for that.
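
A rough sketch of that normalization in Python, assuming you have per-test timings for each implementation. The numbers and test names below are invented for the example and are not taken from the pqR post; `relative_to_baseline` is a hypothetical helper.

```python
# Hypothetical timings in seconds; NOT figures from the pqR announcement.
timings = {
    "test-a": {"interpreted": 8.0, "pqR": 4.0},
    "test-b": {"interpreted": 3.0, "pqR": 0.75},
}


def relative_to_baseline(timings, baseline="interpreted"):
    """Divide every timing in a row by that row's baseline timing."""
    return {
        name: {impl: t / row[baseline] for impl, t in row.items()}
        for name, row in timings.items()
    }


relative = relative_to_baseline(timings)
for name, row in relative.items():
    print(f"{name}: pqR takes {row['pqR']:.2f}x the interpreted time")
```

The baseline comes out as 1.0 in every row, so each pqR value reads directly as a fraction of the baseline time on a linear axis.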


Is this version compatible with RStudio?


At present, pqR is compatible with RStudio only if you configure with the --disable-helper-threads option (i.e., no automatic parallelization) along with the --enable-R-shlib option that's needed for RStudio. This is a minor glitch in how pqR is linked that will be fixed soon.


Revolution Analytics[1] is also claiming a lot of speed improvements. Is there a sense yet of how pqR compares? Some of their speed up comes from linking Intel's Math Kernel Library. Does this duplicate the "helper thread" approach pqR uses or would they complement each other?

[1] I'm not familiar with them other than their website. My impression is that they are real, but the website feels just "slick" enough to make me uncertain. Are they considered reputable?


A friend of mine was testing Revolution's tools at his job recently and found a few frustration points: namely, it's based on a relatively old version of R, and while they claim to support larger-than-RAM numerics, it's apparently quite thrashy.

[There are some interesting subtleties to supporting larger-than-RAM computation well enough for it to beat distributed, which I'm trying to do for my own work, so I found it quite exciting to hear that current analytical tooling vendors don't do it terribly well :) ]


As far as I know, the majority of the speed improvements come from a proprietary on-disk data store, accompanied by high-performance statistical algorithms built on top of it. You need to use their packages to take most advantage of the speed.


Phenomenal!

> Since pqR has not yet been tested on Windows and Mac systems, trying to install it on such a system is not currently recommended.

Can't wait to switch to it on OS X. Installing on my server to play with it...


This is cool, but wouldn't it be a better idea to reimplement the interpreter from the ground up on a solid platform such as PyPy?


Any idea why the R Core team haven't accepted his patches?


No way I use this... In French, PQ = toilet paper lol. :-/


Some English speakers claim they are uninterested in Coq for similar reasons. I guess if you want to ignore a good tool because of a bad name, go ahead. But it seems to be throwing the baby out with the bathwater.


What's in a name?

While I do not suggest ignoring tools such as pqR, Coq or GIMP because of their childish and immature names, I do think it is bad manners to name your tools in an offensive way.

It also makes people talk about the bad name you chose for your tool instead of the properties of your tool, and surely that's not what you wanted, as a creator.

If you want your tool to be considered a professional tool used by professionals, please name it like a professional would. If you want your tool to be used by children, by all means, name it like a child would.


I think it's unreasonable to expect the namers of a language to know every conceivable double entendre in every language. Coq was named by French researchers, and means "rooster", and pqR is obviously from the sequence of letters. Neither of those seems particularly immature to me, in context.


Yes, it's the sequence of letters but that's not all. From the first line in TFA, "pqR — a “pretty quick” version of R."


Ahh...that explains the seven users of Agda.


You could call it SPQR, for something like Super pretty quick R if you want to. I'd say not calling it SPQR is an opportunity missed anyway.


Good work Radford Neal!



