Announcing pqR: A faster version of R

bravura · on June 23, 2013

For those who don't know, Radford Neal is a highly respected statistician and machine learning researcher.

His most well-known contribution is showing the community how to train neural networks in a Bayesian way.

He is at Toronto, where he received his Ph.D. under Geoff Hinton (the father of deep learning).

jtmcmc · on June 23, 2013

Also, he invented tons of kickass sampling techniques and wrote what I consider to be the best review of MCMC of all time: http://www.cs.toronto.edu/~radford/ftp/review.pdf

tardigrade · on June 23, 2013

I don't understand why this wouldn't just be merged into the upcoming versions of R. Creating a new project is strange; does anyone know why they might have done this? I'm sure there's probably a good reason that I'm just unaware of.

jergosh · on June 23, 2013

R core developers are notoriously resistant to change. A number of glaring inefficiencies have gone unfixed for years, despite people submitting patches etc.

hadley · on June 23, 2013

R core is generally motivated by ensuring that R continues to work as is, not by improving performance. You can argue whether or not this is a good idea, but in the absence of a comprehensive unit test suite, it's pretty hard to improve performance without breaking behaviour.

radfordneal · on June 23, 2013

It's difficult to see how this rationale can possibly justify ignoring a 10x speed up in vector-matrix multiplies (and similar speedups for some other matrix multiplies) that can be achieved with a modification affecting a dozen or so lines of easily-checked code.

epistasis · on June 23, 2013

Radford Neal: oh, let me take a break from ground-breaking stats work to double the speed of R.

I'm very thankful.

scottfr · on June 23, 2013

Looks great!

One thing R really needs is some sort of dead simple pass-by-reference mechanism for functions. Creating object copies every time you call a function on an object is a real performance killer.

_delirium · on June 23, 2013

If a function doesn't modify a passed data frame, R doesn't actually copy it. It's formally pass-by-value, but the implementation uses a copy-on-write approach, so no copy is made when the function only reads the parameter's values. That at least covers the common case of passing a bunch of data to a function that builds a statistical model.

Of course that doesn't help in the case where you do want the called function to modify the data, but in-place rather than by making a copy.
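For anyone who hasn't seen copy-on-write before, here is a toy sketch of the idea in Python. It is not R's implementation: the `CowVector` class, its reference count, and the `share` method are all made up for illustration. The point is only that handing the data to a function copies nothing, and a real copy happens at the first write.

```python
class CowVector:
    """Toy copy-on-write vector (hypothetical, not R's internals)."""

    copies_made = 0  # counts real copies of the underlying storage

    def __init__(self, values):
        self._store = {"data": list(values), "refs": 1}

    def share(self):
        """Hand the vector to a callee: bump the count, copy nothing."""
        other = CowVector.__new__(CowVector)
        other._store = self._store
        self._store["refs"] += 1
        return other

    def __len__(self):
        return len(self._store["data"])

    def __getitem__(self, i):
        return self._store["data"][i]

    def __setitem__(self, i, value):
        if self._store["refs"] > 1:
            # Someone else may still see this storage, so copy before writing.
            self._store["refs"] -= 1
            self._store = {"data": list(self._store["data"]), "refs": 1}
            CowVector.copies_made += 1
        self._store["data"][i] = value


def total(v):
    # Read-only use: never triggers a copy.
    return sum(v[i] for i in range(len(v)))


def zero_first(v):
    # Writing use: triggers exactly one copy of the shared storage.
    v[0] = 0
    return v


x = CowVector([1, 2, 3])
s = total(x.share())        # no copy made
y = zero_first(x.share())   # one copy made, x is left untouched
```

Note that this toy never lowers the count when a sharer goes away, so it is conservative and can copy more often than strictly necessary.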

radfordneal · on June 23, 2013

If you look at the "future directions" section of pqR's version of the "R Internals" manual, you'll see a brief mention of a plan to implement "call by name" parameter passing in the style of Algol 60, which should address this issue. Before that happens, however, pqR will improve the tracking of references to reduce the number of unnecessary copies made when parameters are passed by value, which may be more than you realize in past versions of R.

hadley · on June 23, 2013

It was my understanding that R basically already does implement call-by-name: arguments to a function are passed by name and looked up in the calling environment until you first modify them. Is my understanding incorrect, or do Algol 60's call-by-name semantics mean something different?

radfordneal · on June 23, 2013

Sort of. That's why it shouldn't be too hard a modification to implement. For a call-by-name argument, you just have to evaluate the "promise" every time, rather than just the first time. (Assignment to a call-by-name argument will be a bit trickier, but not impossible.)
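A rough illustration of the difference, written in Python rather than R so it stays a sketch and not a claim about pqR's code. The `Promise` and `ByName` classes and the `use_twice` helper are hypothetical names; the first caches the value after one evaluation, the second re-evaluates the expression on every use.

```python
class Promise:
    """Evaluate the argument expression once, then reuse the value."""

    def __init__(self, expr):
        self._expr = expr
        self._forced = False
        self._value = None

    def value(self):
        if not self._forced:
            self._value = self._expr()
            self._forced = True
        return self._value


class ByName:
    """Call-by-name: re-evaluate the argument expression on every use."""

    def __init__(self, expr):
        self._expr = expr

    def value(self):
        return self._expr()


def use_twice(arg, bump):
    first = arg.value()
    bump()                 # the caller's variable changes between the uses
    second = arg.value()
    return first, second


env = {"i": 1}


def bump():
    env["i"] += 1


by_need = use_twice(Promise(lambda: env["i"] * 10), bump)  # value is cached
by_name = use_twice(ByName(lambda: env["i"] * 10), bump)   # value is recomputed
```

With the cached version both uses see the same value even though `i` changed in between; with the by-name version the second use sees the updated `i`.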

nkurz · on June 23, 2013

Is the pqR version of that manual available online?

radfordneal · on June 23, 2013

Yes, it's part of the release, and is also directly linked at radfordneal.github.io/pqR

makeset · on June 23, 2013

How does the "helper threads" mechanism interact with existing code explicitly using multicore operations like mclapply? Inadvertently spawning extra "helper threads" from each of the explicit processes per core would not be pretty.

radfordneal · on June 23, 2013

At present, pqR waits for all helper threads to be idle before doing a fork in the "parallel" package, and disables use of helper threads in the child processes (and temporarily in the parent, since it will wait for the child processes before doing anything more).

makeset · on June 24, 2013

Sounds great, thank you. I'm off to build this thing.

urlwolf · on June 23, 2013

Would this work with StatET?

perlgeek · on June 23, 2013

Are there any plans to merge that back into the main R implementation? And if not, what are the reasons for keeping a separate fork? Backwards compatibility?

ehsanu1 · on June 23, 2013

Off topic:

Maybe it's just me, but the log scale for the relative program times is pretty confusing. While it probably makes the improvement more obvious, it doesn't help me understand the actual magnitude of improvement visually, without having to look at the scale and figure out the actual numbers.

simcop2387 · on June 23, 2013

I think what would have worked better is normalizing the interpreted to 1.0 and then having the pqR results set against that. That'd make the graph far less noisy and much easier to interpret for making the case of pqR being faster. Right now there's much more information there than needs to be for that with every different version of both being represented.
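Something like the following is what I have in mind, as a small Python sketch. The timings and test names are invented for illustration and are not taken from the pqR benchmarks; `relative_times` is a hypothetical helper.

```python
def relative_times(baseline, other):
    """Return other[test] / baseline[test] for tests present in both."""
    return {
        name: other[name] / baseline[name]
        for name in baseline
        if name in other
    }


# Made-up timings in seconds, purely for illustration.
interpreted = {"test_a": 4.0, "test_b": 10.0, "test_c": 2.5}
pqr = {"test_a": 2.0, "test_b": 2.5, "test_c": 2.0}

ratios = relative_times(interpreted, pqr)
for name, ratio in sorted(ratios.items()):
    # A ratio below 1.0 means the second system was faster on that test.
    print(f"{name}: {ratio:.2f} of baseline time")
```

Plotting one bar per test at these ratios, with a reference line at 1.0, shows the size of each improvement directly on a linear scale.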

minimaxir · on June 23, 2013

Is this version compatible with RStudio?

radfordneal · on June 23, 2013

At present, pqR is compatible with RStudio only if you configure with the --disable-helper-threads option (i.e., no automatic parallelization) along with the --enable-R-shlib option that's needed for RStudio. This is a minor glitch in how pqR is linked that will be fixed soon.

nkurz · on June 23, 2013

Revolution Analytics[1] is also claiming a lot of speed improvements. Is there a sense yet of how pqR compares? Some of their speed up comes from linking Intel's Math Kernel Library. Does this duplicate the "helper thread" approach pqR uses or would they complement each other?

[1] I'm not familiar with them other than their website. My impression is that they are real, but the website feels just "slick" enough to make me uncertain. Are they considered reputable?

carterschonwald · on June 23, 2013

A friend of mine was testing Revolution's tools at his job recently and found a few frustration points: namely, it's based upon a relatively old version of R, and while they allege to support larger-than-RAM numerics, it's apparently quite thrashy.

[There are some interesting subtleties to supporting larger-than-RAM computation well enough for it to beat distributed, which I'm trying to do for my own work, so I found it quite exciting to hear that current analytical tooling vendors don't do it terribly well :) ]

hadley · on June 23, 2013

As far as I know, the majority of the speed improvements come from a proprietary on-disk data store, accompanied by high-performance statistical algorithms built on top of it. You need to use their packages to take most advantage of the speed.

_anshulk · on June 23, 2013

Phenomenal!

> Since pqR has not yet been tested on Windows and Mac systems, trying to install it on such a system is not currently recommended.

Can't wait to switch to it on OS X. Installing on my server to play with it...

joelthelion · on June 23, 2013

This is cool, but wouldn't it be a better idea to reimplement the interpreter from the ground up on a solid platform such as PyPy?

mikevm · on June 23, 2013

Any idea why the R Core team haven't accepted his patches?

sunseb · on June 23, 2013

No way I use this... In French, PQ = toilet paper lol. :-/

pi18n · on June 23, 2013

Some English speakers claim they are uninterested in Coq for similar reasons. I guess if you want to ignore a good tool because of a bad name, go ahead. But it seems to be throwing the baby out with the bathwater.

draugadrotten · on June 23, 2013

What's in a name?

While I do not suggest ignoring tools such as pqR, Coq or GIMP because of their childish and immature names, I do think it is bad manners to name your tools in an offensive way.

It also makes people talk about the bad name you chose for your tool instead of the properties of your tool, and surely that's not what you wanted, as a creator.

If you want your tool to be considered a professional tool used by professionals, please name it like a professional would. If you want your tool to be used by children, by all means, name it like a child would.

rflrob · on June 23, 2013

I think it's unreasonable to expect the namers of a language to know every conceivable double entendre in every language. Coq was named by French researchers, and means "rooster", and pqR is obviously from the sequence of letters. Neither of those seems particularly immature to me, in context.

extra88 · on June 23, 2013

Yes, it's the sequence of letters but that's not all. From the first line in TFA, "pqR — a “pretty quick” version of R."

reeses · on June 23, 2013

Ahh...that explains the seven users of Agda.

Bootvis · on June 23, 2013

You could call it SPQR, for something like Super pretty quick R if you want to. I'd say not calling it SPQR is an opportunity missed anyway.

phalina · on June 23, 2013

Good work Radford Neal!