Ask HN Hackers: What's R's future?
47 points by wildanimal on Nov 11, 2010 | 41 comments
The R community seems to be growing rapidly, with many favorable reviews of the language/system. For instance --

Forbes Magazine, Names You Need to Know in 2011: R Data Analysis Software: http://blogs.forbes.com/smcnally/2010/11/10/names-you-need-to-know-in-2011-r-data-analysis-software (and links therein)

And some of its developers are suggesting that they scrap it and start over (don't know if the whole "Core Team"'s on board tho'): http://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-2008-Slides.pdf

Are there parallels like this in the development of other languages/environments/ecosystems (e.g., Python3, Perl6 "revisions")? How do these efforts usually end up (I guess we're still waiting to see about Python3 and Perl6...) -- and how would it affect your business's decision to develop a library in this environment?




I'm a non-IT employee (but with a comp-sci background) working in the insurance sector, and I'm currently managing R adoption for a group of about 30 business analysts with minimal programming background.

Programming in the business world is screwed up beyond all imagination. The more money a given application is responsible for, the more likely it is that it's a house-of-cards (pun intended for MVS nerds). They're always mishmashes of COBOL, SAS, DFSORT, and random proprietary languages that have never been the subject of a third-party book, and were sold to a company that was sold to a company that was sold to CA Technologies back in the 1970s. Whatever these languages can't do is implemented through Escher-painting constructions of Excel references and VBA macros.

So, when people say that R has some issues, I say, "boo fucking hoo".

Most businesses suffer from an unnatural separation between IT and the business end. If business people want something programmed, they call IT. They don't learn Python and do it themselves, because Python is a "programming language". R is the first real language that business people are being encouraged to learn, because it's an "analysis environment". You have no idea how often I have to edit the word "programming" out of my presentations for this reason.

R will win in business because it's decent, and it's been around long enough to not be scary to managers. I'd be cautious about drawing comparisons to other languages that have undergone big design changes, because, as far as I can tell, the existence of a decent language in the business world is entirely without precedent.

(Edit: In case the above came off as sounding like a "non-hackers are idiots" rant, it wasn't meant as such. Many of the people that produce these hideous monstrosities of SAS and VBA code have PhDs in statistics and atmospheric sciences. You can be pretty smart without knowing how to write software well.)


It is surprising how many business people aren't afraid to build monstrous macro-driven spreadsheets, but cringe at the thought of programming.

Makes me wonder if it is a UI problem. In the macro spreadsheet, you have data with code tied to it. In the programming world, you generally have code accessing data.

The abstractions might just be wrong for business people and a "simple" change could reduce a lot of IT pain.


The reason that spreadsheets are popular is the same reason that many students trying to solve a maths problem immediately plug in the knowns, without rearranging terms first. It's fear of the abstract and comfort of the concrete.


Ironically, it's a marketing problem.


Is it really true that business people use R directly? Being in Forbes is consistent with it...

I recall that SQL was intended to be used by business people... and it probably has been, sometimes; but I don't think it happens much. The days of early adoption might have differed, through appealing to the more adventurous business people (as R might be now).

One thing I know for a fact: business people use spreadsheets. I think making something that easy to get things done in is an incredible achievement. As an example, I think PHP has approached but not attained it.


How about a spreadsheet that uses R as its scripting language? (There's one of these for Python.)



I don't know if business people do a lot of scripting (if any) in spreadsheets, because it side-steps the ease-of-use GUI of spreadsheet in favour of a programmer's UI...


Look inside your average quantitative analysis or risk management group and you'll see that they definitely do.


Ah, you make really good points about non-programming people not thinking about R as a "programming language" and thus approaching it without fear.

I'm familiar with exactly the same phenomenon in the hardware verification world, where new tools and languages keep being invented just so that validation engineers can relax that they're not expected to "program" in a real language.


I predict a 10 year campaign of conquest followed by a 30 year death march. R is a complete mess that kicks ass in its niche. There are too many data types and the syntax seems kind of random, but two lines of R can get you publication quality graphics.

R is really becoming huge in academia. As far as I can tell, health sciences is the last SAS holdout. I expect it to take over business as well. Biz types will love it because it's so powerful as a scripting environment, but the programmers building and maintaining stuff with it will come to loathe it. R will become the PHP of analysis; ubiquitous but hated, and no one will have the chutzpah to fix it.

Random aside, anyone notice that the Kiwis are all over R? The original creators and the guy who wrote ggplot2 among many others.


I really don't get where this too many data types canard comes from - all you need to know is vector, matrix, array (1d, 2d, and nd homogeneous data types) and list and data frame (1d and 2d heterogeneous data types). On the other hand, the OO systems are somewhat bewildering.

I disagree that no one will have the chutzpah to fix R - I know of at least three groups, including one driven by an extremely serious computer scientist, that are working either on rewrites of the internals or on completely new implementations of the language. Even though R has been around longer than languages like Python and Ruby, it hasn't excited the interest of as many CS people, so it's at an early stage of its evolution - it's only now at a point where serious alternative implementations of the core engine are starting to come out.

Personally, I've been working on making much of the core library cleaner and more consistent. I'm completely biased, but I think if you use my packages (ggplot2 for graphics, plyr for apply functions, stringr for strings, lubridate for dates, ...) you'll have many fewer problems. And if you do find inconsistencies, I'm committed to fixing them.


Yeah, in my experience, most biostatisticians (especially those involved in public health and clinical research) are SAS folks. Some of that is inertia- a lot of these people learned SAS at the same time they were learning stats. However, I think that most of SAS's continuing prevalence is due to the fact that, for all of its (many, many, many) problems, SAS is a freakin' log chipper when it comes to statistics- it doesn't care how much data you throw at it, or what kinds of crazy and/or exotic statistics you ask it for- if you can decipher its syntax, you can get it to do it.

Even for stuff that a lot of other programs can do just fine, SAS often has an edge. For example, everybody and their brother can do a logistic regression model... but SAS can give you confidence intervals for all kinds of crazy parts of the model that SPSS won't even bother calculating and that R will only give you point estimates for.

The other great thing about SAS is that a lot of the good statistics books from the last twenty or thirty years include SAS sample code- for example, I'm currently having to do some off-the-beaten-path ANOVA stuff, and the reference I'm using (Edwards' "Analysis of Variance for the Behavioral Sciences") uses SAS as its language of choice.

That said, I personally find the SAS "language" to be alternately bewildering and nostalgia-inducing (the "cards" command, anybody?). SAS is the only language about which I can honestly say "it makes R's syntax look clean and predictable". Also, the Windows version of SAS is an absolute abomination from a UI standpoint. And their licensing schemes are draconian, and installing the damn thing can easily take an entire day, especially if (say, for example) the installer gets confused because you've already got a JDK installed on your computer. Not that I'm bitter, or anything...

Of course, as others have noted, in bioinformatics, R either is already the default or is almost there. I know that in my department's bioinformatics courses, they use R, Python, and Perl almost exclusively, and only break out the SAS when there's something specific they need it to do.


Not sure about the rest of the field of health sciences, but everything in genetics is written in C or in R (and usually as C libs for R). The generation above me used SAS, but they're no longer writing code.


Yeah, Ross Ihaka just received the "Lifetime Achievement in Open Source Award" at the New Zealand Open Source Awards the other day.


While there may be a new language that deals with some of the deficiencies of R at an unspecified point in the future, R is here today. It works, it works very well for the tasks it was designed for and is both well written and well supported. Get coding.


I wonder - could R in theory be rewritten as a Python library? If not, why not? Is there any special syntax of R that makes it more amenable to statistical analysis than Python? Performance concerns?

It's just a shame to see a whole language popping out of something that could just be a library.


[caveat: I am a numpy/scipy developer]

The idea of rewriting a large body of code in a different language does not make much sense.

Also, being a niche language has some nice consequences:

  * R has been there for a long time through its predecessor S
  * R is a specialized language: little chance to see it being screwed up by some library which wants to change everything, as it happens too often in python
  * Because it is a niche language, its behavior is consistent across platforms (it is just easier to do with R than with python, or other "real" languages).

Note how being a "real" language goes against those advantages. Also, most researchers are very lousy programmers. Often, their software is super smart, but the code quality is awful and write-only. A less powerful language may mitigate those issues.


Would love to know why you think R isn't a "real" language.


I've called out to R libraries before using rpy/rpy2. These have made it pretty easy. Then I can work with my data in Python, but when I need to use a stat function, I can just call out to R.


There are a few issues. For me two of the most basic are 1) the lack of a ubiquitous NA across all data types and 2) the lack of IEEE 754 floating point behavior. The lack of custom infix operators is also a bit painful, especially if you view MATLAB as a competitor. The unquoted formula type is also nice, although you could get some of that by parsing strings.
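To make the first two points concrete, here is a small sketch. It assumes the comment is describing plain Python (standard library only, no NumPy), which the original does not spell out. The helper name `ieee_divide` is hypothetical and exists only for this illustration.

```python
import math

def ieee_divide(x, y):
    """Divide two numbers with IEEE 754-style results instead of raising.

    Hypothetical helper for illustration: plain Python raises
    ZeroDivisionError whenever the divisor is zero.
    """
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0 or math.isnan(x):
            return math.nan  # 0/0 and NaN/0 are NaN under IEEE 754
        # The sign of the infinity follows the signs of both operands.
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(math.inf, sign)

# Point 2: default float division by zero raises rather than returning inf.
try:
    1.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True

# Point 1: NaN can stand in for "missing" only in float data.
float_column = [1.5, math.nan, 3.0]  # works: NaN is itself a float
int_column = [1, None, 3]            # ints have no NaN; None is a different type
missing_floats = sum(1 for v in float_column if math.isnan(v))
```

In R, by contrast, `NA` is available for integer, character and logical vectors as well as numeric ones, which is the kind of uniform missing-value marker the comment seems to be asking for.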


R is an implementation of S. According to http://en.wikipedia.org/wiki/S_%28programming_language%29 it has its origins in 1975.


Something like that exists for clojure. http://incanter.org/ "Incanter is a Clojure-based, R-like platform for statistical computing and graphics."


You mean like numpy/scipy/matplotlib?


Yes, on top of these


scikits.statsmodels, pandas, rpy2


It is still going strong in bioinformatics/genomics. I think it's slower and clunkier than the alternatives, but for stats and graphics it's pretty easy for scientists to learn...


It's also slowly getting a foothold in Pharma. I think the large Pharma companies would love to get out from under the expensive SAS licenses.


I believe Incanter is a project worth checking out if you use R.


as is this article - http://lambda-the-ultimate.org/node/3726

We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community. - Ross Ihaka (co-developed the R statistical programming language with Robert Gentleman) and Duncan Temple Lang (core developer of R)


R as a language could use some help. It is painfully slow, a big memory hog (since it copies large objects with abandon), and has lots of language "gotchas". We had a saying where I work: "R is really fast if you write it in C".

However, the libraries are great. Anything you'd want in statistics is already there. So I do use it all the time.

Just to say something nice, I do like data frames (a two dimensional matrix, where each column can have a different type).
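For readers who haven't met data frames, the idea can be sketched in a few lines of plain Python. This is a toy illustration, not how R implements them: the column names, the sample values and the helper functions (`n_rows`, `row`, `filter_rows`) are all made up for the example.

```python
# A toy column-oriented table: each column is a list holding one type,
# and all columns have the same length. Not a real data frame library.
claims = {
    "policy_id": [101, 102, 103],               # integers
    "region":    ["north", "south", "north"],   # strings
    "paid":      [1250.0, 310.5, 980.25],       # floats
}

def n_rows(frame):
    """Number of rows; checks that all columns have equal length."""
    lengths = {len(col) for col in frame.values()}
    if len(lengths) != 1:
        raise ValueError("columns must all have the same length")
    return lengths.pop()

def row(frame, i):
    """Return row i as a dict mapping column name to value."""
    return {name: col[i] for name, col in frame.items()}

def filter_rows(frame, column, predicate):
    """Keep only the rows where predicate(value in `column`) is true."""
    keep = [i for i, v in enumerate(frame[column]) if predicate(v)]
    return {name: [col[i] for i in keep] for name, col in frame.items()}

north = filter_rows(claims, "region", lambda r: r == "north")
```

The point of the structure is that a row mixes types (an integer id, a string region, a float amount) while each column stays homogeneous, so column-wise statistics remain straightforward.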


I'm more interested in the education vs. income vs. prestige graph in: http://blogs.forbes.com/smcnally/2010/11/10/names-you-need-t...

Looks like there might be a ceiling for prestige, and that income is not as related to it as I would have thought. But, what are the units?


I would love to see Python become the standard language for scientific computing including statistics but right now R is popular and gets it done. Don't get me wrong, R is excellent for its purpose and I like working with it and its many specialized libraries. However, Python is fun to hack away in and is more of a multipurpose language.

R works for me right now so that's what I'll stick to.


I think it's more like "abandon Perl/php/javascript/whatever in favour of Python/Ruby/Lisp/whatever".

As long as there aren't more than 10 books written about the yet unborn data-cruncher saviour and as long as the brand new alternative isn't adopted in courses, I wouldn't bother -- unless you want to be the saviour's father (i.e. developer) of course.


I believe that over time, it will become the standard for statistical computing in most businesses, eventually displacing much of what SAS does today.

It's attractive to embed into analytic databases like Netezza and Teradata, and vastly easier to use than SAS.

Even if it were rewritten in Python, I think that would be unlikely to slow down its adoption, which is driven by grad students, researchers and quants who often have no real programming background (and frequently aren't interested in learning more than they need to generate figures for their publications).


Implement data frames and trellis graphics in Numpy/Scipy and I don't think I'd go back to R for much.


I would love to use R for my latest project, but as far as I know you can't create a compiled executable for use in a runtime, live web environment. Anybody know of a solution to this issue?


Rserve is a TCP/IP interface to R, so you can send R code from any other language. I have used the rserve-ruby interface with some success.


I didn't see a ruby interface - where is it? Also, this doesn't deal with the issue of speed - can R scale to handle, say 100 simultaneous connections?


I do wish they'd change the name, R is such a difficult search term


Using rseek.org will help when searching for R-related pages.



