Not an advantage if you ask me - exactly because data.frame is built in, people have been building their own versions (tibble, data.table) instead of improving it. That's how R ended up with three different structures that are similar but have inconsistent APIs and behaviour.
> lots of domain-specific packages
That's true.
> more consistent interfaces for basic statistics and machine learning models
Couldn't disagree more - there is no single go-to library for ML in R (like sklearn in Python), and each package has its own strange interface and implementation.
> Not an advantage if you ask me - exactly because data.frame is built in, people have been building their own versions (tibble, data.table) instead of improving it. That's how R ended up with three different structures that are similar but have inconsistent APIs and behaviour.
I've been fortunate to only work on projects that use built-in data frames, never encountered tibble or data.table in the wild.
> there is no single go-to library for ML in R (like sklearn in Python), and each package has its own strange interface and implementation.
I still disagree here - one example being the unified interface for generalized linear models. The vast majority of classifiers (RF, SVM, etc.) also have similar or identical interfaces, and there's the unified `predict` interface on top of that. Granted, `sklearn` has a consistent API as well.
That said, some of this is just a personal preference for the vaguely functional interface in R. The object-orientedness in Python feels a little forced for some tasks in `sklearn`.
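To make the sklearn side of this concrete, here's a minimal sketch of the consistent API being referenced - every estimator exposes the same `fit`/`predict` methods, so swapping models is a one-line change. The dataset and estimator choices here are arbitrary illustrations:

```python
# Sketch of sklearn's uniform estimator API: every classifier exposes
# fit() and predict(), so models are interchangeable in generic code.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy binary classification data (arbitrary illustrative choice).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0),
              SVC()):
    model.fit(X, y)            # identical training call for all three
    preds = model.predict(X)   # identical prediction call for all three
    print(type(model).__name__, "train accuracy:", (preds == y).mean())
```

The R counterpart is that `predict()` is a generic dispatched on the model's class, so the call site looks the same whether the object came from `glm`, `randomForest`, or `e1071::svm`.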
> because data.frame is built in, people have been building their own versions (tibble, data.table) instead of improving it.
To make what I think your point is more explicit, people build their own things in R because R must maintain compatibility with S. So by and large, changes happen in packages and not the base language. This does lead to a proliferation of solutions for the same kinds of problems.
You mean like keras? or tensorflow?
Or base random forest. You know, like the original Breiman implementation.
Python has utility. But R is far superior in the quality of its packages, their documentation, and their ability to behave predictably on a given data type.
I run a machine learning shop. Right now all of the training, application, and data management is handled via R. R is simply superior in too many ways for us to be bothered with Python for the scale of work we are doing.
Since we're moving some big applications to keras/TF, we do use Python and will be using more in the future. However, for almost all data management, munging, movement, visualization, and reporting, it's an R world.
> You mean like keras? or tensorflow? Or base random forest. You know, like the original Breiman implementation.
> ...
> Since we're moving some big applications to keras/TF, we do use Python and will be using more in the future.
Not sure if I misunderstood, or you're contradicting yourself there.
> R is far superior in the quality of its packages, their documentation, and their ability to behave predictably on a given data type.
I not only disagree but I think the exact opposite is true for each one of these points. But if things are working well in your shop, I'm not going to try to convince you otherwise.
> > R is far superior in the quality of its packages, their documentation, and their ability to behave predictably on a given data type.
> I not only disagree but I think the exact opposite is true for each one of these points. But if things are working well in your shop, I'm not going to try to convince you otherwise.
I partially agree with you here. I'm extremely careful about what non-standard packages I use in R. Code quality varies wildly outside of these, likewise for documentation. But outside of neural networks, I've never found a package in Python that I felt better about in terms of code quality or documentation than its equivalent in R.
My point behind the keras/TF comment is that the libraries have front ends in both Python and R, so it's dealer's choice which you'd rather work in (since the backends are identical).
The primary reason for moving these to Python is convenience and the community. Most new work is published in Python. If we find a new, interesting model we want to implement, it's probably written in Python. Rather than reskin the thing in its entirety, it's easier to just work in Python.
A couple of disclaimers: my group works primarily with geospatial data, principally LiDAR and multispectral imagery.
The coarse division I see between R and Python is that if you come from a research/academic background (non-engineering), you probably learned to program in R. If you were an engineer, you probably learned MATLAB. If you are self-taught / Coursera / YouTube, you probably learned Python.
R libraries are generally geared more towards academic research, and specifically towards working within existing frameworks (handling geospatial data as geospatial data rather than turning it into numpy arrays). Working in Python, there is far more reinvention of the wheel, and it's always a pain in the ass to get things back into the structures they came in as.
Python has huge utility and is an important tool for certain work. But it's really not faster than R (it definitely used to be, but that isn't the case any more).
R has better support for scientific programming than Python.
> My point behind the keras/TF comment is that the libraries have front ends in both Python and R, so it's dealer's choice which you'd rather work in (since the backends are identical).
Not as a point of argument, just additional information:
R's support for keras and TF is a wrapper around the Python interface to those libraries.