These tutorials are from 2014. While they provide a good overview of R syntax, a lot has been added to the R-verse since then, such as dplyr, which the author primarily used for his Trump Tweets blog post yesterday.
While the tidyverse and data.table are definitely game changers for R, it's still worth learning the basics. The packages often make irritating tasks easy, but they rarely touch the tasks that are already easy in base R. I've seen some pretty convoluted dplyr from newcomers that could have been done in a single line without loading any packages (see the toy example below).
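A contrived illustration of what I mean, using a made-up data frame with a grouping column and a numeric column:

    # Hypothetical data for illustration
    df <- data.frame(group = rep(c("a", "b"), 50), score = rnorm(100))

    # The kind of pipeline I sometimes see from newcomers:
    library(dplyr)
    df %>%
      group_by(group) %>%
      summarise(avg = mean(score)) %>%
      ungroup()

    # The same thing in one line of base R:
    tapply(df$score, df$group, mean)
    # or: aggregate(score ~ group, df, mean)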
I love R, but I have two problems with it that I would like suggestions for dealing with.
1. Debugging seems way more primitive than in other languages; I get cryptic messages and really struggle to pinpoint what is happening. Debugging in (free) Shiny is even harder: the page just says the connection was closed and I have to guess what happened.
2. Code structure. R is simply fantastic in REPL and/or RStudio mode for digging around in data, but my longer programs remind me of COBOL (yes, I have programmed in COBOL), and longer programs written by other people remind me of the need to drink alcohol. Creating good code in R is vastly harder than in Julia. In Julia the challenge is not to write clean, working code (that comes naturally); the challenge is to write the best code possible. In R the challenge (for me) is to make it work at all without producing a plate of spaghetti.
Ad 1. See ?browser, ?traceback and ?debugger. You also have breakpoints.
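A minimal sketch of those base tools in action (toy functions, interactive use assumed):

    f <- function(x) {
      stopifnot(is.numeric(x))  # fails if x isn't numeric
      log(x)
    }

    f("a")        # triggers an error
    traceback()   # print the call stack of the last error

    debug(f)      # step through f() line by line on its next call
    f(10)
    undebug(f)

    g <- function(x) {
      browser()   # drop into an interactive prompt here; step with n, continue with c
      x^2
    }
    g(3)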
Ad 2. There are Reference Classes, which also provide limited type checking for fields. You can also encapsulate your code in environments, which is more R-style but doesn't work well with roxygen.
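Roughly what a Reference Class with typed fields looks like (class and field names are made up):

    Account <- setRefClass("Account",
      fields = list(
        owner   = "character",   # field types are checked on assignment
        balance = "numeric"
      ),
      methods = list(
        deposit = function(amount) {
          balance <<- balance + amount   # <<- updates the field in place
        }
      )
    )

    a <- Account$new(owner = "Ada", balance = 100)
    a$deposit(50)
    a$balance          # 150
    # a$balance <- "x" # would error: field 'balance' must be numeric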
Additionally, you want to write more modular code. There is lots of infrastructure for that in R, but people just don't use it often enough, because a lot of them aren't programmers.
mlr provides very convenient infrastructure for building data mining pipelines where you can fuse steps with each other.
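Roughly what that looks like with mlr (the pre-mlr3 API; a sketch from memory, so check the package docs for exact signatures):

    library(mlr)

    task <- makeClassifTask(data = iris, target = "Species")
    lrn  <- makeLearner("classif.rpart")

    # Cross-validated performance in a couple of lines
    rdesc <- makeResampleDesc("CV", iters = 5)
    res   <- resample(lrn, task, rdesc)

    # Or a plain train/predict round trip
    mod  <- train(lrn, task)
    pred <- predict(mod, task = task)
    performance(pred)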
Neither is perfect, but I've found them helpful in my projects. They involve much less overhead than writing packages, especially when the modularization I'm trying to achieve is purely internal to my project and I don't intend to publish the code. At the same time, they provide much better encapsulation compared to `base::source`.
FYI, debugging in Shiny has gotten much better in 0.13.1 and later (you now get stack traces at the console, or in your log file if running on your own Shiny Server, or in your admin console if running on ShinyApps.io).
Considering how R has exploded in recent years, I'm sure a more recent article could have been found. That being said, R is amazing, easily the best language/software for any sort of data analysis. And bonus points for easy Fortran/C++ interop, as well as easy multicore/cluster computing. Oh, and a shout out to RStudio, which is also amazing.
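To illustrate the interop and multicore points, a quick sketch using Rcpp and the base parallel package (toy function, illustrative only):

    library(Rcpp)

    # Compile a C++ function inline and call it from R
    cppFunction("
      double sumSquares(NumericVector x) {
        double total = 0;
        for (int i = 0; i < x.size(); i++) total += x[i] * x[i];
        return total;
      }
    ")
    sumSquares(c(1, 2, 3))   # 14

    # Fan work out over multiple cores (forked processes; on Windows use parLapply instead)
    library(parallel)
    results <- mclapply(1:8, function(i) sumSquares(rnorm(1e6)), mc.cores = 4)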
Very good point; just look at sparklyr and the like. People seem to think there is some hang-up here, but R has some of the best-developed packages for working with data that doesn't fit in memory.
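A rough sketch of the sparklyr workflow (local Spark and toy data for illustration; note that Spark converts the dots in iris's column names to underscores):

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")   # would be a cluster URL in practice

    # Copy a toy data set into Spark; real use would read from HDFS/parquet/etc.
    iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

    # Regular dplyr verbs are translated to Spark SQL and run outside R's memory
    iris_tbl %>%
      group_by(Species) %>%
      summarise(mean_petal = mean(Petal_Length)) %>%
      collect()   # pull only the small result back into R

    spark_disconnect(sc)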
I use R for multiple hours every day, I love it, I love the ecosystem, but I can't help thinking that it's showing its age. It does a great job on static analysis and visualization of smaller (< 1 gigabyte) data sets, but is seriously challenged by anything significantly larger, and is unfit for purpose if the data is changing rapidly (e.g. streaming). I am unfortunately slowly coming to the conclusion that Spark- and Flink-style tools are where data science will be in a few years' time, and while I know you can use R as a layer on top of these, I think other aspects of R also hold you back. Paradoxically, that includes base and ggplot graphics, which are rightly lauded as excellent but are very low-dimensional in a world where tensors increasingly rule. I think R will remain hugely relevant for a long time, and it is, as I tell people, like Excel^2, but it's getting to the point where the world is moving on, and it will struggle if it's not rewritten from the ground up with a much faster, multicore, threaded, distributed implementation.
I genuinely think these types of thoughts come from having extensive experience as a programmer, where you can consider building out systems that might reach R's performance ceiling. For 99% of people, a multicore/distributed architecture will never even be a consideration. But I'm with you in that having these things, from a systems engineer's perspective, would be incredible. There are other implementations of R out there (not just GNU R): Hadley Wickham discusses them in Advanced R somewhere.
If you are interested in learning R, you may want to read the R for Data Science book (http://r4ds.had.co.nz/) by dplyr (and ggplot2) author Hadley Wickham.
Relatedly, I have my own (slightly more complicated) notebooks using R/dplyr/ggplot2, open-sourced on GitHub, if you want further examples of real-world analysis with publicly available data along the lines of the Trump Tweet analysis:
Processing Stack Overflow Developer data: https://github.com/minimaxir/stack-overflow-survey/blob/mast...
Identifying related Reddit Subreddits: https://github.com/minimaxir/subreddit-related/blob/master/f...
Determining correlation between genders of lead actors of movies on box office revenue: https://github.com/minimaxir/movie-gender/blob/master/movie_...