Ask HN: Languages to learn in 2019 for Data Science?

altairiumblue · on Dec 25, 2018

I would recommend... statistics.

If you already know Python, learning how to do the same things in a different environment doesn't really add a skill to your resume. It probably means you can apply for more jobs/work on different projects but for each job/project you would be the same candidate as before. Almost like branching out your skill tree rather than extending it.

On the other hand, at a time when importing a library and throwing it on a dataset (without understanding what it actually does) is becoming easier, having a comprehensive understanding of probability, statistics, calculus and linear algebra would be a huge differentiator.

As far as languages go, Scala comes up a lot in data science job postings where I live, but my experience is limited only to trying out the Scala API of Spark. Maybe someone with more experience can shed some light on this.

rasmus1610 · on Dec 24, 2018

R ... The language is just a joy to use.

* Great environment with RStudio

* Awesome plotting and visualization with ggplot

* Very welcoming community with a very diverse background ranging from social science to quantitative finance

Basically, for everything except deep learning, R is a great alternative to Python. If you rely on advanced statistics, I think R is the way to go.

altairiumblue · on Dec 25, 2018

> R ... The language is just a joy to use.

My experience has been the exact opposite. R is great as an interactive statistical environment. But if you need to build data-intensive applications in it - it's still powerful but somewhat slow and also a bit of a mess: many exceptions, special cases, strange design choices and multiple ways to do the same thing. It can get the job done, but there is an unpleasant learning curve to get used to all the weirdness.

cdcrabtree · on Dec 25, 2018

I'd second the R recommendation. It's a particularly exciting time to use R, with the proliferation of package collections like tidyverse (https://www.tidyverse.org/) and the development and release of free texts like R for Data Science (https://r4ds.had.co.nz/).

usgroup · on Dec 25, 2018

Definitely R, but IMO from the start learn it with tinyverse and Rcpp. I.e., use it as a pipelining and glue language but don’t write any processing code in it. R is slow and memory-hungry.

Having said that, tinyverse basically calls methods written in C and has a beautiful syntax. And Rcpp lets you write whatever needs to go quickly in C++ and then easily expose it in R.

confounded · on Dec 25, 2018

It is still the best language for exploratory data analysis, fitting models intuitively, and visualizing data.

Being able to do these things thoroughly and quickly has (in my experience) been the most important technical skill for getting data science right, regardless of the language production models end up getting deployed in.

sidcool · on Dec 25, 2018

Julia, with its 1.0 release and much-needed stability, is a joy to work with. It's blazing fast compared to Python. The learning curve may be steep though.

sgillen · on Dec 25, 2018

I don't actually recommend it, but Matlab is annoyingly good at a lot of things you want for data science.

I'd only recommend Matlab for data science if you already know it (say you're studying engineering) and are looking to start with data science.

stefanpie · on Dec 25, 2018

I have found Matlab to be a great tool for writing scripts to do data exploration and manipulation; later on I can easily reimplement the same workflows or algorithms in something like a Python notebook and get the added benefit of more flexible and open frameworks (for stuff like ML with Keras and TensorFlow).

sgillen · on Dec 25, 2018

Sounds like you just know Matlab better, then. I think Python, and especially notebooks, are also very good for data exploration and manipulation. I've recently switched over from doing everything in Matlab to everything in Python, although I do occasionally fall back to Matlab for things like 3D plotting.

Also, if you like Matlab/know it well, you might want to look into the Deep Learning Toolbox (if you have access to it). I found I was able to do all the stuff I was doing in Keras with Matlab. I haven't learned to recreate all the stuff I was doing with TensorFlow there, but it might be possible.

badsavage · on Dec 25, 2018

Clojure is a data-oriented language. Quick guide here: https://hackernoon.com/clojure-functional-programming-38cc6a...

Others on the internet say:

"Teaching a Python dev Clojure would take a small fraction of their time.

Some of the libraries people reach for in Python for DS aren't even necessary in Clojure. Pandas is a good example of this.

Data Science is as wide a field as programming, there are lots of things that a JVM based language would be great at."

Source: https://www.reddit.com/r/Clojure/comments/7jdaac/clojure_dat...

IMO learn a little lisp ASAP

usgroup · on Dec 25, 2018

Learn make. IMO, a Makefile is a very flexible way to combine the best of breed from Matlab/Octave, R, Python, etc. into a single pipeline. Each processing step spits out file dependencies for the next, and so on.
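
To make the idea concrete, here is a rough sketch of the rule make applies, written in Python rather than as an actual Makefile: a step reruns only when its output file is missing or older than one of its inputs. The helper names (needs_rebuild, run_step), the two toy steps, and the file names are all hypothetical and only for illustration; in a real pipeline each step would be a call out to R, Octave, Python, or whatever tool fits.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def needs_rebuild(target: Path, sources: Sequence[Path]) -> bool:
    """Make's core rule: rebuild if the target is missing or older than a source."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime_ns
    return any(src.stat().st_mtime_ns > target_mtime for src in sources)


def run_step(
    target: Path,
    sources: Sequence[Path],
    action: Callable[[Sequence[Path], Path], None],
) -> bool:
    """Run action only when the target is out of date; report whether it ran."""
    if needs_rebuild(target, sources):
        action(sources, target)
        return True
    return False


def clean(sources: Sequence[Path], target: Path) -> None:
    # Hypothetical step 1: drop blank lines from the raw file.
    lines = [ln for ln in sources[0].read_text().splitlines() if ln.strip()]
    target.write_text("\n".join(lines) + "\n")


def count(sources: Sequence[Path], target: Path) -> None:
    # Hypothetical step 2: write the number of cleaned rows.
    n = len(sources[0].read_text().splitlines())
    target.write_text(f"{n}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"
        cleaned = Path(tmp) / "clean.txt"
        summary = Path(tmp) / "summary.txt"
        raw.write_text("a\n\nb\n")

        # First pass: nothing exists yet, so both steps run.
        print(run_step(cleaned, [raw], clean))
        print(run_step(summary, [cleaned], count))

        # Second pass: outputs are up to date, so the step is skipped.
        print(run_step(cleaned, [raw], clean))

        # "Touch" the raw file so it looks newer than its output.
        newer = cleaned.stat().st_mtime_ns + 10**9
        os.utime(raw, ns=(newer, newer))
        print(run_step(cleaned, [raw], clean))
```

With make itself you would declare the same thing as targets, prerequisites, and recipes, and make would work out the order and skip anything already up to date.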

ankyth27 · on Dec 25, 2018

JavaScript. I know this could be a little controversial, but with more and more people learning JavaScript and the porting of TensorFlow and Torch to JavaScript, I believe this is the language that will bring data science to the masses of developers.

badsavage · on Dec 25, 2018

JavaScript is cute, I use it as a puppet language

octokatt · on Dec 27, 2018

Java. Because if you’re using Spark heavily, then being able to make your own microservices allows great scaling.

NinjaX · on Dec 26, 2018

I am learning Python. So I'll go for it.