Hacker News
Ask HN: Languages to learn in 2019 for Data Science?
17 points by dangom on Dec 24, 2018 | 16 comments
Besides Python, any languages that can make a difference for data science? What would you recommend and why?!



I would recommend... statistics.

If you already know Python, learning how to do the same things in a different environment doesn't really add a skill to your resume. It probably means you can apply for more jobs/work on different projects but for each job/project you would be the same candidate as before. Almost like branching out your skill tree rather than extending it.

On the other hand, at a time when importing a library and throwing it on a dataset (without understanding what it actually does) is becoming easier, having a comprehensive understanding of probability, statistics, calculus and linear algebra would be a huge differentiator.

As far as languages go, Scala comes up a lot in data science job postings where I live, but my experience is limited only to trying out the Scala API of Spark. Maybe someone with more experience can shed some light on this.


R ... The language is just a joy to use.

* Great environment with RStudio

* Awesome plotting and visualization with ggplot

* Very welcoming community with a very diverse background ranging from social science to quantitative finance

Basically, for everything except deep learning, R is a great alternative to Python. If you rely on advanced statistics I think R is the way to go.


> R ... The language is just a joy to use.

My experience has been the exact opposite. R is great as an interactive statistical environment. But if you need to build data-intensive applications in it, it's still powerful but somewhat slow and also a bit of a mess: many exceptions, special cases, strange design choices and multiple ways to do the same thing. It can get the job done, but there is an unpleasant learning curve to get used to all the weirdness.


I'd second the R recommendation. It's a particularly exciting time to use R, with the proliferation of package collections like tidyverse (https://www.tidyverse.org/) and the development and release of free texts like R for Data Science (https://r4ds.had.co.nz/).


Definitely R, but IMO from the start learn it with tinyverse and Rcpp. I.e., use it as a pipelining and glue language but don’t write any processing code in it. R is slow and memory hungry.

Having said that, tinyverse basically calls methods written in C and has a beautiful syntax. And Rcpp lets you write whatever needs to go quickly in C++, then easily expose it in R.


It is still the best language for exploratory data analysis, fitting models intuitively, and visualizing data.

Being able to do these things thoroughly and quickly has (in my experience) been the most important technical skill for getting data science right, regardless of the language production models end up getting deployed in.


Julia, with its 1.0 release and much-needed stability, is a joy to work with. It's blazing fast compared to Python. The learning curve may be steep though.


I don't actually recommend it, but Matlab is annoyingly good at a lot of things you want for data science.

I'd only recommend Matlab for data science if you already know it (say you're studying engineering) and are looking to start with data science.


I have found Matlab to be a great tool for writing scripts to do data exploration and manipulation; later on I can easily reimplement the same workflows or algorithms in something like a Python notebook and also get the added benefit of more flexible and open frameworks (for stuff like ML with Keras and TensorFlow).


Sounds like you just know Matlab better, then. I think Python, and especially notebooks, are also very good for data exploration and manipulation. I've recently switched over from doing everything in Matlab to everything in Python, although I do occasionally fall back to Matlab for things like 3D plotting.

Also, if you like Matlab/know it well, you might want to look into the Deep Learning Toolbox (if you have access to it). I found I was able to do all the stuff I was doing in Keras with Matlab. I haven't learned to recreate all the stuff I was doing with TensorFlow there, but it might be possible.


Clojure is a data-oriented language. Quick guide here: https://hackernoon.com/clojure-functional-programming-38cc6a...

Others on the internet say:

"Teaching a Python dev Clojure would take a small fraction of their time.

Some of the libraries people reach for in Python for DS aren't even necessary in Clojure. Pandas is a good example of this.

Data Science is as wide a field as programming, there are lots of things that a JVM based language would be great at."

Source: https://www.reddit.com/r/Clojure/comments/7jdaac/clojure_dat...

IMO learn a little lisp ASAP


Learn make. IMO, a Makefile is a very flexible way to combine the best of breed from Matlab/Octave, R, Python, etc. into a single pipeline. Each processing step spits out file dependencies for the next and so on.
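
For anyone who hasn't used make, here is a rough sketch of the idea in Python rather than in Makefile syntax: a step is rerun only when its output file is missing or older than one of its inputs. The file names (raw.csv, clean.csv, summary.txt) and the two steps are made up for illustration; in a real pipeline each step could just as well shell out to R, Octave or anything else.

```python
from pathlib import Path
from typing import Callable, List

Action = Callable[[Path, List[Path]], None]


def needs_rebuild(target: Path, sources: List[Path]) -> bool:
    """make's core rule: rebuild if the target is missing or older than a source."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime_ns
    return any(src.stat().st_mtime_ns > target_mtime for src in sources)


def clean(target: Path, sources: List[Path]) -> None:
    """Hypothetical step 1: drop blank lines from the raw file."""
    lines = sources[0].read_text().splitlines()
    target.write_text("\n".join(line for line in lines if line.strip()) + "\n")


def summarize(target: Path, sources: List[Path]) -> None:
    """Hypothetical step 2: count the data rows (everything after the header)."""
    rows = sources[0].read_text().splitlines()[1:]
    target.write_text(f"rows={len(rows)}\n")


def build(workdir: Path) -> List[str]:
    """Run the steps in dependency order; return the names of rebuilt targets."""
    raw = workdir / "raw.csv"
    cleaned = workdir / "clean.csv"
    summary = workdir / "summary.txt"
    # (target, sources, action), ordered so each step's inputs are built first.
    steps: List[tuple] = [
        (cleaned, [raw], clean),
        (summary, [cleaned], summarize),
    ]
    rebuilt = []
    for target, sources, action in steps:
        if needs_rebuild(target, sources):
            action(target, sources)
            rebuilt.append(target.name)
    return rebuilt
```

Calling build() on a directory containing raw.csv should rebuild both targets the first time and nothing the second time, until raw.csv changes. Real make adds pattern rules, parallel builds and automatic ordering on top of this, so treat the sketch as the concept only.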


JavaScript. I know this could be a little controversial, but with more and more people learning JavaScript and the porting of TensorFlow and Torch to JavaScript, I believe this is the language that will take data science to the masses of developers.


JavaScript is cute, I use it as a puppet language


Java. Because if you’re using Spark heavily, then being able to make your own microservices allows great scaling.


I am learning Python. So I'll go for it.



