>Python has gotten sufficiently weapons grade that we don’t descend into R anymore.
>Hadoop is definitely happening but it’s Google’s problem because now after building our own Hadoop on iron solution, after dealing with Redshift for a while, we now just gave it all to BigQuery.
>Python has gotten sufficiently weapons grade that we don’t descend into R anymore.
I've experienced this in my own work as well. The extra verbosity of Pandas data frames compared to R data frames doesn't bother me anymore. Sometimes I miss the Lispy homoiconic magic, but not enough to make me want to use R at work.
I still use it once in a while for heavily "statistical" stuff that doesn't ever need to be "productionized", but for run-of-the-mill machine learning I see no reason to use it over Pandas.
I gave plotnine a go in one of my personal Python projects (I'm a big fan of ggplot2 and tidyverse in general over pandas and seaborn) and after struggling for a while with a more complicated graph I went back to using seaborn.
Not to mention writing R-like code in Python will prevent you from being immediately understood by both R and Python developers. It's just not worth it.
I'd like to see more transparency from NYT on how they're actually collecting, retaining, and distributing user data given both their data science and privacy efforts.
Interesting how at 11:45 he skirts the whole privacy topic by just stating that linking all their data to an identified reader (the 'who' in the 'who what where' of reader behavior tracking) 'involves third party data'.
I’ve collaborated with Chris Wiggins at Columbia. He’s insanely hardworking and it’s impressive to see how he balances an academic life with the life of a working Data Scientist at the New York Times. Really inspiring guy to be around.
And in an article that mentions Hadoop this much, it seems like an understatement to present his data science credentials without mentioning that the dude founded the first major Hadoop company between Facebook and "retirement". The guy really was an inspiration when I worked there in the early days.
>Hadoop is definitely happening but it’s Google’s problem because now after building our own Hadoop on iron solution, after dealing with Redshift for a while, we now just gave it all to BigQuery.
A tidy simplification of the technology stack.