"R inferno" will help with the software engineering bits I suppose. R is sort of...

jointpdf · on July 24, 2020

Honest question: what skills should a data scientist possess to graduate out of “shit tier”? Should we have all of the skills of statisticians, ML engineers, data engineers, software engineers, visualization designers, and domain/communication experts? Can it not be valuable to have some but not all of the above skill sets? Does it matter that software engineers are often “shit-tier statisticians” that understand just enough ML lingo to dismiss it as marketing hype?

I’ve gone out of my way over the years to make learning data science skills as approachable as possible for the uninitiated (giving trainings, providing customized learning paths based on someone’s background, offering encouragement), and yet this is almost never reciprocated by engineer types. It’s always just, “data scientists can’t write production quality code”, with no explanation of what production quality entails, or without consideration of the fact that notebook-based data science can have advantages over perfectly modularized code with a battery of tests. See the comment above: “I'm not even sure what to recommend for developing good software judgment and habits.” It’s like a chess coach admonishing their subject to simply “think harder”. Not helpful.

When curious and open-minded data scientists and software engineers work together, it can be magic. When people snipe at others for their “shitty” skills, it creates a petty and toxic environment.

This comment comes off as a bit of an admonition, but I would greatly appreciate a list like TFA for data scientists looking to shore up their fundamental CS and software development skills.

(PS — The first book I read when teaching myself R was R Inferno, so that ain’t it.)

ims · on July 25, 2020

> See the comment above: “I'm not even sure what to recommend for developing good software judgment and habits.” It’s like a chess coach admonishing their subject to simply “think harder”. Not helpful.

Hey, it seems like you took this as gatekeeping or something. These skills can definitely be taught or self-learned, I've done it and seen it done many times.

My point was only that I don't know resources that can act as a shortcut (my actual word above), i.e. ways to skip over the longer path of gaining experience through long engagement with the topic. So maybe more like a chess coach saying they don't know any books that let a beginner jump ahead to being a more experienced player?

There are hundreds of past threads on HN about books to level up in software, so clearly some people have thoughts about this. I just don't know what to recommend to a data scientist who needs these skills immediately.

jointpdf · on July 25, 2020

What you said wasn’t egregious or anything, no worries. I’ve seen some incomprehensible code from data scientists with PhDs, stuff that has no excuse. I also don’t know of a single resource for essential coding skills specific to data scientists either.

Sometimes a rant on a topic brews in my head for weeks or months, and I will uncork it on a random passerby that brings up the subject—which happened to be you this time.

But I’ve had coworkers who, like clockwork, sneer at anything a data scientist wrote. “Why did you do it that way?” When asked for advice on how to improve it, they huffily say never mind. It’s infuriating as hell.

scared2 · on July 26, 2020

R should be killed.