I don't like this curriculum very much - I think it is way too heavy on the data engineering side and way, way too little about the actual mechanics of the data science bit.
For example, the words "validation" (as in cross validation) and "overfitting" aren't mentioned anywhere on that page, and yet things like data scraping are mentioned multiple times.
With all due respect, I can find lots of people to do scraping, but it is much harder to find someone to explain a good strategy for cross validation on time series data (for example).
And yes, "data scientist" is a vague term that can mean pretty much anything.
Having said that, if you did everything on this list you'd be a pretty good data scientist.
(I run a data science team, and I'm involved in building a data science competency framework so it's something I think about a fair bit)
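For anyone wondering what a cross validation strategy on time series might look like, here is a minimal sketch of one common approach: an expanding-window (forward-chaining) split, where each fold trains only on observations that come before its test window. The function name `expanding_window_splits` and its parameters are made up for illustration, and this is only one of several reasonable strategies, not necessarily the one meant above.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper for illustration. Each fold trains only on
    observations that precede its test window, so no future
    information leaks into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))                       # everything before the window
        test = list(range(test_start, test_start + test_size))   # the next block in time
        yield train, test


# Assumes rows are already sorted by timestamp.
folds = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in folds:
    print("train:", train, "test:", test)
```

The point of the design is that, unlike a shuffled k-fold, the training set never contains data from after the test window, which is the usual way overfitting sneaks into time series validation.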
Good point. That said, easy enough to augment an expert data scientist with 4:1 data engineering support, whereas a data scientist working solo will spend 80% of their time engineering data. With all the hype and inflated expectations, IMO much easier to hire aspiring data scientists than talented engineers who are satisfied with the data prep and admin aspects. MSPA programs are realistic that the bulk of their graduates will be spending much of their time as data janitors.
> IMO much easier to hire aspiring data scientists than talented engineers who are satisfied with the data prep and admin aspects
As someone who hires both, I can guarantee this is incorrect. Well, maybe hiring "aspiring" data scientists is ok, but aspirations alone will get me models that do exactly the wrong thing. So that isn't useful.
Yeah, I did see that, and that is a great course. I get the impression (based on the lack of a description) that they see it as being of equal importance to all the other many, many courses they tell you to do.
Put that first, and it would be a big improvement. Would be better if it wasn't in Octave though!