With 10+ years in DS, I've always felt that the best DS were basically software engineers who knew math and were more interested in prototyping cool machine learning products than maintaining production infrastructure. Unfortunately, this always accounted for a small fraction of the DS I interacted with.
The largest group of DS was non-ML/CS/Math PhDs who started panicking once they realized their future job prospects in academia were very slim and so they signed up for bootcamps and got jobs at places hiring DS by the hundreds. Many of the people in this latter group had no idea how to write Python outside of a notebook, generally just structured problems to fit into XGBoost, and when not doing that tried to squeeze resume-boosting-complexity into any problem they could find. They also tended to have a hilariously poor understanding of creating business value.
Nearly everyone I know in the first group has switched back to just being an engineer of some sort, typically ML or AI engineer. I suspect the small set of talented people from the second group will end up in lesser paying product analytics type roles or closer to product management roles, while the majority that don't bring much to the table other than a PhD will be slowly attritioned out of the field as companies start looking for the value different skillsets bring to the table.
> With 10+ years in DS, I've always felt that the best DS were basically software engineers who knew math and were more interested in prototyping cool machine learning products than maintaining production infrastructure. Unfortunately, this always accounted for a small fraction of the DS I interacted with.
I've been a DS for 10+ years, and I feel the exact opposite. The worst "Data Scientists" I've worked with are all ex Software Engineers who seem to assume that business problems are really computation problems. So they find convenient ways to ignore the human aspects (e.g. trying to figure out why the data is a mess) and gravitate to using more complex algorithms and breaking down the problem to an achievable programming pipeline that runs in production, but the results are of low value. But it looks awesome on a resume.
Are you right or am I right about SWEs turned DS? I have no idea. But one quality that IMHO is important is the interest in actually looking at data and asking questions, which is much rarer than most people realize.
> Many of the people in this latter group had no idea how to write Python outside of a notebook, generally just structured problems to fit into XGBoost, and when not doing that tried to squeeze resume-boosting-complexity into any problem they could find.
To be fair, for 90% of business problems that require ML, I’d rather take the guy who throws XGBoost at everything instead of the one trying to be fancy with neural networks.
You get an explainable output and good results without deep subject matter expertise that the person likely won’t have. It also runs at a fraction of the cost.
With any ML/AI problem, based on long weary hours doing the grunt work, the vast majority of time spent will be getting the data into some useful format. It doesn't matter too much how fancy you can build your models if there's nothing to train it on, or worse still, unreliable or incorrect training data.
So for newly minted Math PhDs, sure go out and learn how to do some ML coding in notebooks, but if you can't get a decent dataset together to train it on it'll be all for nought. Anyone with AI/ML coding only, and no SQL, is a no hire in my book.
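To make the "getting a decent dataset together" point concrete, here is a minimal sketch of assembling a training table with SQL before any modeling happens. The tables, columns, and values (`customers`, `orders`, `churn_labels`) are all made up for illustration; it only assumes Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, signup_year INTEGER);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
CREATE TABLE churn_labels (customer_id INTEGER, churned INTEGER);
INSERT INTO customers VALUES (1, 2019), (2, 2021), (3, 2022);
INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 10.0);
INSERT INTO churn_labels VALUES (1, 0), (2, 1);
""")

# One row per customer: simple features plus the label (NULL if missing).
rows = conn.execute("""
SELECT c.id,
       c.signup_year,
       COUNT(o.amount) AS n_orders,
       COALESCE(SUM(o.amount), 0.0) AS total_spend,
       l.churned
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
LEFT JOIN churn_labels AS l ON l.customer_id = c.id
GROUP BY c.id, c.signup_year, l.churned
ORDER BY c.id
""").fetchall()

# Separate usable training rows from customers with no label at all.
labeled = [r for r in rows if r[4] is not None]
unlabeled_ids = [r[0] for r in rows if r[4] is None]

print(f"{len(labeled)} labeled rows, unlabeled customer ids: {unlabeled_ids}")
```

Most of the work is in the joins and in noticing which rows have no usable label, not in the model that comes afterwards.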
> Nearly everyone I know in the first group has switched back to just being an engineer of some sort, typically ML or AI engineer.
@OP - mind rerunning this analysis for "AI Engineer" titles? https://www.latent.space/p/ai-engineer anecdotally i saw 8 of these in the last Who's Hiring and wanted to tease out the emerging difference between ML and AI Engineer
> I argue from data that the Data Scientist role is poorly differentiated and I speculate that its responsibilities are being eroded by better specified roles such as ML Engineer and Data Scientist.
I'm already struggling to parse the first paragraph -- is this a typo? Or do they mean a role that is both ML Engineer and Data Scientist?
Not a typo, this is actually an industry trend.
Companies think they need a "data scientist" but they actually want a "software engineer with enough stats/data background to implement data science algorithms in production."
The result is that a lot of "data scientist" jobs are mostly data engineering or ML engineering and that is reflected in the list of requirements. It makes finding a good job extremely difficult, and it's for the better that the "data scientist" title is being eroded, because it makes it easier to tell when a "data scientist" job is actually data science as opposed to something else.
Not that "data science" is a good name anyway, but that's another story.
Something I've run into is that a while ago, there were "Type A" ("Analysts") and "Type B" ("Builders") data scientists [1], but most job postings now are just looking for "Type A" data scientists, and the "Type B" openings have been renamed to "ML Engineer" or "Data Engineer."
Took me for a bit of a spin because when I was first interviewing this year, I'd apply for DS roles and only get interviews that were very stats heavy with a leetcode easy, but started getting further when I basically stopped applying for DS roles and went straight for MLE roles.
There seems to be a typo there. But in general, I understood the argument as different roles that have narrower and better scopes are causing less popularity of the data scientist role. Basically, DS was/is used as an umbrella term to cover many different things, and as companies are understanding the field better, they are moving towards more specific roles they need (such as ML engineer), rather than hiring for a data scientist role.
My hot take (as a DS of 12 years): data scientist was always an over-hyped title and led many to unrealistic expectations. I think we will see a rise of data-inflected product managers (this is what I am already seeing), as IMO data scientists are most effective at scoping problems and pioneering solutions, not scaling them.
I don't think it's a hot take among us practitioners. I view it in terms of "how many capabilities are needed in the radar chart?"
Systematic MLOps helped to decrease _some_ of that, but not nearly enough, and certainly not with the recent explosion of LLM-induced hype.
I view MLOps engineers and ML engineers as tasked with scaling the problems, and research scientists as the scoping and pioneering of solutions. All three fall under the larger umbrella we call "Data Science" IMHO.
I've always thought that the suffix "Scientist" on any title in a software company was likely more hype than reality. Unless the company is really doing science.
The problem as I have witnessed it during my 15+ years of experience in the field as a DE by trade leading DEs, PM, ProdMs, and DSs is that even if you have DSs, The Business does NOT want to do what is required for the actual and reliable science.
Thus, we end up with a significantly weakened analysis plan and execution. Much to the disappointment of everyone involved.
As a data scientist I can testify that it's really two jobs -- 80% or more data engineering and 20% or less analysis. When the enterprise is small it's reasonable to want people who do both. Once it's big enough, though, specialization makes more sense -- you don't need all your data engineers to know how to draw conclusions from the data.
Moreover the people analyzing it don't need to be data scientists -- they can as easily be statisticians, economists, geneticists, etc.
> you don't need all your data engineers to know how to draw conclusions from the data.
I'm not sure this is accurate. To the extent that a data project is underspecified (which, let's be honest, all projects are) then the engineers will end up making some decision somewhere that may have an impact on what's available for analysis.
If the engineers have some understanding of project motivation and hypotheses then they'll make better decisions.
I think you're both saying the same thing. Data engineers don't need to be able to also do all of the data science, but they should know enough to make good decisions about data projects.
> Moreover the people analyzing it don't need to be data scientists -- they can as easily be statisticians, economists, geneticists, etc.
While data science is a newer academic field, most of its practitioners come from statistics, economics, genetics, etc.! (I say this as a 10-year data scientist who is an economist).
Amazing article! I love the approach and description of data gathering, data prep and analysis.
I believe you should edit the text to read "Data Engineer" in the first numbered item:
"I argue from data that the Data Scientist role is poorly differentiated and I speculate that its responsibilities are being eroded by better specified roles such as ML Engineer and Data [Engineer]."
> However, in so far as HN is an avant-garde community, its adoption of tech and practices likely foreshadow adoption writ large.
I'd love to see an analysis on this thesis!
Or more generally, how various sources of new-hire job descriptions correlate with each other: HN Who's Hiring and other job-advertisement boards; LinkedIn profiles; etc.
And same thing for programming languages: appearance in job postings vs. TIOBE index vs. ...
It would be useful to give more clarity around your thoughts on the "Data Engineer" role. Is it also in decline? Is the market as a whole in a relative decline?
(I wrote a blog [1] making a call that Data Engineering was likely also to see something of a relative demand decline, or be better defined into Software Engineer and Analytics Engineer, so I am quite interested in your analysis here)
> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools.
For the exceeding majority of businesses, this means they can and should focus on building capacity for business logic, analysis and predictions instead of data engineering.
> It would be useful to give more clarity around your thoughts on the "Data Engineer" role. Is it also in decline? Is the market as a whole in a relative decline?
My thoughts: the tech job market as a whole has been in decline, as unanimously observed. There may be some signs that the slowdown is abating though. Next 6-12 months will be key to see how DS and DE rebound (or not).
> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools. For the exceeding majority of businesses, this means they can and should focus on building capacity for business logic, analysis and predictions instead of data engineering.
Could not disagree more with your take of "DE demand will decline due to DE needs being already solved for most businesses". Apologies, but have you ever worked as a data engineer or even close to one? Pipelines break, requirements change, businesses expand, and infrastructure needs to be managed and optimized, etc. ETL processes, in the wild, are decidedly not one-off affairs.
The evidence points to demand for data engineers declining on a relative basis to other data roles, following the data science trajectory. Agree or disagree?
Maybe the analysis as to _why_ is wrong, but that is what I'm trying to unpack.
HN Guidelines: "When disagreeing, please reply to the argument instead of calling names"
> Most businesses' data engineering needs have been solved or will shortly be solved by managed services that 10 years ago would require endless and extensive self-built ETL pipelines, databases and tools
A lot of what modern data engineering has turned into is connecting various tools and software together. So adding another managed service doesn't feel like it is going to magically solve the problem. It is going to be just one more tool that the DEs will be managing. And indeed that has been my experience. For every tool that we added to the stack, we ended up spending just as much time fighting the tool as we did maintaining the self-built solution that the tool replaced. The two main advantages of using tools over DIY solutions are that they have an opinionated way of doing things, and they usually come with extensive documentation. So onboarding a new team member is easier. But engineering hours saved is pretty much a wash compared to DIY once you hit that first edge case that the tool does not handle elegantly.
> But engineering hours saved is pretty much a wash compared to DIY once you hit that first edge case that the tool does not handle elegantly
This is probably the exact crux of the argument, and I wonder if it reduces the overall demand relative to the growth in the industry. Do you need 10 engineers (like a decade ago) when 1 (great engineer) can deal with the bulk of the work with "better tools" and then DIY on the edge cases? This has been my experience with the modern ETL tools.
I don't have a good explanation as to why the "fraction of terms" for DE is going down. But the decline seems to be coincident with the general decline in tech hiring, so my best guess is that the DE role tends to be a bit more internal facing than other software engineering positions, so it is probably an easy target for hiring freezes. Based on my personal experience, the DE team seemed to always be the last team to get approved for additional hires, as the other teams in the engineering org were more directly tied to revenue generating initiatives whereas we were often seen more as a cost center.
But I don't think this is a long term trend. The DE role was originally tied heavily to the rise of data science, but it has turned into more of an operations role since then. You can probably get away with slowing down hiring for a little while, but just like with janitorial services, if you cut back too much things are going to get messy.
Visualization will never be out of favor. Management eats up fancy charts. It's unlikely they will have dedicated roles for that. It'll just get lumped in other stuff.
While I generally agree, I think there's a point in these 2 statements that can easily be misinterpreted.
>It is likely that the Data Scientist role is in a long term decline...
Also
> Data science is in decline and vaguely defined
Reading this, you can think that "Data Science" jobs are decreasing. But I don't think that's true.
Let's just say that it's 2017 and I hire a team of 3 people with the job title of Data Scientist. One ends up focusing on the data side, one on modeling+analysis, and one on building the infrastructure. In 2023, I decide to change the job titles so one of them is now a Data engineer, one is now a Data Scientist, and one is now a ML Engineer to match what is happening in the job market.
It's still 3 jobs with 3 people doing the same thing. So the number of jobs isn't decreasing, but the titles are more specific. Overall, the number of "Data Science" jobs is still going up.
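The arithmetic of that 2017 vs. 2023 example can be sketched in a few lines. The rosters below are the hypothetical three-person team from the comment, not real data, and `title_share` is just an illustrative helper name.

```python
from collections import Counter

# Hypothetical team: the same three people, retitled between 2017 and 2023.
team_2017 = ["Data Scientist", "Data Scientist", "Data Scientist"]
team_2023 = ["Data Engineer", "Data Scientist", "ML Engineer"]


def title_share(titles):
    """Return each title's fraction of the total headcount."""
    counts = Counter(titles)
    total = len(titles)
    return {title: count / total for title, count in counts.items()}


share_2017 = title_share(team_2017)
share_2023 = title_share(team_2023)

# Headcount is identical in both years; only the title mix changes.
print("headcount:", len(team_2017), "->", len(team_2023))
print("'Data Scientist' share:",
      share_2017["Data Scientist"], "->", share_2023["Data Scientist"])
```

A count of titles in job postings would register this as the "Data Scientist" share falling to a third, even though nobody's job went away.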
Somebody will say "But that's exactly what the author said." But I think people who are new(ish) to this field might read it as "Data Science Jobs are decreasing." So I'm making this comment.
> skills such as data mining and visualisation are also out of favour.
Honestly, I just don't believe this. It's possible that as job descriptions are filled with different buzzwords, people just leave these out. For visualization it's also possible that there is a bigger focus on keywords of an established BI tool (e.g. PowerBI) instead of ad-hoc charts in matplotlib or ggplot. But some degree of data mining and visualization is useful, even to Data Engineers.
This is some awesome analysis - great job! And I’ve seen it first-hand from my own job hunt with different success looking for data scientist vs ML/AI engineer positions.
I think it really comes down to a lot of marketing, which you touch upon a bit. AI is in a hype cycle right now and people want it on their products and in their companies, so they want people that are capable of bringing those skills to the table.
> I have worked in “big data”, “data science” or something adjacent for around 12 years, and in that time I have observed these fields (and their associated roles) change a lot. I had never thought about it much because it was never very difficult to find work, however, recent times have been a bit different because my neck of the woods
Nah, recent times have been "different" for everyone. If the author entered the job market in 2011, then this is the first economic recession they have seen. Economic downturns happen roughly every 10-12 years or so and generally cause a fair bit of turmoil.
Forest fires are a necessary part of a healthy natural wooded ecosystem. I tend to think of economic downturns the same way. Every once in a while, companies (or entire markets) have to look carefully at what really adds value to their businesses and figure out how to focus on that when cash flow dwindles and investors clam up. If a business doesn't survive a recession, then it was on shaky ground well before the economy went south.
Perhaps. I think a couple of things are contributing to the declines shown since mid-2022, and I believe them to be economic conditions. My outlook is quite the opposite, as I think data roles will continue to grow in demand over the next 3 years.
The overall economy is not doing well right now, at least in the USA. There is a significant amount of extra talent (supply) on the market due to big tech hiring freezes and layoffs, which also contributes to lower demand for new roles. There are the return-to-office debacles, all while housing is becoming even more unaffordable due to high interest rates (compared to the recent 2 to 3% during covid) and low supply, itself a consequence of the rates: owners aren't going to want to trade a 2 to 4% rate for 7+, nearly 8%, right now. I don't cite the RTO debacles to start a debate on whether RTO is good or bad, but I would speculate that they are pushing more talent onto the job market to escape working environments that force any style (RTO or forced remote) that an employee disagrees with.
So, in my mind the only thing my speculation doesn't really cover is how that would contribute to lower HN responses, to which I don't have an answer; maybe it truly is shrinking. However, my gut says it's the economic factors. I think (and hope) that the shift in conditions occurs in the next 6 months and hiring ramps back up as companies recover and adapt.
This is only helpful as context, but you need to have had quite a few years' experience to remember the time before the era of cheap money and massive FAANG-inflated salaries. Now that FAANG is getting significantly regulated and fined, and money is more expensive, things will no doubt start to cool off from a salary perspective.
I think the work that needs to be done in the field of Data Science has not changed fundamentally, and simply varies from one organization to another on the specifics. As a poster above said, a Data Scientist can easily expect to spend most of their time doing data engineering at any point in time. But while "Data Science" was a big title and commanded high salaries before, now titles involving AI or Machine Learning are getting paid more, so specialists tend to adopt them to differentiate themselves.
The description given here (data mining and visualization), and the fact that for a long time this was (I’m pretty sure) advertised as a bootcamp-appropriate sort of role, seem to indicate that maybe this is not a role in and of itself?
A little coding and the ability to think about data seems like a generally useful add-on skill for most roles? Maybe we’re seeing unsatisfied need for, like, office workers with some technical proficiency?
"Data scientist" properly ought to be something like "statistician" or "predictive modeler" in most orgs.
I think the need for a broader umbrella term is still present and I think that explains the wide adoption of the word "data science" in the first place. But the current meaning has been stretched way too far.
> Maybe we’re seeing unsatisfied need for, like, office workers with some technical proficiency?
1. Is this a true decline, or in line with general tightening of the tech economy?
2. Where are the "analysis" conventions going generally -- HN is going to be a weird subsample of the economy as a whole, given that BI, DA, BA roles still exist and overlap with DS -- on top of that, many industries still haven't adopted Research Scientists, MLE, MLOps Eng, etc. into their lexicon of roles
see the discussion section. author argues that it is a true decline because other roles (data engineer, ml engineer and data analyst) are either keeping or gaining share.
DS was always an overloaded title - speciation into various other titles is ultimately good and indicates a healthy and maturing ecosystem. You still need DS though, in the multi-armed bandit that is your organization your real DS are your explore function - they figure out what to do. The other roles are exploit - they do it.
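For anyone unfamiliar with the bandit framing, here is a toy epsilon-greedy sketch of explore vs. exploit. Everything in it is hypothetical: the payoff numbers, the function name, and the simplification that each option pays a fixed, noise-free reward. It is not a model of any real organization.

```python
import random


def epsilon_greedy(payoffs, epsilon, rounds, seed=0):
    """Toy epsilon-greedy bandit with fixed (noise-free) payoffs per arm.

    Returns the number of pulls per arm and the learned payoff estimates.
    """
    rng = random.Random(seed)
    n_arms = len(payoffs)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try an arm at random, whatever we currently believe.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = payoffs[arm]
        pulls[arm] += 1
        # Incremental running mean of the rewards observed for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates


payoffs = [0.2, 0.5, 0.9]  # hypothetical value of three possible projects

# No exploration: the first option that pays anything gets all the effort.
pulls_greedy, _ = epsilon_greedy(payoffs, epsilon=0.0, rounds=1000)

# A little exploration: the best option is eventually found and exploited.
pulls_mixed, _ = epsilon_greedy(payoffs, epsilon=0.1, rounds=1000)

print("exploit only:", pulls_greedy)
print("10% explore: ", pulls_mixed)
```

With exploration switched off, the loop locks onto the first option it tries and never learns that a better one exists, which is the role the comment assigns to the "explore function".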
I think this sells the position short. Data analysts explore; data scientists are supposed to have enough skill and expertise to actually make something out of what they find. That might be an XGBoost model to deliver a monthly forecast, or it might be setting up an automated decision process.
However, where I draw the line (and where I think most data scientists should draw the line) is actually putting that stuff into production code. Maybe they're good enough to write the prototype, but you need somebody else on hand to help with test coverage, make sure it meets performance requirements, triage bug reports, etc. If you make your data scientist responsible for that, they are going to spend all of their time doing that, instead of doing the things that they are actually trained to do and that you are paying them to do. This is true even if they are a perfectly competent software developer.
> I argue from data that the Data Scientist role is poorly differentiated and I speculate that its responsibilities are being eroded by better specified roles such as ML Engineer and Data Scientist.
The problem with Data Scientists is that they have been historically overpaid while producing poor quality code. Mostly data janitors who create transformation code using open source libraries, converting from one format / database / stream to another format / database / stream. Then real software engineers look at their code and it’s janky and messed up.
I no longer hire data scientists. I hire recent CS graduates then they cut their teeth on that work because it is very easy, typically low risk, and they can learn the basics of software engineering.
That describes a past gig of mine which was fantastically successful. I was handed a messy 3000-line Jupyter notebook which contained the prototype of a great product, and given the task of productionizing it.
In fact, this is what I've done at several startups now: take prototypes and productionize them. At this point, I'm almost a specialist in transitioning from early stage to growth stage. I can do prototyping, but I recognize that productionizing is my stronger suit.
The technology for the prototype doesn't matter — Jupyter is fine. In fact, it's better if the prototype is shite because then it makes it easier to make it the "one to throw away".
I once watched a data scientist copy code from a notebook into an email and send it to us, right before going to Hawaii for two weeks so we could productize it. This was after advocating for version control for the project for months...
Yup, between analytics engineers and ML engineers, it seems data scientists are the weird middle we no longer need.
I have worked with "data scientists" that are more rebranded statisticians. Some are overpaid SAS users, but some are true statistical wizards who can also program (R/Python/SQL), and when you need them they're great. I don't need them to write production code. I'd rather call them statisticians but I'm glad to see them getting paid better!