(1) If you're going to hop on the 'hot' AI/ML/DS bandwagon, make sure that you actually have 'work' for the highly coveted person you are trying to hire. 'Everyone else is doing it', 'all our clients have data, so selling them data-sciency projects shouldn't be a problem', and 'hire a first DS person to have them start a brand new DS division' are ideas that are probably doomed to fail.
If you weren't selling classic data analysis/intelligence services before, chances are you lack the foundation in your sales and support staff, and maybe even your customer type, to embed this into your company and into a new line of business. The risk that your prized new employee will leave for greener pastures to work with peers on exciting, real projects is high.
(2) Know where you are heading. If you just want to embellish your 'normal' projects with some gimmicks from the online 'Cognitive Services' APIs offered by the various vendors, then yes, sending some of your regular staff on a quick 'boot-camp' style course will probably work. You don't need a 'data scientist' for that. And if you hire a real one for that type of work, he'll probably walk within 2 weeks.
>(1) If you're going to hop on the 'hot' AI/ML/DS bandwagon, make sure that you actually have 'work' for the highly coveted person you are trying to hire. 'Everyone else is doing it', 'all our clients have data, so selling them data-sciency projects shouldn't be a problem', and 'hire a first DS person to have them start a brand new DS division' are ideas that are probably doomed to fail.
I'd add that if you fit all these conditions and still want some hope of getting value from your data scientists, give your data scientist(s) a period of time (6-12 months) in which they can discover how to provide business value. Accept that during that period they may not provide the value you are hoping for. Support them if they recommend changes (e.g. different ways of collecting or processing data).
Lots of middle/upper managers want to hire a team of Data Scientists and want them to deliver high business value from day one. They want a magic black box to solve their business problems. This results in exactly what you describe: constant fires, games, gimmicks (that look good on PowerPoint), and good people eventually quitting or getting fired.
On the other hand, if you give them 6-12 months to interact with different teams and identify relevant business problems, they'll find ways to provide value. Or, worst case scenario, they'll tell you (maybe by all quitting) that your problems aren't data science problems and the money you are spending on them is best invested elsewhere.
> if you give them 6-12 months to interact with different teams and identify relevant business problems...
I fully agree with your points, BUT 6-12 months of self-directed, open-ended discovery just ain't gonna happen. This isn't the era of Bell Labs; this is the era of PMP-certified project managers. In many places the "data scientist" will be dropped into a project and expected to start producing "deliverables", and outcomes will be a mixed bag.
A more organic approach would be for teams to start picking up tools and skills for basic data analysis. We used to call it "stats". It doesn't have to be cutting-edge ML stuff to provide value and new opportunities for the practitioners. Of course, whatever they start doing, HR and management will pump it up: mining a few hundred megs of CSV files all of a sudden will get called "big data".
Something similar happened ~10 years ago with the advent of "BI" (Business Intelligence). It was nothing more than folks realizing that their corporate databases had useful stuff in them and that they could query them in an ad-hoc fashion _outside_ of the applications that typically use them to provide "business value".
>A more organic approach would be for teams to start picking up tools and skills for basic data analysis. We used to call it "stats". It doesn't have to be cutting-edge ML stuff to provide value and new opportunities for the practitioners. Of course, whatever they start doing, HR and management will pump it up: mining a few hundred megs of CSV files all of a sudden will get called "big data".
Here's my one caveat to this based on personal experience. This happens to be why an entire data science team in one division of a large company quit over the course of a year.
If middle/upper management is purely focused on "show value on day one", it's very easy to get stuck in a position where basic stats and building dashboards for simple data mining become the gold standard for what a Data Science team is for. Management develops an addiction to fast data mining projects that look good in PowerPoint but don't actually deliver much business value, because nobody is measuring business impact beyond how impressive it sounds. When the Data Science team tries to focus on projects that require more work but deliver more value, they're shot down because they don't fit into the "monthly management update" mentality.
You need a good management structure that realizes that quick data mining exercises are a stepping stone to delivering longer term value. They are not the ultimate goal.
> This isn't the era of Bell Labs; this is the era of PMP-certified project managers. In many places the "data scientist" will be dropped into a project and expected to start producing "deliverables", and outcomes will be a mixed bag.
I have been on the receiving end of that argument a lot. Given the efficacy of well-thought-out ML, I suspect that competition will make those companies less durable than they seem to expect.
Sage advice. There is a significant misconception about what a data scientist does, and many roles don’t seem to truly bring out their potential. Most get grouped into doing everything around “data”.
My company opened up a load of AI/ML roles with absolutely no pending projects for them. CTO literally said once they are hired, they will find opportunities.
Agreed. From what I can tell, the data scientists at my company are pretty much running a few SQL queries with simple stats. They also seem to have high turnover. It looks like they got hired mainly because the company wanted "data science" without knowing what it means.
As someone who has been employed in this space recently: No, there is not.
There are, however, a lot of middle managers who have heard the term and want to pay crud VBA developer wages for people to make pretty graphs in Excel.
The credential inflation is also getting ridiculous. I have an MSc and I don't see how me being able to do quantum chromodynamics at one point in my mid 20s adds value to a business ten years later.
That's what I am thinking. Any half decent software engineer could probably produce the results they need from a few SQL queries. But "data science" is hot at the moment.
Comments like this reinforce the idea that data science is too broad a term. Everything data-related is thrown into the "data science" bucket.
You describe what I'd call business intelligence or even advanced analytics. SQL jockeys (not a pejorative) really slicing into the data and presenting it in meaningful ways. There usually exists some science behind these exercises.
Then there's machine learning, which also gets clubbed into the data science camp. Try capturing a convolutional neural network in SQL - you might go bald from pulling your hair out. It's totally not the tool for the job.
While perhaps any half-decent software engineer could maneuver his/her way around the great ML libraries out there to produce a CNN (or something of the like), I would hesitate to give them a project that involves true science (bring out your pitchforks). Experimental design, hypothesis testing, etc. While being a SQL expert could help turnaround time and enable you to test more interesting hypotheses, I wouldn't leave most DBAs to develop and carry out an experiment.
So I guess my point is "data science" is too broad and overused a term to be meaningful.
True, but many people jump on the hype wagons and assume that they need machine learning, or whatever the flavor of the day is, when in fact they need something far simpler.
- What some small to mid-sized companies call data science is actually software engineering / data engineering
- Smaller companies hiring for data science might actually need software engineering as well, if no data pipelines or collected datasets are available (they are using the term wrongly here)
- Most companies calling for AI/deep learning don't actually meet the requirements for it.
- You don't need a data scientist until you have SWEs / data engineers
- Machine learning is for narrower, well-scoped problems and datasets, such as text classification and analysis, or identifying which digit is handwritten on a piece of paper using MNIST, etc.
Basically, "data science" could mean a lot of things and is pretty ambiguous. It's the new software engineer vs. web developer terminology.
That really depends. I'm working sometimes on problems that physician researchers have been cranking away on for decades. We want to do better than their state of the art. That's not something you can throw a software engineer at. There is a pretty wide spread of problem difficulty out there.
> Not every labor shortage problem is solved by adjusting compensation levels upwards.
Quite literally all shortages are solved by rising prices. After all, the formal definition of a shortage is "a situation where an external mechanism, such as government intervention, prevents price from rising". If the price can rise, then a shortage cannot exist. Which is the case here. If you have the budget for $1m/year compensation, it's yours to offer.
Think about it this way: If there was only one person in the world capable of being a data scientist, each company in need of a data scientist would increase their offers to try and attract this person to their company. But no company in existence has an unlimited budget. Eventually all companies will be priced out of the market except for the company who ultimately employs this person. Since the other companies stop looking for data scientists beyond a certain price, the one position that remains in the market is perfectly filled with the one person available to do such work.
To put it another way: No matter how much you want one, there isn't a shortage of Ferraris just because you cannot afford one. You simply aren't in the market for one in the first place. Everyone who is in the market for a Ferrari can have one. Likewise, everyone who is in the market for a data scientist can have one.
The mid-level exec with a data science poster on his wall, but a small budget for such work, is no more in the market for a data scientist than a teenager with a Ferrari poster on his wall is in the market for a Ferrari. That's just hopes and dreams. They don't count as anything more.
Try to replace ferraris with food and explain that during a famine, most people are simply not in the market for eating.
Your definition is a circular piece of logic designed to justify an ideology. The word 'shortage' has been in use for a long time. Creating a definition that disqualifies its use is simply disingenuous.
Appealing to emotion doesn't work... for either of you. "Solving the problem" is not well defined. Although it was a pleasure to read this exercise in persuasion.
I simply did not use the words "solving the problem". Also showing an example that is not carefully picked to fit within the offered definition is not appealing to emotion, it's simply valid feedback.
> Try to replace ferraris with food and explain that during a famine, most people are simply not in the market for eating.
Makes sense to me. Famine suggests that someone goes hungry, so obviously not everyone is in the market. Either it comes down to those who can afford food getting what food is available, in which case everyone else is not in the market to buy food, or you resort to capping the price of food and rely on something like a lottery system, or first come, first served, to determine who ultimately gets the food. Only the latter would be described as a shortage of food.
> Can we assume that starved people are not in the market for living?
I'm not sure that living has a market, but okay, sure. If living is something you can buy, it is quite possible that not everyone will have the desire and willingness to be a buyer.
> Would submarines never be short of oxygen if we raised the oxygen price?
If you are trying to use short in an economic context to stay consistent with shortage (which, admittedly, makes the phrasing awkward), and not using definitions of short that have no relation to shortage (which would make the question pretty strange in the context of this discussion), then I am not sure why you would want to sell your existing oxygen only to buy it back after the price has gone up. But I guess you can short your oxygen if you wish. Why not?
> and not using definitions of short that have no relation to shortage
"there's a shortage of oxygen" fits perfectly well. My point here is that your definition is purposefully narrowing the problem in order to force one solution.
Let's come back to the shortage of food. Several ways to tackle it that didn't rely on free-floating prices have been applied in the past:
* rationing the limited supply
* increasing production
etc.
There is really no reason to stop at "these people can't afford to participate in the food market, let them starve". However, your definition of "shortage" tells us that the only course of action is precisely that.
I'll just go on with my tirade a little bit further, and tackle the "increase production" point. Some people will say that by raising the prices, rational economic agents will enter the market to produce and sell more. This is yet another hint that the definition is not there to define a situation, but to hint at a larger scheme of economic agency. This is precisely why the definition is ideological; it is not meant to define a problem, but to introduce someone's opinion on how to solve that problem.
By the way, shortage can be defined as the lack of something needed. This definition does not need any further ideological body to make sense; it covers every situation we have touched on here, and doesn't fragment the meaning of shortage into specific situations.
> "there's a shortage of oxygen" fits perfectly well.
It certainly can fit. Context is important though. If you are twenty thousand leagues under the sea and running out of oxygen, an external mechanism (being isolated from people with oxygen to sell) is preventing price from rising. There is nobody to offer a higher price to. You could have all the money in the world and it still isn't going to buy you oxygen. This is indeed a shortage, by very definition.
But this is not the same as being unable to afford oxygen. If your submarine is sitting in harbour, with oxygen vendors by your side, and you would be wise to have more oxygen before you use the sub again, but don't have the budget to buy it, then that does not mean there is a shortage of oxygen. It simply means you are not in the market to buy oxygen.
> My point here is that your definition
Let's be clear here, it is not my definition. This is the formal definition of the word.
> By the way, shortage can be defined as the lack of something needed.
This is what a shortage is, yes. Price is how we determine need. The one willing to pay the highest price is the one who is in the most need. Those without the desire or willingness to pay a given price are indicating that they do not have the need at that price point, and thus are no longer counted as part of the market. Only when everyone is prevented from offering more money to get what they need do we say it is shortage.
Everything that carries a price is lacking to some degree. That is why it has a price: To determine who gets it and who doesn't. The higher the price, the more something is lacking. A hypothetical post-scarcity world would mean that things like food would be given away for free. But that's not the world we live in. There is only so much food to go around, and price determines who gets it, and how much of it they get. Only when an external mechanism stops the price of food from rising would you call it a shortage, however.
If shortage simply meant "lacking something needed", everything that has a price would be in a shortage situation, and the word would end up being quite meaningless. After all, if there were no limits to availability for a given good or service, it would be free. There is a good reason why the definition of shortage is more nuanced.
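A toy model can make the price-ceiling argument above concrete. This is a minimal sketch, not anything from the thread: the linear demand and supply curves and their coefficients `A`, `B`, `C` are hypothetical, chosen only so the numbers come out round, and `market_outcome` is an illustrative helper name.

```python
from typing import Optional, Tuple

# Hypothetical linear curves (illustrative numbers, not real market data):
#   quantity demanded = A - B * price
#   quantity supplied = C * price
A, B, C = 100.0, 2.0, 3.0


def market_outcome(price_cap: Optional[float] = None) -> Tuple[float, float, float]:
    """Return (price, quantity traded, shortage) for the toy market.

    Without a cap, the price settles where demand equals supply:
    A - B * p = C * p  =>  p* = A / (B + C).
    A cap below p* stops the price from rising that far.
    """
    p_star = A / (B + C)
    price = p_star if price_cap is None else min(p_star, price_cap)
    demanded = A - B * price
    supplied = C * price
    traded = min(demanded, supplied)
    # Shortage = demand left unmet at the going price; zero when price reaches p*.
    shortage = max(0.0, demanded - supplied)
    return price, traded, shortage


print(market_outcome())                # free price: (20.0, 60.0, 0.0), no shortage
print(market_outcome(price_cap=10.0))  # capped price: (10.0, 30.0, 50.0), demand exceeds supply
```

With the price free to rise, buyers priced out simply drop out and demand equals supply; unmet demand (a shortage in the sense argued above) only appears once the cap binds. A model this simple ignores the objections raised elsewhere in the thread, such as non-fungible talent and relocation preferences.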
Why is that, economically speaking? If there's a labor shortage, doesn't that mean it's undervalued in the first place? Forgive me if that sounds naive, I'm no economist.
Because it makes the assumption that the system has no constraints, which is often untrue.
First of all, you have to ask yourself why there is a labor shortage. If the underlying reason is insufficient pay, then sure, increasing pay will fix it. This is often the case for fungible talent -- incentivize something enough, and people will shift resources to it. However in many situations, certain types of work are not fungible.
There are natural barriers to entry and qualification issues. Surgeons, for instance. You can incentivize and compensate all you want, but the fact is, not everyone is cut out to be a surgeon, so you have a funneling effect. It's not even a matter of pay.
Then there are desirability issues. Deep sea welding is a highly specialized (and dangerous) trade that pays handsomely, but not everyone wants to do it.
Some physically demanding jobs also have natural attrition issues in spite of compensation. When the inflow of talent is smaller than outflow over a long period of time, a shortage results.
There are also training and timing issues. Let's say it's 2011 and you want someone who can build the kind of infrastructure that powers Netflix, within 6-12 months. The talent and experience pipeline would still have been brewing at the time, so there would be a temporary shortage of talent and experience until folks gained experience and matured in the field. In that instance, you could still hire and grow personnel over time, but there would still be a shortage of existing talent.
Talent pipelines take time to build, and are highly dependent on the talent pools available and the ecosystems around them. Just throwing money at the problem doesn't always work. Compensation is just one end of it (the opportunity end); there are also significant long-term investments needed on the other end (the cradle end, which includes education, development, etc.). Texas Instruments did this -- they funded what eventually became UT Dallas... but it took many, many years and the outcome was uncertain.
There are geographic issues. To use an extremely unlikely example, let's say you wanted someone at the level of Jeff Dean or Sanjay Ghemawat, but you would need them to relocate to Podunk, Iowa. Very few people at that level would want to relocate for any amount of pay, hence a shortage. I realize this is a pretty extreme example, but I worked at a company in an undesirable part of the country and it was difficult to get truly talented folks to move out there even with significant premiums on compensation. Now you can keep increasing compensation until you get someone who's willing to move, but they're usually not the kind of talent you were looking for in the first place. Also, there's a break-even point at which the compensation doesn't make sense for the value the position is likely to generate, so companies will just not hire.
My point is there are all kinds of real and complex reasons why increasing compensation alone will not solve all shortage problems.
Thanks for the great rundown of some factors! I would like to take issue with just one part:
> Very few people at that level would want to relocate for any amount of pay, hence a shortage.
I'm not sure if you're using hyperbole or not. But I'm guessing that if you paid $1 million USD / year, there would be no practical shortage of data scientists willing to live in Podunk, Iowa.
There is a cost-benefit proposition to every hire.
Unfortunately, the issue there is that the majority of data scientists don't generate $1 mil of value (exceptions exist of course), and so it's difficult to justify that level of compensation to management, which means the position may never get created in the first place.
Absent an existential threat to their bottom lines, companies will just muddle on without hiring.
This may be different in high-growth companies, but in most traditional companies that aren't sloshing around in VC cash, to increase headcount you need to provide a value justification vis-a-vis salary (unless it's for cost center positions, but even then...)
There's also an underlying assumption that talent flocks to the highest bidder... but most HR folks will tell you that it's more complex than that. There are many quality-of-life issues that come into play, like weather, peer group, spousal happiness, etc. Money doesn't buy everything, and humans aren't optimizers but satisficers.
I'm a sample of one, but for me, I would absolutely not move to Podunk, IA for a $1mil salary. I'm happy taking a lower salary living in a city and intellectual milieu that feeds me. The kind of person who would take the $1mil may likely not be the kind of talent you want.
It does not matter how much you pay, there are only so many 'qualified' data scientists to go around. If the wages are so ridiculously high, then there will be all manner of hucksters pitching themselves as data scientists with minimal training (but good enough to pass an interview), and they will give data scientists a bad name. This will make paying high-wages a super risky move, which almost never pays off, putting downward pressure on demand and wages.
Well, not according to the people who constantly talk about supply and demand. If there's a massive supply shortage as implied, there ought to be a corresponding increase in compensation, no?
The market for serious data scientists is growing, but is not as dire as these articles make it seem.
The private sector has wrapped 'senior data analyst', 'BI developer', 'database administrator', and 'data engineer' into 'data scientist', along with the ML/AI/stats roles you would expect.
So sure, when you put all the roles that touch data into one basket, and we're in a world that demands increasing familiarity with data systems and data analysis techniques... a gap is going to appear. Color me shocked.
The typical data scientist role I see posted is looking for:
- SQL
- AWS/Azure
- Tableau/Power BI/Other BI tools
- Python/R
- Increasingly Hadoop/Spark
With Java/Scala/C#/C++ listed as 'nice to haves' along with 'machine learning' and 'big data'. These are not research roles, and while they often want a masters degree or more they really don't need them.
If you fit that bill you can probably get a role paying $95-120k just about anywhere in the US. You'll wrangle data, make pipelines, build dashboards and occasionally deploy a model or two when the planets align.
If you want to/are capable of doing more ground breaking/cutting edge work, these roles will crush your soul in 8-12 months.
>If you fit that bill you can probably get a role paying $95-120k just about anywhere in the US. You'll wrangle data, make pipelines, build dashboards and occasionally deploy a model or two when the planets align.
If you expect to pay that little to anyone who knows the above to any degree you will have a bad time.
A number of projects I have joined absolutely look like someone treated them as a way to learn the technology before moving onto better paying jobs, leaving the business with a terrible mess that's largely unfixable without a rewrite. You're better off hiring one developer that knows what they are doing at above market rates than three who don't for the same price.
Outside the Bay Area, $95-120k is about the going rate for someone with those skills. Add a little for the bigger cities, and maybe 10k for every 3-5 years of experience.
>If you expect to pay that little to anyone who knows the above to any degree you will have a bad time.
I believe you misunderstood his point. The $95-120K figure is for people who do not do ground breaking research. Anecdotally, I know people who get hired for data science who are doing very little data sciencey stuff (a bit of data manipulation/slicing and lots of plots - almost no actual ML). These people should not get paid more than a typical SW developer, and decent SW developers should get paid more.
First, I don't think he meant they expect you to know all of these - just a subset.
Second, most people I know who know R or Pandas or other Python numerical stuff (NumPy/SciPy) are horrible programmers. They're good at numerical work - but that doesn't translate to good coding skills. The bar is not that high to know it.
It's somewhat similar with AWS/Hadoop. Most people I know have learned these (online training, in-person classes, etc.) with very basic SW skills.
I don't doubt expert programmers exist who know these technologies - I'm just pointing out that it's not that hard to learn some of these. Plenty of people learn these without knowing much SW. To be a good SW developer takes a lot more.
>If you expect that on a 120k salary, you will get what you pay for, i.e. a code monkey living on peanuts.
You must be in the Bay Area. A 120K salary is above the median for SW developers for most of the country, and grants them a great lifestyle: Big house, two cars, vacation travel, etc. Hardly "living on peanuts".
The GP was clear he was not talking about the Bay area:
>If you fit that bill you can probably get a role paying $95-120k just about anywhere in the US.
Finally, the point is: If you studied actual Machine Learning, he is pointing out that these jobs are mostly not that, and you will just do a boring job.
>You must be in the Bay Area. A 120K salary is above the median for SW developers for most of the country, and grants them a great lifestyle: Big house, two cars, vacation travel, etc. Hardly "living on peanuts".
I'm not in the US currently. I have worked with teams in New York in finance and the wages expected were multiples of 120k. Likewise in London and Hong Kong.
Yeah, outliers. In the rest of the US, where the cost of living is radically lower, 120k is a reasonable expectation. It's actually a bit lower for subsets of those skills here in Columbus, OH, mostly with large corps like Chase, Cardinal Health, etc.
But you only have a $1,400 mortgage for a three-bedroom home here, so it's all relative.
When someone says "most", they mean the outliers are not included. When one says "most Americans can't point out Egypt on the map", it's rather pointless to give examples of PhD's who can.
That made me wonder, though, in this particular context, are these really outliers?
Even just counting population, the Bay Area's CSA is 8.8M, NYC is 23.9M, Boston is 8.2M and LA is 18.9M. That's already over 18% of the population, just in the major urban centers of California and the Northeast.
It's not much of a stretch for the percentage of tech jobs to be in those areas, and less of a stretch to imagine that the vast majority would be in some kind of high-cost area.
I wasn't able to come up with suitable web search terms to find any numbers, but perhaps someone else has.
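The population share quoted above can be checked with a little arithmetic. This is a sketch that takes the commenter's CSA figures as given and assumes a US population of roughly 327 million (the 2018 Census estimate); neither figure is independently verified here:

```latex
\[
\underbrace{8.8}_{\text{Bay Area}} + \underbrace{23.9}_{\text{NYC}} + \underbrace{8.2}_{\text{Boston}} + \underbrace{18.9}_{\text{LA}} = 59.8\ \text{million},
\qquad
\frac{59.8}{327} \approx 0.183 \approx 18.3\%
\]
```

That is consistent with "over 18%", though, as noted later in the thread, population is only a loose proxy for the number of tech jobs.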
Throw in Seattle and there are no tech jobs outside those areas to a first approximation.
I don't know where the people above are getting hired, but the only places I've ever seen outside those major cities are incidental programming jobs to babysit VBA code that runs some business that has nothing to do with computers.
Denver, Austin, Houston, North Carolina, Chicago, Atlanta, Salt Lake City, Boise to name a few. Granted, some of these have recently become pricey, but still nowhere near Seattle levels.
Look at all big companies (and not just the recent Internet ones of the last 20 years), and look at where they have sites. Jobs in most of these places.
> Look at all big companies (and not just the recent Internet ones of the last 20 years), and look at where they have sites. Jobs in most of these places.
But how many of those jobs are technical/programming (in those places)?
Even by population alone (Chicago, Houston, and Atlanta making up the bulk), that's under 10.7%, less than CA+NE (and only half of CA+NE+Seattle?). That persuades me that these lower-cost areas are the exception, not the rule.
Just looking at top 50, you've got a real geographic spread - about half of the people are in CA/NE, the rest are split pretty evenly between the four quadrants...
> Not sure why you'd compare Houston to all of California?
I believe everyone in this sub-thread has used "CA" to mean the SF Bay Area plus greater Los Angeles (and certainly I have). Similarly, "NE" was just shorthand for the big cities therein, not the whole region.
Ultimately, population is just a very loose proxy for number of tech jobs.
I feel like I hear about just as many people getting degrees in "AI" nowadays as plain CS degrees. A much higher percentage than five years ago. It feels like everyone and their grandma is minoring or majoring in AI or machine learning. And when I talk with companies about it, they're lukewarm at best.
I really wouldn't bet my personal career at this point on going into this area. It'll be filled to the brim sooner than you expect. It's a much better, and just as intellectually satisfying, bet to just go into plain traditional CS in my opinion.
(A counterargument is that I felt about the same sentiment around three years ago, but my intuition doesn't appear to have panned out, so maybe I have poor intuition, or I'm missing some piece of the puzzle.)
I've heard it said, "A data scientist is a better statistician than a typical programmer and a better programmer than a typical statistician."
I 'graduated' from the Data Analyst Nanodegree @ Udacity in my spare time, and have experience with both. I lack a background in math beyond university Calculus, though, so that seems the major barrier for me to get a foothold in the industry.
It depends on how companies staff their data science teams. I’m comfortable with a lead data scientist that oversees “less math” data analysts who build the models.
If you get the high level concepts of stats and linear algebra, it should be enough that someone else with a PhD can help fill the knowledge gap.
The other way to phrase that is "A data scientist is a worse programmer than a typical programmer and a worse statistician than a typical statistician." If you need a programmer, hire a programmer. If you need a statistician, hire a statistician. I wouldn't ever hire someone who took a bootcamp that is "a better plumber than a typical doctor and a better doctor than a typical plumber."
I’d have a hard time finding a job in ML for $300K, but I’d have an easy time finding a job in PHP contracting for $350K. This is despite having 10 years of experience as an ML practitioner.
I quit my last job after bringing in over $100M in profit and only getting a $30K bonus.
Now it would take ~$500K and an ‘eat what you kill model’ for me to consider going back to work.
These days I make my money from licensing ML products which gives me a lot of time to study - which is what I enjoy doing.
Initially one of those elite consulting marketplace things. And then my own pool of referrals. I hate PHP so I set my rate higher on those projects and people still pay it.
Care to share a link to your licensed ML product, for those of us who're curious? Got any tips for people who are playing with the idea of doing something similar?
That's literally impossible. What level did you get hired at? I'm assuming at least 5 given your background/exp? That's more than 300k EZ unless you're talking about salary only.
Can anyone with industry hiring knowledge comment on this gap?
Maybe I’ve misunderstood but from HN / other online discussions it seems that employers in Data Science have been increasing education requirements to MSc, PhD, and so forth for many jobs that more appropriately require a strong statistical understanding combined with rigorous data processing skills and some imagination. Is this a real “skill gap” or an attempt to create artificial scarcity of hot jobs?
I think we are still in the wild west of the data science field, at least in the Midwestern U.S. I have seen job postings for data scientists ranging from simply rebranded data analyst roles to postings looking for advanced math and programming ability.
I am skeptical that bootcamps and online courses will set individuals on long-term data science careers. I think there will be a large demand for individuals who understand basic data science principles, such as the difference between supervised and unsupervised learning problems. I think these people will be given subscriptions to services that automate a lot of the actual data science workflow, using programs like DataRobot. However, I also think these individuals will be paid closer to what a data analyst earns than to what a software engineer gets.
I think that certain companies will always be looking for people who can actually do traditional data science work from scratch because the automated tools will not apply to their specific use cases. I think these individuals will probably generally have a master's or PhD.
I think right now the field is too new for managers to be able to tell which kind of data scientist they need. My experience has been that a lot of companies need the former, but they think they need (or they want) the latter. I suspect in the coming decade these two roles will find different titles.
>I have seen job postings for data scientists ranging from simply rebranded data analyst roles to postings looking for advanced math and programming ability.
Given that "data science" is such a nebulous term, what is wrong with such a posting? I'd say that a data analyst with some advanced math and programming ability is a pretty rare skillset. Is it any surprise such a person is in high demand?
When looking for data science jobs it's nice to know from the prospective hire side that your skills will be used well. Coming into a job only to find out you're way overqualified can be a downer, particularly in a field where you need to keep your teeth sharp. I think the same goes for programming - imagine finding out on your first day you'll just be doing user testing and not writing any code.
IMO, as companies' data science capabilities mature, people with actual skills are needed.
5 years ago DS was hot and new. As a middle manager you would call up a consulting meat farm and ask for data science consultants: presto, you have a data science team. These people would, likely, be random BScs with a SAS Base certificate.
In 2018 data science, and data driven decision making, is still hot, but hiring managers know they need people with a formal education. So, education reqs go up and scarcity increases.
>but hiring managers know they need people with a formal education
It's the exact opposite: as better tools are developed, there is less "formal education" required. In the same way that Web Devs are in high demand, and nobody needs a Computer Science degree to do that work.
The reality is, "data driven decision making" is something businesses large and small are discovering they need. And oftentimes that sort of analysis requires a regression, or maybe something slightly more sophisticated. What these businesses do not need is some cutting-edge algorithm or complex AI pipeline.
There is a ton of gatekeeping in this space, as the PhDs learn that domain experience and business acumen are more important than a deep knowledge of the algorithms. A significant amount of the work is cleaning and processing the data. This is the way of the world. At the end of the day, most business problems boil down to inference and prediction, and most times you don't need to be incredibly precise; the easy 80% will do.
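The "regression, or maybe something slightly more sophisticated" mentioned above often means something as plain as ordinary least squares. A minimal pure-Python sketch, assuming a single predictor; the function names and the ad-spend/sales numbers are hypothetical, and real work would more likely use a statistics library:

```python
def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y has no variance")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical weekly ad spend (k$) vs. sales (k units).
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [2.1, 3.9, 6.2, 7.8, 10.1]
    a, b = fit_ols(spend, sales)
    print(f"sales ~ {a:.2f} + {b:.2f} * spend, R^2 = {r_squared(spend, sales, a, b):.3f}")
```

That is the "easy 80%" in practice: a slope, an intercept, and one number saying how much of the variation the line explains, which is often enough to inform a business decision.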
Having looked through the contents of a couple of data science MOOCs, I've noticed they often try to give intuition without math.
When trying to put it into practice, if we go the slightest distance off-piste, we are lost without the math. So it's a fragile form of knowledge.
If we take someone who's already strong at math from some other training and teach them more stats/ML concepts and intuition, then great. But the value added by the MOOC learning is quite low in that scenario.
The best alternatives (barring a formal course), IMO, are textbooks with exercises, like Bayesian Data Analysis by Gelman.
There is no gap, actually. It is plain old supply vs demand. Second- and third-class firms want the few Einsteins out there for peanuts and get flooded with CVs from Kaggle noobs and MOOC kids. The point is, the latter are good enough for the jobs on offer, but the former feel somewhat entitled to be choosy and stingy, even if they have no real business case to pursue.
I'm on a hiring committee for a job with the title of "Machine Learning Engineer". The most qualified candidates we have come through the process are physics PhDs who did a significant amount of coding during their PhD and have been able to segue into machine learning because they were interested.
Our other tier of hireable candidates have been individuals with 4+ years of industry experience, usually with CS PhDs with a machine learning specialty.
The rest of our team came from internal transfers, people who were less qualified but proved that they would have a positive impact on the output of the team.
I'm a physicist, and got my degree in the early 90s, but maybe I can shed some light on this.
First, physics has been computation driven since the 1940s. When I was in grad school, programming was vital to my project. I wrote mountains of code, and also designed my own electronics.
Second, I noticed a weird difference between physics and engineering. Engineering students were given problems that were expected to be solved within a particular domain of engineering. A physics student might be given a problem with no idea of how to solve it, much less even how to define the problem itself clearly. That was my project. So a physics student could find themselves having to learn practically any technical skill.
Third, a matter of motivation. We knew that we would need to make ourselves employable. My project would occasionally fail in some spectacular way that would require me to learn more programming, or more electronics. Imagine that. ;-)
Fourth, possibly also as a matter of employability, math and physics people have always figured out how to worm our way into ill-defined, nascent areas of technology. These are areas that have not yet created a mainstream training pipeline, so we can plausibly claim to be as well trained as anybody. "Embedded systems" was such a field when I was finishing school.
Adding onto this, every physics PhD I've talked to has really impressed me with how much problem-solving they tackled during their stint in academia. You really said it succinctly with "a physics student could find themselves having to learn practically any technical skill".
Running and extending numerical simulations with supercomputer clusters is just something you have to learn to do in order to solve your problem in some cases, apparently.
Their ability to code is usually sub-par, but they ask the right questions about the data, which is critical in the data setting that we work in.
We don't work in advertising / marketing / business analytics, which is very easy to have an intuition about; and we don't strictly work with images, which, again, isn't terribly difficult to have an intuition about. So a strong scientific background is actually a major plus for us.
I could see that if we were doing purely deep learning image classification or advertising prediction then the physics degrees would be less useful, but thankfully we don't.
Probably the wrong fit if the industry experience isn't right, in which case they would be on a fast-track to a senior dev role in the non machine-learning part of our company.
You're right about "research", although I'd hesitate to use the word hardcore hah. We get tons of candidates with ~2 years of slightly relevant work, but rarely do we get candidates with the exactly right relevant work. The "qualified" candidates I spoke of have ~4 years of slightly relevant work, but there are only a couple of them, and they're either out of our hiring budget or currently working for our team.
In my experience, data science is turning massive amounts of potentially unstructured data into some sort of insights for business. I guess depending on the volume and structure of the data, and the level of insight required, you may need to be more or less qualified to do the job. A bachelor's degree would be fine for most of the roles I’ve seen in places I’ve worked, but I’ve never worked anywhere that had Google-level demands.
I think getting insights out of massive amounts of unstructured data is a relatively new problem for most organisations, so that might explain the skill gap. Personally I also think the ubiquity of ORMs might also play a role, as they remove a lot of the day-to-day data handling that many engineers would otherwise have to do. But I might be completely wrong about that.
Having been the hiring manager for that kind of profile for more than ten years, I find the problem rather simple: companies want data scientists, but they don’t have good data. The positions are still open, but no one well-informed would apply.
Managers want to understand what’s happening in their company; they’ve been told data scientists can make that happen, or better, fix the problems automatically. They expect the solution to come without significant investment in data collection. One, even over-paid, data scientist is a far smaller budget than having your dozens of teams make an effort in fixing data collection, and setting up a reporting team. That kind of problem is easy to detect during the interview, if you’ve been there, and any data scientist has. What’s left are fresh graduates without a diploma, who were taught statistics but not how to get a project running, so they don’t know what to say during an interview.
In more detail:
Too much focus on stats. The skillset the job actually requires is not that demanding on the statistical side, yet interviews generally test that exclusively. It typically takes this form: whoever has any statistical background will grill candidates on what they know best, their own MSc thesis. The thing is, no one can know the minutiae of every model, certainly not a student’s interpretation of them, so those interviews end up with a lot of “I would have to go back to the documentation.” Unless your interviewer is mature enough to either know that’s reasonable or stop doing that, it’s a lost cause. At best, you end up hiring a pile of people with the same focus, and every problem will look like a nail to a group of people carrying hammers. One alternative is to ask about something they did recently, but that’s rarely very instructive, either because the interviewer has no idea whether it makes sense, or because the implementation has so many particulars that the whole process becomes tedious.
The focus should be the ability to learn new techniques at speed, because the vast majority of the time, the right model is one of these:
a. a basic version of a better model, just to get things started (fix the data collection, the engineering of the input to the serving model, think about scaling the serving model, improve the actions taken from the decision, improve the decision format and the actual model objective);
b. a model that you haven’t heard of, which you need to learn on a timeline dictated by how quickly (a) turns out not to be sufficient.
Be more project-focused. Descriptions like “codes and does statistics better than either” fail to mention that, the bulk of the time, a data scientist is actually more of a project manager: identifying blockers, getting the integration of their tool prioritised, comparing the impact and risk of different approaches. There’s also a big chunk (if not all of it) of being an analyst, generally to clarify what problem is being asked, or to check that the data is consistent. That’s if all is reasonably good: far too often, the time is spent trying to get management to grasp that machine learning can only predict something, and that what to do with the prediction (what to expose to users, how, and when) is more the remit of the product team. In other words, the real problem is that you don’t have a direction. Many large companies actively select against people willing to say that the king is naked.
Some companies test for “business sensitivity”, but that too often means checking that aspiring data scientists can walk through a P&L. Reality is closer to: can you listen to a project manager, frustratingly oblivious to his own contradictions and general ass-holery, rant for 45 minutes without strangling him? Because you need the support of his team to get through.
Fight for good data. I have yet to see a company that had data you could use for data science on day 1. The vast majority of the time, it was inconsistent. Not a little bit, or hard to find: big, glaring, obvious contradictions like 20% of the revenue unaccounted for. If there are analysts, they have generally found a way around it rather than addressed the issue. In more mature companies, there were problems that would make basic models stumble and more nuanced ones reach very stupid conclusions. The classic trip-up on data that is good but not well structured: what drives retention? Completely failing to deliver your service: people will re-order within the hour, at a higher rate than in any other case. Executives pay for the servers, so they think that data quality is measured in petabytes. They get fed a continuous stream of over-analysed data, hand-filtered of any inconsistency. They have no idea how inconsistent it really is. Any improvement is seen as a cost centre rather than as a way for the company to be better informed.
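The retention trap described above is easy to reproduce. A minimal sketch on a hypothetical toy order log (customer, hours since their first order, delivered?), where three customers' first orders fail and they immediately re-order. The naive repeat-purchase rate makes failure look like the best retention driver until re-orders within an hour of a failure are treated as retries of the same purchase. The one-week and one-hour windows, the names, and the data are all assumptions for illustration:

```python
from collections import defaultdict

REPEAT_WINDOW_H = 168.0  # "came back within a week" (assumed definition)
RETRY_WINDOW_H = 1.0     # re-orders this soon after a failure count as retries

# Hypothetical toy log: (customer, hours since their first order, delivered?)
ORDERS = [
    ("A", 0.0, False), ("A", 0.5, True),
    ("B", 0.0, False), ("B", 0.3, True),
    ("C", 0.0, False), ("C", 0.8, True),
    ("D", 0.0, True), ("D", 48.0, True),
    ("E", 0.0, True), ("F", 0.0, True), ("G", 0.0, True),
]


def repeat_rates(orders, drop_retries):
    """Share of customers who ordered again within REPEAT_WINDOW_H hours,
    grouped by whether their first order was delivered or failed."""
    history = defaultdict(list)
    for customer, t, delivered in orders:
        history[customer].append((t, delivered))
    tally = {True: [0, 0], False: [0, 0]}  # first delivered? -> [repeaters, customers]
    for events in history.values():
        events.sort()
        first_t, first_ok = events[0]
        later = events[1:]
        if drop_retries:
            # A quick re-order after a failure is the same purchase, not loyalty.
            later = [(t, ok) for t, ok in later
                     if first_ok or t - first_t > RETRY_WINDOW_H]
        repeated = any(t - first_t <= REPEAT_WINDOW_H for t, _ in later)
        tally[first_ok][0] += int(repeated)
        tally[first_ok][1] += 1
    return {("delivered" if ok else "failed"): hits / total
            for ok, (hits, total) in tally.items() if total}


if __name__ == "__main__":
    print("naive:  ", repeat_rates(ORDERS, drop_retries=False))
    print("cleaned:", repeat_rates(ORDERS, drop_retries=True))
```

On this toy log the naive view reports a 100% repeat rate after a failed first order versus 25% after a delivered one; dropping the retries brings the failed group down to 0%. Nothing about the model changed, only the understanding of what a row in the data means.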
Don’t get me started on technical debt, short-term focus, insensitivity.
What data scientists need, before they even join: clear objectives; clear breakdowns to understand which effects are correlated; clear plans to address some of the concerns; pilot tests to make sure that those solutions would work and to collect some data; an audited data pipeline with clear documentation to avoid misinterpretation; and dedicated engineering to help with data logging, hosting computation, and serving models. I am willing to bet that hardly any of those tens of thousands of open positions come with most of that.
There is a small piece of good news: Agrawal et al. summarised their theory on how ML will make predictions cheaper in an accessible book, Prediction Machines. The predictable success of that book will help. Basically, it explains what ML does without ever doing any ML at all: it just looks at how it’s a linchpin technology, like search engines, digital photography, etc.
>They expect the solution to come without significant investment in data collection. One, even over-paid, data scientist is a far smaller budget than having your dozens of teams make an effort in fixing data collection, and setting up a reporting team.
This x100. I've seen so many places where the budget for "data" is zero. This includes funds for paying for new data, budget for additional people/software to support data collection/cleaning, money to provide to existing data vendors/suppliers to provide better data, support from legal to re-negotiate contracts with existing data vendors who historically have only provided aggregated summaries, etc.
I've seen situations where a Data Scientist discovered that a data pipeline was broken and that reports using that data had contained errors for years. Rather than anyone trying to fix the pipeline, the Data Scientist got in trouble.
I've had a few friends express interest in pivoting their domain knowledge + small programming knowledge into a Data Science job. Are there many success stories or guides out there for those who make such a transition? I went the formal route of getting the CS degree, so I never really thought about how you turn a non-CS degree into a CSy type job.
Most data scientists I know don't have CS degrees but instead a background in physics, biology, math, economics or statistics.
For a data science role focused around exploratory analysis, model building and idea generation (as opposed to a more engineering-focused data scientist) I'd say that a basic understanding of programming concepts and familiarity with the most essential tools (version control, Linux, Python/R) is probably enough to be accepted into a data science role. Data science degree programs are still quite novel so most companies don't expect to recruit people with such a degree, hence there are good opportunities for people with a STEM education I'd say.
I've actually seen many, many job postings looking for people with a PhD in Data Science, which is truly mind-blowing because there are only 2-3 such programs in the country and most of them just started, so none of their students are even on the job market yet.
ummm you are probably wrong.
those job postings probably want people with PhDs in Computer Science/Statistics/Applied Math that did hard things with data.
I have a degree in mechanical engineering, and two years later I'm a software engineer pivoting into a data science/data engineering role after building an NLP product for our business.
I only have my experience and don't know how repeatable it is, but I just got here by building stuff that I thought was interesting.
Right at the end of college I tried to build a connected hardware/data harvesting startup, and it failed because I was a naive idiot who didn't understand just how hard the financials of building a hardware company were (all of the money up front for manufacturing runs) and how much time I would have to spend on things that I wasn't good at or really interested in, like building out supply chain and design for manufacturing or dealing with radio emission problems.
But over that time I collected a couple of relevant awards (decently sized pitch contests, a win at an MLH hackathon I did for fun, etc.) and built working prototypes of the hardware plus a web platform to support it that people found interesting, and all of that worked as a hedge to show that I can at least build interesting things. So when I started looking, a recruiter came to me with a software engineering job at a startup that I thought was novel and interesting, and I ended up taking it.
Once I was there, I saw a problem with a lot of labor going into dealing with relations between a bunch of types of texts, so I built a preliminary thing to automate as much of that as possible using relatively basic clustering and putting a person in the loop to collect labels, so that I could come back and try something supervised later. The CEO brought on a chief data scientist to help me fix everything that I messed up and build a more scalable deployment pipeline and architecture for data science microservices, and now I work with him.
But yeah, my experience is just that I built things and tried to make sure they worked and were sane, and then just showed people those things to convince them to let me do that for other things I wanted to do. I have a couple friends who did relatively similar things with pretty similar results.
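The "relatively basic clustering and putting a person in the loop to collect labels" approach described above can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual system: it swaps in greedy single-pass ("leader") clustering over bag-of-words Jaccard similarity, asks a human to label one representative per cluster, and copies that label to the other members. The threshold, function names, and example texts are assumptions:

```python
def tokens(text):
    """Lower-cased bag of words (a deliberately crude representation)."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(texts, threshold=0.5):
    """Greedy one-pass clustering: each text joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts one."""
    leaders, clusters = [], []
    for i, text in enumerate(texts):
        toks = tokens(text)
        for leader, members in zip(leaders, clusters):
            if jaccard(toks, leader) >= threshold:
                members.append(i)
                break
        else:
            leaders.append(toks)
            clusters.append([i])
    return clusters


def propagate_labels(clusters, ask):
    """Ask a human (via the `ask` callback) to label one representative per
    cluster, then copy that label to every member of the cluster."""
    labels = {}
    for members in clusters:
        label = ask(members[0])
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    texts = [
        "invoice overdue payment",
        "Payment overdue invoice reminder",
        "shipping address change",
        "change shipping address please",
    ]
    clusters = leader_cluster(texts)
    # Stand-in for a human annotator; a real tool might prompt with input().
    fake_annotator = {0: "billing", 2: "shipping"}
    labels = propagate_labels(clusters, lambda i: fake_annotator[i])
    for i, text in enumerate(texts):
        print(labels[i], "<-", text)
```

The human only labels one text per cluster, and the resulting (text, label) pairs become the training set for the "something supervised later". In practice one would also let the annotator correct members that the crude clustering misplaced.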
Any time I see businesses complaining about shortage of skilled labor, I read it to mean they aren't willing to pay enough to attract the skilled labor they want.
The formula is simple. Businesses are in business to return as much money to investors as possible, i.e., they're all about the money. Employees, more and more, are saying, "So am I. You want yours, so I want mine." Pay more and you'll attract more talent. Simple.
In my experience, the title 'Data Scientist' has come to mean data analyst at most companies, meaning working with Excel, Tableau, and SQL. Maybe R if you're lucky.
Companies doing ML/AI will usually have a small team of Research Scientists who mostly hold PhDs, and a team of supporting ML Engineers.
It’s even worse at established companies because data scientists still only do the data plumbing and simplistic analytics tasks, but not due to anything reasonable, like the “many hats” needs of a startup, but instead because of IT dysfunction and bureaucracy.
Think it depends but it does happen. We’ve been able to carve ourselves out of IT with our own infrastructure and our data scientists specifically focus on research and analysis. The only issue is when we try to get anything to prod and we hit IT.
> “The only issue is when we try to get anything to prod and we hit IT.”
Which inevitably means the one person on the data science team who is good with linux and docker suddenly becomes the IT wizard, and their time gets sucked up by having to find ways to go around utterly stupid barriers and tactics used by IT to avoid doing work to help you.
Used to be this person at a previous company. I didn't run molecular dynamics simulations on supercomputers so I could explain to the CISO why a machine learning library isn't a virus.
Would love to get your thoughts some time. I'm building a product to make life a little bit easier (deploy a model as an API in 3 steps) but with a vision towards broader deployment use cases. Your advice would be valuable and much appreciated!
As someone who is about to be tasked with building this sort of infrastructure for a hospital system, what would be your dream architecture? I hate IT hurdles and I am hoping I can avoid building them into our infrastructure.
The number one thing is to make containers a first class deployment and provisioning artifact. As long as dev teams can control their own containers, they will be able to do what they need no matter how arbitrary or assumption-breaking.
Do not ever require dev teams to go through IT to get their chosen tools deployed to the right places or with the right resources provisioned. Never.
This is the root of all evil with infra teams: if they see themselves or their mandate as being gatekeepers of provisioned resources, then dev teams have lost, and you, as a data scientist / ML engineer, will never get your work done.
As an ML engineer, I want to define the entire runtime and development environments of any analytics artifacts or web services that I create, and to change these environments as needed, as indicated by what’s required to get the job done.
Let me define containers, hook them up in whatever CI tools are used, push and pull them from some internal container repository, and describe the configuration for the resources they need. Offer that as the contract to dev teams and then infra’s job is to maintain the underlying data center that physically supports running the containers and occasional hand holding for special exceptions, networking, secrets management, and cost tracking.
Don't know why you're getting downvoted for this - we have the same pain. We're not allowed to have a sysadmin or administer our own servers but IT is busy supporting thousands of non-computery-users. To them we're a fly in the ointment.
As someone who has been producing value in a data science/machine learning role for multiple years, it's disheartening to see comments that I may be blacklisted from positions due to "only" having a bachelor's degree.
Somewhat non-humbly, I was valedictorian at my high school, I triple-majored at a respectable Big 10 school, I actively use all 3 majors on a daily basis, in a foreign country, and sometimes in a language that is not my mother tongue (as an American).
I can't justify spending time and money on a master's degree (millennial wealth problems) where many courses would just be putting a formal, academic spin on ideas that I'm familiar with from a practical business-value-producing point of view.
Any advice on how I can effectively jump off the black-lists?
If your target is a data scientist role at Google, you're probably going to want more schooling.
But if you've been producing value in a DS/ML role for years, you have experience, which is even more rare than some of the qualifications people are listing here.
If you can say "I created an anomaly detection system using isolation forests that 5,000 clients relied on for detecting market changes", there will always be places that want your skillset: it is kind of a new field, after all.
So the credential bloat at entry-level shouldn't really be an issue for you.
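For readers unfamiliar with the isolation forests mentioned above: the idea is that anomalies are points that random axis-aligned splits isolate in few steps. A minimal pure-Python sketch of the technique for intuition only; the class and parameter names are assumptions, and a production system would more likely use a library implementation such as scikit-learn's `IsolationForest`:

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful BST search over n points;
    used to normalise depths and to credit leaves that still hold points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split identical values
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, plus the expected remaining depth."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(psi))
        self.trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        self.norm = avg_path_length(psi)
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; higher means easier to isolate."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / self.norm)


if __name__ == "__main__":
    rng = random.Random(42)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
    forest = IsolationForest().fit(normal)
    for point in [(0.0, 0.0), (8.0, 8.0)]:
        print(point, round(forest.score(point), 3))
```

Each tree is grown on a small random subsample, and the score turns the average isolation depth into a number between 0 and 1 by comparing it with the depth expected for a sample of that size. A point far from the bulk of the data, like (8, 8) here, gets isolated within a few splits and scores higher than a point in the dense centre.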
Maybe a bit (a lot?) out there: why not go for a business degree instead, e.g. EMBA?
You mention “foreign country” so I’m guessing you can have access to good curriculums without paying the US premium on education (or just go for an online course).
Pros: the time and money is not “wasted” as you actually pick up new skills; the MBA card should be enough to trump any education requirement; and you become that most desirable of hybrids: the tech/data guy who can talk business (or vice versa).
Cons: significant time (and money) investment; doesn’t help you get expert DS jobs (you’d be aiming for team/program manager, consultant, etc)
My own experience: completed an EMBA in 2017. Ranked in FT’s top 10, the program cost was around 50 k€ (it’s increased a bit since) and I was able to get 25 k€ of outside funding. The program I followed lasts 2.5 years, meaning I was able to do it while keeping my job (and having a kid) without losing my sanity or my wife. Landed my dream job just before completing the curriculum for a nice 40% pay increase (not saying the EMBA alone had that effect, far from it —but it definitely helped).
Huh. This is actually a route I hadn't considered. Thanks for pointing it out! I will definitely consider it as I continue researching my next opportunities...
EMBAs are usually part-time over 18 to 24 months, MBAs are full time over 12 to 24 months.
So an MBA naturally has a bigger (as in more in-depth) curriculum than an EMBA. It is also a significantly higher investment in terms of time and opportunity cost, since you're not getting paid during the program.
EMBAs compensate with 1) more experienced participants (so in theory you don't need the introductory classes) and 2) a lot of pre- or post-readings (e.g. my corporate law module was 12 hours in the classroom, but you were expected to have read the 400-page book, and the numerous case studies).
But the bottom line is that you don't go into as much detail as you would during a full-time program. OTOH, since EMBAs are attended by "senior" employees (managers / VPs / directors / etc), and because they're part-time, what you learn is usually directly relevant and applicable in everyday work - and you usually get to work on real-life problems (yours or your teammates) during classes.
I'm not the best placed to say whether an EMBA is considered a "lesser" MBA. They don't really fill the same niche. An EMBA is a career booster if you're say a technical manager and want to move into business or senior management. An MBA is when you haven't started working (or are still junior) and are looking for a fast track to C-level, or to work in a specific area (e.g. consulting, finance, etc.). So basically MBA vs EMBA is mainly a function of your current experience level.
I had a dream of wanting to start a company one day and was interested in a more holistic understanding of the business side of things. I thought that it would add some credibility when speaking with business types but also uncover ideas to base the company on.
Personally, for me, I found it useful but not for the reasons of knowledge. The knowledge was good but the program helped to sharpen my speaking and thinking skills. It also broadened my mind to different perspectives - that the tech world that I come from is quite different to people outside of industry.
The classes and sessions have also made me think further and deeper about business, culture and management beyond the usual.
It also made me more disciplined: focus on the business, not the product and technology. I used to waste countless hours building and researching with not much to show for it.
It’s also given me some confidence to speak to business types and connect with them at a deeper level while introducing technology to them.
You can most definitely fallback as an engineer but why are you thinking of an MBA in the first place? What’s your goal for pursuing one?
The reason I’m considering it is because I’m trying to envision myself in 10-20 years and thinking who I would be happy to be.
Right now, I don’t believe what would make me happy is to be a principal/staff engineer somewhere necessarily.
Don’t get me wrong, I love programming, but I see it as me getting paid to solve problems, and not getting paid to write good code, and I think there are other ways to solve those problems. For example, I think the biggest problems in my organization are managerial and organizational rather than technical, and I feel like the type of training that would come with an MBA can help one solve those issues, including communication, planning, product validation, people management, etc.
That said, my entire reporting chain up to and including the CEO doesn’t have an MBA, so it’s not like it’s a prerequisite.
The other path I’m considering is an MS in CS/SE because while I’ve been an engineer for a few years, my undergrad is in Mathematics, and I’m worried it’ll be a limiter later on to not have a CS degree, but also only a BS.
First off, kudos to you for looking ahead. It’s one of the things that’s really “scared” me into action. It’s funny because I went through the same thought process as you are going through now.
I was a decent enough software engineer and while I love to build things, I felt there was more.
Almost all company problems tend to be managerial and organisational rather than technical, which is a fascinating discussion in my MBA classes. You will definitely get the space and time to think about these things and your theories on how to solve them.
However, I will say that to solve those problems, you have everything you need today. Your knowledge, wisdom and experience can help guide you, but fundamentally, these problems touch on human behaviour. A great book is “How to Win Friends and Influence People”. I’m recommending it not as a way to influence anyone, but because it’s a good eye-opener to human behaviour. The MBA will help with theories, but they're not facts, of course.
I’ve also thought about an MS in CS for fears of being limited in the future as well but will respond back when I have more time. Or if you’d like we can have a chat over Skype or email.
Besides knowledge, I gained a lot of insight about myself, and also learned quite a bit about entrepreneurship (more in terms of state of mind than actual knowledge). And made a lot of connections and friends in a lot of different fields (e.g. a vet, an airline pilot, a board member in a very large corporation, a few startup founders, a tax attorney...).
Ironically, the EMBA also convinced me that I'm much happier in technical roles, and gave me the confidence to go for it. I'm currently the CTO of a startup and most of what I learned during the EMBA is of marginal usefulness to me, except the stuff about entrepreneurship. But I wouldn't be in my current job without the EMBA.
Agreed - I wouldn't be where I am today without it either.
I think the biggest misconception about an MBA is the actual textbook education but I've heard many CEO/CTOs make the same statement - helped them decide their career path and boost their confidence plus made a lot of friends across many different industries.
The GT online Master of Science in Computer Science is a good option. Many people on this site speak highly of the curriculum[1]. Last time I checked, the total for the entire curriculum, if you pass every course on the first try, was around $7000. As a bachelor's holder myself looking to break into some of these higher-salary and in-depth roles, I’m certainly considering it. It’s even better if your company will pay for it, and because it’s so cheap, even the most meager offerings from companies will cover a good portion of it.
Yeah I'm definitely familiar with the Georgia Tech offering. But my impression is that it will be a $7000 + $(my_hourly_rate) * (hours_spent) certificate that will only get me past the employers who have a "Select your highest degree level" drop-down on their application form. Is it that much more respected than a collection of MOOCs?
As someone who's also involved in hiring, a Georgia Tech online degree with no industry experience is still on an entirely separate tier from 2 years of industry experience. But that's my bias, I suppose, and part of the reason I'm not the only one on the hiring committee.
Definitely biased, but here are some anecdotes/thoughts:
- I've found the rigor to be significantly greater than most MOOCs, in line with other traditional grad courses I've taken.
- Some classes are hybrid, sharing the term with on-campus students.
- Although I haven't finished the degree, my work as a data scientist has benefited significantly from the coursework. This is not to say that it wouldn't have benefited from other, non-GT coursework.
That's rather sad, considering the diploma isn't going to say "Online", and Georgia Tech is top 10 in the world in computer science. Online or not, that credential carries real weight for people in the know.
Well, I'm not the one who does resume sorting, so if a resume makes it to my desk, there's usually a good chance the candidate is qualified. If you give me (personally) two resumes, one with a master's degree from almost anywhere and one with two years of experience doing /exactly/ what we do, I'll choose the latter. In my mind, master's degrees no longer carry the weight they used to, based on (1) people I've interviewed, (2) my coworkers and (3) my friends and acquaintances.
>Any advice on how I can effectively jump off the black-lists?
Find a decent hiring manager who has actually done some hands-on Data Science/Analytics work and knows what skills and thinking are actually required. Lots of Data Science hiring managers have little or no practical experience doing actual analytic work, so they get overly focused on paper qualifications and buzzwords. This is reinforced by HR people who love buzzword bingo.
I created a license plate recognition system before ML got cool, but I can't get any ML job here in Brazil with my sub-bachelor's degree (an associate degree? I'm not sure how to translate it). I think there are only 2 machine learning positions open within 100 km of where I live.
>As someone who has been producing value in a data science/machine learning role for multiple years, it's disheartening to see comments that I may be blacklisted from positions due to "only" having a bachelor's degree.
Don't worry.
The big salaries will go to people who create value and solve problems. You can do that without a PhD. In fact, if most Data Science communities are representative, PhDs feel they're above 90% of the work required to put data to work to solve problems. You know, the ones who walk into a job and say, "Oh, I don't get to apply the latest algorithm onto a perfectly cleaned toy data set? I'm leaving!". They're going to have their lunch eaten.
I come from the background you describe, have lots of friends from the same background, and my experience has been the opposite. Most people know that data is messy business and that as data scientists we will often serve more as engineers in our day to day work.
Hah. I want to believe this is right. I like to think of my work as the "full-stack" equivalent of the "data science" career path. There's no part of the data pipeline I'm not currently doing, qualified to do, or interested in doing: acquisition, transformation, storage, exploration, analysis, machine learning, presentation & dashboarding, integration, server maintenance & operations...
The "toy examples" require only a very small subset of the skills required to extract business value from an amorphous blob of data.
I view modern AI as a combination of things that are too early and too late - data science hasn't changed much since the 1990s, but people have a misconception that there is still a lot more to do in data science because of the undeserved excitement around deep learning.
Perhaps I've missed that you were being sarcastic, but I think your timeline is off. To give two concrete examples, Geman and Geman's "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images" paper is from 1984, and Holland's "Statistics and Causal Inference" paper is from 1986.
You don't need a CS degree to be a "Data Scientist". In fact, I would argue that you need a statistics degree. There is a difference between a data scientist and an engineer.
1) They often lack experience with messy (i.e. real) data
2) They sometimes lack experience with programming.
Data science, as currently practiced, requires some CS, some stats (with a focus on experimental design/observational analysis) and communication skills.
These skills are rarely taught together in undergrad, but lots of PhDs acquire them by accident, hence the propensity of PhDs to end up in these roles.
I was hired as a data scientist for a large company that thought they needed data scientists. They didn't and I added no value to the company. I left after about a year. If there are many people like me then that might explain the shortage.
How do you see the chances of someone without a degree getting an entry-level, low-paid, remote job in this space? What could help? Certifications? Nano-degrees like Udacity's? A portfolio of public work?
It's highly unlikely. There is a lot of competition for entry-level jobs and not many remote positions available. The barrier to entry is precisely academic qualifications, which count for more than certifications and portfolios.
That’s why we usually have a team whose members complement each other's skill sets. While it's great when a data scientist has everything, they don't necessarily need to. I tend to focus on their core strengths and, depending on their growth/career aspirations, help them develop the other skills. If it’s soft skills, I'm happy to help them develop that edge.
I'm teaching stats and scientific computing at a university. I thought that data science might be a good exit plan in case I get fed up with academia. However, even though data scientists seem to be highly sought after in Germany (judging from what I see on LinkedIn), the salaries are fairly disappointing.
Ah yes. A shortage. Someone didn’t do their planning. Bubble bubble toil and trouble. Or is this about more visas? And paying people less.
Seriously though, what exactly is a data scientist? What is the requisite skill set? Is this masters level entry or would my feline require a Pointy Headed Degree upgrade to play this game?
Masters and PhD are required for particular employers. Others not so much but there are caveats.
High probability of being the “data” guy who is then made to be the database admin.
I'm available for the right data science gig. Somewhat unconventional CV available on request. I'll try to bring my competent but intellectually foggy business analyst colleague along for the ride if you like.
I find "data science" a cringey expression and I'm not looking for a job currently, so my LinkedIn emphasizes my strengths according to my ego/self-appraisal, not a "desperate" reaching out for fashionable job titles.