I recently started interviewing ML Engineers for my company. In general I'm quite surprised by the lack of knowledge of people applying for the job. People seem to have several misconceptions, very surface-level knowledge, and lack even the fundamentals.
That made me question whether my expectations are set right. Is it possible that, working in the field every day, I expect candidates to know way more than is reasonable? I'm not sure, and I don't feel we have a solid way to deal with that.
I consider myself a decent ML expert, but the field is so vast that I think I could be tripped up in an interview rather easily. Plus I tend to get a bit rusty with the basics that I never use for anything.
Same experience hiring Junior Data Scientists; it's a shitshow out there. The level is so low that I had to hire an "old school" statistician for a senior position.
These fields are relatively new. There doesn’t seem to be a clear path to break into them.
Personally, I believe Kaggle is one of the ways to slowly gain some practical experience: https://www.kaggle.com/
However, I’m not sure if it’s sufficient.
Recently, I’ve been taking a deeper dive into studying various types of competitions. For example, I’ve created a repo where I’m organizing notebooks, etc., for a regression competition:
The field is new, yes, but before covid even recent graduates with just one year of experience were so good compared to now. The real problem is that with remote work you can't really learn anything if you're a fresh graduate. I had my worst interns during the last two years, and I don't think they were stupid. They just can't really learn remotely when I don't have the time to teach them. When we were in the office they would just follow along in every meeting, pair coding and so on. I love remote work, for me it's a blessing, I exploit every single minute of my day working, but for interns and juniors it's a nightmare: they are not going to learn anything from me or from others.
I once interviewed an ML engineer candidate with two PhDs in ML who couldn't properly distinguish regression from classification problems.
> and lack even the fundamentals
I don't think there is a strong consensus on what the fundamentals are. I've also noticed that the fundamentals differ remarkably between people who think of themselves as "data scientists" vs those who think of themselves as "machine learning practitioners".
Multiple PhDs in the same field is a big red flag. A PhD is meant to teach you how to do research, so if you get to the end and need another one, you've failed.
In my opinion (MSc math, PhD in applied statistics, currently postdoc at epidemiology department), the fundamentals are books like The Elements of Statistical Learning: https://hastie.su.domains/Papers/ESLII.pdf
I feel like the term data science is completely useless. Machine learning is an approach to "do AI" through statistics. Specifically, it is a branch of statistics where the sole focus is on prediction, compared to e.g. inference.
The topics in that book are good, typical "study material", but: 1) we can easily make anyone fail an interview by asking them to derive algorithms or formulas from it, and 2) all these topics predate deep learning and everything that comes with it.
I agree these are good "filtering" topics, among many others.
I find these definitions funny: how statisticians vs. computer scientists define machine learning differently. You get two different perspectives. Everyone wants to claim AI for themselves. I think the statisticians are pissed off that DNNs work so well.
Anyway, lest we forget: neural networks came from cybernetics!
I never claimed that everyone wants to claim AI for themselves.
Artificial intelligence is of course a scientific field in its own right, one that existed even before machine learning was a thing. I'm just saying that AI scientists have used concepts from statistics to create an approach to AI called machine learning. I'm not saying that ML is a subset of statistics, mind you, but its statistical underpinnings definitely are. ML is not _just_ statistics, either.
Moreover, why would statisticians be pissed about the efficacy of a model?
Firstly, many problems/questions that I work on are not concerned with prediction.
Secondly, even if I did, I would love to use DNNs. It's just that I never have a use for them, considering I'm only looking at tabular data. Why bother with DNNs when, say, a random forest will do?
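To illustrate what I mean, here's a minimal sketch of my own, using scikit-learn with its built-in California housing data as a stand-in for whatever tabular problem you actually have:

    # A random forest on tabular data; no DNN required.
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print("test R^2:", r2_score(y_test, model.predict(X_test)))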
I'd include basic stats before I'd include linear algebra.
You should know it's used in deep learning, but unless you need to include some math in a research paper to impress the reviewers (true story), you never need it.
What exactly do you ask an ML engineer? In my experience, if we ask 10 people about the scope of work of an ML engineer, we'll get 10 different answers, 9 of which will be all-inclusive, i.e. a Data Scientist who can also build robust production systems.
I ask for the actual fundamental skills in the job ad: say, 5 skills for a junior, 15 skills for a senior, and organizing other people (incl. clients) for a manager.
This is a fairly new role in our company. Officially it's described as 50% data scientist and 50% developer, but the loop composition is the same as for a developer, with one of the coding rounds replaced by an ML round.
I'm still experimenting with the format, but I do a mix of asking theoretical ML questions, prior experience with ML, and designing a system to solve a business problem using ML.
I tend to notice that model evaluation is lacking, perhaps because it’s not especially interesting.
But to me, most business-applied ML falls under the optimization umbrella. For some reason it’s never portrayed this way, but perhaps if it were, junior practitioners would pay more attention to thoroughly examining how their trained models will perform.
I don't know what type of ML you do, but my experience is that it is hard to pinpoint poor model evaluation as the reason for poor performance in production (especially if you have an interest in protecting your ego).
It can be hard enough to figure out if you even have poor performance in production as you often don't know how well you could/should be doing.
I have recently started thinking about what part of the performance issues I see is due to poor evaluation vs. under-specification. I'm not sure how I'm supposed to tease the two apart.
Last year I nerded out on model accuracy and it was great fun. I would love to do this more at work, but unless I fall into it in an existing role I usually am not considered for data science roles because I don't have 7 PhDs. Simple accuracy and ROC curves for classification, R² for regression, calibration, Brier skill scores. Good times.
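For anyone who wants to play with these, a toy sketch with scikit-learn on synthetic data (purely illustrative, not from any particular project):

    # Toy illustration of the metrics above on synthetic data.
    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]

    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    print("ROC AUC: ", roc_auc_score(y_test, proba))
    print("Brier:   ", brier_score_loss(y_test, proba))
    # Calibration: observed positive rate per predicted-probability bin.
    frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)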
The best book I found at the time was Kuhn & Johnson's "Applied Predictive Modeling". If anyone can recommend a better book, I'd love to hear it. (Examples in R or Python, it doesn't matter.)
ML is the "hot" field right now, like "big data" was a couple of years ago; it attracts a vast number of people who are interested only in money (and a huge subset of those people aren't really that interested in "knowing").
The most baffling example is a candidate who admitted they didn't expect ML-specific questions and hence hadn't prepared. I spent a minute figuring out whether we were interviewing the right candidate.
My expectations are always evolving since this is a new role for us. The current guidelines are that candidates should have broad knowledge of ML fundamentals. We also work through a design challenge together where a candidate solves a business problem using ML.
I'm still figuring out the best ways to evaluate these.
What kind of preparation do you expect when you say ML-specific questions or broad knowledge of ML fundamentals? Is it doing linear algebra on the fly, or just knowing high-level topics like bias-variance, regularization, neural nets, and k-means clustering?
The way I like to explain it, which is how I have seen it explained several times, is that all AI can be reduced to search/optimization. ML just applies that search to the function that will produce the final answer, over a dataset (either generated on the fly or prepared beforehand). For neural networks, the hypothesis space (all the solutions you are searching through to find the best one) is the set of weights, and your search strategy/optimization is (usually) backpropagation. If you translated the weights into something traversable by other algorithms, those algorithms could do just "fine" in its place (assuming infinite time and space). It really opens the mind up for experimentation on every bit of the process. The book that really hammered it in for me was Intelligence Emerging by Keith Downing: a short, great book on bio-inspired AI.
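To make that concrete, here's a toy sketch of my own (not from the book): fitting a tiny one-hidden-layer network by plain random search instead of backpropagation, treating the weights as just another point in a search space:

    # Fit a tiny network by random search; the weights are the hypothesis space.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 1))
    y = np.sin(3 * X)  # the target function we want to approximate

    def predict(weights, X):
        w1, b1, w2, b2 = weights
        return np.tanh(X @ w1 + b1) @ w2 + b2  # forward pass only, no gradients

    def loss(weights):
        return np.mean((predict(weights, X) - y) ** 2)

    # Search strategy: perturb the current best weights, keep improvements.
    best = [rng.normal(size=(1, 16)), rng.normal(size=16),
            rng.normal(size=(16, 1)), rng.normal(size=1)]
    for _ in range(5000):
        candidate = [w + 0.1 * rng.normal(size=w.shape) for w in best]
        if loss(candidate) < loss(best):
            best = candidate
    print("final MSE:", loss(best))

It converges far more slowly than backprop would, but it makes the point: the optimizer is a swappable component.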
Both Tom M. Mitchell's "Machine Learning" and Russell & Norvig's "Artificial Intelligence: A Modern Approach" define backpropagation as the whole process: propagating the input until you have an output, calculating the gradient, and updating the weights.
[4] Goodfellow, Bengio & Courville 2016, p. 200:
"The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Back-propagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."
As good as Ian Goodfellow's work has been, I think I have to disagree with him on what is part of the backpropagation algorithm. Rumelhart et al. (1986) also describe it as a single algorithm from start to end. It is true that you can use things other than stochastic gradient descent, though; Mitchell also points this out by saying their algorithm is a backpropagation example using gradient descent.
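Whichever definition you prefer, the two pieces are easy to separate in code. A minimal sketch of my own (not from any of the cited books), with the gradient computation and the weight update as distinct steps:

    # Linear regression where gradient computation and the update rule
    # are clearly separate steps.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=200)

    w = np.zeros(3)
    lr = 0.1
    for _ in range(100):
        # Step 1: compute the gradient of the MSE loss w.r.t. w (the part
        # Goodfellow et al. would call back-propagation in the multilayer case).
        grad = 2 * X.T @ (X @ w - y) / len(y)
        # Step 2: the learning algorithm proper (plain gradient descent here;
        # SGD, Adam, etc. would slot in at this line).
        w -= lr * grad
    print(w)  # approaches true_w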
A good run-down of the different algorithms: https://ruder.io/optimizing-gradient-descent/
I think people favour adaptive learning rate options like Adam in practice since they generally do seem to perform well, and are often less sensitive to initial conditions and the exact hyperparameters used. There will always be people who like to test N optimizers with parameter sweeps to squeeze out a tiny bit of extra performance, but for the rest of us the default Adam or AdamW options are good, unobjectionable choices :)
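For reference, the Adam update itself is only a few lines. A rough NumPy sketch of a single step, following Kingma & Ba (2014); the function name and signature here are mine:

    # One Adam step, given a gradient `grad` for parameters `w`.
    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared grads
        m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
        return w, m, v

The per-parameter scaling by sqrt(v_hat) is what makes it less sensitive to the raw learning rate than plain SGD.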
It’s really hard to compare optimizers. Common architectures and default hyperparameters were discovered alongside Adam so you’d have to redo a bunch of sweeps if you wanted a “fair” comparison. In practice this doesn’t really matter and everyone just uses Adam. If you had infinite compute, you’d try every combo and select the one with the best results.
Adam was very effective when it was introduced, so it was widely adopted. Since then, only models that work well with Adam have made it from the idea stage to actually working. I think there's reason to believe we have overfit our model architectures to our loss functions and optimizers.
It’s a pleasant surprise to see this shared here, I am the author of this piece. Honestly I wrote this post for myself more than anything else. I also find that my knowledge in a lot of areas is very “surface level.” It’s really easy to regurgitate definitions, but it’s definitely harder to get to the core of those ideas. I hope you enjoyed!