I recently started interviewing ML Engineers for my company. In general I'm quite surprised by the lack of knowledge of people applying for the job. People seem to have several misconceptions, very surface-level knowledge, and lack even the fundamentals.
That made me question whether my expectations are set right. Is it possible that, working in the field every day, I expect candidates to know way more than is reasonable? I'm not sure, and I don't feel we have a solid way to deal with that.
I consider myself a decent ML expert, but the field is so vast that I think I could be tripped up in an interview rather easily. Plus I tend to get a bit rusty with the basics that I never use for anything.
Same experience hiring Junior Data Scientists; it's a shitshow out there. The level is so low that I had to hire an "old school" statistician for a senior position.
These fields are relatively new. There doesn’t seem to be a clear path to break into them.
Personally, I believe Kaggle is one of the ways to slowly gain some practical experience: https://www.kaggle.com/
However, I’m not sure if it’s sufficient.
Recently, I’ve been taking a deeper dive into studying various types of competitions. For example, I’ve created a repo where I’m organizing notebooks, etc., for a regression competition:
The field is new, yes, but before covid even recent graduates with just one year of experience were so good compared to now. The real problem is that with remote work you can't really learn anything if you're a fresh graduate. I had my worst interns during the last two years, and I don't think they were stupid. They just can't really learn remotely when I don't have the time to teach them. When we were in the office they would just follow along in every meeting, pair coding and so on. I love remote work, for me it's a blessing, I exploit every single minute of my day working, but for interns and juniors it's a nightmare: they are not going to learn anything from me or from others.
I once interviewed an ML engineer candidate with two PhDs in ML who couldn't properly distinguish regression from classification problems.
> and lack even the fundamentals
I don't think there is a strong consensus on what the fundamentals are. I've also noticed that the fundamentals differ remarkably between people who think of themselves as "data scientists" vs those who think of themselves as "machine learning practitioners".
Multiple PhDs in the same field is a big red flag. A PhD is meant to teach you how to do research, so if you get to the end and need another one, you've failed.
In my opinion (MSc math, PhD in applied statistics, currently postdoc at epidemiology department), the fundamentals are books like The Elements of Statistical Learning: https://hastie.su.domains/Papers/ESLII.pdf
I feel like the term data science is completely useless. Machine learning is an approach to "do AI" through statistics. Specifically, it is a branch of statistics where the sole focus is on prediction, compared to e.g. inference.
The topics in that book are good, typical "study material", but: 1) we can easily make anyone fail an interview by asking them to derive algorithms or formulas from it, and 2) all these topics predate deep learning and everything that comes with it.
I agree these are good "filtering" topics, among many others.
I find these definitions funny: how statisticians vs. computer scientists define machine learning differently. You get two different perspectives. Everyone wants to claim AI for themselves. I think the statisticians are pissed off that DNNs work so well.
Anyway, lest we forget: neural networks came from cybernetics!
I never claimed that everyone wants to claim AI for themselves.
Artificial intelligence is of course a scientific field in its own right, one that existed even before machine learning was a thing. I'm just saying that AI scientists have used concepts from statistics to create an approach to AI called machine learning. I'm not saying that ML is a subset of statistics, mind you, but its statistical underpinnings definitely are. ML is not _just_ statistics, either.
Moreover, why would statisticians be pissed about the efficacy of a model?
Firstly, many problems/questions that I work on are not concerned with prediction.
Secondly, even if I did, I would love to use DNNs. It's just that I never have a use for them, considering I'm only looking at tabular data. Why bother with DNNs when, say, a random forest will do?
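To illustrate what I mean, here's a minimal sketch of my own, using scikit-learn with its built-in California housing data as a stand-in for whatever tabular problem you actually have:

    # A random forest on tabular data; no DNN required.
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print("test R^2:", r2_score(y_test, model.predict(X_test)))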
I'd include basic stats before I'd include linear algebra.
You should know it's used in deep learning, but unless you need to include some math in a research paper to impress the reviewers (true story), you never need it.
What exactly do you ask an ML engineer? In my experience, if we ask 10 people about the scope of work of an ML engineer, we'll get 10 different answers, 9 of which will be all-inclusive, i.e. a Data Scientist who can also build robust production systems.
I ask for the actual fundamental skills in the job ad: say, 5 skills for a junior, 15 skills for a senior, and organizing other people (incl. clients) for a manager.
This is a fairly new role in our company. Officially it's described as 50% data scientist and 50% developer, but the loop composition is the same as for a developer, with one of the coding rounds replaced by an ML round.
I'm still experimenting with the format, but I do a mix of asking theoretical ML questions, prior experience with ML, and designing a system to solve a business problem using ML.
I tend to notice that model evaluation is lacking, perhaps because it’s not especially interesting.
But to me, most business-applied ML falls under the optimization umbrella. For some reason it’s never portrayed this way, but perhaps if it were, junior practitioners would pay more attention to thoroughly examining how their trained models will perform.
I don't know what type of ML you do, but my experience is that it is hard to pinpoint poor model evaluation as the reason for poor performance in production (especially if you have an interest in protecting your ego).
It can be hard enough to figure out if you even have poor performance in production as you often don't know how well you could/should be doing.
I have recently started thinking about what part of the performance issues I see is due to poor evaluation vs. under-specification. I'm not sure how I'm supposed to tease the two apart.
Last year I nerded out on model accuracy and it was great fun. I would love to do this more at work, but unless I fall into it in an existing role I usually am not considered for data science roles because I don't have 7 PhDs. Simple accuracy and ROC curves for classification, R² for regression, calibration, Brier skill scores. Good times.
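For anyone who wants to play with these, a toy sketch with scikit-learn on synthetic data (purely illustrative, not from any particular project):

    # Toy illustration of the metrics above on synthetic data.
    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]

    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    print("ROC AUC: ", roc_auc_score(y_test, proba))
    print("Brier:   ", brier_score_loss(y_test, proba))
    # Calibration: observed positive rate per predicted-probability bin.
    frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)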
The best book I found at the time was Kuhn & Johnson's "Applied Predictive Modeling". If anyone can recommend a better book, I'd love to hear it. (Examples in R or Python, it doesn't matter.)
ML is the "hot" field right now, like "big data" was a couple of years ago; it attracts a vast number of people who are interested only in money (and a huge subset of those people aren't really that interested in "knowing").
The most baffling example is a candidate who admitted they didn't expect ML-specific questions and hence hadn't prepared. I spent a minute figuring out whether we were interviewing the right candidate.
My expectations are always evolving since this is a new role for us. The current guidelines are that candidates should have broad knowledge of ML fundamentals. We also work through a design challenge together where a candidate solves a business problem using ML.
I'm still figuring out the best ways to evaluate these.
What kind of preparation do you expect when you say ML-specific questions or broad knowledge of ML fundamentals? Is it doing linear algebra on the fly, or just knowing high-level topics like bias-variance, regularization, neural nets, and k-means clustering?
The way I like to explain it, which is how I have seen it explained several times, is that all AI can be reduced to search/optimization. ML just applies that search to the function that will produce the final answer, over a dataset (either generated on the fly or prepared beforehand). For neural networks, the hypothesis space (all the solutions you are searching through to find the best one) is the set of weights, and your search strategy/optimization is (usually) backpropagation. If you translated the weights into something traversable by other algorithms, those algorithms could do just "fine" in its place (assuming infinite time and space). It really opens the mind up for experimentation on every bit of the process. The book that really hammered it in for me was Intelligence Emerging by Keith Downing: a short, great book on bio-inspired AI.
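To make that concrete, here's a toy sketch of my own (not from the book): fitting a tiny one-hidden-layer network by plain random search instead of backpropagation, treating the weights as just another point in a search space:

    # Fit a tiny network by random search; the weights are the hypothesis space.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 1))
    y = np.sin(3 * X)  # the target function we want to approximate

    def predict(weights, X):
        w1, b1, w2, b2 = weights
        return np.tanh(X @ w1 + b1) @ w2 + b2  # forward pass only, no gradients

    def loss(weights):
        return np.mean((predict(weights, X) - y) ** 2)

    # Search strategy: perturb the current best weights, keep improvements.
    best = [rng.normal(size=(1, 16)), rng.normal(size=16),
            rng.normal(size=(16, 1)), rng.normal(size=1)]
    for _ in range(5000):
        candidate = [w + 0.1 * rng.normal(size=w.shape) for w in best]
        if loss(candidate) < loss(best):
            best = candidate
    print("final MSE:", loss(best))

It converges far more slowly than backprop would, but it makes the point: the optimizer is a swappable component.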
Both Tom M. Mitchell's "Machine Learning" and Russell & Norvig's "Artificial Intelligence: A Modern Approach" define backpropagation as the whole process: propagating the input until you have an output, calculating the gradient, and updating the weights.
[4] Goodfellow, Bengio & Courville 2016, p. 200:
"The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Back-propagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."
As good as Ian Goodfellow's work has been, I think I have to disagree with him on what is part of the backpropagation algorithm. Rumelhart et al. (1986) also describe it as a single algorithm from start to end. It is true that you can use things other than stochastic gradient descent, though; Mitchell also points this out by saying their algorithm is a backpropagation example using gradient descent.
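Whichever definition you prefer, the two pieces are easy to separate in code. A minimal sketch of my own (not from any of the cited books), with the gradient computation and the weight update as distinct steps:

    # Linear regression where gradient computation and the update rule
    # are clearly separate steps.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=200)

    w = np.zeros(3)
    lr = 0.1
    for _ in range(100):
        # Step 1: compute the gradient of the MSE loss w.r.t. w (the part
        # Goodfellow et al. would call back-propagation in the multilayer case).
        grad = 2 * X.T @ (X @ w - y) / len(y)
        # Step 2: the learning algorithm proper (plain gradient descent here;
        # SGD, Adam, etc. would slot in at this line).
        w -= lr * grad
    print(w)  # approaches true_w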
A good run-down of the different algorithms: https://ruder.io/optimizing-gradient-descent/
I think people favour adaptive learning rate options like Adam in practice since they generally do seem to perform well, and are often less sensitive to initial conditions and the exact hyperparameters used. There will always be people who like to test N optimizers with parameter sweeps to squeeze out a tiny bit of extra performance, but for the rest of us the default Adam or AdamW options are good, unobjectionable choices :)
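For reference, the Adam update itself is only a few lines. A rough NumPy sketch of a single step, following Kingma & Ba (2014); the function name and signature here are mine:

    # One Adam step, given a gradient `grad` for parameters `w`.
    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared grads
        m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
        return w, m, v

The per-parameter scaling by sqrt(v_hat) is what makes it less sensitive to the raw learning rate than plain SGD.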
It’s really hard to compare optimizers. Common architectures and default hyperparameters were discovered alongside Adam so you’d have to redo a bunch of sweeps if you wanted a “fair” comparison. In practice this doesn’t really matter and everyone just uses Adam. If you had infinite compute, you’d try every combo and select the one with the best results.
Adam was very effective when it was introduced, so it was widely adopted. Since then, only models that work well with Adam have made it from the idea stage to actually working. I think there's reason to believe we have overfit our model architectures to our loss functions and optimizers.
It’s a pleasant surprise to see this shared here, I am the author of this piece. Honestly I wrote this post for myself more than anything else. I also find that my knowledge in a lot of areas is very “surface level.” It’s really easy to regurgitate definitions, but it’s definitely harder to get to the core of those ideas. I hope you enjoyed!