I'd argue software engineering is still too hard for ML engineers. Most of ML (not research but commercial applications of what's already been demonstrated) is now well within the realm of engineering, but there are few standard practices, bodies of knowledge, or agreed-upon processes for doing anything. These are problems that engineering formalism solves, not another AutoML tool. Maybe I'm saying the same thing as the article, just framed differently?
In my experience, both are true. I'm more on the ML side, and I can tell I don't have the kind of routine and habits that good software engineers have, though I'm learning. On the other hand, I've seen software engineers who've made the transition to ML and clearly have a good handle on the concepts (in one case even published papers in ML journals), yet they don't seem to have the intuition that allows them to select the right tool for the job, or understand when a given technique is appropriate. You could call that "modeling maturity": a combination of mathematical maturity and the skill of relating domain knowledge to modeling choices.
Right, I see. That's not really possible imo. For things like mlops, sure. But model development, selection, evaluation? From what I've seen, it's exactly when engineers reach for standard tools without giving thought to how it applies to the given problem that they do a bad job.
Intuition is always important, but it shouldn't be the last word in an engineering problem. I think there is room for a lot more rigour in how we build, optimize, and validate ML model performance, so less of it gets left to intuition. The discipline is becoming mature enough that this is possible; there is a lot of room to build out "standards" and a "body of knowledge" that can be applied to building ML models. We're seeing it in pockets, but in so many cases it is still a dark art.
And then from an actual software engineering perspective, so much ML code is just run-once jupyter notebook stuff... there is a lot we can do. I need to give this more thought, but I think it's acknowledged there is a big opportunity here
Even in non-ML software engineering you still have architectural tradeoffs that, while you can make reasoned arguments about them in part, still come down to the intuition of your technical leadership.
Absolutely agree. The major challenge has been explaining to software engineers that the pick-an-algo-from-GitHub, monkey-see-monkey-do-a-tutorial approach, followed by AutoML or grid search, is pretty irrelevant nowadays.
If you can’t implement an existing algorithm from a scientific paper, it is pretty hard to understand the flexibility required when designing a training pipeline.
In terms of tech stacks, cloud providers’ sample architectures and code samples are a basis for understanding cloud components and APIs, not actual implementations.
I think industry and society just have too low a bar for software development. If the quality of civil engineers matched that of software engineers as a whole, we’d all be dead from collapsed buildings and bridges. Of course, software isn’t usually such high stakes, but I don’t think it would hurt the industry to have some formal licensing. If you write a terrible API or whatever, then a review board decides if your license to practice should be revoked. The field would suddenly get better, as developers would no longer care about the PM’s artificial deadline: if they need more time to write quality code, they will take it, and the PM simply can’t find anyone else to cheap out on the work either.
Now, how you enforce this, I don’t know, but this is my pie in the sky dream.
I would argue that review boards, licensing, etc are only relevant because of the stakes. How to enforce it becomes irrelevant if you can't sell anybody on the idea that it's important in the first place.
Even the strength of your statement "...I don't think it would hurt the industry to have..." doesn't sound like something that any part of government would be willing to get behind.
There's just such a huge leap from "decisions civil engineers make can kill people" to "decisions software engineers make can annoy people".
Still, you're not wrong that the field might immediately get better in some ways, but I believe we could also go down a lot of paths that would lead to the widespread concept of "stifling innovation so that people are less likely to be annoyed by buggy or hard to use software". I'm sure there are also much stronger arguments in favor of your idea of a better world when you narrow the scope down to specific types of software or use cases.
Your suggestion is extremely laughable and naive and reeks of elitism.
The domains where the quality of code has very high stakes do have very strict processes and have an extremely high bar.
The domains where the stakes are not high, for ex - implementing an API to download user reports, have a very low bar.
And your point about cancelling the license to code if you implement a bad API, that's like cancelling a journalist because they made a typo in their article and it went out.
I'm sure there are plenty of quotes out there that are very similar to this, but kudos in stating what most of us (software engineers) already think, with such clarity. I laughed. I cried. I remembered that this is never going to change. ;)
Most folks don't realize that the vast majority of ML applications can be done with preexisting models; you don't need someone with a PhD to fine-tune one unless you need to eke out a tiny bit of improvement. The real value is in the person preparing the data for training.
This is not true at all unless you're only referring to commodity applications. There's no preexisting model for general regression or seq to seq, for instance. You can say, oh, that's a good fit for an LSTM, but you still need to tune the individual layers to suit the application. Although I am slightly conflating ML and DL, since you mention training which heavily implies DL.
Most folks can get away with logistic or linear regression. A tiny portion of those might need to move to random forest. And an even tinier portion may justify deep neural networks. The vast majority of ML is not deep learning, nor is it necessary. We're talking very boring typical business cases.
Yeah, this fixation with deep learning is kinda weird, given that most companies don't have vast amounts of images/text lying around and a pressing business need to understand them.
I'm a big fan of starting simple, which means linear/log regression (normally with lasso, as it does variable selection).
Then, if you can prove business value, it may make sense to start trying to use more unstructured data.
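To make the "start simple" advice concrete, here's a minimal sketch in scikit-learn of an L1-penalized (lasso) logistic regression, which drives uninformative coefficients to zero and so doubles as variable selection. The dataset file and column names are made up for illustration:

    # "Start simple": lasso logistic regression on a hypothetical churn dataset.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("churn.csv")                    # hypothetical business data
    X, y = df.drop(columns=["churned"]), df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    )
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

    # The non-zero coefficients are the variables the lasso decided to keep.
    coefs = model.named_steps["logisticregression"].coef_[0]
    kept = [col for col, w in zip(X.columns, coefs) if abs(w) > 1e-6]
    print("selected variables:", kept)

If that baseline already captures most of the signal, anything fancier has to justify its extra cost.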
I think it is because linear regression is not science fiction enough even if that is the best that can be done on the data. We didn't hire this AI guy to do simple regressions!
From experience, there are lots of companies looking precisely for an "AI person" who will use a bunch of complicated-sounding buzzwords and build some overly complex thing without evaluating simpler options. This kind of person gets way more attention than someone who suggests that what the company wants to do can probably be accomplished with a simple shell script, or someone who tells them AI is not magic and, in the absence of some data capturing a relationship, it can't really do anything. It's a hyped-up field and has attracted lots of people who care about hype. (I work in the field and am not trying to denigrate it; my point is that many / most business adopters want "AI" for all the wrong reasons.)
Ran into this very thing writing software for customer support teams. There's a Data/ML team within the company and there was a lot of talk about getting them to help plan the software to add abilities to "learn" what customers wanted with their support requests. The reality of the situation is that you can just take a step back and look at the pain points for the CSRs, the highest cost calls and where the time is spent servicing the customer. In the end, it's pretty clear that we can just do a few queries up front to realize that "oh, this person ordered 2 items of the same size in their last order and haven't returned one yet, it's very likely that they want to return one" or any number of other common scenarios that were found and then prioritized based on a simple cost/benefit analysis.
Could the Data/ML team have done something cool and likely complex to show clearer patterns and maybe get something in place to continually refine that data? Probably. Would it have saved more money for the company eventually?
Unknown. Would it have cost substantially more money to design and implement? Pretty sure.
Ha, I agree that software engineering is too hard for ML engineers, and even for software engineers like myself who have been doing it for 20 years, like zcw100 said :).
Author of the blog post here. It was definitely written from my narrow viewpoint and experience. Our goal is to make more solutions accessible to software developers, and my instinct was the same as yours: a lot of ML can be within the realm of engineering (even for small / one-person teams), and there are accidental complexities standing in the way of wider use. Our solution (AutoML+SaaS) definitely doesn't work for every situation. I'm curious to hear more of your thoughts on how ML can be made more accessible to engineers (and vice versa).
I've dived into ML (and DL) with 17 years of software development experience. I'd say it's much easier than software. Yes, there's A TON to learn and experiment with, but still much less than with software. I felt confident enough after just 1.5 years of learning and Kaggling, and easily passed ML interviews at SF Bay Area companies (hint: all data science people are extremely glad to see software experience, much more than data science).
"Good pipeline and bad model is much better than bad pipeline and good model" (c) someone
My wife is a researcher at Stanford doing some ML stuff, and the constant pain point isn't the researching of novel models or maths or whatever, it's the IT hell of managing a pipeline and data acquisition. I've helped her introduce things like Docker to the lab, which seems to have helped, but still: software engineering is a totally different skill that really hinders ML work.
Beyond what makes a good model, IME at a FAANG building an ML product, the bulk of the work in practice tends to be general software engineering. There need to be a sufficient number of people who understand the actual ML pieces under the hood, but even when you're making changes to the models, the bulk of the actual work is not complex ideation, but implementing the ideas in software, and this implementation is usually something you can learn in a few months.
I imagine he is doing the tooling to support the ML team. I've seen listings for that type of work that don't require any ML experience. Usually you need a PhD if you want to be developing and tuning models.
I still remember when SQL was a dark science that could only be managed by administrators that were initiated in secret rooms. Nowadays a lot of people use SQL without deep understanding. They do useful stuff but if you are very skilled you can do way more. And SQL experts are often frustrated with all the amateurs that use their database in such a sub-optimal way.
I expect the same for ML. The tooling will improve until regular developers can use it without understanding the fundamentals. They will do useful things but some skilled people will be able to do way more advanced stuff.
That's pretty much the path of all technologies. They get simplified to a level where they are useful for a lot of people, but some experts will be able to get way more out of them. And usually the experts look down on the amateurs.
As an ML expert, I am often frustrated at the sub-optimal things I find in the wild or even in publications. But you need to understand everyone is on a growing trajectory.
Coming from a background in computational quantum chemistry, it’s interesting to see all of the people who say ML is “easy” after taking a few online courses and reading some books on data science. If it’s so easy, invent AGI then, since that is the holy grail of machine learning.
Most of these people claiming expertise do not have a deep grasp of the mathematical fundamentals required for state-of-the-art research in the field. Can you develop a neural network with features that are invariant to permutational and rotational symmetries? If so, how do you efficiently generate the irreducible representations of the product of the symmetric and special orthogonal groups for use in a fast Fourier transform? What is minimum description length and why is it so fundamental? How do you solve trust-region problems on Riemannian manifolds? Throwing an off-the-shelf PyTorch library at a problem does not make one an expert in machine learning.
Haha, I’d never seen that. I suppose my comment is a bit over-the-top compared to my comment history, but it’s somewhat of a touchy point for me when someone jumps into a new field and claims it’s easy and that they’re an expert without being able to back up the claim (some rare people can do this with a new field; most cannot).
Is it though? When I run the hello world for whatever GAN or pytorch, what exactly am I going to do if I get an unexpected error message? Or if the results are off? Would I even know the results are off? I don't even know if pytorch is used for GAN.
You’re going to do exactly what any decent SWE would do - debug it. But of course, if you don’t know what you’re doing then you gotta learn first. Plenty of resources out there on how GANs work.
In the majority of cases, not much. My point is more that many people who claim expertise in machine learning do not have a holistic understanding of the field; their knowledge is patchy and they aren’t able to implement a solution to more than the most general of problems. It would be like a car mechanic that can change the oil but doesn’t know how to locate and fix a leak.
Most neural networks that perform well nowadays are highly specialized to a particular problem domain, and I gave an example of an approach that might be used by someone designing a neural network that is invariant to certain types of symmetries on the input data. This isn’t something a typical DS/ML bootcamper would know how to handle, or even how to approach, despite their claims of expertise.
Author of the blog post here - it's very cool to see this on HN!
I wrote this as someone who considers himself a half-decent software engineer trying to use ML for a side project and feeling frustrated by all the effort and "accidental complexity" involved. Why focus on software engineers and ML in this post/rant/company? Because "software is eating the world" and having ML be more accessible to software engineers will broaden the range of problems they can solve.
Thanks for all the comments - I acknowledge all/most of the criticisms as valid. A SaaS/AutoML solution won't work for everyone and definitely not for every problem, and it won't be the only answer to making ML more approachable.
Wow, it's almost like machine learning and software engineering are different disciplines entirely!
Just because both involve coding doesn't mean software engineers should be expected to have the math chops (stats/prob, linalg, calc, etc.) to make machine learning work for them...
Vice versa is a little more complicated, because ML/DS can be done very inefficiently without the proper coding practices, but understanding the math is independent of that so that point still holds for this comparison.
Built into this is the assumption that one person must have both competencies. That may be true for cash-strapped startups, but it hardly plays well as general advice.
Definitely not ... I am a SWE in a big company and working on ML projects for the past 4 years. We have a data scientist team which does the data exploration and comes up with a model. Their output would be a jupyter notebook.
Then we have a team of "Applied ML practitioners", which I am a part of, we productionize the jupyter notebook, by setting up pipelines, services etc. We understand ML algos, stats, probability etc, but not as much as our data scientist team does.
Having both in the same person would be good, but is not necessary.
Regardless of how low you get your communication overhead, it still exists. It's rare to find people who can both run and test all the infrastructure and model code, and notice that the transformation you apply on line 34876 of file foobar_now_with_added_ml.py is statistically inappropriate for your problem.
That's not even to mention the really hard part, selecting a good outcome variable and appropriate ways to measure the performance of your system once it hits prod.
You can definitely split this stuff between people, but it gets super-linearly harder as you add more people, so it's really incredible to find people who can do both (and honestly, there aren't that many of them (I'd like to say us, but I'm probably not there yet)).
> It's rare to find people who can both run and test all the infrastructure and model code
It's also unnecessary to do so as long as your institutional processes are capable of synthesizing multiple peoples' competencies across multiple disciplines.
How do you think any machine more complicated than a train car was designed? Do my mechanical engineers need to understand the intricacies of avionics?
> it's really incredible to find people who can do both
Absolutely, and I think you'd have a hard time finding someone who disagrees. But you made a very strong assertion about a "need" which requires much stronger arguments to support.
> It's also unnecessary to do so as long as your institutional processes are capable of synthesizing multiple peoples' competencies across multiple disciplines.
You clearly work at much better run companies than I do ;)
I sincerely wish you luck! Yeah terrible managers are awful. The feedback loop of "my boss won't do shit to help me so I'll figure it out on myself" to "I'm competent in every single technology here and could run the show myself with enough time" to "management recognizes my skills and puts everything on my plate even though I have no time" is pretty dangerous.
We made an experimental foray into the ML space in attempts to accelerate authoring of SQL queries.
After about a week of reading through literature and playing with open ai, it became pretty obvious we were still super far away from being able to build something the business could actually find value in.
My problem scales horribly with the ML training angle we have today, because it's the super complex one-off queries we need the most help with, not the simple ones we can anticipate and train against.
What we need is actual intelligence for many of our problems. Things like subjective criteria are important to us. Realizing maybe a recursive query is a fair compromise to reduce a 400 line monster to 30 lines. Assuming the 400 "looks nasty", that is. I guess you could train that bit too, but then your solution space gets even more impossible to target.
Missing in these advertisements-disguised-as-blogs is an estimate of effort or time. Let's say ML is critical to a new product or internal tool, how many man-years is it reasonable to invest? If your expectation is using ML like a drag-and-drop app builder then you're probably best off using a canned tool but then you won't really have any competitive advantage.
So, I went directly into data science after an econ degree, worked there for 2 years and then transitioned to SWE (at startups). First, I am 100% certain ML will become a part of the standard SWE toolkit (just like apis, docker, sql, etc..).
However, as to the relative "hardness", I would say ML currently involves fewer things, but they can be really hard to get your head around (like starting to think in embeddings and vectors), and many APIs are wonky because they are new. And the whole data science workflow space is in an early spot (we are still missing a "create-data-app", similar to create-react-app).
Standard ML applications will be commoditized in the same way that creating a website has been commoditized. But for anything that isn't standard, and that's the vast majority of what businesses need, ML isn't going to become part of SWE's standard toolkit anytime soon.
ML is kind of like tennis in the sense that you look at Nadal and Federer and all the greats and you can say, "man, I could do that," but you can't, not even close in this life and the next after this one.
In fact, most SWEs who have developed some sort of "intuition" about ML have been quite dangerous, as they tend to be condescending to the real experts, have built things that make no sense at all or fall apart when 1 data point out of 100,000 changes, and when presented with the fact that they have no clue about ML, resort to the "AI is all hype anyway" comeback.
And vice versa (ML practitioners like me who think they can do SWE with one eye closed, "what's the big deal?") is also true.
It seems to me the biggest challenge for SE transitioning into ML is that ML is a very broad topic and people conflate a lot of roles together. From purely research based questions (backbones, optimizers, initializers etc), to more 'MLOps' like pipelining questions, which tend to fall into the classical engineering / dev ops buckets. So the real question is what type of ML do you want to do?
If you're looking to land a job at FAIR / Deepmind or Google Brain/ Nvidia Research as a researcher or ML scientist the expectations of knowledge are very different than 'data science'. These are research lab groups, that work on pushing the state of the art forward. They are also supported by great engineers, building awesome tools that improve ML research. So transitioning into this sort of role requires more than doing Kaggle competitions, it requires developing an intuition for the respective ML subfield / and trying new things and usually failing. i.e. this is a research role and will require a lot of study and learning
If on the other hand you are looking for data science work (take a model and build a pipeline to run AI, perform hyperparameter sweeps, or simply modify some model code), then I would say that is much more engineering than research ML. This has a much lower barrier to entry coming from engineering and could be a good stepping stone to a transition into pure ML research.
On a more general note to consider when thinking of transitioning to ML is that these systems are probabilistic in nature vs purely deterministic as they are in more general software systems. People (ie humans) are bad at wrapping their heads around distributional processes - you can see this in all fields that deal with them (Quantum vs Classical Physics, Biological Systems etc).
In general I guess what I have seen is when engineers try to dip their toes into ML, what's required is a mindset shift in how to approach problems. Once that happens the depth of that shift determines the type of role with ML you wish to pursue.
It's interesting that you mention this, because there's quite an impressive resurgence of privately funded R&D going on in the ML space. We're in an interesting phase, where the field is moving too fast to have 'canonical' methodologies (though we're getting close). To be an engineer in the deep learning space often requires reading and keeping up with research. Everything just gets dumped on arXiv, because the peer-reviewed publication cycle is almost too slow for the field.
Making a successful transition to ML in my opinion, depends a lot on the individual. Without a strong background in calculus, linear algebra and statistics, it's going to be difficult. Training a model is what people tend to focus on, but in my opinion, that's the easy part. Evaluating/validating a model, analyzing and preparing your data, anticipating model performance, understanding what to do to improve your fit, model selection or architecture. Developing custom deep learning architectures at times requires a bit of an abstract mathematical intuition that I think will suit many engineers very well. A lot of engineers are well equipped to be successful in making a transition, but on the other hand, at least as many aren't.
In the future, I think the field will have many varying degrees of expertise, with the barriers to entry becoming lower all the time. We're reaching a point where some common use cases can be solved adequately in a nearly automated fashion. Some "autoML" tools don't really require any real understanding of ML, though I think it's not wise to get in the habit of using them without understanding how to evaluate a fit. These tools will be great for people who want to occasionally use ML to solve some smaller problems, but as a part of their larger job function.
In some middle area, ML engineers and practitioners will be training and operationalizing models, and keeping up with major developments in research. But there will be some significant changes in the next decade. I predict the nebulous mix of data science, data analysis and machine learning will become formalized into 3 major skills: exploratory data analysis, machine learning and advanced computational statistics.
At the lowest level, researchers will continue developing the field, which like you say, is probably not something you transition directly into.
One might as well write an article claiming that "UI design is still too hard for software engineers" or "controlling a nuclear power plant is still too hard for software engineers" -- which are true (and it is equally true that software engineering is hard for UI designers and nuclear power plant operators).
Who came up with this silly idea that something that is a valid knowledge domain of its own is suddenly going to become "easy"?
Another perspective, someone with no programming experience whatsoever today can in 1 minute create a new ML application for natural language tasks with performance that would blow most SOTA systems from 5 years ago out of the water, using things like the openai API. And this trend will almost certainly continue, where many tasks can be programmed by simply asking/describing the problem to a massive model and letting it work magic.
That might be true but my experience is that people are much better at building tooling than they are at using it properly.
So many organisations can't even get the basics of efficiency and customer service right, but they still invest heavily in cutting-edge tech. I can imagine plenty of companies joining the ML bandwagon and still not even really knowing what they do as a business.
If one were to build with the API from scratch, sure. But in theory all you really need to do to deploy a “new model” is just write a short prompt and click a “create endpoint” button.
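For a sense of what that looks like in practice, here is a rough sketch of a classifier defined entirely by a prompt, using the openai Python client; the engine name and exact client interface are illustrative and may differ from the current API:

    # A "new model" as nothing but a few-shot prompt: sentiment classification.
    import openai

    openai.api_key = "YOUR_KEY"  # placeholder

    prompt = """Classify the sentiment of each review as Positive or Negative.

    Review: "The battery died after two days."
    Sentiment: Negative

    Review: "Shipping was fast and the fit is perfect."
    Sentiment: Positive

    Review: "The manual is confusing and support never replied."
    Sentiment:"""

    response = openai.Completion.create(
        engine="davinci",   # illustrative engine name
        prompt=prompt,
        max_tokens=1,
        temperature=0,
    )
    print(response.choices[0].text.strip())  # expected: "Negative"

No training loop, no labeled dataset beyond the examples in the prompt itself.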
Most CS grads coming onto the engineer market now will have ML exposure through their chosen college courses. As this wave of knowledge makes its way through the industry, the value of ML specialist knowledge will decline, especially as off-the-shelf pre-trained models improve. Very few companies then will be able to justify the luxury of a dedicated in-house data science/ML engineering team. In other words, most software engineers will be ML engineers, in the same way most software engineers today are Docker/cloud proficient engineers.
Docker is a tool: one can read the documentation and be able to use it. ML, much like math, is a discipline. Sure, you can read textbooks and the latest papers. But deep understanding and experience are what make it useful. Sadly, those take much longer.
Sort of hijacking, but I've always wondered: Where are our 'neural binutils'?
I want to be able to compose these tools like I would random unix ones: Something like 'Identify album covers in this image | extract the text in said covers | spotify api'.
It seems like there are so many breakthrough models but both due to technical (size/compute) and industrial ($$$) concerns they remain out of reach for random devs, let alone packageable into a `grep` style composable tool.
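Purely as a thought experiment, the composition might read something like the Python below; all three stages are hypothetical stubs standing in for packaged models, not real tools or libraries:

    # Sketch of "neural coreutils" composition; every stage is a hypothetical stub,
    # filled in with dummy return values so the pipeline runs end to end.
    def detect_album_covers(image_path):
        # Hypothetical stage: an object detector returning cropped cover images.
        return [f"{image_path}#crop0", f"{image_path}#crop1"]

    def ocr_text(crops):
        # Hypothetical stage: OCR over each crop.
        return [f"text-of({c})" for c in crops]

    def spotify_search(queries):
        # Hypothetical stage: look up each text query against the Spotify API.
        return [f"spotify-result-for({q})" for q in queries]

    # The unix-pipe spirit: each stage consumes the previous stage's output,
    #   identify-covers photo.jpg | extract-text | spotify-search
    print(spotify_search(ocr_text(detect_album_covers("photo.jpg"))))

The hard part, as the replies below point out, is agreeing on the data format flowing between the stages.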
Assuming you mean coreutils. binutils is for managing/inspecting binary executables.
But to your point: there were two key innovations and criteria of UNIX pipelines: a common and understandable data format, and writing programs to send and receive anonymous data. Crucially, the input and output formats were the same: plain text, separated by newlines.
In contrast neural networks are applied to a variety of data formats. Images, video, audio, text, social networks etc. each with their own encoding into something an NN can work with, with varying dimensions, features, metadata etc. So it doesn't make sense to bundle them as 'neural utils' but rather utils along whatever pipeline already exists, like GraphicsMagick. Which does leave a huge blind spot for the domain transforms like text recognition.
If you stay within the AI ecosystem, you _can_ set up reusable layers for TensorFlow, but typically you can't swap out something in the middle without retraining all the layers below it. You might treat that as a violation of the anonymity criterion, since the behavior / performance of a layer relies on the specific behavior of those above it.
There's still a lot of opportunity to introduce ML into the classic plain text Unix utilities. Even if you have to retrain, there's still room to improve over existing tools. One example is learned sort, which outperforms radix sort, even including the time it takes to train the model.
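A toy sketch of the idea behind learned sort (not the actual algorithm from the papers, and it won't beat a tuned library sort): model the keys' CDF from a sample, use it to place each key near its final position, then clean up locally:

    # Toy "learned sort": a cheap CDF model places keys into nearly sorted buckets.
    import random

    def learned_sort(keys, n_buckets=256, sample_size=1000):
        sample = sorted(random.sample(keys, min(sample_size, len(keys))))
        lo, hi = sample[0], sample[-1]
        span = (hi - lo) or 1.0

        # "Model": a linear approximation of the empirical CDF. A real learned
        # sort would fit a spline or a small model here.
        def predicted_rank(x):
            return min(n_buckets - 1, max(0, int((x - lo) / span * n_buckets)))

        buckets = [[] for _ in range(n_buckets)]
        for k in keys:
            buckets[predicted_rank(k)].append(k)

        out = []
        for b in buckets:          # each bucket covers a small value range,
            out.extend(sorted(b))  # so the per-bucket cleanup sort is cheap
        return out

    data = [random.gauss(0, 1) for _ in range(10_000)]
    assert learned_sort(data) == sorted(data)

The model doesn't have to be accurate, only monotonic and cheap; the final per-bucket sort absorbs its errors.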
The challenge for the engineers at our AI startup is that deterministic testing paradigms don't adapt well to probabilistic models that are continually being retrained. As a scientist, it's hard to convey the acceptable range of variance, and the often random change of individual predictions at the decision boundary. It's also hard to debug behavioral issues that are actually systematic model failures versus those that are traditional infrastructure bugs. Oftentimes the band-aid is to build lookup tables to ensure certain behavior, which in turn also keeps underlying issues from being discovered.
Testing paradigms are either too high-level or too specific. Recent work on evolving behavioral tests addresses this, but it requires more manual effort and interpretation, which kinda defeats the point of automated tests.
Good discussion. I have often wondered about that interface.
This bears more resemblance to traditional manufacturing actually. I think there may be some value in borrowing ideas from statistical process control, rather than trying to force predictions into deterministic cases.
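For example, a minimal statistical-process-control sketch for a model in production: track a summary statistic per batch and alert only when it leaves control limits estimated from a baseline period (the numbers and thresholds below are illustrative):

    # Shewhart-style control chart on the mean predicted score per batch.
    import statistics

    baseline_batches = [0.61, 0.59, 0.62, 0.60, 0.58, 0.63, 0.61, 0.60]  # mean score per batch
    center = statistics.mean(baseline_batches)
    sigma = statistics.stdev(baseline_batches)
    ucl, lcl = center + 3 * sigma, center - 3 * sigma  # 3-sigma control limits

    def check_batch(mean_score):
        if mean_score > ucl or mean_score < lcl:
            return "out of control: investigate data drift or a pipeline bug"
        return "within normal variation: no action"

    print(check_batch(0.62))  # within limits
    print(check_batch(0.45))  # flagged

Individual predictions flipping at the decision boundary stop being "test failures"; only shifts in the distribution are.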
I think it goes without saying that this is too relative. One thing that is common in both fields is the level of research and reading docs it entails.
I used to tell my students "software engineering is 70% reading and 30% coding". This remained consistent as I dove into Data Science and now ML with computer vision at Roboflow.
Of course the first time I was exposed to it during a fellowship, I thought I was out of my depth, but this comes with everything new.
To @deepsun's point, I've found Kaggle's intro courses quite excellent as well.
Physically training the models isn't the hard part. It's knowing why you're using X method, how to interpret the results, and what dials to turn that's the hard part. And there's no automating that.
This line "Set up low-latency, elastic, highly-available, and cost-effective inference close to your data." is the problem I've found that annoys everyone - Software Eng don't understand MLOps, Data Scientists don't understand systems programming, and everything ends up costing way too much money and taking way too much time.
Even with a magic API the latency still isn't good enough, so the choice is often an entire ML solution like the OP's product, months of development time, or a really expensive container.
The best tutorials for indoctrinating software engineers in ML are written accessibly & use real-world business-case data, or easily comprehended data sets. One great example of this, from graph theory, was this fantastic 2013 article by Kieran Healy: https://kieranhealy.org/blog/archives/2013/06/09/using-metad...
There's no getting around the complexities of fit, bias and customized models for many ML problems, so my observation above is obviously limited in its applicability.
I really like the premise, and I really agree with it, but I don't think a SaaS is the solution. The solution is finding better abstractions that make things simpler and easier for developers, using code, and without limiting their flexibility. But that's the hardest thing to do!
As an experienced software developer who used to learn a new framework every week, I thought ML was going to be a piece of cake. In reality, I went down this rabbit hole 4 years ago and I'm still in there. I was so innocent back then.
The issue with abstractions is that ML isn't just a few variables or dissimilar systems to choose from, the whole problem domain that could be helped with ML has so many dimensions. Natural language, speech processing, pattern recognition etc. are all completely different. The only thing they have in common is that you might use computers for them.
I think that is why people find ML so hard. It isn't a single sausage machine to create insight from, it is a set of entire stacks from philosophy all the way down to the electronics.
Speaking as a senior frontend engineer: practice makes perfect, and frontend is easy-peasy. ML/AI, on the other hand, seems to have this thick wall of math around it if I really want to understand what makes the models tick.
Yeah, it felt like a salesperson being confused about why they can't become an expert full-stack software engineer on up-to-date modern stacks without a lot of effort and time.
I'd argue that integrating ML into a project isn't the hard problem, as it was described in this article. Most people will, after some research, be able to build some basic ML functionality using popular libraries and a bit of Python.
The hard thing about ML is when you try to actually understand it. I've met many "ML engineers" who do their job but do not actually understand what they're doing. Understanding ML doesn't have a lot to do with coding. It's math and statistics.
Nice read. However, it seems a bit fishy that the website publishing this article is a company offering ML as a service. They as a company directly profit from people not wanting to learn ML themselves. I'm not saying they wrote this article to increase sales, but it's a thing to keep in mind.
The article contains lists with bullet items and then suggests that if these were "easy", ML would be more accessible. Well true, but much of the "difficulty" is in establishing faith that the models are meaningful and valuable.
The article is talking about applications where the value is less-or-equal to the work of learning and applying some library (it mentions days or weeks). What is the actual value of applying ML in these cases?
In general, getting a phd is the best way to go but it is not the only way.
"The AI Epiphany" channel by Aleksa Gordić is worth watching.
Check out his origin story: https://www.youtube.com/watch?v=SgaN-4po_cA
He works at DeepMind. He is self taught; without a phd.
I tried to build on top of fast.ai, and it was very easy to start, but all the hooks and magic in fast.ai 2 just made it extremely hard for me to understand and extend the code. I believe it went in a bad direction.
When TensorFlow was the dominant API, fast.ai made a lot of sense. Nowadays PyTorch seems to have won; it is easy to use directly, and there is PyTorch Lightning if you want things even easier to write.
There are several videos on YouTube where the creator reviews a paper and then implements it from scratch. For example, this channel is pretty good: https://www.youtube.com/c/AladdinPersson/playlists
You're heralding literal geniuses like Jeremy Howard as examples of "PhD not required". Yes, if you are Jeremy Howard or Leonardo DaVinci, sure you'll excel at this stuff without studying a degree.
Look it’s just buzzwords all the way down. There’s only a handful of people who can do this stuff. {Your business} will drop some buzzwords, someone will write an if statement, management high fives all around and fat bonuses. Deal with it.
lol I did an ML course in 2017. I am a mechanical engineer with some computer skills and managed to build a lot of TensorFlow models quite successfully. It's funny to read that this could be hard for software guys.
That's the essential complexity. The OP is correct that there's far more accidental complexity involved than seems strictly necessary. In an ideal world, I wouldn't have had to spend the last few years learning software engineering in order to be a better data scientist.
In the gocoder Bomberland competition, my DL AI trained on 1.5 billion timesteps just lost against someone handcrafting algorithms in Python in one day.
You're absolutely right, sometimes, simple and predictable solutions are much better than AI magic ^^
Good question. We have been doing this for almost 2 years now and we still find new players almost every week! It's a bit of a wild west for sure.
I can't say what we do different from everyone, but a few things that we focus on:
* Speed: we train models based on DL in seconds. So you get real-time feedback on your model/data as you annotate and upload more. This is true for a few, but far from all of our competitors. In our benchmarking we find that we still perform on par with the competition (at least in the "low-data" regime https://www.nyckel.com/blog/automl-benchmark-nyckel-google-h...)
* Level of abstraction: Many competitors expose some ML knobs for their users thinking it will improve the experience. We found that this induces "ML anxiety" for many. As a result we have zero knobs. Just focus on your data, we do the rest.
* API: we have spent a ton of time developing clean API abstractions. Some competitors have great APIs, others don't.
* Cost: we are super cheap. Our lowest tier is $50. We don't charge for training or per function/model.
"Machine Learning is too difficult" ... says company selling "ML platform [which] can be used by anyone and it only takes minutes to train your first model."
No, actually, you're just being dishonest. Even if you hide TensorFlow and the keras models behind a nice GUI, people still need that mathematics knowledge to succeed. And yes, pre-training is great. But you need a shitload of stochastic analysis to make sure that the pre-trained embedding won't distort your results.
"For a software engineer, the hardest thing about developing Machine Learning functionality should be finding clean and representative ground-truth data, but it often isn’t."
That is (in my opinion) an entirely bogus request. Machine learning is a mathematical / statistical tool for modeling large unknown functions. I feel like this sentence is akin in usefulness to:
"For a nuclear power plant, the hardest thing about building one should be to draw how the finished building will look like in the press release".
Someone "doing" machine learning without the requisite math knowledge is effectively driving blind. And worse than that, they don't even know what they don't see, because they lack the skills to identify their blind spots. That's how you end up with a "tank detection AI" that in reality just classifies the weather into bright vs. dark. [1]
Companies like this who promise advanced mathematical algorithms with no prior skill or knowledge are how we unleash a plague of buggy unverified automatons upon the world.
Who cares if you overfitted? See, the model has 100% success rate vs the training set!
Who cares if it denies bail to minorities or hits a few pedestrians from time to time?
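To make the overfitting point concrete, here is a toy illustration (scikit-learn, random data) of why "100% on the training set" proves nothing:

    # An unconstrained tree memorizes pure noise perfectly and learns nothing.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))
    y = rng.integers(0, 2, size=1000)        # labels are random noise
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier().fit(X_tr, y_tr)
    print("train accuracy:", model.score(X_tr, y_tr))   # 1.0 -- looks "perfect"
    print("test accuracy: ", model.score(X_te, y_te))   # ~0.5 -- coin flipping

If you never look at held-out data, this is exactly the model you end up shipping.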
The problem isn’t that ML is too hard, it’s that it’s too easy. Crazy people keep connecting ML to systems that matter- that have real, irreversible impact to humans- and they don’t understand it.
I wish ML were 1000x harder/more expensive to integrate so the economics would drive away frivolity.
I've seen it time and time again: Team has a black box ML/AI solution to a "problem." Team wants to eke out better P/R or deal with some complex edge cases. But team's problem is fundamentally ill-posed and no amount of hacking or kludges will actually produce the success criterion that they need.
The problem is the accessibility of these tools, which in many cases has led folks to neglect the subject matter expertise required to effectively apply them in the first place. At least as these tools catch on in popularity in myriad problem domains, there will be a new generation of subject matter / domain experts who are familiar with them, and we'll probably jump over this hurdle.
This goes a little too far. For traditional ML, sure, you need lots of deep statistical knowledge. But the fact is that deep learning is different: it's mostly a black box. No one understands exactly what these models are doing, how they're biased, and how exactly they understand things differently than humans, whether you have a PhD in statistics or not.
Because of that, doing deep learning consists of a bunch of cobbled together heuristics for getting good results and probing the model to give a human an intuition for whether it's learning correctly. The tricks and tips for steering that black box have mostly been developed in the last decade: it is not a super deep well.
These tricks and heuristics are like the knowledge needed to be a technician in a nuclear facility, not the knowledge needed to build the nuclear facility in the first place. It's not nothing, to be sure, but unless you're a researcher developing new novel architectures, a very shallow understanding of the statistics will go a very long way.
It's no different, statistical knowledge is still needed to draw the best possible inferences out of the combination of limited data points and prior general information with varying strengths/confidence levels attached. Not to mention that loosely "black box" methods have a long history of their own in non-parametric and semi-parametric stats, so it's not like deep learning is doing anything radically different.
The "deep learning is a black box" meme is about 5 years past it's due date. It's not as tight as for convex models but we do understand what's going on inside, just not perfectly yet.
I think we're talking about different levels of understanding. For things like convex optimization, we have optimality results. For deep learning we have "try to stop training early because it tends to get overfit if you run it too long", "increase the number of parameters in the transformer to magically get uncanny impressions of written text out". These are not the same kind of understanding.
If I hand you a 175B parameter language model, are you really contending we know what's going on in there? At a mechanical level, sure, it's tensor products and activation functions, but that's like saying we know how human brains work because the standard model is very predictive.
Well, the thing is, we do know how human brains work, just not perfectly. A black box is something you can't look inside and don't understand, which we left behind years ago. We can even interpret parts of it. And if you hand me a 175B parameter language model without skip connections I can tell you exactly what will happen because we understand what role they play and how attention decays rank.
Note I'm not saying we perfectly understand everything, or that our understanding is as solid as it is for convex models or linear regression. But "black box" just isn't true anymore and just adds unnecessary mystique. We are somewhere between Newton and the Kelvin-and-Faraday era of physics, no longer in the ancient alchemist days.
Not sure why you're making a distinction between ML and deep learning here. Both of them can be black boxes. Calculating the area of a square can be a black box if all you know is how to plug numbers into the formula.
A big part of machine learning is looking at weights and outputs to make sure the results are sane and that you have an understanding of what's going on. This is true no matter what algorithm you use to make predictions.
Even so, there are procedures, protocols, and best practices for working with (and validating) black boxes, acquiring which may require time, skill, and patience.
Sure, but it's finite, reasonably circumscribed, and honestly not that mathy.
I mean, even the example given by the OP about the tanks is super well known (apocryphal[0]) and doesn't require math knowledge to avoid. You just have to have heard of this kind of failure mode
> You just have to have heard of this kind of failure mode
Yes exactly. You have to be aware of it, you have to know what it entails and what can cause it and how to diagnose and fix it.
That’s the other half of the domain knowledge, and just “autoML-ing it” or following some set of prescribed steps won’t necessarily get you that solution.
Resume driven development is real. Who wants a crud app on their resume when they can have a crud + ML app on their resume? I remember back in like 2016 recruiters devoured anyone with the slightest bit of ML experience on their resumes: it fed back into the ego of developers, and suddenly everyone was an ML expert who could do no better than load a JSON of data and import keras. What a strange trip that time was
Snowflake is not like the others in that list, as they just provide a pretty good SQL dialect over cloud storage, and a tolerable UI and API access, for a pretty indeterminate price (who knows how much a credit is worth this quarter?).
Personally, if there's ever a downturn, I plan to play Snowflake sales people off each other and get enough credits to last me a lifetime ;)
"For a software engineer, the hardest thing about developing Machine Learning functionality should be finding clean and representative ground-truth data, but it often isn’t."
Which is so much bullshit. The hardest thing is validating your hypotheses, which machine learning turns into a black box. When we have coworkers who insist on operating on wishful thinking we try to maneuver them out of a job. Except every 10-15 years when the built up pressure of fads overwhelms reason and we all get stupid for a generation (which in software is about five years).
The things that started as AI that we don’t call AI anymore, and don’t lump in with AI when discussing successes or failures? It’s because they can be explained in plain English and implemented without much or even any special jargon that marks it as anything more than exceptionally clever Logic.
Software developers HAVE to have an understanding of the subject they're developing for. Computers are not brains, and they are not able to understand the objective or context in which they run.
I could spit out their crappy tagline - "the hardest thing about developing __X__ should be __Y__, but it often isn’t." - for almost any topic.
"The hardest thing about developing an inertial navigation system should be getting clean sensor readings, but it often isn't"
"The hardest thing about developing MITM proxies should be getting certs configured, but it often isn't"
"The hardest thing about developing web extensions should be setting up your manifest file, but it often isn't"
"The hardest thing about web development should be handling https requests, but it often isn't"
"The hardest thing about having a baby should be labor, but it often isn't"
"The hardest thing about making a car should be getting high quality steel, but it often isn't"
Unfortunately there are a shitload of ML ‘experts’ out there who do not know what they are doing but still get results that are good enough to not get fired and receive copious amounts of money every year. These tools help doing that; companies generally don’t see the difference anyway and they don’t know how to set or evaluate KPIs on these ventures; they don’t even know what or why they are asking for; they just know they need to show progress with AI to not become obsolete.
Except, of course, that they are becoming obsolete because if all of your AI progress is a black box operated by someone else, you have precisely 0 competitive advantage over someone else being equally clueless and purchasing from the same vendor.
I work at Nyckel. In fact, I'm the "ml guy" at Nyckel. I have a PhD in ML and did some research at Berkeley, but I mostly consider myself a ML engineer. My most recent job was in the self-driving car industry, leading a ML team there.
Knowing the math/stats is helpful when navigating the vast set of models to choose from when fitting your data. Although I'd argue that some sort of black-magic "intuition" earned by doing this for a long time is more important in practice...
However, when validating a model, there is really only one way: test it on production data. This is what Nyckel does: upload your production data, do some annotations, and see if it works. Nyckel handles model search, cross validation, etc for you which reduces the risk of bugs. In a way we are making the argument that by focusing on your data, you are most likely to do well.
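For readers less familiar with the terms, model search plus cross validation done by hand looks roughly like the sketch below (scikit-learn with stand-in data; this illustrates the general practice, not how our system is implemented):

    # Compare candidate models by 5-fold cross validation on a labeled sample.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Stand-in for a labeled sample of your production data.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)   # held-out score per fold
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

Doing this by hand is where bugs creep in (leaking test data into training, tuning on the test fold), which is exactly the part we try to take off people's plates.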
But what about that pesky out-of-domain issue? Like the tank/cats or whatever? Well, our customers are not trying to develop AGI, but solve narrow problems using image and text classification. And they are also doing it for themselves so they have all the incentives to be honest. Consider one example use-case from a health food store we work with: "what type of legume (from the 10 I offer in bulk) is in this picture"? As long as they train and test on production data from the warehouse camera stream, they are in good shape from a statistical perspective. Sure, if they throw in a picture from anywhere else, they are toast, but why would they?
I believe it is a very common mistake for intelligent people to assume that others will behave at least reasonable. But in my experience, when people do AI without understanding it, all bets are off.
"Sure, if they throw in a picture from anywhere else, they are toast, but why would they?" Since you list a Barcodeless Scanner as an example, the manufacturer of strawberries might run a promotion for blueberries on their box. For a non-expert user, it is unimaginable that a model trained on 3D blueberries might be triggered by a 2D photo of blueberries.
Also, I'm going to go with your legume example. As soon as each new truck arrives, the intern runs out and takes photos of the legumes in their boxes for the AI training. He uploads the images to your website and trains a model. TADA! The model is deployed to production and starts causing issues. But the people working alongside the fancy new celebrated machine don't want to lose their job, so they silently fix what's going wrong. You've just reduced productivity by introducing a costly machine.
Turns out, the different suppliers arrive at different times of day, so the lighting is different. And different suppliers use different box types. But without expert domain knowledge, you wouldn't even consider that this might be a problem. Also, why do you assume the customer will verify their model on independently sampled production data? To someone lacking the domain knowledge, using the exact same set of photos for training and for verification seems just fine. Actually, it's a lot less work that way.
That's what I tried to get at with my blind driver analogy. An untrained person will do things that seem absurdly unreasonable to us. But to them, it's the logical choice. They lack the knowledge to properly understand why what they are doing might be problematic.
Based on your description, however, it sounds like you (and your team of experts) are actively working with this customer and giving them feedback on what to do and how to do it. Have you considered making that part of your offering?
"Use Nyckel to integrate state of the art machine learning into your application. Anyone can curate their data set with our ML platform. A quick chat with an experienced AI engineer helps identify the best model and training procedure for your use case. It only takes minutes to finish your first model. Once created, your functions can be invoked in real-time using our API."
I'm pretty sure any serious business user would be happy to spend $100 for a 15 minute chat with someone that checks that their data is OK and their approach is reasonable. And it's also a nice way to segment out those that'll never become paid users anyway.
If the data is garbage then it doesn't really matter how good your maths knowledge is, I challenge you to get a working "tank detection AI" when you are just training it on pictures of different cats.
The nuclear power industry is only just starting to think about moving away from doing all designs on paper; maybe in a few decades they will have achieved this. Sending the message that good data is the thing to work on first isn't a bad idea.
While driving, do you understand how the engine ECU of your car computes how much fuel to inject into the cylinder, how the ECUs distribute the power and the braking force to individual wheels, or how your rear-wheel steering calculates the turning angle of the rear wheels based on your speed and steering input?
You don't need to know most of the details how your car works in order to drive. You need much more knowledge to build one, yes, but not to drive.
There are different levels of abstractions and depending on your problem you need to understand them only up to a certain level. And different people have different problems to solve.
In most real-world problems today, the difficult part is indeed the data, not the underlying math of the activation function, loss function, or optimizer. Just Google "data-centric AI Andrew Ng" to read more on the topic from one of the most well-known people in ML.
Except that we can build cars that work. Whereas for DL AI, in most practical applications, there is like 10% edge cases where things just randomly explode. But don't take it from me, just read "Distributional Reinforcement Learning with Quantile Regression" by Google Brain and Deepmind and they'll tell you
"Even at 200 million frames, there are 10% of games where all algorithms reach less than 10% of human. This final point in particular shows us that all of our recent advances continue to be severely limited on a small subset of the Atari 2600 games."
In short, current AI approaches cannot even reliably win video games from 40 years ago, no matter how much $$$ you burn on GPU power.
How do you expect a non-expert to know if their problem is in the 10% that works well, the 80% that works tolerably, but worse than traditional algorithms, or the 10% where all bets are off?
I agree with this. The emphasis of any product that wants to democratize ML should be on making it easy for lay people to train models, and to collaborate with ML experts.
ML has already added actual value to so many industries. While the AGI fantasy may never happen and we may even see another AI winter, the existing products and solutions that you use every day are powered by ML.