I am a long-time practitioner of putting machine learning tools into production, improving ML models over time, doing maintenance on deployed ML models, and researching new ways to solve problems with ML models.
All I can say is that, based on my experience, I would dramatically disagree with what you wrote.
I’ve always found pre-existing generalist engineering tooling to work more efficiently and cover all the features I need in a more reliable and comprehensive way than any of the latest and greatest ML-specific workflow tools of the past ~10 years.
I’ve also worked on many production systems that do not involve any aspects of statistical modeling, yet still rely on large data sets or data assets, offline jobs that perform data transformations and preprocessing, extensively configurable parameters, and so on.
I’ve never encountered or heard of any ML system that is in any way different in kind than most other general types of production engineering systems.
But I have seen plenty of ML projects that get bogged down with enormous tech debt stemming from adopting some type of fool’s gold ML-specific deployment / pipeline / data access tools and running into problems that time-honored general system tools would have solved out of the box, and then needing to hack your own layers of extra tooling on top of the ML-specific stuff.
I was going to make a narrower comment along these lines: I have put models into production with GitLab CI, Make and Slurm, and it keeps us honest and on task. There’s no mucking about with fairy-dust data science toolchains, and no excuse not to find a solution when problems arise, because we’re using well-tested methodology on well-tested software.
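For what it's worth, the kind of setup I mean can be sketched with nothing more than a Makefile driving Slurm jobs. (A minimal sketch; every target name, path, and sbatch option here is a placeholder, not from any real project.)

```make
# Hypothetical pipeline: build the training data, train on the cluster,
# then deploy the artifact. All paths and flags are illustrative.

DATA  := data/train.csv
MODEL := artifacts/model.pkl

$(DATA): scripts/extract.py
	python scripts/extract.py --out $@

$(MODEL): $(DATA) scripts/train.py
	mkdir -p artifacts
	# srun blocks until the cluster job finishes, so Make's
	# dependency tracking still works across the cluster boundary.
	srun --partition=gpu --gres=gpu:1 \
		python scripts/train.py --data $(DATA) --out $@

.PHONY: deploy
deploy: $(MODEL)
	# A GitLab CI job just runs `make deploy` on a runner with
	# cluster access; CI provides the audit trail and history.
	rsync $(MODEL) deploy-host:/srv/models/
```

The point is that Make already gives you incremental rebuilds, dependency tracking, and reproducible targets, and CI simply runs `make` and records the result; no ML-specific orchestration layer required.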