
XGBoost, LGBM, pmdarima, stanpy (for Bayesian modelling). Plus a few others.

Don't ask me what they do with all of these, I'm just the guy who makes sure the forecast stays reproducible.




Yeah boosted tree models are the shit for tabular data


What does that make you titlewise? Data Engineer? ML Engineer?


A Software Engineer. I'm just specialised a bit in DevOps, Data Engineering, and (beware buzzword) MLOps


What's MLOps? Is it what I imagine?

Maintaining repos of training/data, APIfying the pipeline, deploying an ML processing pipeline with CI/CD, etc?


You got it. It's unbelievably difficult to get model devs out of the mindset of training on their own VM, saving model outputs and metrics dumps to arcanely named file shares, etc. Once you can convince them that using stuff like workflow pipelining tools and centralized model repo servers isn't going to impede their creative process and that it prevents the mad scramble to find artifacts when there's turnover on the team, things become much more efficient.
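
To make the "centralized model repo" idea concrete, here is a minimal sketch, assuming a plain directory stands in for the repo server. The function names (`register_artifact`, `load_artifact`) and the metadata fields are made up for illustration; a real setup would more likely use a dedicated registry tool. The point is only that every artifact gets a content-derived ID and its metadata travels with it, so nobody has to guess which file on the share is the right one.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def register_artifact(registry_dir, model_bytes, metadata):
    """Store model bytes under their SHA-256 hash, with metadata alongside."""
    registry = Path(registry_dir)
    registry.mkdir(parents=True, exist_ok=True)
    # The ID is derived from the content, so identical models share one ID.
    artifact_id = hashlib.sha256(model_bytes).hexdigest()
    (registry / f"{artifact_id}.bin").write_bytes(model_bytes)
    (registry / f"{artifact_id}.json").write_text(
        json.dumps(metadata, sort_keys=True, indent=2)
    )
    return artifact_id


def load_artifact(registry_dir, artifact_id):
    """Load a model and its metadata, checking the bytes still match the ID."""
    registry = Path(registry_dir)
    model_bytes = (registry / f"{artifact_id}.bin").read_bytes()
    if hashlib.sha256(model_bytes).hexdigest() != artifact_id:
        raise ValueError("artifact content does not match its ID")
    metadata = json.loads((registry / f"{artifact_id}.json").read_text())
    return model_bytes, metadata


# Hypothetical usage: the bytes and metadata values are placeholders.
registry_dir = tempfile.mkdtemp()
artifact_id = register_artifact(
    registry_dir,
    b"fake-model-weights",
    {"git_commit": "abc123", "seed": 42, "metrics": {"mae": 1.5}},
)
```

When someone leaves the team, the ID plus the JSON next to it is enough to find the model and see how it was produced.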


Pretty much this. "ML engineering" has come to refer to the somewhat specialized task of implementing models and algorithms, and "ML ops" has come to refer to all of the other stuff that you just mentioned.


Don't forget "keeping the train data from leaking into the test data" and "having a way to reproduce the exact same model I trained last week". Those two are too often forgotten.
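
A rough sketch of both points, assuming each record has a stable ID (`customer-…` here is invented). Hashing the ID decides the split, so a record can never drift between train and test when the data is reshuffled or grows. For forecasting you would probably split by time instead; this only illustrates the deterministic-assignment idea. `train_toy_model` is a stand-in for real training that draws all its randomness from one seeded generator.

```python
import hashlib
import random


def assign_split(record_id, test_fraction=0.2):
    """Deterministically map a record ID to 'train' or 'test'."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_records(record_ids, test_fraction=0.2):
    """Split IDs into (train, test) lists using only each ID's hash."""
    train, test = [], []
    for rid in record_ids:
        if assign_split(rid, test_fraction) == "test":
            test.append(rid)
        else:
            train.append(rid)
    return train, test


def train_toy_model(train_ids, seed):
    """Stand-in for training: all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


ids = [f"customer-{i}" for i in range(1000)]
train_ids, test_ids = split_records(ids)

# Same seed and same inputs give the same "model" on every run.
model_a = train_toy_model(train_ids, seed=42)
model_b = train_toy_model(train_ids, seed=42)
```

Real training needs more than a seed to be exactly reproducible (pinned library versions, the same data snapshot, sometimes the same hardware), but a recorded seed and a deterministic split are the cheap first steps.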



