XGBoost, LightGBM, pmdarima, PyStan (for Bayesian modelling). Plus a few others. Don...

NeutralForest · on Nov 3, 2022

Yeah boosted tree models are the shit for tabular data

chirau · on Nov 3, 2022

What does that make you titlewise? Data Engineer? ML Engineer?

Fiahil · on Nov 3, 2022

A Software Engineer. I'm just specialised a bit in DevOps, Data Engineering, and (beware buzzword) MLOps

moralestapia · on Nov 3, 2022

What's MLOps? Is it what I imagine?

Maintaining repos of training/data, APIfying the pipeline, deploying an ML processing pipeline with CI/CD, etc?

navbaker · on Nov 3, 2022

You got it. It's unbelievably difficult to get model devs out of the mindset of training on their own VM, saving model outputs and metrics dumps to arcanely named file shares, etc. Once you can convince them that using stuff like workflow pipelining tools and centralized model repo servers isn't going to impede their creative process and that it prevents the mad scramble to find artifacts when there's turnover on the team, things become much more efficient.

nerdponx · on Nov 3, 2022

Pretty much this. "ML engineering" has come to refer to the somewhat specialized task of implementing models and algorithms, and "ML ops" has come to refer to all of the other stuff that you just mentioned.

Fiahil · on Nov 3, 2022

Don't forget "keeping the training data from leaking into the test data" and "having a way to reproduce the exact same model I trained last week". Those two are forgotten too often.
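
For illustration only, here is a minimal sketch of those two points in plain Python (standard library only). It is not anyone's production setup, and every name in it (`assign_split`, `run_fingerprint`, `toy_train`, the `customer-N` IDs) is hypothetical. It assumes each record has a stable ID: the split is derived from a hash of that ID, so a record can never move between train and test across reruns, and the run is identified by a hash of the training IDs, hyperparameters, and seed.

```python
import hashlib
import json
import random


def assign_split(record_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a record to 'train' or 'test' from its ID."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets; the same ID always lands in the same bucket.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "test" if bucket < test_fraction * 10_000 else "train"


def run_fingerprint(train_ids, params: dict, seed: int) -> str:
    """Hash the inputs that determine the trained model (hypothetical helper)."""
    payload = json.dumps(
        {"train_ids": sorted(train_ids), "params": params, "seed": seed},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def toy_train(train_ids, params: dict, seed: int) -> list:
    """Stand-in for real training: all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(params["n_weights"])]


# Hypothetical record IDs, used only to exercise the functions above.
ids = [f"customer-{i}" for i in range(1000)]
train_ids = [i for i in ids if assign_split(i) == "train"]
test_ids = [i for i in ids if assign_split(i) == "test"]

params = {"n_weights": 3}
fingerprint = run_fingerprint(train_ids, params, seed=42)

# Same inputs and same seed give the same "model".
model_a = toy_train(train_ids, params, seed=42)
model_b = toy_train(train_ids, params, seed=42)
```

Two caveats. Hashing an ID only prevents leakage if the ID identifies the thing that must not straddle the split (for example the customer, not the individual row). And in a real training stack a fixed seed is necessary but often not sufficient, since library versions, thread counts, and GPU kernels can also change the result, so those would need to be recorded alongside the fingerprint.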