
Everyone repeat after me: "we need a baseline model".

You should always try some "dumb" models first. You'd be surprised how hard it is to beat a historical-average model with a more sophisticated method (depending, of course, on your KPIs).
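(A minimal sketch of what such a baseline can look like, in Python/pandas; the series and its values are made up for illustration:)

    import pandas as pd

    # Toy daily series (values are made up).
    y = pd.Series(
        [102, 98, 110, 95, 105, 99, 101],
        index=pd.date_range("2020-01-01", periods=7, freq="D"),
    )

    # Baseline 1: historical average -- predict the mean for every future point.
    mean_forecast = y.mean()

    # Baseline 2: naive -- predict the last observed value.
    naive_forecast = y.iloc[-1]

Any fancier model has to beat these before it has earned its complexity.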




Not to mention the plethora of issues that arise from trying to fit an ARIMA onto an AR(1) process... It's weird that people just jump into using insanely complicated models right off the bat.
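(To make that concrete, a small sketch with statsmodels on synthetic data; the AR coefficient and the model orders are made up for illustration:)

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Simulate a plain AR(1) process: y[t] = 0.7 * y[t-1] + noise.
    rng = np.random.default_rng(0)
    y = np.zeros(500)
    for t in range(1, 500):
        y[t] = 0.7 * y[t - 1] + rng.normal()

    # Correctly specified: ARIMA(1, 0, 0) is exactly an AR(1).
    simple = ARIMA(y, order=(1, 0, 0)).fit()

    # Over-specified: extra AR/MA terms and differencing the data doesn't need.
    bloated = ARIMA(y, order=(5, 1, 5)).fit()

    print(simple.aic, bloated.aic)  # the simple model typically wins on AIC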


People love making complicated models. It’s fun and feels like you are having insights. Fitting a boring old ARIMA feels like work.


I've seen this in real time. I don't do statistics as part of my day job, but I've had enough experience, and keep up with the field enough, to know what I'm talking about. I've seen senior engineers try to ram in an over-specified ARIMA model just to claim they'd "improved" the system. It performed far worse than whatever model we were using before (I never got to look under the hood of that one, unfortunately), was prone to wild swings in its forecasts, and was eventually deprecated in favor of the old model.


But how am I going to get that VC money if I don't say "deep learning"?


If NN (Neural Network) beats baseline, present the NN solution.

If baseline beats NN, present NN as the baseline, and say you have an algorithm even better than NN.

(Joke only.)


I mean... you can always appeal to “old school” AI. Just dig into the old papers and use their words. Latent semantic analysis (LSA) is an example of a hard-to-beat baseline model for text:

“By inducing global knowledge indirectly from co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren.” (http://www.stat.cmu.edu/~cshalizi/350/2008/readings/Landauer...)


It's not always easy with MBA types.

I once had a mentor with clout on a nine-figure investment committee tell me that maximum likelihood estimation is "the dumbest idea" he'd ever heard.

Words like "Cramér-Rao bound" didn't get through. What worked was saying "deep learning is usually just MLE with a mysteriously effective function approximation".
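(Spelled out in one line: minimizing the usual negative log-likelihood / cross-entropy loss over network parameters is exactly MLE, with the network playing the role of the parametric family:

    \hat{\theta} = \arg\max_\theta \sum_i \log p_\theta(y_i \mid x_i)

where p_theta happens to be a neural network.)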


Modern methods for deriving word embeddings easily beat LSA.


“Hard to beat” in terms of effort vs. quality of outcome is more precisely what I meant: it’s two lines of code in scikit-learn (CountVectorizer() + TruncatedSVD()) to go from raw text to document/word embeddings, and the result is often “good enough”, depending on what you’re trying to do. See the results on pg. 6 (note LSI == LSA): http://proceedings.mlr.press/v37/kusnerb15.pdf
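(The two-liner, expanded into a runnable toy example; the corpus here is made up:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["the cat sat on the mat", "dogs chase cats", "stocks fell sharply"]

    counts = CountVectorizer().fit_transform(docs)  # term-document count matrix
    svd = TruncatedSVD(n_components=2)
    doc_emb = svd.fit_transform(counts)             # LSA document embeddings
    word_emb = svd.components_.T                    # LSA word embeddings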

Also, at least based on papers I’ve read recently, BERT doesn’t work that well for producing word embeddings compared to word2vec and GloVe (which can be formulated as matrix factorization methods, like LSA). See table on pg. 6: https://papers.nips.cc/paper/9031-spherical-text-embedding.p...

Point being: mastering the old models gives you a solid foundation to build from.


Agreed, but I bet LSA is still a good baseline due to its simplicity.


Is there a library that automates benchmark models for a given dataset? That would be useful in helping people focus on the model they’re making.
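(scikit-learn automates at least the trivial baselines via its dummy estimators; a minimal sketch:)

    from sklearn.datasets import load_diabetes
    from sklearn.dummy import DummyRegressor
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    # "Always predict the training mean" -- any real model should beat this.
    baseline = DummyRegressor(strategy="mean")
    scores = cross_val_score(baseline, X, y, scoring="neg_mean_absolute_error")
    print(scores.mean())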



