
I’m not sure I’d say NNs were the first major breakthrough.

For many years people considered them too inefficient to compete with SVMs, and people genuinely thought kernel methods were the way to intelligent machines.

Today you find researchers claiming that Bayesian nets will outcompete NNs.

We’ve also seen tremendous success of random forests and other ensemble models.

I am sure there are plenty of researchers looking into all sorts of novel ensembles.

I think the major breakthrough is ensemble models, and with a little bit of cheekiness you can say that NNs are ensembles of logistic regressions.
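As a toy illustration of that cheeky reading (a minimal NumPy sketch, nothing more): each sigmoid unit in a hidden layer is literally a logistic regression over the inputs, and the output unit is yet another logistic regression that weighs the members' votes.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = rng.normal(size=4)          # one example with 4 features

    # Hidden layer: 3 independent "logistic regressions" over the same features.
    W1 = rng.normal(size=(3, 4))
    b1 = rng.normal(size=3)
    hidden = sigmoid(W1 @ x + b1)   # each entry is a logistic-regression output

    # Output unit: a logistic regression over the members' outputs.
    w2 = rng.normal(size=3)
    b2 = rng.normal()
    print(sigmoid(w2 @ hidden + b2))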



I’m not sure if you are aware, but Bayesian neural networks can actually be well approximated by appropriate ensembles of standard neural networks [0]. The strength of Bayesian nets (including the approximating ensembles) is that they can estimate the uncertainty in their own predictions (by producing a probability distribution over possible predictions), at the cost of more computation for training and inference. I don’t think it’s ever going to be a matter of Bayesian nets outright outcompeting standard nets, though; they’re just another tool in the toolbox if you want a model which “knows it doesn’t know something” and don’t mind the extra compute.

[0] https://arxiv.org/abs/1810.05546
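To make the ensemble idea concrete, here's a rough sketch: train a handful of identically shaped nets that differ only in their random seed, and read the spread of their predictions as a crude uncertainty signal. The toy dataset and sklearn MLPClassifier are my own illustrative choices, not anything prescribed by [0].

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    # Ensemble of identically shaped nets, differing only in random initialisation.
    ensemble = [
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed).fit(X, y)
        for seed in range(5)
    ]

    x_new = np.array([[0.5, 0.25]])                       # a query point
    probs = np.array([m.predict_proba(x_new)[0, 1] for m in ensemble])

    print("mean p(class=1):", probs.mean())               # the ensemble's prediction
    print("std across members:", probs.std())             # rough uncertainty signal

If the members agree, the standard deviation is small; if they diverge, say on an input far from the training data, that spread is the “knows it doesn’t know” signal.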


Couldn't that be a way to address the issue of current LLMs hallucinating?


Possibly, but I struggle to reason about Bayesian nets at that scale. I think the level at which a Bayesian net could “know what it doesn’t know” would be uncertainty about what text to generate in a given context, not whether the generated text is saying something true. One example could be a prompt in a language not seen in the training data. It could also be that some plausible-sounding made-up thing is exactly what’s likely in a given context. At the end of the day, what you’ll get out of a Bayesian LLM is a sample of several generated texts, which would hopefully have more variation than multiple samples from the same standard LLM. I can see it being helpful to check whether the different outputs agree, but I can’t tell at a glance how well it would work in practice.
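Something like this toy sketch is what I have in mind: draw several samples and score how much they agree, with generate() as a placeholder standing in for one sample from such a model (the canned answers are obviously made up).

    import itertools
    import random

    def generate(prompt: str) -> str:
        # Placeholder: real code would query the model (or one ensemble member)
        # once per call. These canned answers are purely illustrative.
        return random.choice([
            "The capital of Australia is Canberra.",
            "The capital of Australia is Sydney.",
            "The capital of Australia is Canberra.",
        ])

    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb)

    samples = [generate("What is the capital of Australia?") for _ in range(5)]
    agreement = [jaccard(a, b) for a, b in itertools.combinations(samples, 2)]
    print("mean pairwise agreement:", sum(agreement) / len(agreement))

Low agreement would be a hint to distrust the answer; whether that actually tracks hallucination is the part I can’t vouch for.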


Thanks for the explanation!



