
I’m not sure if you’re aware, but Bayesian neural networks can actually be well approximated by appropriate ensembles of standard neural networks [0]. The strength of Bayesian nets (including the approximating ensembles) is that they can estimate the uncertainty in their own predictions, by producing a probability distribution over possible predictions, at the cost of more computation for training and inference. I don’t think it’s ever going to be a matter of Bayesian nets outright outcompeting standard nets, though; it’s just another tool in the toolbox if you want a model that “knows it doesn’t know something” and don’t mind the extra compute.

[0] https://arxiv.org/abs/1810.05546
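
To make that concrete, here’s a minimal toy sketch of the ensemble approach (PyTorch; the architecture, ensemble size, and hyperparameters are made up for illustration, not taken from the paper): train a handful of independently initialized nets on the same data, then read the spread of their predictions as the uncertainty.

    # Toy deep-ensemble uncertainty sketch: 1-D regression, y = sin(x) + noise,
    # observed only on [-3, 3]. All sizes/hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x_train = torch.linspace(-3, 3, 100).unsqueeze(1)
    y_train = torch.sin(x_train) + 0.1 * torch.randn_like(x_train)

    def make_net():
        return nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

    ensemble = []
    for _ in range(5):  # 5 members; more gives a smoother predictive distribution
        net = make_net()
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(500):
            opt.zero_grad()
            loss = nn.functional.mse_loss(net(x_train), y_train)
            loss.backward()
            opt.step()
        ensemble.append(net)

    # Predict inside and far outside the training range.
    x_test = torch.tensor([[0.0], [8.0]])
    with torch.no_grad():
        preds = torch.stack([net(x_test) for net in ensemble])
    print("mean:", preds.mean(0).squeeze())
    print("std: ", preds.std(0).squeeze())  # small at 0.0, large at 8.0

In-distribution the members roughly agree, so the std is small; far from the training data they diverge, and that divergence is the “knows it doesn’t know” signal.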



Couldn't that be a way to address the issue of current LLMs hallucinating?


Possibly, but I struggle to reason about Bayesian nets at that scale. I think the level at which a Bayesian net could “know what it doesn’t know” is uncertainty about what text to generate in a given context, not whether the generated text says something true. One example would be a prompt in a language not seen in the training data. Conversely, it could be that some plausible-sounding made-up thing is the likely continuation in a given context, and the model would be confidently wrong. At the end of the day, what you’ll get out of a Bayesian LLM is a sample of several generated texts, which would hopefully show more variation than multiple samples from the same standard LLM. I can see it being helpful to check whether the different outputs agree, but I can’t tell at a glance how well it would work in practice.
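
If it helps, the agreement check I have in mind would look roughly like this (Python; generate() is a hypothetical stand-in for drawing one completion from a Bayesian/ensembled LLM, stubbed here so the snippet runs):

    # Rough sketch: sample several completions and score pairwise agreement.
    import itertools
    import random

    def generate(prompt: str) -> str:
        # Hypothetical: each call would sample weights (or pick an ensemble
        # member) and decode one completion. Stubbed for illustration.
        return random.choice([
            "Paris is the capital of France.",
            "Paris is the capital of France.",
            "Lyon is the capital of France.",
        ])

    def jaccard(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb)

    samples = [generate("What is the capital of France?") for _ in range(5)]
    pairs = list(itertools.combinations(samples, 2))
    agreement = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    print(f"mean pairwise agreement: {agreement:.2f}")
    # Low agreement could be surfaced as a "don't know" signal.

Token-overlap agreement is crude (paraphrases score low, consistent hallucinations score high), which is part of why I can’t tell how well this would work in practice.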


Thanks for the explanation!



