I think of it more like a pachinko machine. You put your question in the top, it bounces around through a bunch of biased obstacles, but inevitably it comes out somewhere at the bottom.
By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low. Otherwise, low-confidence results just fall out somewhere mostly at random.
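To make the "biasing" point concrete, here's a minimal sketch (my own illustration, not anything from this thread) that compares a model's next-token distribution with and without an extra instruction prepended to the prompt. GPT-2, the question, and the probe tokens are just stand-ins; the exact numbers don't matter, only that the instruction is extra conditioning context that shifts where the ball tends to land.

```python
# Minimal sketch: show that prepending an instruction shifts the
# next-token probability distribution. GPT-2 is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_probs(prompt: str) -> torch.Tensor:
    """Return the model's probability distribution over the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # (1, seq_len, vocab_size)
    return torch.softmax(logits[0, -1], dim=-1)  # distribution for the next token

question = "Q: Who wrote the 1997 paper on flux capacitors?\nA:"
plain = next_token_probs(question)
instructed = next_token_probs("If you are not sure, say so.\n" + question)

# Probe a couple of candidate continuations (hypothetical examples):
# " I" as in "I'm not sure", " Dr" as the start of a confident-sounding name.
for text in [" I", " Dr"]:
    tok = tokenizer.encode(text)[0]
    print(f"{text!r}: plain={plain[tok].item():.4f}  instructed={instructed[tok].item():.4f}")
```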
> By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low.
This is something I really don't understand about LLMs. I think I understand how the generative side of them works, but "asking" it not to lie baffles me. LLMs require a massive corpus of text to train the model; how much of that text contains tokens that translate to "don't lie to me" and scores well enough to make its way into the output?