When those language models are wrong or biased, the user will have a worse experience in all three of those scenarios. At least when we look at search results now, we can prune for the facts. Those language models are ingesting that same data to give a monolithic answer to your query. Less transparent, less safe.
I wouldn't be too sure of that. People have shown time and time again that they are pretty bad at determining what a 'fact' is.
It gets far worse when people are shown more data (the very issue we're dealing with right now), and these models see far more data than any one of us will see in our lifetimes.
Much of our ability to determine what a 'fact' is comes down to something like: (1) (if political) 'does it fit with my bias?' or (2) (if physics) 'is it physically possible?'. Since (1) depends mostly on what information you expose yourself to, and (2) depends on comparing against interactions with physical reality, the system has too much information for (1) and no physical space to run experiments in for (2).
I don't see a difference. Large language models can also return their sources, as in the example on the Gopher blog post. This will lead to a quicker answer and equal transparency.