I assumed GP was referring to OpenAI, not danswer (given that they mentioned that those companies were training models). And you're still using OpenAI's API, so neither open source and self hosting affect data collection.
You can plugin any model of your choice! Self-hosted, open source models are a great choice if you're very concerned about keeping your data safe and secure
> Note: On the initial visit, Danswer will prompt for an OpenAI API key. Without this Danswer will be able to provide search functionalities but not direct Question Answering.
There are open ai compatible chat/completion endpoints for local LLMs. You point the url to your self hosted version, and use the API key you started it with...