A GAN-style approach that penalises a generator for producing something not supported by its available data would be interesting (I'm sure some have tried it already; I'm not following the field closely). For many subjects, though, creating training sets would be immensely hard, although for some subjects you certainly could produce large synthetic training sets.
1. Search databases for documents relevant to the query.
2. Hand them to AI#1, which generates an answer based on the text of those documents and its background knowledge.
3. Give both the documents and the answer to AI#2, which evaluates whether the documents support the answer.
4. If “yes”, return the answer to the user. If “no”, go back to step 2 and try again.
Each AI would be trained appropriately to perform its specialised task.
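To make the control flow of the four steps concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `answer_with_verification`, `toy_search`, `make_toy_generator` and `toy_supports` are hypothetical names, the two "AIs" are trivial stand-ins rather than trained models, and the `max_attempts` cap is my addition so that the "go back to step 2" loop cannot run forever.

```python
from typing import Callable, List, Optional


def answer_with_verification(
    query: str,
    search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    supports: Callable[[List[str], str], bool],
    max_attempts: int = 3,  # assumption: cap retries instead of looping forever
) -> Optional[str]:
    """Return an answer that the verifier accepts, or None if every attempt fails."""
    documents = search(query)                # step 1: retrieve relevant documents
    for _ in range(max_attempts):
        answer = generate(query, documents)  # step 2: AI#1 drafts an answer
        if supports(documents, answer):      # step 3: AI#2 checks support
            return answer                    # step 4, "yes": return to the user
        # step 4, "no": loop back to step 2 and try again
    return None


# --- Toy stand-ins (hypothetical; real systems would use trained models) ---

CORPUS = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the highest mountain on Earth.",
]


def toy_search(query: str) -> List[str]:
    """Return documents sharing at least one word with the query."""
    words = {w.strip("?.,").lower() for w in query.split()}
    return [
        doc for doc in CORPUS
        if words & {w.strip("?.,").lower() for w in doc.split()}
    ]


def make_toy_generator(candidates: List[str]) -> Callable[[str, List[str]], str]:
    """Simulate AI#1 by handing out canned candidate answers, one per attempt."""
    remaining = iter(candidates)

    def generate(query: str, documents: List[str]) -> str:
        return next(remaining)

    return generate


def toy_supports(documents: List[str], answer: str) -> bool:
    """Simulate AI#2: accept an answer only if it appears verbatim in a document."""
    return any(answer.lower() in doc.lower() for doc in documents)


if __name__ == "__main__":
    generator = make_toy_generator(
        ["The Eiffel Tower is in Rome.", "The Eiffel Tower is in Paris."]
    )
    print(answer_with_verification(
        "Where is the Eiffel Tower?", toy_search, generator, toy_supports
    ))
```

In this toy run the first candidate is rejected because no document contains it, and the second is accepted. Passing the three components in as plain callables keeps the loop independent of how each AI is actually built or trained.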