Training a model on new text is expensive. I don't think it would be feasible to constantly retrain on an index that's being fed by a live crawl of the web.
What would make sense, though, is setting up a mechanism whereby the language model can consult a traditional search engine any time it needs to. I've heard about people building demos of that kind of thing on top of GPT-3 already: you take the user's input, use it to generate a search term, execute that search, then feed the results back into the language model so it can use them to inform its reply.
For example: you could ask "Who won the superbowl?"
Language model turns that into a search against Google or Bing or similar and grabs the text from the first page of results.
Then internally executes a completion something like this:
"The search results for the search 'who won the superbowl' are: <paste in search results>. Now answer the question 'who won the superbowl?'"
The question is probably more about being able to create an index untainted by poor economic incentives.
It seems that ChatGPT is already built on high-quality content, evaluation, and filtering mechanisms... and a _somewhat_ powerful reasoning engine. Beyond that, it will be interesting to see whether OpenAI develops a lifelong-learning approach that avoids the classic stability-plasticity dilemma when incorporating knowledge the model hasn't been trained on.
However, even if they give the model access to the internet, they still need a well-built, high-quality index to feed pages into the AI.
My question is: does such an index exist?