Current LLMs can only handle so much text per generation, are generally slow, and often charge per token, so there's an incentive to use them sparingly.
If you have a large text corpus to search, that means restricting the search field with a query first, then feeding the result to an LLM to extract facts.
So everything hinges on having an excellent search index to reduce the search space, and the best tool we have today is semantic search over embedded representations of a text's topics.
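As a rough illustration of that pipeline, here is a minimal sketch of retrieving semantically similar passages before handing them to an LLM. It assumes the sentence-transformers library; the model name, corpus, and `top_k` helper are placeholders for illustration, and a real deployment would use a proper vector index (FAISS, pgvector, etc.) rather than a brute-force scan.

```python
# Sketch of query-then-extract: embed the corpus, retrieve the closest
# passages, and only feed those to the LLM for fact extraction.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

corpus = [
    "The 2004 tsunami was triggered by a magnitude 9.1 earthquake.",
    "Basil grows best in warm climates with plenty of sun.",
    "Embeddings map text to dense vectors that capture meaning.",
]

# Embed the corpus once, up front; the vectors can be stored and reused.
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

def top_k(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most semantically similar to the query."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus_vecs @ query_vec  # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in best]

# Only the retrieved passages, not the whole corpus, go into the LLM prompt,
# which keeps token usage (and cost) small.
print(top_k("What caused the 2004 tsunami?"))
```

The point of the sketch is the shape of the flow, not the specifics: the cheap embedding search narrows the field, and the expensive LLM call only ever sees the handful of passages that survived it.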