Yeah,
I built something almost identical in langchain in two days. It can also Google for answers.
Basically it reads through long pages in a loop and cuts out any crap, just returning the main body. It also spits out a nice summary to help with indexing.
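Roughly along these lines (not my exact code, just a sketch; the model name, chunk sizes and prompt wording are placeholders):

```python
import requests
from bs4 import BeautifulSoup
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-4o-mini")  # cheap model is fine for cleanup work

def read_page(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Cut out the obvious crap before the LLM ever sees it.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()
    text = soup.get_text(separator="\n", strip=True)

    # Loop over chunks so long pages still fit in context.
    splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=200)
    body_parts = []
    for chunk in splitter.split_text(text):
        cleaned = llm.invoke(
            "Return only the main body content from this page fragment, "
            "dropping ads, boilerplate and navigation:\n\n" + chunk
        )
        body_parts.append(cleaned.content)

    body = "\n".join(body_parts)
    summary = llm.invoke(
        "Summarize this page in a few sentences for an index:\n\n" + body
    )
    return {"body": body, "summary": summary.content}
```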
Another thing I can do with it is have one LLM delegate to the scraper, telling it what to learn from the page, so that I can use a cheaper LLM and avoid taking up token space in the "main" thought process. Classic delegation, really. Like an LLM subprocess. Works great. Just take the output of one and pass it into the input of another, so the main one can say "tell me x information" and the subprocess will handle it.
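The delegation bit looks something like this (again a sketch, not the real code; model names and prompts are made up, and `read_page` is the scraper sketch above):

```python
from langchain_openai import ChatOpenAI

main_llm = ChatOpenAI(model="gpt-4o")         # expensive model for the main thought process
reader_llm = ChatOpenAI(model="gpt-4o-mini")  # cheap model for the subprocess

def ask_page(instruction: str, page_text: str) -> str:
    # The subprocess: gets "tell me X" plus the page, returns only the answer.
    resp = reader_llm.invoke(
        f"{instruction}\n\nAnswer using only this page content:\n\n{page_text}"
    )
    return resp.content

# In the real thing the instruction comes from the main LLM deciding what it
# needs (e.g. via a tool call); hardcoded here to keep the sketch short.
page = read_page("https://example.com")
answer = ask_page("Tell me what pricing tiers are listed on this page.", page["body"])

# Only the short answer goes back into the main model's context, not the raw
# page, so the expensive model never burns tokens on the full scrape.
followup = main_llm.invoke(
    f"Using this fact from the reader subprocess, continue the task:\n\n{answer}"
)
```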