Scraping using LLMs directly is going to be quite slow and resource intensive, but obviously quicker to get set up and going. I can see it being useful for quick ad hoc scrapes, but as soon as you need to scrape tens or hundreds of thousands of pages it will certainly be better to go the traditional route. Using LLMs to write your scrapers, though, is a perfect use case for them.

To put it in context, the two kinds of scrapers today are traditional HTTP-client based and headless-browser based. Headless browsers are for more advanced sites: SPAs where there isn't any server-side rendering.
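
For reference, the HTTP-client route is often only a handful of lines. A minimal sketch with requests and BeautifulSoup (the URL and CSS selector here are placeholders, not a real site):

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and parse the server-rendered HTML directly.
    resp = requests.get("https://example.com/products", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # ".product a[href]" is a stand-in selector for whatever the site uses.
    links = [a["href"] for a in soup.select(".product a[href]")]
    print(links)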

However, headless-browser scraping is on the order of 10-100x more time consuming and resource intensive, even with careful blocking of unneeded resources (images, CSS). Wherever possible you want to avoid headless scraping. LLMs are going to be slower still.
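
The resource blocking itself is cheap to set up; here's a sketch using Playwright's sync API (the URL is a placeholder):

    from playwright.sync_api import sync_playwright

    BLOCKED = {"image", "stylesheet", "font", "media"}

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Abort any request whose resource type the scrape doesn't need.
        page.route("**/*", lambda route: route.abort()
                   if route.request.resource_type in BLOCKED
                   else route.continue_())
        page.goto("https://example.com/spa")  # placeholder URL
        html = page.content()
        browser.close()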

Fortunately, most sites that were client-side rendered only are moving back towards having a server-side renderer, and they often even have a JSON blob of template context in the HTML for hydration. That makes your job much easier!
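
As an example of pulling that hydration blob, assuming a Next.js site (other frameworks have similar conventions, e.g. Nuxt puts it in window.__NUXT__):

    import json
    import requests
    from bs4 import BeautifulSoup

    # Next.js embeds the page's template context in a script tag with
    # id "__NEXT_DATA__" for client-side hydration.
    resp = requests.get("https://example.com/some-page", timeout=10)  # placeholder URL
    soup = BeautifulSoup(resp.text, "html.parser")

    tag = soup.find("script", id="__NEXT_DATA__")
    data = json.loads(tag.string)
    print(data["props"]["pageProps"])  # structured data, no HTML parsing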


I did this for the first time yesterday. I wanted the links for ten specific tarot cards off this page[0]. I copied the source into ChatGPT, listed the cards, and got the result back.

I'm fast with Python scraping, but for scraping one page ChatGPT was way, way faster. The biggest difference is that it was quickly able to get the right links by context: the suit wasn't part of the link but was in the header. In code I'd have to find that context and make it explicit.

It's a super simple HTML site, but I'm not exactly sure which direction that tips the balance.

[0] http://www.learntarot.com/cards.htm
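
For comparison, the "find that context and make it explicit" step in Python might look something like the sketch below. The h2 header tag and the saved file name are guesses about the page layout, not taken from it:

    from bs4 import BeautifulSoup

    # Walk the page in document order, remember the most recent suit
    # header, and tag each card link with it.
    with open("cards.htm") as f:
        soup = BeautifulSoup(f, "html.parser")

    current_suit = None
    cards = []
    for el in soup.find_all(["h2", "a"]):
        if el.name == "h2":
            current_suit = el.get_text(strip=True)
        elif el.get("href"):
            cards.append((current_suit, el.get_text(strip=True), el["href"]))

    print(cards[:5])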


These kinds of one-shot examples are exactly where this hit home for me. I was in the middle of some research when I saw him post this, and it completely changed my approach to gathering the ad hoc data I needed.


> Using LLMs to write your scrapers, though, is a perfect use case for them.

Indeed... and they could periodically run an expensive LLM-powered scrape like this one and compare the results against the traditional scraper's output. That way they could figure out automatically whether the scraper they've written needs updating.
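
A sketch of that idea; llm_scrape and traditional_scrape are hypothetical callables standing in for the expensive LLM pass and the hand-written scraper:

    def check_scraper_drift(url, llm_scrape, traditional_scrape):
        # Both scrapers are hypothetical callables returning a dict of
        # field -> value for the same page.
        expected = llm_scrape(url)        # slow and costly; run occasionally
        actual = traditional_scrape(url)  # cheap; runs all the time
        drift = {k for k in expected if expected.get(k) != actual.get(k)}
        if drift:
            print(f"Scraper drift on {url}: fields {sorted(drift)} disagree")
        return drift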


I'd invite you to check out https://www.usedouble.com/. We use a combination of LLMs and traditional methods to scrape and parse data to answer your questions.

Sure, it may be more resource intensive, but it's not slow by any means. Our users process hundreds of rows in seconds.