This is a tough problem because most site owners and admins have little motivation to say yes when you ask whether you can scrape their site. To some it feels like you are stealing their hard work. Others don’t want to pay for the requests your scraper will make to their site, etc.
Regardless, in an ideal world, some polite things to do would be to:
- ask for permission before scraping, explaining how it’s neutral or positive for their business (so they are more likely to support your continued scraping or even potentially provide a more cost-effective format/API for you)
- scrape at a reasonable pace
- scrape off hours
- make requests with a clear user agent so they know where the requests are coming from
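
The last three points can be sketched in code. This is only a rough illustration using Python's standard library, not a definitive implementation: the user agent string, the 5 second delay, the 01:00 to 05:00 quiet window, and all function names are assumptions made up for the example, so pick values that suit the site you are dealing with.

```python
import time
import urllib.request
from datetime import datetime, time as dtime
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical values: replace with your own bot name and contact details.
USER_AGENT = "example-scraper/0.1 (+https://example.com/bot-info)"
MIN_DELAY_SECONDS = 5.0                 # assumed "reasonable pace"; tune per site
OFF_HOURS = (dtime(1, 0), dtime(5, 0))  # assumed quiet window


def is_off_hours(now: datetime, window: Tuple[dtime, dtime] = OFF_HOURS) -> bool:
    """Return True if `now` falls inside the quiet window.

    Also handles windows that cross midnight, e.g. 22:00 to 06:00.
    """
    start, end = window
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def seconds_to_wait(last_request: Optional[float], now: float,
                    min_delay: float = MIN_DELAY_SECONDS) -> float:
    """Return how long to sleep so requests are at least `min_delay` apart."""
    if last_request is None:
        return 0.0
    return max(0.0, min_delay - (now - last_request))


def build_request(url: str) -> urllib.request.Request:
    """Attach a clear user agent so the admin can tell who is asking."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_fetch(urls: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Fetch URLs one at a time, slowly, and only during the quiet window."""
    last = None
    for url in urls:
        # datetime.now() is this machine's local time; adjust if the site
        # lives in a different time zone.
        if not is_off_hours(datetime.now()):
            break  # stop and resume during the next quiet window
        time.sleep(seconds_to_wait(last, time.monotonic()))
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            yield url, resp.read()
        last = time.monotonic()
```

You would loop over `polite_fetch(["https://example.com/page1", ...])` and process each body as it arrives. Putting a URL or contact address in the user agent gives the admin a way to reach you instead of just blocking you.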