pyspider came out of a vertical search engine project, which had two requirements:
- 100+ websites, any of which may change its page template or go down from time to time.
We needed a dashboard to monitor template changes and fetch failures.
- Updates within 5 minutes: when a website publishes an update, we need to pick it up within 5 minutes.
We use the update time shown on each index (list) page to tell which pages have changed.
Every page should also be re-crawled after about 30 days, in case we missed something.
This requires a powerful scheduler.
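The re-crawl rule in the second point can be sketched as a small decision function. This is a minimal Python sketch under stated assumptions, not pyspider's actual scheduler: the names `PageRecord`, `index_due`, `page_due` and `mark_crawled` are hypothetical, and it assumes the update time shown on an index page has already been parsed into a `datetime`. A page is crawled when it has never been fetched, when its index page reports a newer update time than the one recorded at the last crawl, or when the last crawl is 30 days old.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed policy values, taken from the requirements in the list above.
INDEX_POLL_INTERVAL = timedelta(minutes=5)  # re-fetch every index (list) page this often
MAX_PAGE_AGE = timedelta(days=30)           # re-crawl every page at least this often


@dataclass
class PageRecord:
    """Hypothetical per-page bookkeeping kept by the scheduler."""
    url: str
    last_crawled: Optional[datetime] = None      # None: never fetched
    seen_update_time: Optional[datetime] = None  # index-page update time at last crawl


def index_due(last_polled: Optional[datetime], now: datetime) -> bool:
    """An index page is due once the polling interval has elapsed."""
    return last_polled is None or now - last_polled >= INDEX_POLL_INTERVAL


def page_due(page: PageRecord, listed_update_time: datetime, now: datetime) -> bool:
    """Decide whether a page linked from an index page should be (re-)crawled."""
    if page.last_crawled is None:
        return True  # never fetched before
    if page.seen_update_time is None or listed_update_time > page.seen_update_time:
        return True  # the index page reports a newer update than the one we have
    return now - page.last_crawled >= MAX_PAGE_AGE  # periodic safety-net re-crawl


def mark_crawled(page: PageRecord, listed_update_time: datetime, now: datetime) -> None:
    """Record a successful fetch so the page waits until it changes or ages out."""
    page.last_crawled = now
    page.seen_update_time = listed_update_time


if __name__ == "__main__":
    now = datetime(2024, 1, 31, 12, 0)
    page = PageRecord("http://example.com/item/1",
                      last_crawled=datetime(2024, 1, 20),
                      seen_update_time=datetime(2024, 1, 19))
    print(page_due(page, datetime(2024, 1, 19), now))          # unchanged, 11 days old → False
    print(page_due(page, datetime(2024, 1, 31, 11, 58), now))  # newer update listed → True
```

Keying re-crawls on the index page's update time keeps the 5-minute cycle cheap: only the index pages are fetched every 5 minutes, while detail pages are fetched only when they change or reach the 30-day limit.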
Obviously, I hadn't found the right way to do this with Scrapy. I'm not very familiar with Scrapy, so I can't point to anything that pyspider can do and Scrapy can't.