pyspider comes from a vertical search engine project. We have two issues:
- 100+ websites, any of which may change its template or go down at any time.
We need a dashboard to monitor the changes and the failures.
- Updates within 5 minutes: when a website is updated, we need to pick that up within 5 minutes.
We use the update time from the index (list) page to tell which pages have changed.
Pages should also be re-crawled after about 30 days in case we missed something.
A powerful scheduler is needed.
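
To make the second requirement concrete, here is a rough sketch of the re-crawl decision in plain Python. It is not pyspider's actual scheduler code; PageState, needs_fetch, index_due and the two constants are hypothetical names, and comparing the update time as a string is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values, taken from the two numbers in the requirements above.
INDEX_INTERVAL = 5 * 60          # poll each index (list) page every 5 minutes
MAX_AGE = 30 * 24 * 60 * 60      # re-crawl a page after about 30 days regardless


@dataclass
class PageState:
    """What we remember about a detail page (hypothetical structure)."""
    fetched_at: float   # epoch seconds of the last fetch
    update_time: str    # update time the index page showed at that fetch


def index_due(last_polled: Optional[float], now: float) -> bool:
    """True if the index page has never been polled or the interval has passed."""
    return last_polled is None or now - last_polled >= INDEX_INTERVAL


def needs_fetch(state: Optional[PageState], listed_update_time: str, now: float) -> bool:
    """Decide whether a detail page listed on the index should be fetched."""
    if state is None:
        return True   # never seen before
    if state.update_time != listed_update_time:
        return True   # index page reports a newer update time
    # Unchanged according to the index; refresh anyway once it is too old.
    return now - state.fetched_at >= MAX_AGE
```

The 30-day check is only a safety net: the update time on the index page drives most re-crawls, and the age limit catches pages whose change never showed up there.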
Obviously, I hadn't found the right way to do this with scrapy. I'm not very familiar with scrapy, so I can't name anything that pyspider can do but scrapy can't.
Can you compare it to scrapy, as requested by other posters? Why could you not build on top of scrapy and leverage celery for scheduling etc. (http://www.celeryproject.org/)?
What is the immediate value added by using pyspider?