
I'm working on a benchmarking suite https://gist.github.com/binux/67b276c51e988f8e2c31 and have run into some problems...

pyspider comes from a vertical search engine project. We have two issues:

- 100+ websites, which may change their templates or go down at any time. We need a dashboard to monitor the changes and the failures.

- Updates within 5 minutes: when a website is updated, we need to pick that up within 5 minutes. We use the update time from the index (list) page to tell which pages have changed. Pages should also be re-crawled after about 30 days in case we missed something. A powerful scheduler is needed.
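
To make the second point concrete, here is a rough sketch of the re-crawl decision in plain Python. It is not pyspider's actual scheduler code. The names CrawlRecord and should_recrawl are made up for illustration, and it assumes something else polls the index page every 5 minutes and passes in the update time listed there.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "about 30 days" expressed in seconds.
MAX_AGE = 30 * 24 * 60 * 60


@dataclass
class CrawlRecord:
    """Hypothetical record of the last fetch of one detail page."""
    fetched_at: float   # epoch seconds of the last fetch
    update_time: str    # update time the index page showed at that fetch


def should_recrawl(record: Optional[CrawlRecord],
                   listed_update_time: str,
                   now: float) -> bool:
    """Decide whether a page seen on the index (list) page needs fetching."""
    if record is None:
        # Never fetched before.
        return True
    if record.update_time != listed_update_time:
        # The index page reports a different update time: the page changed.
        return True
    # Unchanged according to the index, but refresh anyway once the
    # stored copy is too old, in case a change was missed.
    return now - record.fetched_at >= MAX_AGE
```

The 5-minute part is then just how often the index page itself is re-fetched. Each run checks every listed page with should_recrawl and only queues the ones that return True.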

Obviously, I haven't found the right way to do this with scrapy. I'm not very familiar with scrapy, so I can't name something pyspider can do that scrapy can't.



