>> you were crawling news.ycombinator.com, right?

No, to retrieve the Hacker News posts we used the public Hacker News API, which returns them in JSON format: https://github.com/HackerNews/API
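A minimal sketch of reading posts from that API, assuming the Firebase endpoints described in its README (`/v0/item/<id>.json`). The helper names `item_url`, `external_link` and `fetch_item` are hypothetical. Parsing is kept separate from the network call so the link-extraction logic can be checked offline.

```python
import json
import urllib.request
from typing import Optional

# Base URL of the public Hacker News API (see github.com/HackerNews/API)
API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the JSON endpoint URL for a single item (story, comment, ...)."""
    return f"{API_BASE}/item/{item_id}.json"


def external_link(item: Optional[dict]) -> Optional[str]:
    """Return the external URL of a live story, or None otherwise.

    Comments and text-only posts (e.g. Ask HN) have no "url" field;
    items flagged deleted or dead are skipped (assumption: a crawler
    would not want to follow those).
    """
    if not item or item.get("type") != "story":
        return None
    if item.get("deleted") or item.get("dead"):
        return None
    return item.get("url")


def fetch_item(item_id: int, timeout: float = 10.0) -> Optional[dict]:
    """Download and decode one item (requires network access)."""
    with urllib.request.urlopen(item_url(item_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a crawler built this way, `fetch_item` would be called for each item ID, and every non-None result of `external_link` would be queued for the external-page crawl.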

The crawling speed of 100 to 1000 pages per second refers to crawling the external pages linked from Hacker News posts. Because those pages are hosted on many different domains, we can achieve a high overall crawling speed while remaining a polite crawler with a low request rate per domain.
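One way to combine a high total rate with a low per-domain rate is a scheduler that reserves a time slot per host. The sketch below is an assumption about how such a crawler could work, not the setup described above: the class name `PoliteScheduler`, the per-hostname granularity and the `min_delay` value are all hypothetical, and robots.txt handling is omitted.

```python
import heapq
from urllib.parse import urlsplit


class PoliteScheduler:
    """Hand out URLs so each host is fetched at most once per `min_delay` seconds.

    Times are passed in explicitly (e.g. from time.monotonic()) so the
    scheduling logic is deterministic and easy to test.
    """

    def __init__(self, min_delay: float = 1.0):
        self.min_delay = min_delay
        self.next_slot: dict[str, float] = {}           # host -> next free time slot
        self.heap: list[tuple[float, int, str]] = []    # (ready_time, seq, url)
        self.seq = 0                                    # preserves insertion order on ties

    def add(self, url: str, now: float) -> None:
        host = (urlsplit(url).hostname or "").lower()
        # Reserve the earliest slot for this host that is not before `now`.
        ready = max(now, self.next_slot.get(host, now))
        self.next_slot[host] = ready + self.min_delay
        heapq.heappush(self.heap, (ready, self.seq, url))
        self.seq += 1

    def pop_ready(self, now: float) -> list[str]:
        """Return every queued URL whose reserved slot has arrived."""
        ready_urls = []
        while self.heap and self.heap[0][0] <= now:
            ready_urls.append(heapq.heappop(self.heap)[2])
        return ready_urls
```

URLs on different hosts become ready together, while a second URL on the same host waits `min_delay` seconds. Total throughput therefore scales with the number of distinct hosts: with `min_delay=1.0`, sustaining 1000 pages per second would need at least 1000 hosts with queued URLs at any moment.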
