Keep in mind that if you crawl this site, your IP will be banned. At least mine w...

dbaupp · on Dec 9, 2012

There is an unofficial API: http://www.hnsearch.com/api (provided by the very search engine referred to in the OP, haha!)

pyre · on Dec 9, 2012

Unfortunately, there is no API for getting access to personal information on HN (e.g., comments I have made or stories I've upvoted). You're relegated to scraping if you want that information.

unholygoat · on Dec 17, 2012

Now there is:

Here is how you can pull a specific user's submissions (and you can add filters): http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

And here is how you can pull the comments for a specific thread/discussion ID: http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

You can now grab a lot of data. There are a few items missing, but they added a lot: http://www.hnsearch.com/api

By the way, that now includes a user's bio, as well as things you've upvoted, etc. It's all done via filters.

They also enlarged the site's RSS feed in hopes of slowing down some of the scrapers.
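A minimal Python sketch of how a filtered query URL like the ones above could be assembled. Only the base endpoint comes from the links in this comment; their query strings are truncated, so the `filter[fields][...]` parameter shape, the `limit` parameter, and the `build_search_url` helper are hypothetical placeholders, not the documented API.

```python
from urllib.parse import urlencode

# Base endpoint from the (truncated) links above. Everything after "?" below
# is an ASSUMPTION, since the original query strings were cut off.
BASE_URL = "http://api.thriftdb.com/api.hnsearch.com/items/_search"


def build_search_url(filters: dict[str, str], limit: int = 100) -> str:
    """Build a search URL from field filters (hypothetical parameter names)."""
    # One hypothetical "filter[fields][<name>]=<value>" pair per filter.
    params = {f"filter[fields][{name}]": value for name, value in filters.items()}
    params["limit"] = str(limit)  # hypothetical page-size parameter
    # urlencode percent-encodes the square brackets in the keys.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: submissions by one user.
    print(build_search_url({"username": "example_user", "type": "submission"}))
```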

beatgammit · on Dec 9, 2012

You can always get around this by throttling your web crawler. It will take much longer, but at least you'll be able to read HN in the meantime.
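A minimal Python sketch of a throttled crawler, assuming a fixed minimum pause between requests is enough. The `Throttle` class, the `crawl` helper, and the 30-second default are illustrative assumptions, not a known safe rate for HN.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request, if any

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def crawl(urls, min_interval: float = 30.0):
    """Fetch URLs one at a time (no retries, no parallelism), pausing between them."""
    throttle = Throttle(min_interval)  # 30 s is an assumed, conservative default
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url, timeout=30) as resp:
            yield url, resp.read()
```

Fetching sequentially through a single throttle keeps the request rate predictable; adding parallel workers would multiply the rate the site sees.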

alphast0rm · on Dec 9, 2012

The tricky thing when doing this is knowing what rate you can crawl at without getting permanently banned. I built an Android Market crawler two summers ago, and luckily Google only issues temporary bans (in my experience), so that might be an easier project without the risk of a permanent ban.
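One way to search for a tolerable rate, sketched in Python under the assumption that the server signals overload with HTTP 429 or 503 before it bans: back off multiplicatively on those responses and speed up slowly otherwise. The `AdaptiveDelay` class and all of its constants are hypothetical; if a site bans without any warning, this won't help.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveDelay:
    """Pause between requests that adapts to the server's responses.

    All default values are illustrative assumptions, not known safe limits.
    """

    delay: float = 5.0        # current pause between requests, in seconds
    min_delay: float = 5.0    # never crawl faster than this
    max_delay: float = 600.0  # cap the backoff at 10 minutes
    backoff: float = 2.0      # multiply the pause by this after a throttling response
    recovery: float = 0.9     # multiply the pause by this after a normal response

    # Status codes treated as "slow down" signals (no annotation, so not a field).
    THROTTLE_STATUSES = frozenset({429, 503})

    def update(self, status: int) -> float:
        """Record one response status and return the pause to use next."""
        if status in self.THROTTLE_STATUSES:
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            self.delay = max(self.delay * self.recovery, self.min_delay)
        return self.delay


if __name__ == "__main__":
    pacer = AdaptiveDelay()
    for status in (200, 429, 429, 200, 200):
        print(status, round(pacer.update(status), 2))
```

Backing off quickly and recovering slowly errs on the side of crawling too slowly rather than too fast.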

quesera · on Dec 10, 2012

Respecting robots.txt is probably the best plan.
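A minimal sketch using Python's standard `urllib.robotparser` to check whether a URL may be fetched and whether the site asks for a `Crawl-delay`. The robots.txt content, the `ExampleBot` user agent, the helper names, and the 30-second fallback are made-up values for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser: RobotFileParser, user_agent: str, fallback: float = 30.0) -> float:
    """Use the site's Crawl-delay if it sets one, else an assumed fallback."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else fallback


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/private/page", "https://example.com/item?id=1"):
        print(url, rules.can_fetch("ExampleBot", url))
    print("delay:", polite_delay(rules, "ExampleBot"))
```

Against a live site you would call `set_url(...)` and `read()` on a `RobotFileParser` instead of passing text to `parse()`.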

ig1 · on Dec 9, 2012

Use disposable IPs.

ymn_ayk · on Dec 9, 2012

Why would you be banned? Do you know the reason?

jgeralnik · on Dec 9, 2012

To prevent people from (unintentionally) DDoSing the site.