
"HACKER NEWS - A dataset that contains all stories and comments from Hacker News since its launch in 2006."

I know what I'm doing this weekend.



Some resources to get you started:

- How to use the Hacker News dataset https://medium.com/google-cloud/big-data-stories-in-seconds-...

- Discussion of the HN dataset announcement here https://news.ycombinator.com/item?id=10440502

- An IPython notebook: https://github.com/fhoffa/notebooks/blob/master/analyzing%20...

- More: http://debarghyadas.com/writes/looking-back-at-9-years-of-ha...

Disclaimer: I'm Felipe Hoffa, and I work at Google. (https://twitter.com/felipehoffa)


How fresh is the data, and how often is it refreshed? It doesn't seem to be described anywhere...


Looks like the data goes up through 2015-10-13. I created a Look (disclosure: I work for Looker) that shows story counts by day for the last 365 days here: https://looker.com/publicdata/looks/169?show=viz
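If you want the same per-day story counts without Looker, here is a minimal sketch. It assumes you have already pulled one Unix timestamp (epoch seconds) per story out of the dataset. That input shape is an assumption, not the dataset's documented schema. It buckets the timestamps by UTC calendar day over a trailing window and keeps empty days as zeros, so gaps stay visible.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def story_counts_by_day(timestamps, end_date, days=365):
    """Count stories per UTC calendar day for the `days` days ending on end_date.

    `timestamps` is assumed to be an iterable of Unix epoch seconds, one per
    story (a hypothetical input shape, not the dataset's documented schema).
    Days with no stories are returned with a count of 0.
    """
    start_date = end_date - timedelta(days=days - 1)
    counts = Counter()
    for ts in timestamps:
        # Interpret each timestamp in UTC so day boundaries don't shift with local time.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        if start_date <= day <= end_date:
            counts[day] += 1
    # Emit every day in the window, in order, so gaps show up as zeros.
    return [
        (start_date + timedelta(days=i), counts[start_date + timedelta(days=i)])
        for i in range(days)
    ]
```

For example, `story_counts_by_day(stamps, datetime(2015, 10, 13).date())` would cover the 365 days ending on the last date the data seems to reach.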


It's mildly amusing:

> SELECT sum(length(text)) FROM [bigquery-public-data:hacker_news.comments] where author="jrockway"

4830955

That's almost 5 MB of comments I've written.
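If you want to try the same kind of aggregate locally before running it on BigQuery, here is a minimal sketch using Python's built-in sqlite3. The `comments` table with `author` and `text` columns is a hypothetical stand-in that mirrors the fields in the query above, and the rows are made-up toy data, not taken from the dataset. SQLite's `LENGTH()` counts characters for text values, so the total is a character count. That equals bytes only for plain ASCII, which is why "5 MB" is an approximation.

```python
import sqlite3


def total_comment_chars(conn, author):
    """Return the total character length of all comments by `author`.

    Queries a local `comments` table with `author` and `text` columns
    (a hypothetical stand-in for the BigQuery table used above).
    """
    # Bind the author as a parameter instead of pasting it into the SQL string;
    # COALESCE turns SUM's NULL (no matching rows) into 0.
    row = conn.execute(
        'SELECT COALESCE(SUM(LENGTH("text")), 0) FROM comments WHERE author = ?',
        (author,),
    ).fetchone()
    return row[0]


# Toy in-memory table with made-up rows, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE comments (author TEXT, "text" TEXT)')
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [("alice", "Get outside more :)"), ("alice", "Hello"), ("bob", "Hi")],
)
```

With these rows, `total_comment_chars(conn, "alice")` adds 19 + 5 characters. An author with no comments gets 0 instead of NULL.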


Get outside more :)


Being outside does not preclude internet access... :)


I'm dying to get comment scores, not just submission scores.

There's a lot you could do with that to find the best comments, which is really why HN is so awesome.


Blame HN for that. Comment scores were removed from public access right after I made a blog post about it, although I think that was a coincidence.


The data could still be assembled, though: I can see my comment scores, you can see your comment scores...


> There's a lot you could do with that to find the best comments.

In any given situation, how often does the best comment you could write coincide with the one that would get you the most upvotes? I'm guessing not often.


Your best is different from his/her best.


It would be interesting to find out the value (upvotes) of sponsored content vs. unsponsored content, as well as of comments from green users vs. normal users.


There isn't any sponsored content on HN. (If you are referring to the job ads, those don't receive upvotes and follow a steady rank decay.)


What? The lack of upvotes (or steady decay) does not make it “not content”.


Similar: https://hn.algolia.com/ (Search Hacker News)



