cperciva, I really appreciate the effort you put into this.
One thing I find a bit depressing is that even though this scheme hugely reduces the time one spends on HN by filtering the articles according to votes, the top voted articles are not the ones that I would want to read. I usually come here for that "hacker" hacker news (hackerhackernews.com used to be something like this, but now the domain name has expired it seems). Anyway I am sure there will be people who find your service really useful.
That's exactly why I created http://www.hackerblogs.com. If you check the HN front page right now, there are zero posts by hackers; they all come from PR blogs like TechCrunch, ReadWriteWeb, TheNextWeb, etc.
I highly recommend giving it a try. It is still growing, and I hope that someday it may become the new source of real hacker news.
* I dare you to find an article like "Flipping arrows in coBurger King" on the news sites right now.
Nice site, just added my blog. The only problem I see is that if it does keep growing and becomes popular, people could add feeds of irrelevant blogs, and like most social news sites it would turn into a Digg/Reddit.
I tried to register http://deadpanic.com/blog/ , but apparently didn't make the cut. Any suggestions for improvement, or insight into the criteria you use?
Useful web site, added to bookmarks.
But regarding "Flipping arrows in coBurger King": I saw that entry as the first item in the Haskell category, submitted probably shortly after your comment :)
Here is the link: http://www.hackerblogs.com/post/rsfepzwv
Remember that being at the top of the front page at some point during the day does not necessarily mean that an article is in the top 10 links for the day.
I think looking at scores -- without looking at how long it took for those scores to be reached -- probably weights in favour of more "interesting" stories and against link-bait stories, simply because the link-bait tends to accumulate most of its votes very quickly.
Out of curiosity, I just ran the numbers for Love in the age of the pickup artist. Right now, at 15:00, it's in 6th place; I suspect that by the end of the day it will fall past 10th place, since these last 9 hours are probably when the most votes are cast and that link isn't getting many votes any more.
So that story is probably a perfect illustration of when sorting by highest score alone is better than HN's score-per-time measure.
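The contrast between the two rankings can be sketched with toy numbers. This is a minimal illustration only: the gravity formula below is the commonly cited approximation of HN's ranking (points divided by a power of age in hours), and the story data is invented.

```python
# Compare ranking by raw score against a time-decayed "gravity" score.
# The formula is the commonly cited approximation of HN's ranking, not
# necessarily the exact one in use; the stories are made up.

def gravity_rank(points, age_hours, gravity=1.8):
    """Time-decayed score: fast-rising link-bait wins early, then fades."""
    return (points - 1) / (age_hours + 2) ** gravity

stories = [
    # (title, points, age in hours)
    ("link-bait piece", 120, 2),   # accumulated its votes very quickly
    ("slow burner", 150, 20),      # higher total, gathered over a day
]

by_score = sorted(stories, key=lambda s: s[1], reverse=True)
by_gravity = sorted(stories, key=lambda s: gravity_rank(s[1], s[2]), reverse=True)

print([s[0] for s in by_score])    # ['slow burner', 'link-bait piece']
print([s[0] for s in by_gravity])  # ['link-bait piece', 'slow burner']
```

The slow burner wins on raw score while the link-bait wins on score-per-time, which is exactly the divergence described above.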
I think it ranks items only by the votes of people who have had accounts for at least a year, or at least it did when the feature was added. (I may be wrong, or the threshold may have been longer than a year.)
If that is the case, then HN Classic is very good evidence of the change in the submitted stories. I loved seeing the HN Classic homepage and might keep it as my go-to page (if it is still updated).
It's not perfect, certainly -- over the past 3 days I've still only read about half of the top-ten links. This is about avoiding the situation where an interesting highly-voted article gets drowned out by all the noise rather than an attempt to find "good" articles which don't get lots of votes.
I started http://www.hackernewsletter.com/ last week. The articles I select are based on a combination of votes, comments, and items that I found valuable. I received similar feedback from a couple of HN members and thought of splitting up the newsletter based on the article "types". I will be trying that out for the next edition. I think some weeks it will work better than others, since the quantity and quality vary a good bit week to week.
I think this would be great, especially as an option. The posts that generate the most discussion are often interesting, and curiously aren't always the top voted.
That's a lot harder to do -- I can get away with scraping the front page right now because highly voted posts are likely to be on the front page for most of their vote-getting lifetime, but that argument probably doesn't hold about highly commented posts.
Hm, this is true. Perhaps you could scrape /newcomments and keep track of which stories are getting new comments?
This would hinge on the overall comment rate across the site, and on your scraping interval being shorter than the time between the newest comment and the oldest comment displayed on /newcomments.
I think it'd just be really great to have another view into "Most active discussions".
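The idea above could be sketched roughly as follows. Everything here is hypothetical: each scrape is modeled as a list of (comment id, story id) pairs as if already parsed from /newcomments, and the point is that overlapping consecutive scrapes lose no comments as long as the interval is short enough.

```python
# Sketch of tracking "most active discussions" from periodic scrapes
# of /newcomments. Scrapes are invented data, newest comment first.

from collections import Counter

def update_activity(activity, seen, scrape):
    """Count new comments per story; `seen` avoids double-counting
    comments that appear in two overlapping consecutive scrapes."""
    for comment_id, story_id in scrape:
        if comment_id not in seen:
            seen.add(comment_id)
            activity[story_id] += 1

activity, seen = Counter(), set()
# Two overlapping scrapes 5 minutes apart (comment 103 appears in both,
# so it is counted only once).
update_activity(activity, seen, [(103, "story-a"), (102, "story-b"), (101, "story-a")])
update_activity(activity, seen, [(105, "story-a"), (104, "story-c"), (103, "story-a")])

print(activity.most_common())  # story-a leads with 3 new comments
```

If a scrape ever contains no comment from the previous scrape, you know you missed some, which is the failure mode the interval condition above guards against.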
I've taken lots of downvotes lately to say that HN is becoming less hacker and far more generalist.
That happened to virtually every other site: Slashdot, Digg, Reddit. As the site gets attention (because of its core strength of great hacker content) it starts drawing in a non-core audience. Because it's a voting system the non-core articles will start to grow in influence. It loses what drew people in in the first place.
I second this! I would gladly pay 5-10 bucks a month for a daily digest of top hacking-related articles (no business, no politics and no Apple rumours/speculations). HN used to be 80% programming and 20% business-related, now it's the opposite.
Business is a lot wider than startups, though. There has been a huge number of articles about Apple -- I seriously don't understand how such a massive quantity of (often gossipy) articles about the iPad/iPhone and friends got upvoted, especially considering their hacker-unfriendliness. Then also a lot about Google, Facebook, Microsoft, the RIAA, general politics...
I have to say that's surprising. I'm relatively new here, but I thought most people would get here through Paul Graham/YCombinator, which has more to do with business than programming.
Personally, I came here because I would randomly check paulgraham.com ever since his Lisp-related articles started being posted on Slashdot. One day the YC link appeared, which in turn had a link to Hacker News. While I have since gained an appreciation for startup culture (definitely the hacking approach to business), I still primarily hang around for the articles on programming.
If you are looking for something that is 100% programming, then http://news.usethesource.com/news is a good choice. It is built on the same platform as HN.
Disclaimer - I have no affiliation with the amazing peeps at usethesource
This doesn't really solve the problem, though. There's still too much information to manage.
I wonder if any social-links-sites (hacker news, reddit etc.) have tried with an algorithm similar to that which last.fm uses, where you get suggestions on stories based on which previous stories you have shown preference for? (I'm sure this sort of ranking has some fancy name as well)
Reddit certainly attempted this a few years ago. They had a 'recommended' tab in their main navigation. It worked about as well as their search at the time (i.e. completely useless).
If I recall, it was based on your activity (this was pre-subreddits) so if you interacted with lots of political stories, for instance, similar political stories would be suggested.
The problem was if you downmodded a bunch of articles involving Ron Paul, your recommended links would be Ron Paul articles.
The problem with /best in my view is that links gradually come and go -- it's the right format for "what are the recent interesting stories", but it's not the right format for "what are the most interesting stories since the last time I was here".
More evidence that curation, not creation, is the point of sharpest demand in the media business. The immediate challenge, I suppose, is finding the right mix of focus and serendipity.
More broadly, the challenge is having my own life modeled well enough so that information relevant to short, medium, and long terms plans gets reformulated appropriately.
I could see this leading to a point where 'news' is not something I check in the morning over coffee. Rather, it's a feature that presents itself whenever I shift my attention to doing another thing (e.g. working on project A, planning weekend B, etc.)
The really fascinating thing would be getting updates about apparently tangentially related items. It's the classic 'local angle', only with regard to activity, not place.
Thanks for doing this, interesting tool but I won't be using it! Let me explain.
I have often found that the articles I enjoyed most at HN were the outliers, i.e. not the ten most upvoted ones (anecdotal evidence, never tested this quantitatively). In fact this is what makes HN interesting: the quirky entries. My guess is that most of the 10 articles you select will be already covered by other such sites, diminishing the value of coming to HN in the first place.
The end effect will be similar to the Hollywood blockbuster effect. It's not that I don't like going to a blockbuster movie, but I don't want to watch those all the time.
My blog code defaults to only putting one paragraph into the RSS feed -- for most of my blog posts this works well. I've adjusted my script so that future dailies will include the list of links in the RSS feed.
There are a number of article recommendation engines out there that can fill the need for "outlier" articles fitting even the most peculiar tastes. I personally use http://www.euraeka.com and even though it aggregates news from less hardcore programming sources, I find it an incredibly powerful source of science and technology news that fits my taste. I tried the Digg and Reddit recommendation engines, but they all work on user-to-user recommendations, and most of the time I get either inaccurate or trivial results.
you beat me to it! I should have a prototype working by the end of the week for something similar (but hopefully better as well). It was a learning exercise for me so I didn't really lose any time.
Since everyone's ideal is different, probably the most surefire and arguably easiest way to get the particular aggregated and/or curated view you want is to write a small script that does exactly what you want. If others might want the same, make the code available with a link from your HN profile. If a particular service or view hack becomes popular, perhaps PG will add an equivalent feature to HN itself.
This is a pretty cool idea and a perfectly simple implementation. However, I think you're solving a problem that doesn't exist: If I wanted efficiency, I wouldn't be reading blogs and news aggregators in the first place. I come to websites like HN to relax and browse through interesting articles and discussions -- sort of like leafing through a good magazine. If you distill it down to 10 articles then suddenly I'm finished reading and I can get on with my work... Too soon!!!
My bad, I get excessively terse when I'm tired. I wasn't suggesting it out of utility but out of appreciation — nothing says thank you quite like a superfluous vanity domain.
EDIT: Just noticed my other post has garnered at least one down-vote so although I'm pretty sure this idea doesn't have legs, on the off-chance you dear reader are one of ~13 other folks who'd like to see this happen, drop me an email to express your interest.
You might be interested in checking out the page -
http://news.ycombinator.com/best
Sounds a lot like what you're doing. I'm not sure if others know about it already -- I think I stumbled upon it by accident.
Could you make it so that the actual news articles show up in the RSS feed when displayed in Google Reader? I don't want to have to click through to the page and then click the article to actually view it.
The /news page ranks links based on score and time since submission. I'm only ranking links based on score (and whether it has been on a previous daily).
I could scrape the entire site at midnight each day, but I think PG would be very unhappy with me if I did that. Scraping /news every 5 minutes imposes much less load, and since highly ranked links get almost all of their votes before falling off the front page, this gives me almost as much information.
I'd prefer to not post the code publicly, simply because I don't want to encourage people to put extra load on the HN server -- but if you want a copy, send me an email and I'll provide it.
I use FreeBSD's fetch(1) to download the page, but curl or wget would have worked just as well. Extracting the data I want (item #, score, and link) is a few lines of perl. Managing the data over the course of the day and writing out the final HTML is done using standard BSD text utilities (sort, join, comm, cut).
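For anyone curious, the same pipeline can be sketched in Python. To be clear, the markup and regex here are hypothetical guesses, not the author's actual pattern, and the scrape data is invented; the point is the fold that keeps each item's highest observed score over the day, which is what the sort/join/comm step accomplishes with flat files.

```python
# Sketch of the scrape-and-track step. The regex and the sample HTML
# are hypothetical (real HN markup looks different); record_scrape
# keeps the highest score seen per item over the day.

import re

# Hypothetical pattern: an item link immediately followed by "N points".
ITEM_RE = re.compile(r'item\?id=(\d+)[^>]*>(\d+) points')

def parse_front_page(html):
    """Return {item_id: score} for every story found in the page."""
    return {int(i): int(s) for i, s in ITEM_RE.findall(html)}

def record_scrape(day_max, page_scores):
    """Fold one scrape into the running per-item daily maximum."""
    for item, score in page_scores.items():
        day_max[item] = max(day_max.get(item, 0), score)

day_max = {}
# Two invented scrapes taken 5 minutes apart.
record_scrape(day_max, parse_front_page(
    '<a href="item?id=101">40 points</a> <a href="item?id=102">15 points</a>'))
record_scrape(day_max, parse_front_page(
    '<a href="item?id=101">55 points</a> <a href="item?id=103">30 points</a>'))

top = sorted(day_max.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)  # [(101, 55), (103, 30), (102, 15)]
```

Since an item keeps its maximum score even after falling off the front page, the end-of-day sort gives nearly the same top 10 as a full midnight crawl would, at a fraction of the load.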
Part of the problem is how Hacker News seemingly has turned into something more akin to plain News. The hacker tidbits are few and far between, and the overall number of new submissions has skyrocketed.