> [I]magine two submissions with the same number of upvotes but a different number of views. The article with more views is probably of lower quality than the one with fewer views, because more people have viewed it without upvoting. So the ratio of upvotes to views is a signal of quality.
I believe this creates a bias against longer articles. If a submission links to a longer piece, a user could take a long time to come back to HN to upvote the submission.
That's not to say this would be any worse than the current algorithm. By definition, a time-ranked frontpage and a fast-moving discussion will always favor shorter articles, or, if the content is long, produce plenty of comments that are only superficially related to the article.
As a submission ages, its rank wanes, and discussion gets more sparse: a piece that takes time to read, understand and digest will probably perform worse than a short one, since the discussion and votes will take place further into the future.
Absolutely true. We thought about it, but don't have a solution yet. Maybe there's a balancing feedback loop approach to it: something like 'a longer click-to-upvote delay' leading to 'a higher rank'.
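That feedback loop could be sketched like this. This is purely hypothetical: the function name, the 0.1 damping factor, and the `baseline` (an assumed site-wide median delay in seconds) are all my own placeholders, not anything HN actually does.

```python
import math
import statistics

def delay_boost(upvote_delays, baseline=60.0):
    """Balancing factor: submissions whose upvotes arrive after a longer
    click-to-upvote delay (likely longer reads) get a mild rank boost.
    """
    if not upvote_delays:
        return 1.0  # no signal yet, no boost
    median = statistics.median(upvote_delays)
    # Logarithmic damping so a handful of very long delays can't dominate.
    return 1.0 + 0.1 * math.log1p(median / baseline)
```

The rank from the existing algorithm would then be multiplied by this factor.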
My probability classes from college finally see some use!
What I would try to do is calculate the expected value of the number of upvotes from the people who have seen it and use that as the metric.
So for any post there should be some number of views and some number of votes. The amount of time from when a user clicks the link to when a user clicks the upvote forms a distribution, because each user is different.
Then you assume that distribution also applies to the users who haven't clicked on the post yet, and calculate the expected value of upvotes from it.
This tries to mitigate the fact that there's a bit more delay in upvotes when reading longer posts.
This doesn't solve the problem of having too few upvotes to estimate a reliable distribution; in particular, it gives no advantage to longer posts right after they are submitted. You can address that with something called a prior distribution, perhaps seeded by scraping the submitted link to estimate the article's length.
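The expected-value idea above could look roughly like this. A minimal sketch, assuming the site logs each view's age and each upvote's click-to-upvote delay; all names here are mine.

```python
import bisect

def empirical_cdf(upvote_delays):
    """Build F(t): the fraction of observed click-to-upvote delays <= t."""
    xs = sorted(upvote_delays)
    def F(t):
        return bisect.bisect_right(xs, t) / len(xs)
    return F

def adjusted_rate(upvotes, view_ages, upvote_delays):
    """Upvotes per *effective* view.

    A view that happened t seconds ago has only had probability F(t) of
    producing its upvote so far, so it counts as F(t) of a view. This keeps
    long articles from being penalized for upvotes that simply haven't
    arrived yet.
    """
    F = empirical_cdf(upvote_delays)
    effective_views = sum(F(t) for t in view_ages)
    if effective_views == 0:
        return 0.0  # no view is old enough to have produced an upvote
    return upvotes / effective_views
```

To handle the cold-start case, you could seed `upvote_delays` with pseudo-observations drawn from a prior based on the scraped article length.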
Another small issue is users upvoting an article after taking a break from reading, or upvoting without reading at all because they're already familiar with the link. Remember, we're trying to estimate how long it takes users to read the article, and those data points don't reflect that. So you could filter or transform them before adding them to the distribution.
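That filtering could be as simple as dropping implausible delays before fitting the distribution. The thresholds here are arbitrary placeholders, not tuned values:

```python
def filter_delays(delays, min_delay=5.0, max_delay=3600.0):
    """Drop click-to-upvote delays that probably don't reflect reading time.

    Very short delays suggest the user upvoted without reading (already
    familiar with the link); very long ones suggest a break mid-read.
    """
    return [d for d in delays if min_delay <= d <= max_delay]
```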