My probability classes from college finally see some use!
What I would try is to calculate the expected number of upvotes from the people who have seen the post, and use that as the metric.
So every post has some number of views and some number of votes. The time between a user clicking the link and clicking the upvote differs from user to user, so those delays form a distribution.
Then you assume that this distribution is also the probability distribution for the users who haven't clicked on the post yet, and calculate the expected value using that.
This tries to mitigate the fact that there's a bit more delay in upvotes when reading longer posts.
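Here is a rough sketch of one way to turn that into a number. It is only my reading of the idea, not a tested ranking formula: build an empirical CDF from the observed click-to-upvote delays, treat each viewer as having had only a fraction F(t) of their chance to vote after t seconds, and scale the observed votes up accordingly. The function names and the assumption that you log a timestamp per view are mine.

```python
from bisect import bisect_right


def empirical_cdf(delays):
    """Return F(t): the fraction of observed upvote delays that are <= t."""
    ordered = sorted(delays)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf


def projected_upvotes(upvote_delays, seconds_since_view):
    """Estimate how many upvotes the current viewers will eventually give.

    upvote_delays: click-to-upvote delays in seconds, one per vote so far.
    seconds_since_view: time since the click for every viewer, voters included.
    """
    if not upvote_delays:
        return 0.0  # no delay data yet, so nothing to extrapolate from
    cdf = empirical_cdf(upvote_delays)
    # Each viewer counts as F(t) of a "full" opportunity to have voted.
    exposure = sum(cdf(t) for t in seconds_since_view)
    if exposure == 0:
        return float(len(upvote_delays))
    # Estimated eventual upvote rate, capped at 1 vote per viewer.
    rate = min(1.0, len(upvote_delays) / exposure)
    return rate * len(seconds_since_view)
```

Note that the delays seen so far lean toward fast voters, so on a very new post this will still underestimate.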
This doesn't help when there are too few upvotes to build a reliable distribution; in other words, it gives no advantage to longer posts right after they are posted. You can address that with something called a prior distribution, perhaps informed by scraping the posted links to determine their length.
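A minimal sketch of the prior idea, under my own assumptions: guess the typical delay from the article's word count (the 200 words per minute figure is just a placeholder), then treat that guess as a handful of pseudo-observations that get outweighed as real delays come in. This is the usual shrinkage estimate for a mean; the names and defaults are hypothetical.

```python
def prior_mean_delay(word_count, words_per_minute=200.0):
    """Guess the reading time in seconds from the article length (assumed speed)."""
    return word_count / words_per_minute * 60.0


def smoothed_mean_delay(observed_delays, word_count, prior_strength=5.0):
    """Blend the length-based guess with the observed delays.

    prior_strength is how many observations the prior is worth: with no
    data the result is the prior, with lots of data it is the sample mean.
    """
    prior = prior_mean_delay(word_count)
    n = len(observed_delays)
    return (prior_strength * prior + sum(observed_delays)) / (prior_strength + n)
```

So a fresh 1000-word post starts out with an expected delay of 300 seconds, and that estimate drifts toward what users actually do as votes arrive.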
Another small issue is users who upvote after taking a break from reading, or who already know the link and upvote without reading it at all. Remember, we're trying to estimate how long it takes users to read the article, and in those cases the delay doesn't really measure that. So you could filter or transform data points of that kind before adding them to the distribution.
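For instance, a simple version of that cleanup could drop delays too short to be a real read and cap the very long ones. The thresholds below are made-up defaults, not anything measured:

```python
def clean_delays(delays, min_seconds=5.0, max_seconds=1800.0):
    """Filter and cap click-to-upvote delays before building the distribution."""
    cleaned = []
    for d in delays:
        if d < min_seconds:
            continue  # probably upvoted without reading
        cleaned.append(min(d, max_seconds))  # cap delays that include a break
    return cleaned
```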