Hacker News

Well, finding and then publishing the name of the least-viewed Wikipedia article would drive up its view count, nullifying the initial reason for the search!




This is exactly what I had in mind when I was writing this! Except just observing it would not change it; publishing it would.


He's changed the stats by taking a look at the articles to verify things like disambig status, so you'd have to be quite careful how you did your analysis to ensure no reflexivity.

Personally, I think the reflexivity is part of the fun if you do a follow-up analysis. For example, I recently scraped WP to find 'the first unused acronym on Wikipedia': https://gwern.net/tla - it turns out to be 'CQK', and I'm looking forward to checking back in 10 years or so to see if anyone winds up using 'CQK' for a company or something, precisely because I wrote it up. We'll see!


What was the initial reason then? I'm not following you. Is there a problem?

TBF, he's analyzing a 2021 dataset, and that will of course not be affected.


No, I'm just making an observation that he will destroy the uniqueness of the object by talking about it.


He is talking about its uniqueness in the 2021 dataset, which is time bound and unaffected by current phenomena.


The quest to find hapax legomena on the Internet suffers from a similar problem, only worse. If you find one, announcing its existence destroys it. See: quizzaciously.



