Hacker News
Following the 1st link on Wikipedia leads to 'Philosophy' for 93.4% of pages (kevinstock.org)
61 points by p4bl0 on May 27, 2011 | 31 comments



What about for "Philosophy" itself? I tried it and it seems like I wound up in a loop around "Indo-European languages" which didn't include "Philosophy", but it's possible I might have accidentally clicked the second link somewhere.

What if you set a bunch of random walkers churning through wikipedia, following not the first link but a random link from each page? The frequency with which we wound up visiting any given page would tell us something about the importance of that subject to the total schema of knowledge that wikipedia represents.

(That wasn't a particularly good use of the word "schema", I just kinda wanted to say it.)
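
For what it's worth, the random-walker idea above can be tried out on a tiny scale. This is only a rough sketch: the link graph `TOY_LINKS`, the function name `random_walk_visits`, and the walker and step counts are all made up for illustration, and a real run would need the actual link structure of Wikipedia.

```python
import random
from collections import Counter

# Hypothetical toy link graph (not real Wikipedia data): page -> pages it links to.
TOY_LINKS = {
    "Kevin Bacon": ["Actor", "Film"],
    "Actor": ["Film", "Language"],
    "Film": ["Art", "Language"],
    "Art": ["Philosophy", "Language"],
    "Language": ["Philosophy", "Art"],
    "Philosophy": ["Language", "Art"],
}


def random_walk_visits(links, walkers=1000, steps=50, seed=0):
    """Count page visits made by walkers that follow a random link at each step."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    pages = sorted(links)
    visits = Counter()
    for _ in range(walkers):
        page = rng.choice(pages)  # each walker starts on a random page
        for _ in range(steps):
            visits[page] += 1
            outgoing = links.get(page, [])
            # On a page with no links, jump to a random page instead of stopping.
            page = rng.choice(outgoing) if outgoing else rng.choice(pages)
    return visits


visits = random_walk_visits(TOY_LINKS)
total = sum(visits.values())
for page, count in visits.most_common():
    print(f"{page:12s} {count / total:.3f}")
```

In this toy graph nothing links to "Kevin Bacon", so it only gets visited when a walker happens to start there, while the heavily linked pages soak up most of the visits.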


In case anyone else is interested, Kevin Bacon is 20 steps from Philosophy.


How about Paul Erdos?



What a coincidence, my Erdos number is seven too!


I did an informal survey, and it seems like the same is true for Conservapedia and "nation" -- which might be the most common non-stop word spoken on "The Colbert Report."

http://www.conservapedia.com/Nation


I wonder if this will affect how people write Wikipedia articles. Will authors start trying to point to something obscure now? It would be neat if the guy went back and did this same analysis in a few months, after this has sunk into editors' brains.


No, this has been a known fact on Wikipedia for years.[1] It has only recently gained mainstream attention after that xkcd comic, but it had been bandied about on reddit and elsewhere long before that. The generally accepted practice on Wikipedia of starting an article with, "[subject] is a [descriptor]? [superclass] that..." means eventually you get broader in scope until you wind up at, surprise surprise, philosophy. Of those articles that don't follow the pattern, the vast majority get stuck in loops. You wouldn't be able to avoid this phenomenon without drastically changing how encyclopedic articles are written. In fact, I'd hazard a guess that if you went through a paper encyclopedia and "followed the link" by flipping to the article of the first appropriate word in a given article, you'd see the same pattern emerge.

[1]http://en.wikipedia.org/wiki/Wikipedia:Get_to_Philosophy


> Of those articles that don't follow the pattern, the vast majority get stuck in loops.

Are there any articles that don't follow the pattern and don't get stuck in loops? The only way I can see that happening is if the article has no links at all.


You're correct; it would either have to get stuck in a loop or end in an article that had no links. The latter is highly unlikely, except perhaps for some small stub articles that link to another stub. Of the articles that don't link to Philosophy, I'd guess 99.9% would get stuck in loops or have no links to begin with.
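
To make the three possible outcomes concrete (reaching Philosophy, getting stuck in a loop, or ending on a page with no links), here is a minimal sketch. The `FIRST_LINK` table is invented toy data, not real Wikipedia links, and `follow_first_links` is just an illustrative name.

```python
# Hypothetical toy data: page -> the first link on that page (None = no links).
FIRST_LINK = {
    "Kevin Bacon": "Actor",
    "Actor": "Person",
    "Person": "Philosophy",
    "Philosophy": "Reason",
    "Reason": "Philosophy",
    "Stub A": "Stub B",
    "Stub B": None,
    "Loop X": "Loop Y",
    "Loop Y": "Loop X",
}


def follow_first_links(start, first_link, target="Philosophy"):
    """Follow first links from start; return (outcome, trail).

    outcome is "target", "loop" or "dead end". Starting on the target
    itself counts as having reached it.
    """
    trail = [start]
    seen = {start}
    page = start
    while page != target:
        nxt = first_link.get(page)
        if nxt is None:
            return "dead end", trail  # page has no links to follow
        if nxt in seen:
            return "loop", trail  # we are about to revisit a page
        trail.append(nxt)
        seen.add(nxt)
        page = nxt
    return "target", trail


for start in ("Kevin Bacon", "Stub A", "Loop X"):
    outcome, trail = follow_first_links(start, FIRST_LINK)
    print(f"{start}: {outcome} after {len(trail) - 1} steps via {' -> '.join(trail)}")
```

Since every page has at most one first link, a trail that never hits the target can only end by repeating a page or by running out of links, which is the point made above.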


Research has also shown that a relatively small group of contributors actually makes most of the changes on Wikipedia; I believe it's on the order of about 100 people (can't find the link at the moment). It's quite possible that it's a conspiracy, or there's a secret style guide we don't know about.


A small group of contributors make the most edits by count. A diverse group of contributors write the most prose content, usually on a topic that they're expert in. So the core wikipedians are organizing/wikifying/categorizing and that's the bulk of the edits, but the bulk of the content is a different story.

http://www.aaronsw.com/weblog/whowriteswikipedia


Ok, so I got the details wrong. However, I still think this effect could partially be explained by the tightly knit community of editors.


The style guide isn't secret.[1] Also, the research saying that most changes were done by a small group (~1500 way back in the day [2]) was challenged. While the core group of wikipedians are responsible for most of the copyediting, categorizing, etc, the bulk of material additions came from outsiders.[2]

[1]http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style

[2]http://www.aaronsw.com/weblog/whowriteswikipedia


For the lazy, here's an example of this in action: http://dl.dropbox.com/u/315/random_pics/philosophy.png


As a joke I put Hitler in. Didn't work. Nor did List of Device Bandwidths. Or Philosophy itself.


I think if you do a manual test you'll find that at least some of the ones posted as ending points also eventually lead to philosophy. I know for example I hit Knowledge several times on my way to Philosophy on the day I did the test manually.

Interestingly, Language does create a sort of loop that never hits Philosophy.


He didn't declare those pages as ending points, only pages that were reached.

Philosophy is not an ending point either; it's part of a loop, which is why everything within the loop is at the same percentage.


I wrote a script to use Wikipedia:Random to check as many trails as possible (still data logging with it now!).

http://github.com/basicxman/extended-mind


Wikipedia provides full database dumps, if you seriously want to check as many trails as possible. http://en.wikipedia.org/wiki/Wikipedia:Database_download


This serves the double purpose of not fragging their servers with bot requests.


Makes sense. Philosophy is the root from which all rigorous knowledge domains diverged into specializations.


'Mathematics' and 'property' seem to be where a lot of articles lead, which then end at 'Philosophy'.


Doesn't work for 'porn', ends in a loop.


Interestingly, "porn" may be one of the few English words for which wikipedia is not among the top ten hits of a google search.


This time there are statistics computed using a wikipedia dump :-).


This isn't useful at all and doesn't even make sense, ironically since "sense" is the first listing.

Title is also completely wrong. It isn't 93.4 % of pages, it is 93.4 % of pages in certain categories (primarily dealing w/ philosophy)

Beyond that, are "sense" or "Perception" the name of a category of pages and the percentage the percentage of pages in that category that lead back to "philosophy" or are they individual pages?

If they are individual pages, you shouldn't have a percentage, you should simply have a yes/no. If they are categories, how could the percentages possibly be the same for different categories? And how could this large number of pages all form a loop?


> 93.4 % of pages in certain categories (primarily dealing w/ philosophy)

Per the article, this was from an entire database dump of wikipedia.

> Beyond that, are "sense" or "Perception" the name of a category of pages and the percentage the percentage of pages in that category that lead back to "philosophy" or are they individual pages?

> If they are individual pages, you shouldn't have a percentage, you should simply have a yes/no. If they are categories, how could the percentages possibly be the same for different categories? And how could this large number of pages all form a loop?

Those are individual pages. The percentage refers to the percentage of other wikipedia pages which eventually reach them, by following the first link in the page. A loop is formed when, by following first links starting from an article, you eventually return to that article; it is, obviously, possible to enter the loop from other pages.

I encourage you to go to wikipedia and play around with following the first link in each article (not in parentheses or italics) for a bit, as it seems like actually going through that process would alleviate your confusion and lack of understanding.


Thanks for the correction. Totally misunderstood the article.

I still find it incredible and hard to believe that 89.60% of Wikipedia articles end up at Modern Philosophy.


I think you're not understanding the problem here. Did you read the relevant XKCD? The 93.39% is of all pages. So, 93.39% of pages will eventually reach the Sense article. Of those 93.39% of pages that reach Sense, 100% of them will reach Perception, Philosophy, etc. because there is a big loop that contains those pages.
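
A small sketch of why every page in the loop shows the same percentage. The `FIRST_LINK` table here is made-up toy data (eight pages, with a Sense -> Perception -> Philosophy loop), `reach_share` is an illustrative name, and counting a page as reaching itself is an assumption of this sketch rather than something the article states.

```python
from collections import Counter

# Hypothetical toy data: page -> the first link on that page.
FIRST_LINK = {
    "A": "Sense",
    "B": "A",
    "C": "Perception",
    "Sense": "Perception",
    "Perception": "Philosophy",
    "Philosophy": "Sense",
    "X": "Y",
    "Y": "X",
}


def reach_share(first_link):
    """For each page, the fraction of all pages whose first-link trail visits it.

    Assumption of this sketch: a page counts as reaching itself.
    """
    reached_by = Counter()
    for start in first_link:
        seen = set()
        page = start
        # Walk until the trail runs out of links or revisits a page.
        while page is not None and page not in seen:
            seen.add(page)
            page = first_link.get(page)
        for visited in seen:
            reached_by[visited] += 1
    total = len(first_link)
    return {page: reached_by[page] / total for page in first_link}


share = reach_share(FIRST_LINK)
for page, fraction in sorted(share.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {fraction:.1%}")
```

Any trail that enters the loop goes all the way around it, so Sense, Perception and Philosophy are each reached by the same six of the eight toy pages, while X and Y form a separate loop that never gets there.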


To be fair, this phenomenon was fairly well-known amongst wikipedians long before Randall publicized it.



