
It really bothers me that Google has allowed groups to deteriorate so much. At one time, it was actually a decent archive of Usenet history, but since then, search has deteriorated, and apparently, posts are sometimes missing.

Perhaps people who posted on Usenet back then and are now famous didn't want posts from their college days in the spotlight, and their friends at Google implicitly let the site go to hell. </conspiracy-theory>

Edit: Hmm, I may be wrong. A while back, many of the links in this list of "memorable Usenet moments" [1] were broken, but they seem to work now.

Edit #2: Okay, I was not wrong. For example, take a look at the link on that page to "December 1982: First thread about AIDS" [2]. The link takes you to a Usenet post that doesn't even mention AIDS, in the newsgroup fa.telecom.

[1] https://support.google.com/groups/answer/6003482?hl=en

[2] https://groups.google.com/forum/#!msg/fa.telecom/EmQ-s_EGgSA...




Some Google usenet fun. Go to the main Google Groups page [1].

Use the search functionality to search for "tim smith csh callan". You get one result, which is a 2007 post from comp.os.linux.advocacy where someone is quoting a 1984 post of mine that was in net.unix-wizards. Note that my 1984 post is not found.

Now go to the Google Groups version of net.unix-wizards [2].

Search there for "tim smith csh callan". Now the above mentioned 1984 post is found, along with another 1984 post.

Lest you think that there is some problem when searching from the main page, click on the "Search all groups" link on the net.unix-wizards search results page, and it only finds the 2007 COLA post that quoted my 1984 post.

A search from the main Google search page, as opposed to the search within groups, finds the first 1984 post as the first result.

I've seen vast numbers of posts become unfindable by search, and then weeks or months later become findable again. For instance, there was a long time when if you searched for "Bill Gates" in Google's usenet archive, it would only return something like a dozen posts.

To put it bluntly, Google's handling of the usenet archives has been negligent and/or incompetent.

[1] https://groups.google.com/forum/#!overview

[2] https://groups.google.com/forum/#!forum/net.unix-wizards


> I've seen vast numbers of posts become unfindable by search, and then weeks or months later become findable again.

I think this is an effect of the way Google searches/indexes things; I am equally frustrated by pages that disappear from Google's web search and may or may not come back eventually (although I've seen more disappear than come back...). Remember that they're running a huge distributed system, so consistency/completeness is probably relaxed in order to optimise other things they believe are more important. It's the same reason why, even if Google says there are X results in a search, you often cannot view them all.

(Not that I'm actually agreeing with this behaviour, however. It's less noticed on the web where there tends to be a lot of redundant/similar information, but still not desirable at all.)


It was once possible to access Google's archive of Usenet without JavaScript. And there were "heavy" and "light" versions of the messages. The heavy versions had an enormous amount of JavaScript, CSS and HTML cruft.

For example, http://groups.google.com/group/comp.unix.wizards/msg/24222e5...

However, later they switched to HTTPS and #! URLs. Around this time I remember getting $CLASSPATH errors. Perhaps this is evidence to support your incompetence argument?

The "content" here is nothing but some plain ASCII Usenet posts. How difficult is it to serve plain text?

Anyway, today the same URL has been converted to this:

https://groups.google.com/forum/#!msg/comp.unix.wizards/bllj...

As I said in an earlier thread, Google itself developed a proposal to deal with this #! URL problem and advises webmasters to revise these AJAX URLs to "escaped_fragment" style URLs:

http://developers.google.com/webmasters/ajax-crawling/docs/s...

But apparently when the webmaster is Google, the specification does not apply.
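
For anyone who hasn't seen the scheme: the idea is that a crawler rewrites a #! URL by moving everything after the "#!" into an `_escaped_fragment_` query parameter, so the server can return static content for it. Here is a rough Python sketch of that rewrite. The function name is made up for illustration, the set of characters left unescaped is a simplification rather than the exact list from the specification, and I'm not claiming Google Groups actually answers the rewritten URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url: str) -> str:
    """Rewrite a #! URL into its _escaped_fragment_ form (simplified sketch)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hash-bang URL, so leave it unchanged

    state = parts.fragment[1:]  # drop the leading "!"
    # Assumption: keeping "/" and "=" unescaped is a simplification of the
    # real escaping rules; everything else unsafe is percent-encoded.
    param = "_escaped_fragment_=" + quote(state, safe="/=")
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("https://groups.google.com/forum/#!forum/net.unix-wizards"))
# prints https://groups.google.com/forum/?_escaped_fragment_=forum/net.unix-wizards
```

The point is that the rewritten URL carries the state in the query string, which a server can see, whereas the part after "#" is never sent to the server at all.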

Years ago, I made my own archives of some important comp and net groups. Google is not reliable. This stuff should be placed with the Internet Archive.


> Edit #2: Okay, I was not wrong. For example, take a look at the link on that page to "December 1982: First thread about AIDS" [2]. The link takes you to a Usenet post that doesn't even mention AIDS, in the newsgroup fa.telecom.

I think that help article is just messed up. That section of the list seems to be doing something weird with the permalinks. The full links still work; see, e.g., the link in this 2002 MetaFilter post about the AIDS post. Still works: http://www.metafilter.com/22004/First-mention-of-AIDS-on-Use...

edit: though it's worth noting every other link in that list I've tried (even the ones with the weird permalinks) has worked correctly so far


That thread seems largely incomplete though. It contains 4 posts on Google Groups, but notice how the comments on MetaFilter seem to refer to more.


FWIW, there were issues with missing content even in the days of Dejanews.


True, especially in the really old archives... but there's a big difference between missing content from the archive and being unable to find content you know is in the archive because of an apparently broken UI.


There was a company called AIDS that eventually changed its domain name from AIDS.COM...


The problem with usenet is that it's usenet. It will never change. The poor experience, the slow updating, the almost nonexistent moderation, the impossible spam filtering, endless abuse, nothing to stop stuff like alt.tasteless invading alt.cats - again, the difficulty of searching it well, etc.

I think a lot of people were seeing what web forums were doing (look at Slashdot or Metafilter from that era) and decided that supporting usenet was betting on the wrong horse. I don't really blame them. What we can do with completely controlled systems on our own software and on our own servers vastly surpasses what usenet was capable of.


The complaint here isn't that Google is failing to fix today's Usenet, but that their historical archive of Usenet content going back to the early '80s, which they acquired from DejaNews, has a pretty broken interface (and is worse than it was before they bought it).



