Loss of nearly a full decade of information from early days of Chinese internet (chinamediaproject.org)
244 points by cubefox 8 months ago | 85 comments



Intentionally or not this is -- exactly -- what 1984 is all about: changing our perception of history by rewriting or erasing previous writings.

Unfortunately a lot of it from the article seems typical: blogs going offline as bloggers move to new technologies, social media companies going defunct or just not keeping old content.

A lot of these things can happen in the West. Remember these books you could read? "The Feynman Letters", etc. I'm paraphrasing -- but it's impossible now.

Think of this: emails? A person dies and their laptop dies or is disposed of -- they're all gone. In the past the physicality of the letters would persist. Not so now. All this correspondence vanishes.

Facebook, are you kidding me? If someone famous thought to export their data -- and it can be found on a laptop still working (and you have the login password), then maybe. See above. This repeats and repeats for each system we interact with for communication.

Aside from the laptop scenario -- all this is lost. We now live in a black hole of historical detail, perhaps soon to be replaced by a fabricated history hallucinated by LLMs.

Those that love historical understanding should be very worried.


> Posted on Wednesday, May 22, He’s post had been removed from WeChat by the following day, yielding a 404 message that read: “This content violates regulations and cannot be viewed.”

You don't get your comments censored by commenting about natural entropy on the internet. You do get your comments censored by drawing attention to the censors.

I get very tired of people drawing false equivalences between organic human behaviors in the West and intentional abuse by central authorities in China. We can and should do more to preserve our history in the West, but we are already preserving orders of magnitude more data per person than any of our ancestors could have dreamed of. There's no comparison between emails getting lost when someone dies and centralized censors actively purging old content to make it easier to change the party's narrative.


I'd love to have a single letter from some of my ancestors.


I have one, actually, from my grandpa's generation. He told another family member about his time growing up in the early 1900s, riding trolleys and eating Walnettos (a strange Walnut-based candy bar). Then the Spanish Flu came around and the eldest sister just died at the breakfast table one day. Later, the family rallied together to care for each other after his father lost his job due to automation. He moved on to doing odd jobs, then later fell off the roof and broke his back, ending up as an invalid for the rest of his days. They talked about the cherry trees they used to feed themselves, which explains grandma's fondness for the cherry soup I hated so much, and how my grandma and grandpa got married and took care of great grandpa while he was invalid.

They also talked about how Wonder Bread (the original sliced bread and origin of the phrase "best thing since sliced bread") came into town and the eldest son went to work for them to support the family after the local baker he had worked for folded, lost a finger to the machinery. At some point, he had some kind of heated dispute at work due to this, was beaten by security, and as I'm told, died from injuries sustained during that beating some time afterwards.

It was a weird little window into bits of family history that would have otherwise been erased.


I have some letters from 3 of my grandparents, but beyond that, it's pretty much nothing.


Interesting perspective!

For us Russians, collective memory for almost all people starts with the post-WWII era, usually the 1950s. The old generation rarely told me about what came before, although I am old enough to vaguely remember some of my ancestors born around 1895 and spent a lot of time talking to those born around 1910.

One might think that it was about memories being overly heavy. Indeed there was Commie and, for some, also Nazi terror, hunger, and so on: before ~1956 an average Soviet starved for at least several weeks a year, and before ~1951, once every few years, some relative always died of starvation. That was the norm. But the real reason, as I understand it, wasn't that. The reason is that there was almost nothing to tell. These people were illiterate peasants living very local-minded lives, without formal jobs (kolkhoz serfdom), without electricity or money, and with very little worldview apart from primitive propaganda pushed once in a while by visiting agitators.

Before WWII, there was almost nothing any of them could tell: the only thing that could happen was repressions, but those to whom they happened couldn't tell anything - they never returned - and their relatives usually forgot them because it was too scary to remember. Apart from that, it was all the same - endless toil on a small plot of infertile land to produce as much food as possible to avoid kids dying next spring, keeping as much of it as possible from kolkhoz eyes, and slacking off at kolkhoz forced work as much as possible to save energy for your own plot. Never leaving the village, unless forced out of it by Nazis or Commies (this actually happened to my relatives - one day they were forced out of the village and it was razed; they were moved ~20km away and left in a field, issued some formal "compensation" in worthless money, and had to dig a new earth house). That's the kind of stories I heard.

I can understand why they were not keen on telling them.

Stories of later generations had a lot more "story" in them, and I can understand that they lived an actual life.


The original post was about natural entropy on the internet. Websites from 2005 that have disappeared or been redesigned so that you can't find their old content anymore, and the uselessness of search engines, domestic or foreign, for date range queries reaching that far back into the past. Even on the Internet Archive, the earliest working snapshot of Baidu Tieba is from 2006.

You may think that it's impossible for an innocuous post to get censored unless it has inadvertently unmasked a conspiracy to bury the past, but censorship decisions also get made to prevent unwanted reactions. If a post about disappearing content inspires people to complain about censorship, that's enough to suppress it.

If the disappearance of old websites were entirely deliberate, you'd also need to explain why the West is in on it.


> The original post was about natural entropy on the internet.

The post by He Jiayan was, but that post was taken down for violating regulations. TFA is largely about the censorship angle which He Jiayan specifically avoided talking about (not that it helped him).

> If the disappearance of old websites were entirely deliberate, you'd also need to explain why the West is in on it.

Name one figure who was prominent in between 1995-2005 who you can't find any content about from that era when using Google's date filters. A single figure.

Some sites go down organically. It happens. Every site that references a figure who was once favored and is now out of favor? That doesn't happen in the Western internet.


> Name one figure who was prominent in between 1995-2005 who you can't find any content about from that era when using Google's date filters. A single figure.

The original post listed multiple people famous in China at that time (including Taiwanese celebrities) where even Bing and Google didn't get them old enough results. Sure, they return results that supposedly match the date filter, but if you actually read them, it becomes clear that Google got the publication date wrong, because much later events are mentioned in the text. Or e.g. a YouTube video from 2004, before YouTube even existed. (Actually uploaded in 2013.)


Apparently I should have specified: prominent in the West. We've already established decay in the Chinese internet, I want you to back up your assertion that the West is "in on it".

Also, even Jay Chou, who I assume is the Taiwanese celebrity you're referring to, has a bunch of sources that are clearly from those dates:

http://www.china.org.cn/english/culture/81463.htm

https://westeastmag.wordpress.com/2002/09/08/made-in-taiwan-...

https://time.com/archive/6893975/cool-jay/


The West has almost the opposite "problem", where stuff that some people really want hidden and forgotten is replicated and spread and amplified so much that it will never be forgotten. We even have a name for this: the Streisand Effect.

It really does illustrate the difference between information being forgotten and being deliberately censored. In the West, the harder someone tries to censor information on the internet, the more amplified it is likely to get.


[flagged]


I don't see how your comment relates to the parent's comment; however, here's a reply.

> All anyone has to do is farm a few accounts to flag the mildest mention of this out of existence, and to upvote the most obtuse, simplistic anti-enemy animus to the top.

Have you considered that the negative sentiment against Russia and China is genuine? I know of no evidence that the DoD has shills or bots upvoting pro-US-government comments and downvoting other ones. People probably just read the news and form their opinions that way, and there's a variety of different news sources with many different perspectives, which don't get censored.

> I'm jealous of the fact that Chinese people speaking out of the government-range just get deleted, rather than patronized.

It's strange to be jealous of them not having protection from government censorship.


Another false equivalence. "Intentionally or not" actually really matters here. It took work to maintain archives in the pre-digital era, and it takes work to maintain archives in the digital era. So many of those physical letters were lost, rotted, burned, etc.

This is a purge, not a failure to maintain archives. This is like when during the Cultural Revolution, they literally burned archives and letters by intellectuals.


I love your reply; your answer is a near-perfect summing up of the issue! My view is some here in America are starting to get too lenient towards Russia and other authoritarian states. Do we not understand that these states want complete control and don't care how they get it? Information and educational purges are two of many ways this is done. After that, it gets dirty.

Rule of thumb, if the Constitution says it stinks, it does. If we don't like something in it, work for a change. In China and Russia they don't have that right.


> In the past the physicality of the letters would persist

I'm willing to bet that these physical letters have historically fared about as well as our digital letters have; otherwise, our world would be absolutely filled with the written detritus of the past.

> Those that love historical understanding should be very worried.

As humans we've always disposed of more than we've kept. It's just not worth the energy cost to operate any other way. Thankfully history is recorded as several overlapping collections and not as a series of single data points.


I inherited plenty of handwritten notes, etc. from my father. Not much from my mother.

After I read them, keeping them doesn't serve much of a purpose... in short-term. That's why I keep them.

What you describe as single data points is exactly what we want, but somehow we don't know that until it's too late. We cherish tablets about copper orders from times far past because somehow it's now more valuable. Who's to say yesterday's letters aren't going to be?


> We cherish tablets about copper orders from times far past because somehow it's now more valuable.

There were three discovered tablets and that was one of them. They were discovered in 1920 but only widely known about 100 years later. They're notable because they're described as the oldest found written complaint. They're mildly useful because they describe specific details of the commerce being conducted at the time, which comports well with other contemporary sources of the same information.

This particular artifact was written in 1750 BC. Our oldest writings extend back to 3400 BC. They're not particularly "cherished" but they are a widely known "meme" thanks to the Guinness Book of World Records.


Tangential, but what is "The Feynman Letters" here? I know of a book of some of his letters, but not about censorship/loss thereof.


Perhaps referring to this? I’m not entirely sure. https://ahf.nuclearmuseum.org/ahf/history/mail-censorship/


Oh thanks, it's funny how I had censoring the mail in a different mental category from publication/archives.


Recently: Google refuses to turn up old pages. I was recently searching for a person who used to have a notable web presence before passing away about a decade ago. I had to dig to find a few links, through DDG and Yandex.


Yandex is getting more and more of my web queries lately. There's a definite irony there.


Google and Bing (so DuckDuckGo as well) seem to like searching for synonyms of search terms and returning the most popular results, thinking popular means relevant. I remember looking for something where I remembered the exact terms and not getting anywhere with them, but on Yandex it was the first hit.


Yandex shows what it thinks you want, Google shows what it thinks you should want.


> Intentionally or not this is -- exactly -- what 1984 is all about: changing our perception of history by rewriting or erasing previous writings.

Yes.

China's current leadership is terrified of dissent. Even mild dissent. Even discussions within the party. There's no good reason to clamp down that hard. The current leadership is doing a reasonably good job. But they now have a Xi personality cult, which never ends well.

Yes, China botched their housing bubble, but so did Japan and the US.


I would guess that 99.9% of letters are destroyed


One link for Jack Ma in that time period on Baidu? Bro, that can't be an accident. If the Chinese govt didn't do this I'd be more surprised; I mean, they're already censoring most of the internet.


Should the rewritten history still be preserved as history then?


> Posted on Wednesday, May 22, He’s post had been removed from WeChat by the following day, yielding a 404 message that read: “This content violates regulations and cannot be viewed.”


He will be educated.


Wow - this explains a lot about why Chinese LLMs and AI struggle so much to get data, despite the supposed near-infinite databases: a lot of it is just gone, far more than anyone had ever proposed might be the case, because of neglect, censorship, walled gardens/apps killing everything trapped within, and chilling effects.

You can't scrape what is no longer there, or was never written to begin with...


> supposed near-infinite databases: a lot of it is just gone

Pretty sure megacorps still have their near-infinite databases around, in walled garden as always, but that's another topic. Censored contents are almost certainly soft-deleted.

The problem is more about independent-ish sites, for example Tianya [1]. Think ancient BBS. Or newsgroups. If nearly all archive of these are lost it would indeed be very sad.

[1] https://en.wikipedia.org/wiki/Tianya_Club#cite_note-4


> Pretty sure megacorps still have their near-infinite databases around

Not necessarily. It's quite possible that, if no one accesses them, they got moved to backups with only one copy, and the backups may not actually be restorable in the case of disaster (it happens somewhat frequently even for backups of data you might actually care about, so imagine the effects of neglect on that).


There are also serious coordination problems internally. How easy is it to get access? Did you make buddies with the little emperors in charge of the historical archives? Is the archive in your company large enough? Are the chances of getting access to another competing company's dataset high enough to not require multiple decimal digits to express?

(This I think is part of why Chinese human genomics has been such a disappointment: yes, there's collectively a lot of data, but there ain't nothing like UKBB pulling it all together. Just thousands of fiefdoms.)


There are a few commenters in this thread drawing blatant false equivalences with the Western internet. This post is about how, on major search engines in China, you can now set the years to 1998-2005, search for a non-controversial celebrity, and get zero search results from content actually published in that era.

The loss of the early web due to web hosters not maintaining their own hosting and moving to walled gardens is painful and tragic, but it is not in any way similar (or functionally equivalent) to this blatant censorship.


Yes, this is like if nothing turned up for Bill Gates when you did a search for pre-2006 material.


Yep... Only archive.org has it sometimes and then you need to search there because you won't find it via others.


But for the Western internet, it disappears because the people hosting those websites gave up, so all we have is archive.org. With this case, there appears to be a government-level purge.


The western Internet has a bunch of government archives, in addition to the Internet Archive and Common Crawl.

Many of the government archives are not public for copyright reasons.


Also, the Chinese Internet is "self-segregating".

You know how it is with the Great Firewall: you can't visit some outside websites from inside. Wikipedia was blocked completely in 2019.

There's actually another direction: you can't visit some inside websites from outside:

Most Chinese apps/websites are required by law to be tied to personal identities. That means they have to be registered by phone number. In China, one person = one phone number. Without a Chinese phone number, most Chinese apps/websites simply refuse to even let you use them.

There is no way to get a phone number without physically going to a Chinese phone-card bureau and presenting your ID card.

Indeed, it is getting difficult for foreigners nowadays to visit China. Without a phone number they can't do anything with Chinese apps, but they need that. Getting a phone number requires presenting a passport and a valid visa.

Foreign map apps are usually broken in China.

Foreigners who are not physically located within China are just trouble, from the Chinese point of view. Not only do they not want Chinese people to use foreign apps, they also don't want foreign people to use Chinese apps.

A few months ago I tried registering a QQ account. The "International" version is no longer maintained. When I nevertheless tried the last known good version, it just threw an error. The "domestic" version does not work when the phone is not physically located within China, and requires a Chinese phone number anyway.

About 2 weeks ago I noticed that Zhihu also stopped allowing you to expand long answers without an account. And of course, to register an account, you need a damned phone number. At least it allows American phone numbers.

Philosophically, I think it is the resurgence of the Chinese security mindset: Forbid all inside-outside contact by default. We have everything we need at home anyway.

Our dynasty’s majestic virtue has penetrated unto every country under Heaven, and Kings of all nations have offered their costly tribute by land and sea. As your Ambassador can see for himself, we possess all things. I set no value on objects strange or ingenious, and have no use for your country’s manufactures. --- Emperor Qian Long's Letter to King George III, 1793


Why expect search engines to return historical data accurately? Modern search engines have a lot of tasks, like combating SEO spam and returning up-to-date data, and they have no incentive to preserve history as old as 2005, as it's very likely that any page from that date has been superseded by more relevant articles. The task of preserving history should be delegated to archive.org; search engines are just not well positioned for it.


Did you read the post?


I did read the original post in Chinese, though not this post in English. Unfortunately, beyond the claim that "search engines show up zero results", the other claims, like 99.99% of pages disappearing, don't fully match my experience. I have lots of threads from 2000~2010 on Baidu Tieba (Chinese Reddit), and most of these threads, representative of my memories from that decade, are still alive (I can even access them anonymously). The earliest video on bilibili.com still available is av7 (the 7th video since its creation). While some websites like RenRen did disappear together with their history, most websites that remain relevant didn't delete their history in its entirety (lots of content did disappear because of modern moderation applied retroactively to old content, but that's case by case).


China is too easy of an example of rewriting history by political will.

In North Korea it is illegal to mention famine or hunger.

In Florida it is illegal to mention climate change in any state document.


> In Florida it is illegal to mention climate change in any state document.

citation needed. Oh, found one: https://www.miamiherald.com/news/state/florida/article129837... ( https://archive.is/P9k4m )

> DEP officials have been ordered not to use the term “climate change” or “global warming” in any official communications, emails, or reports

I'm not sure that amounts to illegal, but they did at least make it career impairing. Would be interesting to see someone sue for wrongful termination on that basis...


Ohhhh


> Within the selected date range of “May 22, 1998 to May 22, 2005” on Baidu, there is just one positive result for “Jack Ma” (dated May 22, 2024). [..] Click on the result and you’ll find it is an article posted in 2021

US Google: About 2,580,000 results

A pretty remarkable scrubbing of history.


There's pretty much nothing in that time range on Baidu. I looked up Mao, George Washington, Yue Fei (a popular Chinese folk hero), garlic bread, etc.

But without the time filter, there are millions of search results.


It's probably easier to just blanket scrub everything beyond a small set of allowed information (like positive articles about the party) than to selectively delete. Why do they care if valuable information is lost?


https://www.google.com/search?q=jack+ma&tbs=cdr%3A1%2Ccd_min...

First result for me is https://www.scb.co.th/en/personal-banking/stories/business-m... which Google thinks is from 2003-03-15, except it mentions COVID-19 so it obviously isn't.

Second result is https://www.instagram.com/jack_overpower/feed/ which Google thinks is from 2001-01-02, except Instagram didn't exist at that time. It might have pictures from 2001 though.

Third result is http://pacificpower.foreignpolicy.com/15-jack-ma/ which Google thinks is from 1999-02-15, except it mentions Alibaba's 2014 IPO so it obviously isn't.

Fourth result is https://www.facebook.com/story.php/?story_fbid=5041357966634... which Google doesn't show a date for, but it's a Facebook post from 2018.

...

I don't doubt that some of those results are from 1998 to 2005, but the millions of results number specifically is meaningless.


The "custom range" feature simultaneously feels broken, gamed by spammers, and intentionally being scrubbed. I'm surprised they haven't completely removed it yet.


In general I suppose, but per my comment above, in this particular case of the scb.co.th article (which mentions the SARS crisis), the article was actually published in 2021, not 2003. There was no gaming going on; Google's date-inference code simply got it totally wrong on the Siam Commercial Bank article.

I don't want to see Google remove the search-by-date-range feature, it has tons of 100% serious legitimate uses (quote attribution, journalistic, historical, also debunking internet rumors and fresh reposts of old fake virals), but Google could estimate the errorbars on date-ranges and hitcounts, provide a disclaimer and feedback box to encourage users to flag/retag gross errors like this.

If anyone dug deeper into why the date inferencing is now getting broken, I'd speculate they'd find Google is nowadays getting reciprocally confused by site publishers and advertising networks changing or removing items which contain dates; but that's in turn presumably a response to Google changes which downrank or uprank legit older content.

(I can't find the recent HN article here by an SEO expert with a bullet-point list advising website maintainers to remove all date information, among other things)


Google has perfect vision of the past (didn't the latest leak confirm they keep everything crawled indefinitely and have extensive historical records for all domains?) but zero incentive for redirecting you to old websites with no advertisements.


Many old forums are only sporadically indexed by Google even if you do verbatim text searches using the site:... modifier.


>except it mentions COVID-19 so it obviously isn't.

Perhaps it was just updated?

I generally ignore (or get annoyed by) articles that don't have a published/updated date on the byline.


Sometimes you can find the date embedded inside the source asset files.
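A rough sketch of what I mean, assuming the page references assets under year/month folders (the WordPress-style `/uploads/YYYY/MM/` layout is one common convention, but plenty of sites don't use it). The function name `date_hints` and the regex are mine, purely illustrative, and a hit only tells you when an asset was uploaded, not when the article was written.

```python
import re
from collections import Counter

# Matches path segments such as /2021/09/ inside asset URLs.
# Assumption: the site stores uploads in year/month folders.
YEAR_MONTH_PATH = re.compile(r"/((?:19|20)\d{2})/(0[1-9]|1[0-2])/")


def date_hints(html):
    """Count the year-month pairs that appear in asset paths of an HTML page."""
    return Counter(f"{year}-{month}" for year, month in YEAR_MONTH_PATH.findall(html))


if __name__ == "__main__":
    sample = (
        '<img src="/wp-content/uploads/2021/09/a.jpg">'
        '<img src="/wp-content/uploads/2021/09/b.jpg">'
    )
    # The most frequent year-month is a hint, not proof, of the publication date.
    print(date_hints(sample).most_common(1))
```

If most assets cluster around one month, that is a reasonable guess for when the page was put together.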


> First result [scb.co.th] ... Google thinks is from 2003-03-15, except it mentions COVID-19 so it obviously isn't.

Interesting catch, seems Google grossly mistagged its date. IA confirms it was actually published 2021-09-06 [0], but that isn't tagged or referenced anywhere in the article text or HTML. I'm assuming Google misinferred the date as "2003-03-15" because the first two paragraphs talk about the SARS crisis, which was declared by the WHO around 2003-03.

> I don't doubt that some of those results are from 1998 to 2005, but the millions of results number specifically is meaningless.

Yes, seems there's not much QC on Google's date-inferencing of "old" articles. Hence the date range is hit-and-miss, and so are the search hit counts (which Google is eliminating anyway). I mean, if anyone wanted to QC it, they could just search the "old" internet for telltale terms like "COVID", "Nicki Minaj", "President Zelenskyy" etc., which should hardly generate any hits.

[0]: https://web.archive.org/web/20210601000000*/https://www.scb....
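For what it's worth, the telltale-term check is easy to sketch. This is only an illustration of the idea, not how Google does anything: the table `EARLIEST_PLAUSIBLE`, its rough dates, and the function `anachronisms` are all my own assumptions.

```python
from datetime import date

# Hypothetical lookup table: term -> earliest date it could plausibly appear.
# The dates are rough, illustrative assumptions.
EARLIEST_PLAUSIBLE = {
    "covid-19": date(2020, 2, 11),            # name announced by the WHO
    "instagram": date(2010, 10, 6),           # service launch
    "alibaba's 2014 ipo": date(2014, 9, 19),  # listing date
}


def anachronisms(text, claimed):
    """Return the terms in `text` that could not have existed on the claimed date."""
    lowered = text.lower()
    return sorted(
        term
        for term, first_seen in EARLIEST_PLAUSIBLE.items()
        if term in lowered and first_seen > claimed
    )


if __name__ == "__main__":
    page = "Lessons from the SARS crisis that still apply during COVID-19"
    # A page tagged 2003-03-15 that mentions COVID-19 is mis-dated.
    print(anachronisms(page, date(2003, 3, 15)))
```

Any non-empty result means the inferred date is too early; an empty result proves nothing, since the table only covers a handful of terms.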


Yep; there may be a lack of incentive to preserve old sites, but what's worse are the ranking algorithms that prevent their discoverability in the first place.


Both the Internet Archive and Common Crawl have tools that reveal actual crawl dates. Search engines are not really intended to be archives, so it's no surprise that they aren't very good archives.
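As a minimal sketch for the Internet Archive side, assuming the public Wayback Machine CDX endpoint and its `url`, `output`, `fl` and `limit` parameters behave the way I remember (check the current docs before relying on it): you build a query for a page's captures and parse the 14-digit capture timestamp. The helper names are mine. The code only constructs the URL and parses a timestamp; it makes no network request.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def first_capture_query(page_url):
    """Build a CDX query URL asking for a single capture of `page_url`."""
    params = {
        "url": page_url,
        "output": "json",
        "fl": "timestamp,original",
        "limit": "1",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_wayback_timestamp(timestamp):
    """Convert a 14-digit Wayback timestamp (YYYYMMDDhhmmss) to a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(first_capture_query("example.com"))
    print(parse_wayback_timestamp("20210906000000"))
```

A crawl date is only an upper bound: the page existed by then, but it may have been published earlier and simply not crawled.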


Is it, though? I think you have to define what your search engine is searching to make a claim like that. Internet Archive and Common Crawl (which I will say has its own incentives discouraging the discoverability of old sites through its methodology and limitations of its web crawling) are search engines in their own right.

What are you doing when you use their services? Searching.


Not really prevented; the big factor is HTTP sites being heavily downranked by Google.

But they are still there. Do a specific enough search and they'll be at the top of the search results.


Par for the course.

China as we (the world) know it is only about 60 years old. This is more true as they go about systematically destroying their own history and forcing village traditions to be stamped out and guided towards the city life.

Losing a blip of internet history during the regime of mass censorship is probably a blessing in disguise.


Written by He Jiayan (何加盐), an internet influencer active since 2018, the essay concluded, based on a wide range of searches of various entertainment and cultural figures from the late 1990s through the mid-2000s, that nearly 100 percent of content from major internet portals and private websites from the first decade of China’s internet has now been obliterated.


[flagged]


Another false equivalence, not at all the same as in the West. This is government censorship scrubbing all search results from pre-2005. That Google update made it harder to search for groups that had put out misinformation and scammy SEO stuff, like googlebombing ("miserable failure" returns George W. Bush). Performing editorial or curatorial work is not in any way the same as a purge.


There is an active war on the Internet Archive, et al. in the West by private corporations seeking to monopolize information. How is this a false equivalence? Glass houses. Remember Aaron Swartz.


The difference is that we can fight back in the West, and on this particular topic we are largely succeeding so far. The people of China cannot fight back against their government, have already lost, and have no recourse.

Absent a drastic change in human nature, we are always going to have people who want to do these sorts of crappy things, in any society. In China, those people have the backing of the government. Here, they're just corporations (not that we don't have crappy government propaganda machines too). Yes, I would absolutely agree that corporations have much more power than they should. But their power is not absolute, by even the wildest stretch of the imagination, and ultimately they're just another set of people who can -- and do, often enough -- lose when they try to impose their will on others.

That's why it's a false equivalence.


> The difference is that we can fight back in the West, and on this particular topic we are largely succeeding so far.

By what metrics exactly?

Most people agree that Google etc results are increasingly useless, and it won't be long before AI content is based on previous AI content and all of the limitations inherent to that. Anything of value will be locked up and metered out based on class.

Given how captured by capital our political systems are in the West, it seems like only a matter of time before things are exponentially worse than our present point in the decline.


The Internet Archive jumped the shark and brazenly broke the law hoping they'd retroactively get pardoned for it. I dearly hope that the archive survives this lawsuit, but they 100% brought it on themselves.


[flagged]


You think you are being a critical thinker, but you are just playing into anti-Western propaganda that makes it seem like there is no difference between the two. It should be pretty clear that China has the most sophisticated propaganda system in the world, and you have been ensnared by it.


Continuing to the next step of this argument that has been had before ad nauseam: the West is arguably worse in these regards because China et al. make no bones about preferring order to freedom, while we nominally tout our "opportunity" while actually limiting it to a degree such that conversations like the one we're having take place. (Your argument's only refuge is in casting your opponents in this debate as hysterical, which is ironically a hysterical position to take in-and-of-itself. Measure the actual merits, please.)


Oh please. The whole "no one is free anywhere, but at least those other people are honest about it" garbage. There should be a logical fallacy named after this sort of argument.

People in the West are measurably, significantly freer than people in China. That doesn't mean the West is perfect. That doesn't mean that there aren't bad actors in government and in the private sector who want to introduce more systems of control and propaganda.

But the difference is that we're allowed to speak out, protest, and fight against these people, and that allowance is enshrined in the lowest level of laws in most Western nations. Again: not perfect, and the worst of the bad actors will try to bend those laws to find loopholes to silence dissent. And sometimes they'll even succeed at that.

That is wildly different from an authoritarian censorship state like China where you get immediately deplatformed if you say things the government doesn't like. And that's the lucky outcome; annoy the government too much and they'll do far worse to you.


You're comparing ideal to ideal, not practical reality to practical reality. Here's a fact: the Chinese middle class is the size of the US's entire population. The whole thing. Citizens, residents, undocumented immigrants. That's an economic freedom that Americans would - and sometimes do, if you consider law enforcement and the downstream effects of the actions of financial, medical, and industrial professionals - kill for. Which is actually a part of the way American censorship works: empower a buffer class who is preoccupied with maintaining (and lecturing the rest of us about) their political freedoms while most can't access any practical benefits from those freedoms because we can't afford to, in this society where money is speech. This isn't even getting into the more overt and baldly authoritarian ways Americans have their nominal rights infringed upon, simply speaking to the way economic/class-shaping does much of it for us.

Meanwhile, Winnie the Pooh memes say that you're ironically buying into the overstated projection of Chinese control.

But to get back to the crux of the issue: "bad actors" in America (like Google) are not unlike Chinese censors in kind, only degree. That is the conclusion an honest assessment and comparison has to come to. And however much you may want to turn this into some sort of geopolitical pissing match, my message is not, "Let's be more like China," it's, "Let's be what we say we are instead of becoming more like China."


[flagged]


I'd hardly call it the same thing at all.

On the Western internet a lot of early sites have come down gradually over time as their creators stopped paying for them, but many are also still up, including a lot of key relics. They're harder to find these days because Google prioritizes new content (shout-out to marginalia.nu), but they're there, and most (by historical traffic) of those that aren't there are preserved on IA. If there's collective memory loss it's organic and reparable.

This article is about the Chinese government actively erasing memory in order to be able to better control the content that their people read, including a post bemoaning the loss of that content. What has happened to new content on the Western internet (getting isolated in walled gardens) is tragic, but it's not the same thing at all.

> Posted on Wednesday, May 22, He’s post had been removed from WeChat by the following day, yielding a 404 message that read: “This content violates regulations and cannot be viewed.”


Exactly, and without the likes of the Internet Archive, even the open web isn't safe. I think that, as with most history preservation efforts today, you have to be really intentional about it given the sheer amount of information and noise. This isn't like centuries ago when everything public was a few periodicals and newspapers and books, accessible through well-known channels.


> Content is now behind walled gardens and impossible to search for

Yep. Most of the good vintage computing information is locked up in member-only Facebook groups now.

More and more, when you ask a question on a web forum the response is "Oh, that's in the $x Facebook group." Great. How do I search for the topic? Oh, you can't.


FB or Discord, both black holes.


[flagged]


That's not quite true. Overall there's a big, distributed and decentralized effort to archive our electronic past; it's not just archive.org. For example, it's fun to look at old newsgroup discussions from 1994, and it's something I can find at Google's archive. And more importantly, it's available to everybody. And it's not the only place to find historical internet stuff.

I do agree though that archive.org is too valuable a resource to ever lose.

Google also isn't perfect. There was a post the other week about how there were no searchable images on Google older than ... I don't remember, 2005?


Plenty of content has been wiped out for a lot of reasons: GeoCities went through a lot of changes before and after the Yahoo buyout, and then there are all the university student pages, blogs and self-hosted sites such as Multimania/Free/Wanadoo (at least in France), and forums such as ezboard. So much content has disappeared and will disappear.


We just need to ask the NSA for their copy via a FOIA request.


I wonder if they have all the music that was lost on Myspace


They have an internal IceCast server with all of it.


I donate monthly to archive.org. I'm sure I've got more than my money's worth out of preserved websites that had something valuable to me.


it's a question of intent.

'the free world' loses internet content mostly because we are indifferent and don't value the humanities or social sciences as much as we used to.

china loses internet content because it is a one party state dictatorship that is purposely trying to eradicate the cultural artefacts of its victims and opponents, by hiring masses of people whose career goals are the purposeful destruction of memory.

we do have examples similar to that, for example "free speech advocates" like Peter Thiel and Elon Musk who abuse the legal system to silence critics. But those people are not literally running the government in a government that will never change hands.


I think this is a delusion. The threat that archive.org is under is that the government will shut it down.


Is the original MIT BBS still archived? I haven't used it for some time.



