I've had a similar problem. In updating my portfolio site recently, I noticed a vast majority of links were dead. Not just live projects published maybe 3 years or more ago (I expect those to die). But also links to articles and mentions from barely one year ago, or links to award sites, and the like. With a site listing projects going back ~15 years, one can imagine how bad things were.
I ended up creating a link component that automatically points every URL I mark as "dead" to its archive.org version. It was so prevalent it had to be automated like that.
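The gist is only a few lines. Here's a rough sketch of the idea (not the actual component; the names and URLs are made up for illustration):

```typescript
// Rough sketch: links marked as dead get rewritten to a Wayback Machine URL,
// which redirects to the closest capture it has on file.
const DEAD_LINKS = new Set<string>([
  "http://example.com/old-project", // hypothetical dead URLs
  "http://example.org/award-page",
]);

function resolveHref(url: string): string {
  // https://web.archive.org/web/<url> redirects to the latest snapshot;
  // a YYYYMMDDhhmmss timestamp can be added to pin a specific capture.
  return DEAD_LINKS.has(url) ? `https://web.archive.org/web/${url}` : url;
}

// Usage: wherever the portfolio renders an external link.
console.log(resolveHref("http://example.com/old-project"));
// -> https://web.archive.org/web/http://example.com/old-project
```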
Another reason why I've been contributing $100/year to the Internet Archive for the past 3 years and will continue to do so. They're doing some often unsung but important work.
It's _not_ a video recording service. It saves, and can replay, all network requests made during a session (including authenticated requests). It's open source and you can self-host it. I'm not affiliated, though I'm very happy that it exists.
I've updated my portfolio before and noticed that as well. I usually include a screenshot or two when I first add a project, so at least that remains.
If the site goes down later, I just remove the link and don't worry about it. My code from 15 years ago is probably atrocious, so I'll consider it a small blessing :P
I figure you're doing the same as someone that cuts an article that they are mentioned in out of a newspaper and frames it on their wall. I've seen plenty of restaurants and businesses do it.
> It could potentially be considered fair use, since I'm not making a profit and I provide commentary.
Although people throw that term around willy-nilly, in our current framework that means being sued for a minimum of $100,000 per supposed violation, and making your fair use defense in front of a judge.
Youtubers have reported spending $50,000 just to begin talking with lawyers and preparing briefs.
To clear things up: robots.txt can retroactively hide content from the archive. If it's changed back to allowing the archive's crawler, content from before the ban can be accessed again.
Considering the topic of discussion, how sure can you be that archive.is will still be around in a year? Three years? Ten?
As much as I tried, all I could find about it is that it's run by one guy in the Czech Republic who's paying $2000/month out of pocket for hosting, and apparently dislikes Finland.
http://archive.is/robots.txt doesn't seem too bad; it looks like you could slowly inhale everything... in theory. The sitemaps are there but are only empty placeholders, so you have to know a site's name to be able to get a workable list.
I think http://www.webcitation.org/ might be better in that regard since it's a consortium of "editors, publishers, libraries". See "How can I be assured that archived material remains accessible and that webcitation.org doesn't disappear in the future?" in their FAQ (http://www.webcitation.org/faq). Although from my perspective it seems to be more geared towards academic use.
archive.is is very nice, but they're a URL-shortener as well, so their links are utterly opaque strings of alphanumerics, whereas the Wayback Machine preserves both the full original URL and the date and time it was captured in the archival URL.
archive.is does not crawl automatically, it must be pointed at a page by a user. While this makes it particularly useful for snapshotting frequently-changed pages, it is not a replacement for the proper Internet Archive.
I miss the optimism of the early web, when you could create a simple web page and join a web ring, and going online was an event.
It's richer and deeper now, but the rawness and simplicity of it all was enjoyable and novel.
Maybe at the fringes, but I feel that the internet today, with my emphasis on the "inter" (different) "net" (networks) part of it, is far less deep or rich than before. What we have basically been reduced to is a bunch of siloed networks such as Facebook.
When I searched for something when Google first came out I got a mix of results from a variety of sites I had never heard about. Today it's basically Wikipedia at the top, with results from the same list of about 3-4 sites depending on the topic of what I searched for.
I'm beginning to think that there is a niche for a peculiar kind of search engine: a search engine for static pages with little to no JavaScript. It would penalize pages for ad-network usage.
I would really like to not have in search results most sites that try to monetize on my attention. I want raw facts and opinions. No click-bait to grab my attention or feed my internal cave man with rage. No ad-networks or data extraction operations. Just pages put there by people that want to share knowledge and ideas. I mostly find it on pages that lack ads and often are pure HTML - no CSS and no JS. At least in areas that interest me.
Maybe there is a place for a search engine that would index only pages like that? It certainly would be easier than competing with Google on indexing whole of the attention-whoring Internet.
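Just to make the idea concrete, a toy version of that ranking rule might look like this (the weights and the ad-host list are invented for illustration):

```typescript
// Toy scoring sketch: demote pages in proportion to their script weight and
// the number of known ad/tracking hosts they pull from.
interface PageInfo {
  url: string;
  htmlBytes: number;   // size of the actual document
  scriptBytes: number; // total size of referenced JavaScript
  thirdPartyHosts: string[];
}

const AD_HOSTS = ["doubleclick.net", "adservice.google.com", "quantserve.com"];

function score(page: PageInfo): number {
  let s = 1.0;
  // Penalize pages whose JavaScript dwarfs the content itself.
  const scriptRatio = page.scriptBytes / Math.max(page.htmlBytes, 1);
  s -= Math.min(scriptRatio, 1) * 0.5;
  // Penalize each known ad/tracking host the page references.
  const adHits = page.thirdPartyHosts.filter((host) =>
    AD_HOSTS.some((ad) => host === ad || host.endsWith("." + ad))
  ).length;
  s -= adHits * 0.2;
  return Math.max(s, 0); // pure-HTML, ad-free pages keep a score near 1
}
```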
I had that feeling of discovering the Internet again when I used Tor and surfed hidden websites for the first time and read beginner's wikis, opinion pieces such as The Matrix, etc.
I am not interested in most of the "deep web" but what you say sounds interesting. Could you please provide a link to that Matrix thing? And other pieces you found interesting?
http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page is the wiki I stumbled upon when I first accessed hidden websites; the matrix rant is the first link, but it's not in the form I remember (PS: I do not endorse the content, it's mostly a critique of our society's mechanisms).
> It certainly would be easier than competing with Google on indexing whole of the attention-whoring Internet.
Probably not, actually; the kind of pages you describe would almost always be leaf nodes on the web graph, so your spider would need to walk "through" the attention-whoring parts to get to them, whether you kept records of doing so or not. (And it'd be very inefficient to not.)
I don't know about that - I find that I get a lot of my information from sites that have user generated content such as Medium, reddit, and of course HN. I think it would be extremely hard to fit in sources like that to your search engine without letting in what I will admit is garbage. Would be very cool if it did manage to though!
> When I searched for something when Google first came out I got a mix of results from a variety of sites I had never heard about. Today it's basically Wikipedia at the top, with results from the same list of about 3-4 sites depending on the topic of what I searched for.
...and if you actually try to search for more obscure/"fringe" subjects/phrases with Google, you either get no results (despite knowing that there are still active sites with those phrases), or it starts thinking you're a bot sending "automated queries" and blocking you for a while (not even giving you the option of completing a CAPTCHA.)
The first time that happened to me, which was within this year, was my realisation that Google had truly changed, and not in a good way.
I've sort of had this same experience myself. The quality of links for obscure topics is nowhere near as good as it used to be. It's harder and harder to find the topics I know exist: sometimes they're buried under lots of irrelevant results, or results I know about aren't there at all. I've not experienced the 'bot' throttling to my knowledge, but it certainly feels like they're trying to do some kind of language translation for me when I want explicit results. I'm not convinced Advanced Search isn't doing similar translations either.
Don't confuse people's tendency to no longer bother looking past the top of the first page of Google with the internet somehow shrinking into whatever fits those slots. Of course the most popular sites now dominate the top of Google's search results, but Google isn't the internet any more than Facebook is.
The breadth and depth of information on the web now vastly surpasses what was available in 1994. Youtube and other video and music streaming sites have provided a media revolution to compare with the transition from radio to television. Social media, whatever its drawbacks are, allows people to communicate and collaborate far more personally than email or basic chatrooms would have.
And let's not even get into the ways that Javascript, HTML5 and Webassembly have and will transform the web into a platform in which virtual machines will converge to becoming just another content type. I know people here like to rend their garments and scream Javascript Delenda Est[0] into the void and just hope everything that happened to the web in the last 20 years just goes away, but the day is coming where all archived and obsolete code will have a URL endpoint that bootstraps a VM and runs it. The best the web of 1994 could do is file downloads, maybe Java applets and flash.
Sometimes the way people here seem to dismiss the modern web is baffling. I get it, but look at it from the point of view of the mainstream web user. The web offers access to so much more than would even have been possible in 1994, and lets people interact with one another on a much more direct and complex level.
Yes, the added richness and depth comes with a lot of baggage, but it's undeniably there.
The real bonus of the internet of old was that text was most of its content. Today you have video, pictures, emojis and music, and in my opinion that reduces the experience.
I remember browsing the internet being much more of a networked thing. If you want to know what it was like, you could take a look at Wikipedia, where you can still get lost in a never-ending deeper web of links. However, Wikipedia is a very cleaned-up version of the early web. It lacks animated .gifs, for sure.
But the difference with Wikipedia was that people would maintain a Links section full of interesting stuff, and people would join web rings for various subjects, interlinking vastly different sites. Finding information often happened through Yahoo!'s tree-based directory (AltaVista was there too, but it lacked the quality of hand-picked results), and then continued through whatever links you could find on an interesting page. Exchanging links was something that happened really frequently.
It resulted in an internet where you just kept clicking and discovering and digging. Sometimes also frustrating as browsers lacked tabs and I would navigate all links one by one by loading it and going back. I would forget how I arrived at a certain page sometimes because it was so deep and I never found the breadcrumbs again.
> If you want to know what it was like you could take a look at Wikipedia, where you still can get lost in a never ending deeper web of links.
I would have said TVTropes, but the core point is the same.
I remember having to restart my computer, because IE lacked tabs and Windows would let you open so many instances of it that the whole OS ground to a halt.
It's weird to think that now, in the absence of Google, I couldn't find my way from anything to anything else.
I especially miss the blogosphere of the era 1999-2006, before the emergence of Facebook and Twitter. I miss the era when tech people could debate a new technical protocol by posting thoughtful essays on their blogs, and then other technical people would post rebuttals on their own blogs, and the conversation would go back and forth, among the various blogs, but out in the open, and very democratic. Nowadays a lot of the new protocols are, for all practical purposes, designed inside of Google or Facebook or Apple, and then announced to the world, without much debate.
For a close look at the earlier era, see this very long essay I wrote in 2006 (which was popular back in 2006) in which I summarized the tech world's blog debate about RSS:
Not specifically directed at you, but sort of directed at you:
I wish everyone who pined for the 25-years-ago days of mostly text pages would, instead of pining, just go out there and produce that content they want to see.
Instead of pining, start writing. Hosting is cheap or free, browsers still parse simple HTML, there's nothing stopping anyone from creating a return to that simpler form.
I do and I test my personal website on lynx. Doesn't change the fact that sites like this are under the misguided impression that it's cheaper to ship 2 MB of JavaScript with each request rather than just responding with the fucking 2 KB of article text so they end up looking like:
# I Bought a Book About the Internet From 1994 and None of the Links Worked - Motherboard
#motherboard
Quantcast
[p?c1=2&c2=8568956&cv=2.0&cj=1]
The question isn't one of us creating it, although, unlike those downvoting you, I understand where you're coming from. The question is who will create (and how to create) a curation of sorts. A Google for the simple web. A one-stop shop that encourages barebones simplicity and fosters a community where people only allow a simplified internet, and adtech is a nonstarter.
> It's impossible to produce that content because the culture was very different.
What? It's not at all impossible. Get a server, put whatever you want on it. No one is going to force you to monetize or market anything, or use a Node.js backend with AWS and React, or whatever the kids are doing these days. Basic HTML in a plain text editor still works just fine.
> What? It's not at all impossible. Get a server, put whatever you want on it.
Not in Germany. There is a set of laws called "Impressumspflicht" (https://de.wikipedia.org/wiki/Impressumspflicht) which forces you to add a mandatory imprint to your website. If you do something wrong or forget to include some mandatory information (what is mandatory also depends on the kind of website), you can easily get sued (and this has often happened). In other words: it is not easy for a layperson to set up a website in a way that does not become a risk of being sued.
The context of this conversation seems to have more to do with code and complexity than legal necessities but point taken. The parent was suggesting it was impossible to create the sort of simple, personal, just for fun sites that people used to, but there's no technical reason for that to be the case.
It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.
> The context of this conversation seems to have more to do with code and complexity than legal necessities but point taken
The laws introduce lots of complexity, which leads to lots of requirements in the code. So this is no contradiction.
> It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.
And 3) because the law requires such complications (in my personal perspective the largest problem that causes the most headaches). Just to give another, "more EU/less German" example:
For my site, I created a Pelican theme which is HTML-only. There's no JS at all! I plan on publishing the theme eventually, once I'm a bit happier with it. You can see it at https://brashear.me. I'm very happy with how quickly it loads. I mentioned to one of my friends that it feels like upgrading to DSL back in the dial-up days.
You're not going to be in the Alexa top 10,000. Go run a docker container for $3/month to host your site, cache it with Cloudflare's free plan and pay $12/year for a domain. It's only $48/yr to host all your content. It's really not that ridiculous.
I get this request a lot actually. The reason I decided to not do it was because webrings, though nice, had a lot of problems. The main issue was that people's sites would go away, and then the ring would break. I also didn't want to introduce any functionality that would make sites depend on Neocities backend APIs to function. Web sites are more long-term and durable if they remain (mostly) static.
I tried using "tags" that could bind sites together on Neocities, but to be honest the idea has largely been a failure. People will tag their site "anime" and their site will have nothing to do with anime... but it's a popular tag so they add it in just so they're on a popular tag. Geocities had this problem to a certain extent too (a tech site being in the non-tech neighborhood). You can get a flavor of the problem here: https://neocities.org/browse?tag=anime
One idea I'm considering is to only allow a site to have one tag, rather than 3 like I do right now. Maybe that will stop people from adding tags that are irrelevant to the content of their site. Or it may compound this problem. I'm on the fence about it.
Another idea I'm considering is allowing people to create curated lists of their favorite sites on Neocities, similar to playlists on Youtube. The "follow site" functionality kind of does this, but in a generic way, and it tends to be a bit... I guess nepotistic (hey you're popular, follow me so my site can get more popular too!)
I'm always happy to hear ideas on how to improve this. I do like the idea of related sites being able to clump together, but in practice it doesn't work as well as I would like it to. But maybe it works well enough and I'm overthinking it.
I've got a fancy 1080Ti and Tensorflow. If you have any particular things I could try or should read about, I'm happy to look into doing some research! Googling for "Tensorflow recommender" gave some interesting starting points.
No, I get this request quite a bit from people that sincerely miss web rings. I'm not sure how much of it is anachronistic nostalgia and how much of it is a true desire to bring it back. To give an example of this, I've gotten more than a few requests to add the Gopher protocol to Neocities.
IIRC the old webrings had a CGI backend that would collect the addresses and then you would click "next" and it would take you to the next site. But yeah you could just make a vanilla one. You could also just make one outside of the context of Neocities.
But if the next site went down or didn't link to the next site correctly, you couldn't proceed. That was always my problem with webrings. They depended on each site to embed the ring code properly, and usually they didn't, so you were stuck trying to find a working one. It was a pretty lousy UX overall.
It may have been lousy UX, but I suppose it also provided a social convention of not breaking the chain. Weakest links and all that. Relevant to the OP about linkrot!
Maybe a modern equivalent would redirect downed sites to the IPFS archive.
That's interesting the ones I remember were literally rings of static links, coordinated by the webmasters, presumably over email. I see what you mean now.
Related to this, I had trouble finding examples of pre-1996 web design. The Internet Archive has a lot from 1997 onwards. The oldest live examples of sites from that era that I know of are:
As a student at liu with no prior knowledge about lysator, it's always interesting to see lysator links in the wild; they seem to pop up when least expected.
How come the site is hosted at lysator and how come it's still up?
Sidenote: The man hosting the site has a very on-topic profile page[0].
David was a pinball fan like myself and couldn't attend the expo in Chicago that year.
I was commiserating on Usenet about not being able to find affordable hosting for the website. This was way before hosting-only companies were available or even Geocities. David stepped up and generously offered room on Lysator.
As to why it's still up, I'm not sure. There was a short period where it looked like it was offline, but it's back now. Perhaps because Expo '94 is on a lot of "oldest websites that still work" lists.
Lysator is incredibly nostalgic for me; when I was just starting on the internet around 1994-1995, a lot of my favorite websites were on lysator, including the gigantic Wheel of Time Index.
And this one from CNN (still 1996, but appropriate representation of that 1995/96 era when design had changed a bit from the earlier plain white backgrounds & basic text layouts):
I actually watched the debates and couldn't believe Dole closed out his first debate with Clinton imploring youngsters to "tap into" his "homepage" (the above link)
Oh, man. Dole/Kemp was definitely designed in the "Make sure everything fits on a 640 × 480 display, and downloads reasonably fast on a 14.4k modem" era.
I think the change in how we view site navigation is fascinating. Another commenter gave a link to an old Microsoft site that had links all over the place. But for the most part, it seemed like sites started to standardize on navigation vertically on the left side. Now we generally see it horizontally on the top, or in hamburger menus. It's interesting how that paradigm shifted. It seems like vertical side navigation would be more prevalent now, given how much wider monitors are.
With wider monitors, my browser is actually narrowed. I split the screen in half and devote one half to the browser and the other half to a text editor and terminal. I used to have two monitors to do that, but now just one is fine, but the side effect is that I browse in a pretty narrow window. A narrow window also makes reading somewhat nicer on some sites, since it's harder to read very long lines of text.
>> A narrow window also makes reading somewhat nicer on some sites, since it's harder to read very long lines of text.
Curious to know how this works with ad-heavy sites. Do responsive sites display differently then too?
Been flirting with going to a single 32" monitor for a while. Just wondering if you think it's worth it with your experiences since I do development as well and have a similar setup (one monitor for editor and smaller laptop for browser/terminal) and would like to hear your input.
I use an adblocker, so it's hard to know what ads do (though I disable it on some sites that I believe to be reasonably trustworthy about ads, and that I want to support, like reddit and some major newspaper sites; though I tend to pay for a subscription for sites that I really want to support and continue to block ads).
But, generally, it's fine. Most sites are entirely usable. There's a few that are quirky, but generally, modern websites are designed to scale down to tablets and phones, so they don't act too weird for a narrow laptop/desktop browser.
I think the most common problem is sites that switch to hamburger menus at too high a resolution (so I get the mobile hamburger navigation on some sites, even though it's kinda silly looking and slightly less ergonomic). It's not super common, though. Most switch to hamburger a little lower than my browser width.
I recently got the Dell 42.5" 4k monster (P4317Q), and while the dpi and color reproduction aren't good enough for serious design work, I'm able to divide my desktop into thirds for browsing and text editing. I usually keep a browser on the left, documentation in the middle, and a terminal on the right. I still get a little giddy when I turn it on every morning. What I really want is for apple to start producing 42" 8k "retina" screens, but I won't hold my breath. :)
Ha, well it is a recreation and obviously not the original. They didn't have the original code so they had to work out how it was made. There's a readme which explains the process: https://www.microsoft.com/en-us/discover/1994/readme.html
Is microsoft.com currently not loading for anyone else? I'm getting ERR_SSL_UNRECOGNIZED_NAME_ALERT in Chrome, but I'm not seeing it mentioned anywhere else (e.g. Twitter).
Frames have been improved into iframe and then deprecated in HTML 5.
Frames were a nightmare. You can't link to a page in frames, you can't bookmark it either. Frames break the back button. Come in via search engine? You're only in the main frame, your navigation frame is missing. Want to print? Lol, good luck with that. You always ended up (either intentionally or unintentionally) with a browsing session within someone else's unrelated frameset.
Nitpick: iframes haven't been deprecated in HTML5. In fact they've been extended with new attributes like sandbox and srcdoc. Loading content via javascript is often a better approach but iframes allow you to sandbox content from your website's context, e.g. to prevent XSS attacks etc.
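For reference, a minimal sketch of using those attributes (the embedded markup is just an example):

```typescript
// Minimal example of the sandbox/srcdoc attributes mentioned above. With an
// empty sandbox value the embedded document gets no script execution, no form
// submission and an opaque origin, so it can't touch the embedding page.
const frame = document.createElement("iframe");
frame.setAttribute("sandbox", "");                 // lock everything down...
// frame.setAttribute("sandbox", "allow-scripts"); // ...or re-enable selectively
frame.srcdoc = "<p>Untrusted snippet rendered in isolation</p>";
document.body.appendChild(frame);
```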
Ironically, I think you'll have much better luck looking at web design books from that era, instead of links on the web. I just pitched a bunch of my design books from that era, and they were full of "cutting edge" examples.
If you want to go all the way back, UNC still hosts ibiblio.org, which has links to the first website at CERN http://info.cern.ch/ and TBL's first page.
My web page is still around http://homepages.ihug.co.nz/~keithn/ it was mostly done pre 96 .... not that it was really well designed, I just spammed bezels and had a play with this new cool java thing.
This is a very important reason why books, in general, contain better information than websites. On websites, people care a lot less about the correctness of the information. You can just update stuff later (of course, this doesn't always happen).
Also, sites are a very volatile medium. I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.
> Also, sites are a very volatile medium. I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.
I had the same experience and that's why I made a browser extension that archives pages when you bookmark them. (https://github.com/rahiel/archiveror)
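The core idea fits in a handful of lines. A rough sketch (not archiveror's actual code; it assumes a WebExtension background script with the bookmarks permission):

```typescript
// Whenever a bookmark is created, ask the Wayback Machine's Save Page Now
// endpoint to capture the page. Error handling and rate limiting omitted.
chrome.bookmarks.onCreated.addListener((_id, bookmark) => {
  if (!bookmark.url || !bookmark.url.startsWith("http")) return;
  fetch(`https://web.archive.org/save/${bookmark.url}`)
    .then(() => console.log(`Requested archive of ${bookmark.url}`))
    .catch((err) => console.warn("Archiving failed:", err));
});
```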
Maybe something that archives to IPFS would be interesting. As things are marked as interesting, they are both archived and distributed based on interest.
I still have my bookmarks.html file I started building in 1995, but almost everything in it has rotted away. It's a shame too because a lot of the stuff in there would still be useful or interesting, but nobody wants to pay even a nominal fee to keep it online.
> I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.
I've recently had this problem with some online fiction that I had bookmarked. Now, I was able to recover thanks to the Wayback Machine, but I really shouldn't depend on that.
I should really put some thought into archiving pages I like or getting a Pinboard account.
I have this problem too, thankfully archive.org has been able to resurrect most of the text based sites I bookmarked ages ago. Such an invaluable resource.
Linkrot is a real problem. Especially for those sites that disappear before the archive can get to them.
On another note, the more dynamic the web becomes the harder it will be to archive, so if you think that the 1994 content is a problem, wait until it's 2040 and you want to read some pages from 2017.
Content from Stack Overflow has better odds of surviving than this; they've uploaded a data dump of all user-contributed data to archive.org: https://archive.org/details/stackexchange. It's all plaintext. This is really generous of Stack Exchange and shows they care about the long term.
That's actually one of the reasons all my personal stuff gets built as HTML/CSS, and I just use JavaScript for quality-of-life stuff (image lightboxes that work without putting #target in browser history, auto-loading a higher-res image - that sort of thing).
I know I won't be maintaining it forever, but I want it to be accessible through the archive.
It's actually fairly easy to record web sites despite how dynamic they are; all you have to do is save the response data of each XHR (and similar requests) and the rest of the state (cookies, urls, date/time, localStorage, etc).
For even more accuracy save a Chromium binary of the version at the time so it'll look exactly as intended.
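A bare-bones sketch of the network-capture half of that (fetch only; a real recorder would also wrap XMLHttpRequest and snapshot cookies, localStorage and the clock, as described above):

```typescript
// Record every fetch response body along with its URL, status and timestamp.
const recorded: { url: string; time: string; status: number; body: string }[] = [];

const originalFetch = window.fetch.bind(window);
window.fetch = async (input, init) => {
  const response = await originalFetch(input, init);
  const copy = response.clone(); // clone so the page can still consume the body
  recorded.push({
    url: input instanceof Request ? input.url : String(input),
    time: new Date().toISOString(),
    status: copy.status,
    body: await copy.text(),
  });
  return response;
};
```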
> The average lifespan of a web page is 100 days. Remember GeoCities? The web doesn't anymore. It's not good enough for the primary medium of our era to be so fragile.
> IPFS provides historic versioning (like git) and makes it simple to set up resilient networks for mirroring of data.
IPFS is good and useful, but it only retains what people choose to retain.
If geocities.com/aoeu isn't popular, then IPFS won't store it unless someone bothered to pin it. And as soon as they stop, it'll disappear.
You need a dedicated host (like archive.org) to retain it, or volunteers willing to coordinate and commit their resources. Otherwise it's just more resilient (a good thing), but not permanent.
> Otherwise it's just more resilient (a good thing), but not permanent.
It's not "just" more resilient, it's also much more elegant and convenient: with the current web you need to go find some archived version of the dead link you found, while with IPFS the link can simply keep working, even after the creator stops hosting it.
IPFS is not a real solution at the moment. It's hard to use, the default daemon is so aggressive that Hetzner nearly blocked my server due to its scanning, and your site needs to be relative-URL based to be put on IPFS.
On the other hand, nobody is talking about the problem of domains: yes, linkrot is a thing, but many are due to dead domains and dead blogging/content silos.
I've had to deal with Hetzner and IPFS too -- my conclusion is that it's Hetzner who are aggressive here. In one of the cases I had fixed the dialing-local-networks behaviour, and Hetzner still continued to block the server for about a week. They blocked it on 25-Dec and released it on 31-Dec.
Rather the opposite, a very large fraction of the original pages have been preserved. Many of them were deleted long before by the owners themselves and/or rehosted elsewhere.
Original Geocities content is probably the oldest large body of internet content that will be preserved for many years to come and it will look roughly how it looked on Geocities because it wasn't relying on much of anything other than basic HTML and some images.
Ai. I searched a bit for you, as well as in the original crawls, but to no avail. Most likely there were very few inbound links to that site which meant it wasn't discovered before it got wiped. I was still archiving stuff long after Geocities had officially shut down, for weeks new stuff turned up. 4428 is there, the next one after that is 4460, I suspect that if it had been archived you'd have found it by now.
Have you checked Jason Scott's torrent of all things geocities as well?
Looks like I even got some of the images on disk here, but no html.
Jacques, I never knew it was you behind reocities. Thanks for saving a large part of the old Geocities content! The story of how you created reocities is a worthwhile read on its own. [1]
I similarly look for the first page I ever created. I was maybe 14 years old and created a pretty thorough Goldeneye 007 fan site, with walkthroughs etc, on angelfire.
I distinctly remember a misspelling in my url--- /liscensetokill --but I can't be sure of my parent category anymore. I think /mi5/
I was in geocities/South Beach/lounge but I can't remember the number and it annoys me. I know for a fact archive.org indexed it but they didn't at all after geocities moved to ~ urls. (That was after the Yahoo buyout IIRC)
A similarly really annoying thing is when you find old technet articles, stack overflow questions, or blog posts that seem potentially really useful, but that have broken images, broken links, etc... so the content (possibly extremely useful at the time) is completely useless now.
It really stresses the importance of directly quoting / paraphrasing the content you want in your plain text, and not relying on external resources for posterity.
The one I hate is when I find old forum posts explaining how to do something in the physical world and all the embedded photos are broken. Not because the image host went out of business or the user deleted them, but just because they didn't log in for a year and the host deactivated their account. This is why whenever I link a photo I upload it to my own server and I never change the URL.
Or when they're broken because the image host discontinued third party image hosting/started charging for it, despite said feature being the only reason their site caught on in the first place.
Looking at you Photobucket. And all those useful images now replaced with a meaningless Photobucket placeholder.
I apologize in advance as there's no non-morbid way to ask this but... what happens to the images on your server if you die tomorrow? It would be exactly the same situation, right? They would exist until your bill is due in a couple years then your account will be deactivated and your images will linkrot.
I have the server paid up for 10 years and the domain for 15. But it's true, eventually all things must end. I do have credentials for all the things written into my will so I guess it ultimately depends on how much my children care about preserving my helpful forum posts.
He's not advocating people rely on his server, he's saying hosting it himself allows him to quote freely without worry. If someone else wanted to quote him and followed his example, they would not have a problem when he died.
Also 500px (I think--if not a similar image host) has recently banned all 3rd-party images, at least at the free level, which has broken a TON of the old forum posts I want to see. It was the defacto image host, kind of like imgur is now.
For anyone curious, you can help fix dead reference links on Wikipedia in just a few seconds. If you find a page that has a dead link (or several), click the "View History" tab at the top, then click "Fix Dead Links" to run the InternetArchiveBot on the page.
Just speculating but possibly the difficulty of finding a truly dead link. Often they just redirect to a registry asking if you want to buy the domain.
but redirects are also present in legitimate pages
What's cool isn't how fast some of these technologies become obsolete, such as various Java applets and cgi-bin connected webcams. It's the static content that can survive until the end of time.
> [The Rolling Stones] actually streamed a concert on the platform in November of that year, using an online provider named MBone for some reason..
The MBone was not a "provider", it was an IP multicast network. This was the only way to efficiently stream video content to thousands of simultaneous clients before the advent of CDNs. https://en.wikipedia.org/wiki/Mbone
MBone lost its reason to exist when the internet backbone turned out to be much easier to upgrade than the edges. There's not much need to implement a complicated proxy system to save bandwidth on the backbone when almost everybody is constrained by their last mile link.
For years I thought TV stations might connect to the MBone to do simulcasts for people on the Internet once broadband became widespread, but the world moved on before it could become a reality. Part of me still thinks this is a missed opportunity but it's too late to cry over it now.
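For anyone who never got to play with it, joining a multicast group is still only a few lines today, e.g. with Node's dgram module (the group address and port below are arbitrary examples):

```typescript
import * as dgram from "node:dgram";

// Subscribe to a multicast group and print whatever arrives -- the same basic
// receive mechanism the MBone relied on, minus the tunnels between islands.
const GROUP = "239.0.0.1"; // example address from the administratively scoped range
const PORT = 5004;

const socket = dgram.createSocket({ type: "udp4", reuseAddr: true });
socket.on("message", (msg, rinfo) => {
  console.log(`${rinfo.address}:${rinfo.port} -> ${msg.length} bytes`);
});
socket.bind(PORT, () => {
  socket.addMembership(GROUP); // tells the kernel to subscribe via IGMP
});
```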
My internship, in 2006, involved working on an IPTV prototype for Philips. I remember working on a multicast server and the set-top-box client -- the server had a DVB-T card, and rebroadcast all the BBC channels over the LAN. Since the stream was always there, changing channel was extremely fast. DVB broadcasts information packets regularly, containing the TV schedule and so on. These were also forwarded over IP, and then cached by the STB.
It was neat, but presumably most of the multicast stuff was abandoned not long after I left.
Someone else worked on a system to help the user schedule their TV around the broadcast times.
The problem with using multicast for content delivery is more down to the subscriber end and how ISPs manage their networks. I worked on a (recent) project that uses multicast in the broadcast context at the mezzanine level. When you've got full control of the network, end to end, it works a lot better.
The subscriber end...and the goddamn Internet backbone. You can't route multicast across the core, which is a huge impediment to its adoption.
There was a vicious circle where the core didn't do multicast so the big iron routers didn't put multicast support in hardware, which made it impossible to support multicast on the core...
I've got a book about javascript from 1995. It mentions closure once and says something like "... but you'll never need to use that feature of the language."
I noticed that the wayback machine no longer lists historical sites if the latest/last revision of robots.txt denies access. Has anyone else experienced this?
In the late '90s I helped build one of the first Fortune 500 e-commerce websites. The website was shut down years ago, but it was viewable on the Wayback Machine as recently as a year ago. The company in question put a deny-all robots.txt on the domain, and now none of the history is viewable.
It's a shame -- I used to use that website (and an easter egg with my name on it) as proof of experience.
Yip, the irony is that the article about this does not even load on my tablet. Perhaps because of ad blocking at dns level, perhaps not, it does paint a picture though.
There are plenty of restaurants that are older than 1957, so you would probably find at least one that was still open. I think the oldest in the US is Fraunces Tavern in New York, NY, from 1762.
But worldwide, several predate that. Botin Restaurant dates back to 1725... The oldest I could find was Stiftskeller St. Peter in Salzburg, Austria, which is still in the same building from ~803.
Not all information on the Internet is created equal and not all information needs to be available in perpetuity.
"Free $tuff From the Internet" from 23 years ago is closer to a restaurant guide than a book containing literary or academic references. I'd imagine most of the references in an 1850s coupon book or almanac would also be long dead.
This. I'm really excited about projects like IPFS but I'm not totally comfortable with the "everything persists forever" philosophy as it stands now. Preventing link rot is a very worthwhile cause, but content creators should have control over what persists and for how long (see: "Right to Be Forgotten").
True, it's not exactly the same thing. But I think there is room for a conversation about "published" content as well. The internet covers a much broader scope of content than, say, print media. I think it is interesting to consider what should be considered "published works" online. Some people think anything you put online is fair game to persist forever. Others like myself think maybe we need a more fine-grained definition of what constitutes "works" and what persistence properties they should have.
My uncle Pat wrote this book (and multiple others in the same series). I'm amazed Vice is talking about it over twenty years later and I'm sure he will be too once I show him the link!
I had lots of fun reading them as an Internet-addicted kid -- but several of the links were dead even before it was officially published.
"It was possible to get on a text-based version of the internet for free in many areas, using only a modem and a phone line. An entire section of Free $tuff From the Internet offers up a lengthy list of free-nets, grassroots phone systems that essentially allowed for free access to text-based online resources."
Makes me want to try to write a Markdown-only Internet browser, which treats native Markdown documents as the only kind of Web page.
You would have to give up all the dynamic convenience we take for granted. Menus would be just links. Basically you would have HN-like sites for the small ones, and the big ones would have images. That's it.
On second thought that wouldn't be so bad considering the bloat we have to deal with nowadays. (1 MB per page for just news from sites like CNN ugh.)
I owned (and still do own) this book! I would spend many hours as a teenager going through the links and accessing all the cool stuff in the book. This really brings back memories!
And yes, the way I got on the internet in those days was to dial into a public Sprintlink number, then telnet to a card catalog terminal in the Stanford library, and then send the telnet "Break" command at exactly the right time to break out of the card catalog program and have unfettered internet access. Good times.
I was lucky. A public library had a dial-in account with lynx as the shell, and the card catalog and inter-library loan systems were served as web pages. I just had to hit 'g' and type in any URL, including Gopher, WAIS, Archie, or telnet ones. This left me no great way by itself to get things downloaded locally, but I could telnet into a shell elsewhere, download things to there, and feed them back via zmodem through the telnet and the dialup.
That was before I had a local PPP provider, of course.
Did you try looking them up at archive.org? I expect that many of them will work there.
The web is ephemeral unless somebody archives it. Many companies offer an archive service for your sites for a fee, and archive.org does it to provide a historical record.
Yup. Recently promised a colleague a pdf. I knew what I was looking for, who wrote it and which site it was on (regional site of my employer). It even featured highly on google (showed up on related searches).
Zilch. Nada... couldn't find it anymore. Gone. Something I had easily chanced upon before, I now couldn't find with directed searching. They must have restructured their site.
The article indicates that the "free" stuff on the internet was hidden away in weird places - ftp servers and the like. No google to find it for you, the only way was by word of mouth, or I guess via published book.
Answers a question I always had about "Snow Crash" by Neal Stephenson. The main character, Hiro Protagonist (I still giggle at that name), sometimes did work as a kind of data wrangler - "gathering intel and selling it to the CIC, the for-profit organization that evolved from the CIA's merger with the Library of Congress" (Wikipedia).
I always wondered what made that feasible as a sort of profit model, and I guess now I know - that was the state of the internet in 1992, when the book was published. Seems like a way cooler time period for Cyberpunk stuff, I'm almost sad I missed it :(
Man, I really miss FTP. I remember when you would just FTP to the site you were using and grab a binary from their /pub/. Mirrors were plentiful, and FXP could distribute files without needing a shell on a file server.
I remember when you could just mount \\ftp.microsoft.com\pub as a drive on your PC. Raw, unauthenticated, unencrypted SMB over the public Internet. At least it was read-only share. Good times.
I remember Geocities pages would always be trying to read from my floppy drive. People would not realize you had to upload your pictures before other people could see them and just would put D:\pictures\goatse.jpg as the src of the img tag because they didn't really understand the web (not that I blame them.) Of course, the browser had no problem trying to load local images from untrusted code on a remote host, security policies were, uh, extremely permissive at the time.
(Of course, there would be plenty of attempting to read from the C:\ drive too, but that didn't make a loud, unexpected sound like reading from the floppy drive did)
Yeah, you had to remember to change all the links in the html because FrontPage Express kept saving the local full path so you could keep working on and previewing your site on a browser. HTTP server software (on Windows anyway) was so obscure that you had no idea how to get a free one running on your desktop unless you had access to Windows NT with IIS. All that changed when I installed Mandrake for the first time...
I don't remember that era working all that well except if you were only talking about personal homepages at a few large universities. FTP was a bit faster at first but the experience of tracing around an unfamiliar directory hierarchy and guessing at naming conventions is not something I get nostalgic about.
Also, consider that right-click save as works the vast majority of the time and the exceptions are for content which simply wouldn't be available (e.g. video streams) for direct download due to IP concerns.
The unfamiliar directory structure was always an issue, but even worse was using ftp on windows, which defaulted the transfer mode to ASCII, so if you forgot to change it, you'd end up with corrupt files most of the time.
I mean, this isn't all that surprising. Not unlike buying a twenty-year-old visitor's guide to a city and finding that a number of the shops and restaurants have closed, the stadiums have different names, etc.
People tend to think that our society is very well documented, but if you look at what is left of old societies, it is usually whatever was engraved in stone or, if you're lucky, what remained on paper. With the internet replacing most or even all of the paper storage, in the short term that is true (besides, our present-day paper isn't acid-free, so it wouldn't last long anyway): we are better documented than ever. But in the longer term it may be a huge gaping hole in history.
And that's different than cities changing at their regular pace and books becoming less dated. It's like the visitor guide itself is no longer readable and therefore you won't even know what was there in the past.
I don't think this is necessarily a bad thing. For one, the 80/20 rule applied back then just as it does today, so most of what's lost is crap to begin with. It's no different than in the real world: surely nobody's lamenting that you can't pick up a copy of Fordyce's Sermons anymore (and I presume that it'd be long forgotten if not for Austen). While some valuable resources were undoubtedly lost, most live on re-posted elsewhere, like memes, but in a "good" way.
Secondly, the book is more analogous to a map or dictionary, and it ought to be a descriptive source, not a prescriptive one. Some language purists may disagree, but I could care less :-). And similar to an old, outdated, map, you'd expect that the details may have changed, but the landmarks are most likely still accurate. NASA's still nasa.gov, MIT's still mit.edu -- well, IIRC www.mit.edu used to point to their library's portal, and web.mit.edu their main page; I see that's changed -- and CompuServ still...exists.
This reminds me of the Final Fantasy IX strategy guide. It integrated with Square Enix's Play Online service but now that FFIX was removed from Play Online, none of the links work any more and the guide is pretty much useless. I'm sure we'll start to see more of this in the coming years. It's not really sustainable to keep a website running forever.
If the service is still running then the added cost of hosting legacy content is near zero. Assuming the site wasn't full of bandwidth hogging live video or anything like that the cost of storage and hosting are almost zero.
Keeping a whole server running however does get a bit more expensive, but you could also rehost it on a cloud provider for a very modest cost per month.
They should make sites incompatible with NoScript illegal. Back in my day (lol, it's just 10+ years) they used plain text for Web pages and used JavaScript like custom animation in presentations. Now only JavaScript is there with some HTML sprinkled in between.
P.S the rest of it was in Flash though. So no back button for you.
That's just silly. The dynamic nature of javascript is what caused the explosion of websites around 2007, due to the advent of jQuery. Saying we should all go back to plain text is just being an old grump. Should it be overused? No. Just like GIFs shouldn't have been overused on Geocities.
I'm not saying we should have a more dynamic one. But JS fundamentally changed how we interact with websites, and it's here to stay. The majority of big players in the last decade wouldn't have existed without the dynamic interaction. Saying we should go back to static only is nonsense.
I completely disagree. The majority of big players has poisoned the waterhole. They've made the web less usable because of how they've made things dynamic. I think going back to static would have some negative impact, but it would make things better for everyone overall.
Just yesterday I was helping an uncle change the Flash-based menu of his site about classic race cars to a newer one so it would actually show on phones. It was all .htm files that included the Flash menu every time; apparently he worked with some kind of 15-year-old copy of Dreamweaver that would add it to the top of every page he created, like a template.
I could have switched it to a PHP include, but that would either break all existing links, take a bit of work to make .htm files execute PHP, or require making them forward permanently to their PHP versions. Or I could simply do the only sane thing: load menu.php on every page within an IFRAME and change his 15-year-old Dreamweaver template.
I have used the Firefox ScrapBook extension for over 10 years. It saves a static dump of the DOM of the page you are looking at (and all images), so it does not depend on anything external to show the page, and you can save pages that required a login or were generated by any amount of complex JavaScript. It also saves a link to the original site and comes with features to edit pages and join them (those annoying articles spread out over multiple pages to show more ads) and more. There is a clone of it for Chrome as well that I have not tried.
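The core of that approach is surprisingly small. A rough sketch of the idea (not ScrapBook's actual code; inlining the images and CSS is where the real work is):

```typescript
// Serialize the live DOM exactly as the user currently sees it (after logins
// and scripts have run) and keep the original URL and capture time with it.
function snapshotPage(): { url: string; savedAt: string; html: string } {
  return {
    url: location.href,
    savedAt: new Date().toISOString(),
    html: "<!DOCTYPE html>\n" + document.documentElement.outerHTML,
  };
}

// Usage: hand the snapshot to whatever storage the extension uses.
const snap = snapshotPage();
console.log(`${snap.html.length} bytes captured from ${snap.url}`);
```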
I've been an arch linux user for a very long time. I very much praise the documentation and it always gets me 90% of the way there when trying to solve a problem or configure something. But when I need that extra 10% I can usually find someone on the forums with a similar issue and the solution is usually a huge rabbit hole of links and some are broken, which gets really frustrating because I have to hope that it was cached by archive.org.
This problem has concerned me for some time. One solution would be for websites to declare their license (Creative Commons, proprietary, public domain, etc.) and then web pages can embed the content of pages linked to when the license allows it.
A web with content-based addressing and versioning built into the protocol could also deal with this situation more gracefully, but again there are copyright issues.
Well yeah, it sucks. I think the problem here is that a URL was supposed to be a stable, unique identifier for a resource (like a UUID). But at the same time humans have to enter them, and therefore they have to be nice, shiny and up to date with the latest trends, which causes them to change constantly...
Maybe we should build a DHT containing UUIDs for all pages as alternative, stable URIs :D
The reason the book didn't mention spam all that much is that it was from 1994 and probably mostly (or entirely) written before April 12, 1994, which is when the infamous "immigration lawyer spam" hit every newsgroup. It wasn't the first spam, obviously, but it did seem to mark a line where spam (USENET and email) became more and more prevalent.
I have been wanting to find or build a tool that checked if a link was dead before redirecting my browser. If the link turned out dead, just redirect to the appropriate internet archive[0] link. Problem somewhat solved as long as the archive doesn't go bust.
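A rough sketch of that check (assuming a Node-style script with global fetch; the Wayback URL form used here is the redirect-to-latest-capture one):

```typescript
// Probe the link first; if it looks dead, hand back a Wayback Machine URL.
async function resolveOrArchive(url: string): Promise<string> {
  try {
    // Some servers reject HEAD; a real tool would fall back to a GET.
    const res = await fetch(url, { method: "HEAD", redirect: "follow" });
    if (res.ok) return url; // still alive, use it as-is
  } catch {
    // DNS failure, timeout, etc. -- treat as dead
  }
  // web.archive.org/web/<url> redirects to the most recent capture, if any.
  return `https://web.archive.org/web/${url}`;
}

// Example:
resolveOrArchive("http://www.example.com/").then((target) => console.log(target));
```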
I'm working on https://www.pagedash.com, sort of a personal web archive. Hope it helps people like journalists or authors to keep links alive with their original content, even if the original website goes down.
Difference is that PageDash can archive pages that require authentication because it uses a browser extension to retrieve page assets (it doesn't just delegate the work to a server to do the crawling). It also can grab stuff within iframes. Basically the main feature is "archive the page exactly as you saw it".
Where did you live in 1994? Who lives there now? Is the building still standing?
I'm sure I'd get mixed answers to that and that's still the physical world, not the virtual one where "buildings" can come and go at the press of a button.
This is why projects to do with Internet archiving such as https://archain.org/ interest me, as imagine all of the dead links and data lost 23 years from now!
Best of luck trying to scale git. Microsoft took 3 years and they didn't add Office and Windows to it. And they nearly rewrote their own version of git.
IPFS seems to work effectively for storing versions of files. The downside is, as I mentioned in another post, that it's not permanent. Someone has to pin it or it'll disappear. But it doesn't require anything as complicated to use today as trying to modify git to do something it wasn't meant for.
Those were both founded in 1994, and the book spent relatively little time discussing the web (much of which still works, according to the article, despite the headline)
Mea culpa. I just read the heading and remembered the old times when they told us that Lycos finds everything.
The author only briefly mentions Usenet. That was great fun at the time though.
I have an internet guide from 1995 which describes Usenet and its varying focuses and levels of seriousness in great detail, including an extended section on trolling which concludes memorably by explaining that people that do this "probably have a very small penis (especially the girls)", which could just as easily have been written about Twitter yesterday.
Every time I read something about how social media is changing everything, I remember that it pre-dated the web by a decade...
They are essentially "shareware" CDs of the early '90s, but some are more or less "snapshots" of what was available on the web. Check this one as an example: