Hacker News
BBC To Delete 172 Websites Due to Budget Cuts, Geek Saves Them for $3.99 (readwriteweb.com)
149 points by rmah on Feb 11, 2011 | 67 comments



Horrible reporting.

The cost to the BBC to keep those pages on its servers is more than $3.99. Just because one guy downloaded all of the pages, compressed them, and then seeded them on bittorrent for a cost of $3.99 to him doesn't mean jack.

Notice that he's not hosting the content in any easily readable form. No, he decided to put that burden on everyone by putting it up on bittorrent. Why isn't he hosting the content? Because hosting a heavily trafficked site ain't cheap.


The fact that it'd cost the BBC millions to maintain those pages means jack.

People were going to lose access to that information, and now they aren't. That's really all that matters. With torrents, the burden is on "everyone" only if they really want to keep it up.

edit: Reasoning for downvotes would be appreciated. The BBC has a long history of recklessly losing valuable data. See kgtm's link to previous yc post on this topic.

edit #2: Oh, okay, I see what's going on. When I said "means jack", I didn't mean to imply that the BBC should spend inordinate amounts to keep the data up. What I meant was: Yes, it might have cost them to do it, but look, like what this guy did, there are other ways to keep the data archived. Hell, I'm sure some people would even settle for a CD archive or something.


It does not mean 'jack'; it means money that comes from license fee payers would be spent. I am not justifying their removal, but it's overly simplistic to reduce it to a mere discussion of storage costs. The BBC has a responsibility to maintain and support their offerings, and it's not rocket science to see that doing that will cost money. Nice that someone has archived that data, I truly appreciate that. Moving forward, though, I think it would be more useful to look at where money is spent and discuss that real issue, not get sidelined with quips and digs at a truly unique organisation that has long set standards in media of all varieties.

As far as losing valuable data, well okay, they maybe blanked a few copies of Doctor Who here and there, but it was not considered valuable at the time, and as always cost was a factor then as it is now. It's easy to pass judgement with hindsight.


> The BBC has a long history of recklessly losing valuable data

This is quite a fascinating topic in itself. http://en.wikipedia.org/wiki/Category:Lost_BBC_episodes http://en.wikipedia.org/wiki/Doctor_Who_missing_episodes


Erasing master tapes of actual shows is one thing; not bothering to preserve promotional web pages for eternity isn't even in the same ballpark.


In fact I see it more as them picking up after themselves.


I agree that BBC should have done something to preserve the content in some form, not necessarily as easily accessible and readable. Heck, they could've packed it up and put it out as a torrent which they would seed themselves and it would've been better than just dumping it.

I upvoted you because I agree about the content, so I can only speculate about downvotes. If I had to guess, I'd say it's probably because you either missed or ignored the point you replied to, which is that the reporting was, indeed, terrible.


Actually he's probably being downvoted because he's taken a simplistic view to what is a complicated issue, and he's being pretty free with my tax pounds :)


You're saying that archiving the data, or packaging it up in a torrent rather than just throwing it away is a waste of money?


"means jack"

Actually it doesn't mean jack. That's my money they'd be using, and from a quick scan of those websites, I'm happy for them to disappear.

For some reason, many people seem to have decided that all data is important and there's an almost fetishistic devotion to saving everything that can be saved. Not all information needs to be preserved for all time.


> Not all information needs to be preserved for all time.

I disagree, and strongly. Future historians will probably disagree with you too. Even the most inane TV shows can prove to be extremely valuable when trying to decipher the culture of various countries 300 or 3000 years in the past. Imagine reading a trashy romance novel from 1000 BC. You'd learn a lot about the people who lived in that time.

You might counter this by saying that with all the information out there now, we only need to keep the "good stuff". But who decides what's good, right now? How do you know that what you decide is good will always be seen as good? Your personal opinions on the content disappearing are totally irrelevant when perhaps millions of people have seen it or have been affected by it.

All information that is created should be saved, especially since we're able to. Imagine the ancient Romans deciding to burn a bunch of books because it cost too much to hire guards for the library. Ouch.


I agree with you, but practically, not all information can be kept.

There are teams of people right now deciding what is 'good' (i.e. what is kept) and what is shredded. They're called archivists, and, using their judgement in conjunction with government-created records retention schedules, they're culling both government and private records. This culled information is removed for a variety of reasons, space being one of the most prominent. By such means is the pool of available primary sources that will form the basis for future histories created by underpaid and underfunded but well-meaning bureaucrats.

Sounds like the same situation obtains not only in the dead-tree archives but also in the digital ones. Sigh. I wish people would prioritize our cultural legacy first instead of last when it comes to funding.


In your analogy, instead of burning the books they could just dump them in big trash heaps in Egypt, or in jars in Palestine, which correspond to bittorrenting quite nicely.


Mind, I don't think future historians will weep too much about losing some promotional websites for such actual content.


Is the downvote above because someone disagrees, or because the downvoter is just completely unaware of the nature of the sites being taken down?

Most of these sites are promo fluff for BBC TV shows. Some are cancelled, some are for news shows with little more "content" than "This show next comes on --- on BBC -".

Most of these sites would be dismissed as "marketing" if the BBC were a for-profit company. Marketing for defunct products. The rest is material that's being legitimately archived elsewhere.


(OT: I didn't downvote you above; in fact, I can't, since you replied to my comment I think. However, note that many of us frequently accidentally downvote instead of upvote and can't undo it, especially on mobile touch-browsers if we're not zoomed-in enough for a big arrow)


I've occasionally done that myself, but I then post a note that I meant to upvote.


We don't have the ability to preserve all information for all time. Storage media decay, formats become obsolete, and storage imposes cost, space and time considerations that are non-zero and non-trivial.

If these sites were a special instance of our culture, then fine. But they're not. We don't need every single episode of Eastenders archived for all time to understand culture. A good sample is way better.

These are different times to the classical era. We have a glut of information, most of which, once archived, will never be looked at again, and will only cost money until finally abandoned.


"We don't need every single episode of Eastenders archived for all time to understand culture."

Speak for yourself!


"We don't need every single episode of Eastenders archived"

I'm sympathetic to preserving the shows themselves.

Their promotional pages on the BBC site when they're no longer in first-run? Feh.


I went to a museum once where they just kept everything. I learnt two things: one, that the first radio batteries came in more varieties than we have today, and two, never go to a museum that tries to keep everything...


How much would it cost to re-host all these sites? If you hosted a snapshot it'd be completely static, a small cluster of Varnish servers would be trivial to set up and could take quite a beating (alternately you could leverage a CDN I guess? I don't know much about such things). Either way I doubt it'd be a huge investment to keep them online as static, archive sites (which is a very different commitment to keeping them actually live, which would obviously incur many more costs).


I'd not expect any of these sites to be heavily trafficked going forward, so my guess: $5/month.

Note that large portions of geocities are being hosted on the web by various parties, most of whom are apparently using pizza money to do it. Geocities was still in the top-100 websites or so when it closed, and they're probably getting decent amounts of traffic from people trying to find old sites.

Even if there is heavy traffic, you're hosting an archival copy, not a production site, so you can just degrade performance as needed. See often slooow web.archive.org :)


Completely agree. I think there'd be an initial spike, but anything after that would well be within the capacity of my server, which is ~£30 a month. Not a high price for preserving the information... but how long would I want to keep it online, especially once it was only seeing a handful of hits a day?


Well there is no reason why all of those files couldn't be made available by the same server that is currently acting as the master seedbox - and thus would still be for $3.99.

> No, he decided to put that burden on everyone by putting it up on bittorrent.

Give a man a fish and he can feed his family for a day. Let the man catch his own fish and he can feed his family and community for a lifetime.


How many people every day do you think would access a BBC archive site containing personal memoirs of WWII survivors? Maybe 10? 100? Let's call it 10,000? You can serve 10,000 people a day with a very inexpensive setup, maybe more than $3.99 a month but definitely not high enough to justify these kinds of cuts.
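
A back-of-the-envelope sketch of that estimate, in Python. Only the 10,000 visitors a day comes from the comment above; the pages-per-visit and assets-per-page figures are assumptions picked purely for illustration, and real traffic would be burstier than a flat daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(visitors_per_day, pages_per_visit, assets_per_page):
    """Average request rate if the day's traffic were spread evenly."""
    total_requests = visitors_per_day * pages_per_visit * assets_per_page
    return total_requests / SECONDS_PER_DAY


# Assumed figures (not from the thread): 10 pages per visit, 20 assets per page.
rate = average_requests_per_second(10_000, 10, 20)
print(f"{rate:.1f} requests/second on average")
```

Even with those fairly generous assumptions the average works out to a few dozen static requests a second, which is the kind of load the comment is arguing a cheap setup can absorb.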


>Why isn't he hosting the content? Because hosting a heavily trafficked site ain't cheap.

Rehosting it isn't legal either, but then neither is torrenting it.


Generally, when the torrenting / duplicating of such data is for preservation purposes, everyone turns a blind eye, and there's even legislation most places to support such efforts. Nobody would be making money from it while it doesn't exist, and it can be viewed as "historical" or "educational", both of which are usually protected uses of duplication.


>there's even legislation most places to support such efforts.

I've seen previews of suggestions for legislation but no actual legislation for this sort of scenario. Can you point me to some?

Making money from something isn't a requirement for copyright infringement, nor even is it the test for commercial use of copyright works.

I've never heard of a "fair use" case on historical (preservation) grounds; can you point to one? As for educational use, in Europe one generally doesn't have the relatively liberal educational use allowances that one does in the USA, and from hazy memory I think that the USA legislation does tie down quite well what is educational use. Certainly unlicensed redistribution except on the most de minimis basis wouldn't get a pass as fair use for education.

IANA(IP)L and am a little out of touch wrt the latest legislative efforts.

---

OT rant:

It's interesting that just recently all my posts in which I've tried to apprise people of copyright legislation, or at least the potential threat of such, have been modded down substantially. I have pretty strong views contrary to the current established laws myself, FWIW, but I feel that even if you're an anarchic dissident it's important to know the laws one is fighting against. Clearly others here disagree that reminders about demands of IP legislation are worthwhile for this type of forum.


I'll hunt around a bit, but I don't know of any off-hand. As to evidence of the attitude, I point to abandonware sites. Technically illegal many times, but companies getting uppity about things they don't sell appearing on abandonware sites is exceedingly rare. Similar to anime companies vs fansubbers prior to the anime being translated and sold in that particular area.

Since you seem like you may be interested: looked at http://www.groklaw.net before? They've had quite a bit of activity on IP laws, last time I checked (as has the rest of the internet, but still).


You can browse the entire archive at www.bbcattic.org


I can't believe this passes for reporting. Did anyone contact the BBC for their estimate of the amount they'll save and how?

Also, just because the info isn't public does not mean the BBC doesn't have it stored in their own digital archives.


The BBC has stored at least some of it in an archive in cooperation with the British Library. I've posted a link to an earlier article; you can find it by browsing some of the "TLDs" they're removing.

(BTW, did anyone notice that the BBC calls these sites "TLDs", and that they took a beating for that by people who assumed they were incorrectly referring to "Top Level Domains"? But, TLD can mean "Top Level Directory" too, which is entirely accurate. Most amusing...)

Anyway, I think it's good that we're developing a culture where some people care, and make sure that things get archived, even if it's done sometimes unnecessarily, and almost always only gets a degraded copy (ie, a web rip without streaming videos, and without structured data). I only hope that these vigilante actions don't lead companies to not pre-announce massive data erasures.


Just a note for those wondering about the content of the archive.

Having downloaded it, it's around 2GB compressed. It contains images, but most pages are nonfunctional, due to links being specified from the site root /. To view it properly you need to place the folders at the root of your web server or of your hard drive.
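
For anyone who wants to try that without setting up a full web server, here is a minimal sketch using only the Python standard library. It serves a directory as the web root on localhost so the root-relative links resolve. The directory name `bbc-archive` is hypothetical; point it at wherever you extracted the torrent.

```python
import functools
import http.server


def make_server(archive_dir, port=8000):
    """Build a local HTTP server whose web root is the extracted archive."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=archive_dir
    )
    # Bind to localhost only; this is for private browsing, not public hosting.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted); "bbc-archive" is a hypothetical path:
#   make_server("bbc-archive").serve_forever()
```

With that running, a link such as /ww2peopleswar/ would be looked up inside the extracted folder instead of at the root of your disk.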


That does not really tell me what is in the pages though. What are the 175 sites that would have been forever lost?


Load the torrent into your client and it will show you the file structure. You then select the file(s) you want to download.


Indian Food Made Easy is one of them I'll be saving. There's a big one on WWII. It's really a mixed bag.

The full list is in the torrent file.


The BBC needs to make visible cuts in places where the British (anti-BBC) press accuse them of providing services that they believe should be provided by private companies. If the site doesn't vanish the press wouldn't see it as a real cut, would they? [1]

[1] sambeau, http://news.ycombinator.com/item?id=2188870


Well said that man! :)

Also, keeping these sites (without actively updating them, monitoring comments etc) would essentially cost the BBC nothing, especially the low-traffic ones. BBC Online is a very lean organisation.

see: http://news.ycombinator.com/item?id=2189564


The BBC makes a show of "cutting costs" by closing down promotional sites for a pile of mostly-cancelled TV shows and handing off some sites that might actually count as "content" to the British Library. (Meanwhile, they leave up stuff like this because people actually watch Doctor Who: http://www.badwolf.org.uk)

Why is this a story, even in web circles? Why all the hyped-up outrage?

Are people associated with the BBC shilling this?


Not necessarily. But the BBC are. The current UK government has asked all departments to cut spending by more than ever in history, even more than when the empire nearly went bankrupt after the Napoleonic Wars. The BBC, whilst nominally independent, are funded through a state tax and are a state-owned chartered corporation. The current Tories have always been treated with hostility by the BBC, as the Tories tend to want to reduce the BBC's funding, and the Lib Dems never really got much of a word in edgeways. Consequently the BBC (with its massive media power) is pushing this to make it very visible that they're doing something about cutting costs.


The real issue is "Cool URIs don't change" vs "The BBC is hideously bloated". Keeping these sites online isn't so terribly expensive. Keeping a few hundred extra lines in your .htaccess isn't expensive. I've not seen a decent estimate of the cost savings, and as far as I know one doesn't exist. The material is being archived inside the BBC, as well as in torrents like this one. But the URLs are being broken, for no reason besides sending a message.
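
One reading of the .htaccess remark is that each retired top-level directory could get a permanent redirect to wherever the archived copy lives. A rough Python sketch that generates such lines follows. The `Redirect permanent` directive is Apache's mod_alias; the archive host `archive.example.org` and the helper name are made up for illustration, and nothing here reflects how the BBC actually configures its servers.

```python
def redirect_lines(top_level_dirs, archive_base):
    """Return one Apache 'Redirect permanent' line per retired directory."""
    base = archive_base.rstrip("/")
    lines = []
    for name in top_level_dirs:
        path = "/" + name.strip("/")
        lines.append(f"Redirect permanent {path} {base}{path}")
    return lines


# archive.example.org is a placeholder host, not a real archive location.
print("\n".join(redirect_lines(["ww2peopleswar"], "http://archive.example.org/")))
```

A couple of hundred directories would mean a couple of hundred such lines, which is the scale the comment has in mind.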


If the BBC is unable to host these sites because of budgetary issues, is the BBC also unable to pursue copyright infringement of these same sites if a third party were to set up the site/s in a different country?


I could easily imagine one of the problems could be the BBC not knowing what rights they have to the content. Likely some of it will be stuff they only have limited rights to; donating it to an external project could plausibly get them into trouble. And hiring somebody to check the contracts over 172 websites won't be cheap.


Did any of you actually see the content?

I downloaded one of the sites in the torrent titled Zombies. It's about a British girl who organizes a community effort to make a Zombie Movie. Sure, the site was ripped, but not the video content. Nothing on the page is worth seeing other than the video. Once the BBC turns off access to that stream the archive is virtually useless. I'm going to check out more ripped sites; my gut says they're probably video heavy too.


I've looked at it a bit more. All the smaller sites are just placeholders for less important topics or for streaming BBC shows. Sure, the torrent saves some user comments (not all) and images (not all, some links are broken) but there's not really a lot of meat since that content is peripheral.

I'd say that most of the valuable content is still stored on the BBC servers, in the DB where it's being served from right now. I'm sure they're going to repurpose the good stuff if they haven't already. I have an acquaintance who works in the web department at the CBC. They've implemented a pretty neat CMS to repurpose their older media to work with their latest site. I can see the BBC doing the same.


It is near impossible to grab the audio + video assets because they are streamed, which makes them impossible to obtain through wget.

Furthermore, many of those assets are accessed via iPlayer, which has restrictions on access outside of the UK.


I don't understand this situation, but it sounds like a case of the Government removing a high visibility public resource as a sort of protest to budget cuts. Eg: http://i.imgur.com/Vdk6D.jpg

Even if that is the case, I'd sure like to see a breakdown of expense when a cut like this is made.


Almost all of the sites being archived or deleted (yes, archived - the reporting that these are all up for deletion was overly simplistic and incorrect) are for programmes or programme strands that are no longer being broadcast, in some cases for many many years.

From what I've seen reported, the only site of any significance which is being removed is 'WW2 People's War', which asked people to put their and their family's personal stories of the second world war into a central archive (http://www.bbc.co.uk/ww2peopleswar/). I believe that this was already being archived by the British Library, so won't be lost.

The overall decision to cut 50% of the 'TLDs' is definitely political, to placate an administration that's beholden to the Murdoch media empire, but the choice of which ones to kill seems mostly pragmatic.


Exactly. When cities are facing potential budget shortfalls, they say things like "If we don't get a tax increase, we're going to have to slash the fire and police departments." They highlight the most important areas (which should of course not be the first items on the chopping block) instead of the actual marginal areas, in order to scare people.


Am I missing something? Where is the price of $3.99 derived from? Cost of creating the spider script? Labor cost of creating the BT seed? This article says nothing about this.


I think it's the (monthly?) price of the seedbox.


Does anybody have a good idea of how much it does cost for a big organisation to keep this kind of content online?

Intuitively, it seems that the answer should be 'not much' -- perhaps on the order of a few thousand dollars/year. A couple of servers, bandwidth, and a sysadmin checking in now and again to apply security patches &c.

But here (also with e.g. yahoo closing geocities), it's argued that the cost of keeping them up is much, much higher. Where does the expense come from?


It's funny, the activist seems to think the deletion is about appeasing political masters, but in doing this, they actually lend credibility to the opposite point: Just leaving the pages in place is unlikely to cost the BBC a lot. Is it possible that the BBC is cutting in the most visible way possible, in order to make the cuts seem much more extensive and painful than they actually are in order to scare the government away from further cuts?


Here's the actual guy who did this. http://178.63.252.42/


I'm sure someone will take the abandoned content, slap it on a server with some Adsense and call it a day.


We've graduated from the grammatically-incorrect singular they to s/he? Really?


Worse still, they (being singular where gender is indeterminate, or you seek to be neutral) is not grammatically-incorrect, and has a long history in the English language. It's a bit like the split infinitive, which is often considered to be incorrect but isn't.

The author: they really should have known better than "s/he", to boldly write for RWW.

Edit: Added a source with examples - http://www.crossmyt.com/hc/linghebr/austheir.html and the Wikipedia page seems to have some discussion about it too - http://en.wikipedia.org/wiki/Singular_they


Should a singular they be used exactly like any other singular pronoun:

  "I will say of that person: they laughs a lot."
  (Compare: "I will say of that person: she laughs a lot.")

  "They is the person you should talk to."
  (Compare: "He is the person you should talk to.")
Or should it instead be:

  "I will say of that person: they laugh a lot."

  "They are the person you should talk to."
? And why?


Presumably if you're referring to a specific individual, you already know the gender so using "they" in that situation would appear to be incorrect.

However, in the general sense I would argue that the conjugation of verbs follows a pattern but that pattern doesn't have to be singular vs. plural. Given "he has" or "she has" one might expect a singular "they" to follow the pattern "they has". However, "I" is singular yet it uses "I have", so it's not unexpected for the singular "they" to use "they have". In other words, the conjugation of "they" is the same regardless of whether it's singular or plural.


"They" is used in a bunch of situations where the person prefers to be non gender specific. It only feels wrong for the first little bit, but hang out with enough people that have complex gender identities and you will get used to it.


Hyphenating "grammatically incorrect" is grammatically incorrect.


Depends whether you're using it as a compound adjective or as a separate adverb and adjective.

Correct: "Grammatically, this paragraph is incorrect. It is grammatically incorrect. I'm sick of these grammatically-incorrect paragraphs!"

http://en.wikipedia.org/wiki/Compound_modifier


Actually, it doesn't. A compound modifier isn't hyphenated when it's clearly an adverb modifying an adjective -- e.g., when the adverb in question is a -ly word.


Was it in the context of the BBC? Often known as 'auntie' in the UK...


It's not the hosting that costs much usually, it's developers + editors + managers + their managers. It all adds up very quickly.


How much do you pay editors and managers and developers to keep 12 year old sites sitting on a webserver?



