ArchiveTeam rescues Justin.tv videos (twitter.com/textfiles)
101 points by sp332 on June 10, 2014 | 25 comments



I don't think anyone has mentioned this yet, so I might as well: unless they changed it after I left, JTV has been deleting videos older than a week for many years. I wrote the original (well, almost original) system, and IIRC, we took it down to a week of storage back in mid-2009 or so.

JTV caught quite a lot of static for "not giving more than a week's notice", but it's pretty hard to do that when you're going from one week of storage to no weeks of storage.

Also, since people are asking: back when I first worked on it (circa late 2008), we started by installing (again, IIRC) 40TB of RAID6 disk space, and that was enough to store about a month of video before we ran out. By the time I left, we had something like 4-5x that capacity, and we were down to a week of archive storage. So that should give you some perspective on the amount of data involved... and it's probably gone up substantially since then.
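
As a rough sanity check (my numbers, not the poster's — assuming ~30 days for "a month", the midpoint of the "4-5x" estimate, and decimal terabytes), the implied ingest rate grew roughly 19x:

```python
# Late 2008: ~40 TB of RAID6 held roughly a month of video.
early_rate_tb_per_day = 40 / 30            # ~1.3 TB/day

# Later: ~4-5x the capacity (midpoint ~180 TB) held only a week.
late_rate_tb_per_day = (40 * 4.5) / 7      # ~25.7 TB/day

growth = late_rate_tb_per_day / early_rate_tb_per_day
print(f"{early_rate_tb_per_day:.1f} -> {late_rate_tb_per_day:.1f} TB/day, ~{growth:.0f}x")
```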


> JTV caught quite a lot of static for "not giving more than a week's notice", but it's pretty hard to do that when you're going from one week of storage to no weeks of storage.

I have trouble understanding this point you make. What prevented them from announcing something along the lines of "We keep videos for one week at the moment; in 3 months' time, however, we will stop this and there will be no more storage"?


I'll be glad if they remove a bunch of the early stuff, to be honest. I was in the closed beta with iJustine and a few other people, and people were dicks to us whenever we got on camera.


I love the ArchiveTeam.

Jason Scott, the leader of the project, gave a fantastically entertaining talk about how they saved Geocities, Yahoo! Video, and Friendster — using a Distributed Preservation of Service Attack. Definitely worth a watch.

https://www.youtube.com/watch?v=-2ZTmuX3cog


Since people are interested... a fact that may be obscured from the conversation and statistics is that something like 550gb of those videos have ZERO views.

Kudos to archive team!


How can videos with at least 10 views also have zero views? The ArchiveTeam is only keeping videos with at least 10 views because the total size of those videos with less than 10 views is 1.01 Petabyte.


Sorry for not being clear. Out of the 1.1 petabyte, something like 50-60% of that has zero views. Archive Team duplicated everything with 10-infinity views, which was about 10tb.


550 GB, is that all? Or do you mean 550 TB?


The page for their project: http://archiveteam.org/index.php?title=Justin.tv

I'd be curious to know how many hours (days? months? years?) of video content JTV actually has.


5 years ago, it was 22 hours recorded per minute[1]. That translates to about 25 years of video per week, which was the length of their previous retention window. The volume has probably gone up substantially in the intervening time.

[1] http://mashable.com/2009/05/21/justin-tv-usage-stats/
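
The arithmetic checks out — a quick sketch, assuming a constant 24/7 ingest rate:

```python
hours_ingested_per_minute = 22                # Mashable's 2009 figure
minutes_per_week = 60 * 24 * 7                # 10,080 minutes

hours_per_week = hours_ingested_per_minute * minutes_per_week  # 221,760 hours
years_per_week = hours_per_week / (24 * 365)
print(f"~{years_per_week:.0f} years of video per one-week retention window")
```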


So they're huge numbers, but still pretty small in comparison to YouTube, which claims to have 100 hours of video uploaded every minute [1] - meaning they probably have at least several millennia of stored video... wow.

[1] http://www.youtube.com/yt/press/statistics.html
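
The "several millennia" claim is easy to sanity-check — at 100 hours per minute, a single year of uploads is already six millennia of footage (a rough sketch; the real upload rate varied over time):

```python
upload_hours_per_minute = 100                     # YouTube's claimed rate
minutes_per_year = 60 * 24 * 365

video_hours_per_year = upload_hours_per_minute * minutes_per_year
video_years_per_year = video_hours_per_year / (24 * 365)  # simplifies to 100 * 60
print(f"{video_years_per_year:.0f} years of video uploaded per calendar year")
```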


I guess this answers why they're not doing archives anymore. 1,000 TB of videos with fewer than 10 views!


Yeah. Do we know what Justin.tv's hosting costs for all that data was, or what infrastructure they used to store and serve it? That's a lot of data to take care of for a service with no direct monetization.


It is really out of date, but the High Scalability article has some information: http://highscalability.com/blog/2010/3/16/justintvs-live-vid...


Where do you see that statistic?


The rest of the tweets in the thread:

> What about grabbing the 9 views, 8 views, etc? We have more time!

-@erazmus

> The problem is that at some point, we end up with a LOT of disk space used for VERY obscure videos.

> Normally, archive team does not care, but in this case, the disk space will increase to over 1.1pb. PETABYTE

> That's space that can be used to save a LOT of other work out there that needs love. Tough decision.

- @textfiles


They said they archived 10TB (and that covered everything with at least 10 views), and that the total including all videos would be 1.1PB, ie 1100 TB. So there is over 1000TB of videos with less than 10 views.
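
In other words (decimal units assumed):

```python
total_tb = 1100          # 1.1 PB, per the tweet thread
archived_tb = 10         # everything with 10+ views
under_10_views_tb = total_tb - archived_tb   # 1090 TB below the cutoff
print(f"{under_10_views_tb} TB of videos with fewer than 10 views")
```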


1.1PB would cost $56,000 at the cost of Backblaze's latest pod, based on their recently posted pricing.[1] That would be for live storage in physical computers. If they put it on tapes at $0.01/GB, it would only cost $11,000 for the tapes.

1: http://www.tuaw.com/2014/03/19/backblaze-now-storing-100-pet...
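
A quick check of those figures (decimal gigabytes assumed; the per-GB pod price is implied by the comment's $56,000 total, not taken from Backblaze directly):

```python
gb = 1_100_000                    # 1.1 PB in decimal gigabytes
tape_cost = gb * 0.01             # $0.01/GB on tape -> $11,000
pod_cost_per_gb = 56_000 / gb     # implied pod pricing: ~$0.051/GB
print(f"tape: ${tape_cost:,.0f}, pods: ~${pod_cost_per_gb:.3f}/GB")
```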


Read the rest of the conversation:

  >What about grabbing the 9 views, 8 views, etc? We have more time!
  >The problem is that at some point, we end up with a LOT of disk space used for VERY obscure videos.
  >Normally, archive team does not care, but in this case, the disk space will increase to over 1.1pb. PETABYTE
  >That's space that can be used to save a LOT of other work out there that needs love. Tough decision.


I wonder if Usenet is at all a consideration for excess archival. Any NSP I've heard of allows for unlimited uploads. Sounds like a good use for the archaic system.


Not an option. Usenet doesn't preserve data long-term; it's effectively an ephemeral medium. (Some archives do exist, but they almost all skip high-traffic groups.)


Most premium providers haven't deleted binaries in several years. At worst, it could at least be temporary archival until more permanent storage is acquired in the future. For example, place the 0-view Justin.TV vids there until ArchiveTeam can justify the space themselves.


0-view vids are over (estimated) 1,000 TB! I don't think most Usenet providers have that spare capacity.


Yea, that's a valid point. I'm pretty sure some could handle it, but many peers may not be able to. Using Usenet for mass archives would have to be done gradually. "more than 25 petabytes of Usenet messages on the Giganews servers." ... "Giganews is 50 percent bigger today than at this point last year and we continue to add servers with this kind of growth in mind." [1]

I just think it's an underutilized medium for this sort of media conservation.

1 - http://www.giganews.com/blog/2014/01/giganews-reaches-2000-d...


In the posted thread.



