How Pixar Almost Lost Toy Story 2 To A Bad Backup (tested.com)
169 points by trevin on May 14, 2012 | 75 comments



Obviously, Ms. Susman's circa-1998 home computer would in no way be able to replicate the level of storage provided by the Sun server cluster and its 4.5 TB disk array (http://www.hoise.com/primeur/99/articles/monthly/AE-PR-09-99...).

What was deleted, per Loren Carpenter on Quora, was, "in effect the database(s) containing the master copies of characters, sets, animation, etc. The frames were to be computed from that. A few hundred MB, if I remember." (http://www.quora.com/Pixar-Animation-Studios/Did-Pixar-accid...)

A little more detail than was provided in the video, where they just called it "the film", as if the entirety of the movie's data were both deleted and duplicated on that home PC.


It is worth noting that Galyn's home machine, and the one that I drove over to get with her and return to Pixar as depicted in the video in question, was in fact an SGI. Not your average "home computer" by any stretch. Although you are correct, there was nowhere near 4.5 terabytes of drive space in that Indigo II or Octane (I forget what she had in her house specifically, although I can ask her if people really want to know).

The sum of all the source files (models, animation, shaders, lighting, sets, etc..) was probably in the hundreds-of-megabytes range at that point in time as Loren pointed out in his comments on Quora.

I've also added a lot more detail to that Quora thread (http://www.quora.com/Pixar-Animation-Studios/Did-Pixar-accid...) if folks want to read more about the whole thing, although it was 14 years ago so my memory is a little foggy on some of the details at this point.


Thank you for explaining this. It was literally the first thing that I was wondering.


Now it is time for me to verify both my home computer and business laptop backups.


I did this to myself in college, thankfully on machines where I didn't have most of my stuff. I figured it was an annoyance and put in a restore request to the support group. Days went by and I was getting frustrated that they still hadn't gotten around to it. Finally I caught the head of the support group in the hall and asked about it.

"Well, the good news is that we're getting a new backup system." Long pause. "The bad news is that your files are at about the third gigabyte of a two gigabyte tape."

Ever since, I'm always frustrated by people that assume things are backed up. If you don't test your backups, you don't have backups.


I cannot say who the client is for this, but it's a government department.

I built a cluster which used a disk array for storage. When I was contracted to build this I was assured that they were building several other clusters for use as test, QA and prod and that all I had to do was build the dev system and document so thoroughly that the other systems could be built from the documentation. At the time I wasn't yet security cleared so would not be allowed to build the other environments.

Some time later (6-8 months) I was called back to help solve a performance related problem. When I arrived I discovered that the dev machine was now production, and that no other machines were built. I would have to do the work on production.

Worse though... their backup and failover mechanism wasn't what I'd documented, but involved an identical piece of hardware nearby which was powered off. They hoped to simply turn it on when production failed.

Their daily process was to walk up to the disk array, remove the 1st hot-swappable disk from the live machine, and swap it with the 1st disk in the powered-off identical system. When the array had rebuilt the first disk, they would move on to the 2nd disk.

I am seldom speechless. It was only a critical part of a £2bn project. Beyond all of the obvious WTFs, I still wonder whether the 2nd machine could be powered on at all given that all of the disks were pulled at different points in time.


> If you don't test your backups, you don't have backups.

I couldn't agree more! It's amazing how often the question "we take backups, right?" produces "uuhh..". And yes, if you don't test them, they're worthless. Because the rule is, if you haven't tested them, they won't work when the shit's really hitting the fan.


Back in the day we would take the day's backup tapes and restore them to a test system that was also available as a "playground".


How should I test my backups?

I have a projects folder, which is on Dropbox and additionally on Time Machine. I can confirm they are really in both by browsing them. Anything more I should do?


Take a fresh, empty system (a VM if you lack hardware); restore data from backups to it; check if you can do your work.

It may be that your projects folder really is backed up, but that it relies on some stuff outside the folder for whatever reason - for example, a data file that didn't fit in Dropbox, or some changes to config files outside the folder (where no one remembers exactly what needed to be changed); or a connection that relies on a private key that's on your computer, gets used automagically, but is not backed up.

I mean, it's probably overkill in your case - most likely your project source is quite safe; but for a working production system it's another matter entirely.
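A minimal sketch of what that restore test can look like, assuming a simple tar-based backup (all paths here are invented for illustration):

  # restore to a scratch location and compare against the live tree
  $ mkdir -p /tmp/restore-test
  $ tar -xzf /backups/projects-latest.tar.gz -C /tmp/restore-test
  $ diff -r ~/projects /tmp/restore-test/projects | head
  # then, on the fresh machine/VM, try to actually work from the restored copy
  $ cd /tmp/restore-test/projects && make test   # or whatever "can I do my work?" means for you

The diff catches silently missing or truncated files; doing the last step on a clean machine catches the "relies on stuff outside the folder" problem.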


I especially love the (existing) backups that can't actually be used to bring the system "back up"... experienced this more than once.


I guess there's the related one - mostly by managers and developers - where they feel safe if all the data is on a RAID array without backups. Arrgh.


I was there at the time this happened. They were doing backups, but not testing them. The version of tar being used, from IRIX, I think, used 32-bit file offsets and couldn't back up more than 4GB of data to one file. The TS2 models and animation files took up more than that and it overflowed the maximum file size. The moral is that untested backups aren't backups.
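For what it's worth, even a crude read-back pass catches that kind of silent truncation. Something along these lines (the device and path names here are invented, and the exact flags varied by platform back then):

  $ tar -cf /dev/nrst0 /usr/anim/ts2 && echo "write ok"
  $ mt -f /dev/nrst0 rewind
  $ tar -tf /dev/nrst0 > /tmp/readback.lst && echo "readback ok"
  $ find /usr/anim/ts2 | wc -l; wc -l < /tmp/readback.lst   # the counts should roughly agree

If the archive hit the 32-bit offset limit, the read-back either errors out or lists far fewer files than the source tree contains.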


If she hadn't had the backup in her house, couldn't you have simply restored the files from the original disks? Of course you would have had to rebuild the folder structure somehow, but that would be a few days of work, not months, right?


Same era, we had a backup that wrote to a DAT tape, "/dev/nrst0". Somebody mistyped it and the backups were all going to a regular file, "/dev/nstr0" - we only noticed when somebody claimed they could read the backup with the tape ejected!

This was SunOS, mid-1990s - you had to be root to run the backup script, AND you could create files in /dev without warning, AND / and /home were in the same partition, so it had lots of space.
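A one-line guard in the backup script would have caught it - check that the target is actually a character (tape) device before writing. A sketch, using the device name from the story:

  TAPE=/dev/nrst0
  [ -c "$TAPE" ] || { echo "$TAPE is not a tape device, aborting" >&2; exit 1; }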


A few months into my first job a department called because their PC wouldn't boot; the hard disk was dead. They took tape backups every day, so I checked the last tape, but it was blank. I asked if they backed up every day and they said yes - and that for the last two weeks or so the backups had taken only a few seconds, where they used to take about an hour. They were very pleased because it was such a tedious job. They had one week's worth of tapes and they were all blank. Turns out the tape drive had gone faulty a few weeks before the hard drive failed - just long enough to go through a full cycle of all their backup tapes.


I watched some poor guy at an agency I worked for slide the single CD-RW that his life's work was sitting on into the drive. A few whizzes later a loud bang came from the drive and bits of plastic went flying all over the room (it was one of those shitty Pioneer slot loaders).

He literally, despite being 30, cried his eyes out for about an hour.

That was when I learned not to fuck around with data security and backups...

I wish everyone this experience at least once but without the pain of losing so much. It's really important to always have the backup monkey on your back.


When I was 16, I lost a hard drive with 2 years of game development work on it. I eventually rebuilt the game much bigger and better, and shipped it, but not before I cried for three days straight. Now I never leave anything un-backed up that I would be disappointed about losing. Fortunately for us, storage is far cheaper now than it was when I was a kid.


It wasn't just a bad backup. It was a bad backup "system" - one that obviously didn't involve testing the backups, or even any redundancy. I guess the sysadmin never read the story (urban legend?) about the guy in Sweden who brought his backup tapes home and erased them on the car's electric seats.

The comment on the cartoon - "I remember thinking, this is really bad, we're going to get our backups" - comes from someone who obviously does not have the appropriate fear of what can happen to even begin to understand a good backup system. I'm guessing she is much more careful with her children and anticipates things in advance.

As far as "the backup being bad" goes - where were the backups stored? Were they stored offsite?


Quick tip: refresh your dev environments from the previous day's prod backups. You get to test them - and the people's skills to operate the backup system - every day, for free.
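A sketch of how that can look in practice, assuming a Postgres-style setup purely for illustration (the database names, paths, and schedule are all invented):

  # /etc/cron.d/refresh-dev: rebuild dev every night from the newest prod dump
  #   0 3 * * * postgres /usr/local/bin/refresh-dev.sh
  #
  # refresh-dev.sh
  set -e
  latest=$(ls -t /backups/prod-*.dump | head -1)
  dropdb --if-exists devdb
  createdb devdb
  pg_restore -d devdb "$latest"                      # fails loudly if the dump is unusable
  psql -d devdb -c 'select count(*) from customers;' # cheap smoke test

If the restore breaks, you find out the next morning instead of in the middle of a disaster.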


Probably written wrong, not degraded.

It's very easy to say the backup should be tested, that there should be mirrored recovery systems, etc., but this is a movie. Even in a studio system this is a one-shot deal: the systems and even the hardware will be disposed of after the master is cut, the people will leave for other jobs, etc.

Now, in the middle of crunch to make the movie, tell the producers you are going to double the HW budget for a test system to test the backups, and that you are going to stop doing any new work until a backup system is built and tested.

Personally I'm amazed they had any sort of backup beyond personal copies on animators machines.


"It's very easy to say the backup should be tested, there should be mirrored recovery systems etc"

I've been doing Unix backups for about 27 years - back when it was more of a pain than it was in 1998, and certainly more than it is today. Full backups, incremental tapes, the whole thing, taking the tapes offsite. I wasn't a sysadmin, but it was my business, so if I didn't have a backup it would be my problem and my loss, both in work and money. I can assure you that setting up a backup plan (key word "plan") that weighs the cost and trouble is worth it. The only reason higher-ups don't approve of this is that they don't know what can happen if something fails.

Ask yourself this question: do you think Pixar changed their backup process after this event? I think they probably did. Not only because the failure happened to them, but also because higher-ups saw what could have happened if not for the woman who wanted to be at home with her child - and who saved the project in progress.


Yep - There is nothing more eager to spend money on backups than senior management the day AFTER a disaster!


Probably written wrong, not degraded.

That was my thought. Heck, it happened to me once.

Combination of a SCSI cable knocked loose, but not visibly so, and the brand-new sysadmin (me, in both cases) knowing enough to head and tail the backup log file looking for 'start' and 'complete', but not knowing to look in the middle for what was actually happening.

Lost a month of data in the (thankfully) little-used legacy ERP system we had. If the entire plant had still been using it, it would have been a disaster, not an inconvenience.


Pixar is a movie studio that developed moviemaking technology over many years. No one is talking about backing up Tom Hanks in case he keels over.


My memory is foggy (it was 13 or 14 years ago) but to share a little bit of the technical landscape at that time with folks reading this thread...

I think we set things up by giving her a full tree with her SGI machine on the network, physically in the building. Then we sent the machine home with her and incrementally updated it weekly via rdist over an ISDN line.
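(For anyone curious, the weekly update was conceptually just a one-way mirror of the source tree. Expressed with rsync for readability - the actual rdist setup is long gone, and the hostname and path below are invented:

  $ rsync -az --delete /usr/anim/ts2/ galyn-home:/usr/anim/ts2/

rdist did essentially the same thing, pushing only the files that had changed since the previous run.)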

I wish we had Dropbox back then!


It can't be that simple - Pixar has to have massive distributed systems, right? I mean, it can't be a single Linux system with a single backup; if that's the case you would be safer storing it in Dropbox (not that you would), since at least then you would have copies on all of your nodes.


Hi, I'm the "Oren Jacob" from the video. Rendering at that time was distributed across hundreds of CPUs in hundreds of Sun boxes in a renderfarm. The authoritative version of the film's data (models, animation, lighting, sets, shaders, textures, etc..) which was where all rendering pulled data from or verified that it had the most up to date versions, was stored on a single machine.

There were several backup strategies for that machine, but they failed. A few reasons are outlined elsewhere in this thread (running past the 4gig limit on tapes) but, if memory serves, other errors were also occurring that involved drives being out of space so error logging wasn't working properly.

Making matters worse, after our initial restoral, the crew came back to start working on the show and then, a few days later, we discovered that the restoral we thought was good was, in fact, not. So we had, in effect, split the source tree and now had a bad branch with a week of the crew's work on it, including animation.

So we had to untangle the mess as best as we could, which was about 70% script based, and about 30% by hand. Further details are in my post on the Quora thread... (http://www.quora.com/Pixar-Animation-Studios/Did-Pixar-accid...)


I imagine they're referring not to the rendered frames, but to the models, textures, etc. that the animators were using.

And honestly...the larger the system, the more things that can go wrong. Lots of people back up files to RAID arrays, only to find out that their parity drive(s) are hosed at the worst possible moment. The price of a good backup system is eternal diligence.


And even then, the entire film is created in storyboard form, and then the voice work is done, and then the final rendering is done. Sure, re-doing parts of the work would be a pain, but they wouldn't have had to get Tim Allen back into the studio.


If you are suggesting that they could store Toy Story II content in Dropbox, you A. forget how long ago Toy Story II was (1999) and B. underestimate the size and throughput of all the data that goes into making such a movie.

edit: added "could" to make it clear that I'm suggesting that such was an impossibility.


The tech of 1999 is certainly underestimated today

Still, in 1999 I had a PII 333MHz and 10GB HD (and Windows 98, sigh)

But reading about the history of Pixar, the difficulties (in the first Toy Story) centered on the workflow of creating a full-length feature film (apart from all the other problems they had)

So I'm guessing, since Toy Story 2 was their 3rd feature film, things were still a bit raw (especially given their fast pace)


How is that underestimated? A world of 10GB hard drives, when that's much less than the base package of Dropbox these days. Managing the terabytes of data required for the movie back then was a much more difficult task.


What is underestimated is what one could do with it.

Sure, 10GB fits on a USB stick today (or better, on a 3-year-old one)

And managing the amount of data was certainly a challenge.

If you watch the first Toy Story today, you can easily imagine it being rendered in real time by a modern graphics card.

But still, they had what was available and managed to do all kinds of tricks

Now see what they could do in 1984 http://www.youtube.com/watch?v=TYbPzyvResc


My comments are specifically about data storage. It was much harder to store terabytes back then.


Did you just decide to not read the part where he said "Not that you would"? He was making a point, not actually suggesting that they use Dropbox before it was invented to make backups of their multi million dollar movie production.


"Not that you would" Implies that you could. I was pointing out that they could not, for two rather poignant reasons. I didn't mean to do it offensively, but the point is that syncing and managing the huge amount of data that they have would be difficult and impractical over the internet even today, and this was also 13 years ago. Especially then, maintaining a single "master" copy on a single linux machine was in fact a pretty decent solution to the problem.


^ This. Wow, has it really been that long? My point was that a single Linux box seems "low tech" considering you need a render farm just to produce the movie. Also, they were clearly referring to assets and not the rendered film (hats, sets, etc. getting deleted), which is 1000x worse.


The infrastructure to maintain authority on terabytes of data on late 90's hardware in a distributed fashion would be a pretty significant amount of development effort.

It'd be much easier to just get a giant RAID NAS, and manage it all there, which should in theory be quite sufficient assuming you are keeping proper backups. Many of the render nodes will have copies of the needed data, but the authoritative copy needs to live somewhere.


Toy Story 2 was released in 1999.

I'm assuming they were working on it quite a bit before then.

I think you're overestimating the tech available at that time


Toy story 2 was quite a while ago. I'm sure it's much different now.


Where is this true?

  It was most likely "sudo rm -rf .*" which will actually go BACKWARDS up the tree as well.
Edit: At the very least, this doesn't work on OS X 10.5, Ubuntu 10.04 (server edition), and CentOS 5.8. For example:

  $ mkdir -p foo/bar
  $ cd foo/bar
  $ rm -rf ..
  rm: "." and ".." may not be removed


.* will be expanded by the shell to '.', '..', and all local files beginning with '.'

the BSD version of rm will never remove '.' or '..' - see:

https://github.com/freebsd/freebsd/blob/master/bin/rm/rm.c#L...

checkdot is called against all argv:

https://github.com/freebsd/freebsd/blob/master/bin/rm/rm.c#L...

any version of rm that does remove '.' or '..' is not POSIX compliant.

The GNU coreutils version of rm does not contain this check:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...

(edit: it does - thanks, was running from memory and didn't actually check it)


The GNU version of rm does contain this check - it's in remove.c, not rm.c:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...


Well, that happened to me at least once on Linux, ten years or so ago. I don't remember if it goes backwards up the tree or just one level (I think it's the latter), or if the current 'rm' protects against this or not.

Imagine you place yourself in '/tmp' to throw away all those pesky dot-files and dot-directories that applications leave there. You run that command, it matches '..', and your whole system is gone (btw, 'rm -rf .??*' avoids this).


Both are true: It will only go up one level, and current rm has protection that will not let it delete the root.


The shell expands .* to all names beginning with a dot, including the two special entries "." and "..".

See for yourself by running:

  echo .*


'echo .*' and notice how the second entry is .., the parent directory. It won't go up the tree indefinitely, but it will go up one level.


Given my memory of standard .alias and .cshrc files back then, as well as the studio-wide aliases in use, it seems likely that the command someone typed in expanded into...

> /bin/rm -r -f *

Running that command from the top of the directory tree where ToyStory2 lives should delete everything below, which would wipe the show.


Definitely true on Linux with bash - .* will expand to .. as well as all hidden files/folders.



Oh right, seeing this in the morning browse reminded me I was gonna test my offsite backups, but I was in bed with the iPad at the time.

pokes Backblaze and grabs a couple of random files

It works! Yay.


"And then, some months later, Pixar rewrote the film from almost the ground up, and we made ToyStory2 again. That rewritten film was the one you saw in theatres and that you can watch now on BluRay."

http://www.quora.com/Pixar-Animation-Studios/Did-Pixar-accid...


Someone had run rm * on their shared storage, deleting the file system. They were "saved" by someone having a copy of the movie on their PC at home.

The data wasn't gone (only the file system metadata was nuked), but it would have required specialist intervention to restore. Or a hacker with the right tools (assuming such tools exist for the *nix file systems - they certainly do for FAT32 and NTFS).


I don't recall specifically what version of *nix was being run on what hardware at that time, but I do remember talking about the fact that things (like virtual memory) were writing to the drives before we even pulled the plug on the machine.

And, at that point in time, introducing more uncertainty into the restoral process wasn't going to help. There was already too much uncertainty everywhere we were looking.

I'll post about that separately...


"Someone had ran RM * on their shared storage, deleting the file system."

Doesn't this command, like any other delete-command, require top privileges (admin/root) that are unavailable in normal situations?


Presumably the people working on the film had the ability to edit the assets, and thus could delete them.


I'm still under the impression that modifying and completely removing are different things, but perhaps it was a sloppy configuration, or they put too much trust in their employees.


Depends on the filesystem. Traditional unix permissions are simply r, w, x.


In traditional Unix you can delete a file if you have write access to the containing directory. The permission bits of the file don't matter.

On all (?) modern Unix variants, you can set the sticky bit on a directory so that only the file's owner (or the directory's owner, or root) can remove or rename files in it.


If you lack write permission on the directory, you can't delete any files in it, but you can still modify them if you have write permission on the files themselves. It's an unusual configuration, but it's possible with standard Unix permissions.
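A quick demonstration with made-up paths - a writable file inside a directory you can't write to:

  # setup (by the directory's owner): world-writable file, read-only directory
  $ chmod 666 /shared/assets/buzz.model
  $ chmod 555 /shared/assets
  # now, as any user:
  $ echo tweak >> /shared/assets/buzz.model      # succeeds - the file itself is writable
  $ rm /shared/assets/buzz.model                 # fails - no write access to the directory
  rm: cannot remove '/shared/assets/buzz.model': Permission denied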


I would assume this would be standard forensics. There might not be tools for their particular file format, but it would be possible to carve the data.


"Or a hacker with the right tools (assuming such tools exist for the *nix file systems - they certainly do for FAT32 and NTFS)."

*nix file systems purge the file immediately, unless it is still opened by some program. fat32 deletes the entry from the file allocation table, and the data is eventually overwritten as new files fill that same location.


The metadata may be zeroed out, but I'm unaware of standard Unix file systems that will zero out the contents of the files.


Linux filesystems like ext(2|3|4) don't zero the contents of the data blocks, and you can use tools to recover the data.
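A rough outline of what a recovery attempt looks like (the device name is invented, and tool availability and flags vary - treat this as a sketch, not a recipe):

  $ umount /dev/sdb1                            # stop any further writes immediately
  $ dd if=/dev/sdb1 of=/srv/sdb1.img bs=1M      # work from an image, never the live disk
  $ extundelete /srv/sdb1.img --restore-all     # recovered files land in ./RECOVERED_FILES/

The sooner the filesystem stops being written to, the better the odds that the freed blocks haven't been reused.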


After about 10 years of not doing "rm -rf *" I finally did manage to hose an entire system when I entered the wrong directory (/) for a script I had written to randomly alter files. 'Course it was on a test VM, but it was still pretty amazing to watch what happens when 10% of the files on a system had random data added to them, or were randomly removed. :)


My, how far we've come. Today, many home users have good backups. But a little more than 10 years ago, one of the most important movies of the last few decades was almost lost forever.

I imagine that Pixar now has a lot of backups, and that some of them are offsite. Back in the late 1990s, I don't think offsite backups of a project like this would have been practical.


An interesting dimension of this that other commenters haven't noted is that keeping a home backup of this scope would be so clearly against policy at most corporate technology companies as to be a termination-worthy offense.


Regardless of what happened, I don't think Toy Story 2 really would have been lost. The writers and animators would at least be able to recreate the general idea from what they remembered.


I've lost creative works to bad backups - even if you recreate the work immediately, it's Never The Same!


Yes, the creative process often works in a way that you either act on the idea when it pops into your head, or it never comes to fruition. You can't recreate the moment when you created something.


I once lost two weeks of programming to an errant delete.

I recreated it in about two days, and it was better.

Obviously, artwork != code. But I find a second pass nearly always improves things.


I'm curious to hear more about users' standard backup systems for life/work (other than a version control system for code).

Any thoughts?


Google has my email, Dropbox has my taxes and a few other important documents, wedding photos are also on a family member's computer, and svn / github takes care of my code for work / play.

Nothing else I have is that important. Lots of games and recorded TV shows. Either I already finished it, or I'm probably not going to.

I could lose my desktop or my laptop and only be out time. If both died at once I might lose a few important things, but nothing too life-shattering. In general, I can pick up a brand new computer and be productive by the end of the day.





