Hacker News

This sounds like there's a whole other story in the aftermath; it would be interesting to hear how they dealt with it (deciding between dropping scenes vs. reproducing them):

> Jacob admits that about 10% of the film’s files were never retrieved and are gone forever, but they were able to make the film work without the scenes.

Also, this seems to be second-hand reporting; the linked article has a little more detail (https://thenextweb.com/news/how-pixars-toy-story-2-was-delet...)

It also has the real beef of the story:

> Pixar, at the time, did not continuously test their backups. This is where the trouble started, because the backups were stored on a tape drive and as the files hit 4 gigabytes in size, the maximum size of the file was met. The error log, which would have told the system administrators about the full drive, was also located on the full volume, and was zero bytes in size.
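The failure mode in that quote, monitoring data stored on the same full volume it was supposed to watch, can be sketched with a trivial health check. The paths below are hypothetical, not anything from Pixar's setup; the point is that a zero-byte log on a full volume looks identical to "no errors":

```python
import os

# Hypothetical paths: the backup artifact and its error log.
# Putting the log on the same volume as the backup was the mistake:
# when the volume filled, the log itself could not record the failure.
BACKUP_PATH = "/backups/film.tar"
ERROR_LOG = "/backups/backup-errors.log"

def backup_looks_healthy(backup_path: str, log_path: str) -> bool:
    """Treat silence as suspicious: a zero-byte backup is an obvious
    failure, and a zero-byte error log may mean the volume filled up
    before anything could be written, not that nothing went wrong."""
    if not os.path.exists(backup_path) or os.path.getsize(backup_path) == 0:
        return False
    if os.path.exists(log_path) and os.path.getsize(log_path) == 0:
        # Ambiguous on its own, so flag it for a human to check.
        return False
    return True
```

The real fix is structural: write health signals to a different volume (or host) than the thing being monitored, and alert on the *absence* of a success signal rather than the presence of an error.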



At AWS, we use this story in some of our presentations to quickly illustrate the importance of backups and of testing them. It also highlights that DR scenarios often arise from simple mistakes such as an errant "rm -rf", not the overused "hurricane" or other global-catastrophe examples. A good, cautionary tale.


The linked article is much better. This event happened right about the time I left Pixar and I heard about it long ago. In the end, much of what was recovered from Galyn's workstation wasn't used. Some time after this event, it was decided that the film was not working and a huge rewrite was done. John Lasseter stepped in as director and, through Herculean efforts, the film was completed on time.


Modern development practices — idioms, frameworks, libraries, traditions — make it difficult to test your backups. The priority on responsiveness means that you have to work extra hard when architecting the system to ensure that you can restore from backups without losing recent changes — which makes it impractical to confirm that your backups actually work.

Offering Continuous Restoration as a feature should theoretically be a way of differentiating a development tool in the software marketplace. But short-termers hold power in so many organizations that it's not clear to me whether it would actually win. Even if ransomware kills the company, executives still keep their money.


That link provides more info about the story; I remember reading about it years ago. Also, after they restored most of the deleted files, most of the movie was scraped and redone.


Scraped or scrapped? Can you explain more?


scrapped - after we got this version of the film back, we decided to rewrite the story. that meant that we substantially rebuilt the film in the next year, even after we had recovered the version from Galyn's machine.


Many years ago I was responsible for backups in a mixed Unix/Windows system. I asked for funds and time to do a disaster recovery exercise in case a server system disk failed but was turned down flat. Luckily we never had to restore anything more than a few individual files.

It's not enough to confirm that the backup has actually been created; you also need the procedures necessary to do the restore, and the personnel and hardware to carry it out.
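A minimal restore drill can be automated. This is only a sketch (the tar format and the helper names are my own assumptions, not anything from the story): back the data up, restore it into a scratch directory, and compare checksums file by file, because a backup that was merely created proves nothing.

```python
import hashlib
import os
import tarfile
import tempfile

def dir_checksums(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    sums = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                sums[rel] = hashlib.sha256(f.read()).hexdigest()
    return sums

def verify_backup(source_dir: str, archive_path: str) -> bool:
    """Restore the archive into a throwaway directory and confirm that
    every file matches the source byte-for-byte."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path) as tar:
            tar.extractall(scratch)
        return dir_checksums(source_dir) == dir_checksums(scratch)
```

Run on a schedule, a check like this turns "we have backups" into "we have restores", which is the property the grandparent comments are actually asking for. A full exercise would also restore onto spare hardware, since the drill only proves the data, not the procedure.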



