Your Personal Archiving Project: Where Do You Start? (loc.gov)
98 points by diodorus on May 13, 2016 | 20 comments



I wonder how encryption, if it indeed becomes as widely used as many of us hope it does, will affect the ability of future historians to read all these archives.

All of my computers and backups are encrypted. If I were to drop dead tomorrow or suffer some kind of memory loss, 99% of the documents and pictures I own would turn into meaningless blobs, assuming the cryptography in the products I use is any good. I have no intention of printing out the keys and giving them to someone while I'm still young and healthy. Part of the reason is that I believe those files should die with me. There's nothing incriminating, nor anything particularly embarrassing, it's just nobody else's effin' business. That's what privacy means to me.

And yet, it is the business of historians and librarians to poke into dead people's private documents and learn from them. In an ideal future society, maybe it will be historians, not spooks, who try to break commonly used encryption methods.


I think this is going to be a big problem, even without the specific hard-failure mode of encryption — people are generating tons and tons of data and most of it goes in the same place and is frequently hard to manage.

A generation ago, when someone died their next of kin were probably going to be reasonably comfortable donating the contents of their office to the local university — you could quickly scan to see if there was anything truly bizarre hidden in the closet, most people didn't take that many photographs (especially NSFW ones, what with meddling clerks) and you could quickly look over the ones they did have to make sure there wasn't anything you wanted, etc.

Now, though, almost everyone carries around high-quality cameras and generates gigabytes a year of geo-located pictures. Many casual conversations which would have been lost forever are now recorded. Some of it might be easy to exclude (“Scholars will just have to live without the Chief Justice's online dating history”) but it's hard to imagine being able to quickly go through a mixed service like Facebook or a cloud photo backup service without some potentially embarrassing oversights.

I'd bet the most likely outcome of this would be scholars having to wait for increasingly long embargo periods to expire to avoid the odds of data-mining or computer vision finding unwanted links to people who are still alive.

That's not to say that this didn't happen in the past – a good example was the time that Justice Thurgood Marshall donated his papers to the Library of Congress under terms allowing access after his death, leading to a minor firestorm when a clever Washington Post reporter used them for insights about other people who were still serving on the Supreme Court:

http://www.nytimes.com/1993/05/26/us/chief-justice-assails-l...


If your data dies with you, the rest of us lose whatever insight your data might have given us.

Suggestion: leave decryption instructions with someone or somewhere you trust (an attorney, a loved one, a safe deposit box, etc.) so we don't miss your take on life. Someone (you can't know who) may benefit and spend a moment thanking you.


Shamir's Secret Sharing[1] seems like it would be a good mechanism for splitting up decryption keys: you can give a share of the key to each of several people, and require that a certain number of them put their shares together to recover the key and decrypt the information.

[1]: https://en.m.wikipedia.org/wiki/Shamir%27s_Secret_Sharing
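
For anyone curious what that looks like in practice, here is a minimal sketch of the scheme in Python, assuming the key is handled as an integer smaller than a fixed prime. The names `split_secret`, `recover_secret` and `PRIME` are my own, not from any library, and this is an illustration rather than something to trust with real keys (use a reviewed implementation for that). It needs Python 3.8+ for the modular inverse via `pow`.

```python
import secrets

# Assumption: secrets are integers below this prime (2**521 - 1, a Mersenne prime).
# All polynomial arithmetic is done modulo PRIME.
PRIME = 2**521 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluating the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse of den (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbits(256)          # stand-in for a real decryption key
    shares = split_secret(key, threshold=3, num_shares=5)
    assert recover_secret(shares[:3]) == key
    assert recover_secret([shares[0], shares[2], shares[4]]) == key
```

With `threshold=3` and `num_shares=5` you could hand one share each to five people, and any three of them could rebuild the key together, while one or two shares on their own reveal nothing about it.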


Shamir's approach is just one option for secret sharing, so you are being overly specific.

The Wikipedia page for secret sharing links to https://en.m.wikipedia.org/wiki/Tontine

Apparently the word has been used to refer to secret sharing. It hints at the problem with using this strategy to preserve access to encrypted data: all the sharing parties will die eventually.


The most difficult issue I've encountered is identifying a format to use. Here are some specifications, which I think are widely applicable:

* Readable 100 years from now. If that sounds crazy, consider whether your family currently has photos 75-100 years old. Ours does, and I don't think it's unusual. Come to think of it, the factor that determines the age of a family's archive is possibly how long they have had access to cameras.

* Maximum fidelity: Complete and correct, over time. This involves resolution, color gamut and some sort of color correction (I'm not quite sure; I'm no color expert), and error correction for the ravages of time.

* Metadata: Stored in the same file as the image (otherwise the relationship between image file and metadata file will almost certainly be lost in 100 years), readable 100 years from now (UTF-8 seems the obvious choice), machine-readable and parsable for building databases, user-defined fields, and editable (to add and update information after the image is created).

* End-user control of data: That online service you're using might not be around in 100 years, or even in 1.

Maybe I'm missing one or two specs here, but does anyone know the recommended solutions? What do professional archivists use?


PDF/A and XML-based formats, in theory. And in general, use open source software. In practice, use the ODF format, because I support that and it does what you want. And don't use PDF/A, because too many implementations pretend to produce PDF/A when they don't.


> First, approach your collection as a single unit of stuff. Don’t dwell on individual photos or letters yet. Think about the entire collection as a mass of related things. Kells said, “You’ll scare yourself if you think, ‘I have two hundred things.’ The project will seem bigger.” It is one collection.

This is good general advice for solving any problem, from archiving a lifetime of curios to building a complicated software project.


At one of my first jobs out of college, I was stuck on a bug for several days and I was very obviously in panic mode with a deadline looming. An older engineer agreed to help me if I met him at his office at 7am. Up bright and early, he greeted me at his office door and immediately asked me to demo the bug. After seeing that I had a grasp of the issue, he didn't try to troubleshoot. He just smiled and said, "You'll figure it out. Just ask yourself what is one thing you could do right now to get closer to solving the problem. That's what I always do and something always comes to me." With that he stood up to indicate the conversation was over and escorted me out of his office. I'm 35 now and it hasn't failed me yet.


Great story and great tip. Thank you for sharing.


Reminds me of the saying:

If you think about the work, you will always have work. If you think about the goal, you will reach the goal.


I'm pretty much exactly the opposite. A problem seems huge and daunting until I subdivide it into steps which I can complete in under half a day.


If you have hundreds of thousands of files to categorize and put into folders, you will be subdividing forever. It's much easier to put everything in as few piles as possible and rely on searching and other methods. I think that was the intention of that comment.


The point is that I'm not going to subdivide the files, but subdivide the tasks - "I'm going to search for all my personal letters to mum this morning", or "I'm going to look manually through 5,000 email subjects for things that look interesting".


> or worse, “I’ll just get rid of it all.”

No, that's really the best thing. Don't keep a lot of crap. Even sentimental crap. It clutters your life, physically and psychologically.


i agree. it's quite cathartic to think back on all the information i've lost over the years, vague memories of past events that mainly live on through their consequences on my life and personality.

although some things, especially technical knowledge, are great to archive/store for later retrieval. i just don't feel the same for personal things. i think it's healthy to forget, especially in the information age where being able to do so is practically a luxury.


I agree too. Getting rid of almost everything I owned and regularly cleaning out my hard drive and bookmarks is the best way. The less stuff that clings onto you the better.


The problem with sorting physical objects is that it's basically like inserting everything into a database for later retrieval, except you're only allowed to have one index (whatever you choose to base your categories on).

It's often better to scan things like photos and documents and tag them; that way you don't have to decide ahead of time how you're going to search them.
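
To make the "more than one index" point concrete, here is a rough sketch of a tag index in Python. The `TagIndex` class and the file names are made up for illustration; a real setup would more likely use a photo manager or a small database.

```python
from collections import defaultdict


class TagIndex:
    """Maps each tag to the set of items carrying it, so any tag works as an index."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, item, tags):
        """Register `item` under every tag (tags are case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(item)

    def find(self, *tags):
        """Return the items that carry all of the given tags."""
        if not tags:
            return set()
        # .get avoids creating empty entries for tags that were never used.
        matches = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)


if __name__ == "__main__":
    index = TagIndex()
    # Hypothetical scanned files and tags.
    index.add("scan_001.jpg", ["family", "1985", "beach"])
    index.add("scan_002.jpg", ["family", "1992", "wedding"])
    index.add("letter_017.pdf", ["mum", "1992"])

    print(sorted(index.find("family")))          # both scans
    print(sorted(index.find("1992")))            # the wedding scan and the letter
    print(sorted(index.find("family", "1992")))  # only the wedding scan
```

The same file shows up under "family", under "1992", or under both, which is exactly what a single physical filing scheme can't do.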


I have this joke I always tell which I'm starting to get more serious about. I'd like to get a medium-size statue of myself made that I can hand down. It's hard to lose and would need to be deliberately destroyed (or lost in a disaster) to stop existing.


Relevant project that's usable now: https://camlistore.org/



