Archivematica: Open-source digital preservation system (archivematica.org)
141 points by sanqui on Oct 2, 2021 | 14 comments



Others out there, such as:

    - DataVerse: https://dataverse.org
    - Omeka: https://omeka.org/s/
    - also many internally developed archival systems
For us it mostly comes down to the granularity of the metadata, and whether metadata fields support e.g. ISO standards for validation (which is why it's difficult for a single metadata standard to rule them all, unless all you need is free-text descriptions). Another need is a way to batch ingest.

    > Compatible with hundreds of formats
I'm becoming increasingly critical of file format verification as a gatekeeper for digital preservation, but perhaps someone can explain why I'm wrong. Striving towards open specifications is of course a must for long-term preservation (cave paintings endure, while digital data is fragile in so many ways), but if conversion incurs data loss, parallel archiving of the original should be an option. Worse, we've had issues with old systems that refuse to archive data altogether because an automated format checker stands in the way (we'll use FITS [0] in the future, but that just uses a bunch of other type checkers under the hood and doesn't seem bombproof from the little testing I've done). Formats such as MP4 also seem like a nightmare to validate (granted, lots of cameras out there ignore the specifications). But archiving nothing at all because of a few proprietary formats every now and then is a horrible outcome (see the experience above). At the very least, it must be possible to override automated format checking when necessary.

    [0]: https://projects.iq.harvard.edu/fits
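
To make the "override" idea concrete, here is a minimal sketch of an ingest check where format identification is advisory rather than a hard gate. Everything in it is an assumption for illustration: the tiny `SIGNATURES` table, the `IngestDecision` record, and the `decide_ingest` helper are hypothetical and are not how FITS or Archivematica work internally.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical magic-byte table for illustration only; real identification
# tools rely on much larger signature databases.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


@dataclass
class IngestDecision:
    accepted: bool
    detected_format: Optional[str]
    note: str


def identify(path: Path) -> Optional[str]:
    """Return a MIME type if the first bytes match a known signature."""
    with path.open("rb") as fh:
        head = fh.read(16)
    for magic, mime in SIGNATURES.items():
        if head.startswith(magic):
            return mime
    return None


def decide_ingest(path: Path, override: bool = False) -> IngestDecision:
    """Accept identified files; accept unidentified ones only on override."""
    fmt = identify(path)
    if fmt is not None:
        return IngestDecision(True, fmt, "format identified")
    if override:
        # The file is still archived; the failed check is recorded, not hidden.
        return IngestDecision(True, None, "unidentified format; accepted by manual override")
    return IngestDecision(False, None, "unidentified format; needs review or override")
```

The point of the design is that a failed check becomes a note in the record instead of a reason to archive nothing.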


You'll want to be sure to test FITS thoroughly for your use case. It really needs some love: it relies on some outdated tools and has a number of bugs. I submitted a bunch of PRs to the project this summer, but maintaining FITS is currently not a priority at Harvard, though I was assured that it's not abandoned.

As a digital preservation aside, folks may be interested in OCFL[1], a preservation-focused storage layout specification that had its 1.0 release last year.

[1] https://ocfl.io/
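
For a rough feel of what OCFL describes, here is a simplified sketch that builds the kind of `inventory.json` structure an OCFL object carries: a manifest mapping content digests to stored paths, plus a per-version state mapping digests to logical file names. The field names follow my reading of the 1.0 spec; the `build_inventory` helper and the example object ID are hypothetical, and this is not a validated or complete implementation (it only handles a single version and writes nothing to disk).

```python
import hashlib
import json
from datetime import datetime, timezone


def build_inventory(object_id, files):
    """Build a single-version inventory dict.

    `files` maps logical path -> bytes. Identical content is stored once
    in the manifest but may appear under several logical paths in the state.
    """
    manifest = {}
    state = {}
    for logical_path, data in files.items():
        digest = hashlib.sha512(data).hexdigest()
        # Store each distinct piece of content only once.
        if digest not in manifest:
            manifest[digest] = [f"v1/content/{logical_path}"]
        state.setdefault(digest, []).append(logical_path)
    return {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(),
                "state": state,
            }
        },
    }


if __name__ == "__main__":
    inv = build_inventory(
        "urn:example:object-1",  # hypothetical identifier
        {"report.txt": b"hello", "copy/report.txt": b"hello"},
    )
    print(json.dumps(inv, indent=2))
```

Because everything is keyed by digest, fixity checks and deduplication fall out of the layout itself rather than needing a separate database.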


Ouch, thanks for the heads up. :/ We'll be stuck with FITS this time around, though.

Also, someone should put OCFL on the front page, while HN is in an archiving mood.


I now use https://archivebox.io/ and am preparing to back up my pinboard.in to it.


See also ESSArch: https://github.com/ESSolutions/ESSArch (used by the national archives of Sweden and Norway, and participating in the EU digital preservation building block).


Are there any open-source digital preservation systems for 3D and animated 3D data such as .blend (Blender) or .gltf (glTF 2.0)?

Bonus points for non-AGPL so I can connect a proprietary system over the internet to it and have the archived data be free and libre.


https://archivebox.io/ is MIT licensed. Not sure if they support that specific case but maybe send them a pull request if they do not?



I do not understand your concern about AGPL. You seem to believe that connecting a proprietary system to an AGPL service is prohibited. It is not. On top of the usual GPL terms, AGPL only adds that the source code must be made available to remote users, even though they do not actually receive the software. If you do not plan to modify the AGPL software, or if you do not plan to hide those modifications from its users, then I can't see what the issue is.


I've seen experiments built on top of https://3dhop.net for viewing (if that's the issue), but have no experience myself. Don't know about animation, though.


Can it serve up archives as torrents automatically published to an RSS feed?


I think that's contrary to this field's worldview. They seem more like librarians: more or less the end of the data lifecycle. (Unless it's the kind of data that gets retrieved, as if from a library, and used for further analysis.)


Does it do versioning of documents?


It seems so, unless I'm misunderstanding the documents. From an old, random readme for a workshop, then the official wiki:

    - https://github.com/mjordan/archivematicaworkshop#archivematica-and-aip-migration
    - https://wiki.archivematica.org/AIP_re-ingest
Perhaps someone with better knowledge than I could tell whether this implies that each version gets its own PID (crucial for publications etc).



