Hacker News new | past | comments | ask | show | jobs | submit login

On a related note, if any of you are designing a file format, please place metadata at the end, so multiple copies of the same file with different metadata can be easily deduplicated. If the metadata is variable length and at the beginning, the block boundaries get shifted. In order to deal with shifted files, one ends up using a pseudorandom function of block contents to decide where the dedup block boundaries are (rather than some multiple of the filesystem block) in order to be able to re-sync the dedup block boundaries between the two files beginning with different length metadata.

For that matter, if you're writing a library to handle an existing file format that has a flexible location for metadata, please put any metadata (especially user-generated metadata) as far back in the file as possible.

Of course, the main advantage is that you don't have to shift all of the data around if the user starts editing file metadata, but it does also help block-level deduplication.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: