Show HN: bef – a tool that encodes/decodes interleaved erasure coded streams (github.com/gbletr42)
59 points by gbletr42 10 months ago | 44 comments



Hello Hacker News! I've been developing this piece of software for about a week now, to serve as a fast and easy-to-use replacement for par2cmdline and zfec. Now that it is in a good and presentable state, I'm releasing it to the world to get users, feedback, testing on architectures that aren't x86[-64], etc. If you have any feedback, questions, or find any bugs/problems, do let me know.


You should at least be benchmarking against par2cmdline-turbo instead (stock par2cmdline isn't exactly performance-oriented). Also, you need to list the parameters used as they significantly impact performance of PAR2.

Your benchmark also doesn't list the redundancy %, or how resilient each tool is against corruption.

One thing I note is that both ISA-L and zfec use GF8, whilst PAR2 uses GF16. The latter is around twice as slow to compute, but allows for significantly more blocks/shards.


Got it, I'll add par2cmdline-turbo (I didn't know it existed) to the list along with those key details. I'll also go and describe the benefits and downsides of each tool in more detail. I'll get around to it when I release the next version, coming soon, which fixes some of the problems described in this thread.


Thanks for doing that.

> par2cmdline[-turbo] encode: par2 c -r25 test

That command is rather unfair to PAR2 - you should add `-b48 -n2 -u` to make the comparison fairer.

PAR2 ain't exactly fast, particularly compared to GF8 focused formats, but the numbers you initially gave seemed wildly off, so I suspected the comparison wasn't really fair.

Ideally you should also be including the version of each tool used.


I've updated the benchmarks to include those flags. I've also specified the versions of all software involved.

It seems par2 is significantly faster with those options set than without, by an order of magnitude; par2 seems to struggle greatly with the large number of blocks it sets by default. Thank you for telling me.


Nice work - thanks for your efforts!

Yeah, the compute complexity for Reed Solomon is generally O(size_of_input * number_of_recovery_blocks)

If you don't specify the number of blocks, par2cmdline defaults to 2000, so at 25%, it's generating 500 parity blocks, which is obviously much slower than what you're generating with the other tools.

Having said that, PAR2 is generally aimed at use cases with hundreds/thousands of parity blocks, so it's going to be at a disadvantage if you're generating fewer than 10 - which your benchmark shows.


I'd recommend explaining even a tiny bit what erasure coding is. I had to look it up as I didn't know the term. It's really cool; explain it yourself and why you're excited about it!


Sure, erasure coding is a form of error correcting code that can be applied to data such that you can lose up to some number n of symbols and still successfully reconstruct the input data. For example, take k input symbols and put them through an erasure code algorithm to get k+n symbols, where any n of the output symbols can be lost and the data can still be reconstructed. Symbols in this case can be some number of bits/bytes.
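
To make that concrete, here's a minimal sketch (in C, and deliberately far simpler than what bef or Reed Solomon actually do) of the smallest possible erasure code: one XOR parity symbol over k data symbols, giving k+1 output symbols where any single one can be lost and rebuilt from the survivors.

#include <stdio.h>

int main(void)
{
    /* k = 4 data symbols (here, single bytes) */
    unsigned char data[4] = { 'b', 'e', 'f', '!' };

    /* one extra parity symbol: the XOR of all the data symbols */
    unsigned char parity = 0;
    for (int i = 0; i < 4; i++)
        parity ^= data[i];

    /* pretend symbol 2 was erased; XOR the survivors together to rebuild it */
    unsigned char recovered = parity;
    for (int i = 0; i < 4; i++)
        if (i != 2)
            recovered ^= data[i];

    printf("lost '%c', recovered '%c'\n", data[2], recovered);
    return 0;
}

Reed Solomon generalizes this so you can add n parity symbols and survive the loss of any n of the k+n total.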

This is a really important property in situations where there can be big bursts of errors, because you can still reconstruct the data regardless. IIRC, CDs/DVDs/BDs all use two concatenated Reed Solomon (a type of erasure coding) codes whose symbols are interleaved with each other, which gives the disc protection against things like accidental scratches.


Nice! Add that to the README ;)


Done! Thank you; it never occurred to me that someone might stumble upon my software without already knowing what erasure coding is (a dumb lapse of mind, I know).


I am genuinely not trying to be a vulgar dingus but in my native language (Dutch) this is the verb for orally pleasuring women.

I don’t know how that figures into your decision to name it this but at least now you’re aware.


Somebody could compile the (probably extremely short, if not nil) list of pronounceable sequences of phonemes that are not vulgar, sexist, demeaning, insulting, "hurtful", or otherwise objectionable in any language.


If someone did that and then domain-squatted all of them, the rest of us would be forced to choose hilarious names for our software.


Checking urban dictionary is a good start.

In this case, it’s not a great source. There are two low-ranking Dutch definitions, but they’re very direct or vulgar.


In any language? I suspect that approaches the null set.

...And then copyright them all. Muhahaha.


I was contemplating adding "to at least one person" at the end of that sentence, but then it definitely would be the null set.


I was not, as I am an English speaker with no knowledge of Dutch. I find it a funny coincidence, but if I change the name bef now it'll mess with my muscle memory.


Fair enough, guess it’s not that big a deal. Cheers for making it though!


Downvote me for this, but given the context.. muscle memory?


haha! Everything in the source code is prefixed with bef_ to avoid any possible symbol collisions when linking, so it'd be a pain to unlearn prefixing it.


Do you happen to know its etymology?


Your guess is as good as mine. A quick search proved inconclusive; there are a few explanations, but they seem anecdotal at best. Most point to old uses of the word bef the same way we use “muts” (knitted cap) as a metaphor for female genitalia today.


Super neat!! I currently take encrypted zbackup files, par2 split them, and then seqbox and burn them to M-discs.

What is a better sequence of steps in your opinion?

https://github.com/MarcoPon/SeqBox


I'm not familiar with zbackup, but from a google search it appears to be a tool to deduplicate and encrypt data. The process I envisioned while making this was to use a series of pipes, unix style, to make the backup, e.g.

tar c dir | zstd | gpg -e | bef -c -o backup.tar.zst.gpg.bef

and then to get back that file with the terribly long filename

bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x


Makes sense and is very neat. I can likely replace the par2 splits with a single file using your tool.

Since your head is in the thick of this problem, I’d recommend you look at seqbox and consider implementing sbx headers and blocks as an optional container that would give you resilience to filesystem corruption. That way your tool would be an all-in-one bitrot safeguard that is streaming/pipe based!

Regarding zbackup, it’s perhaps a bit obscure, but it’s an extremely useful tool for managing data. The way I use it, I’m able to get both dedup and lazy incremental backups, although with a computational cost, albeit not a significant one. The encryption is a nice side effect of its implementation that is also handy.


Small error in your restore command (given your creation command)

> bef -d -i backup.tar.zst.gpg.bef | unzstd | gpg -d | tar x

Should probably be

> bef -d -i backup.tar.zst.gpg.bef | gpg -d | unzstd | tar x


This is really nice work, great job.

You mention how the parameters are all customizable but I want to ask almost the opposite: is there a recommended set of defaults for a given situation that the user can apply, so they don't have to be experts to figure out usage?

e.g. a recommended option for "sharing over the internet" vs "burning to a dvd" vs "writing to tape"

(I'm aware that these have their own redundancies/error control, but obviously I do not consider them sufficient.)


Currently there is just one set of defaults, but I could very well add multiple different defaults dedicated to certain use cases such as the ones you have. I imagine it'd be something like this:

bef -c --default share -i input -o output

bef -c --default dvd -i input -o output

bef -c --default tape -i input -o output, etc.

It seems like a good idea and wouldn't exactly be hard to implement.


This is pretty cool and I appreciate the comparison to par2. I have a (personal) backup workflow using par2 now, and this looks like an interesting replacement.

The dependency for doing erasure codes is itself pretty interesting[1]. It has a number of backends. I've used one of those, ISA-L, in the past at a major storage vendor for Reed Solomon parity blocks.

[1]: https://github.com/openstack/liberasurecode


Hey this is very cool! And something I've looked for multiple times before


This sounds similar to fountain codes https://en.m.wikipedia.org/wiki/Fountain_code


Fountain codes are a class of erasure code, but not a specific tool.


Is this a specific tool that implements fountain codes then?


Um, no. From reading the source, I believe this tool only uses Reed Solomon, which is not a fountain code. The actual erasure code implementation(s) live in external libraries.

Fountain codes have historically been patent encumbered (invented in the early 2000s) in a way that Reed Solomon (1960s) is not. So most open source software does not use them. We might see more use going forward with those patents expiring.


That's a really cool program. Is it possible to do it the opposite way - recover data with occasional noise inserted between symbols?


Sadly my tool currently doesn't account for that type of corruption, as it doesn't know what is good data or bad data when reading. So if bad data is inserted between symbols/fragments, rather than corrupting the symbols/fragments themselves, the tool would read them naively and exit with an error when the hashes don't add up. I'm sure there's a clever way of defending against that, but as of this moment I'm not entirely certain how best to do so.


It seems like it would be an important capability, if a whole chunk of tape is corrupted, say, but you don't know how long the chunk was.

It wasn't immediately clear from the description -- can you tune _where_ the parity blocks are stored, to make sure that a corruption doesn't knock out the parity too? I guess this would be the interleave distance, is that the term?


The parity fragments are by default just stored at the end, interleaved like every other fragment. It doesn't actually matter whether a parity or a data fragment gets corrupted, as long as at least some determinate number of them are left uncorrupted.

How many blocks (and their respective fragments) are interleaved can be controlled by the -l argument in the command. I'll paste the info I have in the manpage here.

>The number of blocks (specifically their fragments) to interleave. The default number is 3, as it provides protection for a bad burst that corrupts both the block in front of and behind it.

In general, you can approximate the burst size ratio needed to destroy a set of interleaved blocks beyond repair via this equation (this may not be fully accurate, as it's napkin math):

let n = number of blocks to interleave, B = number of bytes per fragment, m = number of parity fragments, k = number of required fragments to rebuild a block.

((n - 1) * m * B + n - 1) / (n * (k + m) * B)

This is because a given corruption could traverse a whole fragment and then leak into another, taking some other fragments with it in the most unlucky of bursts. When you remove the B and take the limit as n goes to infinity, it approaches the ratio of m/(k+m), or in other words as you interleave more and more blocks, you get a maximal burst size closer and closer to the size of your total number of parity fragments.
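
As a rough worked example, here's a tiny C snippet plugging what I believe the defaults are into that equation (more napkin math; the exact fragment counts here are ballpark, so don't take them as gospel):

#include <stdio.h>

int main(void)
{
    /* ballpark defaults (don't quote me on the exact fragment counts):
       3 interleaved blocks, 4096-byte fragments, 15 data + 1 parity
       fragments per block */
    double n = 3, B = 4096, m = 1, k = 15;

    double set   = n * (k + m) * B;                 /* bytes in one interleaved set */
    double ratio = ((n - 1) * m * B + n - 1) / set; /* maximal burst ratio          */

    printf("burst ratio ~ %.4f, ~%.0f bytes per %.0f-byte set\n",
           ratio, ratio * set, set);
    return 0;
}

That prints a ratio of about 0.0417, i.e. roughly 8K of burst per 192K interleaved set, and the limit m/(k+m) works out to 1/16 as the interleave grows.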

Also, the header specifies exactly the size of each fragment; we don't depend on the fragments themselves to tell us their size. In fact, the only metadata a fragment has is its hash and the number of padded bytes for the whole interleaved set of fragments; the format is (almost, ignoring padded bytes) completely described by the header as a design principle. That's why a whole fragment can be turned to garbage and we don't care, just skipping ahead to the next fragment.

edit: thinking on it, I could add a mode of behavior such that it recovers as much as possible if a specific block is corrupted, rather than the situation right now where it quits streaming after it finds a corrupted block.


Thank you, that's helpful!

> in other words as you interleave more and more blocks, you get a maximal burst size closer and closer to the size of your total number of parity fragments.

I think this could be called out more clearly, as my default assumption was this limit case -- the file grows by X amount, I can recover from up to X amount of corruption. It sounds like that's only the case if the interleave is set to maximum, which is not the default.

Is that because, as you increase the interleave, it increases the amount of the stream you need to read in order to repair any corruption?


>Is that because, as you increase the interleave, it increases the amount of the stream you need to read in order to repair any corruption?

Yes: you need to read the entire set of interleaved blocks to get each respective block, so to keep memory consumption low we don't want to interleave too many blocks. I could increase the default to something higher though.

Regarding the tape problem, I was being a bit daft in my response; if you lose a chunk of tape, you also lose some indeterminate amount of data with it, which my format currently isn't capable of recovering from. I'll try to fix that to some degree in the next version, at the cost of not guaranteeing backwards compatibility, as I feel it falls under that major problem comment I made in the v0.1 release.


how do i build it?

edit. ok 10 minutes later i have persuaded automake/conf/etc to create a makefile. now xxhash won't compile because src/bef.c:278:2: error: unknown type name ‘XXH128_hash_t’; did you mean ‘XXH32_hash_t’?

edit ok many more minutes later i purged ubuntu xxhash and installed my own copy and re-negotiated with automake/conf/etc

edit lol downvoted for asking how to build.

edit ok now that it's built, i haven't the foggiest how to use it. no example or hello world is given in readme.

edit nevermind figured it out. ./bef -c -i bef -o bef.test

edit so i still don't understand it. i bef'ed the Makefile, removed a character, tried to 'deconstruct' it, the output is zero bytes


> i bef'ed the Makefile, removed a character, tried to 'deconstruct' it, the output is zero bytes

I can't reproduce this. These are the commands I used, with it freshly compiled on a ubuntu docker container, both the v0.1 release and the master tree.

./bef -c -i Makefile -o Makefile.bef

dd if=/dev/zero of=Makefile.bef bs=1 oseek=300 count=1

./bef -d -i Makefile.bef -o Makefile2

cmp Makefile Makefile2 || echo "failed!"

edit: oh, I see, you 'removed a character'. Depending on what character you removed or corrupted from the output, you could've either hit the issue described above with inserting noise (but this time removing information), or corrupted the header by damaging the magic number, hash type, or the hash itself. The command line utility automatically truncates the output before calling the deconstruct function. The header is sadly the biggest single point of failure in the tool/format, which is why I introduced the --raw flag for those who don't want it.


ok thanks, now i get it. file has to be same length as original. This worked for me (my dd doesn't have oseek)

dd if=/dev/urandom of=x bs=1M count=2

./bef -c -i x -o x.bef

dd if=/dev/zero of=x.bef bs=1 seek=20000 count=100 conv=notrunc

./bef -d -i x.bef -o x2

sha256sum x x2

i also tried count of 1000 and even 10000 and it still worked!!!! pretty awesome


>and even 10000 and it still worked!!!! pretty awesome

Mind you, bursts greater than a certain size, around 8000 bytes for the defaults, are a game of probability in what damage they will do. In most cases it'll work like it did for you, but at some unlucky offsets a 10000 byte corruption with default settings could do something like this:

1 byte corrupts fragment x of block n, 4096 bytes decimate fragment x of block n+1, 4096 bytes decimate fragment x of block n+2, 1807 bytes corrupts fragment x+1 of block n.

The defaults only have 1 parity fragment, so the fact two fragments were corrupted for block n means it can't be faithfully reconstructed. In the average case you should be fine, but it is something I should probably communicate more effectively in the README (I only hint at it with the 8K per 192K comment).



