Hacker News new | past | comments | ask | show | jobs | submit login
Horcrux: Split your file into encrypted fragments (github.com/jesseduffield)
497 points by rgbrgb on Sept 16, 2023 | hide | past | favorite | 135 comments



Caution, this tool uses AES in OFB mode[0] to encrypt/decrypt the file, without any guarantee of the ciphertext integrity(no MAC).

[0]: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation...


That wiki went over my (fairly) technical head. Care to ELI5?


My attempt of explanation with a (hopefully) relatable example: imagine that you store all your backup files encrypted, then restore the backup on your machine. The backup is stored on untrusted NAS but it's OK since it's encrypted, right?

With OFB (and CTR, and any other unauthenticated cipher basically), no. Ciphers guarantee confidentiality, but nothing else. This means that the attacker can alter one of your encrypted backed-up files. It's not a hypothetical attack. For example your backed up /bin/ls file likely starts with something like (hexencoded):

    7f454c4602010100000000000000000002003e000100 .ELF..............>...
Let's assume attacker wants to trick you into running "echo hacked" on your system. They may achieve this by altering your backed up /bin/ls such that it decrypts to something like this:

    23212f62696e2f73680a6563686f206861636b65640a  #!/bin/sh\necho hacked\n
And it's really easy! Stream ciphers work by XORing input plaintext with keystream:

    ciphertext = plaintext ^ generate_keystream(key)
    plaintext = ciphertext ^ generate_keystream(key)
If you look closely at the equations, it's apparent that one can flip a byte in decrypted plaintext by flipping a byte in the ciphertext. To drive the point home, a small demonstration in Python (I'll use CTR instead of OFB but it's the same):

    >>> from Crypto.Cipher import AES
    >>> def xor(a, b): return bytes([ac ^ bc for ac, bc in zip(a, b)])
    >>> aes_enc = AES.new(b"mysupersecretkey", AES.MODE_CTR)
    >>> plaintext = open("/bin/ls", "rb").read(32)
    >>> ciphertext = aes_enc.encrypt(plaintext)
    >>> known_plaintext = bytes.fromhex("7f454c4602010100000000000000000002003e000100")
    >>> wanted_plaintext =bytes.fromhex("23212f62696e2f73680a6563686f206861636b65640a")
    >>> flip = xor(known_plaintext, wanted_plaintext)
    >>> flipped_ciphertext = xor(flip, ciphertext)
    >>> aes_dec = AES.new(b"mysupersecretkey", AES.MODE_CTR, nonce=aes_enc.nonce)
    >>> aes_dec.decrypt(flipped_ciphertext)
    b'#!/bin/sh\necho hacked\n\\d\xc3\xd9\*o.sh\n'
In this example attacker can change the first 22 bytes of file to arbitrary payload by abusing the predictable header of the ELF file. No knowledge of the key is necessary.


Thank you!


Modes like OFB effectively produce a keystream(not unlike a stream cipher) which is then XOR'd with the plaintext to produce the ciphertext.

To decrypt, the same keystream is XOR'd with the ciphertext to produce the plaintext.

If there is no integrity on the ciphertext, you can simply start flipping bits in the ciphertext, and arbitrarily change what the resulting XOR with the keystream will be.


My blind stab at it is this encryption always produces an output that's not validated to be what was originally encrypted and it does so in a way you could even intentionally damage a specific portion of the data without needing to know how to decrypt it.

Curious to see how wrong I am :).


That's right, there's no guarantee the data hasn't been tampered with after encryption. The mechanics of the tampering you could do depend on the cipher mode you use.

To give a simplified example (which doesn't match what this program does but is useful to demonstrate), ECB is the simplest mode (which really shouldn't be used for anything). Your input is split into fixed-length blocks (16 bytes for AES) and each block is encrypted separately, producing a deterministic ciphertext for each block. (e.g. a block of all "A" will always encrypt to the same thing).

So if an attacker is able to figure out what plaintext a block of encrypted data corresponds to, they could use that knowledge to build a "fake" encrypted message. They could also remove blocks from a message, or shuffle them around.

If you're interested in playing around more practically with this kind of thing, I highly recommend the https://cryptopals.com/ challenge sets.


I just finished the first 11 challenges, including detecting ECB ciphers.

Thank you for sharing these! They are very fun and interesting challenges.


It’s this.

A cipher with a MAC or authentication (like an AEAD) will return “TRUE” and the data that was definitely decrypted properly. A normal cipher just returns data and it has no idea if it was decrypted correctly, so if you decrypt ABC and it should be XYZ, the wrong key will still “work” but the data will be X@9 which means nothing to you.

Normally, you would validate your data after encryption. Like using something “Here is my result, and it has to start with the ascii characters of a valid date.” An authenticated cipher allows you to not care or have to know anything about the data at all, they’re nice. With the downside that you also need to supply the MAC or TAG or SIG along with the data package.


There's a cool paper-based backup tool that also uses Shamir Secret Sharing to let you distribute a bunch of paper copies to your friends to restore a file optically:

https://github.com/cyphar/paperback


I really need to set aside some time to work on some kind of usable interface, graphical stuff has never been my strong suit. At the moment, all of the features work but you need to copy-paste the contents of the QR codes into the command-line. And I need to freeze the on-disk (on-paper?) format.


Is this different or do they do the exact same thing?

https://github.com/paritytech/banana_split


Paperback predates that tool (though to be fair, it still has to have a nice GUI interface made for it, and to be honest I have been working on it off more than on). Also, I suspect they don't support reconstructing lost shards or adding new shards after sharing (this is trivial to do from the maths but very few tools seem to have this pretty important feature). I talk about this in the LCA talk I did on paperback[1]. There are also some attacks against SSS that require workarounds and it seems to me they don't have any protections against them (in paperback's case, there are several measures to defend against forged shards -- one of which is that all shards are signed with an Ed25519 key whose private half is in the sharded secret).

That being said, the underlying cryptography is quite old and there aren't too many new ways you can spin it -- the goal of paperback was to make it possible for paper backups to be done in a way that non-technical folks can understand the properties. How well it currently lives up to that goal is a different question, of course.

I mainly wrote paperback because the existing SSS systems I could find at the time were all either very primitive, had serious security issues (I found two or three serious cryptographic vulnerabilities in the handful of existing tools at the time -- and I'm not a cryptographer by any stretch of the imagination), or were not usable as a paper backups system. There are more around now and it's possible I wouldn't have bothered with paperback if they'd existed back then.

[1]: https://www.youtube.com/watch?v=GI9rKdM9rB8&t=1368


It’s the same but with a framework to share between people rather than just hiding pieces of paper around the world.


This is very neat! Feels a bit too early to rely on it, but could be very cool for lots of people if it was a bit more mature and easy to use.


Does SSS work well for large inputs, or do you need to AEAD the plaintext and then SSS the key?


Paperback uses the AEAD-and-SSS-the-key construction, but that's more to do with the properties such a scheme gives you (it allows more flexibility with how you distribute the secret -- you can keep the main document with a lawyer and distribute the shards so even if the shard holders betray you they don't have access to the original document). You also want to have protections against fake-shard attacks (which allow an attacker with a real shard to withhold the secret from others during recovery and then keep the recovery secret to themselves), and so you need to make use of traditional cryptography anyway by signing shards to detect fakes.

To answer your question though, it primarily depends on the SSS construction and how big the quorum (number of shards needed for reconstruction, not the total number of shards created) is.

Most constructions use Galios Fields and thus chunk the input and produce a new polynomial for each chunk -- the vast majority use GF(2^8) which means its per-byte. Paperback uses GF(2^32) to get 4-byte chunks so that the x-value used is more collision-resistant. I suspect for large documents GF(2^32) will be about 4x faster because Galois Field operations are very fast (which is why most tools use them over prime fields) but using 4-byte chunks reduces the number of operations needed by 4x. I considered going for GF(2^64) to get 8-byte chunks but doing so requires 128-bit integers in some operations. You can also process the chunks in parallel if needed. While very large documents would take longer, the recovery operation is pretty efficient and recovery time should scale linearly. The main scaling issue with SSS is if you want to increase the quorum size -- most of the algorithms are at least quadratic with respect to the quorum size if not worse.

If you implement it the way described in the original paper (using a prime field) then it mostly depends on the efficiency of the bignum library you are using -- though you probably want a bignum library that supports doing operations in a finite field because doing large multiplications and then calculating the modulus afterwards is a lot of wasted work and memory. I suspect Galios Fields are far more efficient at any size.

For paperback's SSS implementation, I've written some preliminary benchmarks and it seems you can easily do secret recoveries at a rate of at least 75 KiB/s depending on the quorum size (32-person quorums are around 75 KiB/s while 5-person quorums can run as fast as 900 KiB/s). So, for reasonably-sized quorums you can easily have several-megabyte sized documents with sub-minute recovery times. Of course, doing AEAD and sharding the key is still much faster for larger documents (ChaCha20-Poly1305 can run at ~2GB/s on my machine according to "openssl speed"). But it's not necessary.


There was a post about shamir secret sharing recently by levchin (and chaos that it caused at paypal): https://max.levch.in/post/724289457144070144/shamir-secret-s...

This is 100% one of my favorite algorithms, but I have to say I've started to wonder if I just have an academic fascination with it and the practical use cases are more limited (limiting?) than I think


Shamir's Secret Sharing is basically perfect for nerdsniping.

It is simple enough to understand without hardly any crypto background or advanced math. You basically just need a bit of algebra. It is surprising that it is possible at all. It has a clearly visible concrete use case (at least at first). It is something that you can implement yourself (though there are still footguns).

This, IMO, leads to its wild overrepresentation in discussion forums and even overrepresentation in industry solutions.


Great story, thank you!

> I've started to wonder if I just have an academic fascination with it and the practical use cases are more limited (limiting?) than I think

I don't think the linked post should prompt such wondering? (It's not clear if you are drawing a causal link between the two) It was an issue with the interface _to_ the algorithm, not with the algorithm itself.


> I've started to wonder if I just have an academic fascination with it and the practical use cases are more limited (limiting?) than I think

I'm currently writing an entire book based on this concern: pmfpbook.org


And the answer is...


According to the FAQ it uses a Shamir Secret Sharing Scheme to split an encryption key. But you can actually use a Reed-Solomon scheme[1] and have no encryption key to achieve this objective.

First, you process the input data with an all-or-nothing transform (AONT)[2]. Then you split it into Reed-Solomon shares. AONT is required due to Reed-Solomon being vulnerable to a distinguisher[3] and it will most likely leak information about its input.

[1] <https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_cor...>

[2] <https://en.wikipedia.org/wiki/All-or-nothing_transform>

[3] <https://en.wikipedia.org/wiki/Distinguishing_attack>


Is that approach information-theoretically secure? Shamir Secret Sharing is, and the mathematics is very simple.

Note that the encryption step is, strictly speaking, not necessary for Shamir either. But there are benefits to encrypting the secret and only using SSS for the key (I'm not sure if that's how Horcrux works, but that's how my fairly-similar tool Paperback[1] works).

[1]: https://github.com/cyphar/paperback


> Is that approach information-theoretically secure?

No. But neither is SSS if an encryption key is involved - because the symmetric encryption step is not information-theoretically secure.

Using SSS directly on the input will be information-theoretically secure but also multiply the size of the input.

> and the mathematics is very simple.

The mathematics of SSS and Reed-Solomon are virtually identical.


I was able to find a project which appears to do exactly this: <https://github.com/atbarker/AONT-RS>

Paper: <https://www.usenix.org/legacy/event/fast11/tech/full_papers/...>


I can think of several ways to "take a password encrypted file and split it up in such a way that you can re-assemble it without all of its parts" but being able to do so without a password is where it's piqued my interest.

This is one of those things that looks interesting to me as someone who enjoys security topics, algorithms and the like. I can see digging into the source code/reading about the techniques involved because I want to know how it was done. But I am struggling to figure out where something like this would be useful -- is there a threat model where this would be useful as a "layer in the onion" of things protecting sensitive data or a way in which this would be superior for a use case that isn't already solved better by something else?

Even the author kind of jokes about an old diary password vs remembering enough hiding places. I think the latter would be a much bigger problem from personal experience and would require having fewer parts and more hiding places, eroding the provided security in the process[0]. :)

Multi-sig comes to mind, but my only exposure to that is "having to participate in it" by "putting a USB key in and running a program along with two of my coworkers" in order to push the one update we ever pushed with that tooling. I remember we all had to provide our keys and "if I was out sick, no update was going out" but I'm not sure if that was a limitation of what we were using or if we just set them up that way because it was "alpha" at the time.

None-the-less, very interesting!

[0] I can imagine, though, that "under the mattress" would be the equivalent security to setting your password to "password".


What about using it to share information with people for them to use in an emergency. Data is stored in multiple locations, but only someone with access to 3 of them would be able to read the data. The existing physical access controls would be the actual safety mechanism. A thief would have to break into 3 locations in order to get the data.

We could just encrypt the data and then split the password into three pieces and store those in different locations, but then you would have to store both the data and the password.


Except that there are problems with the emergency scenario on both sides:

- In an actual urgent emergency, getting in touch with the 3 people and getting the data might be difficult and take days/weeks. One is on vacation, another is going to take a couple of days to try to remember where they stored the USB key...

- And nothing limits it to emergencies. The 3 people can just get together immediately and examine the valuable information inside, whether it's account numbers, crypto keys, a will, etc.

So the only value here is in not having to remember a separate password. But I expect it's just as likely for people to lose the files entirely as much as it is for them to lose passwords. If you share a password-encrypted file with 5 friends, I wouldn't be surprised if 10 years later, only 1 of them still had it. In that case, better to store file+password per-person, rather than require 3 of them to keep it.


Worse than that. "Oh fuck, I lost that thing" is going to be a common occurrence in these scenarios. Data lost forever. You can then go to K-of-N schemes or whatever, but the truth is that if you don't have a professional obligation to keep some data safe and accessible, people will simply lose it (and this will still happen in plenty of cases where you do have a professional obligation).


Step 1 : Create a compressed (must not 'store') archive split into small-ish sections of size N/K.

Step 2 : Run par2 against the archive sections and create as much parity as possible.

Step 3 : Distribute any middle (not the start of the archive, and not the incomplete end piece) segments of the archive and a 'correct' number of parity / recovery packets.

If a complete count of the archive and parity pieces are brought together, par2 can recover the remaining pieces of the archive and extraction will be possible.

The hashes only (no recovery data) par2 file should be distributed with all copies.


This is what Shamir's secret sharing algorithm does, just that that has the benefit of being information-theoretically secure.


Related (at least by name - I think there is more than one project here). Others?

Horcrux: Split your file into encrypted fragments - https://news.ycombinator.com/item?id=29395575 - Nov 2021 (11 comments)

Show HN: Horcrux, a Playground for Shamir Secret Sharing - https://news.ycombinator.com/item?id=26256726 - Feb 2021 (40 comments)

Horcrux Encrypted Messaging - https://news.ycombinator.com/item?id=24322069 - Aug 2020 (93 comments)

I made an app that lets you split a file into horcruxes - https://news.ycombinator.com/item?id=24026445 - Aug 2020 (80 comments)

You can now make horcruxes out of your confidential files - https://news.ycombinator.com/item?id=21933983 - Jan 2020 (17 comments)


This brings to mind parity files on Usenet. Large archives would be uploaded split into smaller files and a number of parity files. You could then recreate the original archive using any (for example) n-3 out of n files.

https://en.wikipedia.org/wiki/Parchive


Recently learned that Clevis also supports Shamir Secret Sharing, and it's in fact the only way to configure multiple pins even if they're of the same type and authority (ie. the RAID0 of SSS):

https://github.com/latchset/clevis#pin-shamir-secret-sharing


I thought this was neat and went to his repo - he's the lazygit/lazy docker person! Nice to see the prolificacy!


> Q) This isn't really in line with how horcruxes work in the harry potter universe!

> A) It's pretty close! You can't allow any one horcrux to be used to resurrect the original file (and why would you that would be useless) but you can allow two horcruxes to do it (so only off by one). Checkmate HP fans.

Well the whole point of hurcruxes is to have backup in several places. This tool to backup your backups in several places is a much better use of the name https://github.com/chrispoole643/horcrux. Checkmate jesseduffield ;)


Yes, but no! Technically, horcruxes were backups, but the emotional weight of the concept came from splitting your soul, lessening your essence as you hedge against danger.

Partial backups follow the spirit of the idea better.

As an aside, I read the concept as an attack on promiscuity. "Whore crux". Contrast to Lord of the Rings' assault on marriage, where wearing the ring makes you invisible and slowly fade into nothingness, "like butter being scraped over too much bread".


I don't think these concepts occurred in Harry Potter or Lord of the Rings.

Is it possible you maybe you made these up?


https://www.voxco.com/blog/everything-about-textual-analysis...

"Reading" into text (similar to reading betwen the lines) is a skill taught in the arts. The idea is that there are meanings beyond the literal. Symbols can stand in for feelings or ideas, and multiple interpretations can be kinda true and not true at the same time.

Here's an example:

https://www.youtube.com/watch?v=nucYxqgiV7k

The literal interpretation is that this is a liquid that alters people's perceptions, and you could ask literal questions about it.

- How was it made?

- How do the police know if people are drinking it?

- What are its effects on the brain?

But those questions don't have satisfying answers, because it's a symbol, a stand-in, a variable:

- It's an allusion to "rose-colored glasses". When people are passionate, they look for the good and ignore problems.

- It's about addiction. The act of drinking is similar to drinking alcohol. He "comes to" in a filthy, miserable living space, the same way someone would after a downwards spiral caused by substance abuse.

- It's about social pressure, "drinking the koolaid", and the emptiness of modern office life.

Which one of these interpretations is "the right one"? None? All?


>"Reading" into text (similar to reading betwen the lines) is a skill taught in the arts.

Only if you use a very generous definition of the word 'skill'


> Contrast to Lord of the Rings' assault on marriage, where wearing the ring makes you invisible and slowly fade into nothingness, "like butter being scraped over too much bread".

Wait, what ? Eh...


Well, I think that's ostensible, as any reading of the author's biography can attest.


I couldn't disagree more. You can easily twist his letters to his son to create this narrative. If you read it as a whole his message isn't at all anti-marriage, he was a devout Christian who believed wholeheartedly in marriage. His message was that marriage takes sacrifice, faith, and a conscious effort (in his opinion specifically on the side of the man). It's a bit Kierkegaardian.


I think that might mean we agree. I guess I should have said I thought it ostensibly false.


Whoops, I totally misinterpreted that! Apologies.


Yeah, I agree. There’s no way it was intended as a criticism of marriage.


Well, when it's put like that ^^.

It's just that I never thought about it in that way, that outlook took me by surprise :).

I read the Howard Carter biography more than 25 years ago and I don't remember much about that aspect, I was more geeking about lotr at that time than Tolkien's opinions.


Gollum is what really sells me on LotR being about marraige. He's a caricature of a woman hungry for commitment. I've seen the "gollum look" a few times in real life, and they captured it pretty damn accurately in the movies (more monstrous ofc)...


I'm not sure if you are trolling, but these comments are telling us far more about you than they are about LotR, Tolkien or women.


Tolkien fell in love with his future wife when he was 16 and they were so close they were buried together under one gravestone.


> Contrast to Lord of the Rings' assault on marriage, where wearing the ring makes you invisible and slowly fade into nothingness, "like butter being scraped over too much bread".

That’s an interpretation I’ve never encountered before. That’s not a canonical interpretation.


I don't recall reading anything about lessening the soul, maybe as a byproduct of the process but not something that was intentional.

Horcrux was to backup the soul and I believe any one fragment is enough to allow for recovery.


https://www.potter-search.com/?search=Horcrux

> “Seven! Isn’t it bad enough to think of killing one person? And in any case . . . bad enough to divide the soul . . . but to rip it into seven pieces ...”

> Lord Voldemort has seemed to grow less human with the passing years, and the transformation he has undergone seemed to me to be only explicable if his soul was mutilated beyond the realms of what we might call ‘usual evil’


To be fair, the creator of HP is a crazy conservative. So, the idea of "Whore crux" being an attempt of fighting promiscuity isn't surprising.


This is ridiculous, Rowling's political views are completely unremarkable for a 58-year old woman and if anything leans liberal. Very few normies in their late fifties would survive the scrutiny she got.


She blasts her views on twitter constantly — she deserves all the scrutiny she gets.


This is a ridiculous take, Rowling is basically a cookie cutter late-second wave feminist, exactly the kind of person you'd expect a liberal in the 80s to be.

And since I assume you're inferring she's conservative from the popular opinion that she's anti-trans, if you read her political views, she supports transgender rights unequivocally but doesn't like what she perceives as MtF support encroaching on support systems for biological women. Make what you will of exactly "how" liberal she is, but she is literally pro-trans-rights and a feminist which is completely incompatible with conservatism. And furthermore, it's exactly the kind of take you'd expect from someone who formed their political identity in the 80s.


There's also that one of her pen names was also the name of the guy who decided that gay conversion therapy didn't have enough electroshock involved.

https://en.wikipedia.org/wiki/Robert_Galbraith_Heath


Is there any reason to think that she had even ever heard of him when choosing that pen name? It's not as if "Robert" and "Galbraith" are unusual names. For that matter, Rowling has said where she took the names from, and unless she's flatly lying about that then there's no connection with this guy.


> a cookie cutter late-second wave feminist

Not a terribly large number of these people are allying themselves with people who want to ban abortion because they have similar goals around restricting trans rights.

> she supports transgender rights unequivocally

And yet platforms people who publicly say that all trans people should be forcibly sterilized against their will as a matter of public policy.


Conservative is good when the alternative is moral rot that supports promiscuity.


Relax himmler


This is one of the most badass names for a project I have seen in a while!


It's inaccurate though. The whole point of horcruxes in the book is that Voldemort can always resurrect himself if one of them remains. This tool is the opposite: you need multiple parts to reconstruct the file.


Yeah but that's just copy paste. This is more fun.


They already acknowledged this in the faq fwiw.


They acknowledged it but dismissed it with an incorrect statement, and then declared victory with "checkmate HP fans" despite actually being wrong.


Cool. Not being privy to Harry Potter inside baseball I could not tell from your original comment whether you had seen that FAQ entry.


I’m not an extreme Harry Potter fan or anything. It just bothers me when people ride the coattails of some popular term/phrase but then get it wrong. Another one is “isomorphic” as in “isomorphic JavaScript” which abuses the term from mathematics to mean something completely unrelated.


I had a coworker try to use "isomorphic" to mean "when given the same inputs and environment, always produces the same outputs", then accused me of pedantry for pointing out that misusing a word with a very clear definition was likely to cause confusion.


I think they mean idempotent


Whoops, you're right, I brain farted the wrong word.

To be clear, though, they were still wrong. `f(x)=2x` has the property they described of consistently giving the same input for a given input (if you pass in 1, it will always output 2), but it is not idempotent because f(f(x)) does not, in general, equal f(x).


ok I totally get this.

Microsoft co-opted “DNS” as “Digital Nervous System” to try to exploit business decision makers’ dim acquaintance with the term. They had a pattern of doing this with other internet acronyms back in the day. Annoying af.


He could not resurrect himself. He needed someone in not-ghost form, to collect some special items and perform a magical ritual. Some of the special items were also one-time use iirc.

Perhaps, this tool needs to additionally encrypt some of the pieces with the dna of one's father or whatever.


The point is, this is a very inappropriate project name, if it is to be a reference to the storyline.


Horcrux would be a cool name for a database backup service


No because a Horcrux is a nasty lossy copy that involves somebody's death to be made.

Any wholesome leader who is also a HP fan would not appreciate the name tbh.



This is called Shamir's Secret Sharing and I once found this interesting guide for implementing it in Rust:

https://gendignoux.com/blog/2021/11/01/horcrux-1-math.html


Since we're sharing implementations, here's mine:

https://francoisbest.com/horcrux


And here's mine: https://github.com/sowbug/b39tools

This is specifically for an SSSS-based distributed physical backup of a BIP-39 wallet seed and passphrase.


Not mine, but http://point-at-infinity.org/ssss/ is another good one, with Debian packages.


No mention of libgfshare, which comes with utilities `gfsplit` and `gfcombine`: https://manpages.ubuntu.com/manpages/trusty/man1/gfsplit.1.h...

We use it in production to pregenerate a pool of certificate requests for SSL key pinning. Rotating a key requires m-of-n cooperation.


Maybe it's too naïve of a question, but why not just using XORs?

Let's say we have a file that we want to split into 4 encrypted shards. We could just generate 3 random files, and make the fourth one be a bitwise XOR of the 3 files plus the original one. Then, doing a bitwise XOR of the 4 encrypted shards will return the original file.

Assuming the PRNGs are unbiased, how would this method not be safe enough?


In case this method makes sense, I'm already registering the EXOrDIA name for a python package: https://github.com/juancroldan/exordia


With XOR you need all the shards to restore the file. With SSS you can lose a predefined number of shards and still restore the file with the other shards.


No idea what algorithm it uses, but if you want to be fancy, pass each fragment of the encrypted file thru a DFT, and save each k-th frequency into a separate file. Such a file will appear as an image with a curious pattern, or an audio file, so if anyone asks, that "4 wav" is music. What's interesting is that 1.wav and 2.wav (and other k.wav) will be nearly identical, so they can be put together into a 7.1ch audio stream (if splitting into 8 pieces). Bonus points: apply flac encoding on top.


Very cool! Now what would be really interesting is combining this with steganography techniques to hide the horcrux fragments inside totally unrelated-looking files.


That would be neat. A printed set of family pictures for example.

Digital steganography is well known but I’ve never personal come across tools for physical images, however I guess I haven’t ever looked.


The note on not being able to allow only one horcrux to resurrect the original file feels unintuitive. I mean I get it'd be a pretty useless thing to do (as you'd effectively have just used an extremely complicated method of copying a file in a way it could still be read by anyone who gets a copy) but it seems odd you couldn't actually do it.


For easier use I build a shamir secret sharing based web tool :)

https://simon-frey.com/s4

p.S. In style of "don't do your own crypto" it uses hashicorps implementation of Shamir


We also maintain a shamirs CLI called horcrux here:

https://gitlab.com/unit410/horcrux

We decided to use Hashicorp vault's implementation of Shamirs which was the most widely used / battle tested golang implementation we could find.


>horcrux. Looks like somebody beat me to both the name and concept, however this repo doesn't support thresholds of horcruxes

"this repo" I wonder if they mean "that repo." I realize it can be read both ways, but that is a bug, not a feature, in this readme. If the author is here can you please clarify?


Huh I threw together something like this a few weeks ago using asymmetric keys - pass a file and key and it output as smaller encrypted files.

Only way to decrypt them - have all the files and the key.

Didn't think it was that useful but seems similar to the functionality, but I like that you don't need to share a key here.


The difference here is that with Shamir secret sharing, each file also has a portion of the key and you only need a subset of the files to decrypt. In the example in the readme you can decrypt with at least 3 parts. This means you can lose 2 and still decrypt


Does anyone know of a tool like this that’s maintained?

I actually would like to use this for a fairly out there use case but it hasn’t been updated in 3 years and it doesn’t look like the author has even been active on GitHub since that time.


That was the problem for me too.

There are many implementations of Shamir Secret Sharing out there, but they're all idiosyncratic and impossible to vet unless you have a background in cryptographic security, and like you said nearly all are not actively maintained.

The types of things you would go to the trouble of Shamir Secret Sharing, are also the types of things that might surface decades later (e.g. crypto keys). Imagine going on a treasure hunt to retrieve 2 shards, and then needing to go on an additional internet archaeological hunt for the specific GitHub repository used to generate the shards.

Unless someone solves this issue, I can't see Shamir Secret Sharing becoming any more mainstream.


The author of this tool basically took the Shamir code from Hashicorp Vault, which is pretty mainstream. If you're looking for a solid implementation, I would start there[0]. I wouldn't use the Shamir code from this repo, as it's an old version of the vault code using field arithmetic that doesn't run in constant time.

[0]: https://github.com/hashicorp/vault/blob/main/shamir/shamir.g...


y'all are not understanding the underlying principle: these things work holographically.

Dumbledore even said that copying the soul fades it, when Harry was inquiring and specifically pondered the maximum number of times the soul could be copied.


Needless to say calling this project Horcrux is just silly: it should be named after some other fictional story which requires all of the pieces to be assembled (like the new Starfield story I think).

A Horcrux is like a fax facsimile and loses a lot of information. The entire point of the story was that the gang had to destroy ALL instances of the Horcrux, because He could be revived even with just one (holographic) copy.


Harry himself being the final Horcrux so he had to be temporarily killed to clear it out.


Tom asked somebody how many times the soul could be copied, too.


This is broadly how StorJ works. I've used them before and it's a pretty cool storage solution (https://www.storj.io)


My understanding of modern cryptography is limited. Can someone explain if someone gets my horcrux files, would they be able to decrypt the original? If so, then what is the benefit of this tool?


Yeah, so this is best thought of as a backup tool in which the files are kept separately from each the, more than just a different way to encrypt a file.

If you give a plaintext file to a friend for safe keeping, the friend can see the file. You could encrypt the file, but encrypting a file requires a key or password, which you could forget or lose.

The idea here is: split the file into three parts. Give one part to the bank deposit box, one to a friend, and put one in a safe at home. Now, if you want to restore the original, nobody can restore without getting (for example) two of the three pieces.

This is nice because your friend and the bank can’t do anything with their copies alone, and won’t trust each other with their copy normally. But if you lose your piece, you can still ask the friend and the bank for theirs.


Hiding the fragments is like hiding your password. Of course they get the original if they have the fragments. That's the purpose.

You could always encrypt the file before splitting it.


I would imagine if you're worried you could just split the files with this, and of you think the encryption is 'weak' you could encrypt again with another tool. Probably not a bad idea.


It would be nice if the FAQ said how big the split files would be. If you're doing it with 3 of 5, will the split files each be ~1/3 the size of the original file?


I have always wanted a cloud storage client which spread my files in chunks out among OneDrive, GDrive, etc. Essentially cloud raid to make it harder to put things back together.


You and me both!!! If you ever find that, please remember this and point me to it as well. My thoughts were: use all the free capacity of the drives you mentioned to make a filesystem like HDFS which distributes blocks all over the place.. :)


The name alone guarantees this program will live on. Sometimes I think a good name is 50-80% of the success of a product or person.


What's the practical benefit over just sharding a decryption key and handing out multiple shards to each person?


1. You need all the shards of the key to decrypt the text instead of just reaching a threshold.

2. The full encrypted text is available to each person, making it vulnerable to a brute force attack at some point in the far future.

I'm not entirely sure if this implementation actually covers that second point though. It could be including the entire encrypted text with each copy. But it would theoretically be possible to protect against brute force attacks in that way.


The second point, yeah that's relevant.

On the first point, just give each person n-1 shards, each missing a different one. Then any 2 can decrypt. Or configure it for however many participants there are and they minimum number needed to encrypt.


The key part about Shamir is that having any number of shards short of the threshold doesn't reveal anything about the secret. Let's say you split your 256 bit encryption key into 4 64-bit pieces with each person getting 3 of the 4. Each person now knows 3/4 of the secret. Now any one person simply has to brute force the remaining 64 bits of the key in order to decrypt.


Literally made this comment and been surprised with the notion of Shamir Secret Sharing! Check my comments


Someone check this mans hard drive


Isn't fountain code doing something similar? Albeit for slightly different purpose?


I was just thinking about something like this problem.

At $work, we use multi-signature signing to move cryptocurrency around, so that at least N of M officers of the company need to sign, to prove that "the company" actually intends a movement of funds to happen. This ensures that no single officer can embezzle funds; and it also ensures that an attacker would have to do some kind of multi-target simultaneous coordinated rubber-hose attack (rather than just waiting to kidnap one of us when we go on vacation) to get access to the funds.

I was trying to think of a way to extend that kind of security to the encryption of data — specifically the encryption of low-level root-account passwords (like an AWS account's root password). Shamir's Secret Sharing is the obvious first step... but you'd also then want two additional properties:

1. the decrypted secret should not be held even temporarily by any of the parties, but rather should be held by — and used "through" — a neutral system, so that the secret is reusable rather than needing to be burned the first time it's revealed

2. the neutral system — despite being likely owned by a third party! — should have no ability to exfiltrate the password into the hands of the third party.

I think this can be workable in the specific case of wanting to use the root password as an HMAC, by doing SSS decryption inside a single-shot non-durable abstract machine with homomorphically-encrypted memory, wrapped in a network daemon: the network daemon spins up a copy of the abstract machine; receives each SSS key split from its owner, feeding each as it receives it directly into the abstract machine; the abstract machine, after receiving sufficient key splits, signals to the daemon that it is now "empowered" to sign; and the daemon can then use the abstract machine to HMAC arbitrary plaintexts sent to it, until it exits, and the abstract machine's state is lost.

The real trick, though, would be making this work for HTTP Basic Auth over TLS, by delegating the generation of exactly one TLS frame — the one containing the Authorization header — to the abstract machine; where the network daemon would then act as an HTTP proxy, inserting this Authentication header into requests made through it. Having something like this could really improve security around the use of a lot of sensitive control-plane APIs!

---

Of course, in a corporate context, you probably have a third party you can trust with the plaintext — e.g. an Enterprise Password Manager service — so you can relax property 2. In such a case, you don't need a fancy abstract machine; you can just ask said service to build a feature allowing secrets to be configured to require N-of-M confirmations by ACLed team-members to unlock them; and to build another feature for "opaque secrets" that are never revealed to anyone, but instead are injected into an HTTP Authorization Proxy that the Enterprise Password Manager service spin up themselves on their backend.

But it's still fun to think about the case where your secret is so secret that you need to hide the plaintext from literally everyone but your co-conspiriators. :)


If x officers died in a freak accident where there's not enough for N to sign, is there a way to recover?


Seems no different than if all N parties with the key were lost. Is there a backup strategy in case the key people are all wiped out simultaneously? How does ICANN handle it?

There was a recent story[0] about a crypto company where everyone "lost" the keys (I assume inside job).

[0] https://news.ycombinator.com/item?id=37268750


Yes, if you plan for it. There are several mitigation strategies for this.

The simple approach, if you have enough people you fully trust, is to just have a larger number of participating signers / distributed key-splits than you need, such that you don't actually need a majority of the key-splits to be used, to achieve a signing quorum. (E.g., instead of 2-of-3 signing, you'd use 2-of-4 or 2-of-5 signing.)

In this setup, usually there's a "core" of signers who are active participants in the system, where the required quorum size does constitute a majority of the "core" of signers; and then there are other "backup" signers, who aren't normally participants in signing as part of their job duties, but who can be called upon to act as additional signers in an emergency — usually just to participate in a "handover of power" (activation of new keys) for new signers (e.g. if the CIO and COO die, and you hire a new CIO and COO.)

These signers should not be anyone who any of the "core" signers has power over — not any of their reports, not anyone they can fire, and not their own friends or family. (Personal acquaintances from outside the company do make good "backup" signers, though — an officer's personal accountant is a frequent choice.)

Also, ideally, there should not be enough such "backup" signers that a single "active" signer could collude with the "backup" signers to embezzle funds. The signing configuration should still always imply the need for at least two "active" signers. (So 2-of-4 / 2-of-5 are actually bad configurations; a minimal ideal configuration would be more like 3-of-5 (with one backup signer), or 4-of-7 (with two backup signers); and you don't get both "not a majority" + "at least two active" together until 4-of-9 (2 backup) signing.)

A more expensive version of this approach, but one which could be considered more secure and with less required trust, is to give the extra signing key splits needed to achieve a signing quorum to one or more law firms, under contract to only release those key-splits upon a legal notice of demand from all still-living members of the signing group. The law firm must then verify for itself that 1. enough people are dead that the signing quorum condition can indeed no longer be met without first activating one of the offline'd keys; and 2. exactly all remaining living members of the signing quorum really want the keys released (probably this would be done through in-person individual interviews and notarizations.)

Also, to prevent this arrangement from acting as a https://en.wikipedia.org/wiki/Tontine (in the sense that it incentivizes murdering your co-officers), usually there would be a line in this contract stipulating that the keys won't be handed over while the still-living signers are under suspicion of directly or indirectly murdering the dead signers; and that the keys will only be handed over after a 60–90-day delay (which gives time for such a suspicion to arise, if relevant.)

Another mitigation strategy, that works well in combination with either of the above, is for officers with signing keys to leave their password-locked hardware signing device (smart card, crypto wallet, etc.) with a trusted coworker whenever they're going somewhere potentially dangerous; and to leave the password to unlocking said signing device with the executor of their will, only to be released upon their death. In this setup, people who travel are temporarily unable to sign (the person they've entrusted their signing device to doesn't have its password, and so cannot sign for them); so travel scheduling + signing configuration must be planned together to ensure that a signing quorum is always possible to achieve.


You don't need FHE to do this – what you're describing is generically a multi-party compute scheme that can be solved by other means.

Since you have familiarity with signatures, I'll plant the seed of how this particular MPC scheme works, in parts: Distributed Key Generation, and Circuit Creation and Execution

The first step is securely generating a key – but how do you know it was done correctly with SSS? SSS follows what is called a "trusted dealer" model, where you assume what generates the Shamir splits has operated correctly, but can we do better than that? Thankfully, yes! Feldman Verifiable Secret Sharing is an enhancement over SSS, where the splits generated are committed to by raising the coefficients of the SSS polynomial to a public generator of an additively-homomorphic group, or in other words, treating the coefficients like private keys, and producing the public keys (as in elliptic curve cryptography, the private key is just a large scalar number, the public key is produced by performing scalar multiplication of the private key against a known public point – the generator, part of the public parameters of a particular curve). By having the public points (A_0, A_1, ..., A_n) corresponding to the coefficients (a_0, a_1, ..., a_n, where a_0 = secret), you can calculate the secret's commitment by using x=0, as P(x)=A_nx^n+...+A_1x+A_0 where x=0 cancels out all terms besides the secret's commitment. Similarly, you can verify your share split by using x=(your share identifier), then raise your share to the same generator. And finally, to prove that all shares are valid, each person of course must verify their share matches their commitment, but then you can do Lagrange interpolation in the exponent (because the only operations you do over the commitments are multiplying by publicly available scalars (the share identifiers), and adding public points together) across any threshold combination of commitments, they should all resolve to P(0). That's FVSS in a nutshell.

But this still requires a trusted dealer! You have to trust that the dealer itself is not corrupted and exfiltrating secrets. How can we remove the trust component? By having all share holders perform FVSS – each party creates a random polynomial, samples shares accordingly, sends the shares (securely, of course) to each corresponding party, then everyone individually adds their provided (and self-sampled) polynomial samples. Because polynomials can be added together, you have effectively combined the random polynomials of all parties, without any one party knowing the real coefficients. Then verification involves each party raising their shares to a generator like in FVSS, and publishing their public commitments. The same Lagrange interpolation process is performed to verify all threshold combinations of public points resolve to the same public point – essentially confirming the secret was generated correctly without having to reveal the secret.

Now, to use the secret – execution. The technique for this is called Oblivious Transfer – it essentially leverages the same intractability problem of a given cryptosystem (discrete log for elliptic curves) to bootstrap a circuit garbling technique. There's many ways to do this, I'll explain the simplest form, for two parties:

Party 1 generates a public key, private key pair (a, A), and has two choices for messages (m_0, m_1), and sends A to Party 2.

Party 2 generates a private key (b), and a choice bit (c), and depending on c:

  c == 0: Party 2 generates B = b \* A

  c == 1: Party 2 generates B = b \* G + A, where G is the generator of the curve.
Party 2 sends B to Party 1

Party 1 calculates:

  e_0 = Hash(a \* B)

  e_1 = Hash(a \* (B - A))
Party 1 encrypts m_0 with e_0, m_1 with e_1, and sends both encrypted messages.

Party 2 calculates:

  e_c = Hash(b \* A)
Party 2 decrypts m_c with e_c.

Because Party 2 does not know a, they cannot calculate the encryption key for the message they did not choose, as the encryption keys calculated by Party 1 are contingent on an operation on B that cancels out the respective values produced when creating B from the choice, multiplied by the private scalar a.

Additionally, because Party 1 does not know b, they cannot determine which choice Party 2 actually made.

Now, for a simple choice between messages this seems silly, but this is actually sufficient to bootstrap a logical circuit – any computable circuit can be evaluated through this process of oblivious transfer. What does that give us? The ability to jointly compute operations over data required to be kept private by each side, such as an HMAC. Since HMACs are generally used in idempotent operations, generating the HMAC for a request using an OT circuit is sufficient to do what you're wanting.

For basic authentication, you'd need to extend the OT circuit evaluation all the way out to TLS frame construction itself, but is still possible.

NB: I am creating a decentralized network, Quilibrium, to make it easy to build and deploy applications that evaluate as OT circuits – MPC TLS is on the roadmap.


Does this use a mix of striping with oarity and public key cryptography?


Boo that's not what a Horcrux is!


isnt this how you hide malware on a machine?


Clever name


Cool name


I was just thinking about something like this problem. At $work, we use multi-signature signing to move cryptocurrency around, so that at least N of M officers of the company need to sign, to prove that "the company" actually intends a movement of funds to happen. This ensures that no single officer can embezzle funds; and it also ensures that an attacker would have to do some kind of multi-target simultaneous coordinated rubber-hose attack (rather than just waiting to kidnap one of us when we go on vacation) to get access to the funds.

I was trying to think of a way to extend that kind of security to the encryption of data — specifically the encryption of low-level root-account passwords (like an AWS account's root password). Shamir's Secret Sharing is the obvious first step... but you'd also then want two additional properties:

1. the decrypted secret should not be held even temporarily by any of the parties, but rather should be held by — and used "through" — a neutral system, so that the secret is reusable rather than needing to be burned the first time it's revealed

2. the neutral system — despite being likely owned by a third party! — should have no ability to exfiltrate the password into the hands of the third party.

I think this can be workable in the specific case of wanting to use the root password as an HMAC, by doing SSS decryption inside a single-shot non-durable abstract machine with homomorphically-encrypted memory, wrapped in a network daemon: the network daemon spins up a copy of the abstract machine; receives each SSS key split from its owner, feeding each as it receives it directly into the abstract machine; the abstract machine, after receiving sufficient key splits, signals to the daemon that it is now "empowered" to sign; and the daemon can then use the abstract machine to HMAC arbitrary plaintexts sent to it, until it exits, and the abstract machine's state is lost.

The real trick, though, would be making this work for HTTP Basic Auth over TLS, by delegating the generation of exactly one TLS frame — the one containing the Authorization header — to the abstract machine; where the network daemon would then act as an HTTP proxy, inserting this Authentication header into requests made through it. Having something like this could really improve security around the use of a lot of sensitive control-plane APIs!

---

Of course, in a corporate context, you probably have a third party you can trust with the plaintext — e.g. an Enterprise Password Manager service — so you can relax property 2. In such a case, you don't need a fancy abstract machine; you can just ask said service to build a feature allowing secrets to be configured to require N-of-M confirmations by ACLed team-members to unlock them; and to build another feature for "opaque secrets" that are never revealed to anyone, but instead are injected into an HTTP Authorization Proxy that the Enterprise Password Manager service spin up themselves on their backend.

But it's still fun to think about the case where your secret is so secret that you need to hide the plaintext from literally everyone but your co-conspiriators,


Fun repo that uses the Shamir Secret Sharing Scheme to break a file into parts which can be recombined to get the original. From the README

Who this is for:

    People who need to encrypt a big sensitive file like a diary and don't expect to remember any passwords years from now (but who paradoxically will be capable of remembering where they've hidden their horcruxes)

    People who want to transmit files across multiple channels to substantially reduce the ability for an attacker to intercept
FAQ

    Q) This isn't really in line with how horcruxes work in the harry potter universe!
    A) It's pretty close! You can't allow any one horcrux to be used to resurrect the original file (and why would you that would be useless) but you can allow two horcruxes to do it (so only off by one). Checkmate HP fans.


And for people who are confident that the some compatible horcrux tool is still available.


Store it with the files


[flagged]


Why?


[flagged]


i’m totally a hacker, i hack things




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: