I have an idea for a filesharing site where the files *must* be encrypted. The s...

NayamAmarshe · on Aug 17, 2023

https://wormhole.app

warkdarrior · on Aug 16, 2023

> parse the file using ffmpeg, ghostscript, libreoffice etc.

> Known illegal md5s

Yeah, no. Those lists (of known formats and of known illegal files) will quickly grow unmanageable.

sparkie · on Aug 17, 2023

You don't need to keep the lists, just keep a Bloom filter of sufficient size to keep false positives low.
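For illustration, a minimal Bloom filter sketch in Python. The class name, the SHA-256 double-hashing scheme, and the sizing parameters are assumptions made for this example, not something from the thread; the sizing uses the standard formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical minimal Bloom filter over byte strings (e.g. file digests)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.num_bits = math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
        )
        self.num_hashes = max(
            1, round(self.num_bits / expected_items * math.log(2))
        )
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


known = BloomFilter(expected_items=1000, false_positive_rate=1e-6)
flagged_digest = hashlib.md5(b"contents of a flagged file").digest()
known.add(flagged_digest)
```

A lookup is then `digest in known`. A hit only means "possibly on the list", so a positive would still need confirming some other way.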

CodesInChaos · on Aug 17, 2023

You can at best save log2(n)-1.4 bits per entry using a smarter encoding compared to a naive list. That's perhaps a factor of 2-3x, depending on list size and acceptable false positive rate. For example, if you have a list of a billion entries and accept a one-in-a-million false positive rate, the naive list needs 30+20=50 bits per entry, while an ideal encoding will need 21.4 bits, a 57% reduction.

So I don't think bloom filters have a significant impact on the manageability of those lists. Though I doubt the storage size will be the main concern, compared to the effort of adding entries to that list.
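The numbers in the example above can be reproduced with a few lines of Python. This is a sketch under two assumptions: that the "1.4" constant is log2(e) ≈ 1.44, and that a standard Bloom filter needs about 1.44·log2(1/ε) bits per entry (a figure not given in the comment). The function name is made up for the example.

```python
import math


def bits_per_entry(n: int, fp_rate: float) -> dict:
    """Per-entry storage for a set of n items at a given false positive rate."""
    log_inv_eps = math.log2(1 / fp_rate)
    # Naive list: truncated hashes long enough for n entries at the target rate.
    naive = math.log2(n) + log_inv_eps
    # "Ideal encoding" figure from the comment, assuming 1.4 stands for log2(e).
    ideal = log_inv_eps + math.log2(math.e)
    # Standard Bloom filter at its optimal number of hash functions.
    bloom = math.log2(math.e) * log_inv_eps
    return {
        "naive": naive,
        "ideal": ideal,
        "bloom": bloom,
        "reduction": 1 - ideal / naive,
    }


result = bits_per_entry(n=10**9, fp_rate=1e-6)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With a billion entries and a one-in-a-million rate, this gives roughly 50 bits for the naive list and about 21.4 for the ideal encoding, matching the 57% reduction quoted above.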

Vt71fcAqt7 · on Aug 16, 2023

Would SELECT COUNT(1) FROM records WHERE md5 = newMd5 not scale? (Or IF EXISTS (SELECT 1...) ...)

TylerE · on Aug 17, 2023

No. That’s only going to be performant if you can perform an indexed read. That index is not going to fit in memory.
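For reference, a minimal sketch of the indexed existence check under discussion, using an in-memory SQLite database. The table name `records` and column `md5` come from the query above; the helper name and everything else are assumptions. It only shows the shape of the lookup and says nothing about whether the index fits in memory at scale.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring md5 as PRIMARY KEY makes SQLite build an index on it,
# so the lookup below is an index search rather than a table scan.
conn.execute("CREATE TABLE records (md5 BLOB PRIMARY KEY)")


def is_known(conn: sqlite3.Connection, digest: bytes) -> bool:
    """Return True if the digest is already in the records table."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM records WHERE md5 = ?)", (digest,)
    ).fetchone()
    return bool(row[0])


known_digest = hashlib.md5(b"contents of a flagged file").digest()
conn.execute("INSERT INTO records (md5) VALUES (?)", (known_digest,))
conn.commit()

print(is_known(conn, known_digest))
print(is_known(conn, hashlib.md5(b"some other file").digest()))
```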

korse · on Aug 16, 2023

Sounds a bit like Mega...

Vt71fcAqt7 · on Aug 16, 2023

You are right. From page 26 of their whitepaper:

When a public file link is shared publicly, the following is embedded into the public link:

https://mega.nz/#! || Base64( File Handle ) || ! || Base64( Obfuscated File Key )

This is enough information to find the file identified by its File Handle on the server, then download it, verify the Condensed MAC of the overall file, unobfuscate the File Key, then decrypt the file using the File Key and IV.

It should be noted that everything after an anchor hash (#) in the URL is not sent to the MEGA servers and is kept locally in the client’s browser[0]
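A small sketch of the link layout described above, using Python's standard `urllib.parse`. The handle and key are placeholders, not a real MEGA link, and the variable names are made up for the example. It shows that the request target built from the URL does not include the fragment, which is where the handle and key live. (Script running on the page can still read the fragment, so this illustrates only what the initial HTTP request carries.)

```python
from urllib.parse import urlsplit

# Placeholder values standing in for Base64(File Handle) and
# Base64(Obfuscated File Key); this is not a real link.
link = "https://mega.nz/#!EXAMPLEHANDLE!EXAMPLEOBFUSCATEDKEY"

parts = urlsplit(link)

# An HTTP client requests path (plus query, if any); the fragment stays local.
request_target = parts.path or "/"
if parts.query:
    request_target += "?" + parts.query

# The fragment has the form "!<handle>!<key>", so splitting on "!" yields
# an empty string followed by the two fields.
_, file_handle, obfuscated_key = parts.fragment.split("!")

print("sent to server:", parts.netloc, request_target)
print("kept in client:", file_handle, obfuscated_key)
```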

Didn't think of that. So I guess my idea isn't adding too much. Just a lot of compute without so much in return. It still simplifies moderating the files, however.

[0] https://mega.nz/SecurityWhitepaper.pdf

Etheryte · on Aug 16, 2023

This would simply mean your server gets pwned at the speed of light. There is no way you could ever keep up to date with every security patch across that many integrations on one public-facing server. The compute would be expensive too, but that would almost be an afterthought to the first problem.

Vt71fcAqt7 · on Aug 16, 2023

I can copy code that parses the files from these projects without having to execute any programs, i.e. not actually opening any files. Or you could just set up two servers. Server A receives the file from the client and sends it to server B. Server B does not have any network access except to server A. It reads the file, and sends back a magic number to A indicating if the file is good or 0 if it is bad. If server A receives any other response it wipes server B. This could also just be a VM. And yeah compute would be pretty high. This is just a way to completely solve the responsibility problem. (for one definition of responsibility, that is, which is open to debate, surely.)
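The decision logic on server A could be sketched roughly like this. All names here are hypothetical, and using a random secret as the magic number is an assumption; the comment only says "a magic number", and a fixed constant would work the same way in this sketch.

```python
import hmac
import secrets

# Assumed: a secret shared with server B ahead of time.
MAGIC = secrets.token_bytes(16)
BAD = b"\x00"  # the "0 if it is bad" reply


def decide(response: bytes, magic: bytes = MAGIC) -> str:
    """Map server B's reply to what server A does next."""
    if hmac.compare_digest(response, magic):
        return "accept"   # B parsed the file and vouches for it
    if response == BAD:
        return "reject"   # B parsed the file and found it unacceptable
    return "wipe"         # anything else: treat B as compromised


print(decide(MAGIC))
print(decide(BAD))
print(decide(b"unexpected bytes"))
```

Treating every unexpected reply as a compromise keeps server A's side of the protocol to three cases, at the cost of wiping server B on ordinary bugs as well.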

offices · on Aug 17, 2023

If you want to ensure all the foos in a box are bar-ed, why not just bar them before you put them in the box?

Vt71fcAqt7 · on Aug 17, 2023

Because then I have the decryption key (the baz).

offices · on Aug 21, 2023

Public key cryptography?