
I have an idea for a filesharing site where the files must be encrypted. The server tries to parse the file using ffmpeg, ghostscript, libreoffice, etc., and if it can read it, the file is rejected. Similarly, if a decryption key can be provided for any md5, then that file is deleted. The hash for every file will be public. Known illegal md5s are blocked/reported. It doesn't need to be md5 specifically. This completely solves all moderation issues, with the caveat that anyone who can open the file can also delete it. Not sure how that could make money either.
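A minimal sketch of the screening step described above, under some loud assumptions: `screen_upload`, `is_readable`, and `toy_parsers` are hypothetical names, SHA-256 stands in for md5 (the post says the hash need not be md5), and `json.loads` is only a toy stand-in for the real parsers (ffmpeg, ghostscript, libreoffice), which this sketch does not call. It does not cover the "delete when someone presents a key" rule.

```python
import hashlib
import json


def is_readable(blob, parsers):
    """Return True if any parser manages to read the blob without raising."""
    for parse in parsers:
        try:
            parse(blob)
        except Exception:
            continue  # this parser could not make sense of the bytes
        return True
    return False


def screen_upload(blob, blocked_hashes, parsers):
    """Return (verdict, hex_digest) for an uploaded blob.

    verdict is "blocked" for a known-bad hash, "rejected" if some parser
    can read the blob (so it is presumably not encrypted), else "accepted".
    """
    digest = hashlib.sha256(blob).hexdigest()
    if digest in blocked_hashes:
        return "blocked", digest
    if is_readable(blob, parsers):
        return "rejected", digest
    return "accepted", digest


# Toy stand-in for the real format parsers (assumption for illustration only).
toy_parsers = [json.loads]
```

Something like `screen_upload(data, blocked, toy_parsers)` would then return the verdict together with the digest that gets published. A parser that happily "reads" arbitrary bytes would reject everything, so the choice of parsers matters a lot here.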




> parse the file using ffmpeg, ghostscript, libreoffice etc.

> Known illegal md5s

Yeah, no. Those lists (of known formats and of known illegal files) will quickly grow unmanageable.


You don't need to keep the lists, just keep a Bloom filter of sufficient size to keep false positives low.
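For readers who have not met one, here is a rough sketch of a Bloom filter, assuming the textbook sizing formulas and SHA-256 with double hashing to derive bit positions. The class name and parameters are hypothetical, and this is an illustration rather than a tuned implementation.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and a tunable false positive rate."""

    def __init__(self, capacity, error_rate):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))
```

A hit only means "probably on the list", so a hit would still need a second check against the real list (or a human) before a file is blocked or reported.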


You can at best save log2(n)-1.4 bits per entry using smarter encoding compared to a naive list. That's perhaps a factor of 2-3x, depending on list size and acceptable false positive rate. For example if you have a list of a billion entries and accept a one in a million false positive rate, the naive list needs 30+20=50 bits, while an ideal encoding will need 21.4 bits, a 57% reduction.

So I don't think Bloom filters have a significant impact on the manageability of those lists. Though I doubt the storage size will be the main concern, compared to the effort of adding entries to that list.
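The arithmetic in that comment can be reproduced with a few lines. This is a sketch under the comment's own model: the naive list stores a truncated hash of log2(n) + log2(1/p) bits per entry, and the ideal encoding costs log2(1/p) + log2(e) bits (the "1.4" above). The Bloom filter figure of log2(e) * log2(1/p) bits per entry is the standard result for an optimally sized filter and is added here for comparison; it is not a number from the comment. The function name is hypothetical.

```python
import math


def bits_per_entry(n, false_positive_rate):
    """Return (naive, ideal, bloom) storage cost in bits per list entry."""
    fp_bits = math.log2(1 / false_positive_rate)
    naive = math.log2(n) + fp_bits        # truncated hash, stored as-is
    ideal = fp_bits + math.log2(math.e)   # compressed set encoding
    bloom = math.log2(math.e) * fp_bits   # optimally sized Bloom filter
    return naive, ideal, bloom


naive, ideal, bloom = bits_per_entry(10**9, 1e-6)
saving = 1 - ideal / naive
print(f"naive={naive:.1f} ideal={ideal:.1f} bloom={bloom:.1f} saving={saving:.0%}")
```

With a billion entries and a one-in-a-million false positive rate, this lands on roughly the 50 and 21.4 bits quoted above, with the Bloom filter somewhere in between.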


Would SELECT COUNT(1) FROM records WHERE md5 = newMd5 not scale? (Or IF EXISTS (SELECT 1 ...) ...)


No. That’s only going to be performant if you can perform an indexed read. That index is not going to fit in memory.


Sounds a bit like Mega...


You are right. From page 26 of their whitepaper:

When a public file link is shared publicly, the following is embedded into the public link:

https://mega.nz/#! || Base64( File Handle ) || ! || Base64( Obfuscated File Key )

This is enough information to find the file identified by its File Handle on the server, then download it, verify the Condensed MAC of the overall file, unobfuscate the File Key, then decrypt the file using the File Key and IV.

It should be noted that everything after an anchor hash (#) in the URL is not sent to the MEGA servers and is kept locally in the client’s browser[0]

Didn't think of that. So I guess my idea isn't adding too much. Just a lot of compute without so much in return. It still simplifies moderating the files, however.

[0] https://mega.nz/SecurityWhitepaper.pdf
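To illustrate the fragment point from the quoted whitepaper passage: a small sketch that splits a link of the `https://mega.nz/#!handle!key` shape into the part a browser puts in the request and the part that stays client-side. `split_public_link` is a hypothetical helper, the handle and key in the usage note are placeholders, and nothing here decodes or unobfuscates the key.

```python
from urllib.parse import urlsplit


def split_public_link(link):
    """Return (request_target, file_handle, obfuscated_key) for a public link.

    request_target is what would appear in the HTTP request line; the
    handle and key come from the fragment, which is not part of the request.
    """
    parts = urlsplit(link)
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    # The fragment has the shape "!<handle>!<key>", so the first piece is empty.
    _, handle, key = parts.fragment.split("!", 2)
    return request_target, handle, key
```

For a placeholder link such as `https://mega.nz/#!HANDLE!KEY`, the request target is just `/`, so the key never shows up in the URL the server sees; the client has to send the handle separately to fetch the file.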


This would simply mean your server gets pwned at the speed of light. There is no way you could ever keep up to date with every security patch across that many integrations on one public-facing server. The compute would be expensive too, but that would almost be an afterthought to the first problem.


I can copy code that parses the files from these projects without having to execute any programs, i.e. not actually opening any files. Or you could just set up two servers. Server A receives the file from the client and sends it to server B. Server B does not have any network access except to server A. It reads the file and sends back a magic number to A indicating that the file is good, or 0 if it is bad. If server A receives any other response, it wipes server B. This could also just be a VM. And yeah, compute would be pretty high. This is just a way to completely solve the responsibility problem (for one definition of responsibility, that is, which is surely open to debate).
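The reply handling on server A could look roughly like this. It is a sketch only: `ask_server_b` and `wipe_server_b` are hypothetical callables standing in for the real transport and the real reset mechanism, and the magic number is assumed to be a secret shared between the two machines.

```python
BAD = 0  # server B's reply for a file it managed to parse


def server_a_accepts(blob, magic, ask_server_b, wipe_server_b):
    """Forward blob to server B and interpret its reply.

    Returns True only for the agreed magic number. A reply of 0 means the
    file was readable and is refused. Anything else (including an error)
    is treated as a sign that B is compromised, so B is wiped.
    """
    try:
        reply = ask_server_b(blob)
    except Exception:
        reply = None  # a crash or timeout counts as an unexpected reply
    if reply == magic:
        return True
    if reply == BAD:
        return False
    wipe_server_b()
    return False
```

The magic number must never be 0, and one caveat of this design is that malicious code running on B could learn the magic number and keep answering "good", so the wipe only catches compromises that break the protocol.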


If you want to ensure all the foos in a box are bar-ed, why not just bar them before you put them in the box?


Because then I have the decryption key (the baz).


Public key cryptography?



