I have an idea for a filesharing site where the files must be encrypted. The server tries to parse each file using ffmpeg, ghostscript, libreoffice, etc., and if it can read it, the file is rejected. Similarly, if anyone can provide a decryption key for a file's md5, the file is deleted. The hash of every file will be public, and known illegal md5s are blocked/reported. It doesn't need to be md5 specifically. This completely solves all moderation issues, with the caveat that anyone who can open a file can also delete it. Not sure how it could make money either.
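A minimal sketch of the public-hash and blocklist step, assuming SHA-256 (since the hash doesn't need to be md5) and an in-memory set of hex digests as the blocklist. The function names are hypothetical. Hashing in chunks means a large upload never has to sit in memory all at once.

```python
import hashlib
from typing import Iterable, Optional


def public_digest(chunks: Iterable[bytes]) -> str:
    """Hash an upload incrementally and return the hex digest to publish."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def admit_upload(chunks: Iterable[bytes], blocklist: set[str]) -> Optional[str]:
    """Return the public digest, or None if the digest is on the blocklist."""
    digest = public_digest(chunks)
    if digest in blocklist:
        return None  # blocked (and, per the proposal, reported)
    return digest
```

Note that an exact-digest lookup like this only matches byte-identical uploads.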
You can at best save log2(n) - 1.44 bits per entry (that is, log2(n) - log2(e)) using smarter encoding compared to a naive list of truncated hashes. That's perhaps a factor of 2 to 3, depending on list size and acceptable false positive rate. For example, if you have a list of a billion entries and accept a one-in-a-million false positive rate, the naive list needs about 30 + 20 = 50 bits per entry, while an ideal encoding needs about 21.4 bits per entry, a 57% reduction.
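The numbers above can be reproduced with a short calculation. This is a sketch; the function name is hypothetical. The "ideal" figure treats the truncated hashes as an unordered set, whose information content is roughly log2(1/fp_rate) + log2(e) bits per entry.

```python
import math


def bits_per_entry(n: int, fp_rate: float) -> tuple[float, float]:
    """Bits per entry for a list of n hashes with false-positive rate fp_rate.

    naive: each entry is a hash truncated to log2(n / fp_rate) bits, so a random
           non-member matches one of the n entries with probability ~fp_rate.
    ideal: the same hashes stored as an unordered set; log2(C(U, n)) with
           U = n / fp_rate is ~ n * (log2(1 / fp_rate) + log2(e)) bits.
    """
    fp_bits = math.log2(1 / fp_rate)
    naive = math.log2(n) + fp_bits
    ideal = fp_bits + math.log2(math.e)
    return naive, ideal


naive, ideal = bits_per_entry(10**9, 1e-6)
print(f"{naive:.1f} {ideal:.1f} {1 - ideal / naive:.0%}")  # 49.8 21.4 57%
```

The difference naive - ideal is log2(n) - log2(e), independent of the false positive rate.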
So I don't think Bloom filters have a significant impact on the manageability of those lists. In any case, I doubt storage size will be the main concern compared to the effort of adding entries to the list.
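For comparison, here is a minimal Bloom filter sketch using the standard sizing formulas (m = -n·ln(p)/(ln 2)², k = (m/n)·ln 2) and double hashing over one SHA-256 digest. The class is illustrative, not a production implementation. At a one-in-a-million false positive rate it costs about 1.44·log2(1/p) ≈ 28.8 bits per entry, which sits between the ~21.4-bit ideal and the ~50-bit naive list mentioned earlier.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter sized for n items at false-positive rate fp_rate."""

    def __init__(self, n: int, fp_rate: float):
        # Optimal bit count and number of hash functions.
        self.m = math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k positions from two 64-bit values of one digest.
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


bf = BloomFilter(1000, 1e-6)
print(bf.m / 1000, bf.k)  # bits per entry and number of hash functions
```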
This is enough information to find the file identified by its File Handle on the server, then download it, verify the Condensed MAC of the overall file, unobfuscate the File Key, then decrypt the file using the File Key and IV.
It should be noted that everything after an anchor hash (#) in the URL is not sent to the MEGA servers and is kept locally in the client’s browser.[0]
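A sketch of the client-side part of that flow, using only the standard library. The example URL is hypothetical, and the key layout in `unobfuscate_file_key` is an assumption based on public descriptions of MEGA's client (eight 32-bit words, AES key = XOR of the two halves, nonce and expected MAC taken from the second half), not something stated in the quoted text. AES-CTR decryption and the Condensed MAC check need a crypto library and are left out.

```python
import base64
import struct
from urllib.parse import urlsplit


def split_share_link(url: str) -> tuple[str, str]:
    """Return (request_target, fragment) for a share link.

    The browser sends only the path and query in the HTTP request;
    the fragment after '#' stays on the client.
    """
    parts = urlsplit(url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    return target, parts.fragment


def b64url_decode(s: str) -> bytes:
    """Decode unpadded base64url, as commonly used in URL fragments."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def unobfuscate_file_key(obfuscated: bytes) -> tuple[bytes, bytes, bytes]:
    """Split an assumed 256-bit obfuscated key into (file_key, iv, condensed_mac).

    Assumed layout: eight big-endian 32-bit words w0..w7.
    file_key = (w0^w4, w1^w5, w2^w6, w3^w7); IV = (w4, w5) plus a zero
    counter; expected Condensed MAC = (w6, w7).
    """
    if len(obfuscated) != 32:
        raise ValueError("expected a 32-byte obfuscated key")
    w = struct.unpack(">8I", obfuscated)
    file_key = struct.pack(">4I", *(w[i] ^ w[i + 4] for i in range(4)))
    iv = struct.pack(">4I", w[4], w[5], 0, 0)
    condensed_mac = struct.pack(">2I", w[6], w[7])
    return file_key, iv, condensed_mac
```

In use, the fragment from `split_share_link` would go through `b64url_decode` and then `unobfuscate_file_key`, so the server only ever sees the request target.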
Didn't think of that. So I guess my idea isn't adding too much, just a lot of compute without much in return. It still simplifies moderating the files, however.
This would simply mean your server gets pwned at the speed of light. There is no way you could ever keep up to date with every security patch across that many integrations on one public-facing server. The compute would be expensive too, but that would almost be an afterthought to the first problem.
I could copy the file-parsing code from these projects, so no external programs need to be executed, i.e., the files are never actually opened in those applications. Or you could just set up two servers. Server A receives the file from the client and sends it to server B. Server B has no network access except to server A. It reads the file and sends back a magic number to A if the file is good, or 0 if it is bad. If server A receives any other response, it wipes server B. Server B could also just be a VM. And yeah, compute would be pretty high. This is just a way to completely solve the responsibility problem (for one definition of responsibility, that is, which is surely open to debate).
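A minimal sketch of server A's decision rule. A local subprocess stands in for server B or the VM; it provides no isolation by itself and only illustrates how A might interpret B's reply. The magic value, the single zero byte for "bad", the timeout, and treating a crash or timeout as "any other response" are all hypothetical choices.

```python
import subprocess
import sys

# Hypothetical values agreed between A and B at setup.
MAGIC_OK = b"B-OK-7f3a"   # "file is good" (could not be parsed)
MAGIC_BAD = b"\x00"       # "file is bad" (B could parse it)


def classify_response(response: bytes) -> str:
    """Map B's raw reply to an action for A."""
    if response == MAGIC_OK:
        return "accept"
    if response == MAGIC_BAD:
        return "reject"
    return "wipe"  # anything else: assume B is compromised


def check_in_sandbox(cmd: list[str], data: bytes, timeout: float = 30.0) -> str:
    """Send the upload to the checker on stdin and classify its stdout."""
    try:
        proc = subprocess.run(cmd, input=data, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "wipe"
    if proc.returncode != 0:
        return "wipe"  # a crash also counts as "any other response"
    return classify_response(proc.stdout)


if __name__ == "__main__":
    # Stand-in for server B: a child process that always answers "good".
    fake_b = [sys.executable, "-c",
              "import sys; sys.stdin.buffer.read(); sys.stdout.buffer.write(b'B-OK-7f3a')"]
    print(check_in_sandbox(fake_b, b"\x8f\x02\x11"))
```

Keeping the rule as a whitelist of exactly two replies means any unexpected output fails toward wiping B.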