How hard can it really be to stop this technically?

Kristine1975 · on March 25, 2016

The pirates (arrr!) are essentially using steganography when they hide videos in JPEG images or PDF files. Steganography can (sometimes, depending on the algorithm used) be detected, but in the end it's an arms race between one side using more sophisticated steganography and the other side using more sophisticated detection mechanisms.
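To make "steganography" concrete, here is a minimal sketch of one classic approach, least-significant-bit embedding. It assumes the cover data is a run of raw 8-bit samples (for example, uncompressed pixel values); the helper names `embed_lsb` and `extract_lsb` are made up for illustration, and nothing here claims to be what the uploaders actually did.

```python
def embed_lsb(samples: bytes, message: bytes) -> bytes:
    """Hide message bits in the lowest bit of each sample (hypothetical helper)."""
    # Flatten the message into bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover data too small for message")
    out = bytearray(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the message bit.
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_lsb(samples: bytes, length: int) -> bytes:
    """Recover `length` bytes from the lowest bits of the samples."""
    result = bytearray()
    for n in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (samples[n * 8 + bit_index] & 1)
        result.append(value)
    return bytes(result)
```

Each sample changes by at most 1, so the cover still looks like ordinary image data. That is also why capacity is low: one hidden byte costs eight cover bytes, which makes this a poor fit for smuggling whole videos.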

benplumley · on March 25, 2016

I'd be very surprised if that were the case. The impression I got from the article was that they were just changing the extension on video files so that they could be uploaded as PDFs. For it to be steganography, the PDF containing the video would still have to be readable as a PDF, which is vastly more work and takes much more technical know-how (and the ability to download a decryption program) than just renaming.

cooper12 · on March 25, 2016

Looks like MediaWiki (the software behind Wikipedia) verifies that the MIME type matches the extension: "MediaWiki tries to detect the MIME type of the files you upload, and rejects the file if the file-extension does not match the mime type" (https://www.mediawiki.org/wiki/Manual:MIME_type_detection). However, MIME types can be spoofed... (http://security.stackexchange.com/questions/35933/how-can-i-...)
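A rough sketch of what "detect the type and compare it with the extension" can look like. This is not MediaWiki's actual code: the signature table, the extension allow-list, and the function names are all assumptions made for illustration.

```python
import os

# Leading "magic" bytes for a few formats; a tiny illustrative table,
# not MediaWiki's actual detection logic.
MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
}

# Extensions accepted for each detected type (hypothetical allow-list).
EXTENSIONS = {
    "image/jpeg": {".jpg", ".jpeg"},
    "image/gif": {".gif"},
    "application/pdf": {".pdf"},
    "image/png": {".png"},
}


def sniff_mime(head: bytes):
    """Return a MIME type guessed from the first bytes, or None if unknown."""
    for magic, mime in MAGIC.items():
        if head.startswith(magic):
            return mime
    return None


def extension_matches(filename: str, head: bytes) -> bool:
    """True if the file's leading bytes agree with its extension."""
    mime = sniff_mime(head)
    ext = os.path.splitext(filename)[1].lower()
    return mime is not None and ext in EXTENSIONS[mime]
```

A video that was merely renamed to `.pdf` fails this check because its first bytes are not `%PDF-`. A file that starts with a valid image header and carries other data afterwards still passes, which is where the spoofing concern comes from.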

hobs · on March 25, 2016

Depends on the file type. A few formats ignore data in certain areas of the file, so if you change the extension they should just work.

Pretty sure that with GIFs and ZIPs you can just combine them with one append command and you are done.

Kristine1975 · on March 25, 2016

ZIP can contain arbitrary data before the archive data itself. That's an intended feature that's used e.g. for self-extracting archives. They look like this on disk:

  +-----------+-------------+
  | extractor | zip archive |
  +-----------+-------------+

The extractor, when run, simply opens itself as a ZIP archive and decompresses itself.

So you could append the ZIP archive to an image file and then decompress the result without having to remove the image beforehand.

But you cannot concatenate several ZIP archives and expect a working bigger archive.
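A small sketch of the image-plus-ZIP case, using Python's standard `zipfile` module. The "image" here is only a GIF signature followed by filler bytes, not a real picture, and `has_zip_trailer` is a made-up helper showing one crude way a server might flag such files.

```python
import io
import zipfile

# Build a small ZIP archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hidden payload")
archive = zip_buf.getvalue()

# Stand-in for an image: a GIF signature plus filler, not a valid picture.
fake_image = b"GIF89a" + b"\x00" * 64

# One append: image bytes first, archive bytes after.
combined = fake_image + archive

# The combined bytes still open as a ZIP, because the reader locates the
# central directory from the end of the file and tolerates leading data.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    payload = zf.read("hello.txt")


def has_zip_trailer(data: bytes) -> bool:
    """Crude check: does a ZIP end-of-central-directory signature appear?"""
    return b"PK\x05\x06" in data
```

The signature scan can produce false positives, since those four bytes may occur by chance in compressed image data, so treat it as a hint for a closer look rather than grounds for rejecting a file.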

robbiep · on March 25, 2016

How many legitimate uses can there be for >300 MB JPEGs and GIFs?

Kristine1975 · on March 25, 2016

Just split the files into a lot of small parts like in the good old Usenet/Hotline days. Then upload your "collection of holiday pictures".

cooper12 · on March 25, 2016

One thing that's happened is that Wikimedia Commons is using IP range blocks for the country (meaning users will have to request exempt accounts). [0] However, users can also upload files to their local Wikipedias, and the Portuguese Wikipedia couldn't really range-block a whole country of its participants. That's what this article and the mailing list [1] are having trouble with.

Wikipedia does have a permissions system, so that's the likely avenue they might pursue. For example, on the English Wikipedia, users have to be "autoconfirmed" to create pages or upload files. However, that status is very easy to attain: an account just has to be older than 4 days and have made at least 10 edits. Maybe they'll create a stricter group for uploading files (though who knows, maybe they'll start encoding the files as text comments in articles). Another possibility is making each upload subject to review, similar to a pending changes system, but that has its own problem in having enough reviewers. [2]

When I looked at the mailing list discussion when it was just starting, the foundation was suggesting user education over blanket banning policies. It's certainly not a problem with an easy solution, and in my opinion it's much more of a people problem; throwing tech at it might exacerbate matters. Not to mention that any potential solutions will also make it harder for new would-be editors, making it a tragedy of the commons (where I'm referring to the open nature of Wikipedia rather than the free data). [3]

[0]: https://commons.wikimedia.org/w/index.php?title=User_talk:St...

[1]: https://lists.wikimedia.org/pipermail/wikimedia-l/2016-March...

[2]: https://en.wikipedia.org/wiki/Wikipedia:Pending_changes

[3]: https://en.wikipedia.org/wiki/Tragedy_of_the_commons

Sidenote: Wikipedia and Commons already have a concept of "patrolling", which puts up every new article or file to be patrolled by another editor for spam/copyright violations. That's likely how these bad uploads were caught. Unfortunately, this doesn't stop the upload in the first place, and it requires work to look at and delete the files, which is hard to keep up with. (Not to mention that not enough people patrol pages either. On the English Wikipedia, nearly every page I've had patrolled was by one user called SisterTwister. I was also given autopatrolled rights on Commons even though I'm hardly a prolific uploader.)
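For a sense of how low the "autoconfirmed" bar mentioned above is, here is the rule as a tiny sketch. The thresholds (older than 4 days, at least 10 edits) come from the comment itself; the function name and signature are hypothetical and do not reflect how MediaWiki implements the check.

```python
from datetime import datetime, timedelta

# Thresholds as stated in the comment for the English Wikipedia.
MIN_ACCOUNT_AGE = timedelta(days=4)
MIN_EDITS = 10


def is_autoconfirmed(registered: datetime, edit_count: int, now: datetime) -> bool:
    """Hypothetical check: account older than 4 days with at least 10 edits."""
    return (now - registered) > MIN_ACCOUNT_AGE and edit_count >= MIN_EDITS
```

A stricter upload group would amount to raising these two constants or adding another condition, which is also what makes it harder on legitimate newcomers.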