Malware Hidden Inside JPG EXIF Headers (sucuri.net)
221 points by cubictwo on July 16, 2013 | 61 comments



Although this obfuscation is very clever, I think the article overstates a bit. The author claims that the following two commands are "harmless by themselves":

  $exif = exif_read_data('/homepages/clientsitepath/images/stories/food/bun.jpg');
  preg_replace($exif['Make'],$exif['Model'],'');
I don't agree with that. While the first command is indeed harmless, the second one is executing a REGEX machine with a dynamic regex param. This is almost like having a user-defined format string to printf(), and thus very suspicious and should be found by any decent security analysis tool.

Even without the "/e" feature of preg_replace, this could result at least in a denial of service attack via some specially crafted regex parameters.

So I agree that this is clever, but I don't agree that this call to preg_replace() should normally be considered harmless by itself. So the question remains why the authors overstate the cleverness of this attack, and I think they give the answer in the last sentence of the article:

"Note: Any of Sucuri clients using Server Side Scanning are protected against this type of injection (detected by us)."


I completely agree with your assessment.

I would like to add that there's probably no reason regex would even be needed for this task. str_replace() would be the more appropriate call. In ##php on Freenode we often have to tell people that you don't need to use preg_* functions if you are not using the power of regular expressions; in that case you should be using str_replace() for your operation. Here, str_replace() would not only be more appropriate, it would have helped remove the risk of exploit.
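To make the str_replace() point concrete, here is a minimal sketch. The `$exif` array is a hypothetical stand-in for what exif_read_data() would return, and the values are made up for illustration.

```php
<?php
// Hypothetical EXIF values, standing in for exif_read_data() output.
$exif = ['Make' => 'Nikon', 'Model' => 'Nikon D5000'];

// str_replace() treats its search argument as a literal string:
// no delimiters, no modifiers, nothing that could be evaluated.
$model = trim(str_replace($exif['Make'], '', $exif['Model']));

// A value shaped like a regex with an /e modifier is just text here.
$hostile = str_replace('/.*/e', '', 'phpinfo();');
```

Because the search string is never parsed as a pattern, a "/e" suffix in the data has no special meaning.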

That aside, there have been reports of people exploiting metadata in images for years with PHP and other languages. This is not new in any way.

I'm disappointed that the article seems to stress that JPEGs are somehow inherently insecure when, in fact, it should be stressing that one should always be extremely careful dealing with user-submitted input.

In this example, one could simply replace the image uploading aspect with any sort of submitted data. If you're going to be dynamically passing user input into functions of your application, you should always be certain to appropriately clean and escape those situations. Whether that data is hidden in the EXIF data on an image, is going to be put into a database, or is dynamic regex, a healthy level of distrust for all user submitted data is necessary.


> In this case, str_replace() would not only be more appropriate, it would have helped remove the risk of exploit.

Actually, the exploit was a backdoor planted by the hacker, so the system was already compromised. The recommendation to use str_replace in this instance would be useless, as it appears the victim didn't even put the code there in the first place.


I see. I didn't read it as a back door attack but an injection attack against bad code.


The point of the post seems to be to advertise their product. The comments seem to confirm that.


I think that the main point or cleverness is to avoid an explicit eval ( base64_decode ( blahblah ...) ) in the PHP source code, which might be detected. A preg_replace call is also easy to spot, but may occur naturally in the source.

Constructs using a sequence of eval with base64_decode and often gzip compression are very common for obfuscating malicious PHP code. I would expect that people are much more likely to look for these in the source code to find out if a website has been compromised rather than looking at EXIF data. So I think, yes, this is something different.

In the article they call this a steganographic malware. But actually the information is not embedded in the picture content, but in the meta or EXIF data. It would be an application of steganography, if the malicious code would be extracted from information that is part of the actual picture and it is not obvious that this information is present in the first place. For example, there could be the drawing of a house in the picture with a cow and a horse and the grass blades in front of it represent 1s and 0s. Then some clever program executes the grass blades code.

It seems that the authors were among the first to detect this kind of attack using the EXIF data in the wild. And of course they want to advertise that their product detects this kind of compromise.


Indeed, that's another overstatement in the article:

Calling this kind of embedding "steganographic", although the malicious code appears in clear text within the raw image data.


I think you are overthinking a bit. We do many blog posts per month sharing what we find in the "wild". If you go to the blog, you will see how many we have.

This is just another one that our team found interesting enough to share.


Why do you have to dog on the post? I've run into malware attacks in the past, and none of the things I've seen on the net caution you to look for preg_replace instances gone wrong. eval(), base64_decode() and the like get all the press.


and if you do need a dynamic pattern, this is what preg_quote() is for.

http://php.net/manual/en/function.preg-quote.php
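A minimal sketch of that, assuming "/" as the delimiter. The `$untrusted` value is hypothetical and stands in for any externally supplied string.

```php
<?php
// Hypothetical untrusted value, e.g. read from a form field or an EXIF tag.
$untrusted = 'a.b/e';

// preg_quote() escapes regex metacharacters; the second argument also
// escapes the chosen delimiter, so the input cannot close the pattern
// early and append its own modifiers.
$pattern = '/' . preg_quote($untrusted, '/') . '/';

// Only the literal text "a.b/e" matches; "aXb/e" does not.
$result = preg_replace($pattern, '[match]', 'x a.b/e y aXb/e z');
```

Passing the delimiter as the second argument is the part that is easy to forget; without it a "/" in the input still ends the pattern.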


Important to note the jpg is just one part of the malware. It is harmless by itself. It still requires some other file to actually execute it. The jpg just contains further instructions for the backdoor. The jpg is really just an obfuscator.


Exactly. The attacker-added PHP code to run preg_replace is still there. But it does look quite innocuous! This really points to why when compromised you need to wipe the box and start over from scratch, not assume you can find all the backdoors by auditing the filesystem.


From PHP documentation (http://php.net/manual/en/function.preg-replace.php)

    5.5.0  The /e modifier is deprecated.  Use preg_replace_callback() instead.
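For anyone migrating, a small sketch of the callback form the changelog points to. The sample text and the capitalisation task are made up for illustration.

```php
<?php
// Old style (deprecated in 5.5 per the changelog quoted above):
//   preg_replace('/\b[a-z]/e', 'strtoupper("$0")', $text);
// Callback style: the replacement logic is ordinary PHP code, so no
// string is ever evaluated.
$text = 'hello exif world';
$result = preg_replace_callback(
    '/\b[a-z]/',
    function (array $m) {
        // $m[0] holds the full text of the current match.
        return strtoupper($m[0]);
    },
    $text
);
```

The pattern can no longer switch evaluation on, because the code to run is fixed in the source rather than carried in a string.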


Indeed. Yet it appears we are afflicted with lazy malware authors who continue to use deprecated APIs instead of updating their exploits.

To be serious: According to the article, the call to preg_replace() was part of the backdoor added by the attackers, it wasn't a pre-existing hole in the site code.


From the perspective of an attacker it doesn't really matter if the malicious code contains deprecated things or lacks elegance or is generally ugly.


You can configure your server to log usage of deprecated features in PHP so that the attack would ultimately appear in the log. Admittedly, it would still take a pretty vigilant Sys Admin to catch it.


True, but most value hosting platforms run anything between 5.2.x and 5.4.x. with no option to upgrade.


Read: deprecated. Stop acting like it's removed in 5.5.


If I'm reading this correctly, someone compromised a site (by other means) and then added the exif_read_data() and preg_replace() lines to the code somewhere as a back door?

If a site was compromised, wouldn't any modified files be replaced with canonical source? Or do people manually scan through files looking for code that looks suspicious?

Or am I misunderstanding and this site for some reason legitimately called preg_replace() with exif make/model parameters (which seems pretty unlikely to do anything useful)?


I suspect it's a way of hiding backdoors in "free" themes/plugins. It's a technique that's not the usual obvious eval(base64decode("0xabad1dea")) that people might scan for (either by eye or with automated tools).

That means this exploit might well be _in_ the "canonical source" - where the user doesn't suspect the "Super awesome social shopping comparison plugin" they downloaded is likely to be trojaned.

(I wonder if anyone's running a scan over the WordPress.org plugins/themes libraries looking for unexplainable exif_read_data() and preg_replace() calls?)
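A rough sketch of what such a scan might look like. `find_suspicious_lines()` is a hypothetical helper, not an existing tool, and the heuristic (any exif_read_data() call, or preg_replace() with a variable as the pattern) is an assumption that would produce both false positives and misses on real code.

```php
<?php
/**
 * Hypothetical helper: returns the 1-based line numbers of $source that
 * call exif_read_data(), or call preg_replace() with a variable as the
 * first (pattern) argument.
 */
function find_suspicious_lines(string $source): array
{
    $hits = [];
    foreach (preg_split('/\R/', $source) as $i => $line) {
        if (preg_match('/\bexif_read_data\s*\(/', $line)
            || preg_match('/\bpreg_replace\s*\(\s*\$/', $line)) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}

// Sample input: one ordinary call, then the two lines from the article
// (with a shortened, made-up image path).
$sample = <<<'CODE'
$title = preg_replace('/\s+/', ' ', $title);
$exif = exif_read_data('images/bun.jpg');
preg_replace($exif['Make'], $exif['Model'], '');
CODE;

$flagged = find_suspicious_lines($sample);
```

The literal-pattern call on the first sample line is left alone, while the two backdoor lines are flagged for a human to review.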


The code in question seems to be removing any occurrences of the Make from the Model. This might be useful if you have a camera that sets the Model to contain the Make name, like setting Make="Nikon" and Model="Nikon D5000". If we want to know the actual model number, then we have to remove the Make from the Model, giving us " D5000". Using preg_replace() might be just as good as anything to do this.


No, the signature for preg_replace is ( $pattern , $replacement , $subject) . So, given the call as preg_replace($exif['Make'],$exif['Model'],'') it would replace any occurrences of Make with Model within the empty string ''. This is essentially a noop unless your exif data contains exploit code.
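A quick sketch of the difference, using made-up values and assuming Make holds a well-formed, harmless pattern:

```php
<?php
// Argument order is preg_replace($pattern, $replacement, $subject).
$make  = '/Nikon/';       // hypothetical, well-formed pattern
$model = 'Nikon D5000';   // hypothetical Model value

// As written in the article: the subject is '', so nothing can match.
$as_written = preg_replace($make, $model, '');

// With the arguments in the order the parent comment assumed.
$reordered = preg_replace($make, '', $model);
```

The first call returns the empty string untouched; only the second produces the " D5000" result described in the parent comment.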


In that case I agree that it would never be used as it was written, and suggest that the author must have jumbled the parameters. Calling the method as preg_replace($exif['Make'], '', $exif['Model']) would correct the behavior to something useful, while keeping the vulnerability in place.


Eval in replace function... But why?!


Because PHP.


You mean Perl, right? Because that's what PCRE is, Perl Compatible Regular Expressions, and the e modifier does in fact come from Perl.


In Perl wouldn't the /e modifier live outside the replacement string, as part of the literal regexp, and therefore out of harm's way? The issue here is the "/e" modifier in PHP can be injected into a context where the programmer didn't expect it. The preg_replace function should take the eval modifier as a flag parameter.

Perl does have a ?{} eval expression, but it has to be enabled with an extra 'use' directive. The documentation also suggests enabling taint checking or "constrained evaluation within a Safe compartment".

Contrast this to PHP, where /e appears to be on by default and there's no mention of it whatsoever in the documentation, except in the change log to say it's deprecated in 5.5.

Also PCRE, which PHP uses, is a C library... so I doubt it has any eval functionality. PHP implements this on top of PCRE to mimic Perl.

No I'm afraid it's another shoddy PHP design decision to blame here.


Cute rant. But PHP uses /e in exactly the same way: The /e is part of the literal regexp.

> Contrast this to PHP, where /e appears to be on by default

No, /e is not enabled by default.

> and there's no mention of it whatsoever in the documentation

Yah, that's because it's not on by default. Do you expect them to document every idea you make up?

> is a C library... so I doubt it has any eval functionality

pcre_callout()

> No I'm afraid it's another shoddy PHP design decision to blame here.

No I'm afraid it's just another ignorant PHP basher to blame here. Do you just like to make stuff up about PHP, then bash it?

PS. To anyone reading this: Don't use /e with untrusted input, it's not safe. But if you must assign blame, then blame Perl for it, not PHP.


  | Don't use /e with untrusted input
This isn't a case of someone running:

  preg_replace($VAR1 . "/e", $VAR2, '');
No one passed untrusted input to '/e.' It's a case of untrusted input being passed to preg_replace() in an insecure way, which allowed '/e' to be enabled. All of this is sort of irrelevant anyways. The developers did not add the preg_replace() function, the attacker did so as a way to eval code without directly calling eval() (which would be easy to spot in the code, since it should so rarely be called in practice).


> Cute rant.

It wasn't a rant.

> But PHP uses /e in exactly the same way: The /e is part of the literal regexp.

PHP doesn't have literal regexps as part of the language. The regexps in PHP are string arguments to the preg_* family of functions. Perl has literal regexps, e.g. $var =~ s/$pattern/$replacement/e. If you inject "/e" into the $replacement variable, what happens? I don't know, my Perl is rusty, but I'm thinking that that would be either an unevaluated context, or a syntax error.

> No, /e is not enabled by default.

Only as of 5.5. That doesn't make the decision to enable it by default in the past excusable.

> pcre_callout()

This doesn't look like it has anything to do with /e or code evaluation. It's a callback mechanism that users of the library can use. How it's used is irrelevant, the topic at hand is PHP's behaviour.

> To anyone reading this: Don't use /e with untrusted input, it's not safe.

You missed the point. Programmers can use preg_replace poorly and not know that /e even exists. That was one point in the article: that this little function has a surprising and potentially dangerous feature. Just saying "oh well, don't use untrusted input" is naive.


> PHP doesn't have literal regexps as part of the language.

It doesn't really change anything. Instead of using / to delimit the parts each part is sent as an argument to the function.

> $var =~ s/$pattern/$replacement/e. If you inject "/e" in to $replacement variable, what happens? I don't know

In PHP that regex would look like preg_replace("/$pattern/e", $replacement), so injecting a /e in the replacement doesn't do anything - it has to be in the pattern not in the replacement.

> Only as of 5.5. That doesn't make the decision to enable it by default in the past excusable.

No, it was never enabled by default. Why do you think it was enabled by default? That wouldn't even make sense.

> It's a callback mechanism that users of the library can use. How it's used is irrelevant, the topic at hand is PHP's behaviour.

How it's used is to - drumroll - enable /e - the callback runs the PHP function that executes the code whenever it sees a place where the /e should activate. i.e. /e is certainly part of the preg C library.


> Contrast this to PHP, where /e appears to be on by default and there's no mention of it whatsoever in the documentation, except in the change log to say it's deprecated in 5.5.

The preg_replace documentation says "Several PCRE modifiers are also available, including 'e' (PREG_REPLACE_EVAL), which is specific to this function" with a link to the PCRE modifiers page: http://de2.php.net/manual/en/reference.pcre.pattern.modifier.... This page has a bunch of warnings on the /e modifier :)

Generally putting user input into any regular expression function without running preg_quote over it first is a bad idea. Not just because of /e, but also various other issues, e.g. causing pathologically slow matches (DOS) or segfaults by deep recursion (DOS and maybe security relevant).


> No I'm afraid it's another shoddy PHP design decision to blame here.

Please keep in mind the context of the thread. The argument wasn't the implementation, but the decision to include eval functionality in replacement functionality. Something that exists in PCRE because of Perl. So unless you want to suggest PCRE doesn't have an /e modifier that performs an eval, you aren't really adding anything to the discussion.



Yes, Python has eval, but it doesn't hide in the bushes and ambush you. The same can not be said for preg_replace and the /e option.


Just curious, how many points does your comment have right now even though it's completely wrong? I'm not insulting you for making a tiny mistake, I'm wondering why a 7 hour old post with multiple 6 hour old corrections attached to it still has a positive score.


because eval is the real source of the problem


Would you remove the ability to load libraries, too? Because that's as dangerous as eval when it comes to purposely writing code to run external commands. The ability of a programming language to run code is not the cause of the problem, it's having domain-breaking misleading functions like a string replacer that can compile and execute.


yeah because libraries, which are available for scrutiny by a community (and you), are the same as a function that can run any arbitrary code in your program at runtime.


If you can load libraries at runtime, you can load a secret malicious library, or replace a standard library with arbitrary code before triggering the loading.

So would you remove that ability to avoid its exploitation potential by malware?



note: “... in replace function”


He's talking about the preg_replace /e modifier not eval, the function.


This isn't new, and it certainly isn't specific to JPGs. It's quite frequent for people to embed malicious code into files masquerading as images. There have also been instances where hacked WordPress instances[1] hide code inside images, so that it isn't so obviously found and is ready for execution at a later date.

The most common way to exploit this, however, is by uploading an image with a valid magic number (e.g. GIF89a), which will pass any MIME type checks, and then finding a way to rename that file to one with an executable extension, or finding a way to include it through an LFI vuln.

A simple solution is to simply load the image up in ImageMagick (or GraphicsMagick, GD) and re-write the image to disk.

Disclaimer: Former lead developer of an image hosting service

[1] https://media.blackhat.com/bh-eu-12/Be'ery/bh-eu-12-Be'ery-F...


This is not a bug in PHP that is being exploited. The server has already been compromised. The code looks harmless enough for most people to ignore it; that is the point here.

It's a clever obfuscation


Seems like it was possible to hide it so well due to an oversight in the API for preg_replace. In-band signaling in general makes it easy for people to accidentally add security holes. What they should have done is pass the regex options as a separate parameter, then it would have been obvious that something fishy was going on here.


If I understand this correctly, there is PHP code (preg_replace) in the jpg exif header filled into the preg_replace function in the executable. The executable looks clean that way. The executable finds the jpg via hard-coded file path.

Seems possible that the malicious code could appear in an image itself and be flagged with a simple but distinct header. The executable could just scan display memory for the simple header and execute subsequent code when header is found. Then when a user browses the image on a website, the code is executed without hard-coding the location.

Pure speculation. Such an executable may be too visible. In any case it seems the essential problem of the preg_replace or some kind of PHP executable command would still be the red flag to find such malware.


Interesting... but clueful site owners already strip all EXIF headers anyway, for performance reasons. Options for doing this abound (e.g. mod_pagespeed, grunt imagemin, http://smush.it, etc).


How about tricking the server into generating the payload like this?

https://www.idontplaydarts.com/2012/06/encoding-web-shells-i...


Ingress wasn't through EXIF headers, obfuscation is the goal. If this post affects your environment, you've already been compromised, and the hackers are merely attempting to avoid detection by hiding in the EXIF headers.

This file wasn't uploaded through any legitimate means, it was replaced with malicious intent.


Wow, this would bypass the execution constraints we put on the upload directories. Usually I make an uploads folder rw-rw-rw (0666); it would take another attack in conjunction with it, but that's really something.


Only if someone modifies an executable file to call preg_replace with exif headers.


So the real hole is in `preg_replace` which can execute arbitrary strings on the server if the /e flag is found? Why does this exist?

Anyhow, this seems to be a genius exploit! If a site lets users upload photos and they use this function, a user could do this.

Basically a good framework should have 'safe' functions that are designed to accept 'user input', i.e. arbitrary strings, and never set up a situation where these strings are executed.


It isn't an exploit, but rather a code obfuscation technique.

The attacker compromised the site first (the exploit used isn't disclosed in the article), and then tried to conceal the malware they installed on the compromised system by putting most of the malware in an EXIF header.

The malware code used a hardcoded image path, so for someone else to replace the code to be executed, they would need to have access to replace the particular image file that contains the malware.

One interesting aspect is that the malware images are public, and probably already referenced from the site (since the attacker used an existing image), so it might be possible to write a spider (or something based on Common Crawl perhaps) that finds many compromised sites.


Uh what? If a site lets users upload photos and puts their uploaded content into a preg function without validation, they are indeed vulnerable. However, just loading the exif data is not enough to trigger this.

Putting untrusted input into a regular expression pattern is something you shouldn't do in the first place (even without the /e modifier)


Surely any file format with metadata can contain malware if there is a bug in the program reading it...


i don't see how this is anything more than yet another lesson in why you should escape strings from external data sources unless you have a very good reason not to... interesting example though.



nice post!


Great discovery!


As malicious as this may be it is pure genius. It should include a message as such:

Elk Cloner: The program with a personality
It will get on all your disks
It will infiltrate your chips
Yes, it's Cloner!
It will stick to you like glue
It will modify RAM too
Send in the Cloner!



