How about instead of talking about whether Wikileaks is good or bad or whether y...

openasocket · on March 31, 2017

Self-modifying the underlying machine code isn't what it used to be. Besides the difficulty of writing it, there are lots of caveats about how it interacts with the cache and the instruction pipeline. It also requires setup, because with modern memory protection all the machine code is read-only. Changing the memory protection for some machine code to be executable and writable at once will set off some alarms (and isn't even possible on systems with W^X). So you need to change it to just writable, make your modifications, then change it back to just executable, which is less suspicious because it looks like what JIT compilers do. But all in all, self-modifying code doesn't really give you anything.
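A minimal sketch of that writable-then-executable sequence, assuming Linux (or another Unix whose libc exposes `mprotect`) and Python's `ctypes`. The helper names `set_protection`, `stage_code`, and `patch_code` are made up for illustration, and the bytes written (`0xC3` is an x86 `ret`, `0x90` a `nop`) are placeholders that are never executed:

```python
import ctypes
import mmap
import os

# Assumption: a Unix-like system where libc exposes mprotect(2).
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int

RW = mmap.PROT_READ | mmap.PROT_WRITE  # writable, never executable
RX = mmap.PROT_READ | mmap.PROT_EXEC   # executable, never writable


def set_protection(addr, size, prot):
    """Change the protection of a page-aligned region, raising on failure."""
    if libc.mprotect(addr, size, prot) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def stage_code(payload):
    """Map one anonymous page as RW, copy the bytes in, then flip it to RX."""
    size = mmap.PAGESIZE
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, prot=RW)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    buf[:len(payload)] = payload
    set_protection(addr, size, RX)
    return buf, addr


def patch_code(buf, addr, offset, new_bytes):
    """Modify staged bytes: RX -> RW, write, RW -> RX. Never W and X at once."""
    set_protection(addr, len(buf), RW)
    buf[offset:offset + len(new_bytes)] = new_bytes
    set_protection(addr, len(buf), RX)


region, address = stage_code(b"\xc3")
patch_code(region, address, 0, b"\x90\xc3")
```

At no point is the page both writable and executable, which is the same pattern a JIT compiler follows on a W^X system. Actually jumping into the region is left out on purpose, since that part is architecture-specific.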

The exception to that is packers and other obfuscation techniques, which are related to self-modifying code. The general idea with these is that you take your real program and compress/encrypt/mangle/etc it and store that data in an executable. The code in that executable de-compresses/decrypts/demangles that data, sets it as executable, and then runs it. Unlike traditional self-modifying code, packing is orders of magnitude easier to write for the malware developer. The advantage here is that an antivirus tool can't determine what your real program does statically unless it understands how you mangled it, which is hard to do in general. To "unpack" an executable you've got three general techniques:

1. Packers tend to get reused a lot, so just have a person write an unpacker for popular packers by hand, and do some pattern matching to figure out which packer an executable is using. This doesn't work for everything, but it's fairly simple.

2. Dynamic Analysis. Run the executable and watch the contents of memory as the program unpacks itself; the real program should pop right out. Of course you have to run the executable in some sort of sandbox environment, and there are ways for the malware to detect that and alter its behavior. This also isn't the most efficient process, so you can't really do this to executables during, say, an antivirus scan.

3. Symbolic Analysis. Basically static analysis on steroids to figure out what the executable will do without actually running it. The malware can't stop this with sandbox detection. But it's super slow and is still an active area of research.
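To make the packing idea concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: `zlib` plus a single-byte XOR stands in for whatever compression/encryption a real packer uses, `exec` stands in for "mark the memory executable and jump to it", and every name (`pack`, `unpack`, `run_packed`, `static_scan`, `KEY`, `REAL_PROGRAM`) is hypothetical.

```python
import zlib

KEY = 0x5A  # hypothetical single-byte key; real packers use stronger schemes


def pack(source, key=KEY):
    """Compress the real program, then XOR every byte so it looks like noise."""
    compressed = zlib.compress(source.encode("utf-8"))
    return bytes(b ^ key for b in compressed)


def unpack(blob, key=KEY):
    """What the stub does at startup: undo the XOR, then decompress."""
    compressed = bytes(b ^ key for b in blob)
    return zlib.decompress(compressed).decode("utf-8")


def run_packed(blob, key=KEY):
    """Unpack and run the real program, returning the namespace it defined."""
    namespace = {}
    exec(unpack(blob, key), namespace)
    return namespace


def static_scan(data, signature):
    """A naive 'antivirus' check: look for a known byte string."""
    return signature in data


REAL_PROGRAM = 'MARKER = "real payload string"\ndef payload():\n    return 6 * 7\n'
SIGNATURE = b"real payload string"

packed = pack(REAL_PROGRAM)
found_before = static_scan(packed, SIGNATURE)                          # scan of the packed blob
found_after = static_scan(unpack(packed).encode("utf-8"), SIGNATURE)   # scan after unpacking
```

The naive scan misses the signature in the packed blob but finds it once the blob has been unpacked. That is roughly why technique 1 needs a matching unpacker and technique 2 waits for the program to unpack itself in memory.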

canada_dry · on April 1, 2017

> Dynamic Analysis. Run the executable and watch the contents of memory as the program unpacks itself

Of course nowadays the makers of fine malware detect whether they are running inside a sandbox, and won't activate.

cormacrelf · on April 1, 2017

You can always run it on a real, unimportant machine not connected to anything. (And never connect that machine to anything ever again.) That feature just makes it slightly more difficult and costly to compromise program security.

asimpletune · on April 1, 2017

Wow, very interesting. Thanks!

DBNO · on April 1, 2017

Edit: part of my comment is corrected by comment below - Thanks openasocket!

Another comment about the content of this article:

Three quarters of the way down the wiki page there is code for "adding foreign language" to the code. The options are to add code comments in Arabic/Chinese/Russian/Korean/Farsi. My gut reaction is that the purpose of this added language is to obfuscate the true source of the code, i.e. the code has Chinese comments in it so it must be from China. Ahh. I guess this makes sense to do. The only problem now is that the Chinese/Russian/Farsi/etc. characters that they included in their code are now public. (Obviously the CIA will now change the foreign language words they insert.)

I'd posit that if someone had an X-year-old (e.g. X=7) copy of some malware, and the malware had these specific foreign language comments as shown by the article, there's a good possibility the malware came from the US government.

openasocket · on April 1, 2017

This is for obfuscating string constants; the foreign languages included are a red herring. The reason for this is that nontrivial code often has string constants in it, and the string contents are stored in the ELF/PE file in a manner that makes them trivial to extract. Since these strings often reveal a lot about the malware (e.g. a string constant "Your computer has been infected with ransomware. Please deposit %d bitcoins to address %s"), antivirus signatures often use them to detect specific kinds of malware, and reverse engineers find them useful in determining what a binary does. This framework scrambles the string contents (using techniques like XOR-ing every character against a random key), and injects some code into the executable so that the strings are unscrambled on startup. They just have foreign languages in the example to demonstrate that this framework correctly handles Unicode.
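A rough sketch of the XOR idea in Python, not the framework's actual code. Assumptions: the XOR is applied to the UTF-8 bytes of each string with a repeating multi-byte key, the names `scramble`, `unscramble`, and `DEMO_KEY` are hypothetical, and the sample strings are ordinary greetings chosen here, not the ones from the leaked page.

```python
import os


def scramble(text, key):
    """XOR the UTF-8 bytes of text against a repeating key (build time)."""
    data = text.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def unscramble(blob, key):
    """Reverse the XOR and decode; this is what injected startup code would do."""
    data = bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))
    return data.decode("utf-8")


def random_key(length=8):
    """A fresh random key per build, so scrambled bytes differ between builds."""
    return os.urandom(length)


# Fixed key so the demonstration is repeatable; a real build would use random_key().
DEMO_KEY = bytes([0x13, 0x37, 0xC0, 0xDE])

SAMPLES = [
    "Please deposit %d bitcoins to address %s",
    "Привет",  # Russian
    "你好",    # Chinese
    "سلام",    # Farsi / Arabic
]

scrambled = [scramble(s, DEMO_KEY) for s in SAMPLES]
restored = [unscramble(b, DEMO_KEY) for b in scrambled]
```

Because the XOR operates on bytes rather than characters, multi-byte Unicode text round-trips the same way ASCII does. That would be the property the foreign-language test strings are there to check.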

Analysts never use the language of the code comments for attribution, because such things are trivial to forge.

eraptic · on April 1, 2017

Considering that debug symbols, comments in code, and Cyrillic characters in the metadata of files are being used as solid evidence that Russia hacked the DNC, I'd say that it's probably still a useful tool.

openasocket · on April 1, 2017

Source? I've read the stuff CrowdStrike and Mandiant have put out and they mentioned none of those as evidence. Just binary analysis and network indicators from what I've seen.

DBNO · on April 1, 2017

Thanks for this insight! I'll edit my comment to credit you, but I won't delete it since someone might have the same thought process as me.

My comment:

So I see now (thanks to you) that it is just showing test cases (test warbles) to demonstrate that these scrambling techniques work with foreign languages. However, why would the US gov need to make sure that this program can successfully obfuscate Unicode strings in Chinese/Russian/Arabic/Farsi?

My gut reaction: while code comments would be trivial to forge, it appears the US gov is still using foreign language strings in some way, maybe having just one string constant originally in a foreign language that is then obfuscated/scrambled (such as by XOR-ing every char against a random key).

yeukhon · on April 1, 2017

Just FYI: those Chinese characters are very rarely used in any writing. In fact, anyone with Chinese reading comprehension will tell you those are gibberish words and none of them make any sense.

willstrafach · on March 31, 2017

This framework seems comparable to many open source obfuscation solutions. I would hope to see more advanced techniques. Then again, maybe their requirements called for ensuring things did not look too obfuscated (the more tricks used, the more likely a signature could be detected for their tradecraft).

Personally I do not believe self-modifying code would make much sense in their use case. In fact, this would not be possible on iOS due to kernel-based security protections.

bluejekyll · on April 1, 2017

Ok. In that vain, here's a question: if you use any of these tools as an American citizen, are you breaking any laws, regardless of what you use them for? That is, could you be guilty of sedition or something like it by using these illegally obtained things?

AnkhMorporkian · on April 1, 2017

Not unless you have a security clearance or are in the military and have been ordered not to access them. For an ordinary citizen, it isn't illegal to have classified information as long as you weren't a party to its theft.

It's hairier for people with clearances. Technically you could have your clearance revoked for accessing classified information despite the fact that it's public. I don't know if that's ever happened, but it's a possibility.

ethbro · on April 1, 2017

For that idiom, it's "vein".

https://www.usingenglish.com/reference/idioms/in+that+vein.h...

bluejekyll · on April 1, 2017

Gah... thanks. You can speak the damn language for 36 years, and still screw it up all the time.

throwaway91111 · on April 1, 2017

I imagine the promulgation of W^X means it isn't a natural fit for most aspects of malware.

kaminsod · on March 31, 2017

Self modifying code isn't really new or secret stuff.

https://en.wikipedia.org/wiki/Metamorphic_code

https://web.archive.org/web/20070602060312/http://vx.netlux....