Instead of talking about whether Wikileaks is good or bad, or whether you support them or not, how about we talk about the content of the post?
From what I've read so far, this is pretty freaking cool. It's super interesting to read these docs and see the thought process involved, especially since the product they're building is so different from what people are making on a day-to-day basis. It actually looks pretty fun to work on. Also, I think it's neat to read about their need for developing frameworks that can be used around the agency to accomplish stuff.
Unfortunately, I didn't read anything about self-modifying code, which is probably the most difficult malware to detect and probably to write. Maybe it's in there though, I didn't read the whole document. I came to the comments about halfway through to see dozens of people talking about whether they support Wikileaks or not, which I think is fine, free country, but I'd like to actually know what some people who work with this kind of stuff think.
A framework for compiling to self-modifying, yet correct, code would be super cool. I wonder if it always has to be written by hand? Probably not, but maybe that's a separate tool Wikileaks has yet to release.
Self-modifying the underlying machine code isn't what it used to be. Besides the difficulty in writing it, there are lots of caveats about how it interacts with the cache and the instruction pipeline. It also requires setup, because with modern memory protection all the machine code is read-only. Changing the memory protection for some machine code to be executable and writable at once will set off some alarms (and isn't even possible on systems with W^X). So you need to change it to just writable, make your modifications, then change it back to just executable. That is less suspicious, since it looks like what JIT compilers do. But all in all, self-modifying code doesn't really give you anything.
The exception to that is packers and other obfuscation techniques, which are related to self-modifying code. The general idea with these is that you take your real program and compress/encrypt/mangle/etc. it and store that data in an executable. The code in that executable decompresses/decrypts/demangles that data, sets it as executable, and then runs it. Unlike traditional self-modifying code, packing is orders of magnitude easier for the malware developer to write. The advantage here is that an antivirus tool can't determine what your real program does statically unless it understands how you mangled it, which is hard to do in general. To "unpack" an executable you've got three general techniques:
1. Packers tend to get reused a lot, so just have a person write an unpacker for popular packers by hand, and do some pattern matching to figure out which packer an executable is using. This doesn't work for everything, but it's fairly simple.
2. Dynamic Analysis. Run the executable and watch the contents of memory as the program unpacks itself; the real program should pop right out. Of course you have to run the executable in some sort of sandbox environment, and there are ways for the malware to detect that and alter its behavior. This also isn't the most efficient process, so you can't really do this to executables during, say, an antivirus scan.
3. Symbolic Analysis. Basically static analysis on steroids to figure out what the executable will do without actually running it. The malware can't stop this with sandbox detection. But it's super slow and is still an active area of research.
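To make the packing idea and technique 1 a bit more concrete, here's a rough toy sketch in Python. It only shuffles bytes around and never executes anything. Everything in it is made up for illustration: the `TOYPACK1` marker, the `toypack` name, and the compress-then-XOR scheme are assumptions, not something taken from the leaked docs or from any real packer.

```python
import zlib
from typing import Optional

# Hypothetical marker that our toy packer prepends to its output.
MAGIC = b"TOYPACK1"

# Hypothetical signature table: marker bytes -> packer name.
KNOWN_PACKERS = {MAGIC: "toypack"}


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data against a repeating, non-empty key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def pack(payload: bytes, key: bytes) -> bytes:
    """Compress the payload, mangle it with XOR, and prepend the marker."""
    return MAGIC + xor_bytes(zlib.compress(payload), key)


def identify_packer(blob: bytes) -> Optional[str]:
    """Technique 1: pattern-match the blob against known packer markers."""
    for marker, name in KNOWN_PACKERS.items():
        if blob.startswith(marker):
            return name
    return None


def unpack(blob: bytes, key: bytes) -> bytes:
    """Hand-written unpacker for the one packer we recognise."""
    if identify_packer(blob) != "toypack":
        raise ValueError("unknown packer, no unpacker available")
    return zlib.decompress(xor_bytes(blob[len(MAGIC):], key))


if __name__ == "__main__":
    payload = b"print('hello from the real program')"
    key = bytes([0x5A, 0xA5, 0x3C])
    packed = pack(payload, key)
    print(identify_packer(packed))
    print(unpack(packed, key) == payload)
```

A scanner that only looks at the packed bytes sees the marker plus what looks like noise, which is why it needs either a matching unpacker or one of the other two techniques.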
You can always run it on a real, unimportant machine not connected to anything. (And never connect that machine to anything ever again.) That feature just makes it slightly more difficult and costly to compromise program security.
Edit: part of my comment is corrected by the comment below. Thanks, openasocket!
Another comment about the content of this article:
Three quarters down the wiki page there is code for "adding foreign language" to the code. The options are to add code comments in Arabic/Chinese/Russian/Korean/Farsi. My gut reaction is that the purpose of this added language is to obfuscate the true source of the code, i.e. the code has Chinese comments in it so it must be from China. Ahh. I guess this makes sense to do. The only problem now is that the Chinese/Russian/Farsi/etc. characters that they included in their code are now public. (Obviously the CIA will now change the foreign language words they insert.)
I'd posit that if someone had an X-year-old (e.g. X = 7) copy of some malware, and the malware had these specific foreign language comments as shown by the article, there's a good possibility the source of the malware would be the US government.
This is for obfuscating string constants; the foreign languages included are a red herring. The reason for this is that nontrivial code often has string constants in it, and the string contents are stored in the ELF/PE file in a manner that makes them trivial to extract. Since these strings often reveal a lot about the malware (e.g. a string constant "Your computer has been infected with ransomware. Please deposit %d bitcoins to address %s"), antivirus signatures often use them to detect specific kinds of malware, and reverse engineers find them useful in determining what a binary does. This framework scrambles the string contents (using techniques like XOR-ing every character against a random key), and injects some code into the executable so that the strings are unscrambled on startup. They just have foreign languages in the example to demonstrate that this framework correctly handles Unicode.
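For anyone who hasn't seen the XOR trick before, here's a minimal sketch of it in Python. This is not the framework's actual code; the function names, the fixed key, and the choice of UTF-8 are my own assumptions for illustration. The point is just that working on encoded bytes makes non-ASCII strings round-trip the same way ASCII ones do.

```python
def scramble(text: str, key: bytes) -> bytes:
    """Encode the string as UTF-8, then XOR each byte against a repeating key."""
    raw = text.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(raw))


def unscramble(blob: bytes, key: bytes) -> str:
    """Undo scramble(): XOR with the same key, then decode the UTF-8 bytes."""
    raw = bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical key; a real tool would presumably pick one at random per build.
    key = bytes([0x13, 0x37, 0xC0, 0xDE])
    samples = ["Hello", "Привет", "你好", "سلام"]
    for s in samples:
        blob = scramble(s, key)
        # The stored bytes no longer match the plain string,
        # but the original comes back intact after unscrambling.
        assert blob != s.encode("utf-8")
        assert unscramble(blob, key) == s
    print("all samples round-trip")
```

Running a strings-style tool over the scrambled bytes wouldn't show the original text, which is the whole point for dodging string-based signatures.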
Analysts never use the language of the code comments for attribution, because such things are trivial to forge.
Considering that debug symbols, comments in code and Cyrillic characters in the metadata of files are being used as solid evidence that Russia hacked the DNC, I'd say that it's probably still a useful tool.
Source? I've read the stuff Crowdstrike and Mandiant have put out and they mentioned none of those as evidence. Just binary analysis and network indicators from what I've seen.
Thanks for this insight! I'll edit my comment to credit you, but I won't delete it since someone might have the same thought process as me.
My comment:
So I see now (thanks to you) that it is just showing test cases (test variables) to demonstrate that these scrambling techniques work with foreign languages. However, why would the US gov need to make sure that this program can successfully obfuscate Unicode strings in Chinese/Russian/Arabic/Farsi?
My gut reaction: while code comments would be trivial to forge, it appears the US gov is still using foreign language strings in some way. Maybe it has just one string constant originally in a foreign language that is then obfuscated/scrambled (such as by XOR-ing every char against a random key).
Just FYI: those Chinese characters are really really really rarely used in any writing. In fact, anyone with Chinese reading comprehension will tell you those are gibberish words and none of them make any sense.
This framework seems comparable to many open source obfuscation solutions. I would have hoped to see more advanced techniques. Then again, maybe their requirements called for ensuring things did not look too obfuscated (the more tricks used, the more likely a signature could be detected for their tradecraft).
Personally I do not believe self-modifying code would make much sense in their use case. In fact, this would not be possible on iOS due to kernel-based security protections.
Ok. In that vein, here's a question: if you use any of these tools as an American citizen, are you breaking any laws, regardless of what you use them for? That is, could you be guilty of sedition or something like it by using these illegally obtained things?
Not unless you have a security clearance or are in the military and have been ordered not to access them. For an ordinary citizen, it isn't illegal to have classified information as long as you weren't a party to its theft.
It's hairier for people with clearances. Technically you could have your clearance revoked for accessing classified information despite the fact that it's public. I don't know if that's ever happened, but it's a possibility.