The Broadcom link in the posted tweet records [some of?] their reasoning: very North America-specific strings, activity happening Monday-Friday for certain tasks (compilation, etc.), capability (access to zero-days, implying deep pockets to buy said zero-days), breadth of targets, and so on.
That said - it ABSOLUTELY BOGGLES MY MIND that, if these samples were not leaked but rather recovered from attempted attacks, _any_ valid timestamps and strings were not randomized as part of the build process! I'm not saying this refutes or confirms the attribution, I'm just wondering: how difficult is it to read an ELF or PE and remove / change those things, and if it's as easy as I'm thinking, why would you not do so? Or replace with preprocessor directives that you could set up to random values, so production builds use strings and timestamps that indicate some other entity? All of this seems straightforward to me - something you could do via shell scripting or Python. Is there a valid reason to leave this stuff in? Or are we seeing some low-priority work that the TLA wants to leak to show that they're out there and capable?
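To make the "how hard could it be" point concrete, here's a toy sketch of patching the COFF `TimeDateStamp` in a PE file using only the Python standard library. The function name is hypothetical, the offsets come from the published PE/COFF layout (e_lfanew at 0x3C, `TimeDateStamp` 8 bytes past the `PE\0\0` signature), and real build pipelines do this more robustly (e.g. reproducible-build linker flags):

```python
import random
import struct

def randomize_pe_timestamp(data: bytes, new_ts=None) -> bytes:
    """Return a copy of a PE image with its COFF TimeDateStamp replaced.

    Toy illustration only; assumes a well-formed DOS + PE header.
    """
    buf = bytearray(data)
    if buf[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew: file offset of the PE header, stored at 0x3C in the DOS header
    pe_off = struct.unpack_from("<I", buf, 0x3C)[0]
    if buf[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    if new_ts is None:
        new_ts = random.getrandbits(32)
    # COFF header follows the 4-byte signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes) at pe_off + 8
    struct.pack_into("<I", buf, pe_off + 8, new_ts)
    return bytes(buf)
```

Stripping or rewriting embedded strings is fiddlier (lengths and references matter), but timestamps really are a single 4-byte overwrite.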
> Or replace with preprocessor directives that you could set up to random values, so production builds use strings and timestamps that indicate some other entity?
They do, except they're not random. Check out the CIA Vault 7 leaks from a few years ago: the Marble framework purposefully leaves trails that point to other countries, including using foreign languages for variable names and comments.
> “[D]esigned to allow for flexible and easy-to-use obfuscation” as “string obfuscation algorithms (especially those that are unique) are often used to link malware to a specific developer or development shop.”
> The source code shows that Marble has test examples not just in English but also in Chinese, Russian, Korean, Arabic and Farsi. This would permit a forensic attribution double game, for example by pretending that the spoken language of the malware creator was not American English, but Chinese, but then showing attempts to conceal the use of Chinese, drawing forensic investigators even more strongly to the wrong conclusion, but there are other possibilities, such as hiding fake error messages.
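The quote mentions string obfuscation algorithms being used to fingerprint a developer or shop. A minimal sketch of what that means, assuming a toy repeating-key XOR scheme (not Marble's actual algorithm): strings are stored obfuscated in the binary and decoded at runtime, and the particular encoding quirks become an attributable signature.

```python
import itertools

def obfuscate(s: str, key: bytes) -> bytes:
    # XOR each plaintext byte with a repeating key (toy scheme);
    # the encoded bytes are what would ship inside the binary
    return bytes(b ^ k for b, k in zip(s.encode("utf-8"), itertools.cycle(key)))

def deobfuscate(blob: bytes, key: bytes) -> str:
    # XOR is its own inverse, so applying the same key recovers the text
    return bytes(b ^ k for b, k in zip(blob, itertools.cycle(key))).decode("utf-8")
```

Because analysts fingerprint exactly these routines, a framework like Marble that standardizes and varies the obfuscation (and can salt it with foreign-language artifacts) defeats that kind of linkage.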
Ah OK good, thanks for the link. Right, this seems like something _I_ could probably handle with a weekend or two's worth of research (meaning it's pretty simple because I'm no hacker).
And Broadcom _does_ note that they associate this with the Vault 7 group based on the whole picture, but it's weird that they present the string and date evidence without noting it would be trivial to fake, and give no specifics on the other data points.
I guess for this type of work the only thing you _really_ have is the code's intent, if you can figure that out.