I saw a very cool security talk a few years ago about how you can use the browser to do all kinds of evil things (did you know that the Same Origin Policy does not prevent you from making the request, only from seeing the response? And even then, you can guess at what kind of response you got).
One of the great points in the talk was JS obfuscation. Now, there are many techniques for doing this, but I really like this one as it just looks cool. Since you can translate most functional JS into ASCII, you simply encode every character into a binary coding using spaces for 0 and tabs for 1. Then you write a very simple converter, put an eval() around it, and voilà, you are running arbitrary code. To a casual observer it would look like you have delivered a mostly empty file, while in fact you are delivering perfectly valid JS. Not unbreakable, but certainly fun.
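A minimal sketch of that whitespace trick, assuming 7 bits per character with space for 0 and tab for 1. The names `encodeWhitespace` and `decodeWhitespace` are made up for illustration and are not from the talk:

```javascript
// Encode 7-bit ASCII source as whitespace: space = 0, tab = 1.
function encodeWhitespace(source) {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (code > 127) throw new RangeError("only 7-bit ASCII is supported");
    out += code
      .toString(2)
      .padStart(7, "0")
      .replace(/0/g, " ")
      .replace(/1/g, "\t");
  }
  return out;
}

// Turn the whitespace blob back into source text, 7 symbols per character.
function decodeWhitespace(blob) {
  let source = "";
  for (let i = 0; i < blob.length; i += 7) {
    const bits = blob
      .slice(i, i + 7)
      .replace(/ /g, "0")
      .replace(/\t/g, "1");
    source += String.fromCharCode(parseInt(bits, 2));
  }
  return source;
}

const payload = encodeWhitespace("6 * 7"); // 35 characters of spaces and tabs
const result = eval(decodeWhitespace(payload)); // 42
```

The decoder is the only visible code you have to ship; everything else in the file is spaces and tabs.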
It is kinda irrelevant that the Same Origin Policy doesn't prevent requests, since you don't even need to use a JavaScript request; a simple image tag does the same.
Realistically, how big is your exploit? 30KB? So it'll be 30*7 = 210 KB (JS is actually 7 bit ASCII). That's plenty of code to do something malicious. Nothing is preventing you from minifying the code before converting it to this whitespace encoding.
I suppose you could make your encoding include other whitespace chars, like newlines, carriage returns, etc. Then you could use base 4 instead of base 2.
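Roughly what the base 4 variant could look like, assuming the four symbols are space, tab, newline, and carriage return (my choice of symbols and names, nothing standard). A 7-bit character fits into four base 4 digits, so each source character costs 4 whitespace characters instead of 7:

```javascript
// Four whitespace symbols stand for the base-4 digits 0..3 (assumed mapping).
const SYMBOLS = [" ", "\t", "\n", "\r"];

function encodeBase4(source) {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (code > 127) throw new RangeError("only 7-bit ASCII is supported");
    // 4 base-4 digits cover 0..255, enough for 7-bit ASCII.
    const digits = code.toString(4).padStart(4, "0");
    for (const d of digits) out += SYMBOLS[Number(d)];
  }
  return out;
}

function decodeBase4(blob) {
  let source = "";
  for (let i = 0; i < blob.length; i += 4) {
    let code = 0;
    for (const ch of blob.slice(i, i + 4)) {
      code = code * 4 + SYMBOLS.indexOf(ch);
    }
    source += String.fromCharCode(code);
  }
  return source;
}

const encoded = encodeBase4("hi"); // 8 whitespace characters
const decoded = decodeBase4(encoded); // "hi"
```

With this scheme the 30 KB example would grow to about 120 KB rather than 210 KB.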
Currently, there are only 5 characters in the HN headline: "[]()+". Maybe a mod removed the "!" because they thought it was being used for emphasis. It could be placed before the "+".
That's a pretty awesome hack of JS's (rather insane) semantics. It's probably worth understanding at least the primitive examples though if you're a JS programmer, because this actually happens sometimes to expressions you write.
I was interested in how this might be shrunk to something more reasonable... I got this far before getting bored: http://pastie.org/8324713
That code, run as-is in your browser, should alert 'hi'.
It's constructed with the intent of replacing the variables a-e and various named parameters with short sequences in the given character set that can be cast to strings. For legibility, I shortened many of the encodable sequences to strings (such as []["slice"])
The table holds 5^3 characters, which is 125; you could start the loop at 2 instead of 0 to get 2-127 as target characters, or use 6 input arguments. The 5 I chose were ![], [], +[], +{}, ~[] -- this should let you encode anything after overhead at a ratio of 12:1 with high repetition that gzip can take advantage of. This could be awful, I don't know; but maybe somebody will find this interesting.
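To make the 5^3 arithmetic concrete, here is a guess at how three of those five tokens could address one character. This is not the pastie code; `toTriple`, `fromTriple`, and the offset of 2 are assumptions based on the description above:

```javascript
// The five tokens named above, used here as base-5 digits 0..4 (assumed order).
const TOKENS = ["![]", "[]", "+[]", "+{}", "~[]"];

// Map a character code in [offset, offset + 124] to three tokens.
function toTriple(charCode, offset = 2) {
  const n = charCode - offset;
  if (n < 0 || n >= 125) throw new RangeError("character outside the 125-entry table");
  return [
    TOKENS[Math.floor(n / 25)],
    TOKENS[Math.floor(n / 5) % 5],
    TOKENS[n % 5],
  ];
}

// Inverse mapping: three tokens back to the character code.
function fromTriple(triple, offset = 2) {
  const [a, b, c] = triple.map((t) => TOKENS.indexOf(t));
  return a * 25 + b * 5 + c + offset;
}

const triple = toTriple("h".charCodeAt(0));
const roundTrip = String.fromCharCode(fromTriple(triple)); // "h"
```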
I put in alert('hello') and it worked. That's awesome. But how? I searched the code it made and didn't see 'hello'. I understand the stuff below, how it uses the weird properties of JS's basic types... but how does it encode characters?
Take the first part, `(![]+[])`. `![]` evaluates to `false`. Then `+[]` coerces false into a string, so the expression is `"false"`.
The rest of the expression (more complicated) evaluates to `[[1]]`, which will grab the `"a"` from `"false"`. Now why there are the extra surrounding brackets, I'm not sure, because `[1]` would have worked as well.
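You can check each step in a console. A small sketch (the variable names are just for readability):

```javascript
const falseString = ![] + [];        // false + [] coerces to the string "false"
const one = +!+[];                   // +[] is 0, !0 is true, +true is 1
const viaNumber = falseString[one];  // "a"
const viaArray = falseString[[one]]; // "a" as well: the key [1] is coerced to "1"
const letterA = (![] + [])[+!+[]];   // the whole thing in one expression: "a"
```

Both index forms work because property keys are converted to strings, and `[1]` stringifies to `"1"`.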
It uses various techniques to get at e.g. type names, and then uses string indexing to access characters in the type name. It has enough characters like this to be able to call fromCharCode.
It converts certain things into strings. Like [][[]] => undefined, and then gets the characters by index from that string and puts your original string together.
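For example, a short sketch that spells "fun" out of "false" and "undefined" (the word is only an illustration):

```javascript
const undefinedValue = [][[]];              // looks up the key "" on an empty array
const undefinedString = [][[]] + [];        // "undefined"
const f = (![] + [])[+[]];                  // "false"[0]
const u = ([][[]] + [])[+[]];               // "undefined"[0]
const n = ([][[]] + [])[+!+[]];             // "undefined"[1]
const word = f + u + n;                     // "fun"
```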
we can expand our alphabet still further, and access some even more exciting properties:
[]["constructor"] --> function Array() { [native code] }
([]+[])["constructor"] --> function String() { [native code] }
(![])["constructor"] --> function Boolean() { [native code] }
(+[])["constructor"] --> function Number() { [native code] }
[]["filter"]["constructor"] --> function Function() { [native code] }
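These lookups are easy to verify. A sketch, with the caveat that the exact text of a native function's string form can vary slightly between engines:

```javascript
const arrayCtor = []["constructor"];              // Array
const stringCtor = ([] + [])["constructor"];      // String
const booleanCtor = (![])["constructor"];         // Boolean
const numberCtor = (+[])["constructor"];          // Number
const functionCtor = []["filter"]["constructor"]; // Function

// Adding [] turns the constructor into its source-like string,
// which is where the extra letters come from.
const arrayCtorText = arrayCtor + [];
```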
Almost there now.
By converting these back to strings we can access even more characters, and by passing the strings to the Function() constructor we can construct functions and evaluate them! In other words, we have "eval". Let's use it to access the window object:
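The original snippet isn't shown here, so this is only a sketch of one common way to do it, written with plain strings instead of the six-character encoding. A function built by the Function constructor runs in sloppy mode, so calling it without a receiver returns the global object (`window` in a browser):

```javascript
// Assumed approach: Function("return this")() yields the global object.
const makeFunction = []["filter"]["constructor"]; // same object as Function
const globalObject = makeFunction("return this")();
const isGlobal = globalObject === globalThis;     // true
```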
So now we have access to the global context and eval. We don't quite have access to the full range of letters, but we have enough letters to call toString, and use its base conversion ability to get the full lowercase alphabet:
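A sketch of the base conversion, again with plain numbers and strings for legibility. In base 36 the digits run 0-9 and then a-z, so the numbers 10 through 35 give the whole alphabet:

```javascript
const p = (25)["toString"](36); // "p"

// Collect every letter by converting 10..35 to base 36.
const letters = [];
for (let n = 10; n < 36; n++) {
  letters.push(n["toString"](36));
}
const alphabet = letters.join(""); // "abcdefghijklmnopqrstuvwxyz"
```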
And now we have "p", we can use escape and unescape to get most of the rest:
unescape(escape(" ")[0]+4+0) --> "@"
So there you have it!
The source code essentially runs this process backwards: it repeatedly uses regular expressions to convert the code back into "()[]!+" one step at a time.
I'm not sure, but once you have 0, 1, 2 and 10 as numbers you could add and subtract those to get any number you want, then express those numbers as strings (casting).
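Something like this sketch, where the named constants are only there to keep it readable:

```javascript
const zero = +[];                  // 0
const one = +!+[];                 // 1
const two = !+[] + !+[];           // true + true is 2
const ten = +([+!+[]] + [+[]]);    // [1] + [0] is "10", unary plus makes it 10

const five = two + two + one;      // 5
const eight = ten - two;           // 8
const fiveAsString = five + [];    // "5"
```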
It's more a showoff of a js idiosyncrasy - they found an ugly looking subset of characters that is Turing complete and wrote a translator to it.
If you're new to the concept of Turing tarpits, then this should blow your mind. On the other hand, this is a sufficiently advanced bit of CS thinking that no future employer should question the author's basic competency. I certainly have never written a translator.
I'm not sure it's a translator by that definition. The input and output are legal Javascript. It's moreso a demonstration of a functionally complete subset of the language.
You can get all the primitive values by taking advantage of unary plus, binary +, empty arrays, array dereferencing, function calls, and the standard strings returned by some basic expressions. The numbers are straightforward. The strings are dereferenced with numbers to get some individual letters. You can get methods by using array dereference on objects with strings. You get the rest of the letters with btoa and atob. Then you get eval, and you're off to the races!
I'm sure, but I think the encoder just goes token by token, eval'ing string conversions to get identifiers. You'll notice that for pretty simple expressions you get absurdly long strings out.
>I'm not sure it's a translator by that definition. The input and output are legal Javascript. It's moreso a demonstration of a functionally complete subset of the language.
Well, okay, so it's still strictly speaking javascript but if we consider the functionally complete subset to be our 'target' language we end up in the same place, methinks :).
It teaches you the funky behavior of JavaScript. For instance, `[]["filter"]` evaluates to the built-in function called filter, which converts to a string with a `[native code]` body.
In order to get particular characters (for example: f), the script uses "false"[0], where "false" is derived from adding ![] + [], and 0 is derived from +[].
The funkiness is really that []["filter"]["constructor"] is a function that behaves like eval(), or rather, it returns a function that when called executes the string that was passed to the constructor function. That's how they're actually running the code.
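In plain terms, a sketch with a trivial body string of my own choosing:

```javascript
const FunctionCtor = []["filter"]["constructor"]; // the Function constructor
const compiled = FunctionCtor("return 6 * 7");    // builds a function from a string
const answer = compiled();                        // 42
```

The encoded output ends the same way: one long expression that produces the body string, passed to this constructor, followed by `()` to call the result.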
By demonstrating creative abuse of a language, one can better understand what the language constructs do and the scope of their capability.
The comparable IOCCC.org contest is a great place to learn C: sure the primary focus is making obfuscated code, and the audience marvels at the bizarreness thereof, but taking the time to pick a submission apart and figure out why it works (despite a host of reasons why you'd think it shouldn't) expands your understanding of the language.
It's a great way to improve your language skills when you've reached the initial "I know language X" confidence.
This is interestingly reminiscent of swearjure[1], clojure without alphanumerics. The IRC log in that post is very enlightening to see how the idea evolved into something absolutely frightening. I would like to see the history of JSFuck.
This is quite funny. I learned about JSFuck in 2012 when I took some of my fellow students to a JavaScript meetup in Hamburg. JSFuck has since developed to be a kind of running gag among the students at our computer science department when it comes to things with a high WTF factor.
I don't know whether the stuff mentioned here is irreversible. I never tried it (just with images), but wouldn't this just show up as a big blob in the HTML file?
Essentially it just builds a new function of a given body text, then eval's it. You could easily check out that source code before eval'ing with a log statement.
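One way to do that, as a rough sketch: since the encoded code reaches the constructor through `Function.prototype.constructor`, you can temporarily swap in a wrapper that records the body. The wrapper and the `capturedBodies` array are my own illustration, and the short string below stands in for real encoded output:

```javascript
const capturedBodies = [];
const originalCtor = Function.prototype.constructor;

// Record the body text (the last argument), then defer to the real constructor.
Function.prototype.constructor = function (...args) {
  capturedBodies.push(args[args.length - 1]);
  return originalCtor(...args);
};

let output;
try {
  // Stand-in for obfuscated code that ends in []["filter"]["constructor"](body)().
  output = []["filter"]["constructor"]("return 40 + 2")();
} finally {
  // Always restore the real constructor.
  Function.prototype.constructor = originalCtor;
}
```

After this runs, `capturedBodies[0]` holds the decoded source and `output` is 42.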