This JPEG is also a webpage (coredump.cx)
833 points by cocoflunchy on Aug 10, 2016 | 226 comments



I abused this concept to compress demo code in PNG files, with great success. http://demoseen.com/blog/2011-08-31_Superpacking_JS_Demos.ht...

This is, at present, the most efficient way to pack demos on the web; a few characters of uncompressed bootstrap code, then the rest is deflated.
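If you're curious what that bootstrap looks like in practice, here is a minimal sketch of the idea (not the exact code from the post; the one-byte-per-pixel red-channel layout and the names are assumptions for illustration): the page loads itself as an image, reads the decompressed pixel data back out through a canvas, and evals the result.

  <canvas id="c"></canvas><script>
  // Sketch of a PNG bootstrap: assumes the JS payload is stored one byte per
  // pixel in the red channel of a fully opaque PNG (opaque pixels avoid alpha
  // premultiplication mangling the data). The PNG decoder does the inflating.
  var img = new Image();
  img.onload = function () {
    var c = document.getElementById('c');
    c.width = img.width; c.height = img.height;
    var ctx = c.getContext('2d');
    ctx.drawImage(img, 0, 0);
    var d = ctx.getImageData(0, 0, img.width, img.height).data;
    var js = '';
    for (var i = 0; i < d.length; i += 4) {
      if (!d[i]) break;              // stop at zero padding
      js += String.fromCharCode(d[i]);
    }
    eval(js);                        // run the unpacked demo
  };
  img.src = '#';                     // the file loads itself as the image
  </script>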


My democrew (Ninjadev) has used this technique for multiple WebGL/Javascript productions over the last few years now.

You can see the final packed .PNG results here: Crankwork Steamfist https://stianj.com/crankwork-steamfist/, Everything is Fashion https://stianj.com/fashion/, and Inakuwa Oasis http://arkt.is/inakuwa-oasis/.

The tool used for creating both the demos and the packed .PNG is made by us and available on GitHub here https://github.com/ninjadev/nin/.


Also a Ninjadever. Daeken's blog post was one of the inspirations that led us to implementing this in our toolchain. p01's Matraka [1] was another. The PNG trick is pretty common in the js scene nowadays, with packing tools such as JsExe [2] readily available.

[1]: http://www.pouet.net/prod.php?which=59403

[2]: http://www.pouet.net/prod.php?which=59298


Oh man, I didn't know you guys released your tools. I'm a big fan of your stuff -- awesome to see such polished prods on the web. Happy to have helped enable some amazing work!


That first link froze my (rather old) computer - had to reboot!


Seems like some bits are missing on that page, e.g. I see nothing between "What is the bootstrap? Well, it's what turns our PNG into code and runs it. Here's the one I use:" and "The 4968 here is really the size of the decompressed data in bytes times 4"


Good catch -- thank you! Imported this from an older blog and must've lost that code block somehow.


Looks like some non-ascii characters didn't make the jump to hyperspeed, either, unless your friend really is named Nicolás Alvarez


It was looking fine on my end, but I realized I wasn't setting an encoding on the page. Should be UTF-8 now and not cause any more problems. Thanks for pointing it out!


This link at the beginning doesn't seem to be working: https://demoparty.mozillalabs.com/


Yep, it's inactive. New URL is http://www.mozillalabs.com/en-US/demo-party/


Doesn't seem to work on the latest Firefox.


You can see this in action for yourself on a unix cli:

  $ curl -o squirrel.html http://lcamtuf.coredump.cx/squirrel/
  $ file squirrel.html
  squirrel.html: JPEG image data, JFIF standard 1.01, comment: "<html><body><style>body { visibility: hidden; } .n { visibilit"
Open the file in a browser and read the page. Then:

  $ mv squirrel.html squirrel.jpg
Open the renamed file in a browser and only the image appears.

I'm not sure what the security implications are. I'm not creative or devious enough to think of anything offhand, but a lot of attack vectors start off with this sort of misdirection.


> I'm not sure what the security implications are.

You can use this technique to phish signatures. Send someone a document that reads "X" in format A and "Y" in format B. The victim signs file.A thinking they are endorsing X but you can plausibly claim that they signed file.B (because it's the same file) and hence endorsed Y. This is why digital signature standards need to include meta-data, e.g.:

https://github.com/Spark-Innovations/SC4/blob/master/doc/fil...

Scroll down to "bundle files"


> but you can plausibly claim

And anyone else can plausibly claim that you carefully forged a file to get a victim to sign it -- the signature will be of the whole file, not just a single view of it.

But that said, you shouldn't sign binary files unless you have a reasonable understanding of what's in them (or trust the party presenting them to you).


> And anyone else can plausibly claim that you carefully forged a file to get a victim to sign it

Yes, of course, but by the time someone realizes this the damage may already have been done.

> you shouldn't sign binary files

There are a lot of things that people shouldn't do that they do nonetheless.


there are websites where you can upload files such as images, but they filter html for security reasons. if you can present that html to someone (eg. through deeplinking an iframe to the document), it could contain (or load) javascript that runs in the context of that site...


This is why it's important to correctly set the Content-Type header when serving files. Also why it's a good idea to have user-uploaded content served from a separate domain.


Some browsers might try to sniff the MIME type, so an additional header would help: "X-Content-Type-Options: nosniff"[1]

1. https://blogs.msdn.microsoft.com/ie/2008/09/02/ie8-security-...
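For anyone wanting to see what that looks like in practice, here is a hypothetical Node.js sketch (my own illustration, not from the linked post) that serves an upload with an explicit image Content-Type plus the nosniff header:

  // Hypothetical sketch: serve a user upload defensively.
  const http = require('http');
  const fs = require('fs');

  http.createServer((req, res) => {
    // Serve the type recorded at upload time; never guess from the body.
    res.setHeader('Content-Type', 'image/jpeg');
    // Tell the browser not to sniff the body and "upgrade" it to HTML.
    res.setHeader('X-Content-Type-Options', 'nosniff');
    fs.createReadStream('./uploads/avatar.jpg').pipe(res);
  }).listen(8080);

Ideally this runs on a separate, cookieless domain, as the parent comment suggests.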


Thanks! I remembered something like that existing but I couldn't remember the header name :)


Please elaborate on the part about UGC from a separate domain. Why is this?


If someone uses this trick to upload a PNG like this to your server, and a victim is then tricked into loading it as HTML, that HTML has access to your cookies and can make AJAX requests (circumventing same-origin protection).

If user content is on a separate domain, they can't do that.

Also, phishing is a lot easier when you're on the real domain...


Sorry, didn't see this comment earlier. GitHub's blog post on why they did this gives some good insight.

https://github.com/blog/1452-new-github-pages-domain-github-...


file -k may give different results, but binwalk [0] is probably a better choice

[0]: http://binwalk.org/


Interestingly, this page is intercepted by my router which then just sends me a redirect to one of its settings pages. Odd.


Well now, that is interesting… deep packet inspection? or just a truly insane bug?

What router is it?


Here's the request and response from router: http://pastebin.com/e7rxLsGJ

The router itself is a BT Internet (UK) branded one. Not sure of the exact model but I'll try to find out...


I found some references online to something called "Smart Setup", which you can apparently turn off under Advanced Settings -> Home Network -> Smart Setup, but I have no idea what it actually does or why it intercepts random requests...


Besides the model number, can you also tell us the firmware version?


BT Home Hub 5 (Type A) Software version 4.7.5.1.83.8.204.1.11

But, false alarm anyway, nothing interesting is happening. The firmware had updated and reset parental control settings on the router. The domain is on some blacklist apparently so it was redirecting to a page to finalise parental control preferences.

Sorry it wasn't any more interesting than that.

Edit: the reason it took me a while to figure this out was that the settings page it was redirecting to was nothing to do with parental controls!


The HomeHub 5 is a Thomson (Technicolor) router, if I remember correctly.


So how did you figure it out?


I visited some other black listed sites <.< >.> and discovered the pattern, then dug around in the router settings to see what had changed. Disabling parental controls sorted it and I can now see the squirrel/chipmunk/unidentified rodent.

Lesson learned: just use a VPN.


> I visited some other black listed sites <.< >.>

Is that an ASCII representation of what I think it is? (a well known .cx site)


It was shifty eyes. But good imagination skills +1


I saw it as just a 'shifty-eyes' emoticon, but then again I don't know of said .cx site, so I may be missing it.


aaisp - your life just became simpler.


The fact that it encodes "/" in one part of the URL it puts in a parameter, but not in another, is a very good indication that whatever this 'feature' is doing is badly thought out and the implementation was done by the intern.


This pretty much describes the entire BT HomeHub firmware, to be honest.


"Deployed on one out of four residential gateways globally, Cisco Videoscape OpenRG is the industry's most widely used residential gateway software." -- Cisco

At least we know a little about the software issuing this redirect!


This intrigues me. I wonder why the router would be meddling in such affairs.


Another reason to use https, if you don't want routers filtering your website's content.


It probably validates Content-Type and X-Content-Type-Options


Prior discussion, years ago, many comments: https://news.ycombinator.com/item?id=4209052


which raises the question: where are all the "essential squirrel facts" that were promised?


Maybe a product manager realized it didn't make sense to provide "essential squirrel facts" to a page featuring the image of a chipmunk. :-)


Author here. Rookie mistake! It's actually a golden-mantled ground squirrel.

https://en.wikipedia.org/wiki/Golden-mantled_ground_squirrel

A chipmunk would have a stripe going across the eye.

(Today, you learned your first squirrel fact!)


I did learn something new, not just about squirrels but about chipmunks, too.

Thank you.

And here I was impressed merely by the delivery of image data in the HTML stream. Little did I realize your page is practically an Encyclopedia Rodentia.

Kudos and hats off.


The fact they didn't want us to know


I'll admit, I had the same thought. :)


Some PoC||GTFO PDFs are also valid in other formats—"polyglots". They usually do PDF+HTML+ZIP, though sometimes they get (even more) creative.

https://www.alchemistowl.org/pocorgtfo/


Keep in mind that combining most normal formats with most archive formats is trivial, because normal formats start at the beginning, and archive formats have a table of contents at the end. Concatenate both files and you're done.

Combining with PDF is also on the easy end of things, because the PDF header just has to be somewhere vaguely near the start.
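To make the concatenation point concrete, a toy sketch (file names are made up; many unzip tools tolerate the prepended image data, and `zip -A` can fix up the offsets if yours doesn't):

  // Toy sketch: a JPEG that is also a ZIP, by plain concatenation.
  // ZIP readers locate the central directory from the end of the file,
  // so the image data in front doesn't bother them (much).
  const fs = require('fs');

  fs.writeFileSync('polyglot.jpg', Buffer.concat([
    fs.readFileSync('image.jpg'),
    fs.readFileSync('archive.zip'),
  ]));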



A testament to one of the worst decisions in computing history: not failing with an error message when a page is not a valid HTML document.


Being flexible about what markup is accepted has meant the web could gain new features and gracefully degrade, and has made it more fault-tolerant. It's not at all a failing.

Compare that to JavaScript, which will happily fail if you use new syntax or a missing function, and thus web pages which rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does. That's not to say JS should be as flexible as HTML is here, but it provides an interesting contrast.


There's nothing wrong with some tolerance, like ignoring tags that it doesn't know. But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated. Which creates room for bugs, exploits, and unexpected situations like OP's post.


> But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated.

Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.

Do they want their page to fail to render entirely when PHP outputs a warning? Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document? Should we really expect them to have to modify their blogging software to validate custom HTML snippets, lest the entire page become unusable? Will they be pleased when their style of code falls out of favour in future, is deprecated, and then their page doesn't work at all later?

Moreover, strictness can backfire when you have such a diversity of implementations.

> Which creates room for bugs, exploits, and unexpected situations like OP's post.

The OP is not so much an unexpected situation as a carefully engineered one that's completely within the constraints HTML sets.


> Do they want their page to fail to render entirely when PHP outputs a warning?

Displaying warnings is fine. Invalid XHTML, less so.

> Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document?

The encoding of the content has nothing to do with the document markup.

> Moreover, strictness can backfire when you have such a diversity of implementations.

On the contrary: this prevents subtle bugs in the interpretation of invalid data by different implementations.


>> Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.

Because this approach has historically resulted in people who "just wanted to make a working website" making websites that only work in specific browsers (or worse yet, specific versions of specific browsers on specific platforms). And then those sites stuck around and infrastructure got built around them that made them hard to fix.

We've spent the best part of the '00s fixing that mess, and there are still some pockets that haven't been properly cleaned up. If that's not a lesson to learn from, I don't know what is.


> Compare that to JavaScript, which will happily fail if you use new syntax or a missing function, and thus web pages which rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does.

Isn't that more due to failure to handle exceptions and display errors to users?


> Isn't that more due to failure to handle exceptions and display errors to users?

You could describe it that way.... or you could describe it as failing to have reasonable default logic for handling faults & gracefully degrade.


Silent failure or not, the page still doesn't work.


A parser could always just ignore tags it does not recognize, no need to try to make sense of any random collection of tags.


I kinda agree, but one has to concede that XHTML has failed for a reason.


XHTML failed because doing the old, broken, tag-soupy mess still worked exactly as well from the user perspective. You just can't get people to work harder for invisible benefits. In a sense, it's a reason, but it doesn't mean that tag soup is a good thing.

I often wonder how different the internet would be if Postel's prescription never gained traction and fail-fast behavior were the norm instead.


Also because all of the tooling was terrible and extensibility was non-existent. It's easy to imagine a world where XHTML worked out better because the browsers provided clear, informative errors rather than a blank page, someone at the W3C cared enough to have a usable validator which produced helpful warnings and errors, and attention was paid to the less friendly bits of the XML ecosystem[1].

Instead, it felt like we had a bunch of people who thought that big lofty standards were so obviously correct that everyone else would take care of those boring implementation details, and 99.9999% of web developers correctly realized that there was very little measurable downside to sticking with something which was known to work.

1. Simple examples: namespacing is a good idea, but it leads to gratuitous toil in most tools – e.g. a valid XML document which has <foo> should just work if you write a selector for /foo, as present in the document, rather than requiring kludgey things like registering the same namespaces (which are already declared in the document) with every parser and writing fully-qualified selectors like /mychosenprefix:foo or /{http://example.org/fooschema/1.0}foo for every tag, every time.

Similarly, getting XPath 2.0 support to actually ship in enough tools to be usable would have made one of the better selling points for using XML actually exist as far as the average working programmer is concerned.


Writing proper XML tools is very difficult. XML documents are usually parsed without considering the DTD; it seems the DTDs are just for humans to read. As you mention, XML becomes more interesting with extensibility. IMHO XSLT is the key technology for that, but unfortunately there is no reasonable support for XSLT 2 because the standard is just too complex. And XSLT 1 is barely interesting to use.

By the way, there's SLAX, which is isomorphic to XSLT but with nicer syntax. A nice approach, but anyway, the standards are horribly complex and stiff.


The main lesson I've learned from this is that a spec is far more likely to be successful if it's paired with at least one working implementation and more than trivial test data. Writing a validator is both important for adoption and perhaps more so for flushing out parts of the spec which are too hard to implement or annoying to work with.


Maybe it never would have taken off because many more people would get frustrated trying to make something show up on the screen and give up. Or get frustrated trying to make tools that produced something that all of the browsers would display no matter what weird things the users did.

Maybe it would be so bad that somebody else would make a new, more permissive standard that took off instead.

Maybe all of that has already happened.

Maybe that line from Battlestar Galactica was right - All of this has happened before; all of this will happen again.


It would probably be a lot smaller!


This actually used to happen, to some degree. No error, but you would get a blank page in Netscape if you failed to close a TABLE tag.

They started making browsers more lenient since there was so much malformed HTML being produced.


How would you have prevented huge schisms during the IE push?

In a world where invalid HTML documents aren't rendered at all we could have had the evolution of the format dictated by Microsoft because of their market position.


So you are one from the XHTML2 camp then? Good we got HTML5 and good that the weird years of transition to XHTML 1 and unclear vision with XHTML2 and ECMAScript for XML (E4X) are long gone.


Or one of the best. Switch web browsers to strict processing and you will hardly find a working web page.


Yeah, imagine if processors gave best effort to processing binaries... what could possibly go wrong :/

The decision to allow this was made early and the liberal accept/strict transmit paradigm has in general made the web a mess.

On the plus side, the consistent failure of browser vendors to apply strict controls to input means that as an application security person I will probably never be out of work :D Even though this pattern of behavior is starting to change, legacy support means that I will still be dealing with these issues well into my retirement!


> as an application security person I will probably never be out of work

How many security flaws are the result of malformed HTML?


Malformed HTML? None. Browsers attempting to be lenient in what they accept? Loads.

To take this article as an example, according to the HTTP specification, the `Content-Type` header is supposed to have the final say in what media type is being served. Internet Explorer decided it would be better to use heuristics. I think the idea was that if a web host was misconfigured, rather than have the web developer fix their bug, it would try to guess its way out of the error.

Which kinda worked. The problem was, it opened it up to abuse. If you had a web host that allowed untrusted people to upload images (e.g. profile photos), you could construct an image that tricked Internet Explorer into thinking that it was an HTML document, even if the server explicitly told clients that it was an image. The main difference between images and HTML, of course, is that HTML can contain JavaScript, which would now execute in the security context of your web page.

So all of these web hosts, thinking they were only giving people the ability to upload images, were now letting people execute JavaScript on their domain – simply because Internet Explorer tried to be lenient.

The workaround ended up being forcing downloads with `Content-Disposition` headers instead of displaying inline. That's why, for example, visiting the URL of an image on Blogger directly triggers a download instead of showing the image.

Other examples that spring to mind:

Netscape interpreting certain Unicode characters as less than signs. People were correctly escaping `<` as `&lt;` but the Unicode characters slipped through and caused XSS vulnerabilities in that browser.

Browsers ignoring newlines in pseudo-protocols. Want to strip `href="javascript:…"` out of comments? No problem… except some browsers also executed JavaScript when an attacker placed a newline anywhere within the `javascript` token.

Being lenient in what you accept has caused security vulnerabilities over and over again and there's no reason to think that it will stop now.


> To take this article as an example, according to the HTTP specification, the `Content-Type` header is supposed to have the final say in what media type is being served. Internet Explorer decided it would be better to use heuristics. I think the idea was that if a web host was misconfigured, rather than have the web developer fix their bug, it would try to guess its way out of the error.

> Which kinda worked. The problem was, it opened it up to abuse. If you had a web host that allowed untrusted people to upload images (e.g. profile photos), you could construct an image that tricked Internet Explorer into thinking that it was an HTML document, even if the server explicitly told clients that it was an image. The main difference between images and HTML, of course, is that HTML can contain JavaScript, which would now execute in the security context of your web page. So all of these web hosts, thinking they were only giving people the ability to upload images, were now letting people execute JavaScript on their domain – simply because Internet Explorer tried to be lenient.

This is an interesting example, though I think this is a fair bit different than being lenient on HTML interpretation.

The topic was strict HTML. Accepting malformed HTML doesn't seem to pose much of a problem. Blindly executing a non-executable file seems like a much different problem.

> Netscape interpreting certain Unicode characters as less than signs. People were correctly escaping `<` as `&lt;` but the Unicode characters slipped through and caused XSS vulnerabilities in that browser.

This doesn't sound like being lenient. This just sounds like a bug.

> Browsers ignoring newlines in pseudo-protocols. Want to strip `href="javascript:…"` out of comments? No problem… except some browsers also executed JavaScript when an attacker placed a newline anywhere within the `javascript` token.

Huh? I don't understand the scenario being described here. It again sounds like a bug rather than lenient acceptance of data, though.


> I think this is a fair bit different than being lenient on HTML interpretation.

It's not. There are two areas where the leniency was a problem here. Firstly, the leniency in rendering one media type as a completely different media type because the browser heuristic thought it was being lenient. Secondly, the leniency in parsing HTML out of an image file – you can't do that with valid HTML.

> Accepting malformed HTML doesn't seem to pose much of a problem.

I've literally just given three specific examples of it causing security vulnerabilities.

> This doesn't sound like being lenient. This just sounds like a bug.

No, it was intentional. It was specifically Unicode characters that looked like less than and greater than signs, but weren't.

> I don't understand the scenario being described here.

Somebody noticed that href="java\nscript:…" wasn't being parsed as JavaScript, and it was causing some malformed pages to fail to work properly. Rather than let it fail, they tried to fix it by stripping out the whitespace, and caused a security vulnerability.

If these three examples aren't enough, take a look at OWASP's XSS filter evasion cheat sheet. There's plenty of examples in there of lenient parsing causing security problems:

https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_She...


> It's not. There are two areas where the leniency was a problem here. Firstly, the leniency in rendering one media type as a completely different media type because the browser heuristic thought it was being lenient. Secondly, the leniency in parsing HTML out of an image file – you can't do that with valid HTML.

I think you can argue the first is a problem. You have an example demonstrating as much. Arguing that the second is a problem is much harder. Lenient HTML acceptance has been hugely advantageous to the adoption of the web. There may have been some issues from this, but it's valuable enough that the effort to "fix" it was abandoned and the W3C and WHATWG returned to codifying what leniency should look like.

> I've literally just given three specific examples of it causing security vulnerabilities.

Well, at least one example. Coercing a file served as an image to HTML isn't an issue of accepting malformed HTML, nor would I agree that the JS example is a problem with leniency.

> No, it was intentional. It was specifically Unicode characters that looked like less than and greater than signs, but weren't.

Okay, I reread your last comment. I initially thought you were saying that Netscape was treating '&lt' as '<'. So Netscape decided to treat some random unicode chars that happen to look kind of like the less-than symbol (left angle bracket: ⟨, maybe?) as if they're the same as the less-than symbol? This seems amazingly short-sighted and pointless. How was this issue not seen, and was this even solving a problem for someone?

> Somebody noticed that href="java\nscript:…" wasn't being parsed as JavaScript, and it was causing some malformed pages to fail to work properly. Rather than let it fail, they tried to fix it by stripping out the whitespace, and caused a security vulnerability.

So the issue here is incompetent input sanitization. I don't think the browsers being lenient here is the issue.

> If these three examples aren't enough, take a look at OWASP's XSS filter evasion cheat sheet. There's plenty of examples in there of lenient parsing causing security problems: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_She...

A few of these are interesting in the context of browsers being lenient. e.g. This one requires lenience as well as poor filtering:

  <IMG """><SCRIPT>alert("XSS")</SCRIPT>">
Most of these are just examples of incompetence in filtering, though, and a great example of why you should 1) not roll your own XSS filter if you can avoid it, and 2) aggressively filter out everything that isn't explicitly acceptable instead of trying to filter out just the problematic text.


> Arguing that the second is a problem is much harder. Lenient HTML acceptance been hugely advantageous to the adoption of the web.

Wait, that's a completely different point. The argument here is that it caused a security vulnerability, and it did. If the lenient HTML parser didn't try to salvage HTML out of what is most certainly not valid HTML, then it wouldn't be a security vulnerability.

> This seems amazingly short-sighted and pointless. How was this issue not seen, and was this even solving a problem for someone?

You could ask the same of most cases of lenient HTML parsing. It's amazing the lengths browser vendors have gone to to turn junk into something they can render.

> So the issue here is incompetent input sanitisation.

No, it isn't. That code should not execute JavaScript. The real issue is that sanitising code is an extremely error-prone endeavour because of browser leniency – because you don't just have to sanitise dangerous code, you also have to sanitise code that should be safe, but is actually dangerous because some browser somewhere is really keen to automatically adjust safe code into potentially dangerous code.

Take the Netscape less than sign handling. No sane developer would think to "sanitise" what is supposed to be a completely harmless Unicode character. It should make it through any whitelist you would put together. Even extremely thorough sanitisation routines that have been worked on for years would miss that. It became dangerous through an undocumented, crazy workaround some idiot at Netscape thought of because he wanted to be lenient and parse what must have been a very broken set of HTML documents.

This is not a problem with incompetent sanitisation. It's a problem with leniency.


You have some compelling examples of problems from leniency. I think in some cases the issues are definitely magnified by other poor designs (bad escaping/filtering) but you've demonstrated that well-intentioned leniency can encourage and even directly cause bugs.

Thanks for providing actual, concrete examples.


Malformed HTML may escape sanitization on input in a vulnerable web app, and still render on the victim's browser because their browser wants to be helpful.

(Yes, the output should have been escaped, but that is sadly not always the case)


I don't see how this has anything to do with malformed HTML or lenient rendering rules. In the scenario you're describing, well-formed but malicious HTML could also escape sanitization.


Which are the faults of the authors. No one expects malformed source code to compile, a video with corrupted headers to play properly or a binary containing invalid instructions not to crash. This decision allowed people to get away with broken web pages instead of forcing them to fix their mistakes.


It's a better web environment than it was years ago, but there was a time when even simple layouts required invalid code to display consistently across the browsers of the day.


The web would not have been as successful if it wasn't for this leniency. Full correctness is only worthwhile if attaining it does not excessively harm other goals and if the cost of not attaining it is severe enough.

The cost of a malformatted HTML document rendering despite the errors is not that severe compared to the benefits it provides, as we have seen.


Well of course if correctness is not a requirement then it's not being paid attention to. That doesn't in any way indicate that it was a good decision not to require it in the first place.


If it had been made as difficult as possible for enthusiasts learning a markup language to get what was essentially text document to actually display anything, it's probably not too much of an exaggeration to say the World Wide Web wouldn't have existed in its current form.

It's not as if many of the web's security holes are related to whether a page displays valid HTML markup or not.


I don't think the assumption is justified that enforcing well-formed HTML documents would have been a significant barrier. To me it actually seems easier to have a few simple and strictly enforced rules than having a more or less random assortment of exception to save a handful of key strokes.


The failure of XHTML Strict suggests otherwise.

Ultimately it's less about saving keystrokes and more about amateur enthusiasts having the opportunity to start with the browser rendering their unformatted document rather than an "Error at line 1" warning, and changes they introduce being considerably less likely to break the entire page


XHTML Strict only solved one set of leniencies in the web platform, which were also the least important kinds. Malformed HTML makes writing browsers complicated but doesn't generally seem to cause security issues or other visible, obvious, must-fix-now problems.

The real leniency-related security problems in the web platform revolve around the handling of data and how JavaScript works, which were not addressed by XHTML. In that sense it's not surprising it went nowhere.


That's exactly my point. Low barrier to entry allowed the web to explode.


Or the web was ready to explode anyway, and requiring a bit more strictness would not have significantly hampered its adoption.


Right-clicking on the image and selecting "View Image" (Firefox), or "Open image in a new tab" (Chromium), gives the webpage, not the image. I can see why that happens: the menu items just open a URL and don't force it to be an image. However, it was a bit disorienting.


I did this too, repeatedly, until I was smiling a very big smile.


I didn't know what he was talking about until I tried this:

  data:text/html, <html><img src="http://lcamtuf.coredump.cx/squirrel/"></html>
Put that in the url of the browser.


I'm surprised nobody in this thread has mentioned PICO-8, a "virtual console" which compresses its "cartridges" in the form of a PNG file. When viewed in a browser or on a computer the file is displayed as a neat stylised image of a cartridge with a description, box art etc but when opened in the PICO-8 executable reveals all of the game code, art and music/sound assets in fully uncompressed editable form. The cartridges can be shared freely on sites that leave the original file intact without re-compressing. Nifty!

http://www.lexaloffle.com/pico-8.php


This site uses the xmp tag (deprecated in HTML 3.2, removed in HTML5) which I found interesting and had never seen!

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/xm...

It's similar to the pre tag but doesn't require the escaping. I guess you just have to make sure you don't have a closing xmp tag :)


<xmp> is great when you absolutely, positively, do not want any entities rendered under any circumstances. It's unfortunate that it's being deprecated, since it has its uses.


> <xmp> is great when you absolutely, positively, do not want any entities rendered under any circumstances. It's unfortunate that it's being deprecated, since it has its uses.

  <![CDATA[ here &entities; or <angle|<brackets>> will not be interpreted ]]>
There is no need for special-casing xmp, when SGML and XML already define CDATA escapes.


Removed?

"User agents must treat xmp elements in a manner equivalent to pre elements in terms of semantics and for purposes of rendering. (The parser has special behaviour for this element though.)" — https://html.spec.whatwg.org/multipage/obsolete.html#require...


I was just going off what the MDN page said about it being removed in the HTML5 standard. It looks like WHATWG just has a "living standard" and W3C still uses the versioning, so it's probably removed from W3C standards. I'm not too familiar with the reality of these standards.


I use <xmp> for debugging purposes all the time.


> Pretty radical, eh? Send money to: lcamtuf@coredump.cx

How do you send money to an email address? Not that I would send you any, but I wondered how you'd want to receive it.


Not to be rude, but in the USA (where SWIFT or bank wire transfers can be expensive), an email address as the recipient of an online fund transfer is pretty common; e.g. PayPal, Venmo, Chase QuickPay.

Now, specifically in this case, lcamtuf (at Google security) is joking and doesn't want your money.

This hack is actually pretty crazy - an arbitrary HTML/JPEG polyglot file that fooled a browser could be used for JS injection, say from a site that allowed JPEG file uploads and validated the MIME type.


This has been done in the past. I remember seeing an advisory as far back as 2010, but at the moment can only find these two more recent advisories:

https://websec.io/2012/09/05/A-Silent-Threat-PHP-in-EXIF.htm...

https://blog.sucuri.net/2013/07/malware-hidden-inside-jpg-ex...

The way we protected ourselves against it at <earlier company> (since we allowed image uploads at a variety of locations) was to decode and re-encode the image before storing it, stripping out comments.
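As a sketch of that decode-and-recode step with a current library (sharp here is just an example choice, not what we actually used; it drops EXIF/comments by default unless you ask for .withMetadata()):

  // Sketch: sanitize an upload by forcing a full decode + re-encode,
  // which discards COM segments and other metadata along the way.
  const sharp = require('sharp');

  async function sanitizeUpload(inputPath, outputPath) {
    await sharp(inputPath)
      .jpeg({ quality: 90 })   // re-encode as a clean baseline JPEG
      .toFile(outputPath);
  }

  sanitizeUpload('upload.bin', 'clean.jpg').catch(console.error);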


I agree transcoding all user content is a must, but even that can be dangerous :-) as with ImageTragick which lcamtuf discussed here: https://lcamtuf.blogspot.com/2016/05/clearing-up-some-miscon...


He/she isn't actually expecting payment, just as they don't really think that the trick is "pretty radical." It's just a mildly amusing way of providing contact information.


It is pretty radical in the 80's-90's sense of the word.[0]

[0] http://img03.deviantart.net/ebd4/i/2015/166/9/6/radical_dude...


1. Most Gmail users can receive money by email (https://support.google.com/mail/answer/3141103 and coredump.cx MX records point to Gmail)

2. Ask him his Bitcoin address

3. Paypal to this address

:)


You don't need to ask him for a Bitcoin address. Just send him the private key of a bitcoin wallet. Or this https://www.bctip.org/en/


You are perfectly right. As a matter of fact I have done this in the past—written a brainwallet passphrase on a birthday card :)


Isn't Google Wallet ded?


No, just pining for the fjords!

But seriously, no, it's alive and kicking... https://www.google.com/wallet/

Page Info shows the last modification was: Thu 31 Mar 2016 01:02:44 PM EDT

I don't personally use it much, but I have an account to pay for the domain name I use for Google Apps through them.


Only google card, the debit card that was attached to GW.


Nope! I pay my rent using it for example.


No. I use it occasionally.


Just put it in a JPEG and send it over.


Email money transfer, aka Interac e-Transfer, should work fine. See http://interac.ca/en/interac-e-transfer-consumer.html


Only in Canada :)


I assume paypal when i see an email.


PayPal?


I like this bit: <img src="#" [...]>


Here is a similar experimental project I made, halfway between an image and a web page: http://raphaelbastide.com/guropoli/


So in theory, can analytics platforms be compromised so that JPEG tracking pixels could turn into full-fledged sites interfering with the parent page at, say, a bank website? Firing off credentials in the background?


No, because if it's parsed as a JPEG, arbitrary code won't be run. If the JPEG was somehow parsed as JS, then possibly yes.


The image reference tag is for an image. As stated previously, if you look at the JPEG itself, it starts off with a JPEG comment, which embeds the entire html block, then starts a comment block for the remainder of the JPEG data. Browsers are very liberal in what they accept, so that initial 20-byte header is ignored, although you can see it if you inspect the page's elements.


Yes, I get that - but if a tracking pixel is downloaded and interpreted as a jpeg, then it will parse anything in the COM section as a comment, and not execute anything in it, unless there was some sort of vulnerability in the JPEG implementation


If these tracking pixels are in iframe elements instead of img elements.


If they were iframe elements, they'd still be sandboxed from the parent page, and unable to phone-home with information, right?


If the sandbox attribute is used and the browser supports it.


> No server-side hacks involved

I doubt this. For a request with Accept: "image/png,image/*;q=0.8,*/*;q=0.5", the server shouldn't respond with something with Content-Type: "text/html".


  mkdir -p ~/tmp/squirrel
  cd ~/tmp/squirrel
  wget -O index.htm http://lcamtuf.coredump.cx/squirrel/
  firefox index.htm

Alternately, using an actual webserver instead of opening a file://

  python3 -m http.server &
  firefox http://127.0.0.1:8000/

Also works.

Note that the actual img tag has src="#" so when you are looking at the file opened locally, the image is also from local disk, not from his server, so it's legit.

However, the fact that it needs to be identified as a HTML file by the server implies to me that the idea others had ITT of abusing image hosting sites using this trick probably won't work.


That's kind of nitpicky.


...how is that possible?


He placed HTML inside a COM (comment) segment of the JPEG, which is perfectly legal. The ending of his HTML is "<!--" which starts an HTML comment, telling the browser to ignore all the data which follows (the image data). Since browsers are liberal in what they accept, they ignore the first 20ish bytes of the JPEG header, see the starting HTML tag, render the page, and ignore the fact that the comment never closes.

http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_JPE...
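For anyone who wants to reproduce it, here is a rough sketch of the construction (my own illustration, not lcamtuf's actual build script): splice a COM segment holding the HTML right after the JFIF APP0 segment, and end the HTML with "<!--" so the rest of the image data disappears into an HTML comment. The payload and file names are made up, and a single COM segment is limited to about 64 KB.

  // Rough sketch: inject an HTML payload into a JPEG COM segment (marker 0xFFFE).
  const fs = require('fs');

  const jpeg = fs.readFileSync('squirrel.jpg');      // any baseline JFIF file
  const html = '<html><body><h1>Hello</h1>' +
               '<img src="#"><!--';                  // trailing <!-- hides the image bytes

  // SOI (2 bytes) + APP0 marker (2 bytes) + APP0 length at offset 4 (big-endian,
  // includes the two length bytes), so the APP0 segment ends at 4 + length.
  const app0End = 4 + jpeg.readUInt16BE(4);

  const com = Buffer.alloc(4 + html.length);
  com.writeUInt16BE(0xfffe, 0);                      // COM marker
  com.writeUInt16BE(html.length + 2, 2);             // segment length includes itself
  com.write(html, 4, 'latin1');

  fs.writeFileSync('polyglot.html',
    Buffer.concat([jpeg.slice(0, app0End), com, jpeg.slice(app0End)]));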


Actually the browser isn't ignoring the first bytes of the JPEG before the <html> tag - it's there on the page, just hidden with CSS.


For example, see how it renders in a text browser (links here):

https://imgur.com/a/WVWpU


That's true, I stand corrected.


...this is probably a hint:

  00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c |......JFIF.....,|
  00000010 01 2c 00 00 ff fe 03 72 3c 68 74 6d 6c 3e 3c 62 |.,.....r<html><b|

  $ file index.html
  index.html: JPEG image data, JFIF standard 1.01, resolution (DPI), density 300x300, segment length 16, comment: "<html><body><style>body { visibility: hidden; } .n { visibilit", baseline, precision 8, 1000x667, frames 3

I wonder what the security implications of that are.


> I wonder what are the security implications of that.

At least any terminal escape sequence can be executed if you run `file` on a JPEG, it seems, since this:

    curl -s 'http://www.imagemagick.org/image/fuzzy-magick.png' | convert - -set comment "$(printf 'asdf\x1b[1;31mTest?\x1b[0m hmm')" test2.jpg
    file test2.jpg
Results in red text on my terminal for me.

(It also results in file writing a 0xff 0xdb to the terminal, which the terminal turns into the unicode fallback character since it's not valid text…)


I just get the following so it seems like my version of file has been patched to handle this case.

    test2.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 72x72, segment length 16, comment: "asdf\033[1;31mTest?\033[0m hmm", baseline, precision 8, 320x85, frames 3


Same here with file(1) version 5.22 from the Debian jessie package repo. I'd be interested to know in which versions this kind of thing actually works.


My first guess would be "the MacOS one", since from what I have heard, MacOS tends to have older (sometimes much older) versions of basic system utilities. I don't have a MacOS machine nearby to check, so this is just a guess.


Well that and they have BSD rather than GNU versions


Might be a way to bypass malicious script detectors that see the JPEG header and stop trying to process the file.

It will be stopped dead by metadata filters though. Stripping out the comment would be step #1 for those devices.


What device would have metadata filters installed though? Not a modern browser, by default.


I was thinking more like an IDS.


I remember the HTML parser was standardized by the WHATWG. I guess it never gives up and keeps trying to find valid HTML tags.



I appreciate the technical trickery in this version, but has it not been possible to do this since at least 1996[1] by having the server serve different files based on the "Accept" http header?

[1] https://www.w3.org/Protocols/HTTP/1.0/spec.html#Accept


That's for the initial resource itself, not subsequent asset requests. For instance, an asset request is a separate request on a different domain, such as one for a customer service analytics tag, which is requested without any Accept filter on the resource type.


So if the initial request comes with "Accept: text/html" then you serve a webpage. The request when linked as img comes as "Accept: */*" and you serve an image.

Perhaps caching or other schemes would break this, I'll try to knock up a PoC.
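A hypothetical sketch of that PoC (and as noted, you'd want Vary: Accept so caches don't mix the two responses up):

  // Hypothetical sketch: same URL, two bodies, chosen by the Accept header.
  const http = require('http');
  const fs = require('fs');

  http.createServer((req, res) => {
    res.setHeader('Vary', 'Accept');
    if ((req.headers.accept || '').includes('text/html')) {
      res.setHeader('Content-Type', 'text/html');
      res.end('<h1>Essential squirrel facts</h1><img src="#">');
    } else {
      res.setHeader('Content-Type', 'image/jpeg');
      fs.createReadStream('./squirrel.jpg').pipe(res);
    }
  }).listen(8000);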


Great, now construct a jpeg image that's valid HTML and a valid uncorrupted picture as well.


Can someone explain in simple English how this works?


1. HTML is very forgiving. HTML also provides a comment mechanism <!-- --> between which anything will be ignored from a browser perspective.

2. JPEG also allows for comments and other embedded metadata which won't show up in the displayed image.

3. Start the file with a JPEG header and metadata section, then switch between HTML and JPEG using the comment functionality mentioned above

Essentially!

"But wait, why is it shown as an image in one context and as a web page in another?"

The answer is in the question: Context. If you expect a JPEG you will get a JPEG, and same for HTML.


JPEG images allow comments. The HTML is stored in the comment. When the browser assumes it is reading an HTML file, it loads the comment's contents. When the browser thinks it is reading an image, as in an <img> tag, it reads the image and ignores the comment.


great hack, could you get javascript working inside a jpeg as well? Or obfuscate the javascript and decrypt in the browser for steganographic purposes?


No, the JPEG header is at the start of the file, unlike the PNG which is at the bottom.


Pardon my ignorance, but what does that matter?

There's an important note in that the HTML is not at the true start of the JPEG file, it's slightly after. You can even view some of the JPEG format bytes if you view source of the HTML.

So if the browser ignores some of the JPEG file, why not most of the PNG file? Perhaps you run the risk of some random byte screwing up the HTML though.. not sure.


Oh, you're right. I was thinking of the combined zip/png, and it's the zip file that has the header at the bottom, so my previous comment is completely wrong. The article seems to be adding the HTML in the EXIF data (thus making it a completely valid JPEG) and the browser tries to be very accommodating in what it accepts, thus ignoring the junk data (or what it thinks is junk data) at the start of the HTML file.

Whether JS would work or not depends on how much the browser tries to recover from errors there. I would guess not much, but what would you gain from a combined JS/JPEG file anyway?



You could do the same thing with a .wav file, embedding the HTML after the data sub-chunk. Adobe Audition uses this method to embed application-specific metadata for the file (marker and sub-marker locations, for example).


WAV is actually a specialization of an old container format called RIFF. RIFF allows for extensibility by allowing the embedding of arbitrary data.

So this is technically a legitimate use of RIFF because it's designed to support multi-purpose contents, not just PCM data.
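A sketch of what appending an extra RIFF chunk looks like (the 'html' chunk id and file names are invented for illustration; Audition uses its own ids). Each chunk is a 4-character id, a little-endian 32-bit size, and data padded to an even length, and the top-level RIFF size at offset 4 has to be bumped too:

  // Sketch: append a custom chunk to a RIFF/WAV file.
  const fs = require('fs');

  const wav = fs.readFileSync('in.wav');
  const payload = Buffer.from('<html><body>hello</body></html>');
  const padded = payload.length % 2
    ? Buffer.concat([payload, Buffer.alloc(1)])      // chunks are padded to even length
    : payload;

  const chunk = Buffer.alloc(8 + padded.length);
  chunk.write('html', 0, 'ascii');                   // made-up chunk id
  chunk.writeUInt32LE(payload.length, 4);            // size is the unpadded length
  padded.copy(chunk, 8);

  const out = Buffer.concat([wav, chunk]);
  out.writeUInt32LE(out.length - 8, 4);              // fix up the RIFF container size
  fs.writeFileSync('out.wav', out);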


Aha! This is how Beatport were putting title/artist data in WAVs.

I'm not sure why they did that instead of just sending FLACs, though...


Nice trick! I'm interested in encoding other kinds of data in images in order to share apps/music/etc between mobile devices.

Unfortunately, your HTML gets lost when I try to save the picture on my phone.

iPhone > Safari > Save to Camera Roll

Copy off the picture; it's been re-encoded and has no JPG comment field.

Any solutions to this are welcome! The re-encoding makes patterns impossible to decode as well (they degrade after being shared a couple of times). See Cemetech's jsTified emulator for an example of a ROM file as a JPG - he uses B/W and it still requires the file to be synced from a computer, not saved from the browser.


Well this will certainly appeal to Steganography enthusiasts and perverts who have clumsily been loafing around .onion sites for years and who now finally have a way to share content in the clear. And of course the NSA, FBI and CIA are suddenly stuck trying to figure out why this goofy squirrel is so popular in Yemen.

  mv squirrel.html squirrel.jpg
  sudo apt-get install steghide
  steghide embed -cf squirrel.jpg -ef secret.txt
  mv squirrel.jpg squirrel.html

And voila...


Curious, how do search engine crawlers interpret this? Would it be the same as a browser, i.e. the bot would treat the requested img url respectfully?


I'm a big fan of Ange Albertini's work with this kind of stuff - https://github.com/corkami/

There's a talk called "Funky File Formats" but there's (fittingly) multiple versions of it so you're best off searching for it.


Saumil Shah released a framework for producing images using this technique to deploy browser exploits (but could potentially be used for anything).

Worth a look if you'd like to make your own! http://stegosploit.info/


Neat idea. Here is a BMP I just threw together that is also a webpage.

https://mega.nz/#!49MnjKCJ!7HShAESfmM2R450x4z-zLmtCDotOhsLze...


Along similar lines: https://www.alchemistowl.org/pocorgtfo/

Some of the PDFs also happen to be valid images, audio files, zip archives, etc.


This is cool. Though the 133kb download size for the html isn't great.


Yeah but the image comes straight out of cache.


Good point, well on the second load anyway.




This would be a unique way to make downloading images harder.


Just what I was thinking. It doesn't actually prevent the user from downloading the image, just makes them think that they failed to download the image (since it saves with a .htm extension).

The HTML file could be one that admonishes the user for attempting to scrape the file, all the while the file they wanted is sitting right there. A modern day Purloined Letter.


I just right clicked on the image and saved it in Firefox and it gave me a .jpg. I was still able to change it into a working .html file by renaming it. I imagine some systems for downloading would get it messed up, but Firefox at least treats it as a JPEG when you're interacting with the image tag.


I wonder if browsers are smart enough to only download the file once for the html and then cache it for the embedded image.


It looks like Safari is, at least, and it even de-duplicates it in the Web Inspector so it only lists a single resource (which gets listed as "type: image").


Yep, Safari doing the same for me. Although I'm seeing it as an image with type "text/html", which is odd.


Can I upload such an image to, for instance, Facebook, intending to run it as HTML (with some JS inside)?


They usually set the content type to that of an image so the browser won't execute the JS.

They've messed this up in the past, see this legendary bug bounty report [1]

1. https://whitton.io/articles/xss-on-facebook-via-png-content-...


Should I expect the same behavior on Facebook? I didn't get an image, though.


No explanation of what's going on?


Can you make it play a video too?


Well, I just thought that it was some bug in the HTML img tag. But still, a nice find.


This JPEG is not a pipe.


You can use a data URI to embed images too. Not sure how this is done though; why not just use a data URI?


Data URIs are useful for embedding resources in a page (e.g. a single HTML file containing all of its own CSS, JS, images, etc.)

This is different: it's a single file which can be parsed as either a HTML page or a JPEG. Hence, when a program expects a HTML page (like a browser loading a Web page), it will be parsed and displayed as a HTML page. When a program expects a JPEG file (like a browser loading the "src" of an "img" element) it will be parsed and displayed as a JPEG.

The trick is to use each format's comment syntax to hide the other format. Not sure if the HTTP headers need to be set differently for each request or not.


I think it may only work if you omit a Content-Type header. Checking Firefox's Network tab, it looks like the server isn't serving one for that page.


It's sending the "wrong" one --

    $ curl -I http://lcamtuf.coredump.cx/squirrel/
    HTTP/1.1 200 OK
    Date: Thu, 11 Aug 2016 05:18:00 GMT
    Server: Apache
    Last-Modified: Mon, 19 Sep 2011 23:31:49 GMT
    Accept-Ranges: bytes
    Content-Length: 135938
    Content-Type: text/html


Oh. Huh. My bad.

I guess browsers only forbid ignoring Content-Type for stuff like JS, then. For JPEG it's probably not a security concern.


It's not "embedded" like that, it's just an <img> tag that points to the same URL as the page.


anchors are not interpreted as html


nice trick!


Reminds me of Pied Piper for some reason :-)


C'mon, don't be so stingy, give him a Ben at least :)

   _____________________________________________________________________
  |                                                                      |
  |  =================================================================== |
  | |%/^\\%&%&%&%&%&%&%&%&{ Federal Reserve Note }%&%&%&%&%&%&%&%&//^\%| |
  | |/inn\)===============------------------------===============(/inn\| |
  | |\|UU/              { UNITED STATES OF AMERICA }              \|UU/| |
  | |&\-/     ~~~~~~~~   ~~~~~~~~~~=====~~~~~~~~~~~  P8188928246   \-/&| |
  | |%//)     ~~~_~~~~~          // ___ \\                         (\\%| |
  | |&(/  13    /_\             // /_ _\ \\           ~~~~~~~~  13  \)&| |
  | |%\\       // \\           :| |/ ~ \| |:  3.21  /|  /\   /\     //%| |
  | |&\\\     ((iR$)> }:P ebp  || |"- -"| ||        || |||| ||||   ///&| |
  | |%\\))     \\_//      sge  || (|e,e|? ||        || |||| ||||  ((//%| |
  | |&))/       \_/            :| `._^_,' |:        || |||| ||||   \((&| |
  | |%//)                       \\ \\=// //         || |||| ||||   (\\%| |
  | |&//      R265402524K        \\U/_/ //   series ||  \/   \/     \\&| |
  | |%/>  13                     _\\___//_    1932              13  <\%| |
  | |&/^\      Treasurer  ______{Franklin}________   Secretary     /^\&| |
  | |/inn\                ))--------------------((                /inn\| |
  | |)|UU(================/ ONE HUNDERED DOLLARS \================)|UU(| |
  | |{===}%&%&%&%&%&%&%&%&%a%a%a%a%a%a%a%a%a%a%a%a%&%&%&%&%&%&%&%&{===}| |
  | ==================================================================== |
  |______________________________________________________________________|
source: http://chris.com/ascii/index.php?art=objects/money


We detached this comment from https://news.ycombinator.com/item?id=12262995 and marked it off-topic.


Why? I mean as if the parent was on topic... and this is a little joke, jokes don't kill.


There's low tolerance for jokes on HackerNews. I'm not entirely sure why, but it is what it is.


To stop the place becoming like reddit, where all insightful comments are buried under a torrent of jokes. I've had my fingers burnt a few times and then sulked for a few days after one of my many hilarious quips was down-voted into oblivion.

But ultimately I think it's for the greater good.


My comment stays at 12 upvotes. Certainly nobody would want jokes to surpass serious comments, but this is not achieved by killing every single joke. Also, there's another joke on that thread.

Something one needs to get used to on HN, I think... randomly picking on arguably off-topic and arguably offensive posts proactively. Also, I'm shocked to see that the mod here has such low karma and such a short history of HN usage.


The system is far from perfect, but I've never got the impression that the rules are applied malevolently. I wouldn't like the job of moderating comments, it's really hard and ironically people tend to have a zero tolerance policy if they feel hard done to by an error in judgement on behalf of the mods.


That's a chipmunk, not a squirrel.


I'm not an expert on squirrels, but that could be a ground squirrel[1] of some sort. Some varieties of ground squirrel look a lot like chipmunks.

[1]: https://en.wikipedia.org/wiki/Ground_squirrel


A quick search for "striped ground squirrel" turned up only one variety, the thirteen-lined ground squirrel. The animal in the photo has stripes like a chipmunk.


Look for stripes on the face: without them, it's probably a squirrel. Both ground squirrels and chipmunks have body stripes, so the face is a better indicator.

http://naturemappingfoundation.org/natmap/facts/chipmunk_vs_...

http://www.differencebetween.info/difference-between-squirre...


You're right, there's no stripes on its face. I stand corrected.

Sadly, that was my most upvoted comment in ~9.5 years on this site.


Ahh, I was about to unvote you, but now I can't bear to do it. Keep my upvote.


To my (admittedly untrained) eye, the picture looks a lot like an Indian palm squirrel (https://upload.wikimedia.org/wikipedia/commons/d/d1/Indian_P...) – which does have stripes on its back.


That's not a chipmunk. This is a chipmunk.

http://pmdvod.nationalgeographic.com/NG_Video_DEV/935/311/le...


All chipmunks are squirrels.


I'll be damned. This should be squirrel fact #1.


Family Sciuridae! So they are, I didn't know that.


A Chipmunk is a type of Squirrel https://en.wikipedia.org/wiki/Squirrel



My University's (Minnesota) mascot is a gopher, but the physical mascot is a chipmunk/squirrel (debated) named Goldy Gopher - but it's definitely not a gopher. These rodents can be confusing!


Speaking of gopher... could this be a way to get images on gopher:// pages?


It's a webpage, not an image.


It's both! That's the whole point - the same sequence of bytes can be interpreted as either an HTML web page or a JPEG image, and the page itself demonstrates both at the same time.


DO NOT run Chrome's Timeline dev tool on this when reloading. Crashed a couple tabs.

Good edge case for browser tests!


Works for me, 50.0.2661.94 on Windows 7 x64.



