Also a Ninjadever. Daeken's blog post was one of the inspirations that led us to implementing this in our toolchain. p01's Matraka [1] was another. The PNG trick is pretty common in the js scene nowadays, with packing tools such as JsExe [2] readily available.
Oh man, I didn't know you guys released your tools. I'm a big fan of your stuff -- awesome to see such polished prods on the web. Happy to have helped enable some amazing work!
Seems like some bits are missing on that page, e.g. I see nothing between "What is the bootstrap? Well, it's what turns our PNG into code and runs it. Here's the one I use:" and "The 4968 here is really the size of the decompressed data in bytes times 4"
It was looking fine on my end, but I realized I wasn't setting an encoding on the page. Should be UTF-8 now and not cause any more problems. Thanks for pointing it out!
Open the file in a browser and read the page. Then:
$ mv squirrel.html squirrel.jpg
Open the renamed file in a browser and only the image appears.
I'm not sure what the security implications are. I'm not creative or devious enough to think of anything offhand, but a lot of attack vectors start off with this sort of misdirection.
> I'm not sure what the security implications are.
You can use this technique to phish signatures. Send someone a document that reads "X" in format A and "Y" in format B. The victim signs file.A thinking they are endorsing X, but you can plausibly claim that they signed file.B (because it's the same file) and hence endorsed Y. This is why digital signature standards need to include metadata binding the intended interpretation (e.g. the content type) into what gets signed; a rough sketch of the idea is below.
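To illustrate (using an HMAC as a stand-in for a real signature scheme, with a hypothetical key and file name; this is a sketch of the idea, not any particular standard):

  import hashlib, hmac

  key = b"demo-key"                        # hypothetical signing key
  data = open("file", "rb").read()         # the polyglot: one set of bytes, two views

  # Naive: the "signature" covers only the raw bytes, so the file.A view
  # and the file.B view are vouched for by the exact same value.
  naive = hmac.new(key, data, hashlib.sha256).hexdigest()

  # Better: bind the interpretation the signer was shown into the message,
  # so a signature over the format-A view cannot be reused for format B.
  bound = hmac.new(key, b"format-A\x00" + data, hashlib.sha256).hexdigest()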
And anyone else can plausibly claim that you carefully forged a file to get a victim to sign it -- the signature will be of the whole file, not just a single view of it.
But that said, you shouldn't sign binary files unless you have a reasonable understanding of what's in them (or trust the party presenting them to you).
There are websites where you can upload files such as images, but they filter HTML for security reasons. If you can present that HTML to someone (e.g. by deep-linking an iframe to the document), it could contain (or load) JavaScript that runs in the context of that site...
This is why it's important to correctly set the Content-Type header when serving files. Also why it's a good idea to have user-uploaded content served from a separate domain.
If someone uses this trick to upload a PNG like this to your server, and a victim is then tricked into loading it as HTML, that HTML has access to your cookies and can make AJAX requests, circumventing same-origin protection.
If user content is on a separate domain, they can't do that.
Also, phishing is a lot easier when you're on the real domain...
I found some references online to something called "Smart Setup", which you can apparently turn off under Advanced Settings -> Home Network -> Smart Setup, but I have no idea what it actually does or why it intercepts random requests...
BT Home Hub 5 (Type A) Software version 4.7.5.1.83.8.204.1.11
But, false alarm anyway, nothing interesting is happening. The firmware had updated and reset parental control settings on the router. The domain is on some blacklist apparently so it was redirecting to a page to finalise parental control preferences.
Sorry it wasn't any more interesting than that.
Edit: the reason it took me a while to figure this out was that the settings page it was redirecting to was nothing to do with parental controls!
I visited some other blacklisted sites <.< >.> and discovered the pattern, then dug around in the router settings to see what had changed. Disabling parental controls sorted it and I can now see the squirrel/chipmunk/unidentified rodent.
The fact that it encodes "/" in one part of the URL (the parameter) but not in another is a very good indication that whatever this 'feature' is doing is badly thought out and the implementation was done by the intern.
"Deployed on one out of four residential gateways globally, Cisco Videoscape OpenRG is the industry's most widely used residential gateway software." -- Cisco
At least we know a little about the software issuing this redirect!
I did learn something new, not just about squirrels but about chipmunks, too.
Thank you.
And here I was impressed merely by the delivery of image data in the HTML stream. Little did I realize your page is practically an Encyclopedia Rodentia.
Keep in mind that combining most normal formats with most archive formats is trivial, because normal formats start at the beginning, and archive formats have a table of contents at the end. Concatenate both files and you're done.
Combining with PDF is also on the easy end of things, because the PDF header just has to be somewhere vaguely near the start.
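A minimal sketch of that concatenation trick (hypothetical file names): image viewers stop at the end of the image data, while zip readers locate the central directory from the end of the file, so the same bytes open as either. Most zip tools tolerate the prepended image data, though some will warn about extra leading bytes.

  # Append a zip archive to a PNG; the result is both at once.
  with open("cover.png", "rb") as img, open("payload.zip", "rb") as arc:
      blob = img.read() + arc.read()

  with open("both.png", "wb") as out:
      out.write(blob)
  # "both.png" still displays as an image; rename it to .zip (or point a
  # zip tool at it) and the archive's table of contents is found at the end.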
A testament to one of the worst decisions in computing history: not failing with an error message when a web page is not a valid HTML document.
Being flexible about what markup is accepted has meant the web could gain new features and gracefully degrade, and has made it more fault-tolerant. It's not at all a failing.
Compare that to JavaScript, which will happily fail if you use new syntax or a missing function, and thus web pages which rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does. That's not to say JS should be as flexible as HTML is here, but it provides an interesting contrast.
There's nothing wrong with some tolerance, like ignoring tags that it doesn't know. But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated. Which creates room for bugs, exploits, and unexpected situations like OP's post.
> But if the syntax is wrong, it shouldn't try to fix it or guess what the user meant, just display an error. Accepting invalid syntax means all HTML parsing becomes vastly more complicated.
Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.
Do they want their page to fail to render entirely when PHP outputs a warning? Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document? Should we really expect them to have to modify their blogging software to validate custom HTML snippets, lest the entire page become unusable? Will they be pleased when their style of code falls out of favour in future, is deprecated, and then their page doesn't work at all later?
Moreover, strictness can backfire when you have such a diversity of implementations.
> Which creates room for bugs, exploits, and unexpected situations like OP's post.
The OP is not so much an unexpected situation as a carefully engineered one that's completely within the constraints HTML sets.
> Do they want their page to fail to render entirely when PHP outputs a warning?
Displaying warnings is fine. Invalid XHTML, less so.
> Do they want their website to be completely broken because they forgot to convert some of their text from Latin-1 to UTF-8 before pasting it into the document?
The encoding of the content has nothing to do with the document markup.
> Moreover, strictness can backfire when you have such a diversity of implementations.
On the contrary: this prevents subtle bugs in the interpretation of invalid data by different implementations.
>> Why push the complexity onto the user? Someone who just wants to make a working website doesn't care about your pedantry.
Because this approach has historically resulted in people who "just wanted to make a working website" making websites that only work in specific browsers (or worse yet, specific versions of specific browsers on specific platforms). And then those sites stuck around and infrastructure got built around them that made them hard to fix.
We've spent the best part of the '00s fixing that mess, and there are still some pockets that haven't been properly cleaned up. If that's not a lesson to learn from, I don't know what is.
> Compare that to JavaScript, which will happily fail if you use new syntax or a missing function, and thus web pages which rely on JS often show up as just a full screen of white when something goes wrong, which it frequently does.
Isn't that more due to failure to handle exceptions and display errors to users?
XHTML failed because doing the old, broken, tag-soupy mess still worked exactly as well from the user perspective. You just can't get people to work harder for invisible benefits. In a sense, it's a reason, but it doesn't mean that tag soup is a good thing.
I often wonder how different the internet would be if Postel's prescription never gained traction and fail-fast behavior were the norm instead.
Also because all of the tooling was terrible and extensibility was non-existent. It's easy to imagine a world where XHTML worked out better because the browsers provided clear, informative errors rather than a blank page, someone at the W3C cared enough to have a usable validator which produced helpful warnings and errors, and attention was paid to the less friendly bits of the XML ecosystem[1].
Instead, it felt like we had a bunch of people who thought that big lofty standards were so obviously correct that everyone else would take care of those boring implementation details, and 99.9999% of web developers correctly realized that there was very little measurable downside to sticking with something which was known to work.
1. Simple examples: namespacing is a good idea but it leads to gratuitous toil in most tools – e.g. a valid XML document which has <foo> should just work if you write a selector for /foo, as present in the document, rather than requiring kludgey things like making every parser re-register the same namespaces that are already declared in the document and writing fully-qualified selectors like /mychosenprefix:foo or /{http://example.org/fooschema/1.0}foo for every tag, every time.
Similarly, getting XPath 2.0 support to actually ship in enough tools to be usable would have made one of the better selling points for using XML actually exist as far as the average working programmer is concerned.
Writing proper XML tools is very difficult. XML is usually parsed without considering the DTD; it seems DTDs are just for humans to read. As you mention, XML becomes more interesting with extensibility. IMHO XSLT is the key technology for that, but unfortunately there is no reasonable support for XSLT 2 because the standard is just too complex, and XSLT 1 is barely interesting to use.
By the way, there's SLAX, which is isomorphic to XSLT but with nicer syntax. A nice approach, but anyway, the standards are horribly complex and stiff.
The main lesson I've learned from this is that a spec is far more likely to be successful if it's paired with at least one working implementation and more than trivial test data. Writing a validator is both important for adoption and perhaps more so for flushing out parts of the spec which are too hard to implement or annoying to work with.
Maybe it never would have taken off because many more people would get frustrated trying to make something show up on the screen and give up. Or get frustrated trying to make tools that produced something that all of the browsers would display no matter what weird things the users did.
Maybe it would be so bad that somebody else would make a new, more permissive standard that took off instead.
Maybe all of that has already happened.
Maybe that line from Battlestar Galactica was right - All of this has happened before; all of this will happen again.
How would you have prevented huge schisms during the IE push?
In a world where invalid HTML documents aren't rendered at all we could have had the evolution of the format dictated by Microsoft because of their market position.
So you're from the XHTML2 camp then? It's good we got HTML5, and good that the weird years of the transition to XHTML 1, the unclear vision of XHTML2, and ECMAScript for XML (E4X) are long gone.
Yeah, imagine if processors gave best effort to processing binaries... what could possibly go wrong :/
The decision to allow this was made early and the liberal accept/strict transmit paradigm has in general made the web a mess.
On the plus side, the consistent failure of browser vendors to apply strict controls to input means that as an application security person I will probably never be out of work :D Even though this pattern of behavior is starting to change, legacy support means that I will still be dealing with these issues well into my retirement!
Malformed HTML? None. Browsers attempting to be lenient in what they accept? Loads.
To take this article as an example, according to the HTTP specification, the `Content-Type` header is supposed to have the final say in what media type is being served. Internet Explorer decided it would be better to use heuristics. I think the idea was that if a web host was misconfigured, rather than have the web developer fix their bug, it would try to guess its way out of the error.
Which kinda worked. The problem was, it opened it up to abuse. If you had a web host that allowed untrusted people to upload images (e.g. profile photos), you could construct an image that tricked Internet Explorer into thinking that it was an HTML document, even if the server explicitly told clients that it was an image. The main difference between images and HTML, of course, is that HTML can contain JavaScript, which would now execute in the security context of your web page.
So all of these web hosts, thinking they were only giving people the ability to upload images, were now letting people execute JavaScript on their domain – simply because Internet Explorer tried to be lenient.
The workaround ended up being forcing downloads with `Content-Disposition` headers instead of displaying inline. That's why, for example, visiting the URL of an image on Blogger directly triggers a download instead of showing the image.
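A rough sketch of that kind of defence, using Python's wsgiref just to show the headers involved (hypothetical file name; real code would look the upload up properly and validate the path): declare the real type, opt out of content sniffing, and force a download rather than inline rendering.

  from wsgiref.simple_server import make_server

  def serve_upload(environ, start_response):
      body = open("uploads/avatar.jpg", "rb").read()   # hypothetical stored upload
      start_response("200 OK", [
          ("Content-Type", "image/jpeg"),              # the type the server means
          ("X-Content-Type-Options", "nosniff"),       # ask browsers not to second-guess it
          ("Content-Disposition", "attachment"),       # download, never render inline
          ("Content-Length", str(len(body))),
      ])
      return [body]

  make_server("", 8000, serve_upload).serve_forever()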
Other examples that spring to mind:
Netscape interpreting certain Unicode characters as less than signs. People were correctly escaping `<` as `&lt;`, but the Unicode characters slipped through and caused XSS vulnerabilities in that browser.
Browsers ignoring newlines in pseudo-protocols. Want to strip `href="javascript:…"` out of comments? No problem… except some browsers also executed JavaScript when an attacker placed a newline anywhere within the `javascript` token.
Being lenient in what you accept has caused security vulnerabilities over and over again and there's no reason to think that it will stop now.
> To take this article as an example, according to the HTTP specification, the `Content-Type` header is supposed to have the final say in what media type is being served. Internet Explorer decided it would be better to use heuristics. I think the idea was that if a web host was misconfigured, rather than have the web developer fix their bug, it would try to guess its way out of the error.
> Which kinda worked. The problem was, it opened it up to abuse. If you had a web host that allowed untrusted people to upload images (e.g. profile photos), you could construct an image that tricked Internet Explorer into thinking that it was an HTML document, even if the server explicitly told clients that it was an image. The main difference between images and HTML, of course, is that HTML can contain JavaScript, which would now execute in the security context of your web page.
> So all of these web hosts, thinking they were only giving people the ability to upload images, were now letting people execute JavaScript on their domain – simply because Internet Explorer tried to be lenient.
This is an interesting example, though I think this is a fair bit different than being lenient on HTML interpretation.
The topic was strict HTML. Accepting malformed HTML doesn't seem to pose much of a problem. Blindly executing a non-executable file seems like a much different problem.
> Netscape interpreting certain Unicode characters as less than signs. People were correctly escaping `<` as `&lt;`, but the Unicode characters slipped through and caused XSS vulnerabilities in that browser.
This doesn't sound like being lenient. This just sounds like a bug.
> Browsers ignoring newlines in pseudo-protocols. Want to strip `href="javascript:…"` out of comments? No problem… except some browsers also executed JavaScript when an attacker placed a newline anywhere within the `javascript` token.
Huh? I don't understand the scenario being described here. It again sounds like a bug rather than lenient acceptance of data, though.
> I think this is a fair bit different than being lenient on HTML interpretation.
It's not. There are two areas where the leniency was a problem here. Firstly, the leniency in rendering one media type as a completely different media type because the browser heuristic thought it was being lenient. Secondly, the leniency in parsing HTML out of an image file – you can't do that with valid HTML.
> Accepting malformed HTML doesn't seem to pose much of a problem.
I've literally just given three specific examples of it causing security vulnerabilities.
> This doesn't sound like being lenient. This just sounds like a bug.
No, it was intentional. It was specifically Unicode characters that looked like less than and greater than signs, but weren't.
> I don't understand the scenario being described here.
Somebody noticed that href="java\nscript:…" wasn't being parsed as JavaScript, and it was causing some malformed pages to fail to work properly. Rather than let it fail, they tried to fix it by stripping out the whitespace, and caused a security vulnerability.
If these three examples aren't enough, take a look at OWASP's XSS filter evasion cheat sheet. There's plenty of examples in there of lenient parsing causing security problems:
> It's not. There are two areas where the leniency was a problem here. Firstly, the leniency in rendering one media type as a completely different media type because the browser heuristic thought it was being lenient. Secondly, the leniency in parsing HTML out of an image file – you can't do that with valid HTML.
I think you can argue the first is a problem. You have an example demonstrating as much. Arguing that the second is a problem is much harder. Lenient HTML acceptance has been hugely advantageous to the adoption of the web. There may have been some issues from this, but it's valuable enough that the effort to "fix" it was abandoned and the W3C and WHATWG returned to codifying what leniency should look like.
> I've literally just given three specific examples of it causing security vulnerabilities.
Well, at least one example. Coercing a file served as an image to HTML isn't an issue of accepting malformed HTML, nor would I agree that the JS example is a problem with leniency.
> No, it was intentional. It was specifically Unicode characters that looked like less than and greater than signs, but weren't.
Okay, I reread your last comment. I initially thought you were saying that Netscape was treating '&lt;' as '<'. So Netscape decided to treat some random Unicode chars that happen to look kind of like the less-than symbol (the left angle bracket ⟨, maybe?) as if they're the same as the less-than symbol? This seems amazingly short-sighted and pointless. How was this issue not seen, and was this even solving a problem for someone?
> Somebody noticed that href="java\nscript:…" wasn't being parsed as JavaScript, and it was causing some malformed pages to fail to work properly. Rather than let it fail, they tried to fix it by stripping out the whitespace, and caused a security vulnerability.
So the issue here is incompetent input sanitization. I don't think the browsers being lenient here is the issue.
A few of these are interesting in the context of browsers being lenient, e.g. this one requires leniency as well as poor filtering:
<IMG """><SCRIPT>alert("XSS")</SCRIPT>">
Most of these are just examples of incompetence in filtering, though, and a great example of why you 1) should not roll your own XSS filter if you can avoid it, and 2) should aggressively filter out everything not explicitly acceptable instead of trying to filter out problematic text; a sketch of the whitelist approach is below.
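For what it's worth, the whitelist approach in practice usually means reaching for an existing sanitizer rather than regexes. A minimal sketch with the bleach library (the tag/attribute choices here are hypothetical, pick whatever your site actually needs):

  import bleach  # third-party HTML sanitizer

  dirty = '<IMG """><SCRIPT>alert("XSS")</SCRIPT>">'

  clean = bleach.clean(
      dirty,
      tags=["b", "i", "em", "strong", "a"],    # whitelist: only these tags survive
      attributes={"a": ["href", "title"]},     # and only these attributes
      protocols=["http", "https"],             # no javascript: URLs
      strip=True,                              # drop disallowed tags rather than escaping them
  )
  # Everything not explicitly allowed is stripped or escaped, instead of
  # trying to enumerate "problematic" patterns one by one.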
> Arguing that the second is a problem is much harder. Lenient HTML acceptance has been hugely advantageous to the adoption of the web.
Wait, that's a completely different point. The argument here is that it caused a security vulnerability, and it did. If the lenient HTML parser didn't try to salvage HTML out of what is most certainly not valid HTML, then it wouldn't be a security vulnerability.
> This seems amazingly short-sighted and pointless. How was this issue not seen, and was this even solving a problem for someone?
You could ask the same of most cases of lenient HTML parsing. It's amazing the lengths browser vendors have gone to in order to turn junk into something they can render.
> So the issue here is incompetent input sanitisation.
No, it isn't. That code should not execute JavaScript. The real issue is that sanitising code is an extremely error-prone endeavour because of browser leniency – because you don't just have to sanitise dangerous code, you also have to sanitise code that should be safe, but is actually dangerous because some browser somewhere is really keen to automatically adjust safe code into potentially dangerous code.
Take the Netscape less than sign handling. No sane developer would think to "sanitise" what is supposed to be a completely harmless Unicode character. It should make it through any whitelist you would put together. Even extremely thorough sanitisation routines that have been worked on for years would miss that. It became dangerous through an undocumented, crazy workaround some idiot at Netscape thought of because he wanted to be lenient and parse what must have been a very broken set of HTML documents.
This is not a problem with incompetent sanitisation. It's a problem with leniency.
You have some compelling examples of problems from leniency. I think in some cases the issues are definitely magnified by other poor designs (bad escaping/filtering) but you've demonstrated that well-intentioned leniency can encourage and even directly cause bugs.
Malformed HTML may escape sanitization on input in a vulnerable web app, and still render on the victim's browser because their browser wants to be helpful.
(Yes, the output should have been escaped, but that is sadly not always the case)
I don't see how this has anything to do with malformed HTML or lenient rendering rules. In the scenario you're describing, well-formed but malicious HTML could also escape sanitization.
Which are the faults of the authors. No one expects malformed source code to compile, a video with corrupted headers to play properly or a binary containing invalid instructions not to crash. This decision allowed people to get away with broken web pages instead of forcing them to fix their mistakes.
It's a better web environment than it was years ago, but there was a time that even simple layouts required invalid code to display similarly across the browsers of the time.
The web would not have been as successful if it wasn't for this leniency. Full correctness is only worthwhile if attaining it does not excessively harm other goals and the cost of not attaining it is severe enough.
The cost of a malformatted HTML document rendering despite the errors is not that severe compared to the benefits it provides, as we have seen.
Well of course if correctness is not a requirement then it's not being paid attention to. That doesn't in any way indicate that it was a good decision not to require it in the first place.
If it had been made as difficult as possible for enthusiasts learning a markup language to get what was essentially a text document to actually display anything, it's probably not too much of an exaggeration to say the World Wide Web wouldn't have existed in its current form.
It's not as if many of the web's security holes are related to whether a page displays valid HTML markup or not.
I don't think the assumption is justified that enforcing well-formed HTML documents would have been a significant barrier. To me it actually seems easier to have a few simple and strictly enforced rules than a more or less random assortment of exceptions to save a handful of keystrokes.
Ultimately it's less about saving keystrokes and more about amateur enthusiasts having the opportunity to start with the browser rendering their unformatted document rather than an "Error at line 1" warning, and about the changes they introduce being considerably less likely to break the entire page.
XHTML Strict only solved one set of leniencies in the web platform, which were also the least important kinds. Malformed HTML makes writing browsers complicated but doesn't generally seem to cause security issues or other visible, obvious, must-fix-now problems.
The real security due to leniency problems in the web platform revolve around the handling of data and how JavaScript works, which were not addressed by XHTML. In that sense it's not surprising it went nowhere.
Right-clicking on the image and selecting "View Image" (Firefox), or "Open image in a new tab" (Chromium), gives the webpage, not the image. I can see why that happens: the menu items just open a URL and don't force it to be an image. However, it was a bit disorienting.
I'm surprised nobody in this thread has mentioned PICO-8, a "virtual console" which compresses its "cartridges" in the form of a PNG file. When viewed in a browser or on a computer the file is displayed as a neat stylised image of a cartridge with a description, box art etc but when opened in the PICO-8 executable reveals all of the game code, art and music/sound assets in fully uncompressed editable form. The cartridges can be shared freely on sites that leave the original file intact without re-compressing. Nifty!
<xmp> is great when you absolutely, positively, do not want any entities rendered under any circumstances. It's unfortunate that it's being deprecated, since it has its uses.
> <xmp> is great when you absolutely, positively, do not want any entities rendered under any circumstances. It's unfortunate that it's being deprecated, since it has its uses.
<![CDATA[ here &entities; or <angle|<brackets>> will not be interpreted ]]>
There is no need to special-case xmp when SGML and XML already define CDATA escapes.
"User agents must treat xmp elements in a manner equivalent to pre elements in terms of semantics and for purposes of rendering. (The parser has special behaviour for this element though.)" — https://html.spec.whatwg.org/multipage/obsolete.html#require...
I was just going off what the MDN page said about it being removed in the HTML5 standard. It looks like WHATWG just has a "living standard" and W3C still uses the versioning, so it's probably removed from W3C standards. I'm not too familiar with the reality of these standards.
Not to be rude, but in the USA (where SWIFT or bank wire transfers can be expensive) an email address as the recipient of an online fund transfer is pretty common, e.g. PayPal, Venmo, Chase QuickPay.
Now, specifically in this case, lcamtuf (of Google security) is joking and doesn't want your money.
This hack is actually pretty crazy: an arbitrary HTML/JPEG polyglot file that fooled a browser could be used for JS injection, say from a site that allowed JPEG file uploads and validated the MIME type.
The way we protected ourselves against it at <earlier company> (since we allowed image uploads at a variety of locations) was to decode and recode the image before storing and strip out comments.
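For reference, roughly what that scrubbing looks like with Pillow (not necessarily the library we used): decode to pixels and re-encode from scratch, so the stored file contains only what the encoder writes, not the uploader's original byte stream (COM segments, EXIF, and so on).

  from PIL import Image

  def scrub(src_path, dst_path):
      pixels = Image.open(src_path).convert("RGB")       # decode: only pixel data survives
      pixels.save(dst_path, format="JPEG", quality=85)   # re-encode a fresh, comment-free JPEG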
He/she isn't actually expecting payment, just as they don't really think that the trick is "pretty radical." It's just a mildly amusing way of providing contact information.
So in theory, can analytics platforms be compromised so that JPEG tracking pixels could turn into full-fledged sites interfering with the parent page at, say, a bank website? Firing off credentials in the background?
The image reference tag is for an image. As stated previously, if you look at the JPEG itself, it starts off with a JPEG comment segment that embeds the entire HTML block, then opens an HTML comment to hide the remainder of the JPEG data. Browsers are very liberal in what they accept, so the initial 20-byte header is ignored, although you can see it if you inspect the page's elements.
Yes, I get that - but if a tracking pixel is downloaded and interpreted as a JPEG, then the decoder will treat anything in the COM segment as a comment and not execute it, unless there is some sort of vulnerability in the JPEG implementation.
Note that the actual img tag has src="#" so when you are looking at the file opened locally, the image is also from local disk, not from his server, so it's legit.
However, the fact that it needs to be identified as an HTML file by the server implies to me that the idea others had ITT of abusing image hosting sites using this trick probably won't work.
He placed HTML inside a COM (comment) segment of the JPEG, which is perfectly legal. The ending of his HTML is "<!--" which starts an HTML comment, telling the browser to ignore all the data which follows (the image data). Since browsers are liberal in what they accept, they ignore the first 20ish bytes of the JPEG header, see the starting HTML tag, render the page, and ignore the fact that the comment never closes.
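A rough sketch of how such a file can be put together (my own reconstruction with hypothetical file names, not necessarily how the author built his): a COM segment is the 0xFFFE marker followed by a two-byte big-endian length (which counts the length bytes themselves), so the HTML plus a trailing "<!--" goes into one right after the SOI marker.

  import struct

  html = b"<html><body><h1>hello</h1></body></html>\n<!--"   # trailing, never-closed HTML comment
  jpeg = open("plain.jpg", "rb").read()                       # hypothetical ordinary JPEG

  assert jpeg[:2] == b"\xff\xd8"                               # SOI marker
  com = b"\xff\xfe" + struct.pack(">H", len(html) + 2) + html  # COM segment (payload must stay under 64 KB)

  with open("polyglot.jpg", "wb") as out:
      # SOI, then the comment carrying the HTML, then the rest of the image data,
      # which a browser parsing this as HTML ignores because it sits inside "<!--".
      out.write(jpeg[:2] + com + jpeg[2:])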
Same here with file(1) version 5.22 from the Debian jessie package repo. I'd be interested to know in which versions this kind of thing actually works.
My first guess would be "the MacOS one", since from what I have heard, MacOS tends to have older (sometimes much older) versions of basic system utilities. I don't have a MacOS machine nearby to check, so this is just a guess.
I appreciate the technical trickery in this version, but has it not been possible to do this since at least 1996[1] by having the server serve different files based on the "Accept" http header?
That's for the initial resource itself, not subsequent asset requests. For instance, an asset request is a separate request, often to a different domain, such as one for a customer service analytics tag, and it is made without any meaningful Accept filtering on the resource type.
So if the initial request comes with "Accept: text/html" then you serve a webpage. The request when linked as an img comes with "Accept: */*" and you serve an image.
Perhaps caching or other schemes would break this, I'll try to knock up a PoC.
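Something like this ought to do as a PoC (a sketch with Python's http.server and hypothetical file names; Accept values vary by browser and era, so treat the matching as illustrative). Sending Vary: Accept is what should keep caches from serving the wrong variant.

  from http.server import BaseHTTPRequestHandler, HTTPServer

  class NegotiatingHandler(BaseHTTPRequestHandler):
      def do_GET(self):
          accept = self.headers.get("Accept", "")
          if "text/html" in accept:                    # a navigation request
              body, ctype = open("page.html", "rb").read(), "text/html"
          else:                                        # e.g. "Accept: */*" from an <img> tag
              body, ctype = open("image.jpg", "rb").read(), "image/jpeg"
          self.send_response(200)
          self.send_header("Content-Type", ctype)
          self.send_header("Vary", "Accept")           # stop caches mixing the two variants
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("", 8000), NegotiatingHandler).serve_forever()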
JPEG images allow comments, and the HTML is stored in such a comment. When a browser assumes it is reading an HTML file, it renders the contents of the comment. When the browser thinks it is reading an image, as with an <img> tag, it decodes the image and ignores the comment.
Great hack! Could you get JavaScript working inside a JPEG as well? Or obfuscate the JavaScript and decrypt it in the browser for steganographic purposes?
There's an important note in that the HTML is not at the true start of the JPEG file, it's slightly after. You can even view some of the JPEG format bytes if you view source of the HTML.
So if the browser ignores some of the JPEG file, why not most of the PNG file? Perhaps you run the risk of some random byte screwing up the HTML though.. not sure.
Oh, you're right. I was thinking of the combined zip/png, and it's the zip file that has the header at the bottom, so my previous comment is completely wrong. The article seems to be adding the HTML in the EXIF data (thus making it a completely valid JPEG) and the browser tries to be very accommodating in what it accepts, thus ignoring the junk data (or what it thinks is junk data) at the start of the HTML file.
Whether JS would work or not depends on how much the browser tries to recover from errors there. I would guess not much, but what would you gain from a combined JS/JPEG file anyway?
You could do the same thing with a .wav file, embedding the HTML after the data sub-chunk. Adobe Audition uses this method to embed application-specific metadata for the file (marker and sub-marker locations, for example).
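Roughly how that works (a sketch with a hypothetical chunk id and file names): RIFF files are a sequence of (id, size, payload) chunks, so an extra private chunk can be appended after "data" as long as the overall RIFF size field in the header is patched to match.

  import struct

  def append_chunk(src, dst, chunk_id, payload):
      wav = bytearray(open(src, "rb").read())
      body = payload + (b"\x00" if len(payload) % 2 else b"")  # RIFF chunks are word-aligned
      wav += chunk_id + struct.pack("<I", len(payload)) + body
      struct.pack_into("<I", wav, 4, len(wav) - 8)             # patch the RIFF size field
      open(dst, "wb").write(wav)

  # Hypothetical: stash HTML in a private "html" chunk after the audio data.
  append_chunk("tone.wav", "tone_plus_html.wav", b"html", b"<html>...</html>")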
Nice trick! I'm interested in encoding other kinds of data in images in order to share apps/music/etc between mobile devices.
Unfortunately, your HTML gets lost when I try to save the picture on my phone.
iPhone > Safari > Save to Camera Roll
Copy off the picture; it's been re-encoded and has no JPG comment field.
Any solutions to this are welcome! The re-encoding makes patterns impossible to decode as well (they degrade after being shared a couple of times). See Cemetech's jsTified emulator for an example of a ROM file as a JPG - he uses B/W and it still requires the file to be synced from a computer, not saved from the browser.
Well this will certainly appeal to Steganography enthusiasts and perverts who have clumsily been loafing around .onion sites for years and who now finally have a way to share content in the clear. And of course the NSA, FBI and CIA are suddenly stuck trying to figure out why this goofy squirrel is so popular in Yemen.
Just what I was thinking. It doesn't actually prevent the user from downloading the image, just makes them think that they failed to download the image (since it saves with a .htm extension).
The HTML file could be one that admonishes the user for attempting to scrape the file, all the while the file they wanted is sitting right there. A modern day Purloined Letter.
I just right clicked on the image and saved it in Firefox and it gave me a .jpg. I was still able to change it into a working .html file by renaming it. I imagine some systems for downloading would get it messed up, but Firefox at least treats it as a JPEG when you're interacting with the image tag.
It looks like Safari is, at least, and it even de-duplicates it in the Web Inspector so it only lists a single resource (which gets listed as "type: image").
Data URIs are useful for embedding resources in a page (e.g. a single HTML file containing all of its own CSS, JS, images, etc.)
This is different: it's a single file which can be parsed as either an HTML page or a JPEG. Hence, when a program expects an HTML page (like a browser loading a web page), it will be parsed and displayed as an HTML page. When a program expects a JPEG file (like a browser loading the "src" of an "img" element), it will be parsed and displayed as a JPEG.
The trick is to use each format's comment syntax to hide the other format. Not sure if the HTTP headers need to be set differently for each request or not.
To stop the place becoming like reddit, where all insightful comments are buried under a torrent of jokes. I've had my fingers burnt a few times and then sulked for a few days after one of my many hilarious quips was down-voted into oblivion.
My comment stays at 12 upvotes. Certainly nobody would want jokes to surpass serious comments, but this is not achieved by killing every single joke. Also, there's another joke on that thread.
Something one needs to get used to on HN, I think... randomly picking on arguably off-topic and arguably offensive posts proactively. Also, I'm shocked to see that the mod here has such low karma and such a short history of HN usage.
The system is far from perfect, but I've never got the impression that the rules are applied malevolently. I wouldn't like the job of moderating comments; it's really hard, and ironically people tend to have a zero-tolerance policy if they feel hard done by because of an error in judgement on the part of the mods.
A quick search for "striped ground squirrel" turned up only one variety, the thirteen-lined ground squirrel. The animal in the photo has stripes like a chipmunk.
Look for stripes on the face: without them, it's probably a squirrel. Both ground squirrels and chipmunks have body stripes, so the face is a better indicator.
My university's (Minnesota) mascot is a gopher, but the physical mascot is a chipmunk/squirrel (debated) named Goldy Gopher - definitely not a gopher. These rodents can be confusing!
It's both! That's the whole point - the same sequence of bytes can be interpreted as either an HTML web page or a JPEG image, and the page itself demonstrates both at the same time.
This is, at present, the most efficient way to pack demos on the web; a few characters of uncompressed bootstrap code, then the rest is deflated.
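For the curious, the packing side looks roughly like this (a sketch with hypothetical file names; the in-page bootstrap that draws the PNG onto a canvas, reads the pixels back, and evals them is a separate few hundred bytes of HTML/JS): the script bytes become greyscale pixel values, so the PNG encoder's built-in deflate does the compression.

  from PIL import Image

  src = open("demo.js", "rb").read()

  width = 1024                                    # arbitrary row width
  height = -(-len(src) // width)                  # ceiling division
  src += b" " * (width * height - len(src))       # pad with spaces (harmless JS whitespace)

  img = Image.frombytes("L", (width, height), src)   # one byte of script per greyscale pixel
  img.save("demo.png", optimize=True)                # deflate inside the PNG compresses the script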