I would assume that any image reformatting or exif stripping by online platforms...

olliej · on March 18, 2023

This isn't an exif issue.

This isn't a metadata issue.

An underlying IO library changed its behavior so that instead of truncating a file when opened with the "w" mode (as fopen and similar have always done, and this API did originally), it left the old data there. If the edited image is smaller than the original file, then the tail of the original image is left in the file. There is enough information to just directly decompress that data and so recover the pixel data from the end of the image.

You're not necessarily recovering the part of the image that was edited out, just whatever happens to be at the end of the original file. If you are unlucky (or lucky, depending on PoV) the trailing data still contains the pixel data you meant to remove. In principle the risk is proportional to how close to the bottom right of the image the edits were (depending on image format).
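To make the failure mode concrete, here is a minimal Python sketch. It does not use the affected library: the helper names are made up for illustration, and the missing `O_TRUNC` flag simply stands in for the changed "w" behavior described above.

```python
import os
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    """Write every byte of data to fd, looping over short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_with_truncate(path: str, data: bytes) -> None:
    """Expected "w" semantics: existing contents are discarded first."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def write_without_truncate(path: str, data: bytes) -> None:
    """Hypothetical stand-in for the bug: overwrite from offset 0, keep the rest."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Save a large 'original', then a smaller 'edit' over it without truncation."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.bin")
        write_with_truncate(path, b"ORIGINAL-IMAGE-DATA")
        write_without_truncate(path, b"edited")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())  # b'editedAL-IMAGE-DATA'
```

The shorter write replaces only the first six bytes, so the tail of the old contents stays on disk. With an image, that tail is compressed pixel data rather than readable text, but the mechanism is the same.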

ericpauley · on March 18, 2023

Not saying it is. Sensible exif stripping (re-serialization) also has the upside of removing trailing data, which would prevent this.

olliej · on March 19, 2023

No, the whole point with this bug is that more filtering or stripping would not fix or prevent it. The problem is not some kind of "trailing data in memory" issue.

The bug is that you say "write to this file", which is meant to erase the existing file if one exists, but the underlying library either had a serious regression or intentionally broke API compatibility, and changed the behavior to not erase the existing data. Your exif stripping + reserialization would write the new data out and the trailing data from the original file would still be present, i.e. exactly what is happening in this bug.

No amount of processing in memory, no amount of reserialization, no amount of data filtering prevents this bug. The bug occurs at the point of IO, because the IO is meant to have erased the original file, and it did not. So if you write fewer bytes to the destination file than were present in the file being overwritten, the tail of the overwritten file remains and is leaked.

To make it very clear that this is not an error in processing the image: if you open "image1.png" (or whatever format), edit it, save it over a different file that already exists, say "image2.png", and then send image2.png to someone, this bug will allow the recipient to extract the trailing data from the original image2.png. It would not reveal any information about the original image1.png.

ericpauley · on March 19, 2023

This is not the case when the exif stripping is happening on the service side ("by online platforms", in my original comment). Yes, anything happening before save is useless because the trailing data is kept. But if a service (e.g., Facebook) then does exif stripping via re-serialization, the trailing data is lost.
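As a rough illustration of why service-side re-serialization drops the tail, here is a Python sketch that walks a PNG's chunks and rewrites the file only up to IEND. This is an assumption about the general idea, not any platform's actual pipeline: a real sanitizer would more likely decode the pixels and re-encode them. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk, stopping after IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        end = pos + 12 + length
        if end > len(png):
            raise ValueError("truncated chunk")
        yield chunk_type, png[pos + 8:pos + 8 + length]
        if chunk_type == b"IEND":
            return
        pos = end
    raise ValueError("no IEND chunk found")


def rewrite_up_to_iend(png: bytes) -> bytes:
    """Re-emit every chunk through IEND; bytes after IEND are never copied."""
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png):
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)


def tiny_png() -> bytes:
    """A 1x1 8-bit grayscale PNG, built by hand for the example."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
```

Calling `rewrite_up_to_iend(tiny_png() + b"TAIL-OF-OLD-FILE")` returns the same bytes as `tiny_png()`, because the writer only emits what the parser understood and the leftover tail is never carried across.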

olliej · on March 20, 2023

Server side filtering isn't relevant. A user editing or removing things from their photo does not expect that data to exist on the image uploaded to a server.

creatonez · on March 21, 2023

The uppermost comment in this thread is making the suggestion to use server-side filtering just in case something goes wrong with end-user software. So that's why other commenters were using this assumption and ignoring the software itself.

Retr0id · on March 18, 2023

EXIF stripping won't necessarily catch it (but probably would in most instances - depends on how you do it), but reformatting or reencoding will.

ericpauley · on March 18, 2023

I’m guessing most exif stripping would deserialize the image and write a new file, so unless that has the same bug as this (overwriting the existing file without truncation), it ought to work?

jsheard · on March 18, 2023

Discord strips EXIF but the author was still able to unredact the images they'd posted there.

Some implementations of EXIF stripping might help, but it's not guaranteed.

Retr0id · on March 18, 2023

Discord doesn't strip EXIF from PNGs, only JPEGs

jsheard · on March 18, 2023

Seriously? What's the reasoning behind that?

Retr0id · on March 18, 2023

It's rare to see PNGs in the wild containing EXIF data; it's a feature that's only been in the spec since ~2017. I'm actually looking for one to double-check my statement about Discord, but I can't find any.

Edit: I made my own. I can confirm that the exif chunk was not stripped. https://cdn.discordapp.com/attachments/541730746805649476/10...

rtldg · on March 20, 2023

That's interesting. I've seen a couple of rotated PNGs before which I assumed were caused by Discord stripping the EXIF and orientation data. Found a PNG like that without EXIF from May 2022 so I wonder if Discord stopped stripping or if it was stripped on the person's device somewhere.

Retr0id · on March 18, 2023

A naive approach to stripping EXIF from a PNG would be to parse up to the start of the first eXIf chunk, discard the contents of that chunk, and then include the rest of the file verbatim without actually parsing anything.

But yes, a more sensibly coded EXIF stripper would deserialise and reserialise. Unfortunately I am no longer able to assume that programmers will behave sensibly.

Edit: Also, the PNGs generated by Markup don't contain EXIF in the first place, so an EXIF stripper could reasonably decide that no changes are necessary at all.
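For comparison, the naive approach described above might look something like this Python sketch. It is purely hypothetical and not taken from any real stripper: it cuts out the first eXIf chunk and copies everything else verbatim, including whatever follows IEND.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Helper for building test files: length, type, data, CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def naive_strip_exif(png: bytes) -> bytes:
    """Remove the first eXIf chunk; pass every other byte through untouched."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        end = pos + 12 + length
        if chunk_type == b"eXIf":
            # Splice the chunk out and copy the remainder without parsing it,
            # so any bytes after IEND survive.
            return png[:pos] + png[end:]
        if chunk_type == b"IEND":
            break
        pos = end
    # No eXIf chunk found: nothing to strip, so the file is returned as-is.
    return png
```

In both branches the bytes after IEND come out the other side unchanged, which is why this style of stripping would not help with the leaked tail.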

ericpauley · on March 19, 2023

Does anyone take this “naive” approach in practice? Any good image sanitization I’ve seen is equivalent to taking a screenshot of the image, re-serializing pixel contents but ignoring anything else. Any reputable service (e.g., Gmail) must take this approach to prevent proliferation of possible image-based malware.

As you noted above Discord doesn’t sanitize PNGs. This exposes a failing on their end as well, as large services taking input from users should sanitize images to protect both senders and recipients.