The same is true for the DCT of the woman. Meanwhile, the subject of a photo is typically located towards the frame's center. This helps minimize interference between the space- and frequency-domain data in the composite, thus preserving kitty's expression when the transform is inverted.
that's sort of true and sort of false. here the origin is plotted in the upper-left-hand corner, and in the 2d fft images you're used to looking at, it's plotted in the center instead. but you can plot the dct that way too, so it's sort of false
it's sort of true in that if you plot the standard 2d fft in this coordinate system, the data will be concentrated not in one corner of the image but in all four of them. the dct really is unusual in putting all the low-frequency stuff at positive frequencies instead of equally at positive and negative frequencies
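to make the two conventions concrete, a quick numpy sketch (img here is a random stand-in for any 2d grayscale array):

    import numpy as np
    from scipy.fft import dctn

    img = np.random.rand(256, 256)       # stand-in for a grayscale image

    F = np.fft.fft2(img)                 # raw fft2: the dc term sits at index
                                         # [0, 0], and low frequencies cluster
                                         # in all four corners
    F_centered = np.fft.fftshift(F)      # the familiar display, dc in the center

    D = dctn(img, norm='ortho')          # dct: no negative frequencies, so all
                                         # the low-frequency energy sits in the
                                         # upper-left corner; nothing to shift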
It makes me think of how the camera's lens focuses the light/image at the center of the sensor, so it would make sense that data is also denser at the center, where the lens concentrated more light.
So… I think you’re a bit confused about how lenses work and what they do (they don’t focus all the light into the middle, they focus light from one plane onto another one. They only focus light from the center of the frame onto the center of the image - that’s why it’s an image)
But… there is something interesting about what ‘focusing’ looks like in the frequency domain. The difference between the frequency-space transform of a sharply focused image and a blurred image - or of the same image focused at different focal planes - shows up as a predictable transformation in frequency space, which means you can apply transformations in frequency space that cause focus changes in the image domain, like a lens does.
your first paragraph is completely wrong. the lens concentrates collimated light parallel to its axis at its focal point, regardless of where it falls on the lens. (and, strictly speaking, only at a single wavelength.) collimated light coming from near-axial directions gets focused more or less to a point on more or less the focal plane. but light at a single point doesn't have a direction, being a wave. there is in fact a very profound connection between the action of a lens and the 2d fft; see my sibling comment for more details
I don't think the idea that (idealized, camera) lenses focus light from distinct points in one plane (or at infinity) onto distinct points in another plane is 'completely wrong', but I'm open to being educated on my error.
A lens focuses light parallel to its axis onto its focal point; it focuses parallel light coming in off-axis to other points on the focal plane.
Alternatively, and equivalently, it focuses divergent light coming from common points on planes closer than infinity onto matching points on other planes behind its focal plane.
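(In symbols, this is the thin-lens equation: for an idealized thin lens of focal length f, an object at distance s images at distance s' where 1/s + 1/s' = 1/f; letting s go to infinity gives s' = f, recovering the parallel-light case.)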
Lenses bring parallel rays of light (alternatively, light from infinitely far away) to the focal point. They don’t bring idealized points to points.
One consequence is you can’t use lenses to bring anything to a temperature higher than the temperature of the source light. For example you can’t use lenses + moonlight to light things on fire.
yes, as it happens, the image on the focal plane of the camera resulting from light coming from a particular direction is in fact the 2d fourier transform of the spatial distribution of that light at the lens. this property has been used to build optical-computing military machine vision systems using spatial light modulators since the 01980s, because of another useful property of the fourier transform: spatial shifts become phase shifts, so you can look for a target image everywhere in an image at once. as far as i know, these systems have never made it past the prototype stage
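digitally, the "search everywhere at once" trick is just correlation via the convolution theorem; a sketch of the math those optical correlators implement, in numpy (scene and target are 2d float arrays):

    import numpy as np

    def correlate_everywhere(scene, target):
        # zero-pad the target up to the scene's shape
        padded = np.zeros_like(scene)
        padded[:target.shape[0], :target.shape[1]] = target
        # multiplying by the conjugate spectrum equals cross-correlation in
        # space (spatial shift <-> phase shift), so every candidate position
        # in the scene gets tested in a single pass
        corr = np.fft.ifft2(np.fft.fft2(scene) * np.conj(np.fft.fft2(padded)))
        return np.real(corr)             # peak index = best-match position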
> is in fact the 2d fourier transform of the spatial distribution of that light at the lens. this property has been used to build optical-computing military machine vision systems
Amazing. Do you have any links/references about those systems and how they work in theory?
Clarifying: the specter of a hidden animal will usually take the form of a diffuse sparkle or blur, typically hovering off to the person's side and somewhat above them; as a result, when carried through to the "other side", it cannot possess what remains of the person in that domain (because it is returned to the origin in turn).
I'm a little bit slow with all this stuff; can somebody confirm this is the process:
a) take photo of woman and photo of cat
b) DCT cat into the frequency domain
c) composite the frequency domain cat into the visual image of the woman
d) if you DCT the composite image, you get the cat back? (or more specifically, you get the visual cat and the frequency domain woman composited; but the visual cat dominates)
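That's the process. A minimal sketch of the round trip (assuming scipy's dctn/idctn, equal-sized grayscale float arrays, and a made-up blend factor alpha):

    import numpy as np
    from scipy.fft import dctn, idctn

    def hide(woman, cat, alpha=1.0):
        # (b) DCT the cat into the frequency domain,
        # (c) blend its spectrum into the woman's pixels;
        # alpha is a hypothetical strength knob, not from the article
        return woman + alpha * dctn(cat, norm='ortho')

    def reveal(composite):
        # (d) transforming back: the woman becomes her own spectrum
        # (crowding the upper-left corner) while the cat's spectrum
        # becomes a visible cat again. strictly this is the *inverse*
        # DCT, since the DCT-II is not its own inverse.
        return idctn(composite, norm='ortho')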
From what I remember of a student project many years ago, this technique is the basis for robust digital watermarking for any kind of signal, be it images or audio.
Of course, the main application is detecting copyrighted material even after the signal has been heavily processed (e.g. ripped or cam'd movies; digital cinema is distributed as JPEG 2000).
If anyone in the movie industry can provide some more technical details, I’m all ears!
I once tested a watermarking system (Digimarc?) and found that while it was robust against all sorts of noise and scaling, it failed with even a 1% rotation of the image. I wonder if it was a Fourier Transform based algorithm.
A great example of the time-frequency (or space-frequency, in this case) duality of Fourier transforms. The math of the FT doesn't care about the "direction" you're going for the transform, so functions that look similar in the time/frequency domain will have similar FTs in the frequency/time domain.
In this case, embedding the frequency plot of the cat in the space plot of the woman means that the FT of the woman will cause the cat to appear, and vice versa.
It's a very cool and interesting steganographic application! Want to hide an illicit image inside an innocent image? Just convert it to frequency domain and composite it onto the other image. As long as the viewer knows how to transform it back, you have a covert way to send images that is potentially hard to detect.
It would be hard to detect if the other party didn’t know what to look for, but easy if they did.
If you combined your hidden image with a one-time pad, it should be indistinguishable from noise, right? And noise would be expected in a lossily compressed image. I wonder if anyone has done that. It seems like we’d probably never know unless they told us!
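A sketch of the pad idea, assuming both parties somehow share the pad out of band (numpy, with a hypothetical message; the resulting bytes are what you'd embed in the carrier image):

    import numpy as np

    secret = np.frombuffer(b"meet at noon", dtype=np.uint8)
    pad = np.random.default_rng().integers(0, 256, secret.size, dtype=np.uint8)

    ciphertext = secret ^ pad            # statistically uniform bytes: pure noise
    recovered = ciphertext ^ pad         # XOR with the same pad undoes it
    assert recovered.tobytes() == b"meet at noon"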
There were worries after 9/11 that terrorists were using stego to plot attacks, posting their messages “hidden in plain sight” inside images on public websites.
Someone (Niels Provos?) did a pretty thorough search and analysis of images on eBay and came up with nothing. Apparently it was just post-9/11 paranoia.
MetaSynth has been around since the late 90s and combines time (samples) and frequency (image) transforms of audio with Photoshop-style filters of the images.
love this, venetian snares too. thanks for confirming haha, i wasn't sure how they did it! cool memories =) thx! didn't know which one it was from aphex twin. these guys are magicians :D
I can't believe I never realized the frequency domain can be used for image compression. It's so obvious after seeing it. Is that how most image compression algorithms work? Just wipe out the quieter parts of the frequency domain?
Yep, this is how MP3, Ogg Vorbis, and JPEG all work. The weights for which frequencies to keep are, presumably, chosen based on some psychoacoustic model, but the coarse description is literally "throw away the high-order frequency information".
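The crude version of that, as a sketch (whole-image DCT with a hard cutoff on a grayscale float array; real JPEG instead uses 8x8 blocks and a tuned quantization table):

    import numpy as np
    from scipy.fft import dctn, idctn

    def crude_compress(img, keep=32):
        # keep only the keep x keep lowest-frequency coefficients and
        # throw away everything else
        coeffs = dctn(img, norm='ortho')
        mask = np.zeros_like(coeffs)
        mask[:keep, :keep] = 1.0
        return idctn(coeffs * mask, norm='ortho')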
Does audio encoding use a similar method of using matrices to pick which frequencies get thrown away? Some video encoders allow you to change the matrices so you can tweak them based on content.
Audio is one dimensional, so it doesn't use matrices but just arrays (called subbands).
And you can't lean too hard on psychoacoustic coding, because people will play compressed audio through all kinds of speakers or EQs that will unhide everything you tried to hide with the psychoacoustics. But yes, it's similar.
(IIRC, the #1 mp3 encoder LAME was mostly tuned by listening to it on laptop speakers.)
I know one mix studio that has a large selection of monitors to listen to a mix through, ranging from the highest of high-end studio monitors to mid-level monitors, home bookshelf speakers, and even a collection of headphones and earbuds. So when you say "check it on whatever you have available", you have to be a bit more specific with this guy's setup.
DCT is also often used as a substep in more complex image (or video) compression algorithms. That is: identify some sub-area of the image with a lot of detail, apply the DCT to that sub-area and keep more of its spectrum, then do the same for other areas, keeping more or less of the spectrum depending on their detail. This is where the quantization parameters you have seen in video compression settings come into play.
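A sketch of that block-plus-quantization step (8x8 blocks on a grayscale float array, with a made-up flat quantization matrix scaled by a qp knob; real codecs use tuned per-frequency tables):

    import numpy as np
    from scipy.fft import dctn, idctn

    def quantize_blocks(img, qp=10.0):
        # larger qp = coarser quantization = more loss, like the
        # quantization parameters exposed by video encoders
        q = np.full((8, 8), qp)          # stand-in quantization matrix
        out = img.copy()                 # edge rows/cols not divisible by 8
        h, w = img.shape                 # just pass through in this sketch
        for y in range(0, h - 7, 8):
            for x in range(0, w - 7, 8):
                block = dctn(img[y:y+8, x:x+8], norm='ortho')
                block = np.round(block / q) * q      # the lossy step
                out[y:y+8, x:x+8] = idctn(block, norm='ortho')
        return out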
Images are not truly bandlimited, which means they can't be perfectly represented in the frequency domain, so instead there's a compromise where smaller blocks of them are encoded with a mix of frequency domain and spatial domain predictors. But that's the biggest part of it, yes.
Most of the problem is sharp edges. These take an infinite number of frequencies to represent exactly, so leaving some out gets you blurriness or ringing artifacts (the Gibbs phenomenon).
The other reason is that bandlimited signals repeat infinitely, but realistic images don't: whatever's on the left side of a photo doesn't necessarily predict anything about the right side.
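The edge problem is easy to demonstrate in one dimension: chop the high frequencies off a step and it rings.

    import numpy as np

    x = np.zeros(256)
    x[128:] = 1.0                        # a sharp edge
    X = np.fft.rfft(x)
    X[40:] = 0.0                         # discard the high frequencies
    x_soft = np.fft.irfft(X, n=256)      # now blurred, with ringing (Gibbs)
                                         # overshoot on both sides of the edge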
A real image isn't, but a digital image built up from pixels certainly is bandlimited. A sharp edge will require contributions from components across the whole spectrum that can be supported on a matrix the size of the image, the highest of which is actually called the Nyquist frequency.
Not quite. You can tell this isn't true because there are many common images (game graphics, text, pixel art) where upscaling them with a sinc filter obviously produces a visually "wrong" image (blurry or ringing etc), whereas you can reconstruct them at a higher resolution "as intended" with something nonlinear (nearest neighbor interpolation, OCR, emulator filters like scale2x). That means the image contains information that doesn't work like a bandlimited signal does.
You could say MIDI is sort of like that for audio but it's used a lot less often.
Yes, or by extending the pixels on the edge out forever. The question is which one is more effective for compression; it turns out doing that for individual blocks rather than the entire image is better.
(With mirroring things could happen like the left edge of the image leaking into the right, and that'd be weird.)
There is more to it. Often the idea isn't just that you throw away frequencies, but also that data with less variance can be encoded more efficiently. And it's not just that high-frequency info is noise; it also tends to be smaller in magnitude.
I remember seeing some video where they did an FT of an audio sample, used mspaint to remove some frequency components, and transformed back to the audio/time domain.
JPEG 2000 is even weirder. That's a wavelet transform. If you truncate a JPEG 2000 file, you can still recover a lower resolution image. At some file length, the image goes to greyscale, as the color information disappears.
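Roughly that effect, sketched with PyWavelets (an assumption here; JPEG 2000's actual codestream is an embedded bitstream you truncate, not zeroed coefficients):

    import numpy as np
    import pywt                          # PyWavelets, assumed installed

    img = np.random.rand(256, 256)       # stand-in grayscale image
    coeffs = pywt.wavedec2(img, 'db2', level=4)

    # zero the two finest detail levels, roughly what losing the tail of
    # the file costs you: a recognizable but lower-resolution image
    kept = coeffs[:-2] + [tuple(np.zeros_like(d) for d in lvl)
                          for lvl in coeffs[-2:]]
    preview = pywt.waverec2(kept, 'db2')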
If the cat were composited in the upper left instead, I don't think this demo would work as well. The DCT has lots of high-magnitude low-frequency components in that corner, which would drown out the cat if it were placed near the top left.
One interesting thing is that in the quantum description of position and frequency (i.e. position and momentum, if you account for hbar), it is not possible to cram two different functions into one in this way, because functions that differ by a position-dependent phase are different quantum states.
"Reliably" is a difficult word. If you understand how a specific watermark works, then yes, absolutely. If you want a fully general method that counters every possible thing you might come across... well. That's hard.
"Imperceptible" watermarks work by altering detail humans don't notice or pay attention to. So your scrubber would need to reliably remove or change all such detail. Removing such detail is absolutely something we can do - the article mentions one way, other commenters make other suggestions, and also lossy image compression in general works by losing exactly such details from the compressed image so there's that as well.
But /reliably/ get rid of /everything/, so you can be /completely certain/ no watermarks encoded in ways imperceptible to a human can possibly be left, without knowledge of the specific watermarks you want to remove or at least a way to test for their presence? You're looking at some drastic technique, in the realm of "theoretically possible but impractical"; e.g. one way might be to hand the image to a human artist, commission them to paint a copy, scan that in and use that.
Note how, in the article, it's still possible to pick out the cat even as the JPEG compression level increases. If someone found a way to avoid encoding that information without degrading the original image in ways noticeable to human observers, we'd all be all over that, because it would give us a way to make image files even smaller than we can now.
This is an active area of research, precisely because it is key to getting better compression for sound and video to better understand how humans perceive things, what they notice and what they do not, so that we can reliably avoid storing information that humans will not notice the absence of / changes to, while still storing everything humans do notice. It is possible that we will one day have a complete enough understanding of human perception to make some kind of general guarantees here. But that day is not today, and tomorrow doesn't look good either.
Of course. The first image of the blog post shows that you can "paint over" the largely unused area without losing much of your original image. The hidden watermarks make use of this unused area, so you can just paint over it with blank data to "scrub" any hidden watermarks.
I'm pretty sure you could also layer the cat noise evenly over the image without significantly damaging the woman. The DCT puts all the important information top left, but there is nothing stopping you from adding a step to distribute that information across the whole image, or from using another transform that doesn't have the same concentration effect.
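A sketch of the blunt version of that scrub (whole-image DCT; as the sibling thread points out, this is lossy and still no guarantee against marks hidden in the region you keep):

    import numpy as np
    from scipy.fft import dctn, idctn

    def scrub(img, keep=32):
        # blank everything outside the perceptually heavy corner;
        # keep is a made-up knob trading image quality for thoroughness
        coeffs = dctn(img, norm='ortho')
        coeffs[keep:, :] = 0.0
        coeffs[:, keep:] = 0.0
        return idctn(coeffs, norm='ortho')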
How is the DCT of the two images done here, exactly? Clearly 8x8 tiles like in JPEG are not used, otherwise the similar blurry background tiles would still look similar in the DCT composite. Are the 2D DCT basis functions not a thing in this case?
I don't understand how the cat is encoded in the image that has both woman and cat. I assume the visible pixels are in some way slightly altered to encode the cat?
There's a magical math operation called the DCT (discrete cosine transform) which can turn things into dust (the frequency domain) and back (the spatial domain). So you DCT a woman and you get woman-dust. If you DCT the woman-dust, you get the woman back.
So what you do is DCT a cat to get cat-dust and sprinkle it on the woman. It's hard to see the cat-dust but if you look really closely you can see it (upper left corner of the image). We now have a dusty woman.
Then you DCT the dusty woman and get a dusty cat! Look in the upper left and you can see the woman-dust. Apply the DCT again to this image and we're back to the dusty woman.
Just apply DCT all day long to swap between a dusty cat and a dusty woman!
You must be wondering: why does this work? It's due to the properties of dust and human perception. When we DCT the woman and the cat, you'll notice most of the dust is in the upper-left corner. That's where all the heavy dust is. It's fine to lose the lighter dust further out, or even add more dust out there, since most of the weight is in the upper left; the DCT will get you close enough.
i know nothing of this stuff, but it reminds me of aphex twin and venetian snares encoding images into their sounds. is that a similar thing somehow? i think for venetian snares the track was something like song for my cat. if you'd use certain tools, the frequencies would show a picture of a cat.
edit: venetian snares was an album, songs about my cats. you can find it on youtube, unsure if i can link it.
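it is the same idea pointed at a spectrogram instead of a dct: treat the picture as the magnitude of a short-time fourier transform and synthesize audio from it. a sketch (scipy, with a random stand-in image and made-up phase, so the result sounds like shaped noise):

    import numpy as np
    from scipy.signal import istft
    from scipy.io import wavfile

    img = np.random.rand(129, 400)       # stand-in: the picture, used as stft
                                         # magnitude (129 = 256//2 + 1 bins)
    phase = np.random.uniform(0, 2 * np.pi, img.shape)
    Zxx = img * np.exp(1j * phase)       # complex stft with invented phase

    _, audio = istft(Zxx, fs=44100, nperseg=256)
    audio = (audio / np.abs(audio).max()).astype(np.float32)
    wavfile.write("picture.wav", 44100, audio)
    # run the result through any spectrogram viewer and the picture reappears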