Show HN: Small C Program to Convert Photos to Audio

cnvogel · on June 9, 2015

Aphex Twin has hidden spectrogram images in some of his tracks in the past. Unfortunately, I think they are not well preserved through the multiple steps of compression/re-encoding that YouTube videos go through.

https://www.google.de/search?q=Aphex+Twin+Spectrograph&tbm=i...

https://youtu.be/M9xMuPWAZW8?t=5m31s

KeytarHero · on June 9, 2015

On Madeon's latest album, he hid his logo in the transition between two tracks: http://i.imgur.com/yPJTTs0.jpg

It sounds pretty cool too - you can hear the tones going up & down: http://www.youtube.com/watch?v=lz10F2Rtqv8&t=3m40s

daeken · on June 9, 2015

On the flipside of this, there's an interesting hack you can use to generate textures from audio, primarily for use in size-constrained demos. http://www.iquilezles.org/www/articles/gmdlsgfx/gmdlsgfx.htm

kylophone · on June 9, 2015

This turns an image into a PCM audio file. The image is visible on the spectrogram. For example, here's Ernest Hemingway punting a beer can: http://imgur.com/QR5a8mw

jheriko · on June 9, 2015

+1 for stb_image.h. It has a good interface for an image loading library, something nobody seemed capable of even in the "good ol' days"... It's just a shame it doesn't support more of PNG (16-bit channels would be nice, and support for bigger images too).

Using libpng, libjpeg, etc., however, is a massive pain, with lots of work required even if you just want to do what pretty much everyone does: load a file into an RGBA buffer and get back the width and height. :)

kylophone · on June 9, 2015

Yep. The stb_image.h API is very simple. Also, because it's just an #include, you don't need to worry about your users needing to figure out how to link a library.

neilh23 · on June 9, 2015

I got some very interesting results from feeding fractal images into a program like this a few years back - unfortunately, I don't have the resulting sounds, but if you pick less busy images with filaments and adjust the contrast, the result is very organic. Must have a go with this ...

naggie · on June 9, 2015

Very interesting. I wonder how it would cope with some noise -- for example, playing the resultant file and recording it with a microphone.

I suspect there will be content lost at the bottom and the top of the image depending on the frequency response of the microphone/speaker.

KeytarHero · on June 9, 2015

If it's just through a cheap speaker and mic, you'll probably lose a lot of the image.

This uses a linear frequency scale (which is just the nature of Fourier transforms), whereas our ears are sensitive on a log frequency scale. In other words, the information that's most important to our hearing, which is what a mic & speaker will preserve the best, is in the bottom 10% of the image.

A cheap speaker & mic will probably lose a lot of content above about 10 kHz - which is the entire top half of the image. Even though this wouldn't be that huge a difference to our ears, it would sure look bad in the image.

As for background noise, the difference would probably look like the difference here: http://www.sweetwater.com/insync/media/2010/09/RXAdv-e-xlarg... (that's a screenshot of audio restoration software that removes noise, so it's technically doing the opposite process as best it can, but the difference would be similar).
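
To put rough numbers on the linear scale, here is a minimal sketch. It assumes the 48 kHz sample rate (24 kHz Nyquist) mentioned elsewhere in this thread and image rows spread evenly from 0 Hz at the bottom to Nyquist at the top. The function names are made up for illustration.

```c
#include <assert.h>

/* Assumed: 48 kHz sample rate, so the top edge of the image is 24 kHz. */
#define NYQUIST_HZ 24000.0

/*
 * Frequency at a given height in the image on a linear scale.
 * fraction is 0.0 at the bottom edge and 1.0 at the top edge.
 */
double fraction_to_hz(double fraction)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    return fraction * NYQUIST_HZ;
}

/* Share of the image height that lies above cutoff_hz. */
double fraction_above(double cutoff_hz)
{
    assert(cutoff_hz >= 0.0 && cutoff_hz <= NYQUIST_HZ);
    return 1.0 - cutoff_hz / NYQUIST_HZ;
}
```

Under those assumptions, `fraction_to_hz(0.10)` is 2400 Hz, so the bottom 10% of the image spans 0 to 2.4 kHz, and `fraction_above(10000.0)` is about 0.58, so a roll-off at 10 kHz would cut into a bit more than the top half of the rows.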

dktbs · on June 9, 2015

This isn't exactly related (as it doesn't produce a spectrogram), but the software pixivisor plays around with this idea: http://warmplace.ru/soft/pixivisor/

The software can act as a transmitter or receiver. In transmitter mode, you can provide it a static image or animated GIF, which it will convert into audio that plays continuously. In receiver mode, pixivisor listens via the mic or line-in (depending on the hardware platform and what's attached) and reconstructs the image from the audio. You can then manipulate the audio however you want.

This demo uses a Korg Monotron's low-pass filter and LFOs to mangle an animated GIF of a cat: https://www.youtube.com/watch?t=63&v=g2W1W4fwEkg

It's really interesting to me to see how the audio modulation is represented in the receiver's output.

wrigby · on June 9, 2015

I would expect to see the frequency response of the speaker + mic combination show up as darker and lighter horizontal bands in the image (since these spectrograms are plotted with frequency on the y axis). It might look really cool!

Edit: depending on the time alignment / phase response of the speakers, you might see the low frequency parts of the image get distorted to the right.

kylophone · on June 9, 2015

Give it a try! The output file contains both very low frequencies and also frequencies up to Nyquist, in this case 24kHz.

wrigby · on June 9, 2015

I'll try to give it a shot! What are you using to generate the spectrograms?

kylophone · on June 9, 2015

  sox -c 1 -r 48000 -b 32 -e float -t raw out.raw -n spectrogram
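
Those flags describe a headerless input: one channel (-c 1), 48000 samples per second (-r 48000), 32-bit (-b 32) floating-point samples (-e float), with -n meaning no audio output, only the spectrogram effect. As a rough sketch of producing a file in that shape from C, assuming sox reads raw floats in the machine's native byte order when no endianness option is given (the helper name `write_raw_f32` is made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write n mono samples as headerless 32-bit floats in the machine's native
 * byte order (hypothetical helper). Returns 0 on success, -1 on I/O error.
 */
int write_raw_f32(const char *path, const float *samples, size_t n)
{
    assert(path && samples);

    FILE *fp = fopen(path, "wb");
    if (!fp)
        return -1;

    size_t written = fwrite(samples, sizeof samples[0], n, fp);
    int close_failed = (fclose(fp) != 0);

    return (written == n && !close_failed) ? 0 : -1;
}
```

A call like `write_raw_f32("out.raw", samples, n)` would give a file the command above can read. The sample rate is not stored anywhere in the file, which is why it has to be passed to sox on the command line.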

_lce0 · on June 9, 2015

please post your results!

science is fun

el33th4xx0r · on June 9, 2015

transfer it via phone line, you'll get a facsimile