Hacker News
Show HN: Small C Program to Convert Photos to Audio (github.com/kylophone)
53 points by kylophone on June 9, 2015 | 16 comments


Aphex Twin hid spectrogram images in some of his tracks in the past; unfortunately, I think they are not well preserved through multiple rounds of compression and re-encoding into YouTube videos.

https://www.google.de/search?q=Aphex+Twin+Spectrograph&tbm=i...

https://youtu.be/M9xMuPWAZW8?t=5m31s


On Madeon's latest album, he hid his logo in the transition between two tracks: http://i.imgur.com/yPJTTs0.jpg

It sounds pretty cool too - you can hear the tones going up & down: http://www.youtube.com/watch?v=lz10F2Rtqv8&t=3m40s


On the flip side, there's an interesting hack you can use to generate textures from audio, primarily for size-constrained demos: http://www.iquilezles.org/www/articles/gmdlsgfx/gmdlsgfx.htm


This turns an image into a PCM audio file, and the image is visible on the spectrogram. For example, here's Ernest Hemingway punting a beer can: http://imgur.com/QR5a8mw


+1 for stb_image.h. It has a good interface for an image-loading library, something nobody seemed capable of, even in the "good ol' days". It's just a shame it doesn't support more of PNG (16-bit channels would be nice, and support for bigger images too).

Using libpng, libjpeg, etc., however, is a massive pain, with lots of work required even if you just want to do what pretty much everyone does: load a file into an RGBA buffer and get back the width and height. :)


Yep. The stb_image.h API is very simple. Also, because it's just an #include (plus #define STB_IMAGE_IMPLEMENTATION in one source file), you don't need to worry about your users having to figure out how to link a library.


I got some very interesting results from feeding fractal images into a program like this a few years back. Unfortunately, I don't have the resulting sounds, but if you pick less busy images with filaments and adjust the contrast, the result is very organic. Must have a go with this...


Very interesting. I wonder how it would cope with some noise, for example playing the resulting file and recording it with a microphone.

I suspect there will be content lost at the bottom and the top of the image depending on the frequency response of the microphone/speaker.


If it's just through a cheap speaker and mic, you'll probably lose a lot of the image.

This uses a linear frequency scale (which is just the nature of Fourier transforms), whereas our ears perceive frequency on a roughly logarithmic scale. In other words, the information that's most important to our hearing, which is what a mic and speaker will preserve best, is in the bottom 10% of the image.

A cheap speaker and mic will probably lose a lot of content above about 10 kHz, which is more than the top half of the image. Even though this wouldn't make a huge difference to our ears, it would sure look bad in the image.
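As a rough check of the "top half" figure, assuming the linear 0 to 24 kHz axis of this program's output (Nyquist for a 48 kHz sample rate) and treating 10 kHz as a hard cutoff (real speakers roll off gradually), the fraction of the image height above the cutoff is:

```latex
\frac{f_{\mathrm{Nyquist}} - f_{\mathrm{cutoff}}}{f_{\mathrm{Nyquist}}}
  = \frac{24\,\mathrm{kHz} - 10\,\mathrm{kHz}}{24\,\mathrm{kHz}}
  = \frac{7}{12} \approx 0.58
```

so a little over half of the image would be affected.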

As for background noise, the difference would probably look like the one here: http://www.sweetwater.com/insync/media/2010/09/RXAdv-e-xlarg... (that's a screenshot of audio restoration software that removes noise, so it's technically doing the opposite process as best it can, but the difference would be similar).


This isn't exactly related (as it doesn't produce a spectrogram), but the software pixivisor plays around with this idea: http://warmplace.ru/soft/pixivisor/

The software can act as a transmitter or a receiver. In transmitter mode, you give it a static image or an animated GIF, which it converts into audio that plays continuously. In receiver mode, pixivisor listens via the mic or line-in (depending on the hardware platform and what's attached) and reconstructs the image from the audio. You can then manipulate the audio however you want in between.

This demo uses a korg monotron's low pass filter and LFOs to mangle an animated gif of a cat: https://www.youtube.com/watch?t=63&v=g2W1W4fwEkg

It's really interesting to me to see how the audio modulation is represented in the receiver's output.


I would expect to see the frequency response of the speaker + mic combination show up as darker and lighter horizontal bands in the image (since these spectrograms are plotted with frequency on the y axis). It might look really cool!

Edit: depending on the time alignment / phase response of the speakers, you might see the low frequency parts of the image get distorted to the right.
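The "distorted to the right" effect comes from group delay: when a system's phase shift varies with frequency, different frequencies arrive at different times, so their rows slide sideways relative to one another. As a rough illustration only, take a first-order high-pass H(s) = s / (s + wc) as a crude stand-in for a speaker's bass roll-off (real drivers and enclosures are higher order). Its phase is pi/2 - atan(w / wc), so its group delay is wc / (wc^2 + w^2), largest at the lowest frequencies. The 100 Hz corner, the 48 kHz rate, and the function names below are assumptions.

```c
#include <assert.h>
#include <math.h>

#define SAMPLE_RATE 48000.0
#define TWO_PI 6.283185307179586

/* Group delay (seconds) of a first-order high-pass H(s) = s / (s + wc):
 * phase(w) = pi/2 - atan(w / wc), so tau(w) = -d(phase)/dw = wc / (wc^2 + w^2). */
static double highpass_group_delay(double hz, double corner_hz)
{
    assert(corner_hz > 0.0 && hz >= 0.0);
    double w = TWO_PI * hz, wc = TWO_PI * corner_hz;
    return wc / (wc * wc + w * w);
}

/* The same delay expressed in samples at 48 kHz. */
static double delay_in_samples(double hz, double corner_hz)
{
    return highpass_group_delay(hz, corner_hz) * SAMPLE_RATE;
}
```

With a 100 Hz corner, the delay approaches about 76.4 samples (roughly 1.6 ms) near 0 Hz, halves to about 38.2 samples at 100 Hz, and is under 0.01 samples at 10 kHz, so the low rows lag behind the high ones. Whether a shift that small is visible depends on how many samples each spectrogram column covers.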


Give it a try! The output file contains both very low frequencies and frequencies up to Nyquist, in this case 24 kHz (half the 48 kHz sample rate).


I'll try to give it a shot! What are you using to generate the spectrograms?


  sox -c 1 -r 48000 -b 32 -e float -t raw out.raw -n spectrogram


please post your results!

science is fun


Transfer it over a phone line and you'll get a facsimile.



