I remember this because I got in a big argument with someone about whether this could possibly work. Of course, I never got off my ass and implemented it, which I guess makes me a huge loser.
"Unfortunately, there's no indication in the paper of what software was used to develop the process (although the scatterplots in the paper do look decidedly R-like)."
I used to work with Avery Wang, the guy who devised the algorithm. He used Matlab.
The article and comments seem to suggest that Matlab or R is a prerequisite for performing calculations such as this. However, MIR (Music Information Retrieval) libraries exist for a number of languages, including Java[1] and ANSI C[2]. A good dynamic language for experimenting with this sort of thing is SuperCollider[3].
By the way, the psychoacoustic spectral measurements referred to in the article are called MFCCs[4]: basically an FFT reading weighted according to the sensitivity of our ears. They are often used in both music and (especially) speech recognition because they tend to accurately sum up the timbre we perceive in a given sound. Timbre is much easier to extract from a digital audio file than pitch or vocal information, which is why it tends to be successful in applications such as this.
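To make the "weighted according to the sensitivity of our ears" part concrete, here is a minimal sketch of the mel scale that MFCCs are built on. It covers only the frequency warping step, not a full MFCC pipeline (which also needs an FFT, a log and a DCT). The function names and the band-edge helper are my own illustration, and the 2595/700 constants are the commonly quoted mel formula, which may differ from what any particular library uses.

```python
import math

def hz_to_mel(f_hz):
    # Commonly quoted mel formula; roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(low_hz, high_hz, n_bands):
    # Hypothetical helper: edges of n_bands filters spaced evenly in mel.
    # Returns n_bands + 2 frequencies in Hz (each filter spans three edges).
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

if __name__ == "__main__":
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(round(edge, 1))
```

The printed edges are packed tightly at low frequencies and spread out at high ones, which is the sense in which the spectrum gets weighted toward what we actually hear.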
This is exactly how you identify chemical compounds using X-ray crystallography. You shine x-rays of different frequencies onto a compound, measure the magnitude of the reflections, and note down the three highest peaks.
Then, you look up those peaks in a book, which has compounds ordered by the wavelength of the highest peak.
It takes minutes to do it by hand, so I'm not surprised computers can do it better.
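The book lookup translates to a dictionary keyed on the strongest peaks. A rough sketch, with entirely made-up reference data and hypothetical function names; real measurements would need some tolerance for noise rather than the simple rounding used here.

```python
def top_peaks(spectrum, n=3, precision=1):
    # spectrum maps peak position -> intensity.
    # Returns the n strongest peak positions, strongest first, rounded.
    strongest = sorted(spectrum, key=spectrum.get, reverse=True)[:n]
    return tuple(round(pos, precision) for pos in strongest)

def build_index(reference):
    # reference maps compound name -> spectrum. This is the "book".
    return {top_peaks(spec): name for name, spec in reference.items()}

def identify(index, measured):
    # Returns the matching compound name, or None if the peaks are not listed.
    return index.get(top_peaks(measured))

if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    reference = {
        "compound A": {2.82: 100, 1.99: 55, 1.63: 15, 3.26: 13},
        "compound B": {3.34: 100, 4.26: 22, 1.82: 14, 2.46: 8},
    }
    index = build_index(reference)
    print(identify(index, {3.341: 950, 4.258: 210, 1.818: 130, 2.1: 40}))
```

The point is only that identification becomes one hash lookup once you reduce each item to a handful of its strongest features.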
Aww man, when I read the headline I was expecting a SpinVox-like scandal. Like a room full of idiot-savants in Bangalore that knew every pop hit for the last 50 years or something.
Heh. It is actually fascinating how strong crowdsourcing can be. Logically you would opt for the manual route, and surprisingly often it wins, especially on cost and simplicity of the solution.
5-6 hrs of a good developer == 1 month of 3 ops in India or elsewhere. Except that getting ops to work itself can be painstaking.
"Specifically, a fixed length of audio is converted to audio DNA; this conversion process extracts certain features from the signal based on the psycho acoustic considerations. The system has two components, one that enables the extraction of Audio DNA from a few seconds of recording, and the other is an efficient search engine that finds the exact match for the DNA.
The audio DNA is based on extracting 64 sub DNAs every 3 seconds. The sub DNAs are generated by looking at the energy differences along the frequency and time axes. These 64 sub DNA form the chromosomes of the system, which enables the system to uniquely identify the chosen song."
Looks to me like pretty much the same technology. This is again not a surprise. Most implementations of this idea will be using similar techniques. What I am amazed at is that somebody thought that all this was feasible.
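For the curious, "energy differences along the frequency and time axes" can be sketched roughly like this. The quote does not give the actual details, so this is only my guess at the general shape: one bit per pair of neighbouring frequency bands, set when the energy difference between the two bands grew compared with the previous frame. The input format (a list of frames, each a list of band energies) and the function names are assumptions.

```python
def sub_fingerprints(energies):
    # energies: list of frames, each frame a list of per-band energies.
    # Returns one integer per frame (from the second frame on), whose bits
    # encode the sign of the energy difference across bands and across time.
    result = []
    for prev, cur in zip(energies, energies[1:]):
        bits = 0
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if diff > 0 else 0)
        result.append(bits)
    return result

def hamming_distance(fp_a, fp_b):
    # Number of differing bits between two equally long fingerprints.
    return sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))

if __name__ == "__main__":
    clean = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
    noisy = [[1.1, 2, 3, 4.2], [4, 3.1, 2, 1], [1, 3, 2.2, 4]]
    print(sub_fingerprints(clean), sub_fingerprints(noisy))
    print(hamming_distance(sub_fingerprints(clean), sub_fingerprints(noisy)))
```

Because only the signs of the differences are kept, small amounts of noise leave most bits unchanged, and matching becomes a search for the stored fingerprint with the smallest bit distance.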
It would be much better if you could hum or whistle a tune, and it would recognize it. I saw a PhD thesis once about this, with an actual implementation that worked pretty well. The only problem was that the database of songs was very small. It's probably hard to scale this type of search.
I remember once seeing a "dictionary" of songs. Each one was indexed by whether notes were higher, lower or the same as the previous note. Using D for down, S for same and U for up, and using # for the first note, here's the Star-Spangled Banner ...
#DDUUUUDDDUUSSUDDDDUUSDDD
Many, many tunes can be separated with the first 20 symbols.
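Generating that kind of contour string from a melody is only a few lines. A minimal sketch, assuming the melody is given as MIDI note numbers; the pitch list below is my own transcription of the opening in C major, so treat it as an assumption rather than an authoritative score.

```python
def contour(pitches):
    # Encode a melody as '#' for the first note, then U/D/S for each
    # following note being higher than, lower than, or the same as the last.
    if not pitches:
        return ""
    symbols = ["#"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

if __name__ == "__main__":
    # Opening of the Star-Spangled Banner as MIDI note numbers (my transcription).
    anthem = [67, 64, 60, 64, 67, 72, 76, 74, 72, 64, 66, 67, 67,
              67, 76, 74, 72, 71, 69, 71, 72, 72, 67, 64, 60]
    print(contour(anthem))
```

Since the encoding ignores key, tempo and exact intervals, a tune sung off-pitch still maps to the same string as long as the ups and downs are right.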
Anyone have a reference? I'd like to acquire a copy ...
I tried different "query by humming" services, but the percentage of false positives (when a service produced a list of melodies and none of them matched yours) was really huge. And even if the melody was in the list and you tried to find it again with the same service (by humming the same tune), the probability of getting it in the list again was pretty low.
Anyway, I think the idea of query by humming is not a dead end. However, such a hypothetical service should somehow collect and use a database of different "hums".
I was working on a little side startup that used crowdsourcing to help ID songs. I was just getting into researching how programs like Shazam and Midomi worked when I killed the project. His paper and the way it works is quite nice, but it's not perfect for rare music or for songs without elements that really stand out (frequencies or otherwise), like house music. Thanks for the link!
I have often sat in coffee shops wondering what method of data extrapolation Shazam used to parse audio to be able to search its music db. I would think about how I would do it. I use Shazam all the time so it's nice to finally know the basic idea.
I'm tired of these software 'engineering' types who insist that computers are run by using 'maths' and 'numbers' (whatever those are).
Clearly, computers are run by aphasic tonally-separated spinning disks. These disks fire puffs of air out the sides of the computer, creating little tiny tornados, which summon air spirits to call the fire spirits, which causes the screen to light up and the keys to make tappy-tap noises.
anyway.
to be clear:
not statistics.
not math.
not regulated pulses of electrons.
MAGIC!
In my experience, the people who are most informed about how a computer actually works are those most convinced that it runs on magic. All my compE friends insist that cpus are maintained and operated by tiny gnomes.
While less technical people don't understand, they have 'faith' that there's a logical, scientific explanation for how computers work.
A good hack is indistinguishable from magic. (Yes I know that the original quote is "Any sufficiently advanced technology is indistinguishable from magic")
http://news.slashdot.org/comments.pl?sid=7310&cid=823710
(from 2000).