
Could this approach be used for media compression? I've wondered how compressible a popular-music track could be if you had a sufficiently rich language to describe it. This seems like a method to answer that question.



This would be basically MIDI, right?


Or sheet music. It always amazed me that humans came up with any solution at all to "here's a piece of paper, tell me what your song sounds like" to say nothing of one that actually works to some degree.

I've always wondered how much classical music sounds the way it does because sheet music is the way it is.


My guess is that sheet music has an enormous effect, because it can encode some things very well and other things poorly.


An example of this is Chinese guqin tablature. It can be centuries old and includes a lot of detail on where to place fingers and how to strike the strings, which can give you hints about pitch and timbre when combined with knowing the tuning, strings, etc. But the tablature has almost nothing to say about the LENGTH of each note, so rhythm has to be inferred by the performer from what they know about the culture.


Look up the documentary "thin red line" on YouTube


MIDI encodes very little information about timbre, which is a huge source of variation in modern pop music.


Very little? More like none.


Program change + the General MIDI instrument set is implementation-dependent but was pretty common in the 90s, and encodes timbre in an extremely limited way. Now of course nobody outside of fringe artists really uses it.


It has a very basic description of timbre in the form of an instrument index: in General MIDI, counting from zero, program 0 is a piano, 24 is a guitar, and so on.


MIDI is just a protocol to send instructions to turn notes on/off (and some other expressive info as well).
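To make the "notes on/off plus an instrument index" point concrete, here is a minimal sketch of the raw bytes for three MIDI channel messages. The helper names are made up for illustration; the status bytes (0xC0 program change, 0x90 note on, 0x80 note off) are from the MIDI 1.0 spec. Note that nothing here describes the sound itself, only which preset to pick and which key to press.

```python
# Hypothetical helpers that build raw MIDI 1.0 channel messages.
# Channels are 0-15; program, note, and velocity are 7-bit values (0-127).

def _check(channel: int, *data: int) -> None:
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be 0-127")

def program_change(channel: int, program: int) -> bytes:
    """Pick an instrument preset: status 0xC0 | channel, then one data byte."""
    _check(channel, program)
    return bytes([0xC0 | channel, program])

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Start a note: status 0x90 | channel, then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Stop a note: status 0x80 | channel, then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

# Select program 0 (a piano in General MIDI), then play and release middle C (60).
stream = program_change(0, 0) + note_on(0, 60, 100) + note_off(0, 60)
print(stream.hex(" "))
```

The whole "performance" is eight bytes, which is why the timbre question matters: everything you actually hear lives in the synth on the receiving end.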


Would probably require an enormous dictionary on the decoding end


It calls to mind the old joke about how someone wrote a compressor that turns Microsoft Word from a 20MB file into a 1 byte file, except the compressor is 20MB. (Adjust the file name and size until it's funny. When I first heard it, 20MB was an extraordinarily large size.)

But in this case you could imagine the right balance where it does end up with significant savings.
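A small-scale version of that trade-off already exists in zlib's preset dictionary support: both sides hold the same dictionary, and the compressed stream only needs to reference it. A rough sketch in Python follows; the dictionary and message contents are made-up placeholders, not real data about music.

```python
import zlib

# Assumption: encoder and decoder both ship with this shared dictionary.
# It plays the role of the "20MB compressor" in the joke, just much smaller.
SHARED = (
    b"verse chorus verse chorus bridge chorus "
    b"four-on-the-floor kick, snare on two and four, "
    b"I-V-vi-IV progression"
)

message = b"snare on two and four, I-V-vi-IV progression, four-on-the-floor kick"

# Baseline: ordinary zlib with no shared knowledge.
plain = zlib.compress(message, 9)

# Same data, but the compressor may emit back-references into SHARED.
comp = zlib.compressobj(level=9, zdict=SHARED)
primed = comp.compress(message) + comp.flush()

# The decoder needs the identical dictionary to resolve those references.
decomp = zlib.decompressobj(zdict=SHARED)
restored = decomp.decompress(primed) + decomp.flush()

print(len(message), len(plain), len(primed))
```

The savings are only real if the dictionary is amortized over many files; count its size against a single message and you are back to the joke.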


Would anything approaching typical bitrates used in audio codecs imply an enormous dictionary? Also I wonder if any statement could be made about the learnability of codecs, e.g., are Fourier transforms something deep networks can arrive at?
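On the Fourier question, one thing that can be said for certain is that the discrete Fourier transform is a linear map, so a single bias-free linear layer with the right (complex) weights computes it exactly. Whether gradient descent actually finds those weights is a separate question that this sketch does not address. The function names below are illustrative, and it uses only the standard library.

```python
import cmath
import math

def dft_matrix(n: int) -> list[list[complex]]:
    """Weights of the n-point DFT: W[k][t] = exp(-2*pi*i*k*t/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)]
            for k in range(n)]

def apply_linear(weights: list[list[complex]], x: list[float]) -> list[complex]:
    """A bias-free linear layer: plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

N = 16
W = dft_matrix(N)

# Test signal: a cosine that completes exactly 3 cycles over the N samples.
tone = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]

spectrum = apply_linear(W, tone)
magnitudes = [abs(c) for c in spectrum]

# Look only at the non-redundant half of the spectrum for a real input.
peak_bin = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
print(peak_bin)
```

So the representational side is easy; the open part of the question is about learnability, not expressiveness.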



