Generate your own sounds with NSynth (tensorflow.org)
147 points by pkmital on June 24, 2017 | 30 comments



This story reminded me to clean up a very different synth and put it on github:

https://github.com/wildsparx/synthem80

Unlike NSynth, synthem80 is aimed at a specific and humble goal: making early-80s-style arcade sounds. It uses a mini-language to control an engine similar to the one in Pacman.

For instance, the sound when Pacman eats a ghost:

    ./synthem80 -o eat-monster.sw 'timer(t=0.5) wav(wf=1 f=2 a=45) add(a=< b=48) nmc(wf=5 f=<)'


I'm sorry, but is the Deep Learning hype strong enough to warp people's sensory perception? Every sample on this page sounds terrible IMHO, and is pretty much what you would get if you spent 10 minutes implementing the most naive spectrogram resynthesis you could think of. Granted, there is great promise in finding the "manifold of music", which seems to be the goal here, but what they show is nowhere near that promise.


Agreed. The texture is nice - I enjoy a lo-fi sound - but the fun of sound engineering is building your own signal paths to modulate or destroy sound interactively. The more abstracted the sound generation method, the more of a toy and the less of a tool it is, because the rising non-linearities make it increasingly difficult to pursue a specific objective. This has always been a limiting factor for FM, where undirected noodling can certainly yield interesting results, but not very controllable ones beyond 3 or 4 operators.
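
To make the FM point concrete, here is a minimal two-operator FM sketch (assuming Python with NumPy; the frequencies and modulation index are arbitrary illustrative values, not anything from the article):

    # Two-operator FM: one sine operator modulates the phase of another.
    # With two operators the timbre is still easy to reason about; chaining
    # more operators makes the result increasingly hard to steer.
    import numpy as np

    sr = 44100
    t = np.arange(sr) / sr                                   # one second of samples
    carrier_freq, mod_freq, mod_index = 220.0, 440.0, 3.0    # illustrative values

    modulator = mod_index * np.sin(2 * np.pi * mod_freq * t)
    signal = np.sin(2 * np.pi * carrier_freq * t + modulator)  # phase-modulated carrier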

I do think it's interesting and valuable work. But it's worth bearing in mind that there's no shortage of great resynthesis tools already, and that musicians are besieged with offers from technologists for Sounds! That! Have! Never! Been! Possible! Before! While you can always rely on Jordan Rudess to provide a celebrity endorsement to the keyboard collector crowd, most hobbyist musicians eventually get over chasing novelty and end up reducing their equipment load to a smaller number of really well-engineered devices or software tools that they really like and get to know inside out.


The 'cello' and 'laaa...' actually made me quickly remove my headphones. Having 'character' is not even close to how I would describe these.


They're using very low sample rates and 8-bit depth - not pretty. Until it can do 32-bit samples it's going to sound horrible.


I've read the articles about NSynth with interest, but I can't figure out why they're using 8-bit and low sample rates. Surely, it's not that much more computationally intensive that they can't tinker at 8 bits and then do a render at a high resolution once they've settled on some parameters they like.


Possibly the same reason all the Style Transfer implementations use very low resolution images? All the neural net applications I've seen seem to have problems with high resolutions in any form.


The 8-bit is actually reasonable: they have one output per possible value, so 16-bit would mean 65,536 outputs... They could probably do a secondary step that adds less significant bits. The low sample rate is probably because WaveNet was originally used for speech, and a lot of speech databases are at 16 kHz.
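
For reference, a rough sketch of the 8-bit mu-law companding described in the WaveNet paper (assuming Python with NumPy; the function names here are just for illustration). Each sample maps to one of 256 classes, which is what keeps the softmax small:

    import numpy as np

    def mu_law_encode(audio, quantization_channels=256):
        """Map float samples in [-1, 1] to integer class indices in [0, 255]."""
        mu = quantization_channels - 1
        # Non-linear companding: small amplitudes get finer resolution.
        companded = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
        # Shift from [-1, 1] to [0, mu] and round to integer bins.
        return ((companded + 1) / 2 * mu + 0.5).astype(np.int32)

    def mu_law_decode(indices, quantization_channels=256):
        """Invert the companding back to approximate float samples."""
        mu = quantization_channels - 1
        companded = 2 * (indices.astype(np.float32) / mu) - 1
        return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu

    # Example: a 440 Hz sine at 16 kHz, quantized to 256 classes and reconstructed.
    t = np.arange(16000) / 16000.0
    x = np.sin(2 * np.pi * 440 * t)
    x_hat = mu_law_decode(mu_law_encode(x))
    print(np.max(np.abs(x - x_hat)))  # worst-case reconstruction error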


It's probably a similar reason why 8 bit homebrew computers are more popular than 16: the complexity isn't linear.


Yeah - granted, there are neural resynthesis packages which do function, but they are just way too slow for realtime audio production at the moment (and probably will be for a long time, now that Moore's law is dead).


I feel stupid and don't get what this is all about. So there is something that synthesizes sounds by feeding it audio files? I don't get what is happening here. I tried semi-hard to understand, but I figure someone can give the big picture that I think I'm missing.


Could this approach be used for media compression? I've wondered how compressible a popular-music track could be if you had a sufficiently rich language to describe it. This seems like a method to answer that question.


This would be basically MIDI, right?


Or sheet music. It always amazed me that humans came up with any solution at all to "here's a piece of paper, tell me what your song sounds like" to say nothing of one that actually works to some degree.

I've always wondered how much classical music sounds the way it does because sheet music is the way it is.


My guess is that sheet music has an enormous effect, because it can encode some things very well and other things poorly.


An example of this is Chinese guqin tablature. It can be centuries old and includes a lot of detail on where to place fingers and how to strike the strings, which can give you hints about pitch and timbre when combined with knowing the tuning, strings, etc. But the tablature has almost nothing to say about the LENGTH of each note, so rhythm has to be inferred by the performer from what they know about the culture.


Look up the documentary "Thin Red Line" on YouTube.


MIDI encodes very little information about timbre, which is a huge source of variation in modern pop music.


Very little? More like none.


Program change + the General MIDI instrument set is implementation-dependent but was pretty common in the 90s, and encodes timbre in an extremely limited way. Now of course nobody outside of fringe artists really uses it.


It has a very basic description in the form of an instrument index: program 0 is piano, 24 a guitar, and so on.


MIDI is just a protocol to send instructions to turn notes on/off (and some other expressive info as well).
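
To make that concrete, here is a small sketch of what a MIDI stream actually carries, using the third-party mido library (assuming a backend and a default output port are available; port setup is system-specific, so treat this as illustrative rather than a definitive setup):

    import time
    import mido

    with mido.open_output() as port:  # default MIDI output port
        # Program change selects a General MIDI patch (0 = Acoustic Grand Piano).
        port.send(mido.Message('program_change', program=0))
        # Note on/off messages are the actual "turn notes on/off" instructions.
        port.send(mido.Message('note_on', note=60, velocity=64))   # middle C
        time.sleep(0.5)
        port.send(mido.Message('note_off', note=60, velocity=64))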


Would probably require an enormous dictionary on the decoding end


It calls to mind the old joke about how someone wrote a compressor that turns Microsoft Word from a 20MB file into a 1 byte file, except the compressor is 20MB. (Adjust the file name and size until it's funny. When I first heard it, 20MB was an extraordinarily large size.)

But in this case you could imagine the right balance where it does end up with a significant savings.


Would anything approaching typical bitrates used in audio codecs imply an enormous dictionary? Also I wonder if any statement could be made about the learnability of codecs, e.g., are Fourier transforms something deep networks can arrive at?
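
On the Fourier question: the DFT is a fixed linear map, so a single dense layer can represent it exactly; whether training would actually converge to those weights is the open part. A small sketch (assuming Python with NumPy) comparing a DFT weight matrix against the library FFT:

    import numpy as np

    N = 64
    n = np.arange(N)
    # DFT matrix split into real and imaginary parts, i.e. the weight matrix of
    # one real-valued linear layer with 2*N outputs.
    dft = np.exp(-2j * np.pi * np.outer(n, n) / N)
    weights = np.concatenate([dft.real, dft.imag], axis=0)   # shape (2N, N)

    x = np.random.randn(N)              # a random "audio" frame
    dense_output = weights @ x          # what a single linear layer would compute
    reference = np.fft.fft(x)

    print(np.allclose(dense_output[:N], reference.real))     # True
    print(np.allclose(dense_output[N:], reference.imag))     # True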


I'm just starting to learn TensorFlow from a developer, non-data-scientist view. This is great. From a layman's view, it seems it needs a training session for eliminating noise or static.


I think the N stands for 'not a' synth. I have a heap of synths and they make much nicer sounds!


Eh, I've seen this submitted before; I totally agree with the early criticism here because it's the same as last time.

Woo hoo you built a noise maker! Kazoos for everybody!


But it also does convolution! I totally wasn't doing that in Max/MSP in 2003.

Snark aside, there's a lot of really awesome creative potential stemming from WaveNet. This just seems like the least novel application I've seen.


Apple missed a golden Objective-C opportunity here.



