I'm sorry, but is the Deep Learning hype strong enough to warp people's sensory perception? Every sample on this page sounds terrible IMHO, and is pretty much what you would get if you spent 10 minutes implementing the most naive spectrogram resynthesis you could think of. Granted, there is great promise in finding the "manifold of music", which seems to be the goal here, but what they show is nowhere near that promise.
Agreed. The texture is nice - I enjoy a lo-fi sound - but the fun of sound engineering is building your own signal paths to modulate or destroy sound interactively. The more abstracted the sound generation method, the more of a toy and the less of a tool it is, because the rising non-linearities make it increasingly difficult to pursue a specific objective. This has always been a limiting factor for FM, where undirected noodling can certainly yield interesting results, but not very controllable ones beyond 3 or 4 operators.
I do think it's interesting and valuable work. But it's worth bearing in mind that there's no shortage of great resynthesis tools already, and that musicians are besieged with offers from technologists for Sounds! That! Have! Never! Been! Possible! Before! While you can always rely on Jordan Rudess to provide a celebrity endorsement to the keyboard collector crowd, most hobbyist musicians eventually get over chasing novelty and end up reducing their equipment load to a smaller number of really well-engineered devices or software tools that they really like and get to know inside out.
I've read the articles about NSynth with interest, but I can't figure out why they're using 8-bit audio and low sample rates. Surely it's not so much more computationally intensive that they can't tinker at 8 bits and then do a render at a higher resolution once they've settled on some parameters they like.
Possibly the same reason all the Style Transfer implementations use very low resolution images? All the neural net applications I've seen seem to have problems with high resolutions in any form.
The 8-bit is actually reasonable: they have one output per possible sample value, so 16-bit would mean 65,536 outputs... They could probably do a secondary step that adds the less significant bits. The low sample rate is probably because the architecture was originally used for speech, and a lot of speech databases are recorded at 16 kHz.
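To make the "one output per value" point concrete: WaveNet-style models don't use raw linear 8-bit PCM but mu-law companding, which squeezes floats in [-1, 1] into 256 discrete classes with finer resolution near zero, and the network predicts a 256-way softmax over those classes. A minimal NumPy sketch of that companding step (the constants follow the standard mu-law formula; this is an illustration, not the NSynth code):

```python
import numpy as np

MU = 255  # mu-law parameter giving 256 levels (8-bit)

def mulaw_encode(x, mu=MU):
    """Map float samples in [-1, 1] to integer classes 0..mu."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int32)  # round to class index

def mulaw_decode(c, mu=MU):
    """Invert the companding back to floats in [-1, 1]."""
    y = 2 * (c.astype(np.float64) / mu) - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

samples = np.linspace(-1, 1, 11)
codes = mulaw_encode(samples)       # integers in 0..255: the softmax classes
recon = mulaw_decode(codes)
assert codes.min() >= 0 and codes.max() <= 255
assert np.max(np.abs(recon - samples)) < 0.02  # coarse, but perceptually OK
```

The same scheme at 16 bits would need a 65,536-way softmax per sample, which is why the commenter's suggested "add the less significant bits in a second step" is a plausible refinement.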
Yeah - granted, there are neural resynthesis packages that do function, but they are just waaay too slow for realtime audio production at the moment (and probably will be for a long time, now that Moore's law is dead).
I feel stupid and don't get what this is all about. So there is something that synthesizes sounds when you feed it audio files? I don't get what is happening here. I tried semi-hard to understand, but I figure someone can give the big picture that I think I'm missing.
Could this approach be used for media compression? I've wondered how compressible a popular-music track could be if you had a sufficiently rich language to describe it. This seems like a method to answer that question.
Or sheet music. It always amazed me that humans came up with any solution at all to "here's a piece of paper, tell me what your song sounds like" to say nothing of one that actually works to some degree.
I've always wondered how much classical music sounds the way it does because sheet music is the way it is.
An example of this is Chinese guqin tablature. It can be centuries old and includes a lot of detail on where to place fingers and how to strike the strings, which can give you hints about pitch and timbre when combined with knowing the tuning, strings, etc. But the tablature has almost nothing to say about the LENGTH of each note, so rhythm has to be inferred by the performer from what they know about the culture.
Program change + general MIDI instrument set is implementation-dependent but was pretty common in the 90s, and encodes timbre in an extremely limited way.
Now of course nobody outside of fringe artists really uses it.
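As an illustration of just how limited that timbre encoding is: in MIDI, the entire timbre of a part is a single Program Change message selecting one of 128 General MIDI presets. The raw message is two bytes. A tiny sketch (instrument numbers follow the GM Level 1 sound set; this builds the bytes by hand rather than using a MIDI library):

```python
def program_change(channel: int, program: int) -> bytes:
    """Build the raw two-byte MIDI Program Change message.

    Status byte is 0xC0 plus the channel (0-15); the data byte
    is the GM program number (0-127).
    """
    assert 0 <= channel <= 15 and 0 <= program <= 127
    return bytes([0xC0 | channel, program])

# GM program 57 ("Trumpet") is 56 when zero-indexed on the wire.
msg = program_change(channel=0, program=56)
assert msg == b'\xc0\x38'
```

Two bytes to say "this part sounds like a trumpet" - which is exactly why it's both a remarkably compact timbre code and a hopelessly coarse one.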
It calls to mind the old joke about how someone wrote a compressor that turns Microsoft Word from a 20MB file into a 1 byte file, except the compressor is 20MB. (Adjust the file name and size until it's funny. When I first heard it, 20MB was an extraordinarily large size.)
But in this case you could imagine the right balance where it does end up with a significant savings.
Would anything approaching typical bitrates used in audio codecs imply an enormous dictionary? Also I wonder if any statement could be made about the learnability of codecs, e.g., are Fourier transforms something deep networks can arrive at?
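On the Fourier-transform question: the DFT is a linear map, so a single dense layer can represent it exactly - whether gradient descent would actually *find* those weights from data is the open part of the question. A NumPy sketch showing that one matrix multiply reproduces the FFT (this is just the textbook DFT matrix, not a trained network):

```python
import numpy as np

N = 64
n = np.arange(N)
# DFT matrix: W[k, t] = exp(-2*pi*i*k*t / N), i.e. fixed "weights"
# that a complex-valued (or paired real/imag) linear layer could hold.
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.default_rng(0).standard_normal(N)
assert np.allclose(W @ x, np.fft.fft(x))  # one matmul equals the FFT
```

So the representational question is settled trivially; the learnability question - and whether a network would prefer some other basis entirely - is not.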
I'm just starting to learn TensorFlow from a developer, non-data-scientist point of view. This is great. From a layman's view, it seems it would need a training pass dedicated to eliminating the noise and static.
https://github.com/wildsparx/synthem80
Unlike NSynth, synthem80 is directed at a specific and humble goal: making early-80s-style arcade sounds. It uses a mini-language to control an engine similar to the one in Pac-Man.
For instance, the sound when Pacman eats a ghost: