I wouldn’t discount ML. Nonlinearities are the bread and butter of modern ML models. In fact, two linear layers without a nonlinearity in between are equivalent to one big linear layer, so nonlinearities are required.
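To make that concrete, here's a minimal NumPy sketch (the shapes and names are just illustrative): stacking two linear layers collapses into a single matrix, and only a nonlinearity between them breaks that equivalence.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two "layers" as plain matrices (biases omitted for brevity).
    W1 = rng.normal(size=(64, 32))   # first linear layer: 32 -> 64
    W2 = rng.normal(size=(16, 64))   # second linear layer: 64 -> 16
    x = rng.normal(size=32)

    # Without a nonlinearity, stacking layers is just one matrix product.
    two_layers = W2 @ (W1 @ x)
    one_layer = (W2 @ W1) @ x
    assert np.allclose(two_layers, one_layer)   # no extra expressive power

    # Inserting e.g. a ReLU between them breaks the equivalence.
    relu = lambda z: np.maximum(z, 0.0)
    assert not np.allclose(W2 @ relu(W1 @ x), one_layer)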
To put it another way, I would gladly bet any reasonable sum of money that in a double-blind test, a listener wouldn’t be able to tell the difference from a genuine guitar pedal. (Not necessarily this pedal, but I suspect ML will model the effects more than adequately for the precision of human hearing.)
FWIW, I say this as someone who used to argue that graphics programmers were doing gamedev all wrong because they weren’t modeling light; they were approximating it. ML models were the way out.
I also think much of the problem is that ML devs often don’t have traditional signal processing experience, so they haven’t been modeling signals in quite the right way. (I’m trying to rectify that a bit with my FFT tutorials: https://twitter.com/theshawwn/status/1398796224921321472?s=2...) It remains to be seen, but Fourier space has recently been making strides in ML, and it’s likely much easier for a model to approximate a nonlinear waveform in frequency space than as a raw waveform.
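As a rough illustration of why frequency space might help (a toy sketch, with tanh standing in for a pedal's clipping stage, not anyone's actual pedal model): a distorted tone that looks messy as a raw waveform is extremely structured in the FFT, with nearly all of its energy on a handful of harmonics.

    import numpy as np

    fs = 48_000                      # sample rate (Hz)
    t = np.arange(fs) / fs           # one second of audio
    x = np.sin(2 * np.pi * 440 * t)  # clean 440 Hz tone

    # tanh clipping as a stand-in for a distortion pedal's nonlinearity.
    y = np.tanh(5.0 * x)

    # In frequency space the distorted tone is still very structured:
    # almost all energy sits on a few odd harmonics of 440 Hz.
    Y = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1 / fs)
    top = freqs[np.argsort(Y)[-5:]]
    print(sorted(top))               # roughly [440, 1320, 2200, 3080, 3960] Hz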
To put it another way, if human speech is getting to the point where ML models can trick people, what are the chances that a future model won’t be able to do it for guitars?
I've heard images are better modeled in DCT space (which isn't based on complex numbers) because it's better at energy compaction than the FFT, and because it doesn't assume the image is periodic. Some people also think the FFT is insufficient, even for audio, because it doesn't model time-domain hearing perception. Others say wavelets are better at modeling images than purely frequency-domain transforms because they take spatiality more into account. From what I've heard, wavelets work well for modeling human vision (in fact, convolutional neural network input kernels tend to converge to Gabor filters, though I don't know how they differ from Gabor wavelets) and for noise reduction, but have fallen flat for image/video compression codec design.
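For what it's worth, the energy-compaction point is easy to see on a toy example (a sketch, not a careful benchmark): for a smooth but non-periodic signal like a ramp, the DFT's implied periodic extension has a jump that smears energy across many bins, while the DCT's even extension doesn't.

    import numpy as np
    from scipy.fft import dct, fft

    # A smooth but non-periodic 1-D signal: a ramp (think one row of an image).
    x = np.linspace(0.0, 1.0, 256)

    def missed_energy(coeffs, k=8):
        # Fraction of signal energy NOT captured by the k largest coefficients.
        e = np.sort(np.abs(coeffs) ** 2)[::-1]
        return e[k:].sum() / e.sum()

    # The DFT treats x as periodic, so the wrap-around jump costs many coefficients;
    # the DCT's symmetric extension is continuous and compacts energy much better.
    print("FFT misses:", missed_energy(fft(x)))                 # roughly a few percent
    print("DCT misses:", missed_energy(dct(x, norm="ortho")))   # orders of magnitude smaller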
All excellent points, and I think you should DM me on twitter to chat about this more. (I hope you will!)
DCT is on my radar. But there are several serious limitations that I think are overlooked. For example, convolution is no longer a simple component-wise multiplication. That seems like a big deal to me.
In other words, you're probably right, but I'm focused solely on FFTs on the (very low) chance that people have overlooked something that will work well.
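On the convolution point: with the DFT, circular convolution turns into a pointwise product of spectra, but naively doing the same thing with DCT coefficients does not give back the convolution (the DCT has its own symmetric-convolution rules instead). A small sketch of just that:

    import numpy as np
    from scipy.fft import fft, ifft, dct, idct

    rng = np.random.default_rng(0)
    N = 64
    x = rng.normal(size=N)
    h = rng.normal(size=N)

    # Direct circular convolution, for reference.
    direct = np.array([sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)])

    # DFT: pointwise multiplication of spectra IS circular convolution.
    via_fft = np.real(ifft(fft(x) * fft(h)))
    assert np.allclose(via_fft, direct)

    # DCT: pointwise multiplication of coefficients is NOT the same convolution.
    via_dct = idct(dct(x, norm="ortho") * dct(h, norm="ortho"), norm="ortho")
    assert not np.allclose(via_dct, direct)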
Sorry, I don't work on neural networks much, and my plate is too full with other projects (and my DSP is a bit rusty) to hold a conversation on this right now. And I don't use Twitter much either.
> The nonlinearities are the bread and butter of modern ML models
I guess I didn't get my point across. What I meant was that pedal settings tend to be nonlinear, with multiple sweet spots (which often depend on the guitar and amp), so you shouldn't just sweep each knob linearly from 1 to 10 to get your training data, giving 10^N combinations (where N is the number of knobs), as someone else had suggested. Moreover, there are also dependencies on the impedance chain, gain structure, feedback, reflections, etc., which seem well suited to circuit and physical modeling. Digital pedals, as I note, are largely software anyway, so it doesn't make sense to me to try to model them with ML any more than it does to model Microsoft Word using ML (though I'm sure someone has tried).
In general ML seems most useful when you don't have good analytical models - but in the case of circuits and software we have very good analytical models.
That's fair, and true! But one interesting thing about ML models is that they're often much more performant. For example, it's relatively expensive to evaluate analog circuits digitally. An ML model that can do it on a Raspberry Pi with no delay and no quality loss is interesting, to me at least.
Actually, the researchers at Aalto University (now at Neural DSP), who pioneered the guitar ML technology, were initially working on speech and did this one as a side project.