I'm an MLE; I would probably chop the songs into short segments, add noise (in particular layering in people talking and room noise, and applying frequency-based filtering), and create a dataset that way. Then I would train contrastive embeddings with a hinge loss, using a convnet on the spectrogram.
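Something like this, roughly (a PyTorch sketch; the layer sizes, margin, and names are placeholders I'm making up, not anything Shazam actually does):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FingerprintNet(nn.Module):
        """Tiny convnet mapping a log-mel spectrogram segment to a unit-norm embedding."""
        def __init__(self, emb_dim=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, emb_dim)

        def forward(self, spec):                      # spec: (batch, 1, n_mels, n_frames)
            h = self.conv(spec).flatten(1)
            return F.normalize(self.fc(h), dim=1)     # unit-norm embedding

    def hinge_contrastive_loss(anchor, positive, negative, margin=0.5):
        # Pull the augmented copy of the same segment close; push a different
        # segment at least `margin` further away (triplet-style hinge).
        d_pos = (anchor - positive).pow(2).sum(dim=1)
        d_neg = (anchor - negative).pow(2).sum(dim=1)
        return F.relu(d_pos - d_neg + margin).mean()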
Ultimately this looks the same, except the "hashes" now come from a convnet. You're still doing some nearest-neighbor search to actually choose the best match.
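The lookup side is then brute-force (or approximate) nearest neighbor over segment embeddings plus a vote across segments, roughly like this (db_embeddings and db_track_ids are assumed names for a precomputed index; at real scale you'd want an ANN index rather than a dense matmul):

    import numpy as np
    from collections import Counter

    def identify(query_embs, db_embeddings, db_track_ids):
        # Cosine similarity reduces to a dot product since embeddings are unit-norm.
        sims = query_embs @ db_embeddings.T          # (n_query_segments, n_db_segments)
        nearest = sims.argmax(axis=1)                # best database segment per query segment
        votes = Counter(db_track_ids[i] for i in nearest)
        return votes.most_common(1)[0][0]            # track with the most segment votes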
I imagine this is what 90% of MLEs would do; I'm not sure if it would work better or worse than what Shazam did. Prior to knowing Shazam works, I might have thought this was a pretty hard problem; knowing Shazam works, I am very confident the approach above would be competitive.
So you want a locality-sensitive hash, or embedding, and you want it to be noise-resistant.
The ML approach is to define a family of data augmentations A and a network N such that, for any augmentation f, we have N(f(x)) ~= N(x). Then we learn the weights of N, and on real data we get N(x') ~= N(x).
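Just to make the notation concrete, the training objective is essentially the following (augmentations and net standing in for A and N; a sketch of the invariance term only, not a full recipe):

    import random

    def invariance_loss(net, x, augmentations):
        # f ~ A: pick one augmentation and penalise the embedding drift it causes.
        # `net` is assumed to return embedding tensors of shape (batch, emb_dim).
        f = random.choice(augmentations)
        return (net(f(x)) - net(x)).pow(2).sum(dim=1).mean()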
The denoising approach is to define a set of denoising algorithms D and a hash function H, so that H(D(x')) ~= H(x). This largely relies on D(x') ~= x, which may have real problems.
So the neural network learns the function we actually need, with the properties we want, whereas the denoiser is designed for a proxy problem.
But that's not all...
Eventually our noise model needs extending (e.g., reverb is a problem): the ML approach adds a new set of augmentations to A. This is fine: it's easy to add new augmentations.
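E.g., adding reverb is just one more callable appended to A, something like the following, where the impulse response is a synthetic stand-in (real ones would be measured or simulated):

    import numpy as np

    def add_reverb(x, impulse_response):
        # Convolve the clean waveform with a room impulse response, trim to length.
        return np.convolve(x, impulse_response)[: len(x)]

    # The family A from the earlier sketch; noise/filtering augmentations
    # would already be in here.
    augmentations = []
    ir = np.random.randn(2000) * np.exp(-np.arange(2000) / 400.0)  # synthetic stand-in IR
    augmentations.append(lambda x: add_reverb(x, ir))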
But the denoiser might need some real algorithmic work, and you have to hope there's no bad interaction with other parts of the pipeline, or too much additional compute overhead. (And de-reverb is notoriously hard.)
That could work, but I think denoising a recording to be a perfect match to the original is probably a very hard problem; so hard that your model will still need to be robust to some deviation from the original track, and therefore you need to do what I said above anyway.
Generally it's much easier to generate noised pairs from clean input than it is to do the reverse, i.e. go record lots of noised inputs from the wild and match them to the original song. So the denoising problem you mention would be tougher still due to covariate shift. I think the features you learn trying to fingerprint the song through noise will probably be a bit more robust, but I don't have a mathematical proof.
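For what it's worth, generating such a pair from a clean segment really is only a few lines (assuming you have some recorded background noise lying around; the SNR target is arbitrary):

    import numpy as np

    def make_noisy_pair(clean, background, snr_db=5.0):
        # Mix a clean segment with recorded background noise at a target SNR.
        noise = background[: len(clean)]
        scale = np.sqrt(clean.var() / (noise.var() * 10 ** (snr_db / 10.0)))
        return clean, clean + scale * noise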
Because then you’re training it on data that is more similar to the operating environment for the application. It’s a better fit for purpose. If the target environment was a clean audio signal, you’d optimise for that instead.
Adding noise is generally helpful for regularization in ML. Most modern deep learning approaches do this in one way or another, most commonly via dropout. It improves the generalization capabilities of the model.
To start from an original song and move it towards something that resembles a real-life recording? IOW: make the NN learn to distinguish between the song's sound and its environment?