So you want a locality-sensitive hash, or embedding, and you want it to be noise resistant.
The ML approach is to define a family of data augmentations A and a network N such that, for any augmentation f in A, N(f(x)) ~= N(x). Then we learn the weights of N, and on real (noisy) data x' we get N(x') ~= N(x).
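To make that concrete, here's a minimal sketch of the invariance-training idea in PyTorch. The embedder, the two augmentations, and the loss are hypothetical placeholders, not a specific published recipe:

```python
# Minimal sketch: train N so that N(f(x)) ~= N(x) for augmentations f in A.
# Everything here (Embedder, augmentations, loss) is a stand-in, not a real system.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedder(nn.Module):
    """N: maps a raw signal to a fixed-size, unit-norm embedding."""
    def __init__(self, in_dim=16000, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def add_noise(x):   # one member of the augmentation family A
    return x + 0.05 * torch.randn_like(x)

def time_shift(x):  # another member of A
    return torch.roll(x, shifts=random.randint(-100, 100), dims=-1)

augmentations = [add_noise, time_shift]

def invariance_loss(model, x):
    """Push N(f(x)) towards N(x) for a randomly chosen f in A."""
    f = random.choice(augmentations)
    return (1.0 - F.cosine_similarity(model(x), model(f(x)), dim=-1)).mean()

model = Embedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(32, 16000)  # stand-in for a batch of clean signals
    loss = invariance_loss(model, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

(In practice you also need something that stops the embedding collapsing to a constant, e.g. negatives as in contrastive losses; the sketch only shows the invariance half.)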
The denoising approach is to define a set of denoising algorithms D and a hash function H, so that H(D(x')) ~= H(x). This largely relies on D(x') ~= x, which may have real problems.
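For contrast, a toy sketch of the denoise-then-hash pipeline in NumPy; `denoise` and the SimHash-style `hash_fn` are placeholders standing in for D and H, not any particular algorithm:

```python
# Toy sketch of H(D(x')): the match only works if D(x') lands close enough to x.
import numpy as np

def denoise(x_noisy: np.ndarray) -> np.ndarray:
    # D: in a real system, e.g. spectral subtraction or Wiener filtering.
    return x_noisy  # placeholder

def hash_fn(x: np.ndarray, n_bits: int = 64) -> np.ndarray:
    # H: a toy sign-of-random-projection (SimHash-style) hash.
    rng = np.random.default_rng(0)  # fixed seed so the projection is shared
    proj = rng.standard_normal((n_bits, x.size))
    return (proj @ x > 0).astype(np.uint8)

x = np.random.default_rng(1).standard_normal(16000)                 # clean signal
x_noisy = x + 0.05 * np.random.default_rng(2).standard_normal(16000)  # observed signal

clean_hash = hash_fn(x)
query_hash = hash_fn(denoise(x_noisy))
print("bit agreement:", (clean_hash == query_hash).mean())
```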
So the neural network learns the function we actually need, with the properties we want, whereas the denoiser is designed for a proxy problem.
But that's not all...
Eventually our noise model needs extending (e.g., reverb turns out to be a problem): the ML approach just adds a new set of augmentations to A. This is fine: it's easy to add new augmentations, as sketched below.
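For example, a rough sketch of what a reverb augmentation might look like in the same hypothetical PyTorch setup, with a synthetic decaying-noise impulse response standing in for a measured room impulse response:

```python
# Sketch of a reverb augmentation; the impulse response is synthetic, not a real RIR.
import torch
import torch.nn.functional as F

def reverb(x: torch.Tensor, rir_len: int = 2000) -> torch.Tensor:
    """Convolve a (batch, time) signal with a decaying-noise impulse response."""
    decay = torch.exp(-torch.linspace(0.0, 8.0, rir_len))
    rir = decay * torch.randn(rir_len)
    rir = rir / rir.norm()
    # conv1d wants (batch, channels, time); flip the kernel for true convolution
    y = F.conv1d(x.unsqueeze(1), rir.flip(0).view(1, 1, -1), padding=rir_len - 1)
    return y.squeeze(1)[..., : x.shape[-1]]

x = torch.randn(4, 16000)
print(reverb(x).shape)  # torch.Size([4, 16000])

# Extending the noise model is then one line in the earlier training sketch:
# augmentations.append(reverb)
```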
But the denoiser might need some real algorithm work, and you have to hope there's no bad interaction with other parts of the pipeline, and not too much additional compute overhead. (And de-reverb is notoriously hard.)