
I noticed that RNNoise doesn't appear to be an open model: you can't re-train it from scratch, because the source data isn't publicly documented (or doesn't exist?), even if you had enough hardware.


The documentation is a bit poor. The original data is available for download in the demo blog post (towards the bottom of the page), along with more info about the entire process, most of which is outside my grasp as I am not an ML person: https://jmvalin.ca/demo/rnnoise/


Coming back with information from #xiph on freenode:

  16:57 <ArsenArsen> where and under what license is the training data used for RNNoise?
  18:38 <rillian> ArsenArsen: There's a copy of what I believe is the training data on the xiph server, but afaik it's never been published
  18:39 <rillian> the original submission page has an EULA waiving copyright and liability claims, and agreeing that it _may_ be released CC0.
  18:40 <rillian> it looks like that didn't actually happen.
  18:41 <rillian> there may have been concerns about auditing it for privacy issues, but there's a lot of audio to listen to, 6.5G compressed
  18:41 <rillian> jmspeex, TD-Linux: what's the status of publishing the rnnoise training data?
  18:43 <jmspeex> Are you talking about the data that was used to train the default RNNoise model or the noise that got collected with the demo?
  18:43 <rillian> jmspeex: I think debian just cares about the training data for the default model.
  18:44 <jmspeex> There was never plan to release that -- it includes data from databases we cannot release
  18:44 <jmspeex> but I don't see what the issue is. Distributing the model is not the same as distributing the data
  18:45 <rillian> ah, I see. I didn't realize you'd used proprietary sources as well.


Any idea about the license for the original data?


The paper links to the McGill TSP speech database (English & French) as one of the sources of the data, which claims to be BSD licensed:

http://www-mmsp.ece.mcgill.ca/Documents/Data/


The other source of data mentioned in the paper is the NTT Multi-Lingual Speech Database for Telephonometry, which seems to be commercial, so presumably under a proprietary license.

https://www.ntt-at.com/product/multilingual/ https://www.ntt-at.com/product/speech2002/


Hmm, OTOH, the 6.4GB data tarball says that it is from contributors who responded to the demo and is licensed under CC0.


+1, that data is CC0, and I believe that's all the data that was used for training.


No, exactly none of that data was used for training. The training was done before the demo that was asking for noise contributions. The contributions are CC0, but were never used (i.e. totally unknown dataset quality).


So far we have 3 ideas!


Also, any idea whether the training required Nvidia GPUs, or whether it could be done on CPUs or on GPUs with non-proprietary drivers?


There are training instructions in the repository. The training scripts appear to use some pretty standard ML libraries (I see Keras and mentions of TensorFlow), so I imagine the hardware requirements are the same as theirs.

I don't feel I'm qualified to elaborate on this specifically, again, I'm no ML person. For more info look here: https://github.com/xiph/rnnoise/tree/master/training https://github.com/xiph/rnnoise/blob/master/TRAINING-README
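For anyone curious what "training" means here: the demo blog post describes the network as predicting a per-band gain that attenuates noise in each frequency band, trained on mixes of clean speech and noise. Below is a rough, simplified sketch of that training-target idea in plain numpy. The band splitting and the exact target definition are my assumptions for illustration, not the actual code in training/ (which does this on spectral features, not raw samples):

```python
import numpy as np

def ideal_band_gains(clean, noise, n_bands=4):
    """Sketch of per-band gain targets: sqrt(clean energy / noisy energy).

    The real RNNoise training computes something like this per
    Bark-scale frequency band; here we just split the raw signal
    into n_bands equal chunks as a stand-in.
    """
    noisy = clean + noise
    gains = []
    for c, x in zip(np.array_split(clean, n_bands),
                    np.array_split(noisy, n_bands)):
        e_clean = np.sum(c ** 2)
        e_noisy = np.sum(x ** 2)
        # Guard against silent bands; clip because clean and noise
        # can cancel, making the raw ratio exceed 1.
        gains.append(np.sqrt(e_clean / e_noisy) if e_noisy > 0 else 1.0)
    return np.clip(np.array(gains), 0.0, 1.0)
```

With zero noise every gain comes out as 1.0 (pass everything through); the noisier a band, the closer its target gain gets to 0. The network is then just a small recurrent model fit to predict these gains from features of the noisy signal.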



