Sometimes the distinction is made between "real-time" and "online" processing.
The first refers to processing speed relative to the length of the recording: if you can process a 10-minute recording in 1 minute, you're running at 10x real-time. However, your analysis might require the full track to be available for best results, so you cannot really start processing until the full source is available.
The latter is what "online" processing refers to: the ability to process on the fly, in parallel with the recording. Obviously, this cannot be faster than real-time ;-) but hopefully it is not slower, either. Often, though, you get a (somewhat constant and) hopefully small lag, i.e., you can process a 10-minute recording online in the same time, but you need another 10 seconds on top of that.
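To make the two notions concrete, here is a minimal sketch of the arithmetic above, using the illustrative numbers from the example (10-minute recording, 1-minute offline processing, 10-second online lag); the function name is just for illustration:

```python
def real_time_factor(recording_s: float, processing_s: float) -> float:
    """How many times faster than real time the processing runs."""
    return recording_s / processing_s

# Offline: a 10-minute recording processed in 1 minute -> 10x real-time.
rtf = real_time_factor(600, 60)  # 10.0

# Online: processing keeps pace with the recording, but the last output
# arrives 10 seconds after the recording ends.
lag_s = 610 - 600  # 10 seconds of trailing lag
```

So "10x real-time" is a statement about throughput, while the 10-second figure is a statement about latency; the two are independent.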
This is, by the way, not restricted to source separation; it applies to other disciplines as well, e.g., automatic speech recognition.
I experimented with the spleeter architecture quite a bit, and I would say it is not suitable for real-time audio processing. The reason is that the model needs at least 512 frames of audio to produce output usable for source separation, which adds a ton of latency. I tried smaller windows, but the results are very bad.
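A quick back-of-the-envelope calculation shows why a 512-frame minimum hurts. The hop size and sample rate below are assumptions (common STFT settings, not taken from the spleeter config, so check the actual model parameters):

```python
HOP_SIZE = 1024      # samples advanced per STFT frame (assumed)
SAMPLE_RATE = 44100  # Hz (assumed)
MIN_FRAMES = 512     # minimum frames the model needs (per the comment above)

# Audio that must be buffered before the model can produce any output:
buffer_samples = MIN_FRAMES * HOP_SIZE
latency_s = buffer_samples / SAMPLE_RATE
print(f"{latency_s:.1f} s of audio must be buffered")  # ~11.9 s
```

Under these assumptions you'd buffer on the order of ten seconds of audio before the first separated output appears, which rules out interactive use regardless of how fast the model itself runs.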
It's way faster than real-time; I'm not sure why slowing it down would be an advantage. You still need to take the resulting data and do things with it as a DJ, and faster is better.
So can it be run in real-time?
I am thinking about extracting features for music visualization, but it could also make a DJ happy.