NoiseTorch: Real-time microphone noise suppression on Linux written in Go

tazjin · on July 18, 2020

I've recently built the inverse of this using NSFV (https://github.com/werman/noise-suppression-for-voice), i.e. suppressing noise in incoming audio.

A lot of people - despite being forced to work from home - simply don't seem to care about the way their audio sounds. Many don't even try to tackle these problems after it's been pointed out to them that they're being a nuisance in online meetings.

I gave up on trying to help people fix their setups, or convincing them that it matters, and switched to doing this on the receiver end. It's been a massive quality-of-life improvement.

If you're interested in the setup, you basically just need a small script that loads the pulseaudio plugin and wires up the sources/sinks correctly.

My setup script is here: https://cs.tvl.fyi/depot@canon/-/blob/tools/nsfv-setup/defau...

And some more context: https://cl.tvl.fyi/c/depot/+/578

basilgohar · on July 18, 2020

I think this is an out-of-sight, out-of-mind kind of issue. They simply don't understand how their noise, which they do not perceive, can be so detrimental to others. Moreover, a lot of people simply can't grasp how much good hardware or even just a different setup (moving away from noise sources like fans, open windows, appliances running, etc.) can affect the quality of their sound. Lastly, a lot of people either cannot or think they cannot do anything about it, so they dismiss others' concerns because "everyone else has problems too", equating their noise with others'.

g_p · on July 18, 2020

Another issue I've seen is people using their device's built-in speaker and microphone (to form a loopback-fest) that the onboard echo cancellation tries its best to deal with.

I think there's certainly a part around "but I can't see the difference" - it's hard to get rapid feedback on whether a change is better or worse, since you won't notice the difference yourself when you change your setup.

In any case, being able to create a pulseaudio sink that puts the audio from one application through a noise removal chain sounds like a decent "quick fix" to me - I've tried to listen to some webinars with such horrible audio that they were pretty much impossible to listen to, yet with otherwise worthwhile content. I wonder if this would be enough to improve it, or if the issues lie elsewhere (low quality transcodes).

edraferi · on July 19, 2020

I’ve tried several different audio solutions, including nice wired headphones + mod mic wireless. Ultimately people think I sound best just using my laptop’s internal microphone array and bookshelf speakers. The whole process was a frustrating distraction.

ponker · on July 18, 2020

Why don’t the services like Zoom and Teams fix this on the server side?

tazjin · on July 18, 2020

Google Meet supports noise-cancelling in outgoing audio (for business customers), but the user needs to enable it once in their settings. In my experience this last bit is already a hurdle ...

(disclaimer: I work at Alphabet)

draugadrotten · on July 18, 2020

Teams have something in the pipe for this, that will cancel out noise. I think it was a youtube video linked here a while back.

Here's another https://www.youtube.com/watch?v=oCrCkgjZEXQ

spacechild1 · on July 18, 2020

Zoom actually does noise and echo cancellation by default, but you can turn it off.

ralphm · on July 18, 2020

If services are indeed doing, or moving to, end-to-end encrypted media, there's nothing the server can do here.

bigiain · on July 18, 2020

Doesn’t mean they can’t do anything about the noise, just that they can’t do it on the server.

The RNNoise link from the bottom of that post runs the noise suppression in real time in JavaScript. Zoom et al. could do this client side while still doing proper E2E. (Although Zoom already uses more CPU than I think it needs to...)

sdwvit · on July 18, 2020

Or they simply don’t care or don’t want to invest effort into solving it

btashton · on July 18, 2020

Why is this the company employees' problem? Seems like work should be supplying good audio hardware if this is a real issue.

kazagistar · on July 18, 2020

After getting good hardware it took hours to get it set up just right.

basilgohar · on July 18, 2020

This is always a possibility, but we can kill ourselves if we try to figure who's sincere and who's not.

quercusa · on July 20, 2020

"which they do not perceive" is key here. I suspect that HN readers tend to an above-average degree of sensory sensitivity that causes them to notice things most people don't. Add to that even a little training/experience in audio and the noises that drive you crazy are likely completely inaudible to others.

g_p · on July 18, 2020

For anyone wanting to try this, it is pretty straightforward - first install noise-suppression-for-voice (AUR package available, binaries available via GitHub releases). You want to have librnnoise_ladspa.so and librnnoise_lv2.so available.

Then identify your current output sink by running `pactl list sinks short`. One will be "RUNNING", and this is your active sink. Keep this name to hand.

Create and enable an output sink using this plugin:

pacmd load-module module-ladspa-sink sink_name=denoise_sink_for_apps.stereo sink_master=YOUR_OUTPUT_SINK_FROM_ABOVE_HERE label=noise_suppressor_stereo plugin=librnnoise_ladspa.so control=0

The value of sink_master should be the output sink name from above, and the control=0 parameter can be adjusted - that seems to be the voice auto detect threshold. https://github.com/werman/noise-suppression-for-voice/issues... suggested 0, but I found it benefited from being higher. You can compare before/after by changing your pulseaudio output sink at system level (or application level) back and forth.
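The two manual steps above could be scripted roughly as follows. This is only a sketch: it assumes the short sink listing is tab-separated with the state in the last column, and the sink names in the sample text are hypothetical placeholders. The module arguments are copied from the command above; nothing here is executed against PulseAudio.

```python
def find_running_sink(pactl_output):
    """Return the name of the first sink whose state column is RUNNING, or None."""
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        # Assumed short-listing columns: index, name, driver, sample spec, state.
        if len(fields) >= 5 and fields[-1].strip() == "RUNNING":
            return fields[1]
    return None


def build_load_command(sink_master, control=0):
    """Assemble the pacmd invocation from the comment above as an argument list."""
    return [
        "pacmd", "load-module", "module-ladspa-sink",
        "sink_name=denoise_sink_for_apps.stereo",
        f"sink_master={sink_master}",
        "label=noise_suppressor_stereo",
        "plugin=librnnoise_ladspa.so",
        f"control={control}",
    ]


# Hypothetical listing, standing in for the output of `pactl list sinks short`.
SAMPLE_LISTING = (
    "0\talsa_output.example-onboard.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 44100Hz\tRUNNING\n"
    "1\talsa_output.example-headset.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 48000Hz\tSUSPENDED\n"
)

active = find_running_sink(SAMPLE_LISTING)
print(" ".join(build_load_command(active, control=0)))
```

To run it for real, the argument list could be handed to subprocess.run, and the control value raised if the voice detection threshold of 0 lets too much through.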

tazjin · on July 18, 2020

Thanks for these notes!

If you have Nix installed, you can also try my script directly with this command:

nix-build -E '(import (builtins.fetchGit "https://cl.tvl.fyi/depot") {}).tools.nsfv-setup'

closeparen · on July 18, 2020

I find most of the nuisance on Zoom calls is from children and pets. For example, the mailman usually comes to my boss's house during standup, and he's out of commission for a solid five minutes due to the barking.

civ0 · on July 27, 2020

I just tried this and really like it.

Unfortunately the latency for the audio output from Teamspeak 3 just rose to multiple seconds over a few hours of use.

The latencies reported by PulseAudio did not represent that, but I could hear people talking for a few seconds after their icon indicated that they stopped.

Did you encounter similar issues? Since I don't know much about PulseAudio I am at a loss right now and have to endure the noise.

sgt · on July 18, 2020

Would this work as a general suppressor against noisy neighbors across the street listening to bass? Can't hear the music, only feel the bass.

RMPR · on July 18, 2020

Does it work with Pipewire?

drblah · on July 18, 2020

As far as I can see this uses RNNoise. If you haven't checked it out yet you should, because it is simply amazing. It is a super effective noise gate / noise removal tool that does not require any configuration whatsoever.

My study mates and I have been using it over the last four months when working from home. It removes the noise of keyboards, seagulls and vacuum cleaners.

It is essentially the same as Nvidia RTX voice except it is much lighter on the system and does not require an Nvidia GPU. In our testing RNNoise performs similarly.

This project looks super cool. It seems to make RNNoise much more accessible. Normally you would have to manually set up the pulseaudio plumbing for this to work.

swyx · on July 18, 2020

do you need Linux to run your version? would love to get this running on my Mac.

drblah · on July 18, 2020

I have mainly used the built-in RNNoise support in Mumble. But you can use https://github.com/werman/noise-suppression-for-voice/ and build the VST plugin (this is also what NoiseTorch uses, I think). Then use any application that can load VST plugins to pipe your mic through. I have had reasonably good luck with it on Windows with Equalizer APO.

tyfon · on July 18, 2020

Someone also recently made a plugin [1] for OBS using this.

[1] https://gitlab.com/gravydanger/obs-rnnoise/

Abishek_Muthian · on July 18, 2020

Nicely done!

I went through some core libraries being used in the project: there's a pure Go pulseaudio implementation[1] which seems to deserve a few more stars, and the GUI framework nucular[2] seems to support even Metal rendering on macOS. I like how the native GUI frameworks for Go are becoming a viable alternative to Qt.

Off-topic, since this thread might attract audio programmers:

I was looking at implementing ambient noise cancellation and audio amplification on Android[3] for TWS earphones (BL 5.0) that lack those features. Would the latency defeat the purpose because it isn't implemented on the device, and do Android's Bluetooth/audio APIs provide the necessary access to implement such features in an app?

[1]https://github.com/lawl/pulseaudio

[2]https://github.com/aarzilli/nucular

[3]https://needgap.com/problems/22-enabling-hearing-aid-feature...

lawl · on July 18, 2020

Hey everyone author here! Awesome to see this on HN.

I'm happy to answer any questions, but this is a slightly inopportune moment to hit HN for me as I need to leave soon :) Some responses might be delayed by a day or so!

kaielvin · on July 18, 2020

Thanks for the work.

Is incorporating NoiseTorch into pulseEffects something that could be considered? The interest being to have all filters managed under one app.

lawl · on July 18, 2020

I've seen pulseeffects mentioned a few times, I must admit that I don't know exactly what it is and will need to research it first.

nickjj · on July 18, 2020

I'm not saying this tool is bad but I would be really careful about using tools like this in an environment where audio quality really matters (Youtube videos, podcasts, etc.).

Noise reduction tools work by removing specific frequencies from the source, some of which overlap with your natural voice.

This is why you start to sound robotic and get weird cutouts if you try to use tools to remove too much noise or background sounds. It's one of those things where, if you're not used to hearing your entire vocal range, you might not be aware of how much is getting cut out by tools that reduce noise.

It's too bad they don't have a before / after with a few voice samples in the readme.

manojlds · on July 18, 2020

This demonstration with Nvidia RTX Voice sounds pretty good

https://youtu.be/Q-mETIjcIV0

nickjj · on July 18, 2020

Definitely sounds better than I thought it would have and I've watched tons of this guy's videos in the past.

It really distorts his voice / range in some cases, such as when he taps his desk with that orange hammer. The difference there is night and day. It chops out his natural voice's range. It seems to degrade his voice the more intense the background noise is, such as the leaf blower (lol), but that's reasonable to expect. But at the same time, even the mechanical keyboard has a very noticeable negative effect on his range.

It's one of those things where I wish so much that it worked perfectly, but I couldn't realistically think about using it for any recording work due to things like the above. There are just too many common noises (typing, etc.) that drastically distort your voice.

9:23 in that video is hilarious though. Have to love Jerry!

amcoastal · on July 18, 2020

I wonder if it's the algorithm degrading his voice or if the input sound is already degraded. Is it possible a leaf blower or a hammer would cause enough "noise" that our ears couldn't hear his voice clearly either? Then when you subtract out the portion of the sound attributed to the leaf blower, you're hearing the parts of his voice that weren't being jumbled by the leaf blower?

jacobush · on July 18, 2020

Like the blown out whites of a photograph. You can adjust levels, but if the input peaked, there’s just no information left in the data.

nickjj · on July 18, 2020

Hard to say because softer noises like typing still make his voice sound like it's cutting out unnaturally. It's like the frequencies are being subtracted out of his normal tone, but it's more subtle than the leaf blower so you may not notice it without good headphones. It makes him sound very choppy and mechanical.

simias · on July 18, 2020

With the leaf blower I suspect that when it gets too close the microphone/ADC is saturating, which clips his voice. I wonder if it would've sounded better had he attempted to lower the gain on the microphone.
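The saturation point can be illustrated with a toy example. This is a minimal sketch with made-up sample values and an assumed full-scale limit of 1.0, not a model of any particular microphone or ADC: once two different loud inputs are clipped they become identical, so no later processing can tell them apart, whereas lowering the gain before the converter keeps the peaks intact.

```python
def clip(samples, limit=1.0):
    """Clamp every sample into [-limit, limit], as a saturating input stage would."""
    return [max(-limit, min(limit, s)) for s in samples]


def clipped_fraction(samples, limit=1.0):
    """Fraction of samples sitting at or beyond full scale."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if abs(s) >= limit)
    return hits / len(samples)


# Two different (hypothetical) loud signals.
loud = [0.5, 1.5, -2.0, 0.25]
louder = [0.5, 3.0, -1.25, 0.25]

# After clipping they are indistinguishable: the peak information is gone.
print(clip(loud) == clip(louder))

# Attenuating before the clipping stage avoids saturation entirely.
quiet = [s * 0.25 for s in loud]
print(clipped_fraction(clip(loud)), clipped_fraction(clip(quiet)))
```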

rcxdude · on July 18, 2020

Results can be mixed. Personally when I tried it it gave me a lisp.

exhilaration · on July 18, 2020

That's pretty amazing

asutekku · on July 18, 2020

The difference here is that RNNoise does not just remove some specific frequency; it uses neural networks to remove the noise, which results in much higher quality than what you were implying.

lawl · on July 18, 2020

Hey (author here)

I have personally not noticed voice quality suffering too much, but you are of course right. And this is not what it was made for. My personal use case is mostly voip where RNNoise (imo) does an amazing job.

g_p · on July 18, 2020

Looks excellent and keen to delve into the code a bit.

One quick question since you'll clearly know the codebase - do you think this could easily be adapted to create a "playback-side" noise filter?

Use-case rationale here is noisy and poor quality podcasts or "other people's" audio - it would be awesome to be able to configure your tool as the output for Chrome or Firefox or whatever program I'm listening to, then route the cleaned audio from your tool to the physical audio port.

Is that something which would be feasible to do here?

lawl · on July 18, 2020

> do you think this could easily be adapted to create a "playback-side" noise filter?

Yes, the hardest part about this is making the UI not confusing when you now have two separate instances loaded in PulseAudio.

g_p · on July 18, 2020

Agreed, but now this has piqued my interest in a good way.

Having two instances loaded might be a bit confusing as you say - I imagine it would need to be something like "NoiseTorch for Recording" and "NoiseTorch for Playback".

I'd need to go and play around with Pulse but I guess it would be possible to present 2 interfaces into Pulse with different names, then hope users can see the distinction when selecting a microphone versus the output device.

nickjj · on July 18, 2020

Would it be possible to upload a few before / after samples with varying degrees of background noise? Even if it's all the same person that would be a huge help to gauge the quality.

lawl · on July 18, 2020

Yes! https://github.com/lawl/NoiseTorch/issues/19

I just won't get to it today unfortunately.

nickjj · on July 18, 2020

Cool thanks.

Just a suggestion if you do it, please include realistic room noises in some of the samples.

I looked at the RNNoise examples and it was pretty bad. I mean, the audio quality of the speaker got completely mangled but the background noise was also comically high. It sounded like the person just sat down in the middle of the street in NYC or was inside of a busy train terminal.

ClawsOnPaws · on July 18, 2020

Here are some demos, I believe this is the same algorithm: https://jmvalin.ca/demo/rnnoise/

lawl · on July 18, 2020

Yes and no. NoiseTorch also has VAD (Voice Activity Detection). RNNoise also returns the probability of a sound sample being voice; I use that to clamp the microphone completely if it's below the configured probability.

This works really well for situations like Discord or Teamspeak where you're usually not constantly talking, but doing things that can still set off "normal" voice activation. RNNoise's model often knows it's not voice, but cannot denoise it completely.
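The gating idea described here could look roughly like this. It is a sketch of the general technique only, not NoiseTorch's actual code: the function name, the frame layout, and the probabilities are all hypothetical, and the per-frame voice probability is assumed to come from the denoiser.

```python
def gate_frames(frames, voice_probs, threshold):
    """Silence every frame whose voice probability is below the threshold.

    frames      -- list of frames, each a list of float samples (already denoised)
    voice_probs -- one probability in [0, 1] per frame
    threshold   -- the configured minimum probability for letting audio through
    """
    if len(frames) != len(voice_probs):
        raise ValueError("need exactly one voice probability per frame")
    gated = []
    for frame, prob in zip(frames, voice_probs):
        if prob < threshold:
            # Probably not voice: mute the frame completely.
            gated.append([0.0] * len(frame))
        else:
            # Probably voice: pass the frame through unchanged.
            gated.append(list(frame))
    return gated


# Hypothetical data: a key click, a spoken frame, and a borderline sound.
frames = [[0.2, -0.1], [0.5, 0.4], [0.05, 0.02]]
probs = [0.1, 0.95, 0.6]
print(gate_frames(frames, probs, threshold=0.8))
```

A hard gate like this trades some risk of cutting off quiet speech for complete silence between sentences, which fits the Discord/Teamspeak use case mentioned above.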

formerly_proven · on July 18, 2020

Yes, classic noise suppression sounds very poor very quickly. Noisy or poor audio is like blurred photos or videos, very hard to fix, while noisy or shaky videos are easily fixed (especially temporal de-noising on videos is akin to magic, it can extend the performance of the camera by multiple stops with very low IQ impact).

That's why these ML tools are potentially huge, good ol' noise suppression just isn't good.

brownbat · on July 18, 2020

How long until we can get some kind of open AI project to take in incoming bad quality voice and output clear noiseless human speech (in our, or whoever's voice we want), so podcasters don't have to buy expensive microphones and try to soundproof their rooms anymore?

I know we're not there yet, but I feel like we're about to break "garbage in garbage out" with AI.

nickjj · on July 18, 2020

I'm just a video course / podcaster who spent a decent amount of time researching audio and I'm not a deep down audio engineer.

But based on the results I see with automated software tools that only try to reduce noise, I would say we're nowhere near there and a really good solution would involve things that haven't been invented yet. I think we'll have manned trips to Mars well before you have a software solution that can emulate the sound of a moderately treated room with ~2ms of latency or less.

With that said, I think we're there today if all you want to do is help reduce the noise of an air conditioner so you can chat with a friend on Hangouts, Discord or Zoom. This is a scenario where audio quality doesn't matter, but not hearing an A/C or lawn mower is worth having the person talking sound like a choppy robot. You probably won't even notice it too much with earbuds.

bigiain · on July 18, 2020

Click through to the RNNoise link at the bottom. Lots of tweakable demos, and a real time JavaScript implementation to play with too...

ipunchghosts · on July 18, 2020

That's not how this works. It's much more sophisticated than that.

dsteinman · on July 18, 2020

This might be useful to use alongside DeepSpeech (https://github.com/mozilla/DeepSpeech), which doesn't work very well in noisy environments.

ACAVJW4H · on July 18, 2020

It might be a stupid question but, aside from the obvious benefits of saving bandwidth by omitting useless noise in transport, doesn't it make sense to employ these technologies server-side? One could maybe make Jitsi or BigBlueButton use similar technologies? It would make it much more ubiquitous, give better platform support (it would work on mobile or low CPU/GPU clients) and also save on system provisioning, as maybe the neural net could be utilized better by running for different audio sources concurrently.

bufferoverflow · on July 18, 2020

As a system owner, it makes financial sense to do it on the client. Imagine you're managing Zoom. You will need tens of thousands of GPUs running 24/7 just for noise suppression.

spacechild1 · on July 18, 2020

I know that Zoom does noise reduction and echo cancellation by default, but I don't know if they do it client-side or server-side (for peer-to-peer calls it has to be client-side, obviously)

kaielvin · on July 18, 2020

I believe Discord does a lot of noise filtering and cutting-off. I suspect it is server-side (given that they have a web app), but I am not certain.

wenc · on July 18, 2020

Very nice. Krisp.ai is a commercial option, and NVIDIA RTX is free but requires a CUDA card, so this is a great alternative.

Noise suppression is becoming more and more common. My Jabra headset has it built in.

kbouck · on July 18, 2020

When testing Krisp.ai, I recorded myself speaking inches away from a noisy water boiler. In the playback, I could not even hear the water boiler, but voice came through clearly. Signed up for the service immediately after that.

orware · on July 18, 2020

I signed up for it too last weekend after coming across it after doing some research (I had been making a bunch of video recordings a few days prior, and once the videos were added into Camtasia and the audio played back I noticed a lot of background hum coming from my HVAC return outside of the room I'm in).

Was impressed with the Krisp.ai tech as well; it probably works similarly to this tool and the other Nvidia solution that I can't try out since I don't have an RTX card (main difference might be the overall training set that Krisp has already run their algorithm through?).

I haven't had any Zoom meetings since purchasing Krisp, but I had been using the built-in mic from my LG Tone headset for those meetings.

Since making those video recordings I've been using my blue Yeti mic (and a pair of headphones connected to the mic for listening) as my primary and I've continued running a bunch of small tests to try and see if I can be happy with using Krisp enabled all the time.

Currently, I don't feel comfortable with leaving it on all of the time for recordings though, particularly with something like the blue Yeti mic which is able to capture pretty rich audio. In my testing, Krisp did a great job of eliminating the background HVAC humming noise, but replaced that issue with two others: some minor (but distracting) hiss/noise between words as I'm playing back the recorded audio, and it is also currently limited to 16000mhz frequency (not sure if mhz is correct or not in this case...this is what support shared with me when I asked about audio quality degradation). The support person did respond and say that the team is working on increasing the frequencies they are able to work with, so I guess there might be some improvements in the near future on it?

After seeing the latency figures on the NoiseTorch page it makes me wonder if the Krisp latency is similar or not (so far I haven't noticed any latency issues with Krisp).

As far as remaining thoughts...I kind of wish there were a few more configuration options available for Krisp, but the simplicity of it is also a benefit (for others that might not be as technical and just want a simple solution that does appear to work overall). I haven't gotten it to work for playback needs (it has the toggle for it, but nothing seems to happen when I try and toggle that on). Also, still not sure what the overall differences/improvements are with Krisp Rooms enabled (I am recording in a room, but after reading their description/blog announcement page it kind of seems like it's more for conference rooms where multiple people are speaking and extra echo cancellation might be useful? ref: https://krisp.ai/blog/krisp-rooms-launch/)

Since I'm already out with a year subscription with them I'll continue to try and figure out how to use it effectively, but not as excited about it at the moment compared to how I was last weekend initially (impressive overall though...hopefully it continues to improve :-).

fred123 · on July 18, 2020

16kHz sample rate (= max frequency 8kHz) should be enough for speech only. Human voice is mostly <0.5kHz. You may hear some difference for hisses or for room sounds etc. but I’m sure you’re unable to hear any difference from a higher sample rate in a voice chat setting.
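The arithmetic behind the "16kHz sample rate = max frequency 8kHz" remark is the Nyquist limit: a sampled signal can only represent frequencies up to half its sample rate. A tiny sketch, with a hypothetical function name and a few common sample rates picked purely for comparison:

```python
def max_representable_hz(sample_rate_hz):
    """Nyquist limit: the highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# 16 kHz is the rate discussed above; the other two are common rates for comparison.
for rate in (16000, 44100, 48000):
    print(rate, "Hz sampling ->", max_representable_hz(rate), "Hz maximum")
```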

rstuart4133 · on July 19, 2020

From the developer of RNNoise, which is the technique being used here:

"As strange as it may sound, you should not be expecting an increase in intelligibility. Humans are so good at understanding speech in noise that an enhancement algorithm — especially one that isn't allowed to look ahead of the speech it's denoising — can only destroy information. So why are we doing this in the first place? For quality. The enhanced speech is much less annoying to listen to and likely causes less listener fatigue"

https://jmvalin.ca/demo/rnnoise/

sandworm101 · on July 18, 2020

Does noise suppression work in reverse? Can I use it to isolate the noise from the human voices? There are lots of situations where someone might want to isolate and analyse background noises or conversations.

fred123 · on July 18, 2020

Yes. Noise suppression is very similar to speech separation (separating multiple speaker voices that talk at the same time). For example you can use ConvTasNet for both speech separation and denoising; in the denoising case you set target track 1 = speech, track 2 = noise, hence you get a noise-only track.

I guess you can also simply subtract the clean speech from the original mixture to get the noise-only track.
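The subtraction idea in the last sentence can be sketched as follows. It assumes the denoised speech is sample-aligned with the original mixture and has the same length, which a real denoiser with processing delay may not guarantee; the function name and sample values are hypothetical.

```python
def noise_residual(mixture, clean_speech):
    """Estimate the noise-only track as mixture minus denoised speech, sample by sample."""
    if len(mixture) != len(clean_speech):
        raise ValueError("tracks must have the same number of samples")
    return [m - s for m, s in zip(mixture, clean_speech)]


# Hypothetical samples: the original recording and the denoiser's output for it.
mixture = [0.5, -0.25, 0.75]
clean_speech = [0.25, -0.5, 0.5]
print(noise_residual(mixture, clean_speech))
```

Whatever the denoiser removed, including any speech it removed by mistake, ends up in this residual, so the result is only as clean a noise track as the denoiser is accurate.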

hu3 · on July 18, 2020

I'm curious about the impact of Go's Garbage Collection in a real-time project like this.

From reading past comments in other Go related threads I was led to believe this was impossible to achieve with Go.

I'm talking about threads like this: https://news.ycombinator.com/item?id=21036037

kaielvin · on July 18, 2020

Alternatively there is the pulseaudio module: module-echo-cancel (https://askubuntu.com/questions/18958/realtime-noise-removal...), which I have been using so far.

I haven't tried NoiseTorch yet. How do the two compare?

lawl · on July 18, 2020

NoiseTorch uses RNNoise, which uses a mix of deep learning and DSP to remove noise. I haven't used module-echo-cancel yet, but it's probably "just" classical DSP, so RNNoise may deliver better results.

kaielvin · on July 18, 2020

Indeed, after some testing, the filtering is much better.

formerly_proven · on July 18, 2020

Most noise suppression I've seen so far can shave off a few dB (worth gold already), but when you try to suppress more noise it always starts to impact the signal very negatively. Interesting to see whether these ML approaches can do better. I suspect they might depend even more on the type of your voice than conventional noise suppression.

fred123 · on July 18, 2020

Note that most state of the art machine learning based denoising models perform MUCH better than rnnoise quality wise, but they are mostly not tuned for real time use.

If you’re interested, have a look at some of the Interspeech 2020 Deep Noise Suppression submissions.

fred123 · on July 18, 2020

Some examples here: https://paperswithcode.com/task/speech-enhancement

Some of them have audio samples.

gingerlime · on July 18, 2020

Anything similar for macOS? I tried krisp.ai which is nice but seems too heavy on my 2015 MacBook Air together with Zoom.

manojlds · on July 18, 2020

Any of these remove dog barking noise?

speedgoose · on July 18, 2020

I would guess. RTX Voice removes my cat's sounds.

manojlds · on July 18, 2020

Yeah but with my rudimentary skills I struggled with dog barks as they are closer to our speech.

bhouston · on July 18, 2020

This should be included in Linux by default, it is that good. :)

Or at least available via apt-get.

jcastro · on July 18, 2020

I've been using this for the past few days and it's been fantastic, every distro should just do this out of the box.

42droids · on July 18, 2020

Thank you for making this, I really can't wait to try it. In fact, I am now shocked this didn't exist before... :)

captn3m0 · on July 18, 2020

noisetorch-bin and noisetorch-git packages already on AUR: https://aur.archlinux.org/packages/?O=0&SeB=nd&K=noisetorch&...

freedomben · on July 18, 2020

Is this using GTK? What bindings?

hu3 · on July 18, 2020

Not GTK but https://github.com/aarzilli/nucular which is a Go port of https://github.com/vurtun/nuklear

kochthesecond · on July 18, 2020

This is pretty cool!

thomasfedb · on July 18, 2020

I read NoseTorch, was intrigued.

ped4enko · on July 24, 2020

How well did you choose the Golang for this task?

sahoo · on July 18, 2020

If only the sound card was detected in Linux. Sigh.

shock · on July 18, 2020

What do you mean? NoiseTorch deals with PulseAudio, it doesn't deal with hardware directly, so, yes, Linux needs to have a driver for your soundcard.

sahoo · on July 19, 2020

I mean, is there a non-Linux port?

formerly_proven · on July 18, 2020

Are you implying I can't do any sound I/O without having a driver for said I/O? Preposterous.

PaulDavisThe1st · on July 18, 2020

That's trivially correct. You can't get anything on a screen without a driver for your graphics card. You can't get any input from a keyboard without a driver for the keyboard.

However, Linux comes with drivers for more or less every audio interface that is possible to use on Linux. That is, there are essentially no 3rd party drivers - it either works with the drivers in the kernel(1) or it doesn't.

(1) Depending on how your distro built the drivers. For the most part, things are OK.