A lot of people - despite being forced to work from home - simply don't seem to care about the way their audio sounds. Many don't even try to tackle these problems after it's been pointed out to them that they're being a nuisance in online meetings.
I gave up on trying to help people fix their setups, or convincing them that it matters, and switched to doing this on the receiver end. It's been a massive quality-of-life improvement.
If you're interested in the setup, you basically just need a small script that loads the pulseaudio plugin and wires up the sources/sinks correctly.
I think this is an out-of-sight, out-of-mind kind of issue. They simply don't understand how their noise, which they do not perceive, can be so detrimental to others. Moreover, a lot of people simply can't grasp the difference that good hardware, or even just a different setup (moving away from noise sources like fans, open windows, running appliances, etc.), can make to the quality of their sound. Lastly, a lot of people either cannot or think they cannot do anything about it, so they dismiss others' concerns because "everyone else has problems too", treating their noise as no worse than anyone else's.
Another issue I've seen is people using their device's built-in speaker and microphone (to form a loopback-fest) that the onboard echo cancellation tries its best to deal with.
I think part of it is certainly "but I can't see the difference": it's hard to get rapid feedback on whether things are better or worse, since you won't notice the effect of a change to your own setup.
In any case, being able to create a pulseaudio sink that puts the audio from one application through a noise removal chain sounds like a decent "quick fix" to me. I've tried to listen to some webinars whose audio was so horrible that they were pretty much impossible to listen to, despite otherwise worthwhile content. I wonder if this would be enough to improve it, or if the issues lie elsewhere (low quality transcodes).
I’ve tried several different audio solutions, including nice wired headphones + mod mic wireless. Ultimately people think I sound best just using my laptop’s internal microphone array and bookshelf speakers. The whole process was a frustrating distraction.
Google Meet supports noise-cancelling in outgoing audio (for business customers), but the user needs to enable it once in their settings. In my experience this last bit is already a hurdle ...
Doesn’t mean they can’t do anything about the noise, just that they can’t do it on the server.
The RNNoise link from the bottom of that post runs the noise suppression in real time in JavaScript. Zoom et al. could do this client side while still doing proper E2E. (Although Zoom already uses more CPU than I think it needs to...)
"which they do not perceive" is key here. I suspect that HN readers tend to an above-average degree of sensory sensitivity that causes them to notice things most people don't. Add to that even a little training/experience in audio and the noises that drive you crazy are likely completely inaudible to others.
For anyone wanting to try this, it is pretty straightforward. First install noise-suppression-for-voice (AUR package available, binaries available via GitHub releases). You want to have librnnoise_ladspa.so and librnnoise_lv2.so available.
Then identify your current output sink by running `pactl list sinks short`. One will be "RUNNING", and this is your active sink. Keep this name to hand.
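Picking out the active sink can also be scripted. The following is a minimal sketch, assuming `pactl list sinks short` prints one tab-separated line per sink with the state in the last column; the function name and the sample sink names are made up for illustration.

```python
from typing import List


def running_sinks(pactl_output: str) -> List[str]:
    """Return the names of sinks whose state column reads RUNNING.

    Assumes the tab-separated layout: index, name, driver, sample spec, state.
    """
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 5 and fields[-1].strip() == "RUNNING":
            names.append(fields[1])
    return names


# Hypothetical output of `pactl list sinks short`, for illustration only.
SAMPLE_OUTPUT = (
    "0\talsa_output.pci-0000_00_1f.3.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 44100Hz\tRUNNING\n"
    "1\talsa_output.usb-headset.analog-stereo\tmodule-alsa-card.c\ts16le 2ch 48000Hz\tSUSPENDED\n"
)
```

On a real system the text would come from something like `subprocess.run(["pactl", "list", "sinks", "short"], capture_output=True, text=True).stdout`.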
Create and enable an output sink using this plugin:
The value of sink_master should be the output sink name from above, and the control=0 parameter can be adjusted - that seems to be the voice auto detect threshold. https://github.com/werman/noise-suppression-for-voice/issues... suggested 0, but I found it benefited from being higher. You can compare before/after by changing your pulseaudio output sink at system level (or application level) back and forth.
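As a rough sketch of the step described above, the helpers below only assemble `pactl` command lines; nothing is executed. The `sink_master` and `control` parameters come from the description above. The sink name `denoised_out` is a made-up placeholder, the plugin path is whatever your install uses, and the `noise_suppressor_stereo` label is an assumption about what the plugin exposes, so check it against the plugin's own documentation.

```python
from typing import List


def ladspa_sink_command(
    sink_master: str,
    plugin_path: str,
    vad_threshold: int = 0,
    sink_name: str = "denoised_out",         # hypothetical sink name
    label: str = "noise_suppressor_stereo",  # assumed plugin label, verify locally
) -> List[str]:
    """Build the pactl call that loads the LADSPA plugin as a filter sink."""
    return [
        "pactl", "load-module", "module-ladspa-sink",
        f"sink_name={sink_name}",
        f"sink_master={sink_master}",
        f"label={label}",
        f"plugin={plugin_path}",
        f"control={vad_threshold}",
    ]


def set_default_sink_command(sink_name: str) -> List[str]:
    """Switch the system-wide default output to the given sink."""
    return ["pactl", "set-default-sink", sink_name]


def move_stream_command(sink_input_index: int, sink_name: str) -> List[str]:
    """Move a single application's playback stream to the given sink."""
    return ["pactl", "move-sink-input", str(sink_input_index), sink_name]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. For the before/after comparison, alternate `set_default_sink_command` between the new sink and the original one, or move just one stream (indices are listed by `pactl list sink-inputs short`).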
I find most of the nuisance on Zoom calls is from children and pets. For example, the mailman usually comes to my boss's house during standup, and he's out of commission for a solid five minutes due to the barking.
Unfortunately the latency for the audio output from Teamspeak 3 just rose to multiple seconds over a few hours of use.
The latencies reported by PulseAudio did not represent that, but I could hear people talking for a few seconds after their icon indicated that they stopped.
Did you encounter similar issues? Since I don't know much about PulseAudio I am at a loss right now and have to endure the noise.
As far as I can see this uses RNNoise. If you haven't checked it out yet you should, because it is simply amazing. It is a super effective noise gate / noise removal tool that does not require any configuration whatsoever.
My study mates and I have been using it over the last four months when working from home. It removes the noise of keyboards, seagulls and vacuum cleaners.
It is essentially the same as Nvidia RTX voice except it is much lighter on the system and does not require an Nvidia GPU. In our testing RNNoise performs similarly.
This project looks super cool. It seems to make RNNoise much more accessible. Normally you would have to manually set up the pulseaudio plumbing for this to work.
I have mainly used the built-in RNNoise support in Mumble. But you can use https://github.com/werman/noise-suppression-for-voice/ and build the VST plugin (this is also what NoiseTorch uses, I think). Then use any application that can load VST plugins to pipe your mic through. I have had reasonably good luck with it on Windows with Equalizer APO.
I went through some of the core libraries used in the project. There's a pure Go pulseaudio implementation[1] which seems to deserve a few more stars, and the GUI framework nucular[2] seems to support even Metal rendering on macOS. I like how the native GUI frameworks for Go are becoming a viable alternative to Qt.
Off-topic, since this thread might attract audio programmers:
I was looking at implementing ambient noise cancellation and audio amplification on Android[3] for TWS earphones (Bluetooth 5.0) that lack those features. Would the latency defeat the purpose because it isn't implemented on the device, and do the Android Bluetooth/audio APIs provide the necessary access to implement such features in an app?
Hey everyone author here! Awesome to see this on HN.
I'm happy to answer any questions, but this is a slightly inopportune moment to hit HN for me as I need to leave soon :)
Some responses might be delayed by a day or so!
I'm not saying this tool is bad but I would be really careful about using tools like this in an environment where audio quality really matters (Youtube videos, podcasts, etc.).
Noise reduction tools work by removing specific frequencies from the source, some of which overlap with your natural voice.
This is why you start to sound robotic and get weird cutouts if you try to use tools to remove too much noise or background sounds. It's one of those things where, if you're not used to hearing your entire vocal range, you might not be aware of how much is getting cut out by tools that reduce noise.
It's too bad they don't have a before / after with a few voice samples in the readme.
Definitely sounds better than I thought it would have and I've watched tons of this guy's videos in the past.
It really distorts his voice / range in some cases, such as when he taps his desk with that orange hammer. The difference there is night and day. It chops out his natural voice's range. It seems to degrade his voice more the more intense the background noise is, such as the leaf blower (lol), but that's reasonable to expect. But at the same time, even the mechanical keyboard has a very noticeable negative effect on his range.
It's one of those things where I wish so much that it worked perfectly, but I couldn't realistically think about using it for any recording work due to things like the above. There are just too many common noises (typing, etc.) that drastically distort your voice.
9:23 in that video is hilarious though. Have to love Jerry!
I wonder if it's the algorithm degrading his voice or if the input sound is already degraded. Is it possible a leaf blower or a hammer would cause enough "noise" that our ears couldn't hear his voice clearly either? Then when you subtract out the portion of the sound attributed to the leaf blower, you're hearing the parts of his voice that weren't being jumbled by the leaf blower?
Hard to say, because softer noises like typing still make his voice sound like it's cutting out unnaturally. It's like the frequencies are being subtracted out of his normal tone, but it's more subtle than the leaf blower so you may not notice it without good headphones. It makes him sound very choppy and mechanical.
With the leaf blower I suspect that when it gets too close the microphone/ADC is saturating, which clips his voice. I wonder if it would've sounded better had he attempted to lower the gain on the microphone.
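The saturation guess above can be illustrated with a toy model: an input stage that can only represent values inside a fixed range flattens anything louder, and lowering the gain before that stage keeps the peaks intact. This is only a sketch; the sample values and the `full_scale` limit are invented, and real microphone/ADC behavior is more complicated.

```python
from typing import List, Sequence


def adc_capture(signal: Sequence[float], gain: float, full_scale: float = 1.0) -> List[float]:
    """Apply gain, then hard-clip each sample to [-full_scale, +full_scale]."""
    return [max(-full_scale, min(full_scale, gain * x)) for x in signal]


# Invented samples: a quiet voice sample followed by two loud leaf blower peaks.
LOUD_INPUT = [0.5, 1.5, -1.75]
```

With `gain=1.0` the two loud samples are flattened to the limits; with `gain=0.5` every sample passes through scaled but undistorted.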
The difference here is that RNNoise does not just remove some specific frequency; it uses neural networks to remove the noise, which results in much higher quality than what you were implying.
I have personally not noticed voice quality suffering too much, but you are of course right. And this is not what it was made for. My personal use case is mostly voip where RNNoise (imo) does an amazing job.
Looks excellent and keen to delve into the code a bit.
One quick question since you'll clearly know the codebase - do you think this could easily be adapted to create a "playback-side" noise filter?
Use-case rationale here is noisy and poor quality podcasts or "other people's" audio - it would be awesome to be able to configure your tool as the output for Chrome or Firefox or whatever program I'm listening to, then route the cleaned audio from your tool to the physical audio port.
Is that something which would be feasible to do here?
Agreed, but now this has piqued my interest in a good way.
Having two instances loaded might be a bit confusing as you say - I imagine it would need to be something like "NoiseTorch for Recording" and "NoiseTorch for Playback".
I'd need to go and play around with Pulse but I guess it would be possible to present 2 interfaces into Pulse with different names, then hope users can see the distinction when selecting a microphone versus the output device.
Would it be possible to upload a few before / after samples with varying degrees of background noise? Even if it's all the same person that would be a huge help to gauge the quality.
Just a suggestion if you do it, please include realistic room noises in some of the samples.
I looked at the RNNoise examples and it was pretty bad. I mean, the audio quality of the speaker got completely mangled but the background noise was also comically high. It sounded like the person just sat down in the middle of the street in NYC or was inside of a busy train terminal.
Yes and no. NoiseTorch also has VAD (Voice Activity Detection). RNNoise also returns the probability of a sound sample being voice, and I use that to clamp the microphone completely if it's below the configured probability.
This works really well for situations like Discord or Teamspeak where you're usually not constantly talking, but doing things that can still set off "normal" voice activation. RNNoise's model often knows it's not voice, but cannot denoise it completely.
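The gating idea described above can be sketched in a few lines. This is not NoiseTorch's actual code, just an illustration under stated assumptions: each denoised frame arrives together with a voice probability between 0 and 1, and the function name and threshold values are made up.

```python
from typing import List, Sequence


def gate_frame(denoised: Sequence[float], voice_probability: float, threshold: float) -> List[float]:
    """Silence the whole frame when the voice probability is below the threshold."""
    if voice_probability < threshold:
        return [0.0] * len(denoised)  # not voice: output silence instead of residue
    return list(denoised)             # voice: pass the denoised frame through
```

For example, `gate_frame([0.1, -0.2], 0.3, 0.5)` yields silence, while the same frame with a probability of 0.9 passes through unchanged. A higher threshold suppresses more non-voice residue at the risk of cutting off quiet speech.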
Yes, classic noise suppression sounds very poor very quickly. Noisy or poor audio is like blurred photos or videos, very hard to fix, while noisy or shaky videos are easily fixed (especially temporal de-noising on videos is akin to magic, it can extend the performance of the camera by multiple stops with very low IQ impact).
That's why these ML tools are potentially huge, good ol' noise suppression just isn't good.
How long until we can get some kind of open AI project to take in incoming bad quality voice and output clear noiseless human speech (in our, or whoever's voice we want), so podcasters don't have to buy expensive microphones and try to soundproof their rooms anymore?
I know we're not there yet, but I feel like we're about to break "garbage in garbage out" with AI.
I'm just a video course / podcaster who spent a decent amount of time researching audio and I'm not a deep down audio engineer.
But based on the results I see with automated software tools that only try to reduce noise, I would say we're nowhere near there and a really good solution would involve things that haven't been invented yet. I think we'll have manned trips to Mars well before you have a software solution that can emulate the sound of a moderately treated room with ~2ms of latency or less.
With that said, I think we're there today if all you want to do is help reduce the noise of an air conditioner so you can chat with a friend on Hangouts, Discord or Zoom. This is a scenario where audio quality doesn't matter, but not hearing an A/C or lawn mower is worth having the person talking sound like a choppy robot. You probably won't even notice it too much with earbuds.
It might be a stupid question but, aside from the obvious benefits of saving bandwidth by omitting useless noise in transport, doesn't it make sense to employ these technologies server-side? One could maybe make Jitsi or BigBlueButton use similar technologies? It would make it much more ubiquitous, better platform support (would work on mobile or low CPU/GPU clients) and also save on system provisioning as maybe the neural net could be utilized better by running for different audio sources concurrently
As a system owner, it makes financial sense to do it on the client. Imagine you're managing Zoom. You will need tens of thousands of GPUs running 24/7 just for noise suppression.
I know that Zoom does noise reduction and echo cancellation by default, but I don't know if they do it client-side or server-side (for peer-to-peer calls it has to be client-side, obviously)
When testing Krisp.ai, I recorded myself speaking inches away from a noisy water boiler. In the playback, I could not even hear the water boiler, but voice came through clearly. Signed up for the service immediately after that.
I signed up for it too last weekend after coming across it after doing some research (I had been making a bunch of video recordings a few days prior, and once the videos were added into Camtasia and the audio played back I noticed a lot of background hum coming from my HVAC return outside of the room I'm in).
Was impressed with the Krisp.ai tech as well and probably works similarly to this tool and the other Nvidia solution that I can't try out since I don't have an RTX card (main difference might be the overall training set that Krisp has already run their algorithm through?).
I haven't had any Zoom meetings since purchasing Krisp, but I had been using the built-in mic from my LG Tone headset for those meetings.
Since making those video recordings I've been using my blue Yeti mic (and a pair of headphones connected to the mic for listening) as my primary and I've continued running a bunch of small tests to try and see if I can be happy with using Krisp enabled all the time.
Currently, though, I don't feel comfortable leaving it on all the time for recordings, particularly with something like the blue Yeti mic, which is able to capture pretty rich audio. In my testing, Krisp did a great job of eliminating the background HVAC humming noise, but replaced that issue with two others: some minor (but distracting) hiss/noise between words when I play back the recorded audio, and a current limit of 16000mhz frequency (not sure if mhz is correct or not in this case...this is what support shared with me when I asked about audio quality degradation). The support person did say that the team is working on increasing the frequencies they are able to work with, so I guess there might be some improvements in the near future?
After seeing the latency figures on the NoiseTorch page it makes me wonder if the Krisp latency is similar or not (so far I haven't noticed any latency issues with Krisp).
As far as remaining thoughts...I kind of wish there were a few more configuration options available for Krisp, but the simplicity of it is also a benefit (for others who might not be as technical and just want a simple solution that does appear to work overall). I haven't gotten it to work for playback needs (it has a toggle for it, but nothing seems to happen when I turn it on). Also, I'm still not sure what the overall differences/improvements are with Krisp Rooms enabled (I am recording in a room, but after reading their description/blog announcement page it kind of seems like it's more for conference rooms where multiple people are speaking and extra echo cancellation might be useful? ref: https://krisp.ai/blog/krisp-rooms-launch/)
Since I'm already out with a year subscription with them I'll continue to try and figure out how to use it effectively, but not as excited about it at the moment compared to how I was last weekend initially (impressive overall though...hopefully it continues to improve :-).
A 16 kHz sample rate (= max frequency 8 kHz) should be enough for speech only. The fundamental of the human voice is mostly <0.5 kHz. You may hear some difference for hisses or for room sounds etc., but I’m sure you’re unable to hear any difference from a higher sample rate in a voice chat setting.
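The "16 kHz sample rate = max frequency 8 kHz" relation is the Nyquist limit: a sampled signal can only represent frequencies up to half its sample rate. As a tiny worked example (the function name is made up):

```python
def max_representable_frequency_hz(sample_rate_hz: float) -> float:
    """Nyquist limit: the highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2
```

So 16000 Hz sampling tops out at 8000 Hz, while 48000 Hz sampling reaches 24000 Hz.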
From the developer of RNNoise, which is the technique being used here:
"As strange as it may sound, you should not be expecting an increase in intelligibility. Humans are so good at understanding speech in noise that an enhancement algorithm — especially one that isn't allowed to look ahead of the speech it's denoising — can only destroy information. So why are we doing this in the first place? For quality. The enhanced speech is much less annoying to listen to and likely causes less listener fatigue"
Does noise suppression work in reverse? Can I use it to isolate the noise from the human voices? There are lots of situations where someone might want to isolate and analyse background noises or conversations.
Yes. Noise suppression is very similar to speech separation (separating multiple speaker voices that talk at the same time). For example you can use ConvTasNet for both speech separation and denoising; in the denoising case you set target track 1 = speech, track 2 = noise, hence you get a noise-only track.
I guess you can also simply subtract the clean speech from the original mixture to get the noise-only track.
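The subtraction idea can be sketched as follows, assuming the mixture and the denoised speech are time-aligned sample sequences of equal length (any processing delay in a real denoiser would have to be compensated first). The function name and sample values are illustrative only.

```python
from typing import List, Sequence


def noise_residual(mixture: Sequence[float], clean_speech: Sequence[float]) -> List[float]:
    """Estimate the noise-only track as mixture minus clean speech, sample by sample."""
    if len(mixture) != len(clean_speech):
        raise ValueError("mixture and clean_speech must have the same length")
    return [m - s for m, s in zip(mixture, clean_speech)]
```

For instance, `noise_residual([0.5, -0.25, 1.0], [0.25, -0.25, 0.5])` gives `[0.25, 0.0, 0.5]`. The estimate is only as good as the speech estimate: whatever voice the denoiser removed by mistake ends up in the "noise" track.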
NoiseTorch uses RNNoise, which uses a mix of deep learning and DSP to remove noise. I haven't used module-echo-cancel yet, but it's probably "just" classical DSP, rnnoise may deliver better results.
Most noise suppression I've seen so far can shave off a few dB (worth gold already), but when you try to suppress more noise it always starts to impact the signal very negatively. Interesting to see whether these ML approaches can do better. I suspect they might depend even more on the type of your voice than conventional noise suppression.
Note that most state of the art machine learning based denoising models perform MUCH better than rnnoise quality wise, but they are mostly not tuned for real time use.
If you’re interested, have a look at some of the Interspeech 2020 Deep Noise Suppression submissions.
That's trivially correct. You can't get anything on a screen without a driver for your graphics card. You can't get any input from a keyboard without a driver for the keyboard.
However, Linux comes with drivers for more or less every audio interface that is possible to use on Linux. That is, there are essentially no 3rd party drivers - it either works with the drivers in the kernel(1) or it doesn't.
(1) Depending on how your distro built the drivers. For the most part, things are OK.
A lot of people - despite being forced to work from home - simply don't seem to care about the way their audio sounds. Many don't even try to tackle these problems after it's been pointed out to them that they're being a nuisance in online meetings.
I gave up on trying to help people fix their setups, or convincing them that it matters, and switched to doing this on the receiver end. It's been a massive quality-of-life improvement.
If you're interested in the setup, you basically just need a small script that loads the pulseaudio plugin and wires up the sources/sinks correctly.
My setup script is here: https://cs.tvl.fyi/depot@canon/-/blob/tools/nsfv-setup/defau...
And some more context: https://cl.tvl.fyi/c/depot/+/578