Interesting idea, but personally I like the Dynamic Range (DR) measurement developed by the Pleasurize Music Foundation more. It also provides a one-value output that has a direct relation to the statistics of the input audio, and AFAIK it is a somewhat established measurand in audio engineering.
Self-plug: A while back I reverse-engineered the DR algorithm and implemented it as a Python script. It's [DRmeter on GitHub](https://github.com/janw/drmeter).
From what I quickly gathered from the specs^1, ReplayGain (RG) and Dynamic Range (DR) are indeed similar in the grand scheme of things. Both algorithms employ percentile-based statistics of RMS values. While RG is determined by the 95th percentile of RMS values over 50 ms audio frames, DR uses the 80th percentile of RMS values over 3-second audio frames. So RG is definitely on the side of short-term analysis, while DR is long-term.
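To make the comparison concrete, here is a minimal Python sketch of only the percentile step described above: RMS over non-overlapping frames, then a percentile of those values. It is a deliberate simplification, not either real algorithm. There is no loudness filter and no DR peak term, samples are assumed to be mono floats in [-1, 1], and the percentile is nearest-rank (one of several common definitions). All function names are hypothetical.

```python
import math


def frame_rms(samples, frame_len):
    """RMS of consecutive, non-overlapping frames; a trailing partial frame is dropped."""
    values = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rms_percentile_db(samples, rate, frame_seconds, pct):
    """Percentile of frame RMS values, in dB relative to full scale (1.0)."""
    rms_values = frame_rms(samples, int(rate * frame_seconds))
    if not rms_values:
        raise ValueError("signal is shorter than one frame")
    value = percentile(rms_values, pct)
    return 20 * math.log10(value) if value > 0 else float("-inf")


# Hypothetical usage: a 6-second, 100 Hz full-scale sine sampled at 8 kHz
rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / rate) for n in range(6 * rate)]
rg_style = rms_percentile_db(signal, rate, 0.05, 95)  # short-term: 50 ms frames, 95th percentile
dr_style = rms_percentile_db(signal, rate, 3.0, 80)   # long-term: 3 s frames, 80th percentile
```

For a steady sine both numbers land near -3 dB (the RMS of a full-scale sine). They only diverge on material whose loudness changes over time, which is exactly where the frame length and percentile choice matter.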
What differentiates RG more plainly, though, is its stronger psychoacoustic foundation with regard to frequency response: it applies a so-called "loudness filter" (modelled after the equal-loudness contour^2 of human hearing) before computing the RMS statistics. The equal-loudness contour (also known as "isophone" where I come from) in turn models the non-linear response of the ear to sound pressure level as a function of frequency. It basically adjusts for what in layman's terms I'd call the "importance" of the different frequency ranges.
Therefore I'd say RG is very much focused on what human hearing will perceive as loud (too loud, or just loud enough), while DR focuses on "exposing offenders in all ranges of the spectrum". My educated guess: loud, bassy sounds would not harm an RG score, but would have a significant impact on DR, as it does not attenuate low frequencies.
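The guess about bass can be illustrated with a crude stand-in. The sketch below is NOT the ReplayGain loudness filter; it is a plain first-order RC high-pass with an assumed 150 Hz cutoff, used only to show how any filter that de-emphasizes low frequencies shrinks a loud bass tone's contribution to an RMS statistic while leaving midrange content nearly untouched. An unfiltered, DR-style measurement would see both tones at full level. Function names are hypothetical.

```python
import math


def highpass(samples, rate, cutoff_hz):
    """First-order RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1 / (2 * math.pi * cutoff_hz)
    dt = 1 / rate
    a = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def filtered_level_ratio(freq_hz, rate=44100, cutoff_hz=150.0):
    """Filtered RMS / unfiltered RMS for a 1 s sine; the first half is skipped to drop the start-up transient."""
    tone = [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(rate)]
    filtered = highpass(tone, rate, cutoff_hz)
    half = rate // 2
    return rms(filtered[half:]) / rms(tone[half:])


bass = filtered_level_ratio(50)   # well below the cutoff: strongly attenuated
mid = filtered_level_ratio(1000)  # well above the cutoff: nearly unchanged
```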
Would be interesting to see a comparison of the two!
Ironic that Spotify will apparently penalize the music for being "too loud" when their ads are regularly ear-splittingly loud. It's egregious. When listening with earbuds, I have to react quickly with the volume control because the commercials literally hurt my ears.
Of course it's by design. And sure, I've considered paying for it, but I don't. Does that mean my complaints about the free service are somehow invalid? I'm still a customer, just a different kind of customer, and if they make the experience bad enough for me that I avoid it in some circumstances, that's relevant to their interests. If they didn't want free-tier customers, they wouldn't be offering the free-tier service.
Pros: they potentially play the Spotify ad to their friends, effectively advertising the service.
Cons: each playback means bandwidth + royalties cost.
So the question is: does the probability times the return from their friends bring in more money than the free playbacks cost? I'd really like to know the answer.
If Spotify hardly cared whether free tier users stay or go, the value of their advertising spots would eventually be driven to zero. Do you really think they're going to do that?
It's hard to trust a service like this without some information on where they pull the numbers from. How did they determine which normalization algorithm each service uses?
The "give us your email address for more information" part also seems slimy.
TIDAL has openly said that they are using -14 LUFS for normalization while Spotify has said they are using ReplayGain. The other platforms required more investigation, and we're continuing to refine the estimates, but we feel that we've gotten pretty close.
From the email they sent when I put in a sample song and asked for the more detailed analysis:
"Since streaming services are going to turn loud music down anyway, more and more people are deciding they would prefer to take control of this process themselves, and optimize their music for the best possible results."
and
"We recommend avoiding very large negative LP values, especially on YouTube because songs like this often sound “smaller” than those with LP scores closer to zero.
Hear it for yourself
For example, compare the loud sections of these two Metallica songs on YouTube - The Day That Never Comes (LP -5.8)[1] and Hardwired (LP -2.4)[2]. Which has more impact?
Loudness normalization means you have the opportunity to make your music sound better, too. In our experience, LP between 0 and -2 on YouTube will work well for even the loudest genres. For example, Drake’s recent hit God’s Plan has a score of LP -0.8 on YouTube, and it sounds huge."
Ironic that you picked Metallica as an example, as they basically "won" the loudness wars[0] with the release of Death Magnetic. It was so ridiculously overblown and compressed that it clipped constantly, and it was heavily featured in mainstream media. Even one of the album's mastering engineers complained.
Ironically, after a brief sampling, the first Metallica song seems to have more "impact" to me, and the second seems to be smaller. Admittedly, I didn't listen to the whole of either one. But I found it odd enough to mention.
I believe they are concerned about the Loudness War, in which songs released over the past 10+ years have been mastered to be as close to 0 dB as possible.
This was done for decades by radio stations applying strong compression, because they knew that someone going from frequency to frequency was more likely to stop on a station that sounded loud, implying that they were getting a stronger signal and could listen to it longer before they drove out of range. The longer the listener stayed on their station, the more ad impressions. But then the record labels started doing it and so even buying a CD meant you were getting a lower-quality album.
A few years ago Apple mandated the use of their Sound Check tool that penalizes iTunes Radio songs that are all-loud, all-the-time. And it sounds (no pun intended) like Spotify and other services followed their lead.
Google cache of article because the site is unresponsive:
I have an album from a few years ago on streaming services that was mastered to about -9 on the LUFS scale. It sounds nice and loud on SoundCloud, which doesn’t do any normalization, but on Spotify, the normalization algorithm turns it down enough that it sounds noticeably softer than music by other electronic artists (although some of that probably comes down to my mixing skills versus those of experienced recording engineers).
Spotify used to normalize to -11 LUFS, but more recently has migrated to -14, which other streaming services already use and seems to be becoming the standard. For my new album that was just released, I mastered to about -11 to -13 LUFS, which tends to be loud enough to still stand out un-normalized, but is close enough to target that it doesn’t take a noticeable hit on normalization.
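The arithmetic behind "turned down" is just a difference in dB. A minimal sketch, assuming the service measures integrated loudness in LUFS and applies one static gain per track to hit its target (real services may also limit how far they turn quiet tracks up, or apply a limiter when they do; neither is modelled here). Function names are hypothetical.

```python
def normalization_gain_db(integrated_lufs, target_lufs=-14.0):
    """Static gain in dB that moves a track from its measured loudness to the target.

    Negative means the track gets turned down.
    """
    return target_lufs - integrated_lufs


def db_to_amplitude(gain_db):
    """Convert a dB gain to the linear factor applied to every sample."""
    return 10 ** (gain_db / 20)


for master in (-9.0, -11.0, -13.0):
    gain = normalization_gain_db(master)
    print(f"{master} LUFS master: {gain:+.1f} dB (x{db_to_amplitude(gain):.2f})")
```

Under those assumptions the -9 LUFS album above is turned down by 5 dB at a -14 LUFS target (2 dB at the old -11 target), while an -11 to -13 LUFS master loses only 1 to 3 dB.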
The idea is that clueless execs force mastering engineers to heavily compress the recordings to make them "loud". This shows the execs that if they do that, the streaming services will undo any extra "loudness" leaving them with just a compressed record.
A great example of this is the album Death Magnetic by Metallica.
The album version is compressed to hell but the version that shipped with Guitar Hero has none of that extra compression, presumably because someone who wasn't deaf mixed it.
But this isn't changing the dynamic range. It's not normalizing the song on its own; it's normalizing the song with regard to the rest of the musical catalogue, which simply amounts to turning the volume down by a certain amount over the whole song. So, to repeat, that changes nothing about the dynamic range of the song.
In fact, if music producers stopped trying to make their music so loud, there would be more room for dynamic range in the music. The goal of this site seems to be to convince them not to mix a whole song at top volume, since it won't play any louder in most places anyway. This in turn could lead to the return of more dynamic range. I have my doubts that this will actually happen.
It is limiting the high-end volume, which... changes the dynamic range. [0]
Listen to a good recording of something like the William Tell Overture and you should notice that the quiet parts are at less than 10% or so of the volume of the loud parts.
Listening to that Dire Straits album really illustrates the point.
Loudness normalisation and limiting are different things.
Limiting is about turning down the music when it peaks above a certain threshold measured in decibels, and loudness normalisation is about normalising music to its perceived loudness, measured in LUFS.
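The difference can be shown numerically using the peak-to-RMS ratio (crest factor) as a rough, assumed proxy for dynamic range. A static normalization gain scales every sample by the same factor, so the ratio is unchanged; anything that pulls down only the peaks shrinks it. The "limiter" below is a brickwall clamp, the crudest possible stand-in: real limiters use look-ahead and attack/release smoothing rather than clipping. All names are hypothetical.

```python
import math


def peak_to_rms_db(samples):
    """Crest factor in dB: peak level relative to RMS level."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(peak / rms)


def normalize(samples, gain_db):
    """Loudness normalization modelled as one static gain over the whole track."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in samples]


def brickwall_limit(samples, ceiling_db):
    """Crude limiter stand-in: clamp every sample to +/- the ceiling (dBFS)."""
    ceiling = 10 ** (ceiling_db / 20)
    return [max(-ceiling, min(ceiling, x)) for x in samples]


# A quiet passage followed by a loud one (sine with a 100-sample period)
quiet = [0.1 * math.sin(2 * math.pi * n / 100) for n in range(1000)]
loud = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
track = quiet + loud

original = peak_to_rms_db(track)
normalized = peak_to_rms_db(normalize(track, -5.0))     # same as original, up to rounding
limited = peak_to_rms_db(brickwall_limit(track, -6.0))  # smaller: the loud peaks are flattened
```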
Excessive dynamic range compression wasn't done at that time and many records were well mastered. Other good examples are Pink Floyd and even Led Zeppelin records. Not generally considered audiophile quality, but they just sound great.
Back then these records were made to be played back on big hi-fi systems, with big speakers that filled the room and were placed far away from the listener. This is not a standard setup today. You're more likely to hear music played from wimpy TV speakers, headphones or earbuds.
We debated between using the word "penalty" or something more neutral, like "score." In the end, we felt that since most music will be turned down, penalty seemed appropriate, though it is a bit cheeky.
At one time, television used to allow advertisers to up the volume for their commercials. I don't recall if that was broadcast or cable. I thought a regulation was put into place for that, requiring a constant set volume, but like many things, it has either been ignored or gone away.
My understanding is that the rule put a maximum volume in place for shows. An average show will run the gamut, and may only hit the upper limit once or twice (for example, during an action or music sequence). Commercials are engineered to be as close to the upper limit as possible for their entire duration. It doesn't matter that there's an upper limit to the volume, because the commercial sounds much louder than whatever was on just before it, and as loud as the loudest, most exciting and dramatic bits of the show.
I get a sense that this site is targeting the kinds of people who take part in the "loudness war." Producers and engineers I suppose? Whoever decides that their song needs to be louder so that it yields more attention.
Wasn't YouTube supposed to do this a few years ago?[0] Yet some videos at full volume are still quieter than other videos at 10% volume. Does it not work?
This is about automatic volume reduction, not normalization. In other words, YouTube isn't going to adjust your volume up, only down if it's too loud (or saturated?)
I don't know what audio formats use, maybe some of them do treat volume logarithmically, although I think this only makes sense after transforming to the frequency domain, otherwise your time signal goes through 0 all the time, which is not ideal when using logarithms.
For the raw sample data it's also more or less inevitable to use a linear scale when you want different sounds to add nicely; with a logarithmic scale, adding two sounds would distort them, which isn't ideal. With a linear scale you can hide the quantization noise pretty easily with dither, which just sounds like added white noise.
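A minimal sketch of the dither point, assuming float samples in [-1, 1) quantized to signed integers by rounding, with TPDF (triangular) dither of +/-1 LSB, which is one common choice. A signal smaller than one quantization step vanishes entirely without dither; with dither, individual samples are noisy but their average tracks the input, which is why the result sounds like the original plus a little hiss rather than distortion. The function name is hypothetical.

```python
import random


def quantize(x, bits, dither=True, rng=random):
    """Quantize a float sample in [-1, 1) to a signed `bits`-bit integer code."""
    scale = 2 ** (bits - 1)
    value = x * scale
    if dither:
        # TPDF dither: sum of two independent uniform +/-0.5 LSB sources,
        # giving triangular noise spanning +/-1 LSB
        value += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    code = round(value)
    return max(-scale, min(scale - 1, code))  # clip to the representable range


# An input of 0.3 LSB at 8 bits: smaller than one quantization step
rng = random.Random(0)
tiny = 0.3 / 2 ** 7
plain = [quantize(tiny, 8, dither=False) for _ in range(20000)]
dithered = [quantize(tiny, 8, rng=rng) for _ in range(20000)]
plain_mean = sum(plain) / len(plain)           # 0.0: the signal is gone
dithered_mean = sum(dithered) / len(dithered)  # close to 0.3
```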
Audio is commonly represented in a format called pulse code modulation (PCM), where the amplitude of the sound is recorded using an integer or a floating point number (a "sample") anywhere from 12,000 to 192,000 times a second. Each sample is usually linearly proportional to amplitude rather than logarithmically.
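To see the linear PCM representation directly, here is a minimal round trip through Python's standard-library `wave` module, assuming mono 16-bit samples, where each sample is a signed integer proportional to amplitude. The helper names and the 32767 scaling convention are choices made for this sketch; converters differ slightly in how they map floats to integers.

```python
import io
import struct
import wave


def write_pcm16(samples, rate):
    """Encode float samples in [-1, 1] as a mono 16-bit linear PCM WAV file (bytes)."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))  # little-endian signed shorts
    return buf.getvalue()


def read_pcm16(data):
    """Decode a mono 16-bit WAV file back to float samples and its sample rate."""
    with wave.open(io.BytesIO(data), "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = r.getframerate()
        raw = r.readframes(r.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [i / 32767 for i in ints], rate


wav_bytes = write_pcm16([0.0, 0.5, -0.5, 1.0], 48000)
decoded, rate = read_pcm16(wav_bytes)
```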
I am unaware of any audio program which internally represents audio in dB. Waveform displays default to linear scales, though logarithmic scaling is usually available. Just about the only place where logarithmic scales are the default is frequency domain graphical representations like spectrograms.
Variations in pressure, which form the basis of sound, are linear, and thus are measured in a linear unit (pascals, Pa). These (or a similar unit derived from electrical changes associated with pressure changes near a mic) are what are used for sound capture. It's awkward because we're dealing with a range from ~20 micropascals to 100,000,000+ micropascals in the world.
Rescaling to dB, a logarithmic unit, both better mirrors how humans hear and brings the numbers under control: 0 to 135 dB is nicer.
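The conversion itself is a one-liner. A minimal sketch, using the standard 20 µPa reference pressure for sound pressure level in air, so dB SPL = 20 * log10(p / p_ref); the function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL in air: 20 micropascals


def pa_to_db_spl(pressure_pa):
    """Sound pressure level in dB for an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)


def db_spl_to_pa(level_db):
    """Inverse conversion: dB SPL back to pascals."""
    return P_REF * 10 ** (level_db / 20)


threshold = pa_to_db_spl(20e-6)  # 0 dB: the quiet end of the range above
loud = pa_to_db_spl(100.0)       # 100 Pa = 100,000,000 µPa, about 134 dB
```

Because the scale is 20 * log10 of a pressure (amplitude) ratio, every factor of 10 in pressure adds 20 dB; by the same rule, a quiet passage at 10% of the loud parts' amplitude sits about 20 dB below them.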
But at least for me, I like capturing and storing in a Pascal-ish unit, because I'd rather capture pressure linearly and convert to human-interpretable log units where needed. If nothing else, because things like waveform addition are a bit more straightforward mentally on linear scales. Folks have made other decisions in some formats, but I've never found it compelling.