Hacker News new | past | comments | ask | show | jobs | submit login

Why do audio formats use a linear scale, instead of exponential (like our ears seems to use)?



I don't know what audio formats use, maybe some of them do treat volume logarithmically, although I think this only makes sense after transforming to the frequency domain, otherwise your time signal goes through 0 all the time, which is not ideal when using logarithms.

For the raw sample data it's also kind of inevitable to use a linear scale when you want different sounds to add nicely, if you use a logarithmic scale then adding two sounds would distort them, which isn't ideal. When using a linear scale you can hide the quantization noise pretty easily with dither which just sounds like you're adding white noise.


I'm not sure what you are referring to with audio "formats," but audio programs work in dB, which is logarithmic.


Audio is commonly represented in a format called pulse code modulation (PCM), where the amplitude of the sound is recorded using an integer or a floating point number (a "sample") anywhere from 12,000 to 192,000 times a second. Each sample is usually linearly proportional to amplitude rather than logarithmically.

I am unaware of any audio program which internally represents audio in dB. Waveform displays default to linear scales, though logarithmic scaling is usually available. Just about the only place where logarithmic scales are the default is frequency domain graphical representations like spectrograms.


Actually 32bit float is such a logarithmic format.


Variations in pressure, which form the basis of sound, are linear, and thus, are measured in a linear unit (Pascal, Pa ). These (or a similar unit derived from electrical changes associated with pressure changes near a mic) are what are used for sound capture. It's awkward because we're dealing with a range from ~20 Micropascal to 100,000,000+ Micropascal in the world.

Rescaling to dB, a logarithmic unit, both better mirrors how humans hear, and brings the numbers under control, 0-135dB is nicer.

But at least for me, I like capturing and storing in a Pascal-ish unit, because I'd rather capture pressure linearly and convert to human-interpretable log units where needed. If nothing else, because things like waveform addition are a bit more straightforward mentally on linear scales. Folks have made other decisions in some formats, but I've never found it compelling.


Some audio formats do use a non-linear scale, e.g. https://en.wikipedia.org/wiki/%CE%9C-law_algorithm




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: