
For human speech:

o The tall vertical lines reflect "plosives": sudden releases of sound energy, often at the beginning of words, produced by closing the mouth or airway and then opening it, as in the first letter of "put" or "tea"

o The high frequencies come from "fricatives", like the first letter of "see" or "free", where air is pushed through the teeth or between nearly closed lips

o The lower frequencies are where most of the recognizable speech content is. Articulation (moving the tongue, lips and teeth) changes the resonant frequencies of the mouth and throat, and those resonances, called "formants", show up as bright, mostly horizontal bands in the lower frequencies. Specifically, the speech content is carried by how the formants change over time.
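To check these features numerically rather than by eye, here is a minimal pure-Python sketch (no NumPy, so it runs anywhere) that computes a power spectrogram and two crude per-frame measures: a sudden jump in total energy, as at a plosive release, and the share of energy above 2 kHz, which is high for fricatives and low for voiced sounds. The 8 kHz sample rate, the 2 kHz cutoff, the 20 dB jump threshold and the synthetic `demo_signal` are all assumptions chosen for illustration, not measurements of real speech. Formant tracking usually needs more than this (linear predictive coding is a common approach), so it is left out.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(signal, frame=256, hop=128):
    """Power spectrogram: for each Hann-windowed frame, |X[k]|^2 for k = 0..frame/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = fft([s * w for s, w in zip(signal[start:start + frame], window)])
        frames.append([abs(c) ** 2 for c in spec[: frame // 2 + 1]])
    return frames


def frame_features(frames, sample_rate, cutoff_hz=2000.0):
    """Per frame: (total energy, fraction of that energy at or above cutoff_hz)."""
    bin_hz = sample_rate / (2 * (len(frames[0]) - 1))
    feats = []
    for power in frames:
        total = sum(power) + 1e-12  # avoid dividing by zero in digital silence
        high = sum(p for k, p in enumerate(power) if k * bin_hz >= cutoff_hz)
        feats.append((total, high / total))
    return feats


def onsets(feats, ratio=100.0):
    """Frames whose energy jumps by more than `ratio` (20 dB) over the previous frame."""
    return [i for i in range(1, len(feats)) if feats[i][0] > ratio * feats[i - 1][0]]


def demo_signal(sr=8000, seed=0):
    """Hypothetical stand-ins: silence, noise burst, harmonic 'vowel', hissy 'fricative'."""
    rng = random.Random(seed)
    silence = [0.0] * 800                              # 100 ms closure
    burst = [rng.gauss(0.0, 0.5) for _ in range(160)]  # 20 ms broadband release
    partials = [(150, 1.0), (300, 0.7), (450, 0.5), (600, 0.35), (750, 0.25)]
    vowel = [sum(a * math.sin(2 * math.pi * f * n / sr) for f, a in partials)
             for n in range(1600)]                     # 200 ms, all energy below 1 kHz
    noise = [rng.gauss(0.0, 0.3) for _ in range(1601)]
    # First difference of white noise tilts its energy toward high frequencies.
    fricative = [noise[n + 1] - noise[n] for n in range(1600)]
    return silence + burst + vowel + fricative


if __name__ == "__main__":
    feats = frame_features(spectrogram(demo_signal()), 8000)
    print("onset frames:", onsets(feats))
    for i, (total, high) in enumerate(feats):
        print(f"frame {i:2d}  energy {total:10.3g}  share above 2 kHz {high:.2f}")
```

On this synthetic input the only onset flagged is the frame where the burst starts after the silence, the harmonic "vowel" frames put almost nothing above 2 kHz, and the differenced-noise "fricative" frames put most of their energy there. Real recordings will need tuned thresholds.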

Noise may show up in various ways depending on its source. A background hum with a fixed frequency spectrum shows up as one or more horizontal bands running across the entire spectrogram. High-frequency noise shows up as extra energy in the higher frequencies, where clean speech has relatively little energy (only the fricatives put much there).
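One way to turn "a horizontal band across the entire spectrogram" into a test is to look at each frequency bin's quietest frame: speech switches on and off, so most bins drop to the noise floor at some point, while a steady hum never does. The sketch below assumes a hypothetical 60 Hz hum with a 180 Hz harmonic, a 500 Hz tone gated on and off every 250 ms standing in for speech, and a 100x (20 dB) margin over the typical per-bin minimum. These values and the function names are illustrative choices, not a standard method. It repeats the small FFT helper so it runs on its own.

```python
import cmath
import math
import random
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(signal, frame=512, hop=256):
    """Power spectrogram: for each Hann-windowed frame, |X[k]|^2 for k = 0..frame/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = fft([s * w for s, w in zip(signal[start:start + frame], window)])
        frames.append([abs(c) ** 2 for c in spec[: frame // 2 + 1]])
    return frames


def persistent_bands(frames, sample_rate, factor=100.0):
    """Frequencies (Hz) whose power stays high in every frame.

    For each bin, take its minimum power over time. Intermittent sounds fall
    back to the noise floor somewhere, so their minimum is low; a steady hum
    keeps its minimum far above the median of all the per-bin minimums.
    """
    bin_hz = sample_rate / (2 * (len(frames[0]) - 1))
    floor = [min(power[k] for power in frames) for k in range(len(frames[0]))]
    typical = max(statistics.median(floor), 1e-20)  # guard against pure digital silence
    return [k * bin_hz for k, p in enumerate(floor) if p > factor * typical]


def demo_signal(sr=8000, seconds=2.0, seed=1):
    """Hypothetical input: 60 Hz + 180 Hz hum, a gated 500 Hz tone, light white noise."""
    rng = random.Random(seed)
    out = []
    for n in range(int(sr * seconds)):
        t = n / sr
        hum = 0.05 * math.sin(2 * math.pi * 60 * t) + 0.02 * math.sin(2 * math.pi * 180 * t)
        # The "speech" tone is on for 250 ms, then off for 250 ms, and so on.
        gated = 0.5 * math.sin(2 * math.pi * 500 * t) if int(t / 0.25) % 2 == 0 else 0.0
        out.append(hum + gated + rng.gauss(0.0, 0.01))
    return out


if __name__ == "__main__":
    print(persistent_bands(spectrogram(demo_signal()), 8000))
```

Taking the strict minimum is deliberately crude: a very short recording with no pauses, or a momentary dropout in the hum, can mislead it. A low percentile over time would be more robust.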




Thanks for sharing this! I didn’t know about these terms before. Have you ever considered writing a blog post or tutorial on reading human speech in spectrograms? This is much more digestible than most of what’s out there.



