Vietnamese and Cantonese are tonal languages, each commonly described as having six phonemic tones [1]. A linear time-frequency spectrogram is very useful for analyzing this kind of human speech, not only birdsong. It would also be interesting to apply non-linear time-frequency analysis to this domain.
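
To make the spectrogram point concrete, here is a rough sketch (not taken from any particular toolkit) of how a linear time-frequency spectrogram can expose a tone contour. Everything in it is an assumption for illustration: the synthetic "rising tone" is a pure sine gliding from 120 Hz to 220 Hz rather than real speech, and the helper name `stft_magnitude`, the sample rate, the frame length, and the hop size are all made up.

```python
import numpy as np

def stft_magnitude(signal, sample_rate, frame_len=1024, hop=256):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Hypothetical helper: frame_len and hop are illustrative defaults.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Time stamp of each frame, taken at the frame centre.
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, spectrum

# Synthetic stand-in for a rising tone (assumed values, not measured speech).
sample_rate = 8000
duration = 0.5
f_start, f_end = 120.0, 220.0
t = np.arange(int(sample_rate * duration)) / sample_rate
# Phase is the integral of the linearly increasing instantaneous frequency.
phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) / duration * t**2)
signal = np.sin(phase)

times, freqs, spectrum = stft_magnitude(signal, sample_rate)

# Crude pitch contour: the strongest frequency bin in each frame.
contour = freqs[np.argmax(spectrum, axis=1)]
for time_s, freq_hz in zip(times, contour):
    print(f"{time_s:.3f} s  {freq_hz:.1f} Hz")
```

On real recordings the strongest bin is often a harmonic rather than the fundamental, so a proper pitch tracker would be needed; the sketch only shows that the rise of the tone is visible in the time-frequency plane.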
[1] https://en.wikipedia.org/wiki/Tone_(linguistics)