The role of size normalization in vowel recognition and speaker identification

Open Access

Author(s) - Roy D. Patterson, Toshio Irino

Publication year - 2013

Publication title - Proceedings of Meetings on Acoustics

Language(s) - English

Resource type - Conference proceedings

ISSN - 1939-800X

DOI - 10.1121/1.4798776

Subject(s) - vocal tract, formant, speech recognition, vowel, normalization (sociology), loudness, computer science, speaker recognition, acoustics, physics, computer vision, sociology, anthropology

Speech sounds carry size information because the vocal tract and the vocal cords both grow as a child develops into an adult. Specifically, average glottal pulse rate and mean formant frequency decrease as speaker size increases. Nevertheless, human speech recognition is effectively size invariant across the full range of sizes in the normal population of speakers and well beyond. Listeners can also discriminate speaker size with great accuracy; indeed, with greater accuracy than they can discriminate the loudness of sound or the brightness of light. The paper describes a model of how the central auditory system transforms the auditory spectrum of a vowel sound into our perception of who is speaking and what they are saying. The model suggests that the system combines information about vocal resonator size with a small amount of contextual information to determine what the person is saying (vowel type) and how long their vocal tract is. Then it uses the glottal period informati...
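The abstract's claim that formant frequencies fall as the vocal tract grows can be illustrated with a minimal sketch. It is not the authors' model. It assumes the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) c / (4 L). The function name, the speed of sound, and the two tract lengths are illustrative assumptions and are not taken from the paper.

```python
# Assumption: uniform-tube (quarter-wavelength) approximation of a neutral vowel.
# This is a textbook simplification, not the model described in the paper.

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed approximate value for warm, humid air


def tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a uniform tube
    closed at one end and open at the other: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def formant_ratios(formants):
    """Return each formant divided by the first one (a size-independent pattern)."""
    return [f / formants[0] for f in formants]


# Hypothetical tract lengths chosen only for illustration.
adult = tube_formants(17.5)
child = tube_formants(12.5)

print("adult formants (Hz):", adult)
print("child formants (Hz):", child)
print("adult pattern:", formant_ratios(adult))
print("child pattern:", formant_ratios(child))
```

In this simplified picture, shortening the tube multiplies every formant by the same factor, so the absolute frequencies carry the size information while the ratios between formants stay fixed. That is one way to see how vowel type could in principle be separated from speaker size, although the paper's own account also draws on contextual information.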
