The role of size normalization in vowel recognition and speaker identification
Author(s) - Roy D. Patterson, Toshio Irino
Publication year - 2013
Publication title - Proceedings of Meetings on Acoustics
Language(s) - English
Resource type - Conference proceedings
ISSN - 1939-800X
DOI - 10.1121/1.4798776
Subject(s) - vocal tract , formant , speech recognition , vowel , normalization (sociology) , loudness , computer science , speaker recognition , acoustics , physics , computer vision , sociology , anthropology
There is size information in speech sounds because the vocal tract and the vocal cords both grow as a child develops into an adult. Specifically, average glottal pulse rate and mean formant frequency decrease as speaker size increases. Nevertheless, human speech recognition is effectively size invariant across the full range of sizes in the normal population of speakers and well beyond. It is also the case that listeners can discriminate speaker size with great accuracy; indeed, with greater accuracy than they can discriminate the loudness of sound or the brightness of light. The paper describes a model of how the central auditory system transforms the auditory spectrum of a vowel sound into our perception of who is speaking and what they are saying. The model suggests that the system combines information about vocal resonator size with a small amount of contextual information to determine what the person is saying (vowel type) and how long their vocal tract is. Then it uses the glottal period informati...
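
The relationship between vocal tract length and formant frequency described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes the textbook approximation of the vocal tract as a uniform tube closed at one end, and the speed of sound, the tract lengths, and the function names (`tube_formants`, `log2_shifts`) are illustrative assumptions. Under that approximation, a change in tract length scales every formant by the same factor, which appears as a single constant shift on a log-frequency axis.

```python
import math

# Assumed value: approximate speed of sound in warm, moist air (cm/s).
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_formants(vtl_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    This is a simplification, not the model described in the paper.
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * vtl_cm)
        for n in range(1, n_formants + 1)
    ]


def log2_shifts(formants_a, formants_b):
    """Per-formant shift from a to b, in octaves (log2 of the frequency ratio)."""
    return [math.log2(fb / fa) for fa, fb in zip(formants_a, formants_b)]


# Hypothetical tract lengths chosen only for illustration.
longer_tract = tube_formants(17.5)
shorter_tract = tube_formants(10.0)

# Every formant moves by the same number of octaves, so the spectral
# pattern keeps its shape and only its position changes with size.
shifts = log2_shifts(longer_tract, shorter_tract)
print(longer_tract, shorter_tract, shifts)
```

In this simplified picture, the formant pattern keeps its shape while its position on the log-frequency axis tracks tract length, which is one way to see how vowel type and speaker size could be separated.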