Speaker, accent, and language identification using multilingual phone strings
Author(s) - Tanja Schultz, Qin Jin, Kornel Laskowski, Alicia Tribble, Alex Waibel
Publication year - 2002
Publication title - Repository KITopen (Karlsruhe Institute of Technology)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/1289189.1289271
Subject(s) - computer science , speech recognition , phone , stress (linguistics) , natural language processing , language identification , identification (biology) , artificial intelligence , robustness (evolution) , hidden markov model , first language , speaker recognition , speaker identification , natural language , linguistics , philosophy , botany , biochemistry , chemistry , gene , biology
This paper investigates the identification of non-verbal cues from speech, namely speaker, accent, and language. For these tasks, we develop a joint framework that uses phone strings, derived from phone recognizers for different languages, as intermediate features and that makes classification decisions based on their perplexities. Our evaluation on variable-distance data demonstrated the robustness of the approach, achieving a 96.7% speaker identification rate. Furthermore, we achieved 93.7% accuracy in discriminating between native and non-native speakers by accent. For language identification, we obtained 95.5% classification accuracy for utterances 5 seconds in length and up to 99.89% on longer utterances. The experiments were carried out in a language-independent manner, on languages not presented to the phone recognizers during training, suggesting that the recognizers could be successfully ported to non-verbal cue classification in other languages.
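The perplexity-based decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single phone stream (the paper combines several language-specific phone recognizers), a bigram phone model with add-one smoothing, and made-up toy phone strings. The names `train_bigram`, `perplexity`, `classify`, and the class labels "A" and "B" are hypothetical.

```python
import math
from collections import Counter

def train_bigram(strings, vocab):
    """Count phone bigrams for one class (assumed model: bigram, add-one smoothing)."""
    bigrams, contexts = Counter(), Counter()
    for phones in strings:
        seq = ["<s>"] + list(phones) + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    # Possible next symbols: every phone in the vocabulary plus the end marker.
    return bigrams, contexts, len(vocab) + 1

def perplexity(model, phones):
    """Perplexity of a phone string under one class model."""
    bigrams, contexts, n_symbols = model
    seq = ["<s>"] + list(phones) + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + n_symbols)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(seq) - 1))

def classify(models, phones):
    """Pick the class whose phone model gives the lowest perplexity."""
    return min(models, key=lambda label: perplexity(models[label], phones))

# Hypothetical toy data: phone strings as a recognizer might emit them per class.
vocab = {"a", "b", "k", "s", "t"}
training = {
    "A": [["k", "a", "t"], ["k", "a", "s"], ["t", "a", "k"]],
    "B": [["s", "t", "a"], ["s", "t", "b"], ["b", "s", "t"]],
}
models = {label: train_bigram(strings, vocab) for label, strings in training.items()}

print(classify(models, ["k", "a", "t"]))
print(classify(models, ["s", "t", "a"]))
```

In this sketch the class labels could stand for speakers, accents, or languages alike, which is the sense in which the framework is joint: only the training partition of the phone strings changes, not the decision rule.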