Time-Scale Feature Extractions for Emotional Speech Characterization
Author(s) -
Mohamed Chétouani,
Ammar Mahdhaoui,
Fabien Ringeval
Publication year - 2009
Publication title -
Cognitive Computation
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.86
H-Index - 52
eISSN - 1866-9964
pISSN - 1866-9956
DOI - 10.1007/s12559-009-9016-9
Subject(s) - weighting , computer science , speech recognition , feature (linguistics) , relevance (law) , scale (ratio) , artificial intelligence , term (time) , natural language processing , feature extraction , prosody , pattern recognition (psychology) , linguistics , philosophy , medicine , physics , quantum mechanics , political science , law , radiology
Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-world application: detection of Italian motherese in authentic and longitudinal parent–infant interaction at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosody parameters (fundamental frequency and energy), provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated by the use of a phonetic-specific characterization process. This strategy is motivated by the fact that there are variations across emotional states at the phoneme level. A time-scale based on both vowels and consonants is proposed, and it provides a relevant and discriminant feature space for acted emotion recognition. The experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking into account phoneme dependency (vowels/consonants) for emotional speech characterization.
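The duration-weighted fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact combination rule, so this assumes the simplest reading: each speech segment contributes its local a posteriori class probabilities, weighted by its share of the total duration, and the class with the highest fused score is chosen. The function names `fuse_posteriors` and `decide` are hypothetical and do not come from the article.

```python
from typing import List, Sequence


def fuse_posteriors(
    posteriors: Sequence[Sequence[float]],
    durations: Sequence[float],
) -> List[float]:
    """Combine per-segment class posteriors with duration-based weights.

    Assumption: the weight of a segment is its duration divided by the
    total duration of all segments (a linear, normalized weighting).
    """
    if not posteriors or len(posteriors) != len(durations):
        raise ValueError("need one duration per segment, and at least one segment")
    total = float(sum(durations))
    if total <= 0:
        raise ValueError("total duration must be positive")

    n_classes = len(posteriors[0])
    fused = [0.0] * n_classes
    for probs, duration in zip(posteriors, durations):
        if len(probs) != n_classes:
            raise ValueError("all segments must score the same classes")
        weight = duration / total
        for k in range(n_classes):
            fused[k] += weight * probs[k]
    return fused


def decide(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])
```

For example, with two segments scored as `[0.9, 0.1]` and `[0.2, 0.8]` and durations of 1 and 3 (in any consistent unit), the fused scores are approximately `[0.375, 0.625]`, so the longer segment determines the decision.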