Open Access

Time-Scale Feature Extractions for Emotional Speech Characterization

Author(s) - Mohamed Chétouani, Ammar Mahdhaoui, Fabien Ringeval

Publication year - 2009

Publication title - Cognitive Computation

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.86

H-Index - 52

eISSN - 1866-9964

pISSN - 1866-9956

DOI - 10.1007/s12559-009-9016-9

Subject(s) - weighting, computer science, speech recognition, feature (linguistics), relevance (law), scale (ratio), artificial intelligence, term (time), natural language processing, feature extraction, prosody, pattern recognition (psychology), linguistics, philosophy, medicine, physics, quantum mechanics, political science, law, radiology

Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-world application: detection of Italian motherese in authentic and longitudinal parent–infant interaction at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosody parameters (fundamental frequency and energy), provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated through a phonetic-specific characterization process. This strategy is motivated by the fact that there are variations across emotional states at the phoneme level. A time-scale based on both vowels and consonants is proposed, and it provides a relevant and discriminant feature space for acted emotion recognition. The experimental results on two different databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking into account phoneme dependency (vowels/consonants) for emotional speech characterization.
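The duration-weighted fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the simplest reading: each speech segment yields a vector of a posteriori class probabilities, and the utterance-level score is their average weighted by normalized segment duration. The function names `fuse_segment_posteriors` and `decide`, and the linear weighting itself, are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Sequence


def fuse_segment_posteriors(
    posteriors: Sequence[Sequence[float]],
    durations: Sequence[float],
) -> List[float]:
    """Combine per-segment class posteriors into one utterance-level score.

    Assumption (not from the paper): each segment's weight is its duration
    divided by the total duration, and scores are combined linearly.
    """
    if not posteriors or len(posteriors) != len(durations):
        raise ValueError("need one duration per segment, and at least one segment")
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")

    n_classes = len(posteriors[0])
    fused = [0.0] * n_classes
    for segment_probs, duration in zip(posteriors, durations):
        if len(segment_probs) != n_classes:
            raise ValueError("all segments must score the same classes")
        weight = duration / total  # longer segments contribute more
        for k in range(n_classes):
            fused[k] += weight * segment_probs[k]
    return fused


def decide(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])
```

For example, with two segments scored over two hypothetical classes, `fuse_segment_posteriors([[0.9, 0.1], [0.2, 0.8]], [0.1, 0.3])` gives the second, longer segment three times the weight of the first, so `decide` selects class index 1 even though the first segment strongly favors class 0.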
