Open Access
Fuzzy-based voiced-unvoiced segmentation for emotion recognition using spectral feature fusions
Author(s) - Yusnita Mohd Ali, Alhan Farhanah Abd Rahim, Emilia Noorsal, Zuhaila Mat Yassin, Nor Fadzilah Mokhtar, Mohamad Helmy Ramlan
Publication year - 2020
Publication title - Indonesian Journal of Electrical Engineering and Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.241
H-Index - 17
eISSN - 2502-4760
pISSN - 2502-4752
DOI - 10.11591/ijeecs.v19.i1.pp196-206
Subject(s) - mel frequency cepstrum , formant , segmentation , speech recognition , pattern recognition (psychology) , computer science , feature (linguistics) , artificial intelligence , fusion , energy (signal processing) , cepstrum , feature extraction , mathematics , vowel , statistics , linguistics , philosophy
Despite the rapid growth of studies on automatic emotion recognition systems (ERS) using various feature extraction techniques and classifiers, few have sought to improve the system through pre-processing techniques. This paper proposes a smart pre-processing stage that uses a fuzzy logic inference system (FIS) based on the Mamdani engine and two simple time-based features, zero-crossing rate (ZCR) and short-time energy (STE), to first identify each frame as voiced (V) or unvoiced (UV). Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) were tested with K-nearest neighbours (KNN) classifiers to evaluate the proposed FIS V-UV segmentation. We also introduce two feature fusions, of MFCC and of LPC with formants, to obtain better performance. In experiments, the proposed system surpassed the conventional ERS, with accuracy gains ranging from 3.7% to 9.0%. The fusion of LPC and formants, named SFF LPC-fmnt, showed promising results, achieving accuracy rates between 1.3% and 5.1% higher than its baseline features in classifying neutral, angry, happy and sad emotions. The best accuracy rates, 79.1% for male speakers and 79.9% for female speakers, were obtained with the SFF MFCC-fmnt fusion technique.
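
To make the voiced/unvoiced idea concrete, the following is a minimal Python sketch of frame-wise ZCR and STE followed by a fuzzy-style decision. It is not the authors' Mamdani FIS: the abstract does not give the membership functions, rule base, or defuzzification method, so the linear `ramp` memberships, the min/max rule combination, the threshold values, and the frame sizes (400-sample frames with a 160-sample hop) are all assumptions chosen for illustration. The function names are hypothetical as well.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def ramp(v, lo, hi):
    """Linear membership rising from 0 at lo to 1 at hi (assumed shape)."""
    return np.clip((v - lo) / (hi - lo), 0.0, 1.0)

def voiced_mask(x, frame_len=400, hop=160):
    """Return a boolean array: True where a frame is judged voiced.

    All numeric thresholds below are illustrative assumptions,
    not values taken from the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    zcr = zero_crossing_rate(frames)
    ste = short_time_energy(frames)
    ste_norm = ste / (ste.max() + 1e-12)        # scale energy to [0, 1]

    high_energy = ramp(ste_norm, 0.01, 0.20)    # degree of "STE is high"
    low_zcr = 1.0 - ramp(zcr, 0.05, 0.30)       # degree of "ZCR is low"

    # Rule 1: IF STE is high AND ZCR is low THEN voiced   (AND = min)
    voiced = np.minimum(high_energy, low_zcr)
    # Rule 2: IF STE is low OR ZCR is high THEN unvoiced  (OR = max)
    unvoiced = np.maximum(1.0 - high_energy, 1.0 - low_zcr)

    return voiced > unvoiced
```

Only the frames flagged as voiced would then be passed on to feature extraction such as MFCC or LPC. A full Mamdani engine would additionally aggregate the rule outputs over output membership functions and defuzzify them, which this sketch replaces with a direct comparison of the two rule strengths.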
