Open Access
Extraction of Voiced Regions of Speech from Emotional Speech Signals Using Wavelet-Pitch Method
Author(s) - Lakshmi Srinivas Dendukuri, Shaik Jakeer Hussain
Publication year - 2021
Publication title - Periodica Polytechnica Electrical Engineering and Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.158
H-Index - 13
eISSN - 2064-5279
pISSN - 2064-5260
DOI - 10.3311/ppee.15373
Subject(s) - speech recognition, wavelet, computer science, speech enhancement, pitch detection algorithm, speech processing, mathematics, energy (signal processing), pattern recognition (psychology), artificial intelligence, noise reduction, statistics
Extraction of voiced regions of speech is an active topic in the speech domain, with uses in a variety of speech applications. In emotional speech signals, most of the information is carried by the voiced regions. In this work, voiced regions of speech are extracted from emotional speech signals using a wavelet-pitch method. The speech signals are first downsampled, and the Daubechies wavelet (Db4) is then applied to the speech frames. The autocorrelation function is computed on the approximation coefficients of each speech frame to obtain the corresponding pitch value. A local threshold is defined on the obtained pitch values to extract voiced regions. The thresholds differ for male and female speakers, since male pitch values are generally lower than female pitch values. The obtained pitch values are scaled down and compared with the thresholds to extract the voiced frames. A transition frame between voiced and unvoiced frames is also extracted if the frame preceding it is voiced, so that the emotional content is preserved in the extracted frames. The extracted frames are then reshaped to form the desired emotional speech signal. Signal-to-Noise Ratio (SNR), Normalized Root Mean Square Error (NRMSE), and statistical parameters are used as evaluation metrics. The proposed method yields better SNR and NRMSE values than the zero crossing-energy and residual-signal-based methods for voiced region extraction. Within the wavelet-pitch method, the Db4 wavelet gives better results than the Haar and Db2 wavelets in extracting voiced regions from emotional speech signals.
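
The pipeline described in the abstract (wavelet approximation per frame, autocorrelation pitch, threshold, transition-frame rule) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: "Db4" is taken to be the 8-tap Daubechies filter as tabulated under that name in PyWavelets, frames are extended periodically, the pitch search band and the `min_peak` periodicity check are hypothetical, and the function names and the threshold value are placeholders. The abstract's pitch scaling and gender-specific thresholds are not reproduced, because their values are not given.

```python
import math

# Db4 scaling (low-pass) filter, 8 taps. Assumption: the paper's "Db4"
# matches the filter tabulated as "db4" in PyWavelets.
DB4_SCALING = [
    0.23037781330885523, 0.7148465705525415,
    0.6308807679295904, -0.02798376941698385,
    -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]


def approximation(frame):
    """One-level DWT approximation coefficients with periodic extension.

    Alignment may differ by a shift from library implementations.
    """
    n = len(frame)
    out = []
    for start in range(0, n - n % 2, 2):  # low-pass, keep every 2nd output
        acc = 0.0
        for k, h in enumerate(DB4_SCALING):
            acc += h * frame[(start + k) % n]
        out.append(acc)
    return out


def autocorr_pitch(x, fs, fmin=60.0, fmax=400.0, min_peak=0.5):
    """Pitch in Hz from the autocorrelation peak, or 0.0 if none is clear.

    fmin, fmax and min_peak are illustrative values, not from the paper.
    """
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0
    lo = max(1, int(fs / fmax))
    hi = min(len(x) - 1, int(fs / fmin))
    best_lag, best_val = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0 or best_val / energy < min_peak:
        return 0.0
    return fs / best_lag


def pitch_track(signal, fs, frame_len):
    """Per-frame pitch computed on each frame's approximation coefficients."""
    pitches = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        approx = approximation(signal[start:start + frame_len])
        # One DWT level halves the effective sampling rate.
        pitches.append(autocorr_pitch(approx, fs / 2.0))
    return pitches


def select_frames(pitches, threshold_hz):
    """Keep voiced frames, plus an unvoiced frame that directly follows one."""
    keep, prev_voiced = [], False
    for i, p in enumerate(pitches):
        voiced = p > threshold_hz
        if voiced or prev_voiced:
            keep.append(i)
        prev_voiced = voiced
    return keep


if __name__ == "__main__":
    fs, frame_len = 8000, 240  # 30 ms frames; both values are placeholders
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(2 * frame_len)]
    silence = [0.0] * (2 * frame_len)
    pitches = pitch_track(silence + tone + silence, fs, frame_len)
    print(pitches)
    print(select_frames(pitches, threshold_hz=100.0))
```

The demo feeds two silent frames, two frames of a synthetic 200 Hz tone, and two more silent frames through the sketch. The selection step keeps the two tonal frames and the single silent frame that follows them, which mirrors the transition-frame rule in the abstract. Real speech would need the downsampling step and speaker-dependent thresholds that the abstract mentions.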
