Extraction of Voiced Regions of Speech from Emotional Speech Signals Using Wavelet-Pitch Method

Open Access

Author(s) - Lakshmi Srinivas Dendukuri, Shaik Jakeer Hussain

Publication year - 2021

Publication title - Periodica Polytechnica Electrical Engineering and Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.158

H-Index - 13

eISSN - 2064-5279

pISSN - 2064-5260

DOI - 10.3311/ppee.15373

Subject(s) - speech recognition, wavelet, computer science, speech enhancement, pitch detection algorithm, speech processing, mathematics, energy (signal processing), pattern recognition (psychology), artificial intelligence, noise reduction, statistics

Extracting voiced regions of speech is an active topic in speech processing, with uses across many speech applications. In emotional speech signals, most of the information is carried by the voiced regions. In this work, voiced regions are extracted from emotional speech signals using a wavelet-pitch method. The speech signals are downsampled, and the Daubechies wavelet (Db4) is applied to each speech frame. The autocorrelation function is computed on the approximation coefficients of each frame to obtain the corresponding pitch value. A local threshold on the pitch values is used to extract the voiced regions. The thresholds differ for male and female speakers because male pitch values are generally lower than female pitch values. The pitch values are scaled down and compared with the thresholds to identify the voiced frames. A transition frame between voiced and unvoiced frames is also extracted if the previous frame is voiced, to preserve the emotional content of the extracted frames. The extracted frames are then reshaped to form the desired emotional speech signal. Signal-to-Noise Ratio (SNR), Normalized Root Mean Square Error (NRMSE), and statistical parameters are used as evaluation metrics. The proposed method yields better SNR and NRMSE values than the zero crossing-energy and residual signal based methods for voiced region extraction. Within the wavelet-pitch method, the Db4 wavelet gives better results than the Haar and Db2 wavelets for extracting voiced regions from emotional speech signals.
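The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions. The function names are hypothetical. The Db4 taps are the commonly tabulated scaling-filter values. The 50-400 Hz search range and any threshold band passed to `voiced_flags` are placeholders. The pitch scaling step is omitted because the abstract does not give the factor.

```python
import math

# Db4 scaling (low-pass) filter, 8 taps. Assumed to match standard wavelet tables.
DB4_LO = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
          -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
          0.032883011666982945, -0.010597401784997278]

def approx_coeffs(frame):
    """One-level approximation: low-pass filter, then keep every second sample.
    No boundary padding is applied, so the output is slightly shorter than len(frame) / 2."""
    taps = len(DB4_LO)
    return [sum(DB4_LO[k] * frame[n + k] for k in range(taps))
            for n in range(0, len(frame) - taps + 1, 2)]

def pitch_autocorr(x, fs, fmin=50.0, fmax=400.0):
    """Pitch in Hz from the autocorrelation peak; fs is the rate of x (after decimation).
    Returns 0.0 for a silent or too-short frame."""
    lo = max(1, int(fs / fmax))
    hi = min(len(x) - 1, int(fs / fmin))
    if hi < lo or not any(x):
        return 0.0
    def acf(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    best_lag = max(range(lo, hi + 1), key=acf)
    return fs / best_lag

def voiced_flags(pitches, low, high):
    """Mark a frame as voiced when its pitch lies in [low, high] (speaker-dependent band)."""
    return [low <= p <= high for p in pitches]

def keep_with_transitions(flags):
    """Indices of voiced frames plus any frame whose previous frame is voiced."""
    return [i for i, v in enumerate(flags) if v or (i > 0 and flags[i - 1])]
```

For example, a 30 ms frame of a 200 Hz tone sampled at 8 kHz gives approximation coefficients at an effective rate of 4 kHz, so `pitch_autocorr(approx_coeffs(frame), 4000.0)` should land close to 200 Hz. The frames selected by `keep_with_transitions` would then be concatenated to form the output signal.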
