Open Access

Power Wavelet Cepstral Coefficients (PWCC): An Accurate Auditory Model-Based Feature Extraction Method for Robust Speaker Recognition

Author(s) - Youssef Zouhir, Mohamed Zarka, Kais Ouni, Lilia El Amraoui

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3576659

Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation

Human performance in Speaker Recognition (SR) still exceeds that of recent machine learning approaches, even in noisy environments. To bridge this gap, researchers study the human auditory system to improve the performance of machine learning algorithms. This paper introduces a novel feature extraction method, named “Power Wavelet Cepstral Coefficients” (PWCC), to enhance SR accuracy. The method is built on the “Normalized Wavelet FilterBank” (NWFB), which uses an “Equivalent Rectangular Bandwidth” rate (ERB-rate) scale, and additionally integrates a “Noise Suppression Module” (NSM). The NWFB imitates the cochlea’s frequency selectivity using “Morlet Wavelet filters” together with the ERB-rate scale. The NSM applies a medium-duration power analysis, an asymmetrical noise-suppression module incorporating a temporal masking component, and a spectral smoothing module to reduce the impact of noise on the signal. To assess the performance of the proposed PWCC method, experiments were conducted using clean speech signals from the TIMIT database, corrupted with various noises from the AURORA dataset. With a “Gaussian Mixture Model-Universal Background Model” (GMM-UBM) classifier, the PWCC method achieved higher SR accuracy in noisy environments than traditional methods such as PNCC and MFCC. Furthermore, PWCC maintained higher precision, recall, and F1-scores than PNCC and MFCC across the noise conditions overall. For instance, with babble noise at 15 dB SNR, PWCC achieved a recognition rate of 92.06%, compared to 75.24% for PNCC and 68.33% for MFCC.
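To make the ERB-rate scale mentioned in the abstract concrete, the following minimal Python sketch places filter center frequencies at equal steps on that scale. It assumes the widely used Glasberg and Moore formula, ERB-rate(f) = 21.4 log10(1 + 0.00437 f) with f in Hz; the abstract does not state which ERB-rate formula the paper uses. The function names, the 100 Hz to 8000 Hz range, and the 32-filter count are hypothetical choices for illustration, not values from the paper, and the sketch does not implement the Morlet wavelet filters or the NSM.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale.

    Assumption: Glasberg and Moore formula; the paper may use a variant.
    """
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate: map an ERB-rate value back to Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_center_frequencies(f_min_hz, f_max_hz, num_filters):
    """Return num_filters center frequencies (Hz), equally spaced in ERB-rate.

    All three parameters are illustrative; none come from the paper.
    """
    if num_filters < 2:
        raise ValueError("num_filters must be at least 2")
    if not 0.0 <= f_min_hz < f_max_hz:
        raise ValueError("require 0 <= f_min_hz < f_max_hz")
    low = hz_to_erb_rate(f_min_hz)
    high = hz_to_erb_rate(f_max_hz)
    step = (high - low) / (num_filters - 1)
    return [erb_rate_to_hz(low + i * step) for i in range(num_filters)]


if __name__ == "__main__":
    # Hypothetical setup: 32 filters up to the Nyquist frequency of 16 kHz audio.
    for fc in erb_center_frequencies(100.0, 8000.0, 32):
        print(f"{fc:8.1f} Hz")
```

Because the mapping is logarithmic, equal ERB-rate steps give center frequencies that are densely packed at low frequencies and progressively wider apart at high frequencies, which is the cochlea-like frequency resolution the filterbank is meant to imitate.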
