Open Access
Robust Speaker Diarization Based on Daubechies Wavelet, Nonlinear Energy Operator and Pyknogram
Author(s) - Sukhvinder Kaur
Publication year - 2019
Publication title - International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.d8535.118419
Subject(s) - speaker diarisation , speech recognition , computer science , cluster analysis , pattern recognition (psychology) , mel frequency cepstrum , artificial intelligence , segmentation , word error rate , speaker recognition , feature extraction
Two common tasks in speech processing are speaker recognition (identifying or verifying a speaker) and speaker diarization (determining "who spoke when"). Motivated by applications in automatic speaker recognition, speaker indexing, word counting, and audio transcription, speaker diarization (SD) has become a significant area of signal processing. The basic design steps of SD are feature extraction, voice activity detection (VAD), segmentation, and clustering. In the proposed system, VAD was performed with the Daubechies-40 discrete wavelet transform (DWT). The DWT was first used to compress, scale, and denoise the audio stream, which was then partitioned into frames of 0.12 s. Next, features were extracted from each frame with a pyknogram based on the nonlinear energy operator (NEO). To measure the similarity between frames, a sliding window was applied with the delta-BIC (Bayesian information criterion) distance metric: a negative output indicates that the adjacent windows belong to the same segment, and a positive output indicates a change point. To refine the segmentation, resegmentation was performed with the information change rate method. Finally, hierarchical clustering grouped the homogeneous segments that correspond to each speaker, and the result was represented graphically as a dendrogram. The performance of the SD system was evaluated with the F-measure and the speaker diarization error rate (SER), and the results were compared with those of a traditional speaker diarization system that uses MFCCs and BIC for segmentation and clustering. The proposed system reduced the SER by 12.3%.
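The NEO mentioned above is usually the discrete Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1]x[n+1]. The following is a minimal sketch, not the author's implementation, of that operator applied to the 0.12 s frames described in the abstract. The function names are hypothetical, and the per-frame mean energy is only a simple NEO summary. A pyknogram goes further, using the operator to estimate per-band instantaneous frequencies, which this sketch does not attempt. A pure tone serves as a check, because for A*cos(omega*n) the operator returns exactly A^2 sin^2(omega) at every sample.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser (nonlinear) energy operator:
    psi[n] = x[n]**2 - x[n-1] * x[n+1], for 1 <= n <= len(x) - 2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def frame_signal(x, sample_rate, frame_sec=0.12):
    """Split x into non-overlapping frames of frame_sec seconds.
    A trailing partial frame is dropped."""
    size = int(round(frame_sec * sample_rate))
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]

def frame_neo_energy(x, sample_rate, frame_sec=0.12):
    """Mean Teager energy of each frame (a per-frame NEO summary, not a full pyknogram)."""
    energies = []
    for frame in frame_signal(x, sample_rate, frame_sec):
        psi = teager_kaiser(frame)
        energies.append(sum(psi) / len(psi))
    return energies

# Hypothetical test signal: a 1 s, 440 Hz tone sampled at 8 kHz.
fs, amp, freq = 8000, 0.5, 440.0
omega = 2 * math.pi * freq / fs                       # digital frequency, rad/sample
tone = [amp * math.cos(omega * n) for n in range(fs)]
energies = frame_neo_energy(tone, fs)                 # 8 frames of 960 samples each
```

Every entry of `energies` should match `amp ** 2 * math.sin(omega) ** 2`. This illustrates that the operator tracks both amplitude and frequency, which is what makes it useful for AM-FM style speech features.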
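The delta-BIC segmentation step can be sketched as follows. The sketch uses the common full-covariance Gaussian formulation, ΔBIC = (N/2)log|Σ| − (N1/2)log|Σ1| − (N2/2)log|Σ2| − λ·(1/2)(d + d(d+1)/2)·log N. It assumes that this formulation, the penalty weight `lam`, the window and step sizes, and the helper names are all illustrative choices, because the abstract does not give them. Under this sign convention a positive value favours two separate models (a likely speaker change), and a negative value favours one model (the same segment), which matches the abstract's description.

```python
import math
import random

def covariance(X, ridge=1e-6):
    """Maximum-likelihood covariance of a list of d-dim vectors, plus a small
    ridge on the diagonal so the log-determinant stays finite."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / n
             + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]

def log_det(A):
    """log|A| of a symmetric positive-definite matrix via Cholesky factorization."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))

def delta_bic(X1, X2, lam=1.0):
    """Delta-BIC of modelling X1 and X2 with two full-covariance Gaussians
    versus one. > 0 suggests a speaker change; < 0 suggests the same segment."""
    n1, n2 = len(X1), len(X2)
    n, d = n1 + n2, len(X1[0])
    gain = 0.5 * (n * log_det(covariance(X1 + X2))
                  - n1 * log_det(covariance(X1))
                  - n2 * log_det(covariance(X2)))
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * math.log(n)
    return gain - penalty

def detect_changes(features, win=100, step=10, lam=1.0):
    """Slide two adjacent windows of `win` vectors over the sequence and return
    every boundary index whose delta-BIC is positive."""
    changes = []
    for start in range(0, len(features) - 2 * win + 1, step):
        mid = start + win
        if delta_bic(features[start:mid], features[mid:mid + win], lam) > 0:
            changes.append(mid)
    return changes

# Synthetic 2-D "features": speaker A for 200 frames, then speaker B for 200 frames.
rng = random.Random(0)
speaker_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
speaker_b = [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(200)]
changes = detect_changes(speaker_a + speaker_b)
```

Several neighbouring boundaries around index 200 will usually fire together. A fuller system would keep only the local maximum of ΔBIC in each such run, and a refinement pass such as the resegmentation step described in the abstract would then adjust it.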
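The abstract does not state the distance or linkage used in the hierarchical clustering step. The sketch below therefore assumes average-linkage agglomerative clustering with Euclidean distance over one mean feature vector per segment. It also assumes a hypothetical stopping `threshold`. The returned `merges` list records each merge and its distance, which is the information a dendrogram plots.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, threshold):
    """Average-linkage agglomerative clustering that stops when the closest pair
    of clusters is farther apart than `threshold`.
    Returns (labels, merges); merges holds (cluster_a, cluster_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for ai, a in enumerate(keys):
            for b in keys[ai + 1:]:
                # Average of all pairwise distances between the two clusters.
                dist = sum(euclidean(points[i], points[j])
                           for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[2]:
                    best = (a, b, dist)
        if best[2] > threshold:
            break
        a, b, dist = best
        clusters[a].extend(clusters.pop(b))   # keep the lower key, absorb the other
        merges.append((a, b, dist))
    labels = [0] * len(points)
    for label, key in enumerate(sorted(clusters)):
        for i in clusters[key]:
            labels[i] = label
    return labels, merges

# Hypothetical per-segment mean features: segments 0, 1, 4 from one speaker, 2, 3 from another.
segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
labels, merges = agglomerative(segments, threshold=1.0)
```

Here the segments collapse into two speaker clusters after three merges. The gap between the last accepted merge distance and the first rejected one is what a dendrogram makes visible when a cut-off is chosen.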