A robust BFCC feature extraction for ASR system
Author(s) - TaWen Kuan, AnChao Tsai, Po-Hsun Sung, Jhing-Fa Wang, Hsien-Shun Kuo
Publication year - 2016
Publication title - Artificial Intelligence Research
Language(s) - English
Resource type - Journals
eISSN - 1927-6982
pISSN - 1927-6974
DOI - 10.5430/air.v5n2p14
Subject(s) - mel frequency cepstrum , spectrogram , speech recognition , feature extraction , cepstrum , computer science , hidden markov model , robustness (evolution) , pattern recognition (psychology) , artificial intelligence , noise (video) , wavelet , biochemistry , chemistry , image (mathematics) , gene
An auditory-based feature extraction algorithm, the Basilar-membrane Frequency-band Cepstral Coefficient (BFCC), is proposed to increase the robustness of automatic speech recognition. Whereas the Mel-Frequency Cepstral Coefficient (MFCC) method is based on a Fourier spectrogram, the proposed BFCC method uses an auditory spectrogram based on a gammachirp wavelet transform, which simulates the auditory response of the human inner ear, to improve noise immunity. A Hidden Markov Model (HMM) is used to evaluate the proposed BFCC, with training and testing conducted on the AURORA-2 corpus using datasets at different Signal-to-Noise Ratios (SNRs). The experimental results indicate that the proposed BFCC improves the speech recognition rate over MFCC, Gammatone Wavelet Cepstral Coefficient (GWCC), and Gammatone Frequency Cepstral Coefficient (GFCC) by 13%, 17%, and 0.5%, respectively, on average over speech samples with SNRs ranging from -5 to 20 dB.
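The general idea in the abstract, replacing the Fourier spectrogram with a gammachirp-based auditory spectrogram before taking cepstral coefficients, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the function names (`erb`, `gammachirp`, `auditory_cepstra`), the filter constants (`order`, `b`, `c`), the logarithmic channel spacing, the frame sizes, and the 8 kHz sample rate are all assumptions chosen for illustration. The filter uses the standard gammachirp impulse response, t^(n-1) exp(-2*pi*b*ERB(fc)*t) cos(2*pi*fc*t + c*ln t).

```python
import numpy as np


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * fc


def gammachirp(fc, fs, duration=0.025, order=4, b=1.019, c=-2.0):
    """Unit-energy gammachirp impulse response centred near fc (Hz).

    order, b and c are illustrative values, not taken from the paper.
    """
    # Start at t = 1/fs so that ln(t) is finite.
    t = np.arange(1, int(round(duration * fs)) + 1) / fs
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    carrier = np.cos(2 * np.pi * fc * t + c * np.log(t))
    g = envelope * carrier
    return g / np.sqrt(np.sum(g ** 2))


def auditory_cepstra(signal, fs, n_channels=16, n_ceps=8,
                     frame_len=0.025, hop=0.010, fmin=100.0, fmax=None):
    """Cepstral features from a gammachirp filterbank (hypothetical pipeline)."""
    if fmax is None:
        fmax = 0.45 * fs  # stay below the Nyquist frequency
    # Assumption: logarithmically spaced centre frequencies.
    centres = np.geomspace(fmin, fmax, n_channels)
    win = int(round(frame_len * fs))
    step = int(round(hop * fs))
    n_frames = 1 + (len(signal) - win) // step

    # Auditory spectrogram: log frame energy of each filterbank channel.
    log_energy = np.empty((n_frames, n_channels))
    for k, fc in enumerate(centres):
        y = np.convolve(signal, gammachirp(fc, fs), mode="full")[:len(signal)]
        for m in range(n_frames):
            seg = y[m * step:m * step + win]
            log_energy[m, k] = np.log(np.sum(seg ** 2) + 1e-10)

    # DCT-II across channels decorrelates the log energies into cepstra.
    n = np.arange(n_channels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_channels)
    return log_energy @ basis.T


# Usage: 0.2 s of a 1 kHz tone sampled at 8 kHz (values chosen for illustration).
fs = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(1600) / fs)
features = auditory_cepstra(tone, fs)
print(features.shape)  # (18, 8): 18 frames, 8 cepstral coefficients
```

In a recognition experiment, a feature matrix of this kind would stand in for MFCC frames as the observation sequence of the HMM; how the published BFCC handles channel spacing, compression, and normalisation is not specified in the abstract.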