Convolutional neural network acoustic model for robust Indonesian speech recognition in noisy environment



Open Access


Author(s) - Marvin Jerremy Budiman, Dessi Puji Lestari

Publication year - 2020

Publication title - IOP Conference Series: Materials Science and Engineering

Language(s) - English

Resource type - Journals

eISSN - 1757-899X

pISSN - 1757-8981

DOI - 10.1088/1757-899x/803/1/012027

Subject(s) - hidden markov model , speech recognition , computer science , convolutional neural network , acoustic model , normalization (sociology) , mixture model , mel frequency cepstrum , pattern recognition (psychology) , artificial neural network , noise (video) , artificial intelligence , feature (linguistics) , feature extraction , speech processing , linguistics , philosophy , sociology , anthropology , image (mathematics)

Noise reduces the accuracy of automatic speech recognition (ASR). Several techniques have been proposed to overcome this problem. One is to use an artificial neural network (ANN) as the acoustic model; the convolutional neural network (CNN) is a variant of ANN that has been used for acoustic modeling. Another approach is to pre-process the speech signal, or the acoustic features extracted from it, for example with cepstral mean and variance normalization (CMVN). In this work, CNN acoustic models were trained on CMVN pre-processed acoustic features to build a noise-robust speech recognition system. Two groups of models were built, each to handle two kinds of noise (babble noise and street noise). The acoustic models were tested with noisy speech at different signal-to-noise ratio (SNR) values, and the results from the CNN acoustic models were compared with those from Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) acoustic models. Accuracy increased when the models were trained on more varied training data. CNN acoustic models trained on FBANK features achieved higher accuracy than GMM-HMM models built on the same features.
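The abstract does not say how CMVN was applied or how the noisy test speech was produced, so the following is only a minimal sketch of the two standard operations it names. It assumes per-utterance CMVN (statistics computed over the frames of a single utterance) and additive noise scaled to a target SNR. The function names `cmvn` and `mix_at_snr` are hypothetical and are not taken from the paper.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance CMVN (an assumption; the paper may normalize differently).

    frames: list of equal-length feature vectors, one per time frame.
    Returns frames with each feature dimension shifted to zero mean and
    scaled to unit variance over the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)
    ]
    # eps guards against division by zero for a constant dimension.
    stds = [math.sqrt(v + eps) for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB.

    speech, noise: lists of samples; noise is truncated to len(speech).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)), solved for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * x for s, x in zip(speech, noise)]


if __name__ == "__main__":
    feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(cmvn(feats))
    print(mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], 10))
```

In this sketch, babble or street noise recordings would be passed as `noise`, and `cmvn` would be applied to the features (for example FBANK) extracted from the mixed signal before they reach the acoustic model.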
