
Out Domain Data Augmentation on Punjabi Children Speech Recognition using Tacotron
Author(s) -
Taniya Hasija,
Virender Kadyan,
Kalpna Guleria
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1950/1/012044
Subject(s) - speech recognition , computer science , mel frequency cepstrum , word error rate , speech corpus , natural language processing , artificial neural network , cepstrum , domain (mathematical analysis) , feature (linguistics) , artificial intelligence , feature extraction , speech synthesis , linguistics , mathematics , mathematical analysis , philosophy
The performance of an Automatic Speech Recognition (ASR) system depends strongly on the quality of the corpus and the quantity of training data. Data scarcity and the high variability of children’s speech degrade ASR performance. Because Punjabi is a tonal, low-resource language, little data is available for Punjabi children’s speech, which leads to poor recognition performance. To overcome this limited-data condition, this paper evaluates two corpora from different domains to test whether out-of-domain data can improve ASR performance. We implemented Tacotron as a speech synthesis system for the Punjabi language. The audio synthesized by Tacotron is merged with the available speech corpus, and the merged corpus is evaluated on a Punjabi children’s ASR system that uses Mel Frequency Cepstral Coefficients (MFCC) plus pitch features and Deep Neural Network (DNN) acoustic modeling. The merged corpus reduces the Word Error Rate (WER) of the ASR system, with a Relative Improvement (RI) of 9-12%.
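The abstract reports results as WER and RI without stating how they are computed. The sketch below assumes the conventional definitions: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RI as the reduction in WER relative to the baseline WER. The function names `word_error_rate` and `relative_improvement` are hypothetical, and the numbers in the usage lines are illustrative only; they are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, assuming whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent (assumed definition)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative values only, not taken from the paper.
wer = word_error_rate("a b c d", "a x c")   # one substitution, one deletion
ri = relative_improvement(20.0, 18.0)       # hypothetical baseline vs. merged corpus
print(wer, ri)
```

Under these assumed definitions, an RI of 9-12% means the merged corpus removes roughly a tenth of the baseline system's word errors, not that the absolute WER drops by 9-12 percentage points.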