Open Access
CNN-Based Speaker Verification and Speech Recognition in Tibetan
Author(s) - Zhenye Gan, Yue Yu, Rui Wang, Xin Zhao
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1693/1/012180
Subject(s) - speech recognition, computer science, speaker recognition, connectionism, speaker verification, artificial neural network, artificial intelligence, speaker diarisation, speech processing, hidden markov model, pattern recognition (psychology)
In recent years, there have been few studies on speaker and speech recognition in Tibetan, and these have relied mainly on traditional probabilistic and statistical methods. With the development of deep learning, neural networks have been widely used in speaker recognition and automatic speech recognition, achieving remarkable results. In this paper, we use end-to-end models to study speaker verification and speech recognition in Tibetan. For speaker verification, we use the ResCNN network. For speech recognition, we adopt the DFCNN-CTC structure, in which connectionist temporal classification (CTC) directly outputs the probability of the predicted sequence without external post-processing. We make several improvements to both models. Experiments show that the improved models reduce the equal error rate (EER) by 3% in speaker verification and the word error rate (WER) by 18% in speech recognition.
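
To make the CTC output and the WER metric mentioned above concrete, the following is a minimal sketch and not the authors' implementation. It assumes a greedy (best-path) reading of the CTC output, a blank label at index 0, and WER defined as word-level edit distance divided by reference length. The names `ctc_greedy_decode`, `word_error_rate`, and `BLANK` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the most likely label in each frame,
    merge consecutive repeats, then drop blanks."""
    best_per_frame = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    prev = None
    for label in best_per_frame:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a row-by-row Levenshtein distance over words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        row = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # match or substitution
        prev_row = row
    return prev_row[-1] / len(reference)


if __name__ == "__main__":
    # Toy per-frame distributions over {blank, label 1, label 2}.
    probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
             [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(ctc_greedy_decode(probs))
    print(word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"]))
```

The blank frame between the two runs of label 1 is what allows a repeated label to survive the merge step. A full system might use beam search instead of the greedy path, which this sketch does not cover.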
