
Self-Supervised Pre-Trained Speech Representation Based End-to-End Mispronunciation Detection and Diagnosis of Mandarin
Author(s) - Yunfei Shen, Qingqing Liu, Zhixing Fan, Jiajun Liu, Aishan Wumaier
Publication year - 2022
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2022.3212417
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Mispronunciation Detection and Diagnosis (MDD) is an essential underlying technology in Computer-Assisted Pronunciation Training (CAPT) and Computer-Assisted Language Learning (CALL). MDD research in Mandarin suffers from a lack of relevant data, making it a typical low-resource scenario. In recent years, self-supervised pre-trained speech representations have developed rapidly and yielded significant improvements in low-resource speech recognition, which motivates applying them to MDD tasks. First, we build a Mandarin MDD dataset, PSC-Reading, from the passage reading section of the Putonghua Proficiency Test (PSC). We then extend end-to-end MDD systems based on the CTC/Attention hybrid architecture and the Transformer architecture, replacing conventional speech features such as MFCC and Fbank with features extracted from self-supervised pre-trained speech representation models such as Wav2Vec 2.0 and WavLM, and conduct experiments on the PSC-Reading dataset. Experimental results show that, compared with the CNN-RNN-CTC baseline, our WavLM-based model achieves a 20.5% relative improvement in F1 score.
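The abstract describes swapping conventional MFCC/Fbank inputs for frame-level representations from a pre-trained WavLM or Wav2Vec 2.0 encoder, which then feed an end-to-end MDD model. The sketch below is not the authors' pipeline; it only illustrates that idea under stated assumptions, using the public microsoft/wavlm-base checkpoint from the Hugging Face transformers library and a hypothetical linear CTC head standing in for the paper's CTC/Attention or Transformer MDD decoder.

```python
# Minimal sketch (assumptions: public "microsoft/wavlm-base" checkpoint,
# a placeholder phone inventory, and a toy CTC head instead of the
# paper's CTC/Attention hybrid or Transformer decoder).
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, WavLMModel

# Load the pre-trained self-supervised encoder and its input normalizer.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base")
encoder = WavLMModel.from_pretrained("microsoft/wavlm-base")
encoder.eval()

# Read a learner's recording and resample to the 16 kHz rate WavLM expects.
waveform, sr = torchaudio.load("utterance.wav")  # hypothetical file path
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

# Extract frame-level speech representations (batch, frames, hidden_dim);
# these replace MFCC/Fbank features as the MDD encoder input.
inputs = extractor(waveform.squeeze(0).numpy(),
                   sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = encoder(**inputs).last_hidden_state  # e.g. shape (1, T, 768)

# Hypothetical CTC head over a Mandarin phone set (size is a placeholder);
# its log-probabilities would be aligned against the canonical transcript
# to detect and diagnose mispronunciations.
num_phones = 100  # placeholder: phone inventory plus CTC blank
ctc_head = torch.nn.Linear(features.size(-1), num_phones)
log_probs = torch.log_softmax(ctc_head(features), dim=-1)
print(log_probs.shape)
```

In practice, the pre-trained encoder can either be frozen as a fixed feature extractor or fine-tuned jointly with the downstream MDD network; the abstract does not specify which variant the authors use, so the frozen-extractor setup above is only one plausible configuration.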