
Integration of Phonotactic Features for Language Identification on Code-Switched Speech
Author(s) - Koena Ronny Mabokela
Publication year - 2022
Publication title - International Journal on Natural Language Computing
Language(s) - English
Resource type - Journals
eISSN - 2319-4111
pISSN - 2278-1307
DOI - 10.5121/ijnlc.2022.11102
Subject(s) - bigram , computer science , phonotactics , speech recognition , phone , hidden markov model , language model , artificial intelligence , language identification , natural language processing , identification (biology) , support vector machine , code (set theory) , natural language , trigram , phonology , linguistics , philosophy , botany , set (abstract data type) , biology , programming language
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). A one-pass recognition system converts the spoken sounds into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among the target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. Because the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The system was tested on a speech corpus of Sepedi and English, two languages that are often mixed. It is evaluated by measuring automatic speech recognition (ASR) performance and LID performance separately. The systems obtained promising ASR accuracy with the data-driven phone merging approach, modelled using 16 Gaussian mixtures per state. On code-switched and monolingual speech segments respectively, the proposed systems achieved acceptable ASR and LID accuracy.
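As an illustration of the phonotactic idea in the abstract, the following minimal sketch trains one add-one-smoothed phone bigram model per language and labels a phone sequence with the language whose model gives the higher log-likelihood. It is not the paper's system: the SVM back end is replaced here by a plain argmax over likelihood scores, and the function names (`train_bigram`, `log_likelihood`, `identify`) and the toy phone sequences are hypothetical, invented only for this example.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count phone bigrams and their left contexts, with boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def log_likelihood(seq, model, vocab_size):
    """Log-probability of a phone sequence under an add-one-smoothed bigram model."""
    context_counts, bigram_counts, _ = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def identify(seq, models):
    """Return the best-scoring language and the per-language scores."""
    # Use a shared vocabulary size so the scores are comparable across models.
    shared_vocab = set().union(*(model[2] for model in models.values()))
    scores = {
        lang: log_likelihood(seq, model, len(shared_vocab))
        for lang, model in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy phone sequences (assumed data, not taken from the paper's corpus).
ENGLISH_PHONES = [["dh", "ah"], ["k", "ae", "t"], ["s", "ih", "t"], ["k", "ih", "t"]]
SEPEDI_PHONES = [["m", "o", "l", "o"], ["l", "e", "b", "o"],
                 ["m", "o", "s", "a"], ["b", "a", "l", "a"]]

MODELS = {
    "english": train_bigram(ENGLISH_PHONES),
    "sepedi": train_bigram(SEPEDI_PHONES),
}

if __name__ == "__main__":
    for segment in (["k", "ae", "t"], ["m", "o", "l", "a"]):
        language, scores = identify(segment, MODELS)
        print(segment, "->", language, scores)
```

In a code-switched utterance, the same scoring could be applied segment by segment to the recognized phone string; the per-language scores could also serve as input features to a classifier such as the SVM described in the abstract, although that step is not shown here.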