Open Access

Integration of Phonotactic Features for Language Identification on Code-Switched Speech

Author(s) -

Koena Ronny Mabokela

Publication year - 2022

Publication title -

International Journal on Natural Language Computing

Language(s) - English

Resource type - Journals

eISSN - 2319-4111

pISSN - 2278-1307

DOI - 10.5121/ijnlc.2022.11102

Subject(s) - bigram, computer science, phonotactics, speech recognition, phone, hidden Markov model, language model, artificial intelligence, language identification, natural language processing, identification (biology), support vector machine, code (set theory), natural language, trigram, phonology, linguistics, philosophy, botany, set (abstract data type), biology, programming language

In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). A one-pass recognition system converts the spoken sounds into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among the target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. Because the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The system was tested on a speech corpus of Sepedi and English, two languages that are often mixed. It is evaluated by measuring automatic speech recognition (ASR) performance and LID performance separately. The systems obtained promising ASR accuracy with a data-driven phone merging approach modelled using 16 Gaussian mixtures per state. On code-switched and monolingual speech segments respectively, the proposed systems achieved acceptable ASR and LID accuracy.
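As a rough illustration of how phone-level bigram statistics can carry language identity, the sketch below trains one add-one smoothed bigram model per language and assigns a recognized phone sequence to the language whose model scores it highest. It is not the paper's system: the class `PhoneBigramLM`, the function `identify_language`, the smoothing choice, and the toy phone strings are all assumptions made for illustration, and a plain likelihood comparison stands in for the SVM back end described in the abstract.

```python
import math
from collections import Counter


class PhoneBigramLM:
    """Add-one smoothed bigram model over phone symbols (illustrative only)."""

    def __init__(self, sequences, vocab):
        # Sentence-boundary markers are part of the vocabulary.
        self.vocab = set(vocab) | {"<s>", "</s>"}
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        """Average log-probability per phone transition."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1   # add-one smoothing
            den = self.contexts[prev] + v
            total += math.log(num / den)
        return total / (len(padded) - 1)


def identify_language(seq, models):
    """Return the best-scoring language and all per-language scores."""
    scores = {lang: lm.log_prob(seq) for lang, lm in models.items()}
    return max(scores, key=scores.get), scores


# Toy symbol sequences, NOT real transcriptions or data from the paper.
train = {
    "english": [["dh", "ax", "k", "ae", "t"],
                ["s", "t", "r", "iy", "t"],
                ["k", "ae", "t", "s"]],
    "sepedi": [["m", "o", "s", "a", "d", "i"],
               ["l", "e", "b", "o", "n", "e"],
               ["b", "a", "n", "a"]],
}
# Shared phone inventory across both languages.
vocab = {p for seqs in train.values() for s in seqs for p in s}
models = {lang: PhoneBigramLM(seqs, vocab) for lang, seqs in train.items()}

label, scores = identify_language(["b", "a", "n", "a"], models)
print(label, scores)
```

In a code-switched setting, the same scoring could be applied to each segment of the recognized phone string, with the per-language scores passed to a trained classifier rather than compared directly.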
