Exploiting acoustic similarities between Tamil and Indian English in the development of an HMM‐based bilingual synthesiser
Author(s) -
VijayaRajSolomon Sherlin Solomi,
Parthasarathy Vijayalakshmi,
Thangavelu Nagarajan
Publication year - 2017
Publication title - IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
ISSN - 1751-9683
DOI - 10.1049/iet-spr.2016.0163
Subject(s) - tamil , hidden markov model , computer science , speech recognition , pronunciation , set (abstract data type) , artificial intelligence , speech synthesis , phone , natural language processing , gaussian , linguistics , philosophy , physics , quantum mechanics , programming language
In this study, an efficient hidden Markov model (HMM)‐based bilingual speech synthesiser for the Indian language Tamil and Indian English is developed. Initially, a phone mapping approach is tried, in which English text is synthesised using the Tamil corpus alone by mapping English phonemes to the perceptually similar phonemes in Tamil; this approach is found to be language‐dependent and to require a large dictionary for Indian pronunciation. Therefore, given speech data for both languages, the straightforward approaches to developing a bilingual synthesiser are to build a separate synthesiser for each language and combine them, or to merge the perceptually similar phonemes. These approaches introduce language‐switching/influence in the synthesised speech. To minimise switching and influence, HMM‐based bilingual synthesisers are developed by merging acoustically similar phonemes, derived from model parameters and likelihood Gaussians using various distance metrics. The performance of these synthesisers is evaluated based on mean opinion score (MOS), language‐switching and language‐influence. Results reveal that the set of phonemes derived using product‐of‐likelihood Gaussians in the likelihood space is the optimum set of phonemes that can be merged, and the system developed by merging these phonemes outperforms the rest with an MOS of 3.66. Furthermore, only 8% and 23% of the sentences exhibit language‐switching and language‐influence, respectively.
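
As a rough illustration of what "merging acoustically similar phonemes" using a distance metric on model parameters can look like, the sketch below compares single diagonal-covariance Gaussians for hypothetical Tamil and English phone models and proposes a merge when the nearest Tamil phone is closer than a threshold. The abstract does not name the metrics used, so the Bhattacharyya distance, the phone labels, the toy parameters and the threshold are all assumptions made for this example; it does not reproduce the product‐of‐likelihood‐Gaussians method in the likelihood space that the study reports as best.

```python
import numpy as np


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mu1, var1, mu2, var2 = (
        np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2)
    )
    var_avg = 0.5 * (var1 + var2)
    # Mean term: squared mean difference scaled by the averaged variance.
    term_mean = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    # Covariance term: penalises mismatched variances.
    term_cov = 0.5 * np.sum(np.log(var_avg / np.sqrt(var1 * var2)))
    return float(term_mean + term_cov)


def merge_candidates(tamil, english, threshold):
    """For each English phone, find the nearest Tamil phone and keep the
    pair only if the distance is below `threshold` (an assumed cut-off)."""
    pairs = []
    for en_phone, (mu_e, var_e) in english.items():
        best = min(
            tamil,
            key=lambda ta: bhattacharyya_diag(mu_e, var_e, *tamil[ta]),
        )
        dist = bhattacharyya_diag(mu_e, var_e, *tamil[best])
        if dist < threshold:
            pairs.append((en_phone, best, dist))
    return pairs


# Hypothetical toy models: phone label -> (mean vector, variance vector).
tamil_models = {
    "ta_a": ([0.0, 0.0], [1.0, 1.0]),
    "ta_k": ([5.0, 5.0], [1.0, 1.0]),
}
english_models = {
    "en_a": ([0.1, 0.0], [1.0, 1.0]),    # close to ta_a
    "en_z": ([10.0, -10.0], [1.0, 1.0]),  # far from every Tamil phone
}

pairs = merge_candidates(tamil_models, english_models, threshold=0.5)
for en_phone, ta_phone, dist in pairs:
    print(f"merge {en_phone} with {ta_phone} (distance {dist:.5f})")
```

With these toy values only `en_a` is paired with `ta_a`; `en_z` stays a separate phone because its nearest Tamil model lies well beyond the threshold. In a real system each phone would be a multi-state HMM, usually with richer output distributions, so the comparison would have to be aggregated over states rather than made on a single Gaussian.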
