Open Access
Investigating the feasibility of harvesting broadcast speech data to develop resources for South African languages
Author(s) - Jaco Badenhorst, Febe de Wet
Publication year - 2021
Language(s) - English
DOI - 10.55492/dhasa.v3i03.3820
Subject(s) - computer science, phone, speech recognition, natural language processing, artificial neural network, deep neural networks, languages of Africa, artificial intelligence, linguistics, philosophy
The availability of sufficient target-language data remains an important factor in the development of automatic speech recognition (ASR) systems. For instance, the substantial improvements in acoustic modelling that deep architectures have recently achieved for well-resourced languages depend on vast amounts of speech data. Moreover, the acoustic models in state-of-the-art ASR systems that generalise well across different domains are usually trained on multiple corpora, not just one or two. Diverse corpora containing hundreds of hours of speech data are not available for resource-limited languages. In this paper, we investigate the feasibility of creating additional speech resources for the official languages of South Africa by employing a semi-automatic data harvesting procedure. Factorised time-delay neural network (TDNN-F) models were used to generate phone-level transcriptions of speech data harvested from different domains.
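The abstract names factorised time-delay neural network models as the acoustic models used to produce phone-level transcriptions. As a rough illustration of what "time-delay" and "factorised" refer to, the following is a minimal plain-Python sketch, not the authors' implementation. Each output frame is computed from input frames spliced at fixed time offsets (the time-delay part), and the layer's single weight matrix is replaced by the product of two smaller matrices that pass through a low-dimensional bottleneck (the factorised part). The function names, offsets and toy weights are hypothetical. The semi-orthogonal constraint on the first factor, batch normalisation, skip connections and training used in practical TDNN-F recipes are all omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def splice(frames: Sequence[Sequence[float]], t: int, offsets: Sequence[int]) -> Vector:
    """Concatenate the frames at t + offset, clamping indices at the utterance edges."""
    last = len(frames) - 1
    out: Vector = []
    for off in offsets:
        idx = min(max(t + off, 0), last)
        out.extend(frames[idx])
    return out


def tdnnf_layer(
    frames: Sequence[Sequence[float]],
    offsets: Sequence[int],
    b_matrix: Matrix,  # bottleneck x (len(offsets) * feat_dim): low-rank "down" factor
    a_matrix: Matrix,  # out_dim x bottleneck: "up" factor
) -> List[Vector]:
    """Simplified forward pass of one factorised time-delay layer (hypothetical sketch).

    The full weight matrix A @ B is never formed; its rank is at most the bottleneck size.
    """
    outputs: List[Vector] = []
    for t in range(len(frames)):
        x = splice(frames, t, offsets)            # time-delay context window
        h = matvec(b_matrix, x)                   # project down to the bottleneck
        y = matvec(a_matrix, h)                   # expand back up to the output size
        outputs.append([max(0.0, v) for v in y])  # ReLU nonlinearity
    return outputs


# Toy example: 3 frames of 2-dimensional features, context offsets {-1, 0, +1}.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
offsets = [-1, 0, 1]
b_matrix = [[1, 0, 1, 0, 1, 0],   # sums the first feature over the context window
            [0, 1, 0, 1, 0, 1]]   # sums the second feature over the context window
a_matrix = [[1, 0], [0, 1], [-1, -1]]
out = tdnnf_layer(frames, offsets, b_matrix, a_matrix)
print(out)
```

With a bottleneck of 2, the two factors hold 2 x 6 + 3 x 2 = 18 weights instead of the 3 x 6 = 18 of a full matrix in this tiny case; the saving only appears at realistic sizes, where the bottleneck is much smaller than both the spliced input and the output dimensions.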