Investigating the feasibility of harvesting broadcast speech data to develop resources for South African languages

Open Access

Author(s) - Jaco Badenhorst, Febe de Wet

Publication year - 2021

Language(s) - English

DOI - 10.55492/dhasa.v3i03.3820

Subject(s) - computer science, phone, speech recognition, natural language processing, artificial neural network, deep neural networks, languages of africa, artificial intelligence, linguistics, philosophy

Sufficient target-language data remains an important factor in the development of automatic speech recognition (ASR) systems. For instance, the substantial improvement in acoustic modelling that deep architectures have recently achieved for well-resourced languages requires vast amounts of speech data. Moreover, the acoustic models in state-of-the-art ASR systems that generalise well across different domains are usually trained on various corpora, not just one or two. Diverse corpora containing hundreds of hours of speech data are not available for resource-limited languages. In this paper, we investigate the feasibility of creating additional speech resources for the official languages of South Africa by employing a semi-automatic data harvesting procedure. Factorised time-delay neural network models were used to generate phone-level transcriptions of speech data harvested from different domains.
