CNN-Based Phone Segmentation Experiments in a Less-Represented Language
Author(s) -
Céline Manenti,
Thomas Pellegrini,
Julien Pinquier
Publication year - 2016
Publication title -
Interspeech 2016
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2016-796
Subject(s) - computer science, segmentation, convolutional neural network, phone, speech recognition, artificial intelligence, word error rate, convolution (computer science), task (project management), frame (networking), natural language processing, binary number, speech segmentation, language model, text segmentation, binary classification, boundary (topology), artificial neural network, linguistics, support vector machine, mathematics, philosophy, telecommunications, mathematical analysis, arithmetic, management, economics
In recent years, there has been renewed interest in unsupervised sub-lexical and lexical unit discovery. Segmenting speech into phone-like units is a useful first step for such a task. In this article, we report speech segmentation experiments in Xitsonga, a less-represented language spoken in South Africa. We use convolutional neural networks (CNNs) with static FBANK coefficients as input. At each sliding frame of the signal, the models make a binary decision on whether a boundary is present. We compare a model trained exclusively on Xitsonga data with a bootstrap model trained on a larger corpus in another language, the BUCKEYE U.S. English corpus. Using a model with two convolution layers, we obtained a 79% F-measure on BUCKEYE with a 20 ms error tolerance, which equals the human inter-annotator agreement rate. We then used this bootstrap model to segment Xitsonga data and compared the results obtained when adapting it with 1 to 20 minutes of Xitsonga data.
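As an illustration of the evaluation metric mentioned in the abstract, the sketch below computes a boundary F-measure with a 20 ms tolerance. The abstract does not describe the scoring protocol in detail, so the one-to-one greedy matching, the function name `boundary_f_measure`, and the example boundary times are all assumptions made for illustration, not the authors' implementation. Times are given in integer milliseconds to avoid floating-point edge cases at the tolerance limit.

```python
def boundary_f_measure(reference_ms, hypothesis_ms, tolerance_ms=20):
    """Return (precision, recall, F-measure) for two lists of boundary times.

    A hypothesized boundary counts as a hit when it lies within
    `tolerance_ms` of a reference boundary; each reference boundary can be
    matched at most once (an assumed protocol, not taken from the paper).
    """
    ref = sorted(reference_ms)
    hyp = sorted(hypothesis_ms)
    matched = 0
    i = j = 0
    # Walk both sorted lists in parallel, pairing boundaries greedily.
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tolerance_ms:
            matched += 1
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1  # hypothesis too early to match any remaining reference
        else:
            i += 1  # reference too early to match any remaining hypothesis
    precision = matched / len(hyp) if hyp else 0.0
    recall = matched / len(ref) if ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure


# Hypothetical boundary times in milliseconds, for illustration only.
reference = [100, 250, 400, 620]
hypothesis = [110, 300, 405, 600, 700]
precision, recall, f_measure = boundary_f_measure(reference, hypothesis)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

In this made-up example, three of the five hypothesized boundaries fall within 20 ms of a distinct reference boundary, so precision is 3/5 and recall is 3/4.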