A method for lexical tone classification in audio-visual speech

João Vítor Possamai de Menezes, Maria Mendes Cantoni, Denis K Burnham, Adriano Vilela Barbosa


Open Access



Publication year - 2020

Publication title - Journal of Speech Sciences

Language(s) - English

Resource type - Journals

ISSN - 2236-9740

DOI - 10.20396/joss.v9i00.14960

Subject(s) - speech recognition, computer science, microphone, linear discriminant analysis, artificial intelligence, classifier (uml), pattern recognition (psychology), audio signal, parameterized complexity, speech processing, signal (programming language), speech coding, telecommunications, sound pressure, algorithm, programming language

This work presents a method for lexical tone classification in audio-visual speech. The method is applied to a speech data set consisting of syllables and words produced by a female native speaker of Cantonese. The data were recorded in an audio-visual speech production experiment. The visual component of speech was measured by tracking the positions of active markers placed on the speaker's face, whereas the acoustic component was measured with an ordinary microphone. A pitch-tracking algorithm is used to estimate F0 from the acoustic signal. A head motion compensation procedure is applied to the tracked marker positions to separate the head and face motion components. The data are then organized into four signal groups: F0, Face, Head, and Face+Head. The signals in each group are parameterized by means of a polynomial approximation and then used to train an LDA (Linear Discriminant Analysis) classifier that maps the input signals to one of the output classes (the lexical tones of the language). One classifier is trained for each signal group. The ability of each signal group to predict the correct lexical tones is assessed through the accuracy of the corresponding LDA classifier, estimated by k-fold cross-validation. The classifiers for all signal groups performed above chance, with F0 achieving the highest accuracy, followed by Face+Head, Face, and Head, respectively. The differences in performance between all signal groups were statistically significant.
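The parameterize-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The polynomial degree, the number of folds, the three synthetic contour shapes, the noise level, and every function name below are assumptions made for illustration; the abstract does not specify any of them. The LDA here is a small hand-written version with a pooled covariance matrix, used so that the example depends only on NumPy.

```python
import numpy as np


def poly_features(signal, degree=3):
    """Fit a polynomial to one contour on a normalized time axis.

    Returns the degree+1 coefficients (lowest order first), which serve
    as a fixed-length feature vector. The degree is an assumed value.
    """
    t = np.linspace(-1.0, 1.0, len(signal))
    return np.polynomial.polynomial.polyfit(t, signal, degree)


def lda_fit(X, y):
    """Estimate class means, priors, and the pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, priors, np.linalg.pinv(cov)


def lda_predict(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, priors, cov_inv = model
    w = means @ cov_inv                                   # one weight vector per class
    b = -0.5 * np.sum(w * means, axis=1) + np.log(priors)
    return classes[np.argmax(X @ w.T + b, axis=1)]


def kfold_accuracy(X, y, k=5, seed=0):
    """Accuracy pooled over k folds; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    correct = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = lda_fit(X[train], y[train])
        correct += np.sum(lda_predict(model, X[test]) == y[test])
    return correct / len(X)


def make_synthetic(n_per_class=40, n_samples=50, seed=0):
    """Hypothetical data: noisy level, rising, and falling contours (in Hz).

    These shapes are placeholders and do not represent the Cantonese tone inventory.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(-1.0, 1.0, n_samples)
    shapes = [200 + 0 * t, 200 + 30 * t, 200 - 30 * t]
    X, y = [], []
    for label, shape in enumerate(shapes):
        for _ in range(n_per_class):
            contour = shape + rng.normal(0.0, 5.0, n_samples)
            X.append(poly_features(contour))
            y.append(label)
    return np.array(X), np.array(y)


if __name__ == "__main__":
    X, y = make_synthetic()
    acc = kfold_accuracy(X, y, k=5)
    print(f"cross-validated accuracy: {acc:.3f} (chance = {1 / 3:.3f})")
```

In this sketch, one feature matrix stands in for a single signal group. Repeating the same steps with features built from each group (F0, Face, Head, Face+Head) would give one accuracy per group, which is the kind of comparison the abstract reports.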
