Vowel recognition of patients after total laryngectomy using Mel Frequency Cepstral Coefficients and mouth contour
Author(s) -
Rafał Pietruch,
Antoni Grzanka
Publication year - 2010
Publication title -
Journal of Automatic Control
Language(s) - English
Resource type - Journals
eISSN - 2406-0984
pISSN - 1450-9903
DOI - 10.2298/jac1001033p
Subject(s) - computer science , speech recognition , formant , vocal tract , support vector machine , cepstrum , pattern recognition (psychology) , artificial intelligence , laryngectomy , signal (programming language) , vowel , naive bayes classifier , artificial neural network , modality (human–computer interaction) , larynx , programming language , linguistics , philosophy
The paper addresses the problem of recognizing isolated vowels spoken by patients after total laryngectomy. The visual and acoustic speech modalities were incorporated separately into the machine learning algorithms. The authors used Mel Frequency Cepstral Coefficients as acoustic descriptors of the speech signal. The lip contour was extracted from video recordings of the speakers' faces using the OpenCV software library. Three types of classifiers were compared in the vowel recognition procedure: Artificial Neural Networks, Support Vector Machines and Naive Bayes. The highest recognition rate was obtained with Support Vector Machines. For a group of laryngectomees with varying speech quality, the authors achieved recognition rates of 75% for the acoustic modality and 40% for the visual modality. The recognition rate was higher than in previous research in which 10 cross-sectional areas of the vocal tract were estimated. With the presented image processing algorithm, the visual features can be extracted automatically from a video signal.
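As an illustration of the acoustic descriptors named in the abstract, the following is a minimal sketch of how Mel Frequency Cepstral Coefficients can be computed for a single speech frame. It is not the authors' implementation: the abstract does not state the frame length, sampling rate, number of mel filters or number of coefficients, so the values used here (256 samples, 8 kHz, 20 filters, 12 coefficients) and all function names are assumptions chosen for the example. It uses only the Python standard library, with a naive DFT in place of an FFT for brevity.

```python
import cmath
import math


def hz_to_mel(f):
    """Common mel-scale formula (one of several conventions in use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Edge frequencies expressed as fractional DFT bin positions.
    edges = [f / nyquist * (n_bins - 1) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Return n_coeffs cepstral coefficients (c1..cN, skipping c0) for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    # Log filterbank energies; the small constant guards against log(0).
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # DCT-II of the log energies decorrelates them into cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(1, n_coeffs + 1)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # A pure tone stands in for a vowel frame; real vowels have several formants.
    frame = [math.sin(2 * math.pi * 700 * t / fs) for t in range(256)]
    print(mfcc(frame, fs))
```

In a pipeline like the one the abstract describes, a vector of this kind would be computed per frame and passed to a classifier such as a Support Vector Machine; the classifier stage is omitted here.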