Open Access
Developing phoneme‐based lip‐reading sentences system for silent speech recognition
Author(s) -
ElBialy Randa,
Chen Daqing,
Fenghour Souheil,
Hussein Walid,
Xiao Perry,
Karam Omar H.,
Li Bo
Publication year - 2023
Publication title -
caai transactions on intelligence technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.613
H-Index - 15
ISSN - 2468-2322
DOI - 10.1049/cit2.12131
Subject(s) - computer science, speech recognition, word recognition, artificial intelligence, natural language processing, word error rate, viseme, transformer, segmentation, reading (process), speech processing, linguistics, acoustic model, machine learning
Lip‐reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from recognising isolated words to lip‐reading sentences in the wild. This paper uses phonemes as the classification schema for lip‐reading sentences, both to explore an alternative schema and to enhance system performance; character‐based and viseme‐based schemas are also investigated for comparison. The visual front‐end of the system consists of a spatio‐temporal (3D) convolution followed by a 2D ResNet; the phoneme recognition model is a Transformer that utilises multi‐headed attention; and a Recurrent Neural Network serves as the language model. The performance of the proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state‐of‐the‐art approaches to lip‐reading sentences, the proposed system achieves a word error rate that is on average 10% lower under varying illumination ratios.
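The abstract describes a pipeline of a 3D spatio‐temporal convolution feeding a 2D ResNet, a Transformer with multi‐headed attention for phoneme recognition, and an RNN language model. The following is a minimal PyTorch sketch of that pipeline, assuming illustrative layer sizes, a phoneme inventory of 40 classes, and module names chosen for this sketch; it is not the authors' exact configuration.

```python
# Minimal sketch of the described architecture (assumed hyperparameters).
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_PHONEMES = 40  # assumed phoneme inventory size, not from the paper


class VisualFrontEnd(nn.Module):
    """3D convolution over the lip-region video, then a 2D ResNet per frame."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        resnet = resnet18(weights=None)
        # Adapt the ResNet stem to the 64-channel output of the 3D convolution.
        resnet.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        resnet.fc = nn.Linear(resnet.fc.in_features, feat_dim)
        self.resnet2d = resnet

    def forward(self, x):                       # x: (batch, 1, frames, H, W)
        x = self.conv3d(x)                      # (batch, 64, frames, H', W')
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.resnet2d(x)                    # per-frame feature vectors
        return x.view(b, t, -1)                 # (batch, frames, feat_dim)


class PhonemeTransformer(nn.Module):
    """Transformer encoder with multi-headed attention over frame features."""

    def __init__(self, feat_dim: int = 512, n_heads: int = 8, n_layers: int = 6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(feat_dim, NUM_PHONEMES)

    def forward(self, feats):                   # feats: (batch, frames, feat_dim)
        return self.classifier(self.encoder(feats))  # per-frame phoneme logits


class RNNLanguageModel(nn.Module):
    """Recurrent language model over token sequences (phonemes or words)."""

    def __init__(self, vocab_size: int = NUM_PHONEMES, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                      # next-token logits


# Example forward pass on a dummy clip of 75 grayscale 112x112 frames.
clip = torch.randn(2, 1, 75, 112, 112)
feats = VisualFrontEnd()(clip)
logits = PhonemeTransformer()(feats)
print(logits.shape)                             # torch.Size([2, 75, 40])
```

In this sketch the Transformer emits per-frame phoneme logits, and the RNN language model would be used during decoding to map or rescore phoneme sequences into word sequences; the exact decoding strategy is not specified in the abstract.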
