
Visual Speech Recognition
Author(s) -
Supriya A. Patil,
Vaibhav Dhoble,
Saatvik Gawade,
Pratiksha Jagdale,
Rohan Jinde
Publication year - 2022
Publication title -
International Journal of Advanced Research in Science, Communication and Technology
Language(s) - English
Resource type - Journals
ISSN - 2581-9429
DOI - 10.48175/ijarsct-2874
Subject(s) - computer science , speech recognition , artificial intelligence , computer vision , audio visual , facial recognition system , noise reduction , feature extraction , robustness , handset , multimedia
The audio-visual speech recognition approach aims to improve noise robustness in mobile environments by extracting lip movement from side-face images. Earlier bimodal speech recognition systems used frontal face (lip) images, but these are inconvenient in practice because users must speak while holding a camera-equipped device in front of their face. The proposed solution, which captures lip movement with a small camera mounted in a handset, is more natural, simple, and convenient. It also effectively avoids a reduction in the signal-to-noise ratio (SNR) of the input speech. Visual features are extracted by optical-flow analysis and then combined with audio features for CNN-based recognition.
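
To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to extract optical-flow features from mouth-region frames, compute audio features, fuse the two streams, and feed them to a small CNN. This is a minimal illustration, not the authors' implementation: the use of OpenCV's Farneback flow, librosa MFCCs, PyTorch, and every function name, feature dimension, and hyperparameter here are assumptions made for the example.

```python
# Illustrative audio-visual fusion sketch (assumed libraries: OpenCV, librosa, PyTorch).
import cv2
import numpy as np
import librosa
import torch
import torch.nn as nn

def optical_flow_features(frames):
    """Dense optical flow between consecutive grayscale mouth-region frames,
    summarised per frame pair by the mean and std of horizontal/vertical motion."""
    feats = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        feats.append([flow[..., 0].mean(), flow[..., 1].mean(),
                      flow[..., 0].std(), flow[..., 1].std()])
    return np.asarray(feats, dtype=np.float32)        # shape (T-1, 4)

def audio_features(wav, sr, n_mfcc=13):
    """MFCCs as a stand-in acoustic front end."""
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T.astype(np.float32)                   # shape (frames, n_mfcc)

def fuse(visual, audio):
    """Naively align the two streams to a common number of time steps by
    index resampling, then concatenate along the feature axis."""
    t = min(len(visual), len(audio))
    vi = np.linspace(0, len(visual) - 1, t).astype(int)
    ai = np.linspace(0, len(audio) - 1, t).astype(int)
    return np.concatenate([visual[vi], audio[ai]], axis=1)   # (t, 4 + n_mfcc)

class AVFusionCNN(nn.Module):
    """Toy 1-D CNN over time for the fused audio-visual feature sequence."""
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, n_classes))

    def forward(self, x):          # x: (batch, time, feat_dim)
        return self.net(x.transpose(1, 2))
```

In a real system the visual frame rate (camera) and audio frame rate (MFCC hop) differ, so the naive resampling in `fuse` would typically be replaced by proper synchronisation or interpolation; it is kept simple here only to show where the two feature streams are combined before the CNN.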