
The Research of Lip Reading Based on STCNN and ConvLSTM
Author(s) - Yijie Zhu
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1651/1/012076
Subject(s) - softmax function, computer science, artificial intelligence, convolutional neural network, reading (process), pattern recognition (psychology), set (abstract data type), data set, process (computing), machine learning, speech recognition, political science, law, programming language, operating system
To address the problems of temporal modeling in lip reading research, a deep learning model is proposed based on a spatiotemporal convolutional neural network (STCNN) and Convolutional Long Short-Term Memory (ConvLSTM). First, the STCNN learns features from the extracted lip images. The learned features are then passed to the ConvLSTM, which processes them as time-series data, and the output is classified by softmax. Finally, the connectionist temporal classification (CTC) loss function is used to optimize the model. The model is trained on the GRID data set, and comparative experiments show that it achieves a recognition accuracy of 95.0% at the word level. The experiments indicate that the model can improve the accuracy of lip reading.
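
The final stage of the pipeline described above (per-frame softmax classification with CTC) can be illustrated with a minimal sketch. The abstract only states that CTC is used as the loss function; the greedy (best-path) decoding shown here is an assumed, commonly paired inference step, not necessarily the author's method. The vocabulary, the blank index, the function names, and the per-frame scores are all hypothetical and stand in for the ConvLSTM outputs.

```python
import math

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def softmax(logits):
    """Numerically stable softmax over one frame's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ctc_greedy_decode(frame_logits, vocab):
    """Best-path CTC decoding: take the most probable class in each frame,
    collapse consecutive repeats, then drop blanks."""
    best_path = []
    for logits in frame_logits:
        probs = softmax(logits)
        best_path.append(max(range(len(probs)), key=probs.__getitem__))

    decoded = []
    prev = None
    for idx in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != BLANK:
            decoded.append(vocab[idx])
        prev = idx
    return decoded


# Hypothetical word-level vocabulary and made-up scores for six video frames.
vocab = ["<blank>", "bin", "blue", "at"]
frame_logits = [
    [0.1, 2.0, 0.0, 0.0],  # "bin"
    [0.1, 1.5, 0.2, 0.0],  # "bin" (repeat, collapsed)
    [3.0, 0.1, 0.2, 0.0],  # blank
    [0.0, 0.1, 2.5, 0.3],  # "blue"
    [0.0, 0.1, 2.2, 0.3],  # "blue" (repeat, collapsed)
    [0.2, 0.0, 0.1, 1.9],  # "at"
]
print(ctc_greedy_decode(frame_logits, vocab))  # ['bin', 'blue', 'at']
```

The blank symbol is what allows the same word to be emitted twice in a row: a best path of "bin", blank, "bin" decodes to two separate words, whereas "bin", "bin" collapses to one.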