Performance Improvement of Kinect Software Development Kit–Constructed Speech Recognition Using a Client–Server Sensor Fusion Strategy for Smart Human–Computer Interface Control Applications
Author(s) -
Ing-Jr Ding,
Shih-Kai Lin
Publication year - 2017
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2017.2679116
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Using the Kinect sensor device and its accompanying Microsoft software development kit (SDK), a Kinect-SDK speech recognition system can be built easily. However, such systems exhibit substandard recognition performance and unreliable recognition decisions because they rely on a single, arbitrarily placed Kinect sensor. In Industry 4.0 sensing and control applications, correctly recognizing the spoken command is essential for controlling the target device. To enhance conventional Kinect-SDK speech recognition, this paper presents a client-server Kinect-SDK speech recognition scheme in which sensor deployment strategies and sensor fusion calculations are implemented using a TCP/IP decision server and multiple TCP/IP Kinect sensor clients. For sensor deployment, three strategies are proposed: central, face-to-face, and diagonal-corner deployment. For sensor fusion, three data fusion algorithms are proposed: fusion by voting, by voice energy comparison, and by voice energy comparison with thresholds. The recognition performance of the conventional Kinect-SDK approach can be significantly improved by carefully combining sensor deployment with sensor data fusion. Experimental results showed that Kinect-SDK speech recognition using the diagonal-corner deployment strategy combined with sensor fusion by voice energy comparison with thresholds achieved the highest average recognition accuracy, which was significantly higher than that of the conventional Kinect-SDK speech recognition approach (14.93%). In addition, we applied this strategy to control a remote multimedia player and a two-wheeled car in a laboratory office space.
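The abstract names three fusion rules for the decision server but does not give their exact formulation. The Python sketch below shows one plausible reading, assuming each Kinect client sends the server a recognized command plus a voice energy value for every utterance. The `ClientReport` type, the energy definition (sum of squared samples), the tie-breaking rule for voting, and the threshold semantics (discard clients whose energy falls below a hypothetical `threshold`, and reject the utterance if none remain) are all illustrative assumptions, not the authors' method. The TCP/IP transport between clients and server is omitted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClientReport:
    """One Kinect client's result for a single utterance (hypothetical message format)."""
    client_id: str
    command: Optional[str]  # recognized command word, or None if nothing was recognized
    energy: float           # voice energy measured by that client's microphone array


def voice_energy(samples: Sequence[float]) -> float:
    """Assumed energy measure: sum of squared audio sample values in the utterance."""
    return sum(s * s for s in samples)


def fuse_by_voting(reports: Sequence[ClientReport]) -> Optional[str]:
    """Majority vote over recognized commands; ties go to the command with more total energy."""
    votes = Counter(r.command for r in reports if r.command is not None)
    if not votes:
        return None
    energy_sum = defaultdict(float)
    for r in reports:
        if r.command is not None:
            energy_sum[r.command] += r.energy
    return max(votes, key=lambda c: (votes[c], energy_sum[c]))


def fuse_by_energy(reports: Sequence[ClientReport]) -> Optional[str]:
    """Trust the client that measured the strongest voice energy (assumed nearest the speaker)."""
    candidates = [r for r in reports if r.command is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.energy).command


def fuse_by_energy_with_threshold(
    reports: Sequence[ClientReport], threshold: float
) -> Optional[str]:
    """Energy comparison restricted to clients at or above a (hypothetical) energy threshold.

    Returns None (reject the utterance) when no client is loud enough.
    """
    loud = [r for r in reports if r.command is not None and r.energy >= threshold]
    return fuse_by_energy(loud)


if __name__ == "__main__":
    reports = [
        ClientReport("kinect-A", "play", 0.8),
        ClientReport("kinect-B", "pause", 2.5),
        ClientReport("kinect-C", "play", 0.6),
    ]
    print(fuse_by_voting(reports))                      # play  (2 votes vs 1)
    print(fuse_by_energy(reports))                      # pause (kinect-B is loudest)
    print(fuse_by_energy_with_threshold(reports, 3.0))  # None  (no client reaches 3.0)
```

The example shows why the rules can disagree: voting favors the command most clients heard, whereas energy comparison favors the single client that is presumably closest to the speaker. The threshold variant trades some acceptance rate for fewer wrong commands, which matters when a misrecognized command drives a physical device.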