Performance Improvement of Kinect Software Development Kit–Constructed Speech Recognition Using a Client–Server Sensor Fusion Strategy for Smart Human–Computer Interface Control Applications

Open Access

Author(s) - Ing-Jr Ding, Shih-Kai Lin

Publication year - 2017

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2017.2679116

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Using the Kinect sensor device and the associated Microsoft software development kit (SDK), a Kinect-SDK speech recognition system can be easily established. However, such systems exhibit substandard recognition performance and unreliable recognition decisions because they rely on a single, arbitrarily placed Kinect sensor. For sensing and control in Industry 4.0, correct recognition of the sensed command is essential for target control. To enhance conventional Kinect-SDK speech recognition, this paper presents a client-server Kinect-SDK speech recognition scheme in which sensor deployment strategies and sensor fusion calculations are implemented using a TCP/IP decision server and multiple TCP/IP Kinect sensor clients. For sensor deployment, three strategies are proposed: central, face-to-face, and diagonal-corner deployment. For sensor fusion, three data fusion algorithms are proposed: fusion by voting, fusion by voice energy comparisons, and fusion by voice energy comparisons with thresholds. The recognition performance of the conventional Kinect-SDK approach can be significantly improved by carefully combining sensor deployment with sensor data fusion. Experimental results showed that Kinect-SDK speech recognition using the diagonal-corner deployment strategy combined with sensor fusion by voice energy comparisons with thresholds had the highest average recognition accuracy, which was significantly higher than that of the conventional Kinect-SDK speech recognition approach (14.93%). In addition, we implemented this strategy for the operation control of a remote multimedia player and a two-wheeled car in a laboratory office space.
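The abstract names the three fusion rules but does not define them, so the following is only a minimal sketch of how a decision server might combine per-sensor results. Everything in it is an assumption made for illustration: the `SensorResult` record, the function names, the tie handling in voting (no decision), and the reading of "thresholds" as a minimum voice energy a sensor must report before its result is considered. It is not the authors' implementation and does not use the Kinect SDK.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SensorResult:
    """Hypothetical report from one Kinect client to the decision server."""
    sensor_id: str
    command: Optional[str]  # None means this sensor recognized nothing
    energy: float           # assumed scalar voice-energy measurement


def fuse_by_voting(results: List[SensorResult]) -> Optional[str]:
    """Return the command reported by the most sensors (assumed: ties give None)."""
    votes = Counter(r.command for r in results if r.command is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumed tie policy: make no decision
    return ranked[0][0]


def fuse_by_energy(results: List[SensorResult]) -> Optional[str]:
    """Return the command from the sensor that reported the highest voice energy."""
    valid = [r for r in results if r.command is not None]
    if not valid:
        return None
    return max(valid, key=lambda r: r.energy).command


def fuse_by_energy_with_threshold(
    results: List[SensorResult], threshold: float
) -> Optional[str]:
    """Like fuse_by_energy, but ignore sensors below an assumed energy threshold."""
    loud_enough = [r for r in results if r.energy >= threshold]
    return fuse_by_energy(loud_enough)


if __name__ == "__main__":
    # Made-up readings from three sensors, purely for demonstration.
    readings = [
        SensorResult("A", "play", 0.2),
        SensorResult("B", "stop", 0.9),
        SensorResult("C", "play", 0.4),
    ]
    print(fuse_by_voting(readings))
    print(fuse_by_energy(readings))
    print(fuse_by_energy_with_threshold(readings, threshold=0.95))
```

In this sketch the rules can disagree on the same input: voting follows the majority of sensors, while the energy rules follow the sensor that heard the speaker most strongly, and the thresholded variant can decline to decide when no sensor is loud enough.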
