
Multimodal data fusion algorithm applied to robots
Author(s) -
Xin Zhang,
Zhiquan Feng,
Jinglan Tian,
Xiaohui Yang
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1453/1/012040
Subject(s) - computer science , artificial intelligence , sensor fusion , pattern recognition , human–robot interaction , human–computer interaction , speech recognition , robot , algorithm
In recent years, the use of multimodal human-computer interaction technology to augment human intelligence has become a new topic in human-computer interaction research. When a robot cannot respond correctly through a single modality, multimodal fusion is required. To this end, this paper proposes a multimodal fusion algorithm that applies data obtained from a CNN feature layer at the decision level. Recognized speech text is semantically matched against the texts in a text library, returning a vector of similarity probabilities; a similarity probability vector is likewise obtained from gesture recognition. Both vectors are filtered by a threshold, yielding a set of high-probability instruction codes for each modality. The two sets are then intersected, and the resulting instruction is sent to the robot. The experimental results show that the influence of environmental factors on single-channel results is reduced and that the ambiguity of single-modality input is eliminated. The multi-channel fusion algorithm with additional weights is more accurate than the ordinary multi-channel fusion algorithm, and it was also well received by the test users.
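The abstract does not give implementation details, but the threshold-and-intersection step it describes can be sketched as follows. This is a minimal illustration in Python, assuming each modality produces a probability vector over a shared vocabulary of instruction codes; the function names, the threshold, and the modality weights (w_speech, w_gesture) are hypothetical placeholders, not values from the paper.

import numpy as np

def high_prob_codes(probs, threshold):
    # Set of instruction codes whose similarity probability exceeds the threshold.
    return {i for i, p in enumerate(probs) if p > threshold}

def fuse(speech_probs, gesture_probs, threshold=0.5, w_speech=0.6, w_gesture=0.4):
    # Decision-level fusion: intersect the high-probability code sets of the
    # two modalities, then pick the code with the highest weighted combined
    # probability (the "additional weight" variant mentioned in the abstract).
    candidates = high_prob_codes(speech_probs, threshold) & high_prob_codes(gesture_probs, threshold)
    if not candidates:
        return None  # modalities disagree; no instruction is issued
    return max(candidates, key=lambda c: w_speech * speech_probs[c] + w_gesture * gesture_probs[c])

# Example: similarity probability vectors over 4 instruction codes.
speech = np.array([0.1, 0.7, 0.6, 0.05])   # from semantic matching of recognized speech text
gesture = np.array([0.2, 0.8, 0.3, 0.9])   # from CNN-based gesture recognition
print(fuse(speech, gesture))               # -> 1, the only code both modalities rate highly

In this sketch, a code ambiguous in one channel (e.g. code 2 for speech, code 3 for gesture) is discarded by the intersection, which is how single-modality ambiguity is eliminated in the scheme the abstract describes.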