Open Access

Binary Neural Networks for Classification of Voice Commands From Throat Microphone

Author(s) - Fabio Cisne Ribeiro, Raphael Torres Santos Carvalho, Paulo Cesar Cortez, Victor Hugo C. De Albuquerque, Pedro Pedrosa Reboucas Filho

Publication year - 2018

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2018.2881199

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

Multi-class pattern classification has many applications, including speech recognition, and it is not straightforward to extend two-class neural networks (NNs) to the multi-class case. This paper studies the use of binary NN classifiers together with perceptual linear prediction (PLP) feature extraction to increase the classification rate of voice commands captured using a throat microphone, and compares this method with a single NN. Because the literature we surveyed contains no other data set of voice commands captured with a throat microphone in Brazilian Portuguese, we created a data set of isolated voice commands with utterances captured from 150 people (men and women). All the voice samples are in Brazilian Portuguese and consist of the digits “0” through “9” and the words “Ok” and “Cancel”. The results show that the throat microphone is robust in noisy environments: our speech recognition system achieves a 95.4% hit rate with multiple NNs using the one-against-all approach, outperforming a single NN, which reaches 91.88%. This result is very representative, since both classifiers obtained high hit rates. However, training the multiple NNs requires 535% more time than training the single NN. The best PLP extraction order for voice samples captured by the throat microphone is 9 or 10. We also observed that words with a poorly stressed vowel and fricative-like sounds, such as “3” and “7” in Portuguese, confuse the classifier.
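The one-against-all approach named in the abstract can be illustrated with a minimal sketch: one binary classifier is trained per command, and the command whose classifier gives the highest score wins. This is not the authors' implementation. Each binary NN is replaced here by a single logistic unit, the PLP features are replaced by synthetic two-dimensional points, and all function names, the three example commands, and the hyperparameters are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(model, x):
    # Raw (pre-sigmoid) output of one binary classifier.
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_binary(samples, labels, epochs=300, lr=0.1):
    # Stand-in for one binary NN: a single logistic unit trained with SGD.
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            g = sigmoid(score((w, b), x)) - y  # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def train_one_against_all(samples, targets, classes):
    # One classifier per class: that class is positive, all others negative.
    return {
        c: train_binary(samples, [1.0 if t == c else 0.0 for t in targets])
        for c in classes
    }

def predict(models, x):
    # The class whose binary classifier is most confident wins.
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical data: three commands, each a noisy cluster of 2-D "features".
random.seed(0)
centers = {"0": (0.0, 0.0), "1": (4.0, 0.0), "Ok": (0.0, 4.0)}
data = [
    ([cx + random.uniform(-0.5, 0.5), cy + random.uniform(-0.5, 0.5)], label)
    for label, (cx, cy) in centers.items()
    for _ in range(20)
]
random.shuffle(data)
samples = [x for x, _ in data]
targets = [t for _, t in data]

models = train_one_against_all(samples, targets, list(centers))
correct = sum(predict(models, x) == t for x, t in zip(samples, targets))
accuracy = correct / len(samples)
print(f"training accuracy: {accuracy:.2f}")
```

The sketch also suggests why training cost grows with this approach: every class needs its own full pass over the data, so twelve commands mean twelve separately trained classifiers instead of one.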
