
A Multi-Stage Ensembled-Learning Approach for Signal Classification Based on Deep CNN and LGBM Models
Author(s) -
Jingwen Yu,
Qidong Lu,
Zhiliang Qin,
Jiali Yu,
Yingying Li,
Qin Yu
Publication year - 2022
Publication title -
Journal of Communications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.185
H-Index - 35
eISSN - 2374-4367
pISSN - 1796-2021
DOI - 10.12720/jcm.17.1.30-38
Subject(s) - computer science, artificial intelligence, concatenation (mathematics), convolutional neural network, deep learning, pattern recognition (psychology), benchmark (surveying), feature (linguistics), convolution (computer science), feature extraction, machine learning, artificial neural network, mathematics, linguistics, philosophy, geodesy, combinatorics, geography
In this paper, we propose a novel ensembled-learning architecture that incorporates a hybrid multi-stage concatenation of a deep Convolutional Neural Network (CNN) model and a Light Gradient-Boosting Machine (LGBM) model for the task of signal classification. It is well known that CNNs are capable of learning discriminative features across various domains of signal representations. The LGBM model, on the other hand, possesses notable advantages such as the feasibility of parallel implementation and the potential to achieve comparable accuracies on various benchmark datasets. To leverage the advantages of both frameworks, the proposed architecture is constructed in three steps. First, a Mel spectrogram is constructed as a two-dimensional (2-D) three-channel image and fed into a CNN model featuring the squeeze-and-excitation (SE) attention mechanism (i.e., SeResNet) to derive a one-dimensional (1-D) deep feature of the raw signal from the final convolution layer. Second, a number of 1-D statistical features are extracted directly, based on prior expert knowledge from the source domain of acoustics. Finally, the learned deep features and the extracted statistical features are fused by concatenation to form the input vector of the LGBM model, which further improves classification accuracy. The proposed architecture is evaluated on the Google Speech Commands dataset and the UrbanSound8K dataset. Numerical results show that the proposed approach achieves state-of-the-art accuracies and non-negligible performance gains over the standalone schemes.
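
The three-stage pipeline described in the abstract can be summarized with a minimal sketch. This is not the authors' implementation: it assumes librosa for the Mel spectrogram, a timm-style SE-ResNet backbone whose `forward_features` output is global-average-pooled to obtain the 1-D deep feature, an illustrative choice of acoustic statistics (zero-crossing rate, spectral centroid, and MFCC means), and the scikit-learn-style `LGBMClassifier` for the final stage.

```python
# Hedged sketch of the three-stage pipeline (assumed libraries, not the paper's code).
import numpy as np
import librosa
import torch
import lightgbm as lgb


def mel_image(waveform, sr=16000, n_mels=128):
    """Stage 1 input: Mel spectrogram replicated into a 3-channel image tensor."""
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    img = np.stack([mel_db, mel_db, mel_db], axis=0)              # (3, n_mels, T)
    return torch.tensor(img, dtype=torch.float32).unsqueeze(0)    # (1, 3, n_mels, T)


def deep_feature(model, mel_img):
    """Stage 1 output: 1-D deep feature from the final convolution block,
    obtained here by global average pooling (timm-style API is an assumption)."""
    with torch.no_grad():
        fmap = model.forward_features(mel_img)                    # (1, C, H, W)
        return torch.mean(fmap, dim=(2, 3)).squeeze(0).numpy()    # (C,)


def statistical_features(waveform, sr=16000):
    """Stage 2: hand-crafted 1-D acoustic statistics (illustrative choices only)."""
    zcr = librosa.feature.zero_crossing_rate(waveform)
    centroid = librosa.feature.spectral_centroid(y=waveform, sr=sr)
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)
    return np.concatenate([zcr.mean(axis=1), centroid.mean(axis=1), mfcc.mean(axis=1)])


# Stage 3: fuse both feature vectors and train the LGBM classifier.
# X_deep (n_samples, d1) and X_stat (n_samples, d2) are built per clip with the
# helpers above; y holds the class labels.
# X = np.hstack([X_deep, X_stat])
# clf = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
# clf.fit(X, y)
```

Concatenating the two vectors before the LGBM stage lets the gradient-boosted trees operate on the learned spectro-temporal representation and the hand-crafted acoustic statistics within a single decision space, which is the fusion step the abstract credits with the additional accuracy gain.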