Open Access
Importance evaluation of spectral lines in Laser-induced breakdown spectroscopy for classification of pathogenic bacteria
Author(s) - Wei Wang, Geer Teng, Xiaolei Qiao, Yu Zhao, Jinglin Kong, Liqiang Dong, Xutai Cui
Publication year - 2018
Publication title - Biomedical Optics Express
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.362
H-Index - 86
ISSN - 2156-7085
DOI - 10.1364/boe.9.005837
Subject(s) - principal component analysis , artificial intelligence , support vector machine , pattern recognition (psychology) , classifier (uml) , random forest , feature (linguistics) , laser induced breakdown spectroscopy , feature extraction , computer science , mathematics , laser , physics , optics , philosophy , linguistics
The correct classification of pathogenic bacteria is significant for clinical diagnosis and treatment. Compared with using the whole spectrum, using feature lines as the inputs of the classification model can improve the correct classification rate (CCR) and reduce the analysis time. Selecting feature lines requires knowing how much each spectral line contributes to the CCR. In this paper, two algorithms, importance weights based on principal component analysis (IW-PCA) and random forests (RF), were proposed to evaluate the importance of spectral lines. The laser-induced plasma spectra of six common clinical pathogenic bacterial species were measured by laser-induced breakdown spectroscopy (LIBS), and a support vector machine (SVM) classifier was used to classify the species from their LIBS spectra. In the proposed IW-PCA algorithm, the product of each line's loading and the variance of the corresponding principal component (PC) was calculated, and the maximum of these products over the first three PCs was taken as the line's importance weight. In the RF algorithm, the reduction in the Gini index attributed to each line was taken as the line's importance weight. The experimental results demonstrated that lines with high importance were more suitable for classification and can be chosen as feature lines. The optimal number of feature lines for the SVM classifier can be determined by comparing the CCRs obtained with different numbers of feature lines. For extracting feature lines in LIBS combined with SVM classification, importance weights evaluated by RF are more suitable than those evaluated by IW-PCA. Furthermore, the two methods mutually verified the importance of the selected lines, and the lines evaluated as important by both IW-PCA and RF contributed more to the CCR.
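The IW-PCA weighting described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The loading of a line on a PC is assumed to be the corresponding component of the unit eigenvector of the sample covariance matrix (some texts scale it by the square root of the eigenvalue). Its absolute value is used so that negatively loaded lines are not discounted. The PCs are found by power iteration with deflation instead of a library eigensolver. The helper names (`covariance_matrix`, `top_eigenpairs`, `iw_pca`) are hypothetical.

```python
import math
import random


def covariance_matrix(X):
    """Sample covariance matrix of X (rows = spectra, columns = spectral lines)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                C[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in r] for r in C]


def top_eigenpairs(C, k, iters=500, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Power iteration with deflation; each eigenvalue is the variance of
    the corresponding principal component.
    """
    rng = random.Random(seed)
    p = len(C)
    A = [r[:] for r in C]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to explain in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next principal component
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def iw_pca(X, n_pcs=3):
    """Weight of each line: max over the first n_pcs PCs of |loading| * PC variance."""
    pairs = top_eigenpairs(covariance_matrix(X), n_pcs)
    return [max(abs(vec[j]) * var for var, vec in pairs) for j in range(len(X[0]))]
```

For a real spectrum with thousands of lines, a library eigensolver such as `numpy.linalg.eigh` would be the practical choice. The pure-Python loop only keeps the sketch dependency-free.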
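The RF importance in the abstract, the Gini index reduction credited to each line, corresponds to what is commonly called mean decrease in impurity. The self-contained sketch below illustrates it and is not the authors' implementation. The number of trees, the maximum depth, the sqrt(p) candidate lines per split, and the final normalization of the weights to sum to one are all assumptions. Each split's Gini decrease is weighted by the fraction of the bootstrap sample that reaches the node and is credited to the line used for the split. The function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Best (gini_decrease, feature, threshold) among the candidate features."""
    parent, n = gini(y), len(y)
    best = (0.0, None, None)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            dec = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if dec > best[0]:
                best = (dec, f, t)
    return best


def grow(X, y, depth, max_depth, n_try, rng, importance, n_root):
    """Grow one tree recursively, crediting each split's Gini decrease to its line."""
    if depth >= max_depth or len(set(y)) < 2:
        return
    features = rng.sample(range(len(X[0])), n_try)  # random subset of lines
    dec, f, t = best_split(X, y, features)
    if f is None:
        return
    # Weight by the fraction of the tree's samples that reach this node
    importance[f] += dec * len(y) / n_root
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    for idx in (left, right):
        grow([X[i] for i in idx], [y[i] for i in idx],
             depth + 1, max_depth, n_try, rng, importance, n_root)


def rf_gini_importance(X, y, n_trees=50, max_depth=4, n_try=None, seed=0):
    """Gini-based importance per line, summed over bootstrapped trees and normalized."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    n_try = n_try or max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow([X[i] for i in idx], [y[i] for i in idx],
             0, max_depth, n_try, rng, importance, n)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance
```

scikit-learn's `RandomForestClassifier` reports the same impurity-based quantity in its `feature_importances_` attribute, so in practice the library would normally replace this hand-rolled forest.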
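The last two findings in the abstract are choosing the number of feature lines by comparing CCRs and favoring lines ranked highly by both methods. Both can be expressed as simple operations on a ranking. The sketch below assumes a hypothetical callback `ccr_for(lines)` that trains and evaluates a classifier (an SVM in the paper, for example under cross-validation) on the given line indices and returns its CCR. Using the intersection of the two top-k sets as the consensus set is likewise an illustrative choice, and the weights in the usage lines are made up.

```python
from typing import Callable, List, Sequence, Tuple


def top_k(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k lines with the largest importance weights."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]


def choose_n_lines(weights: Sequence[float],
                   ccr_for: Callable[[List[int]], float],
                   max_n: int) -> Tuple[int, float]:
    """Try the top 1..max_n lines and keep the count with the highest CCR."""
    best_n, best_ccr = 0, float("-inf")
    for n in range(1, max_n + 1):
        ccr = ccr_for(top_k(weights, n))
        if ccr > best_ccr:  # strict '>' keeps the smallest N on ties
            best_n, best_ccr = n, ccr
    return best_n, best_ccr


def consensus_lines(w_pca: Sequence[float], w_rf: Sequence[float], k: int) -> List[int]:
    """Lines that appear in the top k of both the IW-PCA and the RF rankings."""
    return sorted(set(top_k(w_pca, k)) & set(top_k(w_rf, k)))


if __name__ == "__main__":
    # Made-up weights and a stand-in CCR function, not data from the paper
    w_rf = [0.10, 0.50, 0.05, 0.30, 0.05]
    w_pca = [0.40, 0.30, 0.10, 0.15, 0.05]
    toy_ccr = lambda lines: 0.9 - 0.05 * abs(len(lines) - 3)
    print(consensus_lines(w_pca, w_rf, 2))    # [1]
    print(choose_n_lines(w_rf, toy_ccr, 5))   # (3, 0.9)
```

Scanning N in ranking order keeps the number of classifier evaluations linear in `max_n`, which is what makes comparing CCRs across feature-line counts affordable.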