Ant colony optimization for variable selection in discriminant linear analysis
Author(s) -
Pontes Aline S.,
Araújo Alisson,
Marinho Weverton,
Gonçalves Dias Diniz Paulo H.,
Araújo Gomes Adriano,
Goicoechea Hector C.,
Silva Edvan C.,
Araújo Mario C.U.
Publication year - 2020
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3292
Subject(s) - linear discriminant analysis , multicollinearity , pattern recognition (psychology) , overfitting , feature selection , artificial intelligence , mathematics , chemometrics , collinearity , partial least squares regression , principal component analysis , computer science , statistics , machine learning , linear regression , artificial neural network
A new algorithm using ant colony optimization (ACO) for variable selection in linear discriminant analysis (LDA) is presented. The role of ACO is explored in the context of LDA classification, in which multicollinearity among spectral variables is a known cause of generalization problems. The proposed ACO‐LDA is a metaheuristic that mimics the cooperative behavior of ants, randomly depositing pheromones at the vector elements corresponding to the most relevant variables. Such cooperative ant‐like behavior, which is absent in the genetic algorithm, increases the probability of discarding noninformative variables, favoring the construction of more parsimonious models than genetic algorithm–linear discriminant analysis (GA‐LDA). The classification performance of ACO‐LDA is assessed in two case studies: (i) classification of edible vegetable oils (with respect to base oil) via ultraviolet–visible (UV‐Vis) spectrometry and (ii) simultaneous classification of tea samples with respect to type and geographic origin via near‐infrared (NIR) spectrometry. In the first study, ACO‐LDA was tested on a data set involving wide, low‐resolution absorption bands in the UV region with strong spectral overlap. In the second study, its capacity to handle a high‐dimensional data matrix was evaluated. In both studies, ACO‐LDA selected a small subset of variables that led to correct classification of almost all samples, achieving a performance level similar to that of the well‐established partial least squares–discriminant analysis (PLS‐DA) and considerably better than that of GA‐LDA. The use of ACO to select LDA classification variables can minimize the generalization problems commonly associated with multicollinearity.
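The pheromone mechanism described in the abstract can be illustrated with a minimal, generic sketch. It is not the authors' ACO‐LDA implementation. It makes several assumptions that do not come from the paper: the function names (`aco_select`, `weighted_sample`, `loo_error`), the fixed subset size, the evaporation rate, the deposit rule, and the synthetic data. To stay dependency‐free, the cost function is a leave‐one‐out nearest‐centroid error used as a stand‐in for an LDA‐based cost.

```python
import random

def weighted_sample(rng, weights, k):
    """Draw k distinct indices with probability proportional to weights."""
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[j] for j in candidates]
        pick = rng.choices(candidates, weights=w, k=1)[0]
        candidates.remove(pick)
        chosen.append(pick)
    return chosen

def aco_select(n_vars, cost, n_ants=20, n_iter=30, subset_size=3,
               evaporation=0.2, seed=0):
    """Generic ACO-style subset search (hypothetical parameters throughout)."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_vars            # one pheromone level per variable
    best_subset, best_cost = None, float("inf")
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            # Each ant picks variables with probability ~ pheromone level.
            subset = weighted_sample(rng, pheromone, subset_size)
            c = cost(subset)
            trails.append((c, subset))
            if c < best_cost:
                best_cost, best_subset = c, subset
        # Evaporation: rarely reinforced variables fade away over time.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        # Deposit: the iteration's best ant reinforces its variables.
        c, subset = min(trails, key=lambda t: t[0])
        for j in subset:
            pheromone[j] += 1.0 / (1.0 + c)
    return sorted(best_subset), best_cost

# Synthetic two-class data (assumption): 8 variables, only 1 and 5 informative.
data_rng = random.Random(1)
X, Y = [], []
for label in (0, 1):
    for _ in range(20):
        row = [data_rng.gauss(0.0, 1.0) for _ in range(8)]
        row[1] += 6.0 * label
        row[5] += 6.0 * label
        X.append(row)
        Y.append(label)

def loo_error(subset):
    """Leave-one-out error of a nearest-centroid classifier (LDA stand-in)."""
    errors = 0
    for i, (x, y) in enumerate(zip(X, Y)):
        dists = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and Y[k] == label]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in subset]
            dists[label] = sum((x[j] - m) ** 2 for j, m in zip(subset, centroid))
        errors += min(dists, key=dists.get) != y
    return errors / len(X)

subset, err = aco_select(8, loo_error, subset_size=2, seed=0)
print("selected variables:", subset, "LOO error:", err)
```

In practice, `loo_error` would be replaced by a cost computed from an LDA model fitted on the selected variables, for example a validation‐set misclassification rate. The search loop itself does not depend on which classifier supplies the cost.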