Enhanced genetic operators design for waveband selection in multivariate calibration based on NIR spectroscopy
Author(s) -
Cernuda Carlos,
Lughofer Edwin,
Hintenaus Peter,
Märzinger Wolfgang
Publication year - 2014
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2583
Subject(s) - crossover , computer science , partial least squares regression , calibration , genetic algorithm , dimensionality reduction , chemometrics , feature selection , benchmark (surveying) , selection (genetic algorithm) , curse of dimensionality , genetic programming , algorithm , dimension (graph theory) , point (geometry) , data mining , artificial intelligence , mathematics , machine learning , statistics , geometry , geodesy , pure mathematics , geography
Modern data acquisition techniques produce huge amounts of data. Parts of this information are related to one another, which makes dimensionality reduction desirable: the aim is to lose as little information as possible while decreasing the computation time and complexity of any subsequent data mining technique. Genetic algorithms make it possible to select the variables that carry the most relevant information and can therefore represent all the original ones. The traditional genetic operators seem to be too general, leading to results that could be improved by purpose-designed genetic operators that exploit available problem-specific information. In particular, in calibration with near-infrared spectral data, which usually contain thousands of variables, it is known that wavebands rather than isolated wavelengths allow a more robust model design. This aspect should be taken into account when crossing individuals. We propose three crossover operators specifically designed for calibration with near-infrared spectral data, based on a pseudo-random two-point crossover in which the first point is chosen randomly and the selection of the second point is guided by problem-specific information. We compare their performance with that of state-of-the-art operators. We combine these new genetic algorithm-based variable selection designs with calibration based on partial least squares regression and on fuzzy systems. Our benchmark consists of two real-world high-dimensional data sets, corresponding to polyetheracrylat, where hydroxyl number, viscosity, and acidity are monitored on-line, and to melamine resin production, where the chilling point (CP) is considered in order to regulate the condensation. We show that the designed operators promote waveband selection, achieve better-quality solutions, and converge faster and more smoothly than state-of-the-art operators. Copyright © 2014 John Wiley & Sons, Ltd.
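
The abstract does not say how the second crossover point is guided, so the following Python sketch is only one possible reading of a pseudo-random two-point crossover and not the authors' operators. It assumes that individuals are binary masks over wavelengths and that a per-wavelength `relevance` score (for example, an absolute correlation with the target) is available. The function name, the `relevance` input, and the `threshold` parameter are all hypothetical. The first cut is drawn at random, and the second cut is placed where the relevance score first drops below the threshold, so the swapped segment tends to cover a contiguous band.

```python
import random
from typing import List, Sequence, Tuple


def guided_two_point_crossover(
    parent_a: Sequence[int],
    parent_b: Sequence[int],
    relevance: Sequence[float],
    threshold: float,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Swap one contiguous segment between two binary wavelength masks.

    Assumption (not from the paper): `relevance` holds one score per
    wavelength, and the segment is extended to the right while the score
    stays at or above `threshold`.
    """
    n = len(parent_a)
    if not (n == len(parent_b) == len(relevance)):
        raise ValueError("parents and relevance must have the same length")
    if n < 2:
        raise ValueError("need at least two variables")

    # First cut: chosen uniformly at random.
    first = rng.randrange(n - 1)

    # Second cut: guided by the hypothetical relevance score.
    # The swapped segment is the half-open range [first, second).
    second = first + 1
    while second < n and relevance[second] >= threshold:
        second += 1

    child_a = list(parent_a[:first]) + list(parent_b[first:second]) + list(parent_a[second:])
    child_b = list(parent_b[:first]) + list(parent_a[first:second]) + list(parent_b[second:])
    return child_a, child_b


# Usage: eight wavelengths, with a high-relevance band at indices 2..4.
rng = random.Random(0)
relevance = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6]
parent_a = [1] * 8
parent_b = [0] * 8
child_a, child_b = guided_two_point_crossover(parent_a, parent_b, relevance, 0.5, rng)
print(child_a)
print(child_b)
```

Because the exchanged segment is always contiguous, each child inherits whole runs of adjacent wavelengths from a parent rather than scattered single positions, which is the property the abstract associates with waveband selection.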