Modified PCA and PLS: Towards a better classification in Raman spectroscopy–based biological applications
Author(s) - Guo Shuxia, Rösch Petra, Popp Jürgen, Bocklitz Thomas
Publication year - 2020
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3202
Subject(s) - overfitting, principal component analysis, pattern recognition (psychology), artificial intelligence, partial least squares regression, classifier (uml), transferability, chemometrics, raman spectroscopy, computer science, mathematics, biological system, machine learning, biology, artificial neural network, logit, physics, optics
Raman spectra of biological samples often exhibit variations originating from changes of spectrometers, measurement conditions, and cultivation conditions. Such unwanted variations make classification extremely challenging, especially if they are larger than the differences between the groups to be separated. A classifier is prone to pick up such unwanted variations (i.e., intragroup variations) and can fail to learn the patterns that help separate the groups (i.e., intergroup differences). This often leads to poor generalization performance and degraded transferability of the trained model. A natural solution is to separate the intragroup variations from the intergroup differences and to build the classifier on the latter information only, for example, by a well-designed feature extraction. This is the idea behind this contribution. Herein, we modified two commonly applied feature extraction approaches, principal component analysis (PCA) and partial least squares (PLS), so that they extract only the features representing the intergroup differences. Both methods were verified with two Raman spectral datasets, measured from bacterial cultures and from colon tissues of mice, respectively. In comparison to ordinary PCA and PLS, the modified PCA improved the prediction on testing data that differ significantly from the training data, while the modified PLS helped avoid overfitting and led to a more stable classification.
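
The general idea of extracting only intergroup differences can be illustrated with a simple stand-in technique, between-group PCA (PCA computed on the group means instead of on the individual spectra). This is a minimal sketch under stated assumptions and is not the modified PCA or PLS of the paper, whose algorithmic details are not given in this abstract. The function name `between_group_pca`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def between_group_pca(X, y, n_components=1):
    """PCA on group means (illustrative stand-in, not the paper's method).

    X: (n_samples, n_features) array of spectra; y: group labels.
    Returns the projected scores, the loading vectors, and the grand mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand_mean = X.mean(axis=0)
    groups = np.unique(y)
    # One row per group: group mean relative to the grand mean.
    means = np.vstack([X[y == g].mean(axis=0) for g in groups]) - grand_mean
    # Right singular vectors of the centered means span the intergroup differences.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    loadings = vt[:n_components]
    scores = (X - grand_mean) @ loadings.T
    return scores, loadings, grand_mean

# Synthetic data (assumption): the two groups differ along feature 0,
# while a much larger intragroup variation sits along feature 1.
rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)
X = rng.normal(scale=0.1, size=(2 * n, 5))
X[:, 0] += np.where(y == 0, -2.0, 2.0)         # intergroup difference
X[:, 1] += rng.normal(scale=5.0, size=2 * n)   # intragroup variation

scores, loadings, _ = between_group_pca(X, y, n_components=1)

# Ordinary PCA for comparison: SVD of the mean-centered data.
_, _, vt_pca = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pca_loadings = vt_pca[:1]

print("between-group loading:", np.round(loadings[0], 2))
print("ordinary PCA loading: ", np.round(pca_loadings[0], 2))
```

On this toy data the first ordinary principal component follows the large intragroup variation, whereas the between-group loading follows the direction that separates the groups. Group means estimated from few samples still carry some intragroup noise, so this sketch only conveys the motivation.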
