Assessment of robustness and transferability of classification models built for cancer diagnostics using Raman spectroscopy
Author(s) -
Sattlecker Martina,
Stone Nick,
Smith Jennifer,
Bessant Conrad
Publication year - 2011
Publication title -
Journal of Raman Spectroscopy
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.748
H-Index - 110
eISSN - 1097-4555
pISSN - 0377-0486
DOI - 10.1002/jrs.2798
Subject(s) - linear discriminant analysis, robustness (evolution), support vector machine, artificial intelligence, raman spectroscopy, pattern recognition (psychology), computer science, mathematics, machine learning, chemistry, optics, physics, biochemistry, gene
Over recent years, Raman spectroscopy has been demonstrated as a promising tool for cancer diagnostics. Its use for this purpose relies on pattern recognition methods that have been developed to perform well on data acquired under laboratory conditions. However, applying Raman spectroscopy as a routine clinical tool is likely to produce imperfect data because of instrument‐to‐instrument variation. Such corruption of the pure tissue spectral data is expected to reduce the classification performance of the diagnostic model. In this paper, we present a thorough assessment of the robustness of the Raman approach. This was achieved by perturbing a set of spectra in different ways, including various linear shifts, nonlinear shifts and random noise, and then using previously optimised classification models to predict the class membership of each spectrum in a testing set. The loss of predictive power with increasing corruption was used to calculate a score that allows an easy comparison of model robustness. Three types of classification model built for lymph node diagnostics were subjected to the robustness testing: linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS‐DA) and support vector machine (SVM). The results showed that linear perturbations had the highest impact on the performance of all classification models. Among the linear corruption methods, a gradient y‐shift resulted in the highest performance loss. Thus, the factor most likely to affect the predictive outcome of models when different systems are used is a gradient y‐shift. Copyright © 2010 John Wiley & Sons, Ltd.
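The perturb-and-score procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the spectra are synthetic single-band toy data, a nearest-centroid classifier stands in for LDA, PLS‐DA and SVM, and the robustness score (mean accuracy over the perturbation levels divided by the unperturbed accuracy) is a hypothetical definition, since the abstract does not give the formula. All function names are likewise hypothetical.

```python
import math
import random


def constant_y_shift(spectrum, offset):
    """Add the same intensity offset to every channel."""
    return [y + offset for y in spectrum]


def gradient_y_shift(spectrum, slope):
    """Add a baseline rising linearly from 0 (first channel) to `slope` (last channel)."""
    n = len(spectrum)
    return [y + slope * i / (n - 1) for i, y in enumerate(spectrum)]


def add_noise(spectrum, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to every channel."""
    return [y + rng.gauss(0.0, sigma) for y in spectrum]


def make_spectrum(peak_pos, rng, n=50):
    """Toy 'spectrum': one Gaussian band plus slight noise (not real Raman data)."""
    return [math.exp(-((i - peak_pos) ** 2) / 18.0) + rng.gauss(0.0, 0.02)
            for i in range(n)]


def centroid(spectra):
    """Channel-wise mean of a list of spectra."""
    return [sum(col) / len(col) for col in zip(*spectra)]


def predict(spectrum, centroids):
    """Nearest-centroid prediction (a stand-in for LDA, PLS-DA or SVM)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(spectrum, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(test_set, centroids, perturb):
    """Fraction of test spectra classified correctly after applying `perturb`."""
    hits = sum(predict(perturb(s), centroids) == label for s, label in test_set)
    return hits / len(test_set)


def robustness_score(test_set, centroids, perturb_fn, levels):
    """Hypothetical score: mean accuracy over `levels`, relative to unperturbed accuracy."""
    baseline = accuracy(test_set, centroids, lambda s: s)
    accs = [accuracy(test_set, centroids, lambda s, lv=lv: perturb_fn(s, lv))
            for lv in levels]
    return sum(accs) / (len(accs) * baseline)


rng = random.Random(0)  # fixed seed so the toy data are reproducible
peaks = {"A": 15, "B": 35}  # band position per toy class

# The model is fitted once on clean data; only the test spectra are perturbed.
train = {label: [make_spectrum(p, rng) for _ in range(20)] for label, p in peaks.items()}
centroids = {label: centroid(spectra) for label, spectra in train.items()}
test_set = [(make_spectrum(p, rng), label) for label, p in peaks.items() for _ in range(20)]

levels = [0.5, 1.0, 2.0, 4.0]  # arbitrary perturbation strengths
scores = {
    "constant y-shift": robustness_score(test_set, centroids, constant_y_shift, levels),
    "gradient y-shift": robustness_score(test_set, centroids, gradient_y_shift, levels),
    "random noise": robustness_score(
        test_set, centroids, lambda s, lv: add_noise(s, 0.1 * lv, rng), levels),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

A score near 1 means the model keeps its unperturbed accuracy across the tested levels, and lower values indicate greater sensitivity to that perturbation. The numbers produced by this toy setup only illustrate the comparison and do not reproduce the paper's results.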