Effect of signal intensity normalization on the multivariate analysis of spectral data in complex ‘real‐world’ datasets
Author(s) -
Beattie J. Renwick,
Glenn Josephine V.,
Boulton Michael E.,
Stitt Alan W.,
McGarvey John J.
Publication year - 2009
Publication title -
Journal of Raman Spectroscopy
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.748
H-Index - 110
eISSN - 1097-4555
pISSN - 0377-0486
DOI - 10.1002/jrs.2146
Subject(s) - normalization (sociology) , raman spectroscopy , multivariate statistics , pattern recognition (psychology) , statistics , mathematics , computer science , spectral line , principal component analysis , signal processing , artificial intelligence , biological system , algorithm , analytical chemistry (journal) , optics , chemistry , physics , digital signal processing , biology , chromatography , astronomy , sociology , anthropology , computer hardware
Spectral signal intensities, especially in ‘real‐world’ applications with nonstandardized sample presentation due to uncontrolled variables/factors, commonly require additional spectral processing to normalize signal intensity in an effective way. In this study, we have demonstrated the complexity of choosing a normalization routine in the presence of multiple spectrally distinct constituents by probing a dataset of Raman spectra. Variation in absolute signal intensity (90.1% of total variance) of the Raman spectra of these complex biological samples swamps the variation in useful signals (9.4% of total variance), degrading their diagnostic and evaluative potential. Using traditional spectral band choices, it is shown that normalization results are more complex than generally encountered in traditionally designed sample sets investigating limited chemical species. We demonstrate that no single band is appropriate for predicting all the reference parameters; instead, each parameter requires a tailored normalization routine. Of the reference parameters studied in the chosen system, signals from advanced glycation endproducts (pathogenic adducts in ocular tissues) were most prominent when normalizing about the 1550–1690 cm⁻¹ region of the spectrum (17.5% of total variance, compared with 0.3% for unnormalized), while prediction of pentosidine and of gender was optimized by normalization about the 1570 cm⁻¹ (R² = 0.97 vs 0.57 for unnormalized) and 1003 cm⁻¹ (p < 0.1 vs p < 0.01 for unnormalized) bands, respectively. The data obtained point to the extreme sensitivity of multivariate analysis to signal intensity normalization. Some general guidelines for making appropriate band choices are given, including the use of peak‐finding routines. Copyright © 2008 John Wiley & Sons, Ltd.
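To make the idea of normalizing about a chosen band concrete, the sketch below scales each spectrum by its mean intensity inside a wavenumber window and compares how much of the total variance the first principal component carries before and after. It is an illustration on synthetic data, not the authors' procedure: the function names `normalize_to_band` and `explained_variance_ratio`, the use of the band mean rather than an integrated area or peak height, and the two Gaussian bands are all assumptions made for the example.

```python
import numpy as np


def normalize_to_band(spectra, wavenumbers, lo, hi):
    """Divide each spectrum (row) by its mean intensity within [lo, hi] cm^-1.

    Hypothetical helper: using the band mean is an assumption; band area or
    peak height are equally plausible choices.
    """
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    if not mask.any():
        raise ValueError("band lies outside the wavenumber axis")
    band_mean = spectra[:, mask].mean(axis=1, keepdims=True)
    return spectra / band_mean


def explained_variance_ratio(spectra):
    """Fraction of total variance carried by each principal component."""
    centered = spectra - spectra.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()


# Synthetic spectra (assumed, not real data): a reference band near
# 1620 cm^-1 plus an analyte band near 1003 cm^-1, with each spectrum
# multiplied by an arbitrary overall intensity factor.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(600.0, 1800.0, 601)


def gaussian(center, width):
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


reference_band = gaussian(1620.0, 20.0)
analyte_band = gaussian(1003.0, 8.0)
scale = rng.uniform(0.2, 5.0, size=(40, 1))      # uncontrolled intensity
analyte = rng.uniform(0.5, 1.5, size=(40, 1))    # quantity of interest

raw = scale * (reference_band + analyte * analyte_band)
normalized = normalize_to_band(raw, wavenumbers, 1550.0, 1690.0)

print("PC1 share, raw:       ", explained_variance_ratio(raw)[0])
print("PC1 share, normalized:", explained_variance_ratio(normalized)[0])
```

In this toy case the normalization band is independent of the analyte, so dividing by it removes the intensity factor cleanly. The abstract's point is that real samples with several spectrally distinct constituents rarely offer such a band, which is why the best choice differs for each reference parameter.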