Chemometric tools for classification and elucidation of protein secondary structure from infrared and circular dichroism spectroscopic measurements
Author(s) - Navea Susana, Tauler Romá, Goormaghtigh Erik, de Juan Anna
Publication year - 2006
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.20890
Subject(s) - protein secondary structure, principal component analysis, partial least squares regression, circular dichroism, linear discriminant analysis, pattern recognition (psychology), chemistry, biological system, protein structure, vibrational circular dichroism, artificial intelligence, crystallography, mathematics, computer science, biology, statistics, biochemistry
Abstract - Protein classification and characterization often rely on the information contained in the protein secondary structure. Protein class assignment is usually based on X‐ray diffraction measurements, which require the protein in crystallized form, or on NMR spectra, which give the structure of a protein in solution. Simpler spectroscopic techniques, such as circular dichroism (CD) and infrared (IR) spectroscopy, are also known to be sensitive to protein secondary structure, but they have seldom been used for protein classification. To assess the potential of CD, IR, and combined CD/IR measurements for protein classification, two unsupervised pattern recognition methods, Principal Component Analysis (PCA) and cluster analysis, are first used to check whether proteins group naturally according to their measured spectra. Partial Least Squares Discriminant Analysis (PLS‐DA), a supervised pattern recognition method, is then used to test whether each protein class can be modeled explicitly and whether these models can assign unknown proteins to a class. The protein secondary structure, understood here as the abundance of the different secondary structure motifs in the biomolecule, is predicted with the local regression method interval Partial Least Squares (iPLS). CD, IR, and CD/IR measurements are correlated with the fraction of the motif to be predicted, as determined from X‐ray measurements. iPLS builds models from the spectral information most correlated with a specific secondary structure motif and avoids the use of irrelevant spectral regions. The spectral intervals chosen by the iPLS models provide structural information that can be used to confirm previous biochemical assignments or to identify new motif‐related spectral features. The predictive ability of the models built with the selected spectral regions is similar to that of previous classical approaches. Proteins 2006. © 2006 Wiley‐Liss, Inc.
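The interval selection idea behind iPLS, as described in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (a hypothetical "motif fraction" `y` that only affects channels 20 to 29 of a 60-channel spectrum), the PLS1 model is a plain NIPALS routine written with NumPy, and all names (`pls1_fit`, `rmsecv`, `ipls`) and settings (two latent variables, six equal intervals, five interleaved cross-validation folds) are assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by NIPALS; return coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # weight: covariance of each channel with y
        w /= np.linalg.norm(w)
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        c = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of cross-validation with interleaved folds."""
    idx = np.arange(len(y))
    errors = np.empty(len(y))
    for fold in range(n_folds):
        test = idx[fold::n_folds]
        train = np.setdiff1d(idx, test)
        beta, x_mean, y_mean = pls1_fit(X[train], y[train], n_components)
        errors[test] = (X[test] - x_mean) @ beta + y_mean - y[test]
    return float(np.sqrt(np.mean(errors ** 2)))


def ipls(X, y, n_intervals, n_components=2):
    """Build one local PLS model per equal-width interval; report its RMSECV."""
    edges = np.linspace(0, X.shape[1], n_intervals + 1).astype(int)
    return [
        (rmsecv(X[:, lo:hi], y, n_components), int(lo), int(hi))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


# Synthetic example (assumed data, not from the paper).
rng = np.random.default_rng(0)
n_samples, n_channels = 40, 60
y = rng.uniform(0.0, 1.0, n_samples)             # hypothetical motif fraction
X = rng.normal(0.0, 0.05, (n_samples, n_channels))
band = np.exp(-0.5 * ((np.arange(10) - 4.5) / 2.0) ** 2)
X[:, 20:30] += y[:, None] * band                 # only this band depends on y

results = ipls(X, y, n_intervals=6)
best = min(results)                              # lowest RMSECV wins
global_rmsecv = rmsecv(X, y, n_components=2)     # full-spectrum reference model

for err, lo, hi in results:
    print(f"channels {lo:2d}-{hi - 1:2d}: RMSECV = {err:.3f}")
print(f"full spectrum: RMSECV = {global_rmsecv:.3f}")
print(f"selected interval: channels {best[1]}-{best[2] - 1}")
```

In this toy setting the interval with the lowest cross-validated error is the one that carries the motif-related band, which mirrors how the intervals chosen by iPLS can point to structurally meaningful spectral regions. With real CD or IR spectra, the interval width, the number of latent variables, and the validation scheme would all need to be chosen for the data at hand.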