The evaluation of two‐step multivariate adaptive regression splines for chromatographic retention prediction of peptides
Author(s) - Put Raf, Vander Heyden Yvan
Publication year - 2007
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.200600676
Subject(s) - multivariate adaptive regression splines , multivariate statistics , partial least squares regression , test set , linear regression , context (archaeology) , mars exploration program , bayesian multivariate linear regression , set (abstract data type) , regression analysis , calibration , computer science , mathematics , chromatography , artificial intelligence , chemistry , machine learning , statistics , paleontology , biology , programming language , physics , astronomy
Both the multivariate adaptive regression splines (MARS) and the two‐step MARS (TMARS) methodologies were applied in a quantitative structure–retention relationship (QSRR) context. For seven RPLC systems, QSRR models were built that describe the retention times of a set of peptides using a large set of molecular descriptors as potential predictor variables. The use of QSRR models for chromatographic retention prediction of peptides may be valuable in proteomic research to increase the number of correct peptide identifications. In each case, 70% of the peptides were used to derive the QSRR models (calibration set), whereas the remaining 30% were treated as an independent external test set. For four of the seven systems, the models obtained by TMARS had better predictive abilities than the MARS models. The performance of the MARS and TMARS models was also compared with that of other multivariate modelling techniques. For five of the seven systems, the uninformative variable elimination by partial least squares (PLS) approach outperformed all other methods studied. For three systems, prediction errors smaller than 30 s were obtained. For the remaining two systems, PLS regression and a multiple linear regression model based on three descriptors gave the best predictive performance.
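The validation scheme described in the abstract (70% calibration set, 30% external test set, prediction error in seconds) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the single descriptor, the random split, and the straight-line model stand in for the paper's descriptor set, its actual splitting procedure, and its MARS, TMARS, and PLS models, and the function names are hypothetical. The root mean squared error on the test set is used here as one common measure of prediction error; the abstract does not say which measure the authors used.

```python
import math
import random


def split_calibration_test(samples, calibration_fraction=0.7, seed=0):
    """Randomly split samples into a calibration set and an external test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_cal = int(round(calibration_fraction * len(shuffled)))
    return shuffled[:n_cal], shuffled[n_cal:]


def fit_line(pairs):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(pairs, intercept, slope):
    """Root mean squared error of the fitted line on the given pairs."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-in data (NOT from the paper): one hypothetical descriptor
# value per peptide and a retention time in seconds with Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(100):
    descriptor = rng.uniform(0.0, 50.0)
    retention_s = 60.0 + 12.0 * descriptor + rng.gauss(0.0, 5.0)
    data.append((descriptor, retention_s))

calibration, test = split_calibration_test(data)
intercept, slope = fit_line(calibration)       # model built on calibration set only
test_error = rmse(test, intercept, slope)      # error on peptides never seen in fitting

print(f"calibration: {len(calibration)} peptides, test: {len(test)} peptides")
print(f"test-set prediction error: {test_error:.1f} s")
```

The point of the sketch is that the test peptides play no part in fitting, so the reported error estimates how well a model predicts retention for new peptides rather than how well it reproduces the calibration data.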