How well can the accuracy of comparative protein structure models be predicted?
Author(s) - Eramian David, Eswar Narayanan, Shen MinYi, Sali Andrej
Publication year - 2008
Publication title - Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1110/ps.036061.108
Subject(s) - similarity (geometry) , set (abstract data type) , regression , protein structure prediction , sequence (biology) , computer science , correlation , support vector machine , function (biology) , mean squared error , contrast (vision) , statistics , standard deviation , mathematics , artificial intelligence , protein structure , biology , biochemistry , genetics , geometry , evolutionary biology , image (mathematics) , programming language
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root‐mean‐squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model‐specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients ( r ) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
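
To make the two error measures and the reported correlation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the model and native Cα coordinates are already superposed and listed in the same residue order, and it takes NO3.5Å to be the fraction of Cα atoms lying within 3.5 Å of their native positions. The function names (`ca_rmsd`, `native_overlap`, `pearson_r`) and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """C-alpha RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the model has already been superposed onto the native structure.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(m, n) ** 2 for m, n in zip(model, native))
    return math.sqrt(squared / len(model))


def native_overlap(model, native, cutoff=3.5):
    """Fraction of C-alpha atoms within `cutoff` angstroms of their native position.

    Assumption: this is the definition of native overlap used for illustration here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    within = sum(1 for m, n in zip(model, native) if math.dist(m, n) <= cutoff)
    return within / len(model)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between predicted and actual errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy coordinates (hypothetical): four C-alpha atoms displaced by 0, 1, 2 and 4 angstroms.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0), (11.4, 4.0, 0.0)]
    print(f"RMSD = {ca_rmsd(model, native):.3f} A")        # sqrt(21 / 4)
    print(f"NO3.5A = {native_overlap(model, native):.2f}")  # 3 of 4 atoms within cutoff
```

In the protocol described above, these quantities are the regression targets: the scoring function predicts them without access to the native structure, and `pearson_r` corresponds to the kind of correlation reported between predicted and actual errors.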
