Benchmarking template selection and model quality assessment for high‐resolution comparative modeling
Author(s) - Sadowski M. I., Jones D. T.
Publication year - 2007
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.21531
Subject(s) - template, sequence (biology), selection (genetic algorithm), computer science, similarity (geometry), discriminative model, set (abstract data type), pattern recognition (psychology), artificial intelligence, benchmarking, multiple sequence alignment, sequence alignment, ranking (information retrieval), data mining, biology, peptide sequence, genetics, image (mathematics), marketing, gene, business, programming language
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exists with 30–80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI‐BLAST, profile–profile alignment, HHpred HMM–HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence‐based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined and the HHpred and MQAP methods were found to improve ranking when available templates had 35–40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence‐template sequence similarity, an extra 7% of “best” models can be found. Proteins 2007. © 2007 Wiley‐Liss, Inc.
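
The "sufficient in 75% of cases" criterion in the abstract can be read as: the template a method ranks first yields a model whose global RMSD is less than 0.5 Å worse than the model from the best available template. The sketch below illustrates that reading only. The `Candidate` class, the function names, and the toy numbers are hypothetical and are not taken from the paper, and sequence identity stands in for whatever score a selection method produces.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One template for a target (hypothetical record layout)."""
    template_id: str
    score: float       # selection score, e.g. percent sequence identity
    model_rmsd: float  # global RMSD (in Å) of the model built on this template


def selection_is_sufficient(candidates, tolerance=0.5):
    """True if the top-scored template's model is less than `tolerance` Å
    worse than the model from the best available template."""
    chosen = max(candidates, key=lambda c: c.score)
    best = min(candidates, key=lambda c: c.model_rmsd)
    return chosen.model_rmsd - best.model_rmsd < tolerance


def fraction_sufficient(targets, tolerance=0.5):
    """Fraction of targets for which top-score selection is sufficient."""
    outcomes = [selection_is_sufficient(c, tolerance) for c in targets.values()]
    return sum(outcomes) / len(outcomes)


# Toy data, invented purely for illustration.
targets = {
    "target1": [Candidate("tmplA", 72.0, 1.2), Candidate("tmplB", 55.0, 1.4)],
    "target2": [Candidate("tmplC", 64.0, 2.3), Candidate("tmplD", 48.0, 1.5)],
}

print(fraction_sufficient(targets))
```

In the toy data, the highest-identity template for `target1` is also the best one, while for `target2` a lower-identity template gives a model 0.8 Å better, so selection by score alone counts as insufficient for that target.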
