Assessment of comparative modeling in CASP2
Author(s) - Martin, Andrew C. R.; MacArthur, Malcolm W.; Thornton, Janet M.
Publication year - 1997
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/(sici)1097-0134(1997)1+<14::aid-prot4>3.0.co;2-o
Subject(s) - computer science, computational biology, biology
An assessment is presented of all submissions to the comparative modeling challenge in the 1996 Critical Assessment of Structure Prediction (CASP2). Of the original 12 target structures, 9 were solved before the meeting: 8 by X‐ray crystallography and 1 by NMR spectroscopy. These targets varied widely in difficulty, as measured by their percentage sequence identity with the principal parent structure, which ranged from 20% to 85%. The overall quality of the models reflected the similarity of each target to its principal parent. As expected, when the sequence alignment was correct, the core was modeled accurately and the largest deviations occurred in the loops. For targets with high parental similarity, models were built with Cα root‐mean‐square deviations (RMSDs) from the observed structure of <1 Å; even at 26% sequence identity, the best models had Cα deviations of only 2.2 Å. Overall, these deviations are comparable with those between the parent structure and the target, but locally there are several examples where the model is closer to the target than the parent is. The models for the three targets below 25% sequence identity were, in general, significantly less accurate. This principally reflects errors in the alignment; a systematically shifted alignment can generate Cα RMSDs >18 Å. Compared with CASP1, the geometry of the models was significantly improved, and no D‐amino acids were present. By far the largest contribution to RMSD error was alignment accuracy, which varied from 100% down to 7% over the range of targets. In the structurally variable regions, global shifts caused by hinge bending were the major source of error, so local RMSDs were significantly lower than global RMSDs. In over 50% of these noncore regions, the difference between global and local RMSDs exceeded 3 Å, reaching 10 Å for one structurally variable region.
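The gap between global and local RMSDs described above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the assessors' procedure: the coordinates are invented, the helper names (rmsd, centroid, translation_fit_rmsd) are hypothetical, and the local fit is translation-only (centroid superposition), whereas a full analysis would also fit rotation, for example with the Kabsch algorithm. A loop that is correct in shape but rigidly displaced, as after hinge bending, gives a large global RMSD and a near-zero local RMSD.

```python
import math

Point = tuple[float, float, float]

def rmsd(a: list[Point], b: list[Point]) -> float:
    """Coordinate RMSD between matched atoms, with no further fitting."""
    assert a and len(a) == len(b)
    sq = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(sq / len(a))

def centroid(points: list[Point]) -> Point:
    """Mean position of a set of atoms."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def translation_fit_rmsd(a: list[Point], b: list[Point]) -> float:
    """RMSD after superposing the two centroids (translation-only fit).
    Simplifying assumption: a full local fit would also optimise rotation."""
    ca, cb = centroid(a), centroid(b)
    a0 = [tuple(p[i] - ca[i] for i in range(3)) for p in a]
    b0 = [tuple(q[i] - cb[i] for i in range(3)) for q in b]
    return rmsd(a0, b0)

# Hypothetical toy data: the core is assumed identical in model and target,
# so the global (core-based) frame is already fitted; the loop is rigidly
# displaced by 3 Å along y, mimicking a hinge-bending shift.
target_loop = [(float(i), 2.0, 0.5 * i) for i in range(5)]
model_loop = [(x, y + 3.0, z) for x, y, z in target_loop]

global_rmsd = rmsd(target_loop, model_loop)                 # loop in the core frame
local_rmsd = translation_fit_rmsd(target_loop, model_loop)  # loop fitted on itself
print(f"global {global_rmsd:.2f} Å, local {local_rmsd:.2f} Å")
```

Because the displacement here is a pure translation, the translation-only fit removes it completely; a real hinge motion also rotates the region, which is why a rotational fit is normally used for local RMSDs.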
For the side chains, χ1 RMSDs are strongly correlated with Cα RMSDs. For models with Cα deviations below 1 Å, on average 78.5% of side chains were placed in the correct rotamer, although the χ1 RMSDs were disappointing at around 46°, albeit clearly better than random. As backbone deviations increased, side chain placement became less accurate: on a backbone with 1.5–2.5 Å Cα deviation, the average χ1 RMSD was 75° (51.4% correct rotamers on average). Refinement by energy minimization or molecular dynamics made only minor adjustments to improve local geometry and generally yielded small, but not significant, improvements in RMSD. In total, 19 groups submitted 62 models (89 coordinate sets) that could be assessed. Most modelers adjusted their sequence alignments manually, and in general good alignments were obtained down to 25% sequence identity. The modeling methods ranged from “classical” modeling, in which the core is built first and loops and side chains are then added, to more sophisticated approaches based on probability distributions, Monte Carlo sampling, or distance geometry. For each target, several groups produced models that were equally good given the expected errors in the structures (about 0.5 Å). No one method was clearly superior, although approaches that inherit directly from the parents generally performed better than the more radical techniques. However, each target also had some poor models, usually reflecting a poor sequence alignment, so the range of accuracy for each target is large. Fully automated methods can perform very well for “easy” targets (85% sequence identity with the parent), but when modeling from a distantly related parent, care and expertise, especially in performing the alignment, still appear to be important for generating accurate models. Proteins, Suppl. 1:14–28, 1997. © 1998 Wiley‐Liss, Inc.
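The two side chain measures quoted above, χ1 RMSD and percentage of correct rotamers, can be sketched as follows. This is an illustrative sketch, not the assessment's code: the function names are hypothetical, χ1 differences are assumed to be taken periodically (so that −175° and 175° differ by 10°), and the three rotamer bins with edges at 0°, 120° and 240° are one common choice that may differ from the definition used in CASP2. The angles are invented toy values.

```python
import math

def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b in degrees, mapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def chi1_rmsd(model: list[float], target: list[float]) -> float:
    """RMSD over chi1 angles, using periodic differences."""
    assert model and len(model) == len(target)
    sq = sum(angle_diff(m, t) ** 2 for m, t in zip(model, target))
    return math.sqrt(sq / len(model))

def rotamer_bin(chi: float) -> int:
    """Hypothetical three-well chi1 classification:
    bin 0 = [0, 120), bin 1 = [120, 240), bin 2 = [240, 360) degrees,
    i.e. wells centred near +60, 180 and -60 degrees."""
    return int((chi % 360.0) // 120.0)

def percent_correct_rotamer(model: list[float], target: list[float]) -> float:
    """Percentage of residues whose model chi1 falls in the target's bin."""
    assert model and len(model) == len(target)
    hits = sum(rotamer_bin(m) == rotamer_bin(t) for m, t in zip(model, target))
    return 100.0 * hits / len(model)

# Toy chi1 values in degrees (invented, not CASP2 data).
target = [-60.0, 175.0, 60.0, -170.0]
model = [-50.0, -175.0, 170.0, 170.0]
print(f"chi1 RMSD {chi1_rmsd(model, target):.1f} deg, "
      f"{percent_correct_rotamer(model, target):.1f}% correct rotamers")
```

In this toy case three of the four side chains land in the correct bin, yet the single misplaced one (60° versus 170°) dominates the χ1 RMSD. This is one way a respectable rotamer percentage can coexist with a large χ1 RMSD. Without the periodic correction in angle_diff, the pair 175° and −175° would count as a 350° error and badly inflate the RMSD.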