Premium
SimFold energy function for de novo protein structure prediction: Consensus with Rosetta
Author(s) -
Fujitsuka Yoshimi,
Chikenji George,
Takada Shoji
Publication year - 2005
Publication title -
proteins: structure, function, and bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.20748
Subject(s) - benchmark (surveying) , protein structure prediction , in silico , folding (dsp implementation) , casp , function (biology) , computer science , protein folding , protein structure , computational biology , artificial intelligence , biology , biochemistry , engineering , evolutionary biology , geodesy , gene , electrical engineering , geography
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse‐grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent‐induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence‐specific terms have weak but positive roles. In the benchmark, well‐predicted proteins by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins of which one of the two methods succeeded and the other failed in prediction, the former had a less scattered ensemble located around the native. For proteins of which both methods succeeded in prediction, often two ensembles were mixed up. Proteins 2006. © 2005 Wiley‐Liss, Inc.